[power-pro] Re: Unicode bugs?

entropyreduction Fri, 14 Aug 2009 09:12:16 -0700

--- In [email protected], "Sheri" <sheri...@...> wrote:
>
> I don't know. Probably could be. I found the "lower case" spelling in one of 
> the demo unicode scripts I think. BTW, those scripts don't currently run 
> without errors.


Okay, will test.
 
> Actually what I observed is that not surprisingly the case modifiers for 
> backreferences in the regex plugin format string do not work for utf8 strings 
> involving non-ascii (above 127) characters. If you remember, we have e.g., 
> $u0 to make $0 in upper case. I will document that the case modifiers should 
> be avoided for utf8 (although as long as modding only among the lower 127 
> characters I think it is ok).
 
> I suppose it would be possible (if you want) to implement a second set of 
> signals in the format string such as x, y, z instead of l, u, t (lower, 
> upper, title). I'm just trying to avoid impacting the performance of the case 
> mods for non-utf8 stuff. So the user would include, e.g. $x0 for lower case 
> $0 in utf8. Behind the scene you'd need to convert the backreference from 
> utf8 to unicode, modify the case, and convert back to utf8. Hopefully nothing 
> would be added or lost in translation.

Wouldn't it be simpler just to keep the current case flags?
It would just be a simple extra test when such a flag is found ("is unicode
present?  then go thataway").
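
A rough sketch of that extra test, in Python rather than the plugin's own code. The function name upper_utf8 is made up for illustration, and the plugin's real internals may look nothing like this. It keeps the cheap ASCII path when no byte is above 127 and only does the utf8 -> unicode -> utf8 round trip otherwise:

```python
def upper_utf8(data: bytes) -> bytes:
    """Upper-case a UTF-8 byte string (hypothetical helper, not plugin code)."""
    if data.isascii():
        # No byte above 127: the plain ASCII case mapping is enough,
        # so the non-utf8 path pays only for this one check.
        return data.upper()
    # Non-ASCII present: decode to Unicode, change case, encode back.
    return data.decode("utf-8").upper().encode("utf-8")
```

Note that the round trip can change the byte length: some characters have multi-character upper-case forms (for example the German sharp s becomes "SS"), so the result cannot be assumed to fit the original buffer.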

Seems there ought to be some more direct way to convert case in UTF8,
but a quick search suggested I'd probably have to import a big chunk of code to 
do it.  See e.g.

http://bytes.com/topic/c/answers/469334-how-convert-characters-upper-case-utf8-env

> Or, user could do something similar with pcrereplacecallback to implement 
> his/her own case-modded backreferences in a replacement. In a pcrematchall, 
> user could output a vector, and do unicode case modifications on the utf8 
> elements in the vector.
> 
> What do you think?

Probably a lot easier for the user if I deal with it within the plugin.
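
For comparison, the user-side callback route from the quoted message might look roughly like this. It is sketched with Python's re.sub instead of pcrereplacecallback, and title_backrefs is a hypothetical name, so treat it as an analogy only:

```python
import re

def title_backrefs(text: str) -> str:
    """Title-case every matched word via a replacement callback."""
    def callback(match: re.Match) -> str:
        # match.group(0) plays the role of $0; the case change is done
        # on Unicode text, so non-ASCII letters are handled too.
        return match.group(0).title()
    return re.sub(r"\w+", callback, text)
```

Every script needing case-modded backreferences would have to carry a callback like this, which is the extra work for the user that handling it inside the plugin avoids.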
