--- In [email protected], "Sheri" <sheri...@...> wrote:
>
> I don't know. Probably could be. I found the "lower case" spelling in one of
> the demo unicode scripts I think. BTW, those scripts don't currently run
> without errors.
Okay, will test.
> Actually what I observed is that not surprisingly the case modifiers for
> backreferences in the regex plugin format string do not work for utf8 strings
> involving non-ascii (above 127) characters. If you remember, we have e.g.,
> $u0 to make $0 in upper case. I will document that the case modifiers should
> be avoided for utf8 (although as long as modding only among the lower 127
> characters I think it is ok).
> I suppose it would be possible (if you want) to implement a second set of
> signals in the format string such as x, y, z instead of l, u, t (lower,
> upper, title). I'm just trying to avoid impacting the performance of the case
> mods for non-utf8 stuff. So the user would include, e.g. $x0 for lower case
> $0 in utf8. Behind the scene you'd need to convert the backreference from
> utf8 to unicode, modify the case, and convert back to utf8. Hopefully nothing
> would be added or lost in translation.
Wouldn't it be simpler just to stick with the current case flags?
It would just be a simple extra test when such a flag is found ("is unicode
present? then go thataway").
Seems there ought to be some more direct way to convert case in UTF8,
but a quick search suggested I'd probably have to import a big chunk of code to
do it (see e.g.
http://bytes.com/topic/c/answers/469334-how-convert-characters-upper-case-utf8-env ).
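
To make the "extra test" idea concrete, here is a rough sketch in Python (not the plugin's actual code; the function name change_case_utf8 and the single-letter mode argument are made up for illustration). It assumes the backreference arrives as UTF-8 bytes, keeps a cheap byte-level path for pure-ASCII input, and only decodes, changes case, and re-encodes when non-ASCII bytes are present.

```python
def change_case_utf8(data: bytes, mode: str) -> bytes:
    """Change the case of UTF-8 encoded bytes.

    Hypothetical helper: mode is 'l' (lower), 'u' (upper) or 't' (title),
    mirroring the l/u/t flags discussed in this thread.
    """
    if mode not in ("l", "u", "t"):
        raise ValueError("mode must be 'l', 'u' or 't'")

    if data.isascii():
        # Fast path: no bytes above 127, so the byte-level methods are
        # enough and no decoding is needed.
        ops = {"l": data.lower, "u": data.upper, "t": data.title}
        return ops[mode]()

    # Slow path: decode UTF-8, change case on the text, encode back.
    text = data.decode("utf-8")
    ops = {"l": text.lower, "u": text.upper, "t": text.title}
    return ops[mode]().encode("utf-8")
```

One thing this sketch makes visible: the round trip is not always length-preserving. In Python, upper-casing the German "ß" yields "SS", so something can indeed be "added in translation", and any buffer sizing would have to allow for that.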
> Or, user could do something similar with pcrereplacecallback to implement
> his/her own case-modded backreferences in a replacement. In a pcrematchall,
> user could output a vector, and do unicode case modifications on the utf8
> elements in the vector.
>
> What do you think?
Probably a lot easier for the user if I deal with it within the plugin.
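
For comparison, the callback route described in the quoted text might look roughly like this. It is only an analogy: Python's re.sub stands in for pcrereplacecallback, and the helper names are invented. The callback receives each match and returns the case-modified replacement itself.

```python
import re


def upper_match(match):
    # Callback: return the whole match ($0) in upper case.
    # str.upper() is Unicode-aware, so non-ASCII letters are handled too.
    return match.group(0).upper()


def replace_upper(pattern, text):
    # re.sub calls upper_match once per match and splices in its result.
    return re.sub(pattern, upper_match, text)
```

This works, but every user would have to write such a callback for each replacement, which is the extra effort that handling it behind the existing flags would avoid.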