> >Here is an example, "résumé" takes 6 characters in Latin-1, but
> >could take 8 characters in Unicode. All Perl functions that directly
> >deal with character position and length will be sensitive to encoding.
> >I wonder how we should handle this case.
> 
> My first inclination is to force normalization on any data we manipulate.

That was one of the reasons I proposed UTF-8 string encoding. If we don't
do normalization (by keeping multiple encodings), we have to avoid using
character position, string length, and ord(), since they are all
encoding-specific. Perl users will face all kinds of problems when they
try to deal with individual characters.
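
To make the problem concrete, here is a rough sketch. It is written in
Python rather than Perl purely for illustration, and the describe() helper
is my own invention, not a proposed API. It measures the same word in its
precomposed (NFC) and decomposed (NFD) forms:

```python
import unicodedata

# "résumé" written with combining acute accents (U+0301), then normalized both ways.
raw = "re\u0301sume\u0301"
composed = unicodedata.normalize("NFC", raw)    # precomposed é (U+00E9)
decomposed = unicodedata.normalize("NFD", raw)  # e followed by U+0301


def describe(s):
    """Return the encoding-sensitive measurements: length, byte count, ord()."""
    return {
        "code_points": len(s),
        "utf8_bytes": len(s.encode("utf-8")),
        "ord_at_1": ord(s[1]),  # hypothetical probe: the character after "r"
    }


print(describe(composed))
print(describe(decomposed))
print(composed == decomposed)
```

The two forms render identically, yet they report 6 versus 8 code points,
give different ord() values at position 1, and do not compare equal.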

In any case, we need to make sure that regexes do not have any problems
with normalization.
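
As a rough illustration of what could go wrong (again in Python, not Perl,
and with a made-up matches() helper that is only an assumption for this
sketch), a pattern that expects one character per accented letter stops
matching when the input arrives in decomposed form, and matches again once
the input is normalized first:

```python
import re
import unicodedata

PATTERN = re.compile(r"r.sum.")  # each "." is meant to match one accented letter


def matches(text, normalize=False):
    """Full-match text against PATTERN, optionally normalizing to NFC first."""
    if normalize:
        text = unicodedata.normalize("NFC", text)
    return PATTERN.fullmatch(text) is not None


composed = "r\u00e9sum\u00e9"       # precomposed é
decomposed = "re\u0301sume\u0301"   # e + combining acute accent

print(matches(composed))
print(matches(decomposed))
print(matches(decomposed, normalize=True))
```

Only the un-normalized decomposed string fails to match, which is the kind
of surprise forced normalization would avoid.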

Hong
