> >Here is an example, "résumé" takes 6 characters in Latin-1, but
> >could take 8 characters in Unicode. All Perl functions that directly
> >deal with character position and length will be sensitive to encoding.
> >I wonder how we should handle this case.
>
> My first inclination is to force normalization on any data we manipulate.

That was one of the reasons I proposed UTF-8 string encoding. If we don't
do normalization (by keeping multiple encodings), we have to avoid using
character position, string length, and ord(), since they are encoding
specific. Perl users will have to face all kinds of problems when they try
to deal with individual characters.

In any case, we need to make sure that regexes do not have any problems
with normalization.

Hong
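To make the 6-versus-8 point concrete, here is a minimal sketch. It is written in Python rather than Perl only because the standard unicodedata module makes it self-contained. The helper name describe() is made up for illustration, and the sketch assumes the 8-character case refers to the decomposed form (base letter plus combining acute accent).

```python
import unicodedata


def describe(text):
    """Return code-point and UTF-8 byte counts for one string (hypothetical helper)."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
    }


# Escapes are used so the example does not depend on this file's own encoding.
composed = unicodedata.normalize("NFC", "r\u00e9sum\u00e9")  # precomposed e-acute
decomposed = unicodedata.normalize("NFD", composed)          # e + U+0301

print(describe(composed))               # {'code_points': 6, 'utf8_bytes': 8}
print(describe(decomposed))             # {'code_points': 8, 'utf8_bytes': 10}
print(len(composed.encode("latin-1")))  # 6

# Without normalization the two spellings do not even compare equal.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

The same word gives different lengths, different character positions, and a failed equality test until both sides are brought to one normalization form, which is the case for normalizing on input.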