Re: PDD 4: Internal data types

Hong Zhang Tue, 06 Mar 2001 12:58:27 -0800
> Unless I really, *really* misread the unicode standard (which is distinctly
> possible) normalization has nothing to do with encoding,

I understand what you are trying to say, but it is not that easy in
practice. Normalization does have something to do with encoding. If you
compare two strings with the same encoding, of course you don't have to
care about it. But if you compare two strings with different encodings
(which Perl 6 will do), you have to care about it. The 6-character
"résumé" in Latin-1 encoding should compare equal to the 8-character
decomposed Unicode string. That is what people would expect. If the
language does not handle it, some library will do it.
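
To make the comparison concrete, here is a minimal sketch in Python (not Perl 6, and not anything proposed in the PDD). It uses the standard unicodedata module; the helper name canonical_equal is hypothetical and only there for illustration. It assumes the Latin-1 string is decoded to Unicode first and that both sides are then brought to the same normalization form before comparing.

```python
import unicodedata

# "résumé" as 6 bytes in Latin-1 (one byte per character).
latin1_bytes = b"r\xe9sum\xe9"
from_latin1 = latin1_bytes.decode("latin-1")

# The same word as a decomposed Unicode string: each "é" is "e"
# followed by U+0301 COMBINING ACUTE ACCENT, giving 8 code points.
decomposed = "re\u0301sume\u0301"

# A plain code point comparison treats the two strings as different.
naive_equal = from_latin1 == decomposed


def canonical_equal(a, b):
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# After normalization the two spellings compare equal.
normalized_equal = canonical_equal(from_latin1, decomposed)
```

Whether the language does this implicitly on every comparison or leaves it to a library call like the one above is exactly the open question.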

> and the encoding
> we choose doesn't make any difference to the character position, string
> length, or ord stuff if we define them to work on characters rather than
> bytes. Which doesn't mean it's not a problem, it's just a different problem.

Anyway, that is the problem I tried to raise; a different problem is still
a problem. I am not sure which definition of "character" you are using. The
single code point "é" can also be expressed as two code points in Unicode,
so ord("é") will return a different value depending on its encoding. The
concepts of character position, string length, and ord() all depend on
encoding. If Perl 6 uses only one encoding, everything will be just fine.
Otherwise, someone has to handle this problem.
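
A small sketch of the effect, again in Python with the standard unicodedata module rather than in Perl 6. It assumes that length and ord() are defined over code points; the variable names are illustrative only. Strictly speaking, what differs between the two strings here is the normalization form (precomposed versus decomposed), which is the distinction the two sides of this thread are arguing about.

```python
import unicodedata

# Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
composed = "\u00e9"

# Decomposed form: "e" (U+0065) followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", composed)

# Length counted in code points differs between the two forms.
composed_len = len(composed)
decomposed_len = len(decomposed)

# So does the value of the first code point.
first_cp_composed = ord(composed[0])
first_cp_decomposed = ord(decomposed[0])

# Position shifts too: a character that follows "é" sits at index 1
# in the precomposed string but at index 2 in the decomposed one.
index_composed = (composed + "x").index("x")
index_decomposed = (decomposed + "x").index("x")
```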

> >Perl users will have to face all kinds of problem when they try to deal
> >with individual characters.
>
> Most won't, honestly. At a guess, 90% of perl's current userbase doesn't
> care about Unicode for any reason other than XML,

I totally agree with you on this. That was not my point. What I tried to
express is what Perl 6 should do for people who do care about it. I would
like to see a solution, be it part of the language or some Unicode library.

Hong