Re: Unicode handling

Dan Sugalski Mon, 26 Mar 2001 09:55:12 -0800
At 04:34 PM 3/24/2001 -0800, Dave Storrs wrote:
>         I'll just toss my 0.01 cents in...my thought here is that this
>thread has now tied up a lot of cycles from a lot of very smart, very
>experienced people without resulting in an answer that is clearly The
>Right Thing.  Whatever we do, there is a problem at some point...if we do
>normalizations internally for some functions, then you end up with a
>situation like the code above, which looks like it should produce
>identical input and output files, but won't necessarily.  OTOH, if we
>don't do normalizations, then (e.g.) length() can return different values
>for different representations of the same string.

For length, I'd as soon it returned the number of code points, but glyphs 
and bytes are also valid return values.
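
To make the three candidate answers concrete, here is a minimal sketch in Python rather than Perl, purely as an illustration of the counting problem and not as a proposal for how perl's length() should behave. The helper name lengths() is made up for this example, and the glyph count is only an approximation that skips combining marks; real grapheme clustering is more involved.

```python
import unicodedata

def lengths(s):
    """Return (code points, UTF-8 bytes, approximate glyphs) for s."""
    code_points = len(s)                  # Python str is a sequence of code points
    utf8_bytes = len(s.encode("utf-8"))   # size depends on the chosen encoding
    # Assumption: approximate glyphs by skipping combining marks.
    # This is not full grapheme-cluster segmentation.
    glyphs = sum(1 for ch in s if unicodedata.combining(ch) == 0)
    return code_points, utf8_bytes, glyphs

composed = "\u00e9"      # e-acute as one precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

print(lengths(composed))    # (1, 2, 1)
print(lengths(decomposed))  # (2, 3, 1)

# Normalizing to NFC makes the two representations compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

The same visible character gives different code point and byte counts depending on representation, and only the glyph-style count stays put without normalizing first.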

Part of the problem isn't so much an argument over functionality as one of 
mapping. We have a number of things we're trying to wedge into one or two 
functions, and that's always going to cause problems. (It might just be 
that we haven't wrapped our brains completely around the possibility that 
we can actually change the language... :)

There's also the problem of extending the functionality without 
inconveniencing the current crop of Perl programmers. Getting Unicode 
working is grand, but we can't make life more difficult for folks who 
don't use or need it. And I'm a touch nervous about arranging things so 
that perl just happens to Do The Right Thing in most circumstances, like 
defining length to return the number of code points, since that'll bite 
folks when the accidental functionality fails for some reason.

>         My suggestion is, let's punt on this one...make it the
>programmer's responsibility to ensure that Unicode strings are represented
>in the desired way.

For a good bit of this I'd agree completely.

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk