At 04:34 PM 3/24/2001 -0800, Dave Storrs wrote:
> I'll just toss my 0.01 cents in...my thought here is that this
>thread has now tied up a lot of cycles from a lot of very smart, very
>experienced people without resulting in an answer that is clearly The
>Right Thing. Whatever we do, there is a problem at some point...if we do
>normalizations internally for some functions, then you end up with a
>situation like the code above, which looks like it should produce
>identical input and output files, but won't necessarily. OTOH, if we
>don't do normalizations, then (e.g.) length() can return different values
>for different representations of the same string.
For length, I'd as soon it returned the number of code points, but glyphs
and bytes are also valid return values.
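
To make the three candidate answers concrete, here is a minimal sketch, written in Python purely for illustration (it is not perl's length() and not a proposal for it). It compares a precomposed "é" with its decomposed form. The helper names are hypothetical, and the glyph count is a rough approximation that treats every non-combining code point as the start of a glyph.

```python
import unicodedata

# Two representations of the same visible string "é".
composed = "\u00e9"      # one precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def code_points(s):
    """Number of Unicode code points in s."""
    return len(s)


def utf8_bytes(s):
    """Number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


def approx_glyphs(s):
    """Rough glyph count: code points that are not combining marks.

    Assumption: this ignores the full grapheme-cluster rules and is
    only good enough for simple accented Latin text.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


for name, s in (("composed", composed), ("decomposed", decomposed)):
    print(name, code_points(s), utf8_bytes(s), approx_glyphs(s))

# Normalizing makes the two forms compare equal again.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

The two strings agree only on the approximate glyph count. Their code point and byte counts differ until one of them is normalized, which is the discrepancy described in the quoted text.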

Part of the problem isn't so much an argument over functionality as one of
mapping. We have a number of things we're trying to wedge into one or two
functions, and that's always going to cause problems. (It might just be
that we haven't wrapped our brains completely around the possibility that
we can actually change the language... :)

There's also the problem of extending the functionality without
inconveniencing the current crop of perl programmers. Getting Unicode
working is grand, but we can't make life more difficult for folks that
don't use or need it. And I'm a touch nervous about arranging things so
that perl just happens to Do The Right Thing in most circumstances, like
defining length to return the number of code points, since that'll bite
folks when the accidental functionality fails for some reason.

> My suggestion is, let's punt on this one...make it the
>programmer's responsibility to ensure that Unicode strings are represented
>in the desired way.
For a good bit of this I'd agree completely.
Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk