Re: PDD 4: Internal data types

Simon Cozens Thu, 22 Mar 2001 03:44:45 -0800
On Tue, Mar 06, 2001 at 01:21:20PM -0800, Hong Zhang wrote:
> The normalization has something to do with encoding. If you compare two
> strings with the same encoding, of course you don't have to care about it.

Of course you do. Think about it.

If I'm comparing "(Greek letter lower case alpha with tonos)" with "(Greek
letter lower case alpha)(+tonos)" I want them to compare equal. One string is
normalized, the other isn't; how they're encoded is irrelevant, you still have
to care about normalization. (This is where Perl 5 currently falls over.)

Normalization has utterly nothing at all to do with encoding. Nothing.
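
A minimal sketch of that point, assuming a Perl with the Encode and
Unicode::Normalize modules available (neither is named in this thread). It
spells the tonos as U+0301 COMBINING ACUTE ACCENT, which is the canonical
decomposition of U+03AC. Both strings go through the same encoding, and they
still differ until both are normalized:

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFC);

my $composed   = "\x{03AC}";          # alpha with tonos, precomposed
my $decomposed = "\x{03B1}\x{0301}";  # alpha + combining acute accent

# Same encoding on both sides, yet the byte strings are different.
my $same_utf8  = (encode('UTF-8',    $composed) eq encode('UTF-8',    $decomposed)) ? 1 : 0;
my $same_utf16 = (encode('UTF-16BE', $composed) eq encode('UTF-16BE', $decomposed)) ? 1 : 0;

# Only normalizing both operands makes them compare equal.
my $same_nfc = (NFC($composed) eq NFC($decomposed)) ? 1 : 0;

print "UTF-8 bytes equal:    $same_utf8\n";
print "UTF-16BE bytes equal: $same_utf16\n";
print "equal after NFC:      $same_nfc\n";
```

Swapping the encoding changes the bytes but not the answer; only the
normalization step does.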

Now, since we have to normalize strings in some cases (like the comparison
above) when the user hasn't explicitly asked for it, let's not make things
like length() and substr() dependent on whether or not the string is
normalized, eh? The *last* thing I want to happen is this:

    $a = "(Greek letter lower case alpha with tonos)";
    print length $a; # 1
    if ($a eq "(Greek letter lower case alpha)(+tonos)") {
        # (Which it damned well ought to)

        print length $a; # 2! HA! Surprise! $a had to be normalized!
    }
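
One way to avoid that surprise, sketched under assumptions: nf_eq is a
hypothetical helper, not anything Perl provides, and it leans on the
Unicode::Normalize module. It compares normalized copies, so the operands
themselves are never rewritten and length() reports the same value before and
after the comparison:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical helper: normalize copies of both operands, then compare.
# The caller's strings are left exactly as they were.
sub nf_eq {
    my ($x, $y) = @_;
    return (NFC($x) eq NFC($y)) ? 1 : 0;
}

my $s = "\x{03AC}";                         # alpha with tonos, precomposed
my $before = length $s;                     # 1
my $equal  = nf_eq($s, "\x{03B1}\x{0301}"); # alpha + combining acute accent
my $after  = length $s;                     # still 1

print "equal=$equal before=$before after=$after\n";
```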

Please see my Unicode RFCs.

-- 
Hanlon's Razor:
        Never attribute to malice that which is adequately explained
        by stupidity.