Re: Should we care much about this Unicode-ish criticism?

Bryan C. Warnock Tue, 05 Jun 2001 14:18:11 -0700
On Tuesday 05 June 2001 03:24 pm, Dan Sugalski wrote:

> > > The second objection is again related to character versus glyph
> > > issues: since Chinese,
> >
> >I think this problem =~ locale. For any unicode character, you can not
> >properly tell its lower case or upper case without considering locale.
> >And unicode does not encode locale.
>
> Yeah, that is a problem. The alternative isn't any better, unfortunately.
> Human languages are a pain. :)
>
> We're going to need case-translation stuff for perl 6, I think, if lc, uc,
> and its ilk are going to work properly.
>

Yes, we've discussed this off and on for various things - character class
identification, sorting, comparison, and case translation.

Where do you draw the line, lines, and/or default line?  I'd like Perl to be
able to handle textual information, and not just do character manipulation,
but that doesn't mean it has to happen at the core level.

Some additional stuff to ponder over, and maybe Unicode addresses these - I 
haven't been able to read *all* the Unicode stuff yet.  (And, yes, Simon, you
will see me in class.)

Some languages don't have upper or lower case.  Are tests and translations 
on caseless characters true or false?  (Or undefined?)  
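
As one data point on the caseless question, here is how an existing language answers it. This is a small Python sketch, used only as an illustration; nothing in this thread says Perl should behave the same way. The choice of U+4E2D as the sample character is mine.

```python
# U+4E2D is a CJK ideograph, which has neither upper nor lower case.
caseless = "\u4e2d"

# Both case tests come back false, because the string contains
# no cased characters at all.
is_upper = caseless.isupper()
is_lower = caseless.islower()

# Case translations leave the character unchanged.
unchanged = caseless.upper() == caseless and caseless.lower() == caseless

print(is_upper, is_lower, unchanged)
```

So in that model the tests are false rather than undefined, and the translations are no-ops.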

Should the same Unicode character, when used in two different languages, be 
string equivalent?  

Asciibetical order is one thing, as it (roughly) maps to alphabetical order
for English.  But unless you've been blessed with a root language for Unicode
mapping (such as Arabic), Unicodical sorting is going to be nonsensical, as
you hop between your language variants and the characters encoded somewhere
else (as in Farsi).  And, of course, there are several different orderings
for eastern glyph languages, IINM.
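
To make the Farsi case concrete, here is a minimal Python sketch (Python only for illustration). The Persian alphabet starts alef, beh, peh, teh, but peh is encoded well after the basic Arabic letters, so sorting by code point moves it out of place. The `rank` table is a hypothetical stand-in for real collation data, and it only handles single letters.

```python
# First four letters of the Persian alphabet, in alphabetical order.
ALEF, BEH, PEH, TEH = "\u0627", "\u0628", "\u067e", "\u062a"
persian_order = [ALEF, BEH, PEH, TEH]

# Sorting by code point: PEH (U+067E) lands after TEH (U+062A).
by_code_point = sorted(persian_order)

# Hypothetical collation table: letter -> position in the alphabet.
rank = {ch: i for i, ch in enumerate(persian_order)}

# Sorting with the table restores the expected order.
by_alphabet = sorted(by_code_point, key=lambda ch: rank[ch])

print([hex(ord(ch)) for ch in by_code_point])
print([hex(ord(ch)) for ch in by_alphabet])
```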

But I think it'd be too heavy to make Perl inherently locale-aware.  The
best, I think, would be to have Perl simply be Unicode neutral - to treat
the characters (with any equivalencies, etc) as just data - and to allow
locale modules to replace or supplement the ops/functions/* that *are*
locale-aware.

That would allow all the locale-specific handling code to be
written/debugged/distributed separately from the core on its own timeframe.
It would ultimately lead to a little more consistency, since everyone can
use a common handler instead of rolling their own.  No need to have locale
handlers for locales you won't use.

Of course, being Unicode neutral, that still leaves some stuff (like case 
determination) undefined.  So maybe there should be a default locale in 
place - the current, or barring that, English, I suppose.
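
A rough sketch of what I mean by a neutral core plus replaceable locale handlers with a default, written in Python purely for illustration. Every name here (`register_upper`, `upper`, `DEFAULT_LOCALE`, `turkish_upper`) is hypothetical, and the Turkish handler covers only the dotted/dotless i rule, not a full locale.

```python
from typing import Callable, Dict

# Hypothetical registry: locale name -> replacement for the neutral op.
_UPPER_HANDLERS: Dict[str, Callable[[str], str]] = {}

# Assumed default locale; it has no handler, so the neutral op is used.
DEFAULT_LOCALE = "en"


def register_upper(locale: str, handler: Callable[[str], str]) -> None:
    """Let a separately distributed locale module supply its own handler."""
    _UPPER_HANDLERS[locale] = handler


def upper(text: str, locale: str = DEFAULT_LOCALE) -> str:
    """Use the locale's handler if one is registered, else the neutral op."""
    handler = _UPPER_HANDLERS.get(locale, str.upper)
    return handler(text)


def turkish_upper(text: str) -> str:
    # Turkish rule: dotted i -> U+0130, dotless U+0131 -> I.
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


# What a Turkish locale module would do when it is loaded.
register_upper("tr", turkish_upper)

print(upper("title"))
print(upper("title", "tr"))
```

A locale nobody has loaded a handler for simply falls through to the neutral behaviour, so there is no cost for locales you don't use.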

-- 
Bryan C. Warnock
[EMAIL PROTECTED]