On Tuesday 05 June 2001 05:49 pm, Simon Cozens wrote:
> YES. Definitely. Same Unicode character, same thing. You wanted something
> else, use a different Unicode character.

I don't understand.  There *is* only one character.  I can't choose another.
Take 0x0648, for instance.  It's both waw, the 27th letter of the Arabic 
alphabet, and veh, the 30th letter of the Persian alphabet, which aren't the 
same letter.  Same character, different letters.  Equivalent, or different?  
In Unicode, or independent of locale, they're the same; I've no problem with 
that.  Within one locale or the other... I'm not so sure.  I think the 
comparison needs to be able to go both ways, with equivalence perhaps being 
the default.
(Perhaps this need only be as simple as being able to tag and query (via 
attributes, for instance) the language of the string, and then handle the logic 
yourself.  If the languages differ, no sense in comparing, yadda yadda 
yadda.  Then again, whether it is a difference or not may also be a language 
issue.  I'd be inclined to think that waw and veh are different, but "Gift" 
(in English) and "Gift" (in German) are the same.  To me, those are the same 
characters and same letters (even though, I guess, technically they are 
not), with just different meanings.)  In either case (or perhaps it is an 
extension of the same case), each locale should be able to specify and 
handle its own determination of equivalency.
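
To make the idea concrete, here is a rough sketch in Python (only for illustration; nothing here is a proposed interface).  The TaggedString class, the equivalent() function, the same_alphabet policy and the ALPHABET table are all hypothetical names.  The default comparison looks at code points only; an optional policy lets a locale decide whether two languages share letters.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class TaggedString:
    """A string carrying an optional language tag (hypothetical attribute)."""
    text: str
    lang: Optional[str] = None

# Hypothetical table: which alphabet each language's letters belong to.
ALPHABET = {"ar": "arabic", "fa": "persian", "en": "latin", "de": "latin"}

def same_alphabet(lang_a: Optional[str], lang_b: Optional[str]) -> bool:
    """Example policy: letters match only if both languages share an alphabet.
    Languages missing from the table all map to None and so compare as equal."""
    return ALPHABET.get(lang_a) == ALPHABET.get(lang_b)

def equivalent(a: TaggedString, b: TaggedString,
               policy: Optional[Callable[[Optional[str], Optional[str]], bool]] = None) -> bool:
    """Code-point equivalence by default; a policy may veto on language grounds."""
    if a.text != b.text:
        return False
    if policy is None:
        return True
    return policy(a.lang, b.lang)

waw = TaggedString("\u0648", lang="ar")
veh = TaggedString("\u0648", lang="fa")
gift_en = TaggedString("Gift", lang="en")
gift_de = TaggedString("Gift", lang="de")

print(equivalent(waw, veh))                               # default: code points only
print(equivalent(waw, veh, policy=same_alphabet))         # locale-aware comparison
print(equivalent(gift_en, gift_de, policy=same_alphabet))
```

With no policy, waw and veh compare as the same character.  With the example policy they differ, while the two "Gift" strings still match, because the assumed table puts English and German in the same alphabet.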

As I watch everyone talk about the eastern languages, and I think of the 
middle eastern languages, I realize what a mess this potentially is.  For 
the most part, Hong is right - it's for the applications to handle.  But I 
think that we need to have a clear understanding of what we're asking the 
applications to handle, in an effort to make the hard things easy.  (And for 
some of these languages, it can be quite hard.)

-- 
Bryan C. Warnock
[EMAIL PROTECTED]