On Tuesday 05 June 2001 05:49 pm, Simon Cozens wrote:
> YES. Definitely. Same Unicode character, same thing. You wanted something
> else, use a different Unicode character.
I don't understand. There *is* only one character. I can't choose another.
Take 0x0648, for instance. It's both waw, the 27th letter of the Arabic
alphabet, and veh, the 30th letter of the Persian alphabet, which aren't the
same letter. Same character, different letters. Equivalent, or different?
In Unicode, or independent of locale, they're the same; I've no problem with
that. Within one locale or the other... I'm not so sure. I think the
comparison needs to be able to go both ways, with equivalence perhaps being
the default.
(Perhaps this need only be as simple as being able to tag and query (via
attributes, for instance) the language of the string, and handling the logic
yourself. If the languages differ, no sense in comparing, yadda yadda
yadda. Then again, whether it is a difference or not may also be a language
issue. I'd be inclined to think that waw and veh are different, but "Gift"
(in English) and "Gift" (in German) are the same. To me, those are the same
characters and same letters (even though, I guess, technically they are
not), with just different meanings.) In either case (or perhaps it is an
extension of the same case), each locale should be able to specify and
handle its own determination of equivalency.
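
To make the tag-and-query idea concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions, not a proposal for how Perl should do it: the names TaggedString, DISTINCT_LETTERS, and equivalent are all hypothetical, and the table of "same character, different letter" cases is a stand-in for whatever each locale would actually supply. Equivalence is the default; a locale-aware comparison is opt-in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaggedString:
    """A string plus an optional language tag (hypothetical attribute)."""
    text: str
    lang: Optional[str] = None  # e.g. "ar", "fa", "en", "de"


# Hypothetical policy table: for a given pair of languages, the code points
# that are the same character but should count as different letters.
# Only the example from this message is listed; the entry is an assumption.
DISTINCT_LETTERS = {
    frozenset({"ar", "fa"}): {"\u0648"},
}


def equivalent(a: TaggedString, b: TaggedString, locale_aware: bool = False) -> bool:
    """Compare two tagged strings.

    Default: same code points means equivalent, regardless of language.
    With locale_aware=True, strings in different languages are unequal if
    they contain a character the policy table marks as a distinct letter.
    """
    if a.text != b.text:
        return False
    if not locale_aware or a.lang is None or b.lang is None or a.lang == b.lang:
        return True
    distinct = DISTINCT_LETTERS.get(frozenset({a.lang, b.lang}), set())
    return not any(ch in distinct for ch in a.text)


if __name__ == "__main__":
    waw = TaggedString("\u0648", "ar")
    veh = TaggedString("\u0648", "fa")
    print(equivalent(waw, veh))                     # default: equivalent
    print(equivalent(waw, veh, locale_aware=True))  # opt-in: different letters
```

With this table, "Gift" tagged "en" and "Gift" tagged "de" still compare as equivalent even in the locale-aware mode, because nothing says those letters differ. Whether that is the right answer is exactly the sort of thing each locale would have to decide.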
As I watch everyone talk about the eastern languages, and I think of the
middle eastern languages, I realize what a mess this potentially is. For
the most part, Hong is right - it's for the applications to handle. But I
think that we need to have a clear understanding of what we're asking the
applications to handle, in an effort to make the hard things easy. (And for
some of these languages, it can be quite hard.)
--
Bryan C. Warnock
[EMAIL PROTECTED]