Re: Character equivalence mapping (was: Re: [idn] SLC minutes)

Mark Davis Wed, 21 Jul 2004 08:29:50 -0700

This doesn't happen. Take

κωδικοσελίδα.com
=>
xn--kxaehbdxfck2b6b2d.com
=>
κωδικοσελίδα.com

See http://oss.software.ibm.com/cgi-bin/icu/idnademo
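
The round trip above can also be checked locally. This is a minimal sketch, assuming Python's built-in "idna" codec (an IDNA 2003 ToASCII/ToUnicode implementation that includes nameprep) behaves like the demo linked above. The variable names are illustrative only.

```python
# Start from the ACE form quoted above and go back and forth.
ace = b"xn--kxaehbdxfck2b6b2d.com"

# ToUnicode: decode the Punycode label back to Unicode text.
name = ace.decode("idna")

# ToASCII: run nameprep and Punycode-encode the result again.
ace_again = name.encode("idna")

print(name)
print("same ACE form after round trip:", ace_again == ace)
print("decoded name is already lower case:", name == name.lower())
```

If both checks print True, the decoded name survives the round trip unchanged and contains no upper-case letters.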

Mark

----- Original Message ----- 
From: "tedd" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, July 21, 2004 06:38
Subject: Re: Character equivalence mapping (was: Re: [idn] SLC minutes)

To whomever:

Some time ago, I argued (see below) that the
current version (at that time) of nameprep mapped
upper case Greek Letters to lower case.

My comments were basically dismissed as "That's
the way we do it, get used to it!"

Now, I find that the current version of Punycode
does exactly the opposite of what was claimed.
For example, try entering the code point 2126
(the Ohm sign, equivalent to upper case Omega) through --

http://www.imc.org/idna/do-idna.cgi

-- and see what happens. The end result is uppercase and NOT lowercase.
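
One way to check this claim without the web form is to push U+2126 through a full ToASCII/ToUnicode cycle. This is a sketch under the assumption that Python's built-in "idna" codec (RFC 3490 with nameprep) is a fair stand-in for the imc.org tool, which may have behaved differently at the time.

```python
# U+2126 is the code point mentioned above.
label = "\u2126.com"

ace = label.encode("idna")   # ToASCII: nameprep first, then Punycode
back = ace.decode("idna")    # ToUnicode: what a display step would show

print(ace)
for ch in back.split(".")[0]:
    # Print the code point(s) of the first label after the round trip.
    print(f"U+{ord(ch):04X}")
```

Punycode itself only transcodes the code points it is given. Any case change comes from the nameprep step that runs before it.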

Is this the "new way" to resolve the old issue
discussed? Have the "powers that be" reversed
themselves, or did I find an error in Punycode?

Many thanks for any replies.

tedd

--- as previously stated on this list in January 2002 ---

Mark, john, Edmon:

>1. This issue was debated at length some time ago. I suggest that the people
>arguing for visual confusability as a criterion for matching look at that
>discussion in detail before proceeding.

I'm not arguing (in this debate) the
"look-alike" position. In other words, it makes
no difference to me if certain glyphs look
identical in numerous char sets. I am arguing the
opposite position -- the characters in my example
don't look alike.

I am arguing the point that the decision "has
been made" to map upper case Greek letters to
lower case letters. For proof, look at the
current version of nameprep (
http://www.imc.org/nameprep/  ) and try running
code point 2126 (the Ohm sign, equivalent to upper
case Omega, 03A9) through it.
You will find that it IS mapped to code point
03C9 (lower case omega).
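
The mapping step can be inspected in isolation. This is a rough sketch, assuming Python's "stringprep" and "encodings.idna" modules, which follow the published RFC 3454 / RFC 3491 tables and not necessarily the January 2002 draft discussed here. It prints where nameprep sends the Ohm sign, capital Omega, and Latin W.

```python
import stringprep
import unicodedata
from encodings.idna import nameprep

# Ohm sign, Greek capital Omega, Latin capital W
for cp in (0x2126, 0x03A9, 0x0057):
    ch = chr(cp)
    folded = stringprep.map_table_b2(ch)   # case-folding table B.2 only
    prepped = nameprep(ch)                 # full nameprep profile
    targets = " ".join(f"U+{ord(c):04X}" for c in prepped)
    print(f"U+{cp:04X} {unicodedata.name(ch)} -> {targets}"
          f" (B.2 alone gives the same result: {folded == prepped})")
```

In these tables the same case-folding rule applies to Greek and Latin letters. Lower case omega is U+03C9, while U+03A9 is the capital letter.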

My question is "Why?" What's the foundation for
this determination? What good reason is there
to conclude that the upper case Omega should be
mapped to a lower case omega?

I see no "A.com" to "a.com" argument/problem
here. Clearly, if someone registered Ω.com and
someone else registered w.com there is a
significant difference in identification between
the two names. Those two domain names can be
completely unique domain names with no
significant resultant problems. Whereas, in the
Latin char set, I can see the reason for making
W.com and w.com identical (i.e., mapping W to w)
because there is an UC/LC
consideration/distinction in the language. But,
that's not a problem in the Greek char set -- is
it... really?

>(i) From observation, when scripts have two cases, the
>upper-case form is more likely to be highly stylized, and hence
>differentiated from characters in other scripts, than the
>lower-case one.  Hence, if one is going to adopt
>stylization-based (glyph-distinction, if you prefer)
>canonicalization rules, one is better off treating upper case as
>the normal form, rather than lower case.

It looks to me as if someone has already made the
determination to map other languages based upon
the Latin char set UC/LC problem without concern
that other languages may not have the UC/LC
distinction and thus be free of the UC/LC
problem. I think the Greek example I gave above
sufficiently demonstrates my observation.

tedd

-- 
http://sperling.com
