> (3) American-biased equivalences according to Mark Davis's UTR 21,
> which is _not_ part of the Unicode standard.
(a) These are not American-biased equivalences.

(b) It is hardly "my" UTR. I'm the author, but the content is produced under
direction of the UTC and is approved at every stage by the UTC.

(c) UTR 21 was approved by the UTC for incorporation into Unicode 3.2. Its
new status will be reflected in Unicode 3.2, which will be final very shortly.

Mark
-----
Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα -- Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "D. J. Bernstein" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, February 11, 2002 18:50
Subject: Re: Inputting mixed SC/TC (Re: [idn] A question...)

> Adam M. Costello writes:
> > The reason IDNA does case-folding is to be consistent with the existing
> > standard for domain names, which says they are case-insensitive.
>
> What the existing standard actually says is ``domain name comparisons
> for all present domain functions are done in a case-insensitive manner,
> assuming an ASCII character set, and a high order zero bit.''
>
> Similarly, the Internet mail standards specifically require that bytes
> in message headers---including domain names---be interpreted as ASCII
> characters.
>
> Complete consistency with the existing standards would mean continuing
> to use only bytes 0-127, continuing to interpret those bytes as ASCII,
> and continuing to compare names as case-insensitive ASCII names.
>
> But we don't _want_ to follow those rules. We want to see glyphs that
> simply aren't available in the ASCII character set.
>
> Of course, we have to maintain INTEROPERABILITY with all strings used
> today, so we'll have to continue accepting A-Z and a-z as equivalent.
> But there are many possible equivalence rules for non-ASCII strings.
> Here are several examples---certainly not a complete list:
>
>    (1) Exactly what software uses now: no equivalences outside ASCII.
>
>    (2) Equivalence of characters that have duplicate glyphs but that
>        were kept separate by Unicode for one of the reasons described in
>        http://www.unicode.org/unicode/standard/where.
>
>    (3) American-biased equivalences according to Mark Davis's UTR 21,
>        which is _not_ part of the Unicode standard.
>
>    (4) German equivalences: for example, o-umlaut equivalent to oe, and
>        the German ss equivalent to the two-byte Latin sequence SS, which
>        in turn is equivalent to the two-byte Latin sequence ss.
>
>    (5) Hebrew equivalences: for example, aleph-bar equivalent to aleph.
>
>    (6) Various Chinese equivalences for the benefit of Chinese users.
>
>    (7) Some combination of the above.
>
> All of these are INTEROPERABLE with the existing use of ASCII. None of
> them are CONSISTENT with the existing standards. One of them, #1, has
> the advantage of being by far the easiest to implement---but provides
> the most opportunities for confusion and fraud.
>
> What exactly is the rational line between, for example, #3 and #4? For
> ASCII characters they both boil down to A-Z matching a-z. Why is #3 a
> better extension of the current situation than #4, or #3+#4?
>
> James Seng states that #6 is pointless because ``domain names are
> identifier ... should enter into the computer exactly as they seen it or
> reference it.'' Under exactly the same principle, #3 and #4 and #5 are
> all pointless, so IDNA has no excuse for the costs of #3.
>
> Another approach, allowing the software simplicity of #1 but eliminating
> user confusion, is to allow _selected_ non-ASCII characters. We don't
> have to map all characters to the selected set; we simply have to make
> sure that the selected characters won't be confused by the users. This
> neatly dodges the difficulty of defining a broad equivalence rule.
>
> The decisions here have to be based on rational assessments of costs and
> benefits. Costello's notion of ``consistency'' is obviously not helpful:
> it leads to such huge costs for Chinese users that it has already drawn
> objections from _three hundred_ people.
>
> ---D. J. Bernstein, Associate Professor, Department of Mathematics,
> Statistics, and Computer Science, University of Illinois at Chicago
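
To make the difference between rule (1) and a Unicode-wide case-folding rule
such as (3) concrete, here is a minimal Python sketch. It assumes Python's
built-in str.casefold(), which applies Unicode full case folding, as a
stand-in for the kind of folding UTR 21 describes; it is not the IDNA
preparation algorithm, and the function names are hypothetical.

```python
def ascii_lower(s: str) -> str:
    """Fold only A-Z to a-z; every other character is left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def equal_ascii_only(a: str, b: str) -> bool:
    """Rule (1): case-insensitive for ASCII letters, exact match otherwise."""
    return ascii_lower(a) == ascii_lower(b)


def equal_casefold(a: str, b: str) -> bool:
    """Approximation of rule (3): compare after Unicode full case folding.

    Assumption: str.casefold() is used here only as an illustration of
    Unicode case folding, not as the exact mapping any IDN draft specifies.
    """
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    pairs = [
        ("Example.COM", "example.com"),
        ("STRASSE", "stra\u00dfe"),  # sharp s
        ("\u00d6", "\u00f6"),        # O-umlaut vs o-umlaut
        ("\u00f6", "oe"),            # o-umlaut vs "oe"
    ]
    for a, b in pairs:
        print(a, b, equal_ascii_only(a, b), equal_casefold(a, b))
```

Both functions agree on pure ASCII names, which is the interoperability
requirement. They diverge on the sharp s and on the umlaut pair, and neither
treats o-umlaut as equivalent to oe, so a rule like (4) would need
language-specific mappings on top of either one.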
