James Mitchell wrote:

> As a thought, consider names written in the Arabic script.

Arabic? Isn't Latin/French localization more than enough
to show that IDNs are not operational?

If not...

> Being
> a cursive script, how is a TLD applicant expected to separate
> 'words' in a top level domain without the use of a hyphen or
> equivalent.

> Removing the spaces will cause the characters to join, and the
> meaning lost (besides which the A-label will contain hyphens
> anyway?!).

> I'm far from qualified to talk authoritatively on the Arabic
> script, or how it will be used in domain names, however I do
> know of parties that will be applying for Arabic TLDs - are
> we to exclude these applicants?
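
The quoted aside that the A-label will contain hyphens anyway
can be checked directly. This is a minimal sketch, assuming
Python 3's built-in "idna" codec, which implements the older
IDNA2003 rules rather than IDNA2008. The sample label (the four
letters ain, reh, beh, yeh) is an arbitrary illustration, not an
actual TLD application. ToASCII prepends the ACE prefix "xn--",
so any non-ASCII label's A-label carries hyphens.

```python
# Assumption: Python's built-in "idna" codec (IDNA2003, RFC 3490),
# not an IDNA2008 implementation.
arabic_label = "\u0639\u0631\u0628\u064a"  # ain, reh, beh, yeh

# ToASCII: nameprep, then Punycode with the "xn--" ACE prefix.
a_label = arabic_label.encode("idna")

print(a_label)
print(b"-" in a_label)  # True: the "xn--" prefix alone guarantees it
```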

As explained in Wikipedia,

   http://en.wikipedia.org/wiki/Arabic_alphabet

Arabic letters have four forms, "isolated", "end", "middle"
and "beginning", and the form is determined by the position
of a letter within a word. Thus, they are no different from
the Latin distinction between capital and small letters, which
is determined by the position of a letter within a sentence.

So this is an extended case insensitivity problem, one actively
ignored by the people working on Unicode.
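
The analogy can be made concrete in Python (an illustrative
choice; nothing here is DNS-specific). Unicode full case
folding, exposed as str.casefold(), unifies Latin capital and
small letters. However, it leaves an Arabic positional form and
its nominal letter distinct, because Arabic has no case. The
helper same_after_casefold is hypothetical, and the sketch
only shows the gap described above. It is not a proposed
matching rule.

```python
# Hypothetical helper: compare two strings after Unicode full
# case folding (str.casefold).
def same_after_casefold(x: str, y: str) -> bool:
    return x.casefold() == y.casefold()

# Latin: capital and small letters fold together.
print(same_after_casefold("A", "a"))            # True

# Arabic: U+FEEB ARABIC LETTER HEH INITIAL FORM vs U+0647 ARABIC
# LETTER HEH. Case folding leaves both unchanged, so they differ.
print(same_after_casefold("\uFEEB", "\u0647"))  # False
```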

Moreover, as the Wikipedia entry says:

   For compatibility with previous standards, all these forms can
   be encoded separately in Unicode; however, they can also be
   inferred from their joining context, using the same encoding.
   The following table shows this common encoding, in addition to
   the compatibility encodings for their normally contextual
   forms (Arabic texts should be encoded today using only the
   common encoding, but the rendering must then infer the joining
   types to determine the correct glyph forms, with or without
   ligation).

While separate Unicode code points are used for the different
forms, extended case insensitivity between the forms of Arabic
characters is not specified at all.
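
The compatibility encodings mentioned in the quotation can be
inspected with Python's standard unicodedata module. The sketch
assumes a Python 3 build with the standard Unicode database.
BEH (U+0628) is an arbitrary example letter. It lists the four
presentation-form code points of that one letter. Each code
point carries a compatibility decomposition tagged <isolated>,
<final>, <initial> or <medial> that points back to the common
encoding.

```python
import unicodedata

# Presentation-form code points for ARABIC LETTER BEH (U+0628),
# from the Arabic Presentation Forms-B block, in the order
# isolated, final, initial, medial.
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

for ch in BEH_FORMS:
    # decomposition() returns a string such as "<initial> 0628":
    # the tag names the positional form, and the hex value is the
    # common (nominal) code point it is compatible with.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          unicodedata.decomposition(ch))
```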

The situation is even more confusing than with Latin/French,
which is why I wrote:

   It should be noted that for extended case insensitivities
   beyond European characters, such as correspondences between
   Chinese characters, the problem is even more unsolvable.

It is trivially easy to declare some broken specification to be
IDN 2008 or something like that, but doing so does not make
localized domain names operational.

                                                Masataka Oh
_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
