[idn] The purpose of IDN

Dave Crocker Sun, 01 Sep 2002 13:14:39 -0700

At 12:28 AM 9/1/2002 -0400, John C Klensin wrote:
>I believe that IDNA (and the supporting documents) are... a reasonably 
>well-understood solution to _some_ problem. I'm not sure I know what that 
>problem is, who cares about it, and whether it is important enough to 
>justify changes to the way the DNS works and is interpreted.

We should all be *very* worried that this major IETF effort has gone on for 
nearly 3 years, yet such a question can be seriously raised.  (Or rather, 
that such a question legitimately reflects a concern among serious 
participants, as it clearly does.)

This suggests strongly that the charter and the working group process 
adequately characterized *neither* the problem to solve nor the 
beneficiaries of the solution.

Here is my own effort at doing both:

IDN Scope and Goals
-------------------

1.  The set of characters available for use in domain names is problematic 
for large portions of the global Internet user population.  Many users do 
not use Latin characters *ever*.  Current technology standards permit them 
to use their local set of non-Latin characters for all their Internet 
activities, EXCEPT domain names!  Hence, the set of characters that can be 
used for domain names needs to be increased.

2.  The most immediate need is for support of this increased set of 
characters in email and web domain names.

3.  Domain names are used both by humans and by computers.  The human uses 
occur within free text and even non-computer contexts, such as business 
cards and advertising.  The human benefit of domain names, over such 
alternatives as numeric strings, is mnemonic salience.  The darn things are 
simply easier to remember, though not necessarily easy to guess.

4.  A long history of making changes to Internet infrastructure highly 
recommends finding a way to do this enhancement so that it a) does not 
disturb the installed base, b) interworks with that installed base, and c) 
permits incremental benefit when there is incremental adoption.

Hence, the immediate functional description of IDN is that users will be 
able to register and use non-Latin characters in domain names that are part 
of email and web addresses.  Further, the method of transmitting those 
characters, in the DNS protocol environment, needs to use a layered 
encoding scheme, along the lines of content-transfer-encoding for binary 
data in MIME.
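
As a rough sketch of what such a layered encoding looks like in practice, the 
Python below uses the standard library's "idna" codec. That codec implements 
the IDNA rules as later published in RFC 3490 (with Punycode and the "xn--" 
prefix), so those specifics are an assumption brought in for illustration, 
not something settled in this message. The helper names to_wire and 
from_wire are hypothetical.

```python
import string

# Letters, digits, and hyphen: the characters existing DNS software expects.
LDH = set(string.ascii_letters + string.digits + "-")

def to_wire(label: str) -> str:
    """Encode one label into its ASCII-compatible form (hypothetical helper)."""
    return label.encode("idna").decode("ascii")

def from_wire(label: str) -> str:
    """Decode an ASCII-compatible label back to Unicode (hypothetical helper)."""
    return label.encode("ascii").decode("idna")

wire = to_wire("b\u00fccher")               # "bücher"
print(wire)                                 # xn--bcher-kva
print(set(wire) <= LDH)                     # True: safe for the installed base
print(from_wire(wire) == "b\u00fccher")     # True: the original is recoverable
```

As with content-transfer-encoding, the layer underneath carries only plain 
ASCII and needs no change; only the endpoints that care about the richer 
characters have to encode and decode.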

>I know that the IDNA approach, with adequate definition of the characters 
>to be used, will permit internationalization of low-level identifiers.

That is excellent, because that is exactly what it is supposed to 
do.  After all that is what domain names are, albeit identifiers with some 
mnemonic qualities.

>  If that is all
>we care about, then there is a case to be made that it is
>actually more mechanism than is needed: if the same constrained
>processes are to be used to access a name that cause it to be
>created, then the subtle issues of character matching for
>different codepoints may not be relevant.

I really hate to ask this, but I am not even sure what you mean by 
"character matching for different codepoints", unless this is the Unicode 
equivalent of ASCII case-insensitivity.

If my guess happens to be correct, I admit to continued ignorance about the 
reasonableness of having strings be case insensitive for non-Latin 
characters.  And this is one of those issues that, frankly, simply requires 
a decision.  While case insensitivity can be a Very Good Thing, we have 
plenty of examples of its absence being acceptable, even when it is quite 
inconvenient.
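
For what one possible decision looks like, here is a small sketch using 
Python's built-in "idna" codec. It follows the IDNA rules as later published 
in RFC 3490, which fold case before encoding; that behavior is an assumption 
taken from that later specification, not a conclusion of this message. The 
helper name wire_form is hypothetical.

```python
def wire_form(label: str) -> bytes:
    """Encode one non-ASCII label to its ASCII-compatible form (hypothetical helper)."""
    return label.encode("idna")

# Each pair differs only in case; the second pair is Cyrillic.
pairs = [
    ("b\u00fccher", "B\u00dcCHER"),
    ("\u043c\u043e\u0441\u043a\u0432\u0430", "\u041c\u041e\u0421\u041a\u0412\u0410"),
]

for lower, upper in pairs:
    # Case is folded before encoding, so both spellings map to one name.
    print(wire_form(lower) == wire_form(upper))   # True
```

Scripts that have no notion of case are simply unaffected by the folding 
step.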

>   By contrast, it is clear that the WG has not solved (Dave would, I 
>think, say that it has no scope or charter to even examine) the set of 
>questions associated with accurate transcription of DNS names from other 
>environments and media.

You are prescient.  Cut and paste is a *user interface* issue that already 
exists, far beyond domain names.  And the world already has mechanisms for 
dealing with it.

However well or poorly those mechanisms work, it is not within IETF scope 
to try to alter them.

>   It is equally clear that many people are focused on that problem and 
> won't consider any "DNS internationalization" problem to be solved unless 
> it has some adequate resolution.

This is a good example of the reason "DNS Internationalization" is not a 
useful term.  It is also a good example of the reason the target usage 
scenarios need to be extremely explicit, as I have tried to make them, above.

The IETF has a long history of participants wanting to pursue topics beyond 
what is practical.  The solution is to NOT pursue those topics.

At 08:53 AM 9/1/2002 -0400, vinton g. cerf wrote:
>One working definition of internationalization is that the 
>encoding/expression is "understood" by speakers of all languages.

This highlights why "localization" is probably a much more useful term.

While retaining global interoperability, this domain name enhancement needs 
to permit use of characters that are tailored to smaller communities -- 
where one such "smaller" community is more than a billion people...

>Consequently, someone sending a letter from the US to a recipient in 
>Vietnam can write the destination address in Vietnamese and the US postal 
>service need only understand the characters "VIETNAM" at the bottom of the 
>destination address.

IDNA accomplishes this same combination of global routing and local 
interpretation.  In fact, it is inherent in domain names, and IDNA strings 
are valid domain names within the current DNS.

So what does IDNA do that might be viewed as a problem?  The answer is that 
an *encoded* IDNA string has no mnemonic value for anyone.  It looks like a 
random string.
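
To make the contrast concrete, the sketch below shows one name in both forms, 
with a hypothetical display_form helper that falls back to the encoded form 
when the readable one cannot be rendered. It relies on Python's "idna" codec, 
which follows the later RFC 3490 rules; the renderable predicates are 
stand-ins for whatever a real user interface can actually display.

```python
from typing import Callable

def display_form(ace_name: str, renderable: Callable[[str], bool]) -> str:
    """Return the Unicode form if every character can be shown, else the encoded form."""
    unicode_name = ace_name.encode("ascii").decode("idna")
    if all(renderable(ch) for ch in unicode_name):
        return unicode_name
    return ace_name

def ascii_only(ch: str) -> bool:
    # Stand-in for an environment that can only show ASCII.
    return ord(ch) < 128

def latin1_ok(ch: str) -> bool:
    # Stand-in for an environment that can show Latin-1 characters.
    return ord(ch) < 256

name = "xn--bcher-kva.example"
print(display_form(name, latin1_ok))    # bücher.example
print(display_form(name, ascii_only))   # xn--bcher-kva.example
```

Either way the name resolves identically; only its mnemonic value differs.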

IDNA strings that are in Unicode are as mnemonic as ASCII strings, for 
those users who support the relevant Unicode set of characters.  For those 
who do not, they will not see those Unicode characters.  They will see the 
"random" string, which is a valid domain name, but lacks mnemonic benefit.

>multilingual domain names may not necessarily contribute to universal 
>ability to use the resulting strings because it may be difficult to 
>impossible to render or enter arbitrary character sets at the user 
>interface to a local service.

Has MIME's ability to support non-ASCII characters been helpful to the 
overall utility of the Internet?  I claim it has, even though I cannot read 
any of those other characters.  (I am, after all, a typical American....)

The real issue, here, is whether the Internet infrastructure properly 
labels and carries information that can be understood by some users, but 
not others.  Frankly, I see the question of non-Latin characters as being 
the same as using obscure vocabulary.  It is fine to use that vocabulary, 
as long as it works with the intended recipients.

Presumably, no one has a problem with a domain name like 
tak-apa.com.  However, only speakers of Bahasa are likely to know that it 
means "no problem".

We need to make exactly the same distinction between semantic/mnemonic 
utility, versus mechanical utility.  IDNA permits a much broader -- and 
more localized -- range of semantic/mnemonic domain names, while retaining 
all of the necessary global, mechanical interoperability.

>  We have collectively probably created some confusion for ourselves by 
> using the term "internationalized domain names" to cover both concepts.
>  It strikes me that the IDNA documents are more aimed at 
> localization/multilingualization than internationalization, using the 
> "definition" in the first paragraph above.

You are exactly correct.  The problematic nature of the term 
"internationalization" has been discussed before.  I would have wished for 
different terminology, too.

>Concerns about how cut/paste will work are germane to the discussion about 
>the utility of IDNs because such actions may be the ONLY way in which 
>someone may be able to enter special character strings into text intended 
>to represent an IDN.

The technical issues about multi-data-type cut-and-paste are beyond the 
competence of this community.

All we know -- and all we need to know -- is that modern user interfaces 
are quite good at supporting cut and paste of pictures, voice, labeled 
strings, and lots more.  There is no reason to believe that an IDN is even 
slightly difficult for such an environment.

>I usually end up cutting and pasting the characters. This works because 
>the text of email is permitted to be pretty general in its encoding. I 
>don't know how that would work out if I were dealing with non-Latin 
>character sets.

And you do not know that it will NOT work.

We DO know that it needs to work, and we DO know that it is a matter 
entirely within the purview of user interface designers, not protocol geeks.

>One of the important questions that I sense is being asked in the 
>discussion of IDNA is just how applications that encounter these encoded 
>objects/strings should handle them,

Is there some reason that the IETF should pursue this matter any more 
deeply than it has done for MIME?

d/

----------
Dave Crocker <mailto:[EMAIL PROTECTED]>
TribalWise, Inc. <http://www.tribalwise.com>
tel +1.408.246.8253; fax +1.408.850.1850
