[idn] WG last call documents

Eric A. Hall Sat, 09 Feb 2002 11:52:14 -0800


This is commentary on


  draft-hoffman-stringprep-00.txt
  draft-ietf-idn-nameprep-07.txt
  draft-ietf-idn-idna-06.txt


draft-hoffman-stringprep-00.txt:

 | 4. Normalization

 | [[[[[ Patrik asks: Is not addition of unassigned codepoints regarded
 | an update of UAX15? This is relevant here. ]]]]]

The embedded comments need to be addressed or removed.

Overall, I think that stringprep needs to define a data structure for
profiles and also needs to discuss version negotiation, both of which are
needed to facilitate programmatic upgrades. This means defining a
standardized character notation scheme to be used by all profiles, a file
structure (version tags, comment notation, line syntax, EOL markers, etc),
and so forth.
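
As a rough illustration of what such a structure could look like, the
sketch below parses a made-up profile file with a version tag, '#'
comments, and one mapping per line in U+XXXX notation. The file format
and the names parse_profile and parse_code_points are hypothetical; they
are not defined by stringprep or any other draft.

```python
# Hypothetical profile file format (not defined by stringprep): one
# "version:" line, '#' comments, and "U+XXXX; U+YYYY ..." mapping lines.
SAMPLE_PROFILE = """\
# example profile, illustrative only
version: 1.0
U+0041; U+0061          # LATIN CAPITAL LETTER A maps to a
U+00DF; U+0073 U+0073   # sharp s maps to ss
U+00AD;                 # soft hyphen maps to nothing
"""


def parse_code_points(field):
    """Turn a field such as 'U+0073 U+0073' into the string 'ss'."""
    return "".join(chr(int(token[2:], 16)) for token in field.split())


def parse_profile(text):
    """Return (version, mapping dict) read from the hypothetical format."""
    version = None
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("version:"):
            version = line.split(":", 1)[1].strip()
            continue
        source, _, target = line.partition(";")
        mapping[parse_code_points(source)] = parse_code_points(target)
    return version, mapping


version, mapping = parse_profile(SAMPLE_PROFILE)
```

With a version tag in the file itself, an implementation can tell which
revision of a profile it has loaded before it applies any mapping.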

This also means describing suggested common mechanisms for retrieving
profiles. One obvious method would include "will be made available via FTP
by IANA", for example. Encouraging vendors to "provide updates in routine
patch kits" is another. Protocols can build on that as needed as well
(it is feasible that DNS could even use zone files and the zone transfer
services to retrieve version 1.1 of the subordinate stringpreps). The
point here would be to encourage rapid adoption of updated profiles,
through as many mechanisms as possible. No method will work 100% of the
time, so find five mechanisms that each work 60% of the time.

The hole I am concerned with here involves future mappings, where 'A' is
currently assigned but it is not currently mapped to 'a', and where a
future version creates such a mapping. In that case, the client will
generate 'A' as unmapped, the server will respond with an error or with
'a', either of which would trigger a failure on the client. Consider
ISO/IEC 8859-1 and 8859-15, where the former has only a lowercase y with
diaeresis and the latter adds an uppercase y with diaeresis. It
seems likely that this kind of disjointed mapping will happen again, so we
need to be thinking about ways to minimize the disjointedness.
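
The version skew described above can be sketched in a few lines. The two
tables and the prepare function are hypothetical stand-ins for two
revisions of one profile; U+0178 and U+00FF (uppercase and lowercase y
with diaeresis) are used only as an example pair.

```python
# Hypothetical mapping tables for two versions of the same profile.
PROFILE_V1_0 = {}                      # U+0178 is assigned but not yet mapped
PROFILE_V1_1 = {"\u0178": "\u00ff"}    # a later version maps it to U+00FF


def prepare(label, table):
    """Apply a per-character mapping; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in label)


query = "\u0178es"
client_form = prepare(query, PROFILE_V1_0)  # old client leaves U+0178 alone
server_form = prepare(query, PROFILE_V1_1)  # new server folds it to U+00FF

# The two sides now disagree on the prepared form of the same input, so
# what the client sent does not match what the server expects.
mismatch = client_form != server_form
```

Neither side can detect the cause of the mismatch unless the profile
version is visible to it, which is the argument for versioning.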

Secondarily, the IETF has had the lesson of versioning beaten into it over
the last few years, and it seems odd not to address this issue head-on in
the controlling spec.


draft-ietf-idn-nameprep-07.txt:

 | Stringprep Profile for Internationalized Host Names

Since this profile is specifically for host names, all of the rules that
apply to hostnames should be placed here. Consider that there are apps and
protocols which may use these rules to build internationalized host names
but will not use IDNA (either because they do not need IDNA conversion for
their data-path or they use something other than IDNA to encode the
names). Those services will need access to the complete set of rules.
Putting some of the rules here and some of the rules in IDNA essentially
means that every application that wants to build an i18n hostname has to
use both specifications, even though the "profile" is the only document
that a reasonable person should be expected to know about.

As it stands, this profile is incomplete without IDNA, but it does not
reference IDNA as a mandatory component (nor should it).

Furthermore, any subsequent work in this field will almost certainly be
required to obsolete this profile in order to clarify these issues, which
reinforces the need for versioning in stringprep.


draft-ietf-idn-idna-06.txt:

 | Abstract

 | This document defines internationalized domain names (IDNs)

That definition should be in a separate document, specifically either in
nameprep or in a stringprep profile that defines i18n "domain names"
rather than hostnames.

IDNA should "define a mechanism for encoding internationalized domain name
labels into STD13 labels".

 | 2 Terminology

 | The "ACE prefix" is defined in this document to be a string of ASCII
 | characters that appears at the beginning of every ACE label. It is
 | specified in section 5.

I thought the idea behind adopting "punycode" was to avoid the use of the
generic "ACE" term?

 | 3. Requirements

 | IDNA conformance means adherence of the following three rules:

 | 1) Whenever a domain name is put into a generic domain name slot,
 | every label MUST contain only ASCII characters. 

"generic domain name slot" should be substituted with "STD13 slot". There
is no reason whatsoever that application protocols cannot exchange IDNs or
IHNs in whatever form they wish.

 | 2) ACE labels SHOULD be hidden from users whenever possible. 

I have argued against this before and I will do so again now.

Should WHOIS servers perform this conversion on output (email addresses
and nameservers) since it is likely to be displayed to a user? Should an
email client perform conversion on a message? Every protocol and
data-format has unique considerations. Making blanket statements like this
is harmful.

CONVERSION MUST ONLY OCCUR WHERE A PROTOCOL OR DATA-FORMAT EXPLICITLY
DEFINES THE BEHAVIOR. MANDATORY CONVERSION WILL BREAK THINGS IF THERE IS
NOT AN AGREED-UPON METHOD FOR CONVERSION SPECIFIC TO THE APPLICATION.

 | 4.1 ToASCII
 | 4.2 ToUnicode

Add a disclaimer to the effect of ~"these routines MUST NOT be used except
where the data is known to contain an i18n domain name label". Trying to
pipe all Unicode data through ToASCII will be harmful, to say the least.
Trying to pipe an STD13 binary domain name through ToUnicode may result in
some false-positives.
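
One way to picture such a disclaimer in code is a wrapper that refuses
to convert unless the caller states that the field holds an i18n domain
name label. This is only a sketch: FieldKind and to_ascii_guarded are
hypothetical names, and Python's built-in "idna" codec (which follows
the later RFC 3490 rules, not this draft) is used purely as a stand-in
for ToASCII.

```python
from enum import Enum


class FieldKind(Enum):
    """Hypothetical tags for what a piece of data is known to contain."""
    IDN_LABEL = "idn-label"    # known to hold an i18n domain name label
    FREE_TEXT = "free-text"    # arbitrary Unicode text
    RAW_STD13 = "raw-std13"    # opaque STD13 label octets


def to_ascii_guarded(value, kind):
    """Convert only when the caller declares the field to be an IDN label."""
    if kind is not FieldKind.IDN_LABEL:
        raise ValueError("refusing to convert a %s field" % kind.value)
    # Stand-in for ToASCII; the codec implements RFC 3490, not draft-06.
    return value.encode("idna")
```

The point of the design is that conversion is opt-in per field, so
arbitrary text and binary labels never reach the conversion routine.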

 | B. Design philosophy

What value does this section provide? This text is religion, not science,
seeing as how it omits key counter-arguments, namely that there are
hundreds if not thousands of applications for every DNS server, so
upgrading servers in addition to applications represents an incremental
cost (rather than the "The cost of implementing IDN may thus be much
lower, and the speed of implementation could be much higher" erroneous
opinion). Other aspects (such as ~"no change to protocols") are just as
wrong (protocols MUST BE CHANGED FOR CONVERSION TO BE PROPERLY MANAGED),
if not argumentative. This section is non-contributory to say the least,
and possibly even a hindrance to future developments, and has no place in
a technical specification such as this.


General comments:

In general, I think that there is a tremendous amount of architectural
work that needs to be done here. We need to strongly and clearly define
the different domain name data-types that are in use, make up stringprep
profiles for them, impose versioning controls on stringprep profiles, and
tighten up the IDNA spec. I think at this point that IDNA is pretty close
to where it needs to be, but it is still far too grandiose. It should only
define an encoding for specific data-types, and it should not encourage
any interpretative behavior on the applications.

One specific item: there does not seem to be a prohibition against certain
characters in the first position of a domain name label. There is a minor
prohibition in IDNA, but there needs to be an exception list in nameprep
that covers the characters we discussed in the WG earlier, which are
punctuation, diacritical marks, or dashes. We discussed this at length on
the mailing list.
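
A first-position check of this kind could be sketched as below. The
choice of Unicode general categories is an assumption made for
illustration (punctuation, which includes dashes, and combining marks);
the actual exception list would have to come from the WG, and
has_prohibited_first_char is a hypothetical name.

```python
import unicodedata


def has_prohibited_first_char(label):
    """True if the label starts with punctuation, a dash, or a combining mark.

    Assumption: general categories P* (punctuation, including Pd dashes)
    and M* (combining marks) approximate the exception list.
    """
    if not label:
        return False
    category = unicodedata.category(label[0])
    return category[0] in ("P", "M")
```

For example, a label beginning with a hyphen or with a combining acute
accent (U+0301) would be rejected, while a label beginning with a letter
would pass.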

"Adam M. Costello" wrote in <[EMAIL PROTECTED]>:

> There are some code points that are prohibited in host names, but not
> in all textual domain names.  The underscore is the best example.
> These prohibitions belong in ToASCII.

I disagree with this for a couple of reasons. First of all, these
exceptions are associated with protocol identifier labels, which are not
hostnames. Nameprep defines i18n hostnames, so these characters belong in
the nameprep prohibition. Secondarily, "protocol identifiers" are
generally considered as protocol operators which do not need to be
internationalized (therefore never needing anything but ASCII). This is
why I defined them as a separate data-type (and why I think they need
their own stringprep profile). See BCP18:

 | 2.  Where to do internationalization
 | 
 | Internationalization is for humans. This means that protocols are not
 | subject to internationalization; text strings are. Where protocol
 | elements look like text tokens, such as in many IETF application
 | layer protocols, protocols MUST specify which parts are protocol and
 | which are text. [WR 2.2.1.1]

SRV owner labels are protocol elements, not text. They should never go
through any conversion. A dedicated profile for this would help.
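
A sketch of that separation, assuming the SRV convention that protocol
identifier labels begin with an underscore: those labels are passed
through as plain ASCII and only the remaining labels are converted. The
function name encode_owner_name is hypothetical, and Python's built-in
"idna" codec (RFC 3490) again stands in for the conversion step.

```python
def encode_owner_name(name):
    """Encode a dotted name, leaving protocol identifier labels untouched.

    Assumption: a leading underscore marks a protocol element, as in SRV
    owner names. Such labels must already be ASCII, so a non-ASCII one
    raises UnicodeEncodeError instead of being converted.
    """
    encoded = []
    for label in name.split("."):
        if label.startswith("_"):
            encoded.append(label.encode("ascii"))  # protocol element
        else:
            encoded.append(label.encode("idna"))   # text label
    return b".".join(encoded)
```

Called on a name such as "_sip._tcp." followed by an internationalized
host name, only the host name labels would be converted.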

I agree with many of the other comments which have been made. I think that
a parent i18n "domain name" stringprep profile can describe certain
generic characteristics (such as mapping of full-stop characters) that the
label-centric profiles cannot.

DJ Bernstein is right-on when he says that certs don't solve the spoofing
problem. There is nothing preventing the wily hacker from getting a cert
for [EMAIL PROTECTED], and there would be nothing about that cert which
would alert anybody that spoofing had occurred. In general terms, this
means that I agree with DJB that the initial scope should be tightly
controlled and slowly incremented. I think that this opinion was also
expressed by J Klensin's solicitation for a "safe-set" and D Crocker's
endorsement of a safe-set. This doesn't mean we all agree with DJB, but I
think that in this case, he is probably correct. A stronger versioning
system in Stringprep would make this a feasible approach, I think.

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/
