Re: [idn] IDNA: is the specification proper, adequate,and complete? (was: Re: I-D ACTION:draft-ietf-idn-idna-08.txt)

John C Klensin Tue, 11 Jun 2002 17:27:04 -0700

--On Tuesday, 11 June, 2002 23:04 +0000 "Adam M. Costello"
<[EMAIL PROTECTED]> wrote:

> John C Klensin <[EMAIL PROTECTED]> wrote:
> 
>> (i) The specification now appears to say that applications can
>> decide to use IDNA or not.  Presumably, they can decide to use
>> something else instead.
> 
> Paul Hoffman / IMC <[EMAIL PROTECTED]> replied:
> 
>> If either sentence is at all true, we need to fix it.  I
>> don't see any place that says that an application doesn't
>> need to use IDNA for non-LDH domain names.
> 
> You're both right.  John must be referring to this sentence in
> section 1:
> 
>     This document does not require any applications to conform
>     to IDNA, but applications can elect to use IDNA in order
>     to support IDN while maintaining interoperability with
>     existing infrastructure.

yes

> So applications can indeed decide not to use IDNA, in which
> case they can't use non-ASCII characters in domain names.  If
> you want to use non-ASCII characters in domain names, IDNA is
> your only option.

IMO, that needs to be said, and said right there and very, very,
explicitly.

>> And I certainly don't see anything that would indicate that
>> the second sentence is true.
> 
> Agreed.  We can't stop applications from using custom mappings
> from non-ASCII text onto domain names, but the rest of the
> world would never see the non-ASCII characters.  The rest of
> the world would simply see RFC 1035 domain names, which
> contain only ASCII characters (and possibly octets 80..FF,
> which are not defined to represent any characters at all).

But we can say, clearly, that any such applications or
activities are non-conforming to IETF standards.  And I believe
we should do so.

> John wrote:
> 
>> It is also not clear whether "an application", as used in the
>> spec, refers to a standard protocol and expectations about how
>> all conforming implementations will behave or to particular
>> implementations and implementation choices.
> 
> Both, I think.  An individual application can unilaterally
> decide to support IDNA, and a spec can opt to require IDNA
> support.

Ok.  I think it should be more clear.  And I need to think
through the cases on this.

>> But, if an application does a DNS lookup for an LDH name
>> (with no prefix) with, say, an MX Qtype, and some of the
>> result data are returned with IDNA-appearing prefixes, then
>> the application needs additional clues as to how to present
>> error messages to the user, how to continue processing in
>> other areas, etc.
> 
> I don't see why.  If the application conforms to IDNA, it will
> work.  If it ignores IDNA and treats the name like any other
> traditional ASCII name, it will work.

True as long as it doesn't support a prefix-based
interpretation, or a different prefix, with a different
interpretation than IDNA.  I took the text to imply that you
were willing to tolerate such things.  If you don't intend to,
the text has to both make clear that they are non-conforming
and, preferably, to damn them roundly and explain why.

>> (ii) The specification seems inconsistent about what it is about
>> and who needs to understand it.  It implies in several places
>> that it is not about the DNS and has no impact on the DNS.
> 
> Paul replied:
> 
>> I can't find any place that makes it not about the DNS.
> 
> Perhaps John is referring to this:
> 
>     IDNA does not require any changes to DNS servers,
>     resolvers, or protocol elements

For starters, yes.  I'll try to go back and find the rest of the
text that set me off.  Or have a look at some of Dave's "not
related to the DNS, just 'micro layer'" comments of the last few
days.

> John wrote:
> 
>> But it then makes and imposes normative DNS operational
>> statements. E.g., "Non-ACE labels that begin with the ACE
>> prefix will confuse users and SHOULD NOT be allowed in DNS
>> zones" is certainly a requirement about DNS and DNS zone
>> population.
> 
> You are quite right.  When we wrote "IDNA does not require any
> changes to DNS servers", the word "require" was intended in
> the sense of "depend on".  I have already suggested changing
> it to "depend on" (here and in similar places in the spec).

That would be _much_ better, IMO.

> Although IDNA does impose this new recommendation on DNS
> servers, it would still work if it didn't.
> 
>> (iii) Despite the "applications can choose to do this, or not
>> do it" language, section 6.4 effectively implies a
>> requirement that any application that uses the DNS be
>> upgraded.
> 
> Paul replied:
> 
>> Every application of DNS should be upgraded for every
>> extension or improvement of the DNS protocols.  Nothing in
>> 6.4 requires this, as far as I can see.
> 
> I think John must be referring to this:
> 
>     All applications that might show the user a domain name
>     obtained from a domain name slot, such as from
>     gethostbyaddr or part of a mail header, SHOULD be updated
>     as soon as possible in order to prevent users from seeing
>     the ACE.

yes

> "SHOULD" is fairly strong, perhaps strong enough that this
> paragraph might be considered inconsistent with this one:
> 
>     This document does not require any applications to conform
>     to IDNA, but applications can elect to use IDNA in order
>     to support IDN while maintaining interoperability with
>     existing infrastructure.

yes

> Maybe it would be less contentious if 6.4 merely stated the
> benefits of upgrading applications, leaving the reader to
> conclude that doing so is a good idea.

Personally, I would actually prefer that this whole IDN package
come with a plan about how we are going to deprecate the
non-conforming.  The trick we tried to use with ESMTP was to
invent the notion of a "contemporary implementation" and then
talk about what "contemporary" (or "modern", or...) did and how
they behaved.  "Legacy" (or "older", or ...) implementations
could still be said to conform to the old standards, but the
strong implication was that they were not up-to-date if they
weren't prepared to deal with the new stuff.  

I think something like that might help here although, regardless
of my other points and issues, this whole situation feels more
and more as if we ought to try to get an AS out that addresses
relationships among protocols, expectations, and quality of
implementation issues.  I wouldn't mind just stuffing all of
that into IDNA, but it probably isn't the right thing to do (or
fair to you guys).

>> (iv) Section 7 imposes several normative requirements on name
>> servers and zone populations.  Again, these requirements are
>> buried in a document that elsewhere appears to claim that it
>> doesn't impact the DNS.
> 
> First requirement:
> 
>     Internationalized domain name data in zone files (as
>     specified by section 5 of RFC 1035) MUST be processed with
>     ToASCII before it is entered in the zone files.
> 
> This follows directly from section 3 requirement 2.  It is not
> a special requirement on DNS, it is a general requirement on
> any application that elects to use IDNs.  Section 7 is simply
> telling how it applies to DNS. It's still true that IDNA is
> optional; if you don't want to support it, then don't, but
> then you have no way to enter non-ASCII names into zones.

This is, to put it mildly, not clear.  Or not clear enough.  At
least to someone who hasn't been immersed in the document.

> Second requirement:
> 
>     a primary master name server MUST NOT contain an
>     ACE-encoded label that decodes to an ASCII label.
> 
> This is a vacuous requirement.  There is no such thing as an
> ACE label that decodes to an ASCII label, because of the
> design of the ToASCII operation.  Therefore, this is not
> really requiring anything.  I think I once asked for its
> removal, but since it's harmless, I didn't press it.

Recommendation: Leave it there, since it is harmless and may be
helpful to someone.  But rephrase it, not as a "MUST" (or any
other form of requirement), but as an observation.  E.g., "note
that, since the design of the ToASCII operation prevents any ACE
label from decoding to an ASCII label (i.e., one without any
non-ASCII characters), a primary master name server will
never...".  Or something like that.

> Third requirement:
> 
>> The third paragraph of section 7 seems to contradict or
>> repeal the statements of RFC 2181 that non-ASCII strings are
>> permitted in labels
> 
> Good catch.  We'll have to do something about that.

Yes.  Or Randy and kre will kill you. :-)

>> (v) In particular, several of the statements in the draft go
>> beyond almost anything we have about the DNS in trying to
>> narrowly constrain the behavior of RR labels and data that
>> are not yet defined, even in Classes that are not yet defined.
> 
> You found one such statement, please feel free (and
> encouraged) to enumerate the others.  :)

I'll go back and look.  I think other comments above identify
places where I simply reached different conclusions about what
you meant than you intended.  I think those things are worth
clarifying -- others won't read the thing even as carefully as I
have, much less be as familiar with it as you are.

But I'd still prefer to see either enumerations of where IDNA
can be used, or some much more clear language about its
applicability.  That gets back to the "applications" issue and
the very broad "all labels" and its implications.  See below.

>> IDNA, by specifying nameprep profiling as part of the
>> procedure, violates this principle by effectively requiring a
>> one-off protocol for situations in which nameprep is
>> inappropriate but other IDNA steps would be reasonable.
> 
> Currently, all domain labels are compared the same way.

This is exactly where I get hung up.  Leaving the more subtle
issue that Eric is trying to explore and some of its close
relatives aside for the moment, let's assume that it is rational
to apply IDNA to all domain labels and associated parameters in
all _current_ RRs.  That may be completely reasonable.  But
suppose that, at some point, and just as an example, some
proposal goes forward to use a different Class for some purpose,
or to use the DNS for a different purpose entirely.  Such
transitions involve exactly the sorts of infrastructure pain and
suffering that IDNA was designed to avoid, but it is legitimate
for the community to conclude that they are necessary.  If we do
decide to accept that pain and suffering, then it would be good,
IMO, to be able to use it as an opportunity to clean out, as
much as possible, things that were done only to preserve
compatibility.  I'd like to give those future designers the
opportunity to make choices about whether to use IDNA (or ACE
generally) to handle internationalization, the ability to think
through whether more of the normalization or matching processes
should be moved to the server, rather than being handled
exclusively in the client, etc.  I don't want (or need) to
predict what they will conclude, but I think it is unnecessary
and dangerous for us, at this stage, to write what seems to be
an "all internationalization uses IDNA, now and forever,
including in RRs and Classes and uses of the DNS we have no way
to anticipate" rule.

So I think we all need to understand the points Eric is trying
to make.  But, even if they are totally irrelevant or we
conclude that they aren't worth the trouble, I believe that we
should confine IDNA application and interpretation to what we
can reasonably claim we know about today and that we should
_explicitly_ leave future use and applicability for decisions to
be made in the future.

> Allowing multiple profiles amounts to allowing multiple
> comparison functions. Eric Hall disputes this, but if you
> stumble across two labels, one containing uppercase Latin
> letter O with umlaut, the other containing lowercase Latin
> letter o followed by combining umlaut, and you wonder whether
> they match, what is the answer?  Yes, no, or maybe?  For ASCII
> domain names, there is no maybe.  We deliberately designed
> IDNA so that the same is true for IDNs.

Understood, but not the point.  
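
For readers who want to see the quoted comparison concretely, here is
a minimal sketch. It assumes Python's built-in "idna" codec (the later
published RFC 3490 with nameprep) as a stand-in for the draft's
processing; the function name labels_match is hypothetical.

```python
# The two labels from the quoted example.
precomposed_upper = "\u00d6"   # LATIN CAPITAL LETTER O WITH DIAERESIS
decomposed_lower = "o\u0308"   # LATIN SMALL LETTER O + COMBINING DIAERESIS


def labels_match(a: str, b: str) -> bool:
    """Compare two labels after the codec's ToASCII processing.

    Assumption: the codec's nameprep step (case mapping plus NFKC
    normalization) stands in for the comparison function in the draft.
    """
    return a.encode("idna") == b.encode("idna")


# As raw code point sequences the two labels differ ...
raw_equal = precomposed_upper == decomposed_lower
# ... but after preparation there is a single yes-or-no answer.
prepared_equal = labels_match(precomposed_upper, decomposed_lower)
```

With a single profile the answer is fixed; the disagreement in this
thread is about whether that profile should bind every future use.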

What follows is going to be long, and I apologize, but we have
proof that this isn't easy to explain.

I strongly suspect that the problems/issues that are driving
Eric are different from mine, so please don't take this as
representing (or preempting) his concerns and cases.  At the
risk of exposing another philosophical problem, let me repeat
(or paraphrase) what I said in Minneapolis: for me, the essence
of a good engineering job is to understand _all_ of the
constraints that define the solution space for a problem and
then either find a solution that fits within that solution space
or be very clear to identify solutions that may not quite fit
and the issues that make up the tradeoffs.  Conversely, I
consider that saying "we will define the problem very narrowly
so that we can solve it and then leave the mess that creates for
the rest of the world to work out" is lousy engineering and
worse standardization practice.

Now, that is just my opinion, but it is the foundation for an
explanation, so bear with me.

My guess is that the whole notion of "the registries will have
to figure this out" is going to fail miserably.  I don't think
that is a valid argument for not publishing IDNA (or any of the
rest of these IDN WG things) -- I am speculating, and this is
the reason we have proposed standards and multiple maturity
levels.  But, to the extent to which LDH has advantages, those
advantages have "worked" because they have been enforced by code
in applications, not just by registries.  If we say "open
season, anything can go in, but some names aren't going to
resolve because the registries will prohibit them" we end up in
a much weaker state.  We also end up with increased load on
servers and, unless the state of negative caching improves a
lot, probably disproportionately more load on the root.  So, I
contend there are big advantages to being able to sanity-check
names before going off to the DNS and that the advantages get
larger in the very complex world in which nearly-arbitrary
Unicode strings are permitted as names.
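
As an illustration of the kind of application-side sanity check meant
here, the following is a minimal sketch of an LDH test run before any
query is sent. The function name is hypothetical, and the 253-character
cap is the conventional text-form limit, stated here as an assumption.

```python
import re

# One LDH label: letters, digits, hyphen; 1 to 63 characters;
# no leading or trailing hyphen.
_LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_name(name: str) -> bool:
    """Return True if every label of a dotted name passes the LDH check."""
    if name.endswith("."):
        name = name[:-1]          # tolerate one trailing root dot
    if not name or len(name) > 253:
        return False
    return all(_LDH_LABEL.fullmatch(label) for label in name.split("."))
```

A name that fails this check never reaches a resolver, which is the
load and error-handling advantage argued for above. An equivalent
check over "nearly-arbitrary Unicode strings" has far less to reject.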

Now, suppose someone comes along and invents a new RR type.
And, in addition to learning from our trying to over-rely on
registries, the nature of that RR type is that, while some
internationalization is appropriate for its labels, performance
--for both successful and unsuccessful queries-- is a really
critical issue, even relative to DNS norms.  And let's suppose
that, after appropriate analysis, the conclusion is that the
characters that should be permitted in labels for that RR type
should be slightly more than the 63 (or 64) of LDH, but many
fewer than "most of Unicode"... say a few hundreds or, at most,
a few thousand, of select characters.  And, whatever characters
are chosen, they are unambiguous, for both technical and
user-perception definitions of ambiguity.

Now, to accomplish that goal (please let's not get started on
whether it is reasonable or not -- this is both future and
hypothetical and I don't know what would cause such an RR to be
created), the best path, given our current framework, would be
to modify stringprep to add a table of the characters permitted
by that RR.  _Then_ the right thing to do is to create a profile
that uses that table instead of some of the maps and
prohibitions of nameprep.  
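
To make the hypothetical concrete, here is a rough sketch of what such
a table-driven profile could look like. Everything in it is an
assumption for illustration: the permitted table, the function name,
and the choice of NFKC plus lowercasing. It is not stringprep,
nameprep, or anything proposed in the draft.

```python
import unicodedata

# Hypothetical permitted-character table for the imagined RR type:
# lowercase ASCII letters, digits, hyphen, plus a few chosen letters.
PERMITTED = frozenset(
    "abcdefghijklmnopqrstuvwxyz0123456789-"
    "\u00e4\u00f6\u00fc\u00df"
)


def restricted_prep(label: str) -> str:
    """Normalize a label, then reject anything outside the table.

    Assumption: NFKC followed by simple lowercasing is an adequate
    mapping step for a table this small.
    """
    prepared = unicodedata.normalize("NFKC", label).lower()
    rejected = sorted(set(prepared) - PERMITTED)
    if rejected:
        raise ValueError(
            f"characters not permitted by this profile: {rejected!r}"
        )
    return prepared
```

Because the table is small and explicit, a client can reject most bad
labels locally and cheaply, which is the performance property the
hypothetical RR type is supposed to need.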

I am suggesting that (i) it should still be possible to use
IDNA, but with a different profile and (ii) we should not, as a
side effect of the way we specify IDNA today, prevent that RR
from being defined, regardless of whether it can use
IDNA+different profile or whether it needs to use an entirely
different protocol to set up, compare, and map names.

   john
