David Hopwood <[EMAIL PROTECTED]> wrote:

> An ACE label means a type 00 domain label that consists of the ACE tag
> and an output of the ACE encoding algorithm.

Not quite. The simplest fully-precise definition is:

    An ACE label is a label that gets altered when ToUnicode is
    applied to it.

The definition in the IDNA draft precedes the definition of ToUnicode, so it's a little more vague:

    An ACE label is a label that contains only ASCII characters but
    represents a label containing non-ASCII characters.

Actually, that definition is a tad sloppy regarding equivalence of Unicode strings. For example, if X is an all-ASCII ACE label, then the fullwidth equivalent of X is also an ACE label, even though it's not all-ASCII. If Y is a simple all-ASCII label (without the ACE prefix), and Z is the fullwidth equivalent of Y, then Y might appear to satisfy the definition (Y is an all-ASCII label representing Z, which contains non-ASCII characters), but Y is not an ACE label. So what the definition is trying to say is something like:

    An ACE label is an ASCII (or ASCII-equivalent) label that
    represents a non-ASCII (and non-ASCII-equivalent) label.

But ASCII-equivalence would be defined in terms of nameprep, so we can't very well put that in the Terminology section.

> In fact the ToASCII algorithm appears to be incorrect, i.e. not what
> was intended, for two reasons:
> - by calling nameprep, it disallows general domain names (i.e. not
>   hostnames) that contain both octets >= 0x80 and octets that
>   represent non-LDH ASCII.

Currently, nameprep includes one of the host restrictions (the prohibition of non-LDH ASCII characters) but not the other one (the prohibition of leading/trailing hyphens). I have been arguing for the removal of the ASCII characters from nameprep's prohibition table. The way it is now, if you want to use nameprep with the host restrictions, it doesn't get the job done, and if you want to use nameprep without the host restrictions, you can't.

ToASCII provides a place (step 3, immediately following nameprep) where the full host restrictions can be applied (if applicable).
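
To make the two host restrictions concrete, here is a minimal Python sketch of what an optional check at that point in ToASCII might look like. It is an illustration only, not text from either draft: the names `host_restriction_problems`, `check_label` and `is_host_name` are hypothetical, and the sketch assumes the label has already been through nameprep.

```python
import string

# Letters, digits, hyphen: the only ASCII characters a host label may contain.
LDH = frozenset(string.ascii_letters + string.digits + "-")


def host_restriction_problems(label):
    """Return the host restrictions that `label` violates (empty list if none).

    Hypothetical helper: intended to run on the output of nameprep.
    """
    problems = []
    # Restriction 1: no ASCII characters other than letters, digits, hyphen.
    # Non-ASCII characters are skipped here; judging them is nameprep's job.
    if any(ord(ch) < 0x80 and ch not in LDH for ch in label):
        problems.append("non-LDH ASCII character")
    # Restriction 2: no hyphen at either end of the label.
    if label.startswith("-") or label.endswith("-"):
        problems.append("leading or trailing hyphen")
    return problems


def check_label(label, is_host_name):
    """Apply the host restrictions only when the caller asks for them."""
    return host_restriction_problems(label) if is_host_name else []


if __name__ == "__main__":
    print(host_restriction_problems("example"))       # []
    print(host_restriction_problems("_sip"))          # ['non-LDH ASCII character']
    print(host_restriction_problems("-b\u00fccher"))  # ['leading or trailing hyphen']
    print(check_label("_sip", is_host_name=False))    # []
```

Keeping both restrictions in one optional step, rather than splitting them between nameprep and the caller, is the design choice argued for above: a general domain name such as `_sip` can skip the check entirely, while a host name gets both tests in the same place.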

> The only way to specify case annotation rigourously is for the case
> bit of each character to be an output of nameprep.

The only way to *implement* mixed-case annotation so that it records the case of the original string is for the case flags to be output from the nameprep implementation. Yes indeed. There's no way around that, because nameprep can alter the length of the string. (Of course an implementation of nameprep can compute and output extra information while still conforming to the nameprep spec.)

However, I think mixed-case annotation is *specified* rigorously in the AMC-ACE-Z spec. It tells exactly what the annotations mean, in terms of what they are asking the decoder to do. There are no requirements for creating the annotations; they do not necessarily record the original case; they record recommendations for how the string should be displayed. Presumably, in most cases, encoders will want to recommend that the string be displayed in a way that looks identical to the original string. But that is not required, and no particular method for creating annotations to accomplish that goal is required.

AMC
