Hello Adam, I understand that the documents have been approved by the IESG, so changes are not appropriate at this stage. But maybe some of the changes can be made at the next stage (Draft Standard, ...).
At 23:23 02/11/27 +0000, Adam M. Costello wrote:
Martin Duerst <[EMAIL PROTECTED]> wrote:
> I don't think it is a very good idea to use the U+ for distinction, for the following reasons:
>
> 1) The u+ -> lower case, U+ -> upper case convention is not documented anywhere in the punycode draft (or at least I didn't find it). If used at all, it should be documented right at the start of the examples.

It is not documented in the spec because it is not a feature of Punycode. The Punycode algorithm inputs and outputs code points, which are numbers. It does not input or output "u+".
I agree that documenting it in the normative part of the spec would be a bad idea. But what I was proposing was that it be mentioned in the examples section.
The sample implementation inputs and outputs "u+". Therefore the use of the case of the u as a 1-bit annotation is documented in the sample implementation's usage text, which is embedded in the source code (you can either read the source code of the usage() function, or run the program with no arguments).
I agree with what Kent Karlsson said on this.
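Adam's point is that the u+/U+ distinction lives only in the sample implementation's text input and output, not in Punycode itself, which sees only numbers. A minimal Python sketch of that textual notation, assuming (as point 1 above describes) that U+ sets the uppercase flag and u+ clears it. The names parse_annotated and format_annotated are hypothetical helpers, not functions from the sample implementation, and the four-digit uppercase hex output format is an arbitrary choice.

```python
import re

# Hypothetical helpers for the "u+hex" notation: each code point is written
# as u+hex or U+hex, and the case of the "u" carries one extra bit
# (assumed here: "U" = uppercase flag set, "u" = flag clear).
_TOKEN = re.compile(r"([uU])\+([0-9A-Fa-f]+)")


def parse_annotated(text):
    """Split whitespace-separated u+hex tokens into (code_point, upper_flag) pairs."""
    result = []
    for token in text.split():
        m = _TOKEN.fullmatch(token)
        if m is None:
            raise ValueError(f"not a u+hex token: {token!r}")
        result.append((int(m.group(2), 16), m.group(1) == "U"))
    return result


def format_annotated(pairs):
    """Inverse of parse_annotated: write each pair back as u+hex or U+hex."""
    return " ".join(f"{'U' if upper else 'u'}+{cp:04X}" for cp, upper in pairs)
```

Once parsed, the flag is an explicit boolean rather than a one-character difference in spelling, which is exactly the difference that is easy to miss when reading u+ and U+ by eye.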
I tried to downplay mixed-case annotation as much as possible in the draft, because Paul and Patrik have never liked it. I'll ask them if they think the Examples section should call attention to the mixed-case annotations.
I agree with downplaying them. But then the best thing would be not to use them in the examples section at all. The alternative is to explain them in the examples section.
> 2) The above convention is very easy to overlook, in particular because u+ and U+ look so very similar. It is close to a widely established convention, but differs slightly.

I'm curious--can you explain or point me to that convention?
The convention is the very widely established convention of writing Unicode/UCS code points as a U+ followed by 4 to 6 hex digits. In that convention, the U is always upper case. For many people used to it, this convention is so well established that the lower-case u+ is very easy to overlook. That happened to two of us on Tuesday, and it took quite some time for us to figure out what was actually going on.
> 3) Punycode can be used in different ways: on mixed-case strings, on lowercased strings that still carry the original casing info, and on pure lowercase strings. Maybe there should be separate examples for each of these three uses.

Long ago some of my drafts included explanations of various scenarios, some of which are not applicable to domain names. Since then, we agreed that the Punycode draft should present itself as a piece of IDNA which, by the way, could perhaps be useful outside IDNA; not as a general encoding that happens to be used in IDNA. The goal was to avoid confusing implementors of IDNA. So I cut out all but the essentials of mixed-case annotation, and confined its description to a single appendix (and the sample implementation).
I think that was basically the right thing to do.
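To make concrete how the mixed-case annotation rides on top of Punycode without changing what the algorithm computes, here is a minimal Python sketch of the decoding procedure that also reports the annotation: a basic code point's own case, and for each non-basic code point the case of the last character of its delta. The parameter values are those of the published Punycode specification (RFC 3492). The function name decode_with_case_flags is hypothetical; this is an illustration, not the sample implementation, and it omits input validation and any IDNA processing.

```python
# Bootstring parameters for Punycode (RFC 3492, section 5).
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N, DELIMITER = 72, 0x80, "-"


def _digit_value(c):
    # Letters are case-insensitive digits 0..25; '0'..'9' are 26..35.
    if "a" <= c <= "z":
        return ord(c) - ord("a")
    if "A" <= c <= "Z":
        return ord(c) - ord("A")
    if "0" <= c <= "9":
        return ord(c) - ord("0") + 26
    raise ValueError(f"invalid Punycode digit: {c!r}")


def _adapt(delta, num_points, first_time):
    # Bias adaptation function.
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def decode_with_case_flags(s):
    """Decode a Punycode string into (code_points, uppercase_flags)."""
    pos = s.rfind(DELIMITER)
    basic, extended = (s[:pos], s[pos + 1:]) if pos >= 0 else ("", s)
    # Basic code points are copied through; their own case is the annotation.
    output = [ord(c) for c in basic]
    flags = [c.isupper() for c in basic]
    n, i, bias, j = INITIAL_N, 0, INITIAL_BIAS, 0
    while j < len(extended):
        old_i, w, k = i, 1, BASE
        while True:  # read one variable-length delta
            if j >= len(extended):
                raise ValueError("truncated delta")
            c = extended[j]
            j += 1
            digit = _digit_value(c)
            i += digit * w
            t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if digit < t:
                break
            w *= BASE - t
            k += BASE
        bias = _adapt(i - old_i, len(output) + 1, old_i == 0)
        n += i // (len(output) + 1)
        i %= len(output) + 1
        output.insert(i, n)
        # Non-basic code point: the case of the delta's last character is the flag.
        flags.insert(i, c.isupper())
        i += 1
    return output, flags
```

For example, decode_with_case_flags("bcher-kvA") yields the code points of "bücher" with the flag set only on the ü, because A, the last character of its delta, is uppercase; the code points are the same as for "bcher-kva", which is why a decoder that ignores the annotation loses nothing.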
Maybe, if/when there is real interest in using mixed-case annotations and/or using Punycode for things other than domain names, we could update the spec, or augment it with a separate document.
There is definitely no interest from my side in using Punycode for things other than domain names, and therefore no interest in mixed-case annotations. Regards, Martin.
