[ When I read the IDNA draft today, I still couldn't find an answer in it to the following question about IDN label length. If this issue is already addressed in the draft, please correct me. ]
I have a Punycode label that is 63 octets long:

L1: zq--o39AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

L2 = ToUnicode(L1) produces U+AC00 x 56 times (Hangul "KA" repeated 56 times):

L2: U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00
    U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00
    U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00
    U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00
    U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00
    U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00
    U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00 U+AC00

But this L2 can be encoded in various Unicode/legacy encodings, each yielding a different number of octets:

UTF8           : 3 x 56 = 168 octets
UCS2           : 2 x 56 = 112 octets
UCS4           : 4 x 56 = 224 octets
KSX1001/EUC-KR : 2 x 56 = 112 octets

All of these encodings produce labels longer than 63 octets.

Moreover, every ACE label of a valid (<256 octets) ACE-form FQDN IDN may convert into a valid UTF8 label that stays within 63 octets, while the cumulative length of the UTF8 labels of that FQDN still exceeds the limit on the length of the whole name.

Many Internet applications impose or assume the 63-octet limit on label length. If this assumption is violated, some implementations will treat the label as invalid and fail in unpredictable ways.

From an implementor's point of view, a more precise specification is needed on whether IDN labels and FQDNs are subject to *NEW* length restrictions in the various character encodings, if IDNA tries to extend the repertoire of allowable characters.

The above case is very rare, but in any case implementors have a practical, security-related need to impose some limit on IDN labels in non-ACE encodings (for example, to avoid buffer overflow errors caused by expanded ToUnicode labels).

Cheers,
Soobok Lee
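
P.S. The octet counts quoted above can be reproduced with a short Python sketch. This is only an illustration, not anything taken from the IDNA draft: the helper name encoded_lengths and the constant MAX_LABEL_OCTETS are made up for this example, and it assumes that big-endian UTF-16 and UTF-32 without a byte order mark are acceptable stand-ins for UCS2 and UCS4, which holds here because U+AC00 lies in the Basic Multilingual Plane.

```python
# Illustrative sketch only: names below are hypothetical, not from any IDNA draft.
MAX_LABEL_OCTETS = 63  # the label length limit that many applications assume

# The decoded label from the example: U+AC00 repeated 56 times.
label = "\uac00" * 56


def encoded_lengths(text):
    """Return the octet length of `text` in each encoding discussed above."""
    return {
        # UTF-8 needs 3 octets for every code point in U+0800..U+FFFF.
        "UTF8": len(text.encode("utf-8")),
        # Assumption: UTF-16BE without BOM equals UCS2 for BMP-only text.
        "UCS2": len(text.encode("utf-16-be")),
        # Assumption: UTF-32BE without BOM stands in for UCS4.
        "UCS4": len(text.encode("utf-32-be")),
        # KS X 1001 as carried by EUC-KR: 2 octets per Hangul syllable.
        "EUC-KR": len(text.encode("euc_kr")),
    }


if __name__ == "__main__":
    for name, octets in encoded_lengths(label).items():
        verdict = "exceeds" if octets > MAX_LABEL_OCTETS else "fits within"
        print(f"{name:7s}: {octets:3d} octets ({verdict} {MAX_LABEL_OCTETS})")
```

A receiver that wants to guard against the overflow case could run the same kind of length check on the result of ToUnicode before copying it into a fixed-size buffer.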
