tedd <[EMAIL PROTECTED]> wrote:

> in IDNA the Tilde (code point 007E) is prohibited, but the Tilde
> Operator (code point 223C) is not.

IDNA inherits the prohibition of U+007E from RFC 1123 (STD 3), which by reference to RFC 952 defined host names as ASCII strings containing only A-Z, a-z, 0-9, hyphen-minus, and dot. Therefore some ASCII characters were explicitly allowed, all other ASCII characters were explicitly forbidden, and non-ASCII characters were not even in the realm of possibility.

In order to extend the notion of host name to non-ASCII strings, we needed to keep the existing prohibitions on ASCII characters in host names (otherwise it wouldn't be a proper extension), but the rules for non-ASCII characters were up to the working group to define. The consensus was to allow all non-ASCII Unicode graphic characters (perhaps because the group could never have reached agreement on any particular non-empty set of prohibited graphic characters).

> Considering that keyboard space is at a premium, why isn't code point
> 007E mapped to 223C in PUNYCODE?

Punycode accepts and supports all Unicode characters, including non-graphic characters and all ASCII characters, including U+007E. It does no mapping. All mapping and prohibition are done at higher layers.

I suppose you could instead ask why tilde isn't mapped to tilde operator in Nameprep. The mapping step in Nameprep was designed to avoid alternate representations of the same characters, and to erase case distinctions, not to save typing. Tilde and tilde operator are entirely distinct characters according to the Unicode spec (and if we had decided not to accept the Unicode spec at face value, we'd still be arguing about what maps to what). If tilde operator is too difficult to type, then don't register domain names containing it.

We made one concession for ease of typing, for dot, only because all domain names (except TLDs) are *required* to contain dots, and dots can be cumbersome to type for the huge number of CJK users.
The mapping from ideographic full stop (U+3002) to dot is not done in Nameprep, which sees only individual labels, not the separators between them. It is done at a higher layer that divides the domain name into labels, converts them independently, and glues them back together.

AMC
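
For readers who want to see the layering concretely, here is a rough sketch in Python. It is an illustration under stated assumptions, not the normative ToASCII procedure: the names `to_ascii_domain` and `check_std3` are made up for this example, and the sketch leaves out length limits, hyphen placement rules, unassigned code point checks, and the check for labels that already start with the ACE prefix. It relies on the standard library's `encodings.idna.nameprep` function and the `punycode` codec.

```python
import re
from encodings import idna  # standard-library IDNA 2003 helpers

# Label separators recognized at the domain-name layer:
# full stop, ideographic full stop, fullwidth full stop,
# halfwidth ideographic full stop.
SEPARATORS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def check_std3(label: str) -> None:
    """Hypothetical helper: enforce the inherited ASCII host-name rule.

    ASCII characters other than letters, digits, and hyphen-minus are
    rejected; non-ASCII characters are left alone at this step.
    """
    for ch in label:
        if ch < "\u0080" and not (ch.isalnum() or ch == "-"):
            raise ValueError(f"ASCII character {ch!r} not allowed in host names")


def to_ascii_domain(name: str) -> str:
    """Hypothetical, simplified conversion of a whole domain name."""
    out = []
    # Higher layer: divide the name into labels on any separator.
    for label in SEPARATORS.split(name):
        prepped = idna.nameprep(label)  # mapping and case folding, per label
        check_std3(prepped)             # prohibition of ASCII like U+007E
        if prepped.isascii():
            out.append(prepped)
        else:
            # Punycode itself does no mapping and no prohibition.
            out.append("xn--" + prepped.encode("punycode").decode("ascii"))
    # Glue the labels back together with plain dots.
    return ".".join(out)


if __name__ == "__main__":
    # Punycode alone accepts the tilde without complaint.
    print("~".encode("punycode"))
    # Ideographic full stop is treated as a separator; case is folded.
    print(to_ascii_domain("Example\u3002com"))  # example.com
    # Tilde operator (U+223C) is accepted and encoded.
    print(to_ascii_domain("a\u223cb.com"))
    # Tilde (U+007E) is rejected by the host-name rule, not by Punycode.
    try:
        to_ascii_domain("a~b.com")
    except ValueError as exc:
        print("rejected:", exc)
```

The point of the sketch is where each decision lives: the separator handling happens before Nameprep ever sees a label, the tilde is rejected by the host-name rule, and the Punycode step would happily encode either character.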
