"Erik van der Poel" <[EMAIL PROTECTED]> writes: > GNU libidn handles the case below in the same way as Opera 9 and ICU, > but MSIE 7 and Firefox 2 handle it differently. > > I tried the demo page at http://josefsson.org/idn.php/ ... > > Speaking of U+2024 and where in the protocol stack to handle things, I > just discovered that MSIE 7 and Firefox 2 both perform NFKC on this > character, to yield U+002E (.). After that, they divide the host name > into labels *again*, so the new U+002E becomes a new label separator.
I don't understand what the problem is.  I'm not even sure you are
claiming there is a problem in libidn?  If I invoke:

[EMAIL PROTECTED]:~$ idn --debug --quiet foo․bar
Charset `UTF-8'.
input[0] = U+0066
input[1] = U+006f
input[2] = U+006f
input[3] = U+2024
input[4] = U+0062
input[5] = U+0061
input[6] = U+0072
tld[0] = U+0066
tld[1] = U+006f
tld[2] = U+006f
tld[3] = U+002e
tld[4] = U+0062
tld[5] = U+0061
tld[6] = U+0072
output[0] = U+0066
output[1] = U+006f
output[2] = U+006f
output[3] = U+002e
output[4] = U+0062
output[5] = U+0061
output[6] = U+0072
foo.bar
[EMAIL PROTECTED]:~$

The web page for the same input is:

http://josefsson.org/idn.php/?data=foo%E2%80%A4bar&profile=Nameprep&mode=toascii&debug=on&charset=UTF-8&lastcharset=UTF-8

This looks correct to me.  What is wrong?

> If we ever get around to writing a document about IDNA in HTML, we may
> want to make a note of this. I.e. the steps are:
>
> (1) Divide the domain name into labels by looking for IDNA2003 dots.
> (2) Perform Nameprep2003 on each non-ASCII label.
> (3) Divide each label into multiple labels, by looking for regular dots.
> (4) Perform Punycode2003 on each non-ASCII label.

Why not add U+2024 to the list of dot-like code points in RFC 3490
section 3.1 instead?

/Simon

_______________________________________________
Help-libidn mailing list
Help-libidn@gnu.org
http://lists.gnu.org/mailman/listinfo/help-libidn
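For readers who want to try the four steps quoted above, here is a minimal Python sketch. It is an illustration under stated assumptions, not what libidn, MSIE 7 or Firefox 2 actually run: the helper names `split_on` and `to_ascii_labels` are hypothetical, and NFKC plus case folding from the standard `unicodedata` module is only a rough stand-in for Nameprep (no prohibited-character or bidi checks, and Python's Unicode tables are newer than the Unicode 3.2 that Nameprep specifies). The dot list is the one from RFC 3490 section 3.1.

```python
import unicodedata

# Label separators listed in RFC 3490 section 3.1.
IDNA2003_DOTS = "\u002e\u3002\uff0e\uff61"


def split_on(text, separators):
    """Split text at every character found in separators (hypothetical helper)."""
    labels, current = [], []
    for ch in text:
        if ch in separators:
            labels.append("".join(current))
            current = []
        else:
            current.append(ch)
    labels.append("".join(current))
    return labels


def to_ascii_labels(domain):
    """Apply the four quoted steps; returns the list of ASCII labels."""
    result = []
    # Step 1: divide the name into labels at the IDNA2003 dots.
    for label in split_on(domain, IDNA2003_DOTS):
        # Step 2: ASSUMPTION: NFKC + case folding approximates Nameprep here.
        if not label.isascii():
            label = unicodedata.normalize("NFKC", label).casefold()
        # Step 3: split again, since normalization may have produced U+002E.
        for sub in split_on(label, "."):
            # Step 4: Punycode-encode labels that are still non-ASCII.
            if not sub.isascii():
                sub = "xn--" + sub.encode("punycode").decode("ascii")
            result.append(sub)
    return result


if __name__ == "__main__":
    print(to_ascii_labels("foo\u2024bar"))
```

With this sketch, U+2024 in `foo․bar` is turned into U+002E in step 2 and only becomes a label separator in step 3. Adding U+2024 to `IDNA2003_DOTS` instead would make step 1 split there directly, which is the alternative raised in the reply.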