Martin Duerst <[EMAIL PROTECTED]> writes:

>>The page is served in the charset you select.
>
> No. When I go to http://josefsson.org/idn.php, I didn't
> select iso-8859-1. In fact, my main browser (Netscape 7) sends you
>    Accept-Charset: UTF-8, *
> in its HTTP headers. For the others I used, Opera7 sends
>    Accept-Charset: windows-1252,utf-8,utf-16,iso-8859-1;q=0.6,*;q=0.1
> whereas IE6 and Tango don't send anything. Both Netscape 7
> and Opera clearly express a preference for UTF-8 over iso-8859-1.

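For illustration, here is a rough Python sketch of how a server could read the Accept-Charset values quoted above and pick between UTF-8 and ISO-8859-1. The function names and the `supported` list are hypothetical and not taken from idn.php. The fallback rules follow HTTP 1.1 as I understand it (RFC 2616, section 14.2): without a "*" entry, ISO-8859-1 counts as q=1 and every other unlisted charset as q=0. Ties go to whichever charset the server lists first.

```python
def parse_accept_charset(header):
    """Map each charset named in an Accept-Charset value to its q-value."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # an entry without an explicit q parameter counts as q=1
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[name.strip().lower()] = q
    return prefs


def choose_charset(header, supported=("utf-8", "iso-8859-1")):
    """Pick the supported charset with the highest q, or None if none is acceptable."""
    prefs = parse_accept_charset(header)
    if not prefs:
        # No header at all (as with IE6 in the thread): use the server default.
        return supported[0]

    def quality(charset):
        if charset in prefs:
            return prefs[charset]
        if "*" in prefs:
            return prefs["*"]
        # No wildcard: only ISO-8859-1 is implicitly acceptable.
        return 1.0 if charset == "iso-8859-1" else 0.0

    best = max(supported, key=quality)  # first listed charset wins a tie
    return best if quality(best) > 0 else None


if __name__ == "__main__":
    for value in ("UTF-8, *",
                  "windows-1252,utf-8,utf-16,iso-8859-1;q=0.6,*;q=0.1",
                  ""):
        print(repr(value), "->", choose_charset(value))
```

Note that "UTF-8, *" gives UTF-8 and ISO-8859-1 the same q-value, so for that header the result depends only on the order of `supported`.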
All your clients support ISO-8859-1, and it makes things easier for me
(I don't have a fully working UTF-8 editor on the web server host), so
ISO-8859-1 seemed like a good choice. ISO-8859-1 is mentioned more
prominently than UTF-8 in HTML 3.2 and HTTP 1.1, so it should be more
interoperable too. But okay, I have changed to UTF-8 by default.

>>Standards compliant browsers handle charset conversions in copy/paste.
>
> Well, yes, they handle character encoding conversion in copy/paste.
> They convert from the encoding used in the clipboard to their
> internal (unicode-based) encoding. That's why
> you should avoid confusing the user with 'charset' stuff.

It is a debugging tool for libidn, so I expect users to at least be
aware of charset stuff.

As it happens, one of the browsers I use to test the page uses mule
for its internal encoding and has incomplete Unicode support, so
moving the page to UTF-8 forces me to select ISO-8859-1 and reload the
page.

Honestly, the only advantage of moving the page to UTF-8 that I can
see is if there is a browser out there that supports UTF-8 but not
ISO-8859-1. Since HTML 3.2 and HTTP 1.1 use ISO-8859-1, that browser
would be broken anyway.

> "The following string must only contain characters that can be
> represented in ISO-8859-1."

Thanks, I'm using it now.

> But there are cases where you produce garbage on your own.
> For example, if I input &uuml;<u">.josefsson.org,
> &uuml; followed by an actual u-umlaut (where <u"> is actually an u-umlaut),
> and switch on UseSTD3ASCIIRules, I get:
> /usr/local/bin/idn: idna_to_ascii_from_locale() failed with error 3.
> which I guess means IDNA_CONTAINS_LDH = 3.
> Now if I switch off UseSTD3ASCIIRules and use the same input,
> what I see as a result is xn--<u">-8ya.josefsson.org. The correct
> result is of course &uuml;<u">.josefsson.org, which is in the
> source, but not visible. So you have to fix the source to
> be xn--&amp;uuml;-8ya.josefsson.org.

Ah, true. I'll see if there is a handy function for escaping...
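
As a sketch of the escaping step discussed above: the output of the idn tool has to be HTML-escaped before it is written into the page, otherwise a literal entity such as `&uuml;` in the result is rendered by the browser as a u-umlaut. The example below is in Python purely for illustration. The page itself is PHP, where `htmlspecialchars()` should do the equivalent, and `render_result` is a hypothetical helper, not code from idn.php.

```python
import html


def render_result(tool_output):
    """Wrap raw tool output in <pre>, escaping &, <, > and quotes first."""
    return "<pre>" + html.escape(tool_output) + "</pre>"


if __name__ == "__main__":
    # The label contains the six ASCII characters "&uuml;", not a u-umlaut.
    print(render_result("xn--&uuml;-8ya.josefsson.org"))
```

With the ampersand escaped as `&amp;`, the browser displays the six characters `&uuml;` literally instead of interpreting them.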
