Re: IDNA processing
On 18/05/17 14:14, Anne van Kesteren wrote:
> That's fairly non-specific, unless you really mean that you don't want
> "A" lowercased.

Well, yes, as you note, with UTS#46 or whatever it is.

> I don't think it's that big, there's plenty of other things disallowed
> that we should always be able to find something, if it comes to that.

Then I think whatever you decide about how to deal with ASCII fast paths
is fine. :-)

Gerv
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
Re: IDNA processing
On Mon, May 15, 2017 at 3:38 PM, Gervase Markham wrote:
> Well, you generally know my opinion :-) IDNA 2008 non-transitional.

That's fairly non-specific, unless you really mean that you don't want
"A" lowercased.

> ISTM that the 3rd/4th placed hyphens were banned so the domain name
> system had an extension mechanism, and that was used for IDNA (xn--). If
> we allow domains of that form, we no longer have that extension
> mechanism. The question is, how big a loss is that?

I don't think it's that big, there's plenty of other things disallowed
that we should always be able to find something, if it comes to that.

I ended up doing some more work on this and created a bunch of tests
and filed various browser bugs. Not applying ToASCII to ASCII-only
input is its own can of worms, due to input such as "xn--a" then
round-tripping poorly. https://github.com/whatwg/url/issues/267 has
the relevant details and pointers.

--
https://annevankesteren.nl/
Re: IDNA processing
On 12/05/17 08:46, Anne van Kesteren wrote:
> For about five years I've been trying to figure out the IDNA algorithm
> that a) browsers follow and b) browsers want to follow, but I've not
> had much luck thus far getting folks to reply. E.g.,
> https://lists.w3.org/Archives/Public/www-archive/2017Feb/0006.html
> went largely unaddressed.

Well, you generally know my opinion :-) IDNA 2008 non-transitional.

> One big difference between http://www.unicode.org/reports/tr46/ and
> browsers is how ASCII is handled. Per UTS #46 ASCII is handled the
> same as non-ASCII. However, in browsers ASCII takes a "fast path" and
> skips the ToASCII algorithm. YouTube now depends on that (it has CDN
> domains with hyphens in the third and fourth position, as reported at,
> e.g., https://github.com/nodejs/node/issues/12965).

ISTM that the 3rd/4th placed hyphens were banned so the domain name
system had an extension mechanism, and that was used for IDNA (xn--). If
we allow domains of that form, we no longer have that extension
mechanism. The question is, how big a loss is that?

Gerv
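To make the extension-mechanism point concrete, here is a small Python sketch that sorts labels by what sits in their third and fourth positions. The function name and the category strings are made up for this illustration; the only facts assumed are that IDNA labels carry the "xn--" prefix and that other labels with hyphens in those two positions are the ones the restriction holds in reserve.

```python
def classify_label(label: str) -> str:
    """Sort a single DNS label into one of three illustrative buckets."""
    lower = label.lower()
    if lower.startswith("xn--"):
        # The reserved "--" slot as already used by IDNA.
        return "idna"
    if lower[2:4] == "--":
        # Hyphens in the third and fourth positions with some other prefix:
        # the space a future extension could claim.
        return "reserved"
    return "ordinary"
```

If labels in the "reserved" bucket are accepted as ordinary host names, a later prefix of the same shape could collide with names already in use, which is the loss being weighed here.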
IDNA processing
If this is better suited elsewhere, such as dev-tech-network, please let
me know.

For about five years I've been trying to figure out the IDNA algorithm
that a) browsers follow and b) browsers want to follow, but I've not
had much luck thus far getting folks to reply. E.g.,
https://lists.w3.org/Archives/Public/www-archive/2017Feb/0006.html
went largely unaddressed.

One big difference between http://www.unicode.org/reports/tr46/ and
browsers is how ASCII is handled. Per UTS #46 ASCII is handled the
same as non-ASCII. However, in browsers ASCII takes a "fast path" and
skips the ToASCII algorithm. YouTube now depends on that (it has CDN
domains with hyphens in the third and fourth position, as reported at,
e.g., https://github.com/nodejs/node/issues/12965).

A question I've had is whether we should standardize the fast path or
try to get consistent handling. I've also raised this with the Unicode
folks at
https://docs.google.com/document/d/11PEww2N0PbXyPhbsCdW_PjD3BNgZMy5XHUv02SSXNqY/edit
and elsewhere and it seems an upcoming draft adds another flag to the
UTS #46 ToASCII algorithms to ignore hyphen requirements. However,
hyphens are not the only requirement that might influence how a pure
ASCII domain is handled and therefore it's unclear it actually solves
the problem, especially if browsers continue to ignore it.

Since I haven't gotten much cooperation my plan is to just standardize
what browsers do (in https://url.spec.whatwg.org/ which ends up
invoking UTS #46 ToASCII) and go from there, especially as not doing
what browsers do tends to break other ecosystems, but I thought I'd
raise this here as a final attempt to get some input.

--
https://annevankesteren.nl/
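A minimal Python sketch of the difference described above, under stated assumptions. `strict_to_ascii` is a hypothetical stand-in that enforces only the third/fourth-position hyphen rule; it is not an implementation of UTS #46 ToASCII. `host_to_ascii` models the browser-style fast path, and the lowercasing on that path is likewise an assumption of this sketch. The host name `r1---example.test` is invented for illustration.

```python
from typing import Callable


def strict_to_ascii(domain: str) -> str:
    """Hypothetical stand-in for a strict ToASCII step (NOT UTS #46 itself).

    It only rejects labels with hyphens in both the third and fourth
    positions, other than the "xn--" prefix, and lowercases the rest.
    """
    for label in domain.lower().split("."):
        if label[2:4] == "--" and not label.startswith("xn--"):
            raise ValueError(f"hyphens in positions 3 and 4: {label!r}")
    return domain.lower()


def host_to_ascii(
    domain: str, to_ascii: Callable[[str], str] = strict_to_ascii
) -> str:
    """Browser-style dispatch: ASCII-only input skips to_ascii entirely."""
    if domain.isascii():
        # Fast path: lowercase only, no validation of any kind.
        return domain.lower()
    return to_ascii(domain)
```

With these definitions, `host_to_ascii("R1---Example.test")` returns `"r1---example.test"`, while `strict_to_ascii("r1---example.test")` raises `ValueError`. The same ASCII host is therefore accepted or rejected depending on whether the fast path exists, which is the inconsistency at issue.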