IDNA processing

Anne van Kesteren Fri, 12 May 2017 00:46:48 -0700

If this is better suited elsewhere, such as dev-tech-network, please
let me know.


For about five years I've been trying to figure out the IDNA algorithm
that a) browsers follow and b) browsers want to follow, but I've not
had much luck thus far getting folks to reply. E.g.,
https://lists.w3.org/Archives/Public/www-archive/2017Feb/0006.html
went largely unaddressed.

One big difference between http://www.unicode.org/reports/tr46/ and
browsers is how ASCII is handled. Per UTS #46 ASCII is handled the
same as non-ASCII. However, in browsers ASCII takes a "fast path" and
skips the ToASCII algorithm. YouTube now depends on that (it has CDN
domains with hyphens in the third and fourth position, as reported at,
e.g., https://github.com/nodejs/node/issues/12965).
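
To make the difference concrete, here is a minimal Python sketch, not taken from any browser or from UTS #46 itself. It models only the hyphen restrictions, leaves out mapping and Punycode entirely, and skips "xn--" labels because a real implementation decodes those before checking them. The function names and the host name "r3---example.test" are made up for illustration.

```python
def passes_hyphen_check(domain: str) -> bool:
    """Simplified model of the hyphen restrictions: a label must not begin
    or end with "-" and must not have "-" in both the third and fourth
    positions."""
    for label in domain.split("."):
        # Assumption: "xn--" labels are out of scope here; a real
        # implementation would Punycode-decode them and check the result.
        if label.lower().startswith("xn--"):
            continue
        if label.startswith("-") or label.endswith("-") or label[2:4] == "--":
            return False
    return True


def fast_path_accepts(domain: str) -> bool:
    """Hypothetical browser-like behaviour: ASCII input skips the checks."""
    if domain.isascii():
        return True  # fast path, ToASCII is never consulted
    return passes_hyphen_check(domain)


host = "r3---example.test"  # made-up name, hyphens in positions three and four
print(passes_hyphen_check(host))  # False
print(fast_path_accepts(host))    # True
```

Under these assumptions the same pure-ASCII name is rejected when it goes through the checks and accepted on the fast path, which is the kind of divergence described above.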

A question I've had is whether we should standardize the fast path or
try to get consistent handling. I've also raised this with the Unicode
folks at 
https://docs.google.com/document/d/11PEww2N0PbXyPhbsCdW_PjD3BNgZMy5XHUv02SSXNqY/edit
and elsewhere, and it seems an upcoming draft adds another flag to the
UTS #46 ToASCII algorithm to ignore hyphen requirements.

However, hyphens are not the only requirement that might influence how
a pure ASCII domain is handled, and therefore it's unclear whether the
new flag actually solves the problem, especially if browsers continue
to ignore it.
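
As one example of such a requirement, UTS #46 ToASCII can also verify DNS length limits (1 to 253 characters for the domain, 1 to 63 per label), and those apply to ASCII input just as much. The sketch below is a rough Python illustration with a hypothetical function name; it ignores trailing-dot handling and does not model whether browsers enforce these limits somewhere else.

```python
def dns_length_problems(domain: str) -> list[str]:
    """Return the DNS length limits that the given domain violates."""
    problems = []
    if not 1 <= len(domain) <= 253:
        problems.append("domain length outside 1..253")
    for label in domain.split("."):
        if not 1 <= len(label) <= 63:
            problems.append(f"label length outside 1..63 ({len(label)} chars)")
    return problems


# A made-up, pure-ASCII name whose first label is 64 characters long.
host = "a" * 64 + ".test"
print(host.isascii())             # True
print(dns_length_problems(host))  # one entry, for the 64-character label
```

A name like this contains no hyphens at all, so a flag that only relaxes the hyphen requirements would not change how it is treated.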

Since I haven't gotten much cooperation, my plan is to just standardize
what browsers do (in https://url.spec.whatwg.org/ which ends up
invoking UTS #46 ToASCII) and go from there, especially as not doing
what browsers do tends to break other ecosystems. I thought I'd raise
this here as a final attempt to get some input.


-- 
https://annevankesteren.nl/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
