Re: IDNA processing

2017-05-18 Thread Gervase Markham
On 18/05/17 14:14, Anne van Kesteren wrote:
> That's fairly non-specific, unless you really mean that you don't want
> "A" lowercased.

Well, yes, as you note, with UTS#46 or whatever it is.

> I don't think it's that big, there's plenty of other things disallowed
> that we should always be able to find something, if it comes to that.

Then I think whatever you decide about how to deal with ASCII fast paths
is fine. :-)

Gerv
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: IDNA processing

2017-05-18 Thread Anne van Kesteren
On Mon, May 15, 2017 at 3:38 PM, Gervase Markham wrote:
> Well, you generally know my opinion :-) IDNA 2008 non-transitional.

That's fairly non-specific, unless you really mean that you don't want
"A" lowercased.


> ISTM that the 3rd/4th placed hyphens were banned so the domain name
> system had an extension mechanism, and that was used for IDNA (xn--). If
> we allow domains of that form, we no longer have that extension
> mechanism. The question is, how big a loss is that?

I don't think it's that big, there's plenty of other things disallowed
that we should always be able to find something, if it comes to that.


I ended up doing some more work on this and created a bunch of tests
and filed various browser bugs. Not applying ToASCII to ASCII-only
input is its own can of worms, due to input such as "xn--a" then
round-tripping poorly. https://github.com/whatwg/url/issues/267 has
the relevant details and pointers.
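
One way to see why "xn--a" is awkward, sketched in Python under the assumption that the concern is what the label decodes to (the linked issue remains the authoritative description). The helper name decode_a_label is hypothetical; it only strips the "xn--" prefix and runs the standard library's "punycode" codec, with none of the UTS #46 validity checks.

```python
import codecs


def decode_a_label(label: str) -> str:
    """Strip the xn-- prefix and Punycode-decode the rest (no validation)."""
    prefix = "xn--"
    if not label.lower().startswith(prefix):
        raise ValueError("not an A-label")
    return codecs.decode(label[len(prefix):].encode("ascii"), "punycode")


if __name__ == "__main__":
    decoded = decode_a_label("xn--a")
    # Show the decoded code points rather than the raw (unprintable) text.
    print([f"U+{ord(ch):04X}" for ch in decoded])
```

Under RFC 3492 the payload "a" should decode to the single code point U+0080, a C1 control character. A fast path that passes "xn--a" through untouched therefore accepts a label that full processing would have to decode and then validate.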


-- 
https://annevankesteren.nl/


Re: IDNA processing

2017-05-15 Thread Gervase Markham
On 12/05/17 08:46, Anne van Kesteren wrote:
> For about five years I've been trying to figure out the IDNA algorithm
> that a) browsers follow and b) browsers want to follow, but I've not
> had much luck thus far getting folks to reply. E.g.,
> https://lists.w3.org/Archives/Public/www-archive/2017Feb/0006.html
> went largely unaddressed.

Well, you generally know my opinion :-) IDNA 2008 non-transitional.

> One big difference between http://www.unicode.org/reports/tr46/ and
> browsers is how ASCII is handled. Per UTS #46 ASCII is handled the
> same as non-ASCII. However, in browsers ASCII takes a "fast path" and
> skips the ToASCII algorithm. YouTube now depends on that (it has CDN
> domains with hyphens in the third and fourth position, as reported at,
> e.g., https://github.com/nodejs/node/issues/12965).

ISTM that the 3rd/4th placed hyphens were banned so the domain name
system had an extension mechanism, and that was used for IDNA (xn--). If
we allow domains of that form, we no longer have that extension
mechanism. The question is, how big a loss is that?

Gerv


IDNA processing

2017-05-12 Thread Anne van Kesteren
If this is better suited elsewhere, such as dev-tech-network, please
let me know.

For about five years I've been trying to figure out the IDNA algorithm
that a) browsers follow and b) browsers want to follow, but I've not
had much luck thus far getting folks to reply. E.g.,
https://lists.w3.org/Archives/Public/www-archive/2017Feb/0006.html
went largely unaddressed.

One big difference between http://www.unicode.org/reports/tr46/ and
browsers is how ASCII is handled. Per UTS #46 ASCII is handled the
same as non-ASCII. However, in browsers ASCII takes a "fast path" and
skips the ToASCII algorithm. YouTube now depends on that (it has CDN
domains with hyphens in the third and fourth position, as reported at,
e.g., https://github.com/nodejs/node/issues/12965).
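
A minimal Python sketch of that difference, under stated assumptions: browser_like and strict_ascii_check are hypothetical names, the hostname is made up, and the strict path models only the rule against hyphens in both the third and fourth positions. It is not an implementation of UTS #46 or of any browser.

```python
def has_hyphens_in_positions_3_and_4(label: str) -> bool:
    """True if the label has '-' in both the third and fourth positions."""
    return label[2:4] == "--"


def strict_ascii_check(domain: str) -> str:
    """Hypothetical stand-in for a ToASCII-style pass with hyphen checks on."""
    labels = domain.lower().split(".")
    for label in labels:
        # Simplification: A-labels (xn--) are passed through undecoded here.
        if label.startswith("xn--"):
            continue
        if has_hyphens_in_positions_3_and_4(label):
            raise ValueError(f"hyphens in positions 3 and 4: {label!r}")
    return ".".join(labels)


def browser_like(domain: str) -> str:
    """Hypothetical fast path: ASCII-only input is lowercased, nothing else."""
    if domain.isascii():
        return domain.lower()
    raise NotImplementedError("non-ASCII input is out of scope for this sketch")


if __name__ == "__main__":
    host = "r3---example.test"  # made-up hostname with hyphens in positions 3-5
    print(browser_like(host))
    try:
        strict_ascii_check(host)
    except ValueError as err:
        print("rejected:", err)
```

The fast path accepts the hostname unchanged, while the strict path rejects it. That gap is what a site comes to rely on once it deploys such names.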

A question I've had is whether we should standardize the fast path or
try to get consistent handling. I've also raised this with the Unicode
folks at 
https://docs.google.com/document/d/11PEww2N0PbXyPhbsCdW_PjD3BNgZMy5XHUv02SSXNqY/edit
and elsewhere and it seems an upcoming draft adds another flag to the
UTS #46 ToASCII algorithms to ignore hyphen requirements.

However, hyphens are not the only requirement that might influence how
a pure ASCII domain is handled and therefore it's unclear it actually
solves the problem, especially if browsers continue to ignore it.

Since I haven't gotten much cooperation my plan is to just standardize
what browsers do (in https://url.spec.whatwg.org/ which ends up
invoking UTS #46 ToASCII) and go from there, especially as not doing
what browsers do tends to break other ecosystems, but I thought I'd
raise this here as a final attempt to get some input.


-- 
https://annevankesteren.nl/