Hi whatwg,

I'm writing with a proposal to improve the handling of "#" in data URIs. I'm particularly looking for feedback from other browser vendors, but of course feedback from others is welcome as well.

SUMMARY:
========
Browsers handle the "#" character in data URIs very differently, and the arguably "correct" behavior is probably not what authors actually want in many cases.

This could be more intuitive/do-what-I-mean if we restricted the cases under which "#" is treated as a fragment-ID delimiter inside of data URIs. In particular: when a "#" character is followed (anywhere later in the URI) by ">" or "<", I propose that we *don't* treat the "#" as a delimiter, and instead just treat it as part of the encoded document.

Now, a set of tests, to which I'll refer below:
  http://people.mozilla.org/~dholbert/dataURIHashTests/tests_v1.xhtml

PROBLEM:
========
When an author writes a data URI for a document that contains a "#" character, she may unintentionally end up with broken results (or at least inconsistently-handled results), because the "#" may be treated as the end of the document & the beginning of the URI's fragment identifier.

(I believe this to be the _technically_ correct, albeit unintuitive, behavior per the URI RFC [1]. It's the behavior we've implemented in Firefox 6 [2], and it's what I've described as "Correct" in my testcase, with the quotes indicating unintuitiveness.)
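
To illustrate the strict parse, here is a minimal Python sketch that applies the regular expression from appendix B of RFC 2396 [1] to a data URI. The example URI and the helper name split_uri are made up for illustration; they are not taken from the testcase.

```python
import re

# Regular expression from RFC 2396, appendix B
# ("Parsing a URI Reference with a Regular Expression").
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

def split_uri(uri):
    """Return (scheme, path, fragment) as the appendix B regex sees them."""
    m = URI_REFERENCE.match(uri)
    return m.group(2), m.group(5), m.group(9)

# Hypothetical data URI whose document contains an unescaped "#".
uri = "data:text/html,<body background='#f00'>hello</body>"
scheme, path, fragment = split_uri(uri)
print(scheme)    # data
print(path)      # text/html,<body background='
print(fragment)  # f00'>hello</body>
```

Everything after the first "#" lands in the fragment, so the document proper is cut off in the middle of the <body> tag.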

Technically, the author *really* should encode the "#" character as "%23", if she doesn't want it to be a delimiter.
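
As a rough sketch of that escaping step (assuming the author builds the URI with a script; make_data_uri is a hypothetical helper, and the document string is made up), Python's urllib.parse.quote can do the encoding:

```python
from urllib.parse import quote, unquote, urlsplit

def make_data_uri(document, mime_type="text/html"):
    """Build a data URI with the document percent-encoded."""
    # With safe="", quote() escapes "#" as "%23", along with "<", ">",
    # quotes, spaces and other reserved characters.
    return "data:" + mime_type + "," + quote(document, safe="")

doc = "<body background='#f00'>hello</body>"
uri = make_data_uri(doc)

# No raw "#" is left, so a strict parser finds no fragment.
print("#" in uri)                # False
print(urlsplit(uri).fragment)    # (empty string)

# Decoding the part after the comma recovers the original document.
print(unquote(uri.split(",", 1)[1]) == doc)   # True
```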

However, this gotcha is easy to overlook -- especially because Opera & WebKit are less strict than Firefox in this respect and will gladly accept "#" inside data URIs under some circumstances.

THE PROPOSAL & HOW IT HELPS:
============================
We can help out the author by relaxing our fragment-ID-parsing rules a bit here.

Note that in cases where an author *accidentally* includes "#" inside their data URI (e.g. <body background="#f00">), there almost certainly will be more content following it -- in particular, there will be an </html>, or an </svg>, or at least a ">" (if it's inside the final tag) still to come.

So we can proactively check for >/< characters anywhere after the "#", and if we find them, then we can pretty safely assume that the author intended for the "#" to be part of the document, rather than a fragment-ID delimiter.
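
The check described above could be sketched roughly as follows. This is only one possible reading of the proposal, not an actual browser implementation: the function name split_fragment is hypothetical, and the sketch assumes that only literal "<" and ">" characters count (percent-encoded %3C / %3E are ignored).

```python
def split_fragment(data_uri):
    """Split a data URI into (body, fragment) using the proposed heuristic.

    A "#" is treated as the fragment delimiter only if no literal "<" or ">"
    appears anywhere after it. The fragment is None when no "#" qualifies.
    """
    # Position of the last "<" or ">" in the URI (-1 if there is none).
    last_markup = max(data_uri.rfind("<"), data_uri.rfind(">"))
    # The first "#" after that position is the delimiter, if there is one.
    hash_pos = data_uri.find("#", last_markup + 1)
    if hash_pos == -1:
        return data_uri, None
    return data_uri[:hash_pos], data_uri[hash_pos + 1:]

# "#" inside the markup: kept as part of the document.
print(split_fragment("data:text/html,<body background='#f00'>hi</body>"))

# "#" after the final ">": still a fragment delimiter.
print(split_fragment("data:image/svg+xml,<svg><rect id='target'/></svg>#target"))
```

With no "<" or ">" in the URI at all, the first "#" is used as the delimiter, which matches the existing strict behavior.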

OVERVIEW OF BROWSERS' CURRENT HANDLING OF "#" IN DATA URIs:
===========================================================
url: http://people.mozilla.org/~dholbert/dataURIHashTests/tests_v1.xhtml

* Firefox 6+ breaks the author's expectations in my tests A & B due to URI parsing strictness. (But if we were to implement the above proposal, we'd match the author's expectations.) We pass test C due to correctly trimming "#target" off of the end and scrolling to the referenced element. And we fail test D only due to a bug with over-enforcing same-origin checks.[3]

* WebKit matches the author's expectations on A & B -- however, that's only because they don't seem to support "#ref" suffixes on the ends of data URIs at all, so they _always_ include "#" in the document. (They *do* apparently support _relative_ references within data URI documents, e.g. xlink:href='#greenRect' as used in test B.) So, WebKit ends up failing test C because they don't strip off the "#target" suffix (resulting in broken XML). They fail test D presumably for the same reason. (They also have some zooming issues on the <img> examples, but I'm ignoring those for the purposes of this post.)

* Opera is interesting -- it can exhibit either the Firefox or WebKit behaviors in tests A/B/C, depending on whether you view the data URI as an embedded element (via iframe/img) or view it directly. When you view it as an embedded element (in my testcase), Opera matches WebKit on A/B/C (including the XML parse error on C). However, if you *directly view* the data URIs (right-click on iframe, Frame|Open, focus URLbar & hit enter), then Opera matches Firefox. Also, Opera passes test D.

(I don't have results for IE -- I briefly tried to support it in the test, but I had issues getting data URIs to work there at all.)

CONCLUSION:
===========
So -- to sum up the test results above: WebKit doesn't give "#" any special delimiter status in data URIs, which is a bug, but probably matches what authors intend a lot of the time; Opera sometimes behaves like WebKit and sometimes not; and Firefox parses fragment identifiers strictly, potentially giving authors headaches and truncating content that renders fine in Opera/WebKit.

With my proposal here -- relaxing the situations under which "#" should be treated as a delimiter in a data URI -- I think we'd better match author expectations and improve the browser-compatibility picture.

Thoughts?

Thanks,
Daniel Holbert
Mozilla Corporation

P.S. Thanks to Robert O'Callahan for coming up with this proposal a week or so back.

P.P.S. Browser versions that I tested (on Ubuntu 11.04 x86):
 Firefox 6.0.2
 Opera 11.51
 Chromium 14.0.835.126 (Developer Build 99097 Linux)

[1] https://www.ietf.org/rfc/rfc2396.txt (See section 4.1 & appendix "B" ("Parsing a URI Reference with a Regular Expression"), which shows that "#" is technically disallowed up until the #reference at the end.)

[2] https://bugzilla.mozilla.org/show_bug.cgi?id=308590

[3] https://bugzilla.mozilla.org/show_bug.cgi?id=686013
