On Tue, May 23, 2017 at 5:01 PM, Daniel Fath <daniel.fa...@gmail.com> wrote:
> So, if I understand this correctly - We'll first need to land this component
> in Firefox, right? And if it proves itself fine, then formalize it.
No, both the implementation and the spec would have to be pretty solid
before stuff can go into Firefox. But, as noted, DTDs are a blocker (if
Firefox is to use the same XML parser for both XUL and for the Web,
which makes sense in terms of binary size even if it's rather sad for
XUL to constrain the Web side).

>> I was thinking of having resolutions for the issues that are currently
>> warnings in red and multi-vendor buy-in. (Previously, Tab from Google
>> was interested in making SVG parsing non-Draconian, but I have no idea
>> how reflective of wider buy-in that remark was.)
>
> You also mentioned warnings in red and multi-vendor buy-in. What does that
> entail?

Looks like at this time, even Mozilla-internal buy-in is lacking. :-/

On Tue, May 23, 2017 at 9:23 PM, Eric Rahm <er...@mozilla.com> wrote:
> I was hoping to write a more thorough blog post about this proposal (I have
> some notes in a gist [1]), but for now I've added comments inline. The main
> takeaway here is that I want to do a bare-bones replacement of just the
> parts of expat we currently use. It needs to support DTD entities, have a
> streaming interface, and support XML 1 v4. That's it, no new features, no
> rewrite of our entire XML stack.

OK.

> Our current interface is UTF-16, so that's my target for now. I think
> whatever cache-friendliness would be lost converting from UTF-16 -> UTF-8 ->
> UTF-16.

I hope this can be reconsidered, because the assumption that it would
have to be UTF-16 -> UTF-8 -> UTF-16 isn't accurate. encoding_rs
(https://bugzilla.mozilla.org/show_bug.cgi?id=encoding_rs) adds the
capability to decode directly to UTF-8. This is a true direct-to-UTF-8
capability without pivoting through UTF-16.
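To make the "no pivot needed" point concrete, here is a minimal std-only
sketch. The function name is mine and this is not the encoding_rs API;
it only illustrates that when the wire bytes are already UTF-8,
"decoding to UTF-8" reduces to validation, with no intermediate UTF-16
buffer and no scalar-value math:

```rust
// Hypothetical helper, not the encoding_rs API: when the input encoding
// is UTF-8, conversion to UTF-8 is just validation, and the validated
// bytes can be borrowed (or memcpy'd) as-is.
fn utf8_without_pivot(bytes: &[u8]) -> Option<&str> {
    // Validation only: no UTF-16 pivot buffer is ever allocated.
    std::str::from_utf8(bytes).ok()
}
```

In the streaming case, encoding_rs additionally has to carry over the
few bytes of a code point split across buffer boundaries, but the bulk
of the work stays validation plus copy.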
When the input is UTF-8 (as is the case with our chrome XML and with
most on-the-Web XML), in the streaming mode, except for the few bytes
representing code points split across buffer boundaries, this is fast
UTF-8 validation (without doing math to compute scalar values and with
SIMD acceleration for ASCII runs) and memcpy. (In the non-streaming
case, it's validation and borrow when called from Rust and validation
and nsStringBuffer refcount increment when called from C++.)

On the other side of the parser, it's true that our DOM API takes
UTF-16, but if all the code points in a text node are U+00FF or under,
the text gets stored with leading zeros omitted. It would be fairly easy
to add a hole in the abstraction to allow a UTF-8-oriented parser to set
the compressed form directly, without expanding to UTF-16 and then
immediately compressing back, when the parser knows that a text node is
all ASCII.

For element and attribute names, we already support finding atoms by
UTF-8 representation, and in most cases element and attribute names are
ASCII with static atoms already existing for them. It seems to me that
attribute values would be the only case where a conversion from UTF-8 to
UTF-16 would be needed all the time, and that conversion can be fast for
ASCII, which is what attribute values mostly are.

Furthermore, the main Web XML case is SVG, which has relatively little
natural-language text, so it's almost entirely ASCII. Looking at the
ratio of markup to natural-language text in XUL, it seems fair to guess
that parsing XUL as UTF-8 would be a cache-friendliness win, too.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
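To illustrate the attribute-value case, here is a sketch of the ASCII
fast path I have in mind. The helper name is mine (this is not existing
Gecko code): when the value is all ASCII, widening UTF-8 to UTF-16 is a
per-byte zero-extension, with the full conversion only as a fallback:

```rust
// Hypothetical helper, not actual Gecko code: convert an attribute
// value from UTF-8 to UTF-16, taking a cheap zero-extension path when
// the value is pure ASCII (the common case for attribute values).
fn attr_value_to_utf16(utf8: &str) -> Vec<u16> {
    if utf8.bytes().all(|b| b < 0x80) {
        // Fast path: each ASCII byte becomes one UTF-16 code unit.
        utf8.bytes().map(u16::from).collect()
    } else {
        // General path: full UTF-8 -> UTF-16 conversion.
        utf8.encode_utf16().collect()
    }
}
```

The same all-ASCII check is what would let the parser hand a text node
to the DOM in its compressed (leading-zeros-omitted) form directly.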