Re: Scope of XML parser rewrite?
On Fri, May 26, 2017 at 1:18 AM, Eric Rahm wrote:
> Limiting to modifying nsScanner might be an option, but probably not
> changing all callers that use the nsAString interface. I guess we can just
> UTF-16 => UTF-8 those and file a bunch of follow ups?

Yeah. The main follow-up would be
https://bugzilla.mozilla.org/show_bug.cgi?id=1355106 , which would
allow the avoidance of UTF-16 expansion in the
innerHTML/createContextualFragment/DOMParser cases for strings that
the JS engine doesn't store as 16-bit units in the first place.
(Performance-wise, I see the network entry point as the main thing for
the XML parser and innerHTML/createContextualFragment/DOMParser as
secondary.)

> One thing we've ignored is that all the consumers expect output to be
> UTF-16, so there's a fair amount of work on that side as well.

I guess we have a viewpoint difference in terms of what the
"consumers" are. I think of the DOM as a consumer, and the DOM takes
Atoms (which can be looked up from UTF-8). While the callbacks in
nsExpatDriver aren't bad code like nsScanner is, I don't think of the
exact callback code as worth preserving in its precise current state.

> Maybe a reasonable approach is to use a UTF-8 interface for the replacement
> Rust library and work on a staged rollout:
>
> 1. Start just converting UTF-16 => UTF-8 for input at the nsExpatDriver
>    level, UTF-8 => UTF-16 for output
> 2. Modify/replace nsScanner with something that works with UTF-8 (and
>    encoding_rs?), convert UTF-16 => UTF-8 for the nsAString methods
> 3. Follow up replacing nsAString methods with UTF-8 versions
> 4. Look into whether modifying the consumers of the tokenized data to
>    handle UTF-8 is reasonable, follow up as necessary
>
> WDYT?

Seems good to me with the note that doing the direct UTF-8 to nsIAtom
lookup would probably be a pretty immediate thing rather than a true
follow-up.
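A minimal sketch of step 1 of the staged rollout quoted above, using only the Rust standard library. The function names are hypothetical and are not part of nsExpatDriver, and replacing unpaired surrogates with U+FFFD is an assumption, since the thread does not say how malformed UTF-16 should be handled.

```rust
/// Hypothetical boundary helper: UTF-16 input from an nsAString-style
/// caller is converted to UTF-8 before it reaches the parser.
/// Assumption: unpaired surrogates become U+FFFD.
pub fn input_to_utf8(utf16: &[u16]) -> String {
    String::from_utf16_lossy(utf16)
}

/// Hypothetical boundary helper: UTF-8 parser output is expanded back
/// to UTF-16 for consumers that still expect 16-bit code units.
pub fn output_to_utf16(utf8: &str) -> Vec<u16> {
    utf8.encode_utf16().collect()
}
```

With both conversions confined to the boundary, later steps could remove them one caller at a time without touching the parser core.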
--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
Re: Scope of XML parser rewrite?
Thanks Henri, I think we can find a middle ground so as to avoid a ton
of scope creep but leave the door open to a better iterative solution.
See notes inline.

-e

On Wed, May 24, 2017 at 11:18 PM, Henri Sivonen wrote:

> On Wed, May 24, 2017 at 10:11 AM, Henri Sivonen wrote:
> >> Our current interface is UTF-16, so that's my target for now. I think
> >> whatever cache-friendliness we gain would be lost converting from
> >> UTF-16 -> UTF-8 -> UTF-16.
> >
> > I hope this can be reconsidered, because the assumption that it would
> > have to be UTF-16 -> UTF-8 -> UTF-16 isn't accurate.
>
> I see that this part didn't get an on-list reply but got a blog reply:
> http://www.erahm.org/2017/05/24/a-rust-based-xml-parser-for-firefox/

Yes sorry, I should have followed up here as well!

> I continue to think it's a bad idea to write another parser that uses
> UTF-16 internally. Even though I can see your desire to keep the
> project tightly scoped, I think it's fair to ask you to expand the
> scope a bit by 1) adding a way to pass Latin-1 data to text nodes
> directly (and use this when the parser sees a text node is all
> ASCII) and 2) replacing nsScanner with a bit of new buffering code
> that takes bytes from the network and converts them to UTF-8 using
> encoding_rs.
>
> We've both had the displeasure of modifying nsScanner as part of a
> security fix. nsScanner isn't valuable code that we should try to
> keep. It's no longer scanning for anything. It's just an
> over-complicated way of maintaining a buffer of UTF-16 data. While
> nsScanner and the associated classes are a lot of code, they do
> something simple that should be done in quite a bit less code, so as
> scope creep, replacing nsScanner should be a drop in the bucket
> effort-wise compared to replacing expat.
>
> I think it's super-sad if we get another UTF-16-using parser because
> replacing nsScanner was scoped out of the project.

Limiting to modifying nsScanner might be an option, but probably not
changing all callers that use the nsAString interface. I guess we can
just UTF-16 => UTF-8 those and file a bunch of follow ups?

One thing we've ignored is that all the consumers expect output to be
UTF-16, so there's a fair amount of work on that side as well.

Maybe a reasonable approach is to use a UTF-8 interface for the
replacement Rust library and work on a staged rollout:

1. Start just converting UTF-16 => UTF-8 for input at the nsExpatDriver
   level, UTF-8 => UTF-16 for output
2. Modify/replace nsScanner with something that works with UTF-8 (and
   encoding_rs?), convert UTF-16 => UTF-8 for the nsAString methods
3. Follow up replacing nsAString methods with UTF-8 versions
4. Look into whether modifying the consumers of the tokenized data to
   handle UTF-8 is reasonable, follow up as necessary

WDYT?
Re: Scope of XML parser rewrite?
On Wed, May 24, 2017 at 10:11 AM, Henri Sivonen wrote:
>> Our current interface is UTF-16, so that's my target for now. I think
>> whatever cache-friendliness we gain would be lost converting from
>> UTF-16 -> UTF-8 -> UTF-16.
>
> I hope this can be reconsidered, because the assumption that it would
> have to be UTF-16 -> UTF-8 -> UTF-16 isn't accurate.

I see that this part didn't get an on-list reply but got a blog reply:
http://www.erahm.org/2017/05/24/a-rust-based-xml-parser-for-firefox/

I continue to think it's a bad idea to write another parser that uses
UTF-16 internally. Even though I can see your desire to keep the
project tightly scoped, I think it's fair to ask you to expand the
scope a bit by 1) adding a way to pass Latin-1 data to text nodes
directly (and use this when the parser sees a text node is all
ASCII) and 2) replacing nsScanner with a bit of new buffering code
that takes bytes from the network and converts them to UTF-8 using
encoding_rs.

We've both had the displeasure of modifying nsScanner as part of a
security fix. nsScanner isn't valuable code that we should try to
keep. It's no longer scanning for anything. It's just an
over-complicated way of maintaining a buffer of UTF-16 data. While
nsScanner and the associated classes are a lot of code, they do
something simple that should be done in quite a bit less code, so as
scope creep, replacing nsScanner should be a drop in the bucket
effort-wise compared to replacing expat.

I think it's super-sad if we get another UTF-16-using parser because
replacing nsScanner was scoped out of the project.

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
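Point 1) above could look roughly like the following sketch. The `TextPayload` type and `text_payload` function are hypothetical illustrations, not Gecko's actual text-node storage API; the sketch only shows the decision the parser would make when it knows a text node is all ASCII.

```rust
/// Hypothetical text-node payload: either one byte per character
/// (enough for ASCII and Latin-1) or full UTF-16 code units.
#[derive(Debug, PartialEq)]
pub enum TextPayload {
    OneByte(Vec<u8>),
    Utf16(Vec<u16>),
}

/// Choose the storage form for a text node the parser holds as UTF-8.
pub fn text_payload(text: &str) -> TextPayload {
    if text.is_ascii() {
        // ASCII bytes are identical in UTF-8 and Latin-1, so they can
        // be copied as-is without expanding to UTF-16 first.
        TextPayload::OneByte(text.as_bytes().to_vec())
    } else {
        TextPayload::Utf16(text.encode_utf16().collect())
    }
}
```

Non-ASCII text that still fits in Latin-1 takes the UTF-16 path in this sketch, which matches the "all ASCII" condition named in the message.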
Re: Scope of XML parser rewrite?
On 24.05.17 at 09:34, Anne van Kesteren wrote:
> On Tue, May 23, 2017 at 8:23 PM, Eric Rahm wrote:
>> I was hoping to write a more thorough blog post about this proposal (I
>> have some notes in a gist), but for now I've added comments inline. The
>> main takeaway here is that I want to do a bare-bones replacement of just
>> the parts of expat we currently use. It needs to support DTD entities,
>> have a streaming interface, and support XML 1 v4. That's it, no new
>> features, no rewrite of our entire XML stack.
>
> "XML5" supports entities (at least my original version did), I think
> the main problem is that there's no support for external DTDs. Not
> sure how much that differs from parsing the internal subset. Either
> way, that's always been a feature that as far as the web is concerned
> is not supported so could conceivably be a Firefox-only thing. Only
> XUL needs it.

Technical correction: our use of DTDs is independent of XUL, we use the
same thing for XHTML UI parts. Which is the reason why we're not using
HTML there.

We do intend to get rid of it, that's what L20n and Fluent are for, and
we're more than happy to see more people fight for that :-)

Truth be told, though, we can only drop support when the last bit of UI
is converted to L20n, and not just in Firefox, but also the other
stuff. Y'know, Thunderbird, too, I guess.

Axel

>> My current goal is a drop-in replacement for expat with just the
>> features gecko cares about, so just 1.0 version 4 I guess. It's
>> possible whatever we end up with could be merged with another library
>> when XML5 is settled, but I don't want to wait for that.
>
> Contrary to Henri, I think XML 1.0 edition 5 (which isn't "XML5") is
> worth considering given
> https://bugzilla.mozilla.org/show_bug.cgi?id=501837. It's what Chrome
> ships and our current implementation doesn't seem to align with either
> the 4th or 5th edition of XML 1.0.
Re: Scope of XML parser rewrite?
On Wed, May 24, 2017 at 10:34 AM, Anne van Kesteren wrote:
> Contrary to Henri, I think XML 1.0 edition 5 (which isn't "XML5") is
> worth considering given
> https://bugzilla.mozilla.org/show_bug.cgi?id=501837. It's what Chrome
> ships and our current implementation doesn't seem to align with either
> the 4th or 5th edition of XML 1.0.

OK, if Chrome has shipped 1.0 5th ed., we should, too. :-(

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
Re: Scope of XML parser rewrite?
On Tue, May 23, 2017 at 8:23 PM, Eric Rahm wrote:
> I was hoping to write a more thorough blog post about this proposal (I have
> some notes in a gist), but for now I've added comments inline. The main
> takeaway here is that I want to do a bare-bones replacement of just the
> parts of expat we currently use. It needs to support DTD entities, have a
> streaming interface, and support XML 1 v4. That's it, no new features, no
> rewrite of our entire XML stack.

"XML5" supports entities (at least my original version did), I think
the main problem is that there's no support for external DTDs. Not
sure how much that differs from parsing the internal subset. Either
way, that's always been a feature that as far as the web is concerned
is not supported so could conceivably be a Firefox-only thing. Only
XUL needs it.

> My current goal is a drop-in replacement for expat with just the features
> gecko cares about, so just 1.0 version 4 I guess. It's possible whatever we
> end up with could be merged with another library when XML5 is settled, but I
> don't want to wait for that.

Contrary to Henri, I think XML 1.0 edition 5 (which isn't "XML5") is
worth considering given
https://bugzilla.mozilla.org/show_bug.cgi?id=501837. It's what Chrome
ships and our current implementation doesn't seem to align with either
the 4th or 5th edition of XML 1.0.

--
https://annevankesteren.nl/
Re: Scope of XML parser rewrite?
On Wed, May 24, 2017 at 10:11 AM, Henri Sivonen wrote:
> It seems to me that attribute values would be the only case where a
> conversion from UTF-8 to UTF-16 would be needed all the time, and that
> conversion can be fast for ASCII, which is what attribute values
> mostly are.

Moreover, this conversion doesn't need to have the cost of converting
potentially-bogus UTF-8 to UTF-16 but only the cost of converting
guaranteed-valid UTF-8 to UTF-16, because UTF-8 validity was already
guaranteed earlier.

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
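The distinction between validating once and converting later can be sketched in Rust's standard library, where `&str` already carries the validity guarantee. The function names here are hypothetical and this is not encoding_rs code. It only illustrates why the later conversion has no error path.

```rust
use std::str::Utf8Error;

/// Validate bytes once, at the parser's entry point.
pub fn validate(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Convert an attribute value to UTF-16. The `&str` type guarantees
/// valid UTF-8, so this step cannot fail and does no re-validation.
pub fn attr_value_to_utf16(value: &str) -> Vec<u16> {
    if value.is_ascii() {
        // ASCII fast path: each byte widens to one 16-bit unit.
        value.bytes().map(u16::from).collect()
    } else {
        value.encode_utf16().collect()
    }
}
```

A production implementation would presumably use SIMD for the ASCII path, as the thread mentions for encoding_rs; the plain iterator here is only for illustration.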
Re: Scope of XML parser rewrite?
On Tue, May 23, 2017 at 5:01 PM, Daniel Fath wrote:
> So, if I understand this correctly - We'll first need to land this component
> in Firefox, right? And if it proves itself fine, then formalize it.

No, both the implementation and the spec would have to be pretty solid
before stuff can go into Firefox. But, as noted, DTDs are a blocker
(if Firefox is to use the same XML parser for both XUL and for the
Web, which makes sense in terms of binary size even if it's rather sad
for XUL to constrain the Web side).

>> I was thinking of having resolutions for the issues that are currently
>> warnings in red and multi-vendor buy-in. (Previously, Tab from Google
>> was interested in making SVG parsing non-Draconian, but I have no idea
>> how reflective of wider buy-in that remark was.)
>
> You also mentioned warnings in red and multi-vendor buy-in. What does that
> entail?

Looks like at this time, even Mozilla-internal buy-in is lacking. :-/

On Tue, May 23, 2017 at 9:23 PM, Eric Rahm wrote:
> I was hoping to write a more thorough blog post about this proposal (I have
> some notes in a gist [1]), but for now I've added comments inline. The main
> takeaway here is that I want to do a bare-bones replacement of just the
> parts of expat we currently use. It needs to support DTD entities, have a
> streaming interface, and support XML 1 v4. That's it, no new features, no
> rewrite of our entire XML stack.

OK.

> Our current interface is UTF-16, so that's my target for now. I think
> whatever cache-friendliness we gain would be lost converting from
> UTF-16 -> UTF-8 -> UTF-16.

I hope this can be reconsidered, because the assumption that it would
have to be UTF-16 -> UTF-8 -> UTF-16 isn't accurate.

encoding_rs (https://bugzilla.mozilla.org/show_bug.cgi?id=encoding_rs)
adds the capability to decode directly to UTF-8. This is a true
direct-to-UTF-8 capability without pivoting through UTF-16.

When the input is UTF-8 (as is the case with our chrome XML and with
most on-the-Web XML), in the streaming mode, except for the few bytes
representing code points split across buffer boundaries, this is fast
UTF-8 validation (without doing math to compute scalar values and with
SIMD acceleration for ASCII runs) and memcpy. (In the non-streaming
case, it's validation and borrow when called from Rust and validation
and nsStringBuffer refcount increment when called from C++.)

On the other side of the parser, it's true that our DOM API takes
UTF-16, but if all the code points in a text node are U+00FF or under,
the text gets stored with leading zeros omitted. It would be fairly
easy to add a hole in the abstraction to allow a UTF-8-oriented parser
to set the compressed form directly, without expansion to UTF-16 and
then compression immediately back to ASCII, when the parser knows that
a text node is all ASCII.

For element and attribute names, we already support finding atoms by
UTF-8 representation and in most cases element and attribute names are
ASCII with static atoms already existing for them.

It seems to me that attribute values would be the only case where a
conversion from UTF-8 to UTF-16 would be needed all the time, and that
conversion can be fast for ASCII, which is what attribute values
mostly are.

Furthermore, the main Web XML case is SVG, which has relatively little
natural-language text, so it's almost entirely ASCII. Looking at the
ratio of markup to natural-language text in XUL, it seems fair to guess
that parsing XUL as UTF-8 would be a cache-friendliness win, too.

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
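The streaming case described above, where a code point may be split across buffer boundaries, can be illustrated with the standard library alone. This is a simplified sketch with hypothetical names (`Utf8Stream`, `MalformedUtf8`), not how encoding_rs is implemented: it copies each chunk into a scratch buffer for brevity, whereas the message describes special-casing only the few split bytes.

```rust
/// Error returned when the input contains a malformed byte sequence.
#[derive(Debug, PartialEq)]
pub struct MalformedUtf8;

/// Hypothetical streaming validator that carries an incomplete
/// trailing sequence over to the next chunk.
pub struct Utf8Stream {
    pending: Vec<u8>,
}

impl Utf8Stream {
    pub fn new() -> Self {
        Utf8Stream { pending: Vec::new() }
    }

    /// Append the valid UTF-8 found in `chunk` to `out`.
    pub fn push(&mut self, chunk: &[u8], out: &mut String) -> Result<(), MalformedUtf8> {
        // Prepend bytes left over from the previous chunk.
        let mut buf = std::mem::take(&mut self.pending);
        buf.extend_from_slice(chunk);
        match std::str::from_utf8(&buf) {
            Ok(s) => {
                out.push_str(s);
                Ok(())
            }
            Err(e) => {
                // `error_len()` is `Some` for a genuinely invalid
                // sequence and `None` when the input merely ended early.
                if e.error_len().is_some() {
                    return Err(MalformedUtf8);
                }
                let (valid, rest) = buf.split_at(e.valid_up_to());
                // Re-checking the valid prefix is redundant but keeps
                // the sketch free of unsafe code.
                out.push_str(std::str::from_utf8(valid).map_err(|_| MalformedUtf8)?);
                self.pending = rest.to_vec();
                Ok(())
            }
        }
    }

    /// True if no partial code point is waiting for more bytes.
    pub fn is_complete(&self) -> bool {
        self.pending.is_empty()
    }
}
```

A caller would check `is_complete()` at end of stream, since a dangling partial sequence there is also an error.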
Re: Scope of XML parser rewrite?
I was hoping to write a more thorough blog post about this proposal (I
have some notes in a gist [1]), but for now I've added comments inline.
The main takeaway here is that I want to do a bare-bones replacement of
just the parts of expat we currently use. It needs to support DTD
entities, have a streaming interface, and support XML 1 v4. That's it,
no new features, no rewrite of our entire XML stack.

-e

[1] https://gist.github.com/EricRahm/f718c4d8a862cc08b69d7d4290c02927

On Mon, May 22, 2017 at 11:43 PM, Henri Sivonen wrote:

> In reference to: https://twitter.com/nnethercote/status/866792097101238272
>
> Is the rewrite meant to replace expat only or also some of our old
> code both above and below expat?

Just expat.

> Back in 2011, I wrote a plan for rewriting the code around expat
> without rewriting expat itself:
> https://wiki.mozilla.org/Platform/XML_Rewrite
> I've had higher-priority stuff to do ever since...

Yes, I've seen this. It explicitly calls out not replacing expat, so
the plans are mostly orthogonal.

> (The above plan talks about pushing UTF-16 to the XML parser and
> having deep C++ namespaces. Any project starting this year should make
> the new parser use UTF-8 internally for cache-friendliness and use
> less deep C++ namespaces.)

Our current interface is UTF-16, so that's my target for now. I think
whatever cache-friendliness we gain would be lost converting from
UTF-16 -> UTF-8 -> UTF-16.

> Also, I think the decision of which XML version to support should be a
> deliberate decision and not an accident. I think the reasonable
> choices are XML 1.0 4th edition (not rocking the boat) and reviving
> XML5 (original discussion: https://annevankesteren.nl/2007/10/xml5 ,
> latest draft: https://ygg01.github.io/xml5_draft/). XML 1.1 is dead.
> XML 1.0 5th edition tried to have the XML 1.0 cake and eat the XML 1.1
> cake too and expanded the set of documents that a parser doesn't reject.
> Any of the newly well-forming documents would be incompatible with 4th
> ed. and earlier parsers, which would be a break from universal XML
> interop. I think it doesn't make sense to relax XML only a bit. If XML
> is to be relaxed (breaking interop in the sense of starting to accept
> docs that old browsers would show the Yellow Screen of Death on), we
> should go all the way (i.e. XML5).

My current goal is a drop-in replacement for expat with just the
features gecko cares about, so just 1.0 version 4 I guess. It's
possible whatever we end up with could be merged with another library
when XML5 is settled, but I don't want to wait for that.

> Notably, it looks like Servo already has an XML5 parser written in Rust:
> https://github.com/servo/html5ever/tree/master/xml5ever

Yes, this lacks DTD support (and 1.0 support).

> The tweets weren't clear about whether xml5ever had been considered,
> but https://twitter.com/eroc/status/866808814959378434 looks like it's
> talking about writing a new one.

Correct, I looked at xml5ever and spoke with some folks on #servo about
it. It doesn't meet Firefox's requirements.

> It seems like integrating xml5ever (as opposed to another XML parser
> written in Rust) into Gecko would give some insight into how big a
> deal it would be to replace Gecko's HTML parser with html5ever
> (although due to document.write(), HTML is always a bigger deal
> integration-wise than XML).

That's a non-goal for me, but I can see how it would be useful.

> (If the outcome here is to do XML5, we should make sure the spec is
> polished enough at the WHATWG in order not to be a unilateral thing in
> relative secret.)

That is not my current goal, but that seems reasonable regardless of
this project.
Re: Scope of XML parser rewrite?
On 23.05.17 at 16:01, Daniel Fath wrote:
> So, if I understand this correctly - We'll first need to land this
> component in Firefox, right? And if it proves itself fine, then
> formalize it.
>
>> I was thinking of having resolutions for the issues that are currently
>> warnings in red and multi-vendor buy-in. (Previously, Tab from Google
>> was interested in making SVG parsing non-Draconian, but I have no idea
>> how reflective of wider buy-in that remark was.)
>
> You also mentioned warnings in red and multi-vendor buy-in. What does
> that entail?
>
> Will lack of support for DTD be a problem? In XML5 it was decided that,
> instead of parsing DTDs, we just store a list of named character
> references from
> https://html.spec.whatwg.org/multipage/syntax.html#named-character-references,
> though we could add another list and expand entities.json. It's
> possible I need to update the spec to reflect that.

Yes, not parsing DTDs would be a deal-breaker for the foreseeable
future, as we're abusing DTDs to localize X(H)TML documents.

Axel

> PS. I hope I'm not spamming you guys too hard, I'm kind of new to the
> mailing list thing.
>
> Daniel Fath,
> daniel.fa...@gmail.com
Re: Scope of XML parser rewrite?
So, if I understand this correctly - We'll first need to land this
component in Firefox, right? And if it proves itself fine, then
formalize it.

> I was thinking of having resolutions for the issues that are currently
> warnings in red and multi-vendor buy-in. (Previously, Tab from Google
> was interested in making SVG parsing non-Draconian, but I have no idea
> how reflective of wider buy-in that remark was.)

You also mentioned warnings in red and multi-vendor buy-in. What does
that entail?

Will lack of support for DTD be a problem? In XML5 it was decided that,
instead of parsing DTDs, we just store a list of named character
references from
https://html.spec.whatwg.org/multipage/syntax.html#named-character-references,
though we could add another list and expand entities.json. It's
possible I need to update the spec to reflect that.

PS. I hope I'm not spamming you guys too hard, I'm kind of new to the
mailing list thing.

Daniel Fath,
daniel.fa...@gmail.com
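The fixed-list approach mentioned above can be sketched as a plain lookup table. This is a hypothetical illustration, not xml5ever's actual code, and the table is a tiny excerpt of the HTML named character references (the full list is much longer).

```rust
/// Hypothetical lookup: resolve an entity name (without `&` and `;`)
/// against a fixed table instead of parsing DTD declarations.
pub fn named_reference(name: &str) -> Option<&'static str> {
    // Small excerpt for illustration only.
    const TABLE: &[(&str, &str)] = &[
        ("amp", "&"),
        ("apos", "'"),
        ("gt", ">"),
        ("lt", "<"),
        ("nbsp", "\u{00A0}"),
        ("quot", "\""),
    ];
    TABLE.iter().find(|(n, _)| *n == name).map(|(_, v)| *v)
}
```

A fixed table like this cannot resolve entities that a document declares in its own DTD, so any name outside the list comes back as `None`.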
Re: Scope of XML parser rewrite?
On Tue, May 23, 2017 at 12:44 PM, Daniel Fath wrote:
>> (If the outcome here is to do XML5, we should make sure the spec is
>> polished enough at the WHATWG in order not to be a unilateral thing in
>> relative secret.)
>
> What does it mean to be polished enough at the WHATWG?

I was thinking of having resolutions for the issues that are currently
warnings in red and multi-vendor buy-in. (Previously, Tab from Google
was interested in making SVG parsing non-Draconian, but I have no idea
how reflective of wider buy-in that remark was.)

> Also, how far-reaching should the spec be? Should it include Namespaces?

I would expect the spec to take a byte stream as input, specify how
the encoding is determined, delegate the decoding from bytes to
Unicode code points to the Encoding Standard and then define how the
code point stream is processed into a DOM tree. (Bonus points for
defining a coercion to an XML 1.0 4th ed. Infoset, too, for
non-browser use cases.) That would include the processing of
Namespaces.

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
Re: Scope of XML parser rewrite?
> (If the outcome here is to do XML5, we should make sure the spec is
> polished enough at the WHATWG in order not to be a unilateral thing in
> relative secret.)

What does it mean to be polished enough at the WHATWG? Also, how
far-reaching should the spec be? Should it include Namespaces?

On Tue, May 23, 2017 at 9:01 AM, Henri Sivonen wrote:

> Figured out the email address of the XML5 editor / xml5ever developer,
> so adding to CC.
>
> On Tue, May 23, 2017 at 9:43 AM, Henri Sivonen wrote:
> > In reference to:
> > https://twitter.com/nnethercote/status/866792097101238272
> >
> > Is the rewrite meant to replace expat only or also some of our old
> > code both above and below expat?
> >
> > Back in 2011, I wrote a plan for rewriting the code around expat
> > without rewriting expat itself:
> > https://wiki.mozilla.org/Platform/XML_Rewrite
> > I've had higher-priority stuff to do ever since...
> >
> > (The above plan talks about pushing UTF-16 to the XML parser and
> > having deep C++ namespaces. Any project starting this year should make
> > the new parser use UTF-8 internally for cache-friendliness and use
> > less deep C++ namespaces.)
> >
> > Also, I think the decision of which XML version to support should be a
> > deliberate decision and not an accident. I think the reasonable
> > choices are XML 1.0 4th edition (not rocking the boat) and reviving
> > XML5 (original discussion: https://annevankesteren.nl/2007/10/xml5 ,
> > latest draft: https://ygg01.github.io/xml5_draft/). XML 1.1 is dead.
> > XML 1.0 5th edition tried to have the XML 1.0 cake and eat the XML 1.1
> > cake too and expanded the set of documents that a parser doesn't reject.
> > Any of the newly well-forming documents would be incompatible with 4th
> > ed. and earlier parsers, which would be a break from universal XML
> > interop. I think it doesn't make sense to relax XML only a bit. If XML
> > is to be relaxed (breaking interop in the sense of starting to accept
> > docs that old browsers would show the Yellow Screen of Death on), we
> > should go all the way (i.e. XML5).
> >
> > Notably, it looks like Servo already has an XML5 parser written in Rust:
> > https://github.com/servo/html5ever/tree/master/xml5ever
> >
> > The tweets weren't clear about whether xml5ever had been considered,
> > but https://twitter.com/eroc/status/866808814959378434 looks like it's
> > talking about writing a new one.
> >
> > It seems like integrating xml5ever (as opposed to another XML parser
> > written in Rust) into Gecko would give some insight into how big a
> > deal it would be to replace Gecko's HTML parser with html5ever
> > (although due to document.write(), HTML is always a bigger deal
> > integration-wise than XML).
> >
> > (If the outcome here is to do XML5, we should make sure the spec is
> > polished enough at the WHATWG in order not to be a unilateral thing in
> > relative secret.)
Re: Scope of XML parser rewrite?
Figured out the email address of the XML5 editor / xml5ever developer,
so adding to CC.

On Tue, May 23, 2017 at 9:43 AM, Henri Sivonen wrote:
> In reference to: https://twitter.com/nnethercote/status/866792097101238272
>
> Is the rewrite meant to replace expat only or also some of our old
> code both above and below expat?
>
> Back in 2011, I wrote a plan for rewriting the code around expat
> without rewriting expat itself:
> https://wiki.mozilla.org/Platform/XML_Rewrite
> I've had higher-priority stuff to do ever since...
>
> (The above plan talks about pushing UTF-16 to the XML parser and
> having deep C++ namespaces. Any project starting this year should make
> the new parser use UTF-8 internally for cache-friendliness and use
> less deep C++ namespaces.)
>
> Also, I think the decision of which XML version to support should be a
> deliberate decision and not an accident. I think the reasonable
> choices are XML 1.0 4th edition (not rocking the boat) and reviving
> XML5 (original discussion: https://annevankesteren.nl/2007/10/xml5 ,
> latest draft: https://ygg01.github.io/xml5_draft/). XML 1.1 is dead.
> XML 1.0 5th edition tried to have the XML 1.0 cake and eat the XML 1.1
> cake too and expanded the set of documents that a parser doesn't reject.
> Any of the newly well-forming documents would be incompatible with 4th
> ed. and earlier parsers, which would be a break from universal XML
> interop. I think it doesn't make sense to relax XML only a bit. If XML
> is to be relaxed (breaking interop in the sense of starting to accept
> docs that old browsers would show the Yellow Screen of Death on), we
> should go all the way (i.e. XML5).
>
> Notably, it looks like Servo already has an XML5 parser written in Rust:
> https://github.com/servo/html5ever/tree/master/xml5ever
>
> The tweets weren't clear about whether xml5ever had been considered,
> but https://twitter.com/eroc/status/866808814959378434 looks like it's
> talking about writing a new one.
>
> It seems like integrating xml5ever (as opposed to another XML parser
> written in Rust) into Gecko would give some insight into how big a
> deal it would be to replace Gecko's HTML parser with html5ever
> (although due to document.write(), HTML is always a bigger deal
> integration-wise than XML).
>
> (If the outcome here is to do XML5, we should make sure the spec is
> polished enough at the WHATWG in order not to be a unilateral thing in
> relative secret.)

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/