Re: Scope of XML parser rewrite?

2017-05-25 Thread Henri Sivonen
On Fri, May 26, 2017 at 1:18 AM, Eric Rahm  wrote:
> Limiting to modifying nsScanner might be an option, but probably not
> changing all callers that use the nsAString interface. I guess we can just
> UTF-16 => UTF-8 those and file a bunch of follow ups?

Yeah. The main follow-up would be
https://bugzilla.mozilla.org/show_bug.cgi?id=1355106 , which would
allow the avoidance of UTF-16 expansion in the
innerHTML/createContextualFragment/DOMParser cases for strings that
the JS engine doesn't store as 16-bit units in the first place.

(Performance-wise, I see the network entry point as the main thing for
the XML parser and innerHTML/createContextualFragment/DOMParser as
secondary.)

> One thing we've ignored is that all the consumers expect output to be UTF-16, so
> there's a fair amount of work on that side as well.

I guess we have a viewpoint difference in terms of what the
"consumers" are. I think of the DOM as a consumer, and the DOM takes
Atoms (which can be looked up from UTF-8). While the callbacks in
nsExpatDriver aren't bad code like nsScanner is, I don't think of the
exact callback code as worth preserving in its precise
current state.

> Maybe a reasonable approach is to use a UTF-8 interface for the replacement
> Rust library and work on a staged rollout:
>
> 1. Start just converting UTF-16 => UTF-8 for input at the nsExpatDriver level,
>    UTF-8 => UTF-16 for output
> 2. Modify/replace nsScanner with something that works with UTF-8 (and
>    encoding_rs?), convert UTF-16 => UTF-8 for the nsAString methods
> 3. Follow up replacing nsAString methods with UTF-8 versions
> 4. Look into whether modifying the consumers of the tokenized data to handle
>    UTF-8 is reasonable, follow up as necessary
>
> WDYT?

Seems good to me with the note that doing the direct UTF-8 to nsIAtom
lookup would probably be a pretty immediate thing rather than a true
follow-up.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Scope of XML parser rewrite?

2017-05-25 Thread Eric Rahm
Thanks Henri, I think we can find a middle ground so as to avoid a ton of
scope creep but leave the door open to a better iterative solution. See
notes inline.

-e

On Wed, May 24, 2017 at 11:18 PM, Henri Sivonen 
wrote:

> On Wed, May 24, 2017 at 10:11 AM, Henri Sivonen 
> wrote:
> >> Our current interface is UTF-16, so that's my target for now. I think
> >> whatever cache-friendliness would be lost converting from UTF-16 -> UTF-8 ->
> >> UTF-16.
> >
> > I hope this can be reconsidered, because the assumption that it would
> > have to be UTF-16 -> UTF-8 -> UTF-16 isn't accurate.
>
> I see that this part didn't get an on-list reply but got a blog reply:
> http://www.erahm.org/2017/05/24/a-rust-based-xml-parser-for-firefox/
>

Yes sorry, I should have followed up here as well!


> I continue to think it's a bad idea to write another parser that uses
> UTF-16 internally. Even though I can see your desire to keep the
> project tightly scoped, I think it's fair to ask you to expand the
> scope a bit by 1) adding a way to pass Latin-1 data to text nodes
> directly (and use this when the parser sees a text node is all
> ASCII) and 2) replacing nsScanner with a bit of new buffering code
> that takes bytes from the network and converts them to UTF-8 using
> encoding_rs.
>
> We've both had the displeasure of modifying nsScanner as part of a
> security fix. nsScanner isn't valuable code that we should try to
> keep. It's no longer scanning for anything. It's just an
> over-complicated way of maintaining a buffer of UTF-16 data. While
> nsScanner and the associated classes are a lot of code, they do
> something simple that should be done in quite a bit less code, so as
> scope creep, replacing nsScanner should be a drop in a bucket
> effort-wise compared to replacing expat.
>
> I think it's super-sad if we get another UTF-16-using parser because
> replacing nsScanner was scoped out of the project.
>

Limiting to modifying nsScanner might be an option, but probably not
changing all callers that use the nsAString interface. I guess we can just
UTF-16 => UTF-8 those and file a bunch of follow ups?

One thing we've ignored is that all the consumers expect output to be
UTF-16, so there's a fair amount of work on that side as well.

Maybe a reasonable approach is to use a UTF-8 interface for the replacement
Rust library and work on a staged rollout:

   1. Start just converting UTF-16 => UTF-8 for input at the nsExpatDriver
   level, UTF-8 => UTF-16 for output
   2. Modify/replace nsScanner with something that works with UTF-8 (and
   encoding_rs?), convert UTF-16 => UTF-8 for the nsAString methods
   3. Follow up replacing nsAString methods with UTF-8 versions
   4.  Look into whether modifying the consumers of the tokenized data to
   handle UTF-8 is reasonable, follow up as necessary
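
As a rough illustration of step 1 only (this is not Gecko code; `parse_utf8` and `parse_utf16_via_utf8` are made-up names and the "parser" is a placeholder), the conversion shim around a UTF-8 core could look something like this in Rust:

```rust
use std::string::FromUtf16Error;

// Hypothetical stand-in for the UTF-8-based parser core. It only
// upper-cases ASCII so that the round trip is observable.
fn parse_utf8(input: &str) -> String {
    input.to_ascii_uppercase()
}

// Step 1 shim: accept UTF-16 from the existing callers, run the UTF-8
// core, and hand UTF-16 back to the existing consumers.
fn parse_utf16_via_utf8(input: &[u16]) -> Result<Vec<u16>, FromUtf16Error> {
    // UTF-16 => UTF-8 on the way in (fails on unpaired surrogates).
    let utf8 = String::from_utf16(input)?;
    let output = parse_utf8(&utf8);
    // UTF-8 => UTF-16 on the way out.
    Ok(output.encode_utf16().collect())
}
```

The point of the shape is that later steps can remove the conversions on either side without touching the core.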

WDYT?


> --
> Henri Sivonen
> hsivo...@hsivonen.fi
> https://hsivonen.fi/
>


Re: Scope of XML parser rewrite?

2017-05-25 Thread Henri Sivonen
On Wed, May 24, 2017 at 10:11 AM, Henri Sivonen  wrote:
>> Our current interface is UTF-16, so that's my target for now. I think
>> whatever cache-friendliness would be lost converting from UTF-16 -> UTF-8 ->
>> UTF-16.
>
> I hope this can be reconsidered, because the assumption that it would
> have to be UTF-16 -> UTF-8 -> UTF-16 isn't accurate.

I see that this part didn't get an on-list reply but got a blog reply:
http://www.erahm.org/2017/05/24/a-rust-based-xml-parser-for-firefox/

I continue to think it's a bad idea to write another parser that uses
UTF-16 internally. Even though I can see your desire to keep the
project tightly scoped, I think it's fair to ask you to expand the
scope a bit by 1) adding a way to pass Latin-1 data to text nodes
directly (and use this when the parser sees a text node is all
ASCII) and 2) replacing nsScanner with a bit of new buffering code
that takes bytes from the network and converts them to UTF-8 using
encoding_rs.

We've both had the displeasure of modifying nsScanner as part of a
security fix. nsScanner isn't valuable code that we should try to
keep. It's no longer scanning for anything. It's just an
over-complicated way of maintaining a buffer of UTF-16 data. While
nsScanner and the associated classes are a lot of code, they do
something simple that should be done in quite a bit less code, so as
scope creep, replacing nsScanner should be a drop in a bucket
effort-wise compared to replacing expat.

I think it's super-sad if we get another UTF-16-using parser because
replacing nsScanner was scoped out of the project.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/


Re: Scope of XML parser rewrite?

2017-05-24 Thread Axel Hecht

On 24.05.17 at 09:34, Anne van Kesteren wrote:
> On Tue, May 23, 2017 at 8:23 PM, Eric Rahm  wrote:
>> I was hoping to write a more thorough blog post about this proposal (I have
>> some notes in a gist), but for now I've added comments inline. The main
>> takeaway here is that I want to do a bare-bones replacement of just the
>> parts of expat we currently use. It needs to support DTD entities, have a
>> streaming interface, and support XML 1 v4. That's it, no new features, no
>> rewrite of our entire XML stack.
>
> "XML5" supports entities (at least my original version did), I think
> the main problem is that there's no support for external DTDs. Not
> sure how much that differs from parsing the internal subset. Either
> way, that's always been a feature that as far as the web is concerned
> is not supported so could conceivably be a Firefox-only thing. Only
> XUL needs it.

Technical correction, our use of DTDs is independent of XUL, we use the
same thing for XHTML UI parts. Which is the reason why we're not using
HTML there.

We do intend to get rid of it, that's what L20n and Fluent are for, and
we're more than happy to see more people fight for that :-)

Truth be told, though, we can only drop support when the last bit of UI
is converted to L20n, and not just in Firefox, but also the other stuff.
Y'know, Thunderbird, too, I guess.

Axel

>> My current goal is a drop-in replacement for expat with just the features
>> gecko cares about, so just 1.0 version 4 I guess. It's possible whatever we
>> end up with could be merged with another library when XML5 is settled, but I
>> don't want to wait for that.
>
> Contrary to Henri, I think XML 1.0 edition 5 (which isn't "XML5") is
> worth considering given
> https://bugzilla.mozilla.org/show_bug.cgi?id=501837. It's what Chrome
> ships and our current implementation doesn't seem to align with either
> the 4th or 5th edition of XML 1.0.



Re: Scope of XML parser rewrite?

2017-05-24 Thread Henri Sivonen
On Wed, May 24, 2017 at 10:34 AM, Anne van Kesteren  wrote:
> Contrary to Henri, I think XML 1.0 edition 5 (which isn't "XML5") is
> worth considering given
> https://bugzilla.mozilla.org/show_bug.cgi?id=501837. It's what Chrome
> ships and our current implementation doesn't seem to align with either
> the 4th or 5th edition of XML 1.0.

OK, if Chrome has shipped 1.0 5th ed., we should, too. :-(

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/


Re: Scope of XML parser rewrite?

2017-05-24 Thread Anne van Kesteren
On Tue, May 23, 2017 at 8:23 PM, Eric Rahm  wrote:
> I was hoping to write a more thorough blog post about this proposal (I have
> some notes in a gist), but for now I've added comments inline. The main
> takeaway here is that I want to do a bare-bones replacement of just the
> parts of expat we currently use. It needs to support DTD entities, have a
> streaming interface, and support XML 1 v4. That's it, no new features, no
> rewrite of our entire XML stack.

"XML5" supports entities (at least my original version did), I think
the main problem is that there's no support for external DTDs. Not
sure how much that differs from parsing the internal subset. Either
way, that's always been a feature that as far as the web is concerned
is not supported so could conceivably be a Firefox-only thing. Only
XUL needs it.


> My current goal is a drop-in replacement for expat with just the features
> gecko cares about, so just 1.0 version 4 I guess. It's possible whatever we
> end up with could be merged with another library when XML5 is settled, but I
> don't want to wait for that.

Contrary to Henri, I think XML 1.0 edition 5 (which isn't "XML5") is
worth considering given
https://bugzilla.mozilla.org/show_bug.cgi?id=501837. It's what Chrome
ships and our current implementation doesn't seem to align with either
the 4th or 5th edition of XML 1.0.


-- 
https://annevankesteren.nl/


Re: Scope of XML parser rewrite?

2017-05-24 Thread Henri Sivonen
On Wed, May 24, 2017 at 10:11 AM, Henri Sivonen  wrote:
> It seems to me that attribute values would be the only case where a
> conversion from UTF-8 to UTF-16 would be needed all the time, and that
> conversion can be fast for ASCII, which is what attribute values
> mostly are.

Moreover, this conversion doesn't need to have the cost of converting
potentially-bogus UTF-8 to UTF-16 but only the cost of converting
guaranteed-valid UTF-8 to UTF-16, because UTF-8 validity was already
guaranteed earlier.
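
To make that concrete, here is a small sketch in Rust (std only; `utf8_to_utf16` is a name made up for this example, not an existing Gecko or encoding_rs function). A `&str` is valid UTF-8 by construction, so the conversion has no error path, and the all-ASCII case reduces to zero-extending each byte:

```rust
// Convert already-validated UTF-8 (a Rust &str) to UTF-16.
// There is no error path because validity was established earlier.
fn utf8_to_utf16(valid: &str) -> Vec<u16> {
    if valid.is_ascii() {
        // ASCII fast path: each byte zero-extends to one UTF-16 code unit.
        valid.bytes().map(u16::from).collect()
    } else {
        // General path: re-encode scalar values, still without validity checks.
        valid.encode_utf16().collect()
    }
}
```

A real implementation would presumably use SIMD for the ASCII run instead of a per-byte map, but the structure would be the same.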

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/


Re: Scope of XML parser rewrite?

2017-05-24 Thread Henri Sivonen
On Tue, May 23, 2017 at 5:01 PM, Daniel Fath  wrote:
> So, if I understand this correctly - We'll first need to land this component
> in Firefox, right? And if it proves itself fine, then formalize it.

No, both the implementation and the spec would have to be pretty solid
before stuff can go into Firefox. But, as noted, DTDs are a blocker
(if Firefox is to use the same XML parser for both XUL and for the
Web, which makes sense in terms of binary size even if it's rather sad
for XUL to constrain the Web side).

>> I was thinking of having resolutions for the issues that are currently
>> warnings in red and multi-vendor buy-in. (Previously, Tab from Google
>> was interested in making SVG parsing non-Draconian, but I have no idea
>> how reflective of wider buy-in that remark was.)
>
> You also mentioned warnings in red and multi-vendor buy-in. What does that
> entail?

Looks like at this time, even Mozilla-internal buy-in is lacking. :-/

On Tue, May 23, 2017 at 9:23 PM, Eric Rahm  wrote:
> I was hoping to write a more thorough blog post about this proposal (I have
> some notes in a gist [1]), but for now I've added comments inline. The main
> takeaway here is that I want to do a bare-bones replacement of just the
> parts of expat we currently use. It needs to support DTD entities, have a
> streaming interface, and support XML 1 v4. That's it, no new features, no
> rewrite of our entire XML stack.

OK.

> Our current interface is UTF-16, so that's my target for now. I think
> whatever cache-friendliness would be lost converting from UTF-16 -> UTF-8 ->
> UTF-16.

I hope this can be reconsidered, because the assumption that it would
have to be UTF-16 -> UTF-8 -> UTF-16 isn't accurate.

encoding_rs (https://bugzilla.mozilla.org/show_bug.cgi?id=encoding_rs)
adds the capability to decode directly to UTF-8. This is a true
direct-to-UTF-8 capability without pivoting through UTF-16. When the
input is UTF-8 (as is the case with our chrome XML and with most
on-the-Web XML), in the streaming mode, except for the few bytes
representing code points split across buffer boundaries, this is fast
UTF-8 validation (without doing math to compute scalar values and with
SIMD acceleration for ASCII runs) and memcpy. (In the non-streaming
case, it's validation and borrow when called from Rust and validation
and nsStringBuffer refcount increment when called from C++.)
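
For illustration, the streaming case can be sketched with nothing but the Rust standard library. This is not the encoding_rs API; `Utf8Stream` is a hypothetical type, and unlike the real thing it copies each chunk once more than necessary. It only shows how the few bytes of a code point split across buffer boundaries get carried over:

```rust
use std::str;

/// Incremental UTF-8 validator (illustrative only). It holds back the
/// bytes of a code point that is split across two network buffers.
struct Utf8Stream {
    pending: Vec<u8>, // at most 3 bytes of an incomplete sequence
}

impl Utf8Stream {
    fn new() -> Self {
        Utf8Stream { pending: Vec::new() }
    }

    /// Appends the valid text found in `chunk` to `out`.
    /// Returns Err if the input is definitely malformed.
    fn push(&mut self, chunk: &[u8], out: &mut String) -> Result<(), str::Utf8Error> {
        // Prepend whatever was left over from the previous chunk.
        let mut buf = std::mem::take(&mut self.pending);
        buf.extend_from_slice(chunk);
        match str::from_utf8(&buf) {
            Ok(s) => out.push_str(s),
            Err(e) => {
                if e.error_len().is_some() {
                    return Err(e); // an invalid byte sequence, not a split
                }
                // error_len() == None means the input ended mid-sequence:
                // keep the valid prefix, carry the tail to the next call.
                let (valid, rest) = buf.split_at(e.valid_up_to());
                out.push_str(str::from_utf8(valid).expect("prefix was validated"));
                self.pending = rest.to_vec();
            }
        }
        Ok(())
    }
}
```

A caller would also need to treat a non-empty `pending` at end of stream as an error; that is left out here.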

On the other side of the parser, it's true that our DOM API takes
UTF-16, but if all the code points in a text node are U+00FF or
under, the text gets stored with leading zeros omitted. It would be
fairly easy to add a hole in the abstraction to allow a UTF-8-oriented
parser to set the compressed form directly without expansion to UTF-16
and then compression immediately back to ASCII when the parser knows
that a text node is all ASCII.
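
A toy model of that idea, in Rust (the `TextStorage` enum and `store_text` are invented for this sketch and do not correspond to the actual Gecko text fragment classes; the sketch also only takes the compact path for ASCII, not for all of U+0000..U+00FF):

```rust
// Hypothetical model of a text node's storage: one byte per character
// when possible, UTF-16 code units otherwise.
#[derive(Debug, PartialEq)]
enum TextStorage {
    OneByte(Vec<u8>),
    TwoByte(Vec<u16>),
}

// Store parser output (valid UTF-8) without pivoting through UTF-16
// when the text is known to be all ASCII.
fn store_text(valid_utf8: &str) -> TextStorage {
    if valid_utf8.is_ascii() {
        // All ASCII: the UTF-8 bytes already are the compact form.
        TextStorage::OneByte(valid_utf8.as_bytes().to_vec())
    } else {
        // Otherwise fall back to expanding to UTF-16.
        TextStorage::TwoByte(valid_utf8.encode_utf16().collect())
    }
}
```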

For element and attribute names, we already support finding atoms by
UTF-8 representation and in most cases element and attribute names are
ASCII with static atoms already existing for them.
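
Roughly what I mean, as a simplified Rust sketch (the `AtomTable` type is hypothetical and much simpler than the real atom table; atoms are just integer ids here):

```rust
use std::collections::HashMap;

// Hypothetical atom table: interned names keyed by their UTF-8 form.
struct AtomTable {
    ids: HashMap<String, u32>,
}

impl AtomTable {
    // Pre-populate the table, standing in for static atoms.
    fn with_static_atoms(names: &[&str]) -> Self {
        let ids = names
            .iter()
            .enumerate()
            .map(|(i, n)| (n.to_string(), i as u32))
            .collect();
        AtomTable { ids }
    }

    // Look up (or intern) a name directly from UTF-8. No UTF-16 copy
    // of the name is made at any point.
    fn atomize(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.ids.len() as u32;
        self.ids.insert(name.to_owned(), id);
        id
    }
}
```

In the common case the name hits an existing entry, so the parser's UTF-8 slice is only hashed and compared.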

It seems to me that attribute values would be the only case where a
conversion from UTF-8 to UTF-16 would be needed all the time, and that
conversion can be fast for ASCII, which is what attribute values
mostly are.

Furthermore, the main Web XML case is SVG, which has relatively little
natural-language text, so it's almost entirely ASCII. Looking at the
ratio of markup to natural-language text in XUL, it seems fair to guess
that parsing XUL as UTF-8 would be a cache-friendliness win, too.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/


Re: Scope of XML parser rewrite?

2017-05-23 Thread Eric Rahm
I was hoping to write a more thorough blog post about this proposal (I have
some notes in a gist [1]), but for now I've added comments inline. The main
takeaway here is that I want to do a bare-bones replacement of just the
parts of expat we currently use. It needs to support DTD entities, have a
streaming interface, and support XML 1 v4. That's it, no new features, no
rewrite of our entire XML stack.

-e

[1] https://gist.github.com/EricRahm/f718c4d8a862cc08b69d7d4290c02927

On Mon, May 22, 2017 at 11:43 PM, Henri Sivonen 
wrote:

> In reference to: https://twitter.com/nnethercote/status/866792097101238272
>
> Is the rewrite meant to replace expat only or also some of our old
> code on both above and below expat?
>

Just expat.


> Back in 2011, I wrote a plan for rewriting the code around expat
> without rewriting expat itself:
> https://wiki.mozilla.org/Platform/XML_Rewrite
> I've had higher-priority stuff to do ever since...
>
>
Yes, I've seen this. It explicitly calls out not replacing expat, so the
plans are mostly orthogonal.


> (The above plan talks about pushing UTF-16 to the XML parser and
> having deep C++ namespaces. Any project starting this year should make
> the new parser use UTF-8 internally for cache-friendliness and use
> less deep C++ namespaces.)
>

Our current interface is UTF-16, so that's my target for now. I think
whatever cache-friendliness we'd gain would be lost converting from UTF-16
-> UTF-8 -> UTF-16.


> Also, I think the decision of which XML version to support should be a
> deliberate decision and not an accident. I think the reasonable
> choices are XML 1.0 4th edition (not rocking the boat) and reviving
> XML5 (original discussion: https://annevankesteren.nl/2007/10/xml5 ,
> latest draft: https://ygg01.github.io/xml5_draft/). XML 1.1 is dead.
> XML 1.0 5th edition tried to have the XML 1.0 cake and eat the XML 1.1
> cake too and expanded the set of documents that a parser doesn't reject.
> Any of the newly well-forming documents would be incompatible with 4th
> ed. and earlier parsers, which would be a break from universal XML
> interop. I think it doesn't make sense to relax XML only a bit. If XML
> is to be relaxed (breaking interop in the sense of starting to accept
> docs that old browsers would show the Yellow Screen of Death on), we
> should go all the way (i.e. XML5).
>
>
My current goal is a drop-in replacement for expat with just the features
gecko cares about, so just 1.0 version 4 I guess. It's possible whatever we
end up with could be merged with another library when XML5 is settled, but
I don't want to wait for that.


> Notably, it looks like Servo already has an XML5 parser written in Rust:
> https://github.com/servo/html5ever/tree/master/xml5ever
>
>
Yes, this lacks DTD support (and 1.0 support).


> The tweets weren't clear about whether xml5ever had been considered,
> but https://twitter.com/eroc/status/866808814959378434 looks like it's
> talking about writing a new one.
>
>
Correct, I looked at xml5ever and spoke with some folks on #servo about it.
It doesn't meet Firefox's requirements.


> It seems like integrating xml5ever (as opposed to another XML parser
> written in Rust) into Gecko would give some insight into how big a
> deal it would be to replace Gecko's HTML parser with html5ever
> (although due to document.write(), HTML is always a bigger deal
> integration-wise than XML).
>
>
That's a non-goal for me, but I can see how it would be useful.


> (If the outcome here is to do XML5, we should make sure the spec is
> polished enough at the WHATWG in order not to do a unilateral thing in
> relative secret.)
>
>
That is not my current goal, but that seems reasonable regardless of this
project.


> --
> Henri Sivonen
> hsivo...@hsivonen.fi
> https://hsivonen.fi/
>


Re: Scope of XML parser rewrite?

2017-05-23 Thread Axel Hecht

On 23.05.17 at 16:01, Daniel Fath wrote:
> So, if I understand this correctly - We'll first need to land this
> component in Firefox, right? And if it proves itself fine, then formalize
> it.
>
>> I was thinking of having resolutions for the issues that are currently
>> warnings in red and multi-vendor buy-in. (Previously, Tab from Google
>> was interested in making SVG parsing non-Draconian, but I have no idea
>> how reflective of wider buy-in that remark was.)
>
> You also mentioned warnings in red and multi-vendor buy-in. What does that
> entail?
>
> Will lack of support for DTDs be a problem? In XML5 it was decided that
> instead of parsing DTDs we just store a list of named character references
> from
> https://html.spec.whatwg.org/multipage/syntax.html#named-character-references.
> We could add another list and expand entities.json, though. It's possible I
> need to update the spec to reflect that.

Yes, not parsing DTDs would be a deal-breaker for the foreseeable
future, as we're abusing DTDs to localize X(H)TML documents.

Axel

> PS. I hope I'm not spamming you guys too hard, I'm kind of new to the
> mailing list thing.
>
> Daniel Fath,
> daniel.fa...@gmail.com





Re: Scope of XML parser rewrite?

2017-05-23 Thread Daniel Fath
So, if I understand this correctly - We'll first need to land this
component in Firefox, right? And if it proves itself fine, then formalize
it.

> I was thinking of having resolutions for the issues that are currently
> warnings in red and multi-vendor buy-in. (Previously, Tab from Google
> was interested in making SVG parsing non-Draconian, but I have no idea
> how reflective of wider buy-in that remark was.)

You also mentioned warnings in red and multi-vendor buy-in. What does that
entail?

Will lack of support for DTDs be a problem? In XML5 it was decided that
instead of parsing DTDs we just store a list of named character references
from
https://html.spec.whatwg.org/multipage/syntax.html#named-character-references.
We could add another list and expand entities.json, though. It's possible I
need to update the spec to reflect that.
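
To show what I mean by a fixed list instead of DTD parsing, here is a tiny sketch in Rust. It is not the xml5ever code; `lookup_named_reference` is a name invented for this mail and the table holds only a handful of the entries from the HTML list:

```rust
// Resolve a named character reference from a fixed table rather than
// from entity declarations in a DTD. Only a small excerpt is shown.
fn lookup_named_reference(name: &str) -> Option<&'static str> {
    match name {
        "amp" => Some("&"),
        "lt" => Some("<"),
        "gt" => Some(">"),
        "quot" => Some("\""),
        "nbsp" => Some("\u{00A0}"),
        _ => None, // unknown names cannot be defined by the document
    }
}
```

The last arm is exactly the limitation in question: a document cannot introduce its own entities this way.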

PS. I hope I'm not spamming you guys too hard, I'm kind of new to the
mailing list thing.

Daniel Fath,
daniel.fa...@gmail.com


Re: Scope of XML parser rewrite?

2017-05-23 Thread Henri Sivonen
On Tue, May 23, 2017 at 12:44 PM, Daniel Fath  wrote:
>> (If the outcome here is to do XML5, we should make sure the spec is
>> polished enough at the WHATWG in order not to do a unilateral thing in
>> relative secret.)
>
> What does it mean to be polished enough at the WHATWG?

I was thinking of having resolutions for the issues that are currently
warnings in red and multi-vendor buy-in. (Previously, Tab from Google
was interested in making SVG parsing non-Draconian, but I have no idea
how reflective of wider buy-in that remark was.)

> Also how far reaching should spec be? Include Namespaces?

I would expect the spec to take a byte stream as input, specify how
the encoding is determined, delegate the decoding from bytes to
Unicode code points to the Encoding Standard and then define how the
code point stream is processed into a DOM tree. (Bonus points for
defining a coercion to an XML 1.0 4th ed. Infoset, too, for
non-browser use cases.) That would include the processing of
Namespaces.
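
As a sketch of just the first stage of that pipeline (determining the encoding from the byte stream), BOM sniffing could look like the following in Rust. The names `Sniffed` and `sniff_bom` are made up here, and the real rules would be whatever the spec ends up saying; this ignores the XML declaration and any transport-level charset:

```rust
#[derive(Debug, PartialEq)]
enum Sniffed {
    Utf8,
    Utf16Le,
    Utf16Be,
    Unknown,
}

// Returns the encoding indicated by a leading byte order mark, plus
// the number of bytes the BOM occupies.
fn sniff_bom(bytes: &[u8]) -> (Sniffed, usize) {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        (Sniffed::Utf8, 3)
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        (Sniffed::Utf16Le, 2)
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        (Sniffed::Utf16Be, 2)
    } else {
        // No BOM: later steps would have to decide.
        (Sniffed::Unknown, 0)
    }
}
```

Everything after this step (decoding and tree construction) would operate on code points and not care which encoding was sniffed.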

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/


Re: Scope of XML parser rewrite?

2017-05-23 Thread Daniel Fath
> (If the outcome here is to do XML5, we should make sure the spec is
> polished enough at the WHATWG in order not to do a unilateral thing in
> relative secret.)

What does it mean to be polished enough at the WHATWG?

Also how far reaching should spec be? Include Namespaces?

On Tue, May 23, 2017 at 9:01 AM, Henri Sivonen  wrote:

> Figured out the email address of the XML5 editor / xml5ever developer,
> so adding to CC.
>
> On Tue, May 23, 2017 at 9:43 AM, Henri Sivonen 
> wrote:
> > In reference to: https://twitter.com/nnethercote/status/866792097101238272
> >
> > Is the rewrite meant to replace expat only or also some of our old
> > code on both above and below expat?
> >
> > Back in 2011, I wrote a plan for rewriting the code around expat
> > without rewriting expat itself:
> > https://wiki.mozilla.org/Platform/XML_Rewrite
> > I've had higher-priority stuff to do ever since...
> >
> > (The above plan talks about pushing UTF-16 to the XML parser and
> > having deep C++ namespaces. Any project starting this year should make
> > the new parser use UTF-8 internally for cache-friendliness and use
> > less deep C++ namespaces.)
> >
> > Also, I think the decision of which XML version to support should be a
> > deliberate decision and not an accident. I think the reasonable
> > choices are XML 1.0 4th edition (not rocking the boat) and reviving
> > XML5 (original discussion: https://annevankesteren.nl/2007/10/xml5 ,
> > latest draft: https://ygg01.github.io/xml5_draft/). XML 1.1 is dead.
> > XML 1.0 5th edition tried to have the XML 1.0 cake and eat the XML 1.1
> > cake too and expanded the set of documents that a parser doesn't reject.
> > Any of the newly well-forming documents would be incompatible with 4th
> > ed. and earlier parsers, which would be a break from universal XML
> > interop. I think it doesn't make sense to relax XML only a bit. If XML
> > is to be relaxed (breaking interop in the sense of starting to accept
> > docs that old browsers would show the Yellow Screen of Death on), we
> > should go all the way (i.e. XML5).
> >
> > Notably, it looks like Servo already has an XML5 parser written in Rust:
> > https://github.com/servo/html5ever/tree/master/xml5ever
> >
> > The tweets weren't clear about whether xml5ever had been considered,
> > but https://twitter.com/eroc/status/866808814959378434 looks like it's
> > talking about writing a new one.
> >
> > It seems like integrating xml5ever (as opposed to another XML parser
> > written in Rust) into Gecko would give some insight into how big a
> > deal it would be to replace Gecko's HTML parser with html5ever
> > (although due to document.write(), HTML is always a bigger deal
> > integration-wise than XML).
> >
> > (If the outcome here is to do XML5, we should make sure the spec is
> > polished enough at the WHATWG in order not to do a unilateral thing in
> > relative secret.)
> >
> > --
> > Henri Sivonen
> > hsivo...@hsivonen.fi
> > https://hsivonen.fi/
>
>
>
> --
> Henri Sivonen
> hsivo...@hsivonen.fi
> https://hsivonen.fi/
>


Re: Scope of XML parser rewrite?

2017-05-23 Thread Henri Sivonen
Figured out the email address of the XML5 editor / xml5ever developer,
so adding to CC.

On Tue, May 23, 2017 at 9:43 AM, Henri Sivonen  wrote:
> In reference to: https://twitter.com/nnethercote/status/866792097101238272
>
> Is the rewrite meant to replace expat only or also some of our old
> code on both above and below expat?
>
> Back in 2011, I wrote a plan for rewriting the code around expat
> without rewriting expat itself:
> https://wiki.mozilla.org/Platform/XML_Rewrite
> I've had higher-priority stuff to do ever since...
>
> (The above plan talks about pushing UTF-16 to the XML parser and
> having deep C++ namespaces. Any project starting this year should make
> the new parser use UTF-8 internally for cache-friendliness and use
> less deep C++ namespaces.)
>
> Also, I think the decision of which XML version to support should be a
> deliberate decision and not an accident. I think the reasonable
> choices are XML 1.0 4th edition (not rocking the boat) and reviving
> XML5 (original discussion: https://annevankesteren.nl/2007/10/xml5 ,
> latest draft: https://ygg01.github.io/xml5_draft/). XML 1.1 is dead.
> XML 1.0 5th edition tried to have the XML 1.0 cake and eat the XML 1.1
> cake too and expanded the set of documents that a parser doesn't reject.
> Any of the newly well-forming documents would be incompatible with 4th
> ed. and earlier parsers, which would be a break from universal XML
> interop. I think it doesn't make sense to relax XML only a bit. If XML
> is to be relaxed (breaking interop in the sense of starting to accept
> docs that old browsers would show the Yellow Screen of Death on), we
> should go all the way (i.e. XML5).
>
> Notably, it looks like Servo already has an XML5 parser written in Rust:
> https://github.com/servo/html5ever/tree/master/xml5ever
>
> The tweets weren't clear about whether xml5ever had been considered,
> but https://twitter.com/eroc/status/866808814959378434 looks like it's
> talking about writing a new one.
>
> It seems like integrating xml5ever (as opposed to another XML parser
> written in Rust) into Gecko would give some insight into how big a
> deal it would be to replace Gecko's HTML parser with html5ever
> (although due to document.write(), HTML is always a bigger deal
> integration-wise than XML).
>
> (If the outcome here is to do XML5, we should make sure the spec is
> polished enough at the WHATWG in order not to do a unilateral thing in
> relative secret.)
>
> --
> Henri Sivonen
> hsivo...@hsivonen.fi
> https://hsivonen.fi/



-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/