On 06/03/14 02:05, Keegan McAllister wrote:

Writing our own HTML5 parser would be a lot of work, but does not
seem infeasible.  The parsers I've found (including the translated
C++ code for Gecko) are in the 10-20 KLoC range.  We can do a
one-time translation from Java for the most mechanical parts, without
building a complete translator.

FWIW I would estimate that a from-scratch implementation of an HTML parser that could replace Hubbub would be a "summer of code" sized project, i.e. I would expect a reasonably new contributor to manage it in a couple of months and an experienced contributor to manage it in much less than that. Indeed, much of Hubbub itself was originally done as a GSoC project [1].

There is a standard test suite [2] for static HTML5 parsers.
Browsers have additional requirements due to speculation and
document.write(), but it looks like [3] Gecko implements that outside
the translated parser, so this is code we would have to write and
test in any case.

So part of the difficulty of document.write comes from the fact that it has to interact with the script loading / document lifecycle. Therefore it's going to be hard to get those parts of (any) parser right until we actually implement a more correct model of document loading. Ideally the two things would be designed concurrently so that there isn't an impedance mismatch between the parser and the loading code.
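
To make the coupling concrete, here is a minimal sketch, in Rust, of a parser input stream with a script-visible insertion point. Everything in it (`InputStream`, `begin_script`, `end_script`, `document_write`, `remaining`) is a hypothetical name invented for illustration; it is not Gecko's or Hubbub's design, and it ignores the real behaviour of document.write() when no insertion point exists. It only assumes that the loader tells the parser when a script starts and stops running, which is exactly the part that depends on the document loading model.

```rust
/// Hypothetical parser input stream with a script-visible insertion point.
pub struct InputStream {
    buffer: String,
    /// Byte offset of the next character the tokenizer will consume.
    pos: usize,
    /// Where document.write() text goes; None when no script is running.
    insertion_point: Option<usize>,
}

impl InputStream {
    pub fn new(source: &str) -> Self {
        InputStream { buffer: source.to_string(), pos: 0, insertion_point: None }
    }

    /// Hand the next character to the tokenizer, if any.
    pub fn next_char(&mut self) -> Option<char> {
        let c = self.buffer[self.pos..].chars().next()?;
        self.pos += c.len_utf8();
        Some(c)
    }

    /// Called by the (assumed) loader just before a parser-inserted script
    /// runs: the insertion point sits right before the next input character.
    pub fn begin_script(&mut self) {
        self.insertion_point = Some(self.pos);
    }

    /// Called when the script finishes; writes are no longer accepted here.
    pub fn end_script(&mut self) {
        self.insertion_point = None;
    }

    /// Insert text at the insertion point and advance it, so that
    /// successive writes appear in call order. Returns false when there is
    /// no insertion point (a simplification made by this sketch).
    pub fn document_write(&mut self, text: &str) -> bool {
        match self.insertion_point {
            Some(at) => {
                self.buffer.insert_str(at, text);
                self.insertion_point = Some(at + text.len());
                true
            }
            None => false,
        }
    }

    /// Input not yet consumed by the tokenizer.
    pub fn remaining(&self) -> &str {
        &self.buffer[self.pos..]
    }
}
```

The point of the sketch is that `begin_script` and `end_script` have to be driven by whatever code schedules script execution, so the parser cannot be tested in isolation for this behaviour.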

For the short term I will continue to work on the translator and see
if we can get more clarity about some of these unknowns.  But I'm
also inclined to try implementing parts of a new HTML5 parser in
Rust.  At any rate we should pay close attention to Gecko's parser
design, and I will continue reading through that code.

My suspicion is that it's possible to spend more time talking about various options than it would take to stand up a rough prototype parser (with e.g. less important tokenizer/treebuilder states missing). Therefore I think this sounds like a great idea.
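
As a rough idea of what "less important states missing" could look like, here is a minimal Rust sketch of a tokenizer that models only a few states. The names (`Token`, `State`, `tokenize`) are hypothetical, and the behaviour is deliberately simplified: attributes are discarded, and comments, doctypes, character references and script/RCDATA handling are not implemented at all, so it is nowhere near passing the standard test suite.

```rust
/// Tokens emitted by the sketch (a tiny subset of what HTML5 defines).
#[derive(Debug, PartialEq)]
pub enum Token {
    StartTag(String),
    EndTag(String),
    Text(String),
}

/// Only a handful of tokenizer states are modelled; the rest are missing.
enum State {
    Data,
    TagOpen,
    EndTagOpen,
    TagName { end: bool, name: String },
    /// Simplification: everything between the tag name and '>' is dropped.
    AfterTagName { end: bool, name: String },
}

fn flush_text(text: &mut String, tokens: &mut Vec<Token>) {
    if !text.is_empty() {
        tokens.push(Token::Text(std::mem::take(text)));
    }
}

fn make_tag(end: bool, name: String) -> Token {
    if end { Token::EndTag(name) } else { Token::StartTag(name) }
}

/// Reprocess a character that turned out not to start a tag.
fn literal(c: char, text: &mut String) -> State {
    if c == '<' {
        State::TagOpen
    } else {
        text.push(c);
        State::Data
    }
}

pub fn tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut text = String::new();
    let mut state = State::Data;

    for c in input.chars() {
        state = match state {
            State::Data => literal(c, &mut text),
            State::TagOpen => {
                if c == '/' {
                    State::EndTagOpen
                } else if c.is_ascii_alphabetic() {
                    flush_text(&mut text, &mut tokens);
                    State::TagName { end: false, name: c.to_ascii_lowercase().to_string() }
                } else {
                    // Not a tag after all: keep the '<' as literal text.
                    text.push('<');
                    literal(c, &mut text)
                }
            }
            State::EndTagOpen => {
                if c.is_ascii_alphabetic() {
                    flush_text(&mut text, &mut tokens);
                    State::TagName { end: true, name: c.to_ascii_lowercase().to_string() }
                } else {
                    // Simplification: a real tokenizer has more cases here.
                    text.push_str("</");
                    literal(c, &mut text)
                }
            }
            State::TagName { end, mut name } => {
                if c == '>' {
                    tokens.push(make_tag(end, name));
                    State::Data
                } else if c.is_whitespace() || c == '/' {
                    State::AfterTagName { end, name }
                } else {
                    name.push(c.to_ascii_lowercase());
                    State::TagName { end, name }
                }
            }
            State::AfterTagName { end, name } => {
                if c == '>' {
                    tokens.push(make_tag(end, name));
                    State::Data
                } else {
                    State::AfterTagName { end, name }
                }
            }
        };
    }

    // End of input: keep a dangling '<' or '</' as text, drop unfinished tags.
    match state {
        State::TagOpen => text.push('<'),
        State::EndTagOpen => text.push_str("</"),
        _ => {}
    }
    flush_text(&mut text, &mut tokens);
    tokens
}
```

Calling `tokenize("<p>Hello <B>world</b></p>")` yields start and end tags with lowercased names and the text runs between them. Filling in the missing states one by one against the test suite would be the natural next step for a prototype like this.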

[1] http://www.netsurf-browser.org/developers/gsoc/