Thanks for the feedback so far!

If I go with the clone route (working on a snapshotted version of the
data), how can I later associate the cloned nodes with the original nodes
from the document?  One approach I thought of is to set userdata on the
DOM nodes and then use the clone handler callback to associate each cloned
node with its original (through weak refs or a WeakMap).  That would mean
iterating through all nodes first to add the handlers, but that's probably
fine (I don't need to analyze anything or visit text nodes).
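To make the idea concrete, here is a minimal sketch of the clone-to-original
association via a WeakMap.  It models nodes as plain objects rather than real
DOM nodes (so it runs outside a document), and it pairs the trees with a
parallel walk instead of clone handler callbacks; the shapes and names are
illustrative assumptions, not the actual API:

```javascript
// Sketch: associate each cloned node with its original via a WeakMap by
// walking the original tree and its clone in the same order.  Plain objects
// stand in for DOM nodes here, purely for illustration.

function cloneTree(node) {
  // Analogous to Node.cloneNode(true): deep-copy the whole subtree.
  return { name: node.name, children: node.children.map(cloneTree) };
}

function associate(original, clone, map = new WeakMap()) {
  // Both trees have identical shape, so a parallel walk pairs nodes up.
  map.set(clone, original);
  for (let i = 0; i < original.children.length; i++) {
    associate(original.children[i], clone.children[i], map);
  }
  return map;
}

const doc = { name: "body", children: [{ name: "p", children: [] }] };
const snapshot = cloneTree(doc);
const toOriginal = associate(doc, snapshot);

console.log(toOriginal.get(snapshot) === doc);                          // true
console.log(toOriginal.get(snapshot.children[0]) === doc.children[0]); // true
```

Because the WeakMap holds the originals weakly keyed by the clones, the
associations go away when the snapshot is discarded, so nothing pins the
original nodes' lifetimes beyond the snapshot's.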

I think serializing and re-parsing everything in the worker is not the
ideal solution unless we can find a way to also keep accurate associations
with the original nodes from content.  Anything that introduces a
potentially lossy step will probably hurt translation, which is already an
inexact science.


On Tue, Mar 4, 2014 at 6:26 AM, Andrew Sutherland <
asutherl...@asutherland.org> wrote:

> On 03/04/2014 03:13 AM, Henri Sivonen wrote:
>
>> It saddens me that we are using non-compliant ad hoc parsers when we
>> already have two spec-compliant (at least at some point in time) ones.
>>
>
> Interesting!  I assume you are referring to:
> https://github.com/davidflanagan/html5/blob/master/html5parser.js
>
> Which seems to be (explicitly) derived from:
> https://github.com/aredridel/html5
>
> Which in turn seems to actually include a few parser variants.
>
> Per the discussion with you on
> https://groups.google.com/d/msg/mozilla.dev.webapi/wDFM_T9v7Tc/Nr9Df4FUwuwJ
> for the Gaia e-mail app we initially ended up using an in-page data
> document mechanism for sanitization.  We later migrated to using a
> worker-based parser.  There were some coordination hiccups with this
> migration (https://bugzil.la/814257) and some B2G time pressure, so a
> comprehensive survey of HTML parsers never really happened.
>
> While we have a defense-in-depth strategy (CSP and iframe sandbox should
> be protecting us from the worst possible scenarios) and we're hopeful that
> Service Workers will eventually let us provide nsIContentPolicy-level
> protection, the quality of the HTML parser is of course fairly important[1]
> to the operation of the HTML sanitizer.  If you'd like to bless a specific
> implementation for workers to perform streaming HTML parsing, or some
> other explicit strategy, I'd be happy to file a bug for us to go in that
> direction.  Because we use a white-list based mechanism and are fairly
> limited and arguably fairly luddite in what we whitelist, it's my hope
> that our errors are on the side of safety (and breaking adventurous HTML
> email :), but that is indeed largely hope.  Your input is definitely
> appreciated, especially as it relates to prioritizing such enhancements
> and assessing the potential risk of our current strategy.
>
> Andrew
>
>
> 1: understatement
>
> _______________________________________________
> dev-platform mailing list
> dev-platform@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform
>