Frame construction runs recursive algorithms along the depth of the
DOM tree. The depth is currently capped at 200, because deeper trees
caused stack overflow crashes on Windows in the Windows 95 era. There
is a corresponding limit in the HTML parser. The parser limit exists
only to avoid exposing deep trees to frame construction; nothing in
the parser itself requires it. The parser tries to preserve text
nodes even after the limit has been reached. This works in simple
cases, but apparently doesn't work in more complex ones.
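
To make the failure mode concrete: the pattern is a recursive walk in
which each level of DOM nesting consumes a C++ stack frame, so the
cap is what stands between a deep tree and a stack overflow. A
minimal sketch with hypothetical names, not the actual Gecko code:

  #include <vector>

  struct Node {
    std::vector<Node*> children;
  };

  const int kMaxDepth = 200;  // the current cap

  void ConstructFrames(Node* aNode, int aDepth) {
    if (aDepth > kMaxDepth) {
      return;  // deeper subtrees silently get no frames
    }
    // ... construct a frame for aNode ...
    for (Node* child : aNode->children) {
      // Recursion depth tracks DOM depth.
      ConstructFrames(child, aDepth + 1);
    }
  }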

For many years, my standard response to people who complained was to
point the finger at layout and mark bug reports as duplicates of
https://bugzilla.mozilla.org/show_bug.cgi?id=256180 . This seemed
acceptable, because the failures were on sites deep in the long tail.

Recently, it has come to my attention
(https://bugzilla.mozilla.org/show_bug.cgi?id=1188731) that the depth
limit is a problem in a more serious case. Apparently there are email
clients that generate unreasonably nested HTML as a side effect of
rich text editing, and there are notable webmail clients (at least
Yahoo!, apparently, but I have a vague recollection of seeing a
complaint about Hotmail, too, when browsing duplicates) that don't
restructure such HTML email, so the deep trees are exposed to
Firefox. When the HTML parser fails to keep text nodes visible while
rewriting the tree before it reaches layout, the result is parts of
emails going missing from layout. That may lead to notable badness:
users misinterpreting what the email says, and then switching
browsers after realizing that other browsers don't hide these parts
of emails.

There are three areas where changes could be made:
 1) We could re-calibrate the depth limit.
 2) The HTML parser could try harder to make the DOM rewrites keep
text nodes visible.
 3) The frame constructor could switch from a full-featured recursive
algorithm to an iterative text node-only traversal near the depth
limit.

I'd expect text node recovery (flattening out elements and just
considering text nodes) in the frame constructor to be a more robust
solution than trying to address the problem in the HTML parser; see
the sketch below for what I mean. What's the feasibility of such a
frame constructor change?
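
Here's roughly what I have in mind (hypothetical names, not actual
Gecko code): once the walk reaches the limit, stop recursing and
flatten the rest of the subtree with an explicit-stack traversal that
constructs frames only for text nodes, so the traversal depth no
longer consumes C++ stack:

  #include <stack>
  #include <vector>

  struct Node {
    std::vector<Node*> children;
    bool isText = false;
  };

  void ConstructTextOnlyFrames(Node* aSubtreeRoot) {
    std::stack<Node*> toVisit;
    toVisit.push(aSubtreeRoot);
    while (!toVisit.empty()) {
      Node* node = toVisit.top();
      toVisit.pop();
      if (node->isText) {
        // ... construct a text frame so the text stays visible ...
        continue;
      }
      // Push children in reverse so they are visited in document order.
      for (auto it = node->children.rbegin();
           it != node->children.rend(); ++it) {
        toVisit.push(*it);
      }
    }
  }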

In the meantime, considering that this is a problem that can result
in users switching browsers, I think we should change the depth limit
and make it vary by operating system, so that Mac and Linux users
don't need to switch browsers due to Windows limitations. (Or users
of 64-bit Windows due to 32-bit Windows limitations.)

My findings from testing with a very high limit so far are as follows:
 * Firefox opt build on x86_64 Linux crashes when the DOM depth is
between 3000 and 3100.
 * Mac: between 3900 and 4000.
 * Windows 64-bit: between 740 and 750.
 * 32-bit Firefox on 64-bit Windows 10: between 500 and 510.
 * 32-bit Firefox on 64-bit Windows 7: between 510 and 520.

As for other browsers:
 * I didn't find a depth limit for Chrome. (Tried up to 16000.)
 * On 64-bit Windows 10, Edge's content process crashes when the
depth is between 800 and 1000.
 * On 64-bit Windows 10, IE11's content process crashes when the
depth is between 1000 and 1100.

OK if I change the limit on a per-OS basis according to these numbers?
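
Concretely, I'm thinking of a per-OS compile-time cap along these
lines. (The constant name is hypothetical, and the values are merely
illustrative: the measured crash points above minus a generous safety
margin, not a final proposal.)

  #if defined(XP_WIN)
  #  if defined(HAVE_64BIT_BUILD)
  #    define MAX_FRAME_TREE_DEPTH 600   // overflow measured around 740
  #  else
  #    define MAX_FRAME_TREE_DEPTH 400   // overflow measured around 500
  #  endif
  #elif defined(XP_MACOSX)
  #  define MAX_FRAME_TREE_DEPTH 3000    // overflow measured around 3900
  #else
  #  define MAX_FRAME_TREE_DEPTH 2000    // overflow measured around 3000
  #endif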

Can we ask Windows to give us more stack space?
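
As far as I know, the main thread's stack reserve on Windows is fixed
at link time in the PE header (e.g. the MSVC linker's
/STACK:reserve[,commit] option, or editbin /STACK on an existing
binary), so growing it there means a build change. For threads we
create ourselves, a larger reservation can be requested per thread:

  #include <windows.h>

  DWORD WINAPI ThreadMain(LPVOID) {
    // ... do the deeply recursive work here ...
    return 0;
  }

  HANDLE SpawnBigStackThread() {
    // Reserve 8 MiB instead of the linker default (typically 1 MiB).
    const SIZE_T kStackReserve = 8 * 1024 * 1024;
    return CreateThread(nullptr, kStackReserve, ThreadMain, nullptr,
                        STACK_SIZE_PARAM_IS_A_RESERVATION, nullptr);
  }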

(I'd appreciate testing help on 32-bit Windows. Build with the limit
set so large that the stack overflows first:
https://queue.taskcluster.net/v1/task/TUMHVvq1QpG0qmsfREuDZw/runs/0/artifacts/public/build/target.zip
; test cases: https://hsivonen.com/test/moz/deeptree/ )

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
