I realized this morning that we can probably avoid the need for a leaf set by intertwining selector matching with flow construction, and flow construction with intrinsic-width bubbling, to some degree. This is similar to what Gecko and WebKit do, but potentially cleaner: the passes remain separate functions with strictly separate implementations (no bouncing back and forth to handle special incremental-reflow cases), and only the parallel driver knows how to chain them. This works because the `WorkQueue` is heterogeneous: it can accept different kinds of tasks and run them all in parallel.

* Once the selectors have been matched for a leaf node, we can immediately start constructing its flows. Just call `construct_flows` once the node has been matched.
* Trickier, but also likely possible: once flows have been constructed for a leaf node, immediately call `bubble_widths` on it. This works because we always know when a flow is going to be a leaf, since e579daefc2956a2eb151588b628c51342de236d0.
* Once `assign_widths` has been called on a leaf, immediately start assigning its heights via `assign_heights`.
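The fused pipeline in the first two bullets can be sketched as a single heterogeneous task queue. This is a minimal, self-contained illustration under loud assumptions: the `Task` enum, the tuple-based toy tree, the atomic per-parent child counters, and the max-of-children width rule are all hypothetical stand-ins, not Servo's `WorkQueue` API. Traversal starts at the root; a leaf goes straight from matching to flow construction and then width bubbling, and the last child to finish schedules its parent, so no leaf set is ever built. For brevity, a parent's flow construction here also waits for its children's widths; a real driver could start it as soon as the child flows exist.

```rust
use std::collections::VecDeque;
use std::sync::{atomic::{AtomicUsize, Ordering}, Mutex};
use std::thread;

// Hypothetical task kinds; Servo's real `WorkQueue` types differ.
enum Task {
    MatchSelectors(usize),
    ConstructFlows(usize),
    BubbleWidths(usize),
}

// Toy node: (parent, children, intrinsic width of its own content).
type Spec = (Option<usize>, Vec<usize>, usize);

struct Shared<'a> {
    nodes: &'a [Spec],
    queue: Mutex<VecDeque<Task>>,
    outstanding: AtomicUsize,           // tasks pushed but not yet finished
    pending_children: Vec<AtomicUsize>, // children still running bubble_widths
    pref_width: Vec<AtomicUsize>,
}

fn push(s: &Shared, t: Task) {
    // Count the task before publishing it so no worker exits early.
    s.outstanding.fetch_add(1, Ordering::AcqRel);
    s.queue.lock().unwrap().push_back(t);
}

fn run(s: &Shared, t: Task) {
    match t {
        Task::MatchSelectors(n) => {
            // ... selector matching for `n` here; a leaf goes straight to flow construction ...
            let children = &s.nodes[n].1;
            if children.is_empty() {
                push(s, Task::ConstructFlows(n)); // no leaf set needed
            }
            for &c in children {
                push(s, Task::MatchSelectors(c));
            }
        }
        // ... build the flow for `n` here; all child flows already exist ...
        Task::ConstructFlows(n) => push(s, Task::BubbleWidths(n)),
        Task::BubbleWidths(n) => {
            let (parent, children, own) = &s.nodes[n];
            // Toy rule: preferred width = max(widest child, own content).
            let kids = children.iter().map(|&c| s.pref_width[c].load(Ordering::Acquire));
            s.pref_width[n].store(kids.max().unwrap_or(0).max(*own), Ordering::Release);
            // The last child to finish schedules its parent's flow construction.
            if let Some(p) = *parent {
                if s.pending_children[p].fetch_sub(1, Ordering::AcqRel) == 1 {
                    push(s, Task::ConstructFlows(p));
                }
            }
        }
    }
}

fn worker(s: &Shared) {
    loop {
        let next = s.queue.lock().unwrap().pop_front();
        match next {
            Some(t) => {
                run(s, t);
                s.outstanding.fetch_sub(1, Ordering::AcqRel);
            }
            None if s.outstanding.load(Ordering::Acquire) == 0 => return, // all work done
            None => thread::yield_now(), // others still running; they may push more
        }
    }
}

/// Runs the fused pipeline from the root (node 0); returns each node's preferred width.
fn layout_tree(nodes: &[Spec], threads: usize) -> Vec<usize> {
    let shared = Shared {
        nodes,
        queue: Mutex::new(VecDeque::new()),
        outstanding: AtomicUsize::new(0),
        pending_children: nodes.iter().map(|(_, c, _)| AtomicUsize::new(c.len())).collect(),
        pref_width: nodes.iter().map(|_| AtomicUsize::new(0)).collect(),
    };
    push(&shared, Task::MatchSelectors(0)); // every traversal starts at the root
    thread::scope(|sc| {
        for _ in 0..threads {
            sc.spawn(|| worker(&shared));
        }
    });
    shared.pref_width.iter().map(|w| w.load(Ordering::Acquire)).collect()
}

fn main() {
    // Toy tree: 0 -> {1, 2}, 1 -> {3, 4}.
    let tree: Vec<Spec> = vec![
        (None, vec![1, 2], 0),
        (Some(0), vec![3, 4], 5),
        (Some(0), vec![], 30),
        (Some(1), vec![], 10),
        (Some(1), vec![], 40),
    ];
    println!("{:?}", layout_tree(&tree, 4)); // [40, 40, 30, 10, 40]
}
```

The only cross-node coordination is one atomic counter per parent; that counter is what replaces the leaf set as the mechanism for knowing when the bottom-up passes may proceed.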

Assuming this works out, all parallel traversals will start from the root and go down, eliminating the need for a leaf set. We will probably still want a "backdoor" that computes `bubble_widths` sequentially, for two reasons: (1) during incremental reflow, min/pref widths may have been invalidated without invalidating the flow; (2) it's easier to benchmark style recalc against Gecko and WebKit when it's not intertwined with intrinsic-width calculation.
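The sequential backdoor can be as simple as a post-order walk over the flow tree. This is a sketch assuming a hypothetical `Flow` struct that already stores the intrinsic widths of each flow's own content, plus a toy block rule (a flow is as wide as its widest child); Servo's real flow types and width rules differ.

```rust
// Hypothetical flow node; Servo's real flow types differ.
struct Flow {
    children: Vec<Flow>,
    own_min: u32,   // intrinsic widths of this flow's own content
    own_pref: u32,
    min_width: u32, // outputs filled in by `bubble_widths_sequential`
    pref_width: u32,
}

/// Sequential post-order "backdoor": recompute min/pref widths for a whole
/// subtree without rerunning selector matching or flow construction.
fn bubble_widths_sequential(flow: &mut Flow) {
    let (mut min, mut pref) = (flow.own_min, flow.own_pref);
    for child in &mut flow.children {
        bubble_widths_sequential(child); // children first (post-order)
        // Toy block rule: a block is as wide as its widest child.
        min = min.max(child.min_width);
        pref = pref.max(child.pref_width);
    }
    flow.min_width = min;
    flow.pref_width = pref;
}

// Hypothetical helper: a childless flow with the given own-content widths.
fn leaf(min: u32, pref: u32) -> Flow {
    Flow { children: vec![], own_min: min, own_pref: pref, min_width: 0, pref_width: 0 }
}

fn main() {
    let mut root = Flow { children: vec![leaf(20, 80), leaf(50, 60)], ..leaf(0, 0) };
    bubble_widths_sequential(&mut root);
    println!("min = {}, pref = {}", root.min_width, root.pref_width); // min = 50, pref = 80
}
```

Because a pass like this never touches the DOM or the style system, it could rerun after incremental reflow invalidates only min/pref widths, and it keeps style-recalc timings free of width computation when benchmarking.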

This would have numerous benefits:

1. Leaf set construction is expensive. On Wikipedia it's 16% of selector matching time on 4 cores. For comparison, that's the difference between getting a 2.4x speedup for selector matching and getting a 2.9x speedup on 4 cores.
2. Eliminates one or two parallel traversals, reducing overhead. (In particular, the warmup phase drops to essentially zero.)
3. Eliminates the synchronization point between selector matching and flow construction, allowing better multicore utilization.
4. Eliminates the need to ensure that DOM nodes in the leaf set are still alive, which will be a bit of a pain once we start doing incremental reflow.
5. Better memory usage, since the leaf set data structures go away.
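The two speedup figures in item 1 are consistent if the 16% is read as a share of the 4-core wall-clock time for selector matching ($T_4$) and the single-core time $T_1$ is unchanged by removing the leaf set (an assumption; the measurement setup isn't spelled out here):

```math
S_{\text{with leaf set}} = \frac{T_1}{T_4} = 2.4,
\qquad
S_{\text{without}} = \frac{T_1}{(1 - 0.16)\,T_4} = \frac{2.4}{0.84} \approx 2.86 \approx 2.9
```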

