We just had another discussion with several outcomes:

# Floats

Height assignment is currently an in-order traversal in Servo, to cope with floats. However, this could be turned into a bottom-up traversal. The trick is to treat all DOM subtrees that contain floats (and no `clear: both`) as single "macro-nodes" for the purposes of layout. Consider this picture:

             X
            / \
           X   X
               |
               O   <--
              / \
             O   O
             |   |
             O   O
                / \
               Y   X

Suppose that the `O` nodes are potentially impacted by floats and the `X` and `Y` nodes are known not to be in a context that floats could impact (for example, if `Y` has the `clear: both` style). We need to traverse the `O` nodes sequentially to track floats. This can, however, be modeled as a bottom-up traversal with a more complex kernel function.

Recall that kernel functions are allowed to inspect and modify their children, just not their parents. We can therefore simply make the assign-heights kernel function for the node marked by the arrow perform the in-order, sequential traversal that accounts for floats on its subtree. The kernel functions for the remainder of the `O` nodes become no-ops. In other words, for every node in a potentially-float-affected subtree, we turn the assign-heights kernel functions into no-ops for every node except the parent node.

We could probably optimize this in the future to not even call the no-op functions at all.

# Scheduling

The scheme with the concurrent queue that I proposed earlier doesn't work, because we have to make sure that children are never visited until their parents are (for top-down) and that parents are never visited before their children are finished (for bottom-up). Instead, we need somewhat more complex synchronization.

Probably the most efficient thing to do would be to use a Chase-Lev work-stealing queue for each OS thread. We would have a sequential setup phase that does something different depending on whether this is a top-down or bottom-up traversal:

* For a top-down traversal, the setup phase counts the total number of nodes in the tree and then enqueues the root of the tree.

* For a bottom-up traversal, the setup phase counts the number of children of each flow and writes that value into the flow. Then it enqueues the leaves of the tree.

After the setup phase, the worker threads start up. Each worker thread spins on its work queue until done. When a worker has work to do, it removes a node from the work queue and executes the untrusted kernel function. Then:

* For a top-down traversal, the worker atomically decrements the total number of nodes. If zero, it shuts down and sends a Rust message to the main thread to wake up. Otherwise, it pushes children of this node onto its work queue.

* For a bottom-up traversal, the worker looks for a parent. If there is no parent (it's at the root of the tree), it shuts down, atomically sets a "we're done" flag, and sends a Rust message to the main tree to wake up. Otherwise, it atomically decrements its parent's child count. If nonzero, it looks for more work to do. If zero, it pushes the parent onto its work queue.

# Pointer compression

Chase-Lev queues need work items to be representable by a single pointer. This is a problem because flows are two words in Rust (since we do not have inheritance). However, we can perform compression to pack the two words into one word: the pointer should have its lower 4 bits clear, so we can stuff the flow type in there. This can probably be folded into the cast that needs to be done between `Flow<SingleThreaded>` and `Flow<Kernel>` anyway.

# Work items

The outcome is that the work will probably go like this:

1. Change assign-heights to be a bottom-up traversal.

2. Create the phantom type and add it to flows.

3. Add conversions between the different flow phantom types and make the kernel functions use the `Flow<Kernel>` type.

4. Add the count field to each flow, which will be needed for parallel bottom-up traversals.

5. Implement the Chase-Lev deque. As a simpler approach, we could use a shared work queue for now.

6. Implement the parallel traversals.

Patrick
_______________________________________________
dev-servo mailing list
dev-servo@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-servo

Reply via email to