I've been doing more profiling of selector matching and looking a bit at WebKit's implementation. One difference is that WebKit has no equivalent of our RuleProcessorData setup: its selector matching uses only data available on the node itself, or data cached on the node or on its RenderStyle. In particular, any cache used there can persist across multiple style resolutions and is not dynamically allocated.

I tried hacking up SelectorMatches/SelectorMatchesTree to match directly on the node instead of on RuleProcessorData, and changed various callsites to (more or less) not allocate RuleProcessorData structs at all. This sped up the SlickSpeed querySelector test by a good bit (40% or so). That was expected: in that test each element is matched against only one selector, so the caching isn't worth its cost. I also tried the patch on a complete reframe of the single-page HTML5 spec, and it looks like a slight win there too (on the order of 50ms out of 1200ms). This test is more interesting, since it's exactly the situation RuleProcessorData is supposed to help with. However, it may be that namespace, tag, id, and classes are cheap enough to get, and rule out enough selectors, that in practice the more complicated caching isn't worth it, at least on this page.
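Roughly, "matching directly on the node" means something like the sketch below. It's a hypothetical, heavily simplified stand-in (these Element and SimpleSelector types are made up for illustration and are not our real classes) that handles only namespace, tag, id, classes, and descendant combinators. All data is read straight off the element, and ancestors come from parent pointers rather than from chained per-element data structs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified element: everything matching needs is on the node.
struct Element {
  int nameSpaceID = 0;
  std::string tag;
  std::string id;                    // empty if there is no id attribute
  std::vector<std::string> classes;
  const Element* parent = nullptr;
};

// Hypothetical compound selector.
struct SimpleSelector {
  int nameSpaceID = -1;              // -1: any namespace
  std::string tag;                   // empty: any tag
  std::string id;                    // empty: no id constraint
  std::vector<std::string> classes;  // every listed class must be present
};

// Checks one compound selector against data read off the element;
// nothing is allocated or cached per element.
bool SelectorMatches(const Element& el, const SimpleSelector& sel) {
  if (sel.nameSpaceID != -1 && sel.nameSpaceID != el.nameSpaceID) return false;
  if (!sel.tag.empty() && sel.tag != el.tag) return false;
  if (!sel.id.empty() && sel.id != el.id) return false;
  for (const std::string& c : sel.classes) {
    if (std::find(el.classes.begin(), el.classes.end(), c) == el.classes.end())
      return false;
  }
  return true;
}

// Descendant-combinator chain, subject last: "div p" is {div, p}.
// Ancestors are reached through parent pointers; greedily taking the nearest
// matching ancestor is correct when every combinator is a descendant one.
bool SelectorMatchesTree(const Element& el,
                         const std::vector<SimpleSelector>& chain) {
  if (chain.empty() || !SelectorMatches(el, chain.back())) return false;
  std::size_t remaining = chain.size() - 1;  // ancestor selectors left to match
  for (const Element* anc = el.parent; anc && remaining > 0; anc = anc->parent) {
    if (SelectorMatches(*anc, chain[remaining - 1])) --remaining;
  }
  return remaining == 0;
}
```

The trade-off is the one described above: each call re-reads namespace, tag, id, and classes from the element, which only pays off if those reads are cheaper than allocating and filling a side struct.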

Now it _is_ possible to create testcases where such a non-caching approach will be a lot slower than what we have now; I'm just not sure how common they are in practice.

Looking in detail into what RuleProcessorData stores (after sdwilsh lands his async history stuff) we have:

* mPresContext -- only needed to allocate the parent/prevsibling data,
                  so can go away if RuleProcessorData does.
* mContent -- would become an argument
* mParentContent -- cheap to get
* mRuleWalker -- only needed for the rulehash enumeration, NOT for
                 selector matching.  Rulehash enumeration could keep
                 using a struct that has the rulewalker, prescontext
                 and maybe a few other things.
* mScopedRoot -- Need to figure out the right place to stash this.
                 Most importantly, this is tied to the rule being
                 matched, not to the element or selector.
* mContentTag -- cheap to get
* mContentID -- cheap enough to get; can be made cheaper
* mIsHTMLContent -- cheap to get
* mIsHTML -- involves a check on the document, but the document boolean
             here is invariant across a wide range of things (e.g. all
             of style resolution for a node, or an entire querySelector
             invocation), so can be passed around to selector-matching
             code explicitly.
* mHasAttributes -- in practice, cheap enough to get
* mCompatMode -- can be passed around like the document HTML boolean;
                 doesn't depend on element.
* mNameSpaceID -- cheap to get
* mClasses -- cheap enough; can be made cheaper
* mPreviousSiblingData -- would go away
* mParentData -- would go away
* mLanguage -- could just be computed each time in the rare cases it's
               needed, I think.  It _would_ be possible to write
               pathological testcases that are slower as a result, but
               I don't think we care.
* mNthIndices -- see below
* mContentState -- could be cached in the node, I think... Or we could
                   stop using a bitfield here and use the WebKit setup
                   of explicit boolean getters plus casts to subclasses
                   in some cases (e.g. for the form control states).
                   Or we could just get it each time we need it (for
                   pseudo-class matches only).  It's usually not THAT
                   expensive to compute, and not needed that much.
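To make two of the suggestions above concrete, here is a rough sketch under made-up types (MatchingContext, CompatMode, and this trimmed Element are hypothetical, not Gecko classes). Per-invocation invariants such as the HTML-document flag and the compat mode are computed once and passed to matching explicitly, and the language is computed on demand by walking ancestors instead of being cached in an mLanguage field.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-ins, not Gecko classes.
enum class CompatMode { Standards, Quirks };

// Invariants for an entire style resolution or querySelector call, computed
// once by the caller and passed to matching explicitly. (isHTMLDocument would
// gate case-insensitive tag matching; it is unused in this short sketch.)
struct MatchingContext {
  bool isHTMLDocument;
  CompatMode compatMode;
};

struct Element {
  std::string lang;       // own lang attribute; empty string stands for "absent"
  std::string className;  // a single class, to keep the sketch short
  const Element* parent = nullptr;
};

bool EqualsIgnoreASCIICase(const std::string& a, const std::string& b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::tolower(static_cast<unsigned char>(a[i])) !=
        std::tolower(static_cast<unsigned char>(b[i])))
      return false;
  }
  return true;
}

// Class selectors match ASCII-case-insensitively in quirks mode.
bool ClassMatches(const Element& el, const std::string& cls,
                  const MatchingContext& ctx) {
  return ctx.compatMode == CompatMode::Quirks
             ? EqualsIgnoreASCIICase(el.className, cls)
             : el.className == cls;
}

// Instead of a cached mLanguage: walk up to the nearest element with a lang
// attribute, only when a :lang() selector actually needs it.
std::string ComputeLanguage(const Element& el) {
  for (const Element* e = &el; e; e = e->parent) {
    if (!e->lang.empty()) return e->lang;
  }
  return std::string();
}

// :lang(wanted) matches "wanted" itself or any "wanted-..." subtag.
bool LangPseudoMatches(const Element& el, const std::string& wanted) {
  const std::string lang = ComputeLanguage(el);
  if (lang.size() < wanted.size()) return false;
  if (!EqualsIgnoreASCIICase(lang.substr(0, wanted.size()), wanted)) return false;
  return lang.size() == wanted.size() || lang[wanted.size()] == '-';
}
```

The ancestor walk in ComputeLanguage is exactly the cost that pathological :lang() testcases would expose; the bet is that :lang() is rare enough not to matter.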

That leaves mNthIndices. As a first cut we could just not cache these, but that can get a bit painful if several :nth-child() selectors that might all match a given node are around. That's the only case where the cache helps us right now, though, apart from querySelector, where we use the previous sibling's indices to good effect. What WebKit does here is cache a 17-bit index in nodes (using some spare bits they have); they cache only the :nth-child index, so there's no caching for :nth-of-type or for either of the *-last pseudo-classes.

If we wanted to, we could move this cache to either the element or to slots, but then invalidation becomes an issue. WebKit invalidates by simply triggering a reresolve of the parent on DOM mutations if one of these selectors was used. Since they reresolve in DOM order, this works fine: earlier kids' indices are recomputed before later ones, so the later ones can use the earlier cache to figure out theirs. We reresolve along the frame tree, and with XBL the frame-tree order does not match DOM order, so we would not in fact have the needed earlier/later guarantee.
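For reference, a minimal sketch of WebKit-style index caching under hypothetical types (Node, NthChildIndex, and ClearCachedIndices are illustrative names, not WebKit's or Gecko's code). It stores a 1-based :nth-child position in 17 spare bits and reuses the nearest previous sibling's cached index, which is what makes resolving in DOM order cheap and an out-of-order walk expensive.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical node. The index lives in 17 otherwise-spare bits next to
// unrelated flags; 0 means "not cached".
struct Node {
  Node* prevSibling = nullptr;
  std::uint32_t otherFlags : 15;   // stands in for unrelated bits in the word
  std::uint32_t cachedIndex : 17;  // 1-based :nth-child position
  Node() : otherFlags(0), cachedIndex(0) {}
};

constexpr std::uint32_t kMaxCachedIndex = (1u << 17) - 1;

// Returns the 1-based :nth-child position of `node`. Walks back only as far
// as the nearest sibling with a cached index and reports how many siblings
// it visited through `siblingsVisited`.
std::uint32_t NthChildIndex(Node* node, std::uint32_t* siblingsVisited) {
  assert(node && siblingsVisited);
  std::uint32_t index = 1;
  *siblingsVisited = 0;
  for (Node* s = node->prevSibling; s; s = s->prevSibling) {
    ++*siblingsVisited;
    if (s->cachedIndex != 0) {
      index += s->cachedIndex;  // distance walked + that sibling's position
      break;
    }
    ++index;
  }
  if (index <= kMaxCachedIndex) node->cachedIndex = index;  // else stay uncached
  return index;
}

// Crude invalidation standing in for "reresolve the parent on mutation":
// drop every cached index among the parent's children.
void ClearCachedIndices(const std::vector<Node*>& children) {
  for (Node* c : children) c->cachedIndex = 0;
}
```

Resolving children front to back visits at most one previous sibling each. After invalidation, asking for the last child first walks all the way back to the first child, which is the cost a non-DOM-order (frame tree plus XBL) resolution would keep paying.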

So apart from mNthIndices and maybe mContentState, I think this struct can just go away. If we can figure out a good plan for mNthIndices, I think we should just kill it off....

Thoughts?

-Boris
_______________________________________________
dev-tech-layout mailing list
https://lists.mozilla.org/listinfo/dev-tech-layout
