Selector matching and caching (aka kill RuleProcessorData)

Boris Zbarsky Wed, 06 Jan 2010 14:10:22 -0800

I've been doing some more profiling of selector matching and looking at webkit's implementation a bit, and one difference is that they don't have an equivalent of our RuleProcessorData setup: their selector matching uses only data that's available on the node itself, or caches data on the node or on its RenderStyle. In particular, any cache used there persists across possibly multiple style resolutions and is not dynamically allocated.

I tried hacking up SelectorMatches/SelectorMatchesTree to just match directly on the node instead of on RuleProcessorData and changed various callsites to not actually allocate RuleProcessorData structs, more or less. This sped up the SlickSpeed querySelector test by a good bit (40% or so), as expected, since in that case we only match each data against one selector so the caching is not worth it. I also tried the patch on a complete reframe of the HTML5 single-page spec, and it looks like it's a slight win there too (order of 50ms out of 1200ms). This test is more interesting, since this is the situation the RuleProcessorData is supposed to help with... however it might be that namespace+tag+id+classes are fast enough to get and eliminate enough selectors that in practice the more complicated caching isn't worth it, at least on this page.
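To make the "match directly on the node" idea concrete, here is a minimal sketch of filtering on namespace+tag+id+classes without any intermediate data struct. Everything in it is hypothetical: Element, SimpleSelector, and MatchesCheapParts are illustration-only names, not the real nsIContent or nsCSSSelector APIs, and the real SelectorMatches does a good deal more than this.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM element (not Gecko's nsIContent).
struct Element {
  int namespaceID;
  std::string tag;
  std::string id;
  std::vector<std::string> classes;
};

// Hypothetical simple selector. Empty fields mean "don't care".
struct SimpleSelector {
  int namespaceID = -1;  // -1 matches any namespace
  std::string tag;
  std::string id;
  std::vector<std::string> classes;
};

// Reads everything straight off the element; nothing is allocated or cached.
// Each test is cheap, and a failure bails out before the later ones run.
inline bool MatchesCheapParts(const Element& e, const SimpleSelector& s) {
  if (s.namespaceID != -1 && s.namespaceID != e.namespaceID) return false;
  if (!s.tag.empty() && s.tag != e.tag) return false;
  if (!s.id.empty() && s.id != e.id) return false;
  for (const std::string& c : s.classes) {
    if (std::find(e.classes.begin(), e.classes.end(), c) == e.classes.end()) {
      return false;
    }
  }
  return true;
}

// Usage: div.b matches, div#other.b does not.
inline void UsageExample() {
  Element e{3, "div", "main", {"a", "b"}};
  SimpleSelector s;
  s.tag = "div";
  s.classes = {"b"};
  assert(MatchesCheapParts(e, s));
  s.id = "other";
  assert(!MatchesCheapParts(e, s));
}
```

If most selectors get rejected by checks like these, the more expensive pseudo-class and combinator work only runs for the few that survive, which is the case where per-element caching would buy the least.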

Now it _is_ possible to create testcases where such a non-caching approach will be a lot slower than what we have now; I'm just not sure how common they are in practice.

Looking in detail into what RuleProcessorData stores (after sdwilsh lands his async history stuff) we have:


* mPresContext -- only needed to allocate the parent/prevsibling data,
                  so can go away if RuleProcessorData does.
* mContent -- would become an argument
* mParentContent -- cheap to get
* mRuleWalker -- only needed for the rulehash enumeration, NOT for
                 selector matching.  Rulehash enumeration could keep
                 using a struct that has the rulewalker, prescontext
                 and maybe a few other things.
* mScopedRoot -- Need to figure out the right place to stash this.
                 Most importantly, this is tied to the rule being
                 matched, not to the element or selector.
* mContentTag -- cheap to get
* mContentID -- cheap enough to get; can be made cheaper
* mIsHTMLContent -- cheap to get
* mIsHTML -- involves a check on the document, but the document boolean
             here is invariant across a wide range of things (e.g. all
             of style resolution for a node, or an entire querySelector
             invocation), so can be passed around to selector-matching
             code explicitly.
* mHasAttributes -- in practice, cheap enough to get
* mCompatMode -- can be passed around like the document HTML boolean;
                 doesn't depend on element.
* mNameSpaceID -- cheap to get
* mClasses -- cheap enough; can be made cheaper
* mPreviousSiblingData -- would go away
* mParentData -- would go away
* mLanguage -- could just be computed each time in the rare cases it's
               needed, I think.  It _would_ be possible to write
               pathological testcases that are slower as a result, but
               I don't think we care.
* mNthIndices -- see below
* mContentState -- could be cached in the node, I think... Or we could
                   stop using a bitfield here and use the webkit setup
                   of explicit boolean getters plus casts to subclasses
                   in some cases (e.g. for the form control states).
                   Or we could just get it each time we need it (for
                   pseudo-class matches only).  It's usually not THAT
                   expensive to compute, and not needed that much.
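As a rough sketch of the "pass it around explicitly" option for mIsHTML and mCompatMode: the per-invocation invariants could live in a small context struct that is built once and handed to the matching code. All names here (MatchContext, TagMatches, IdMatches) are made up for illustration, and the case-sensitivity rules shown are a simplified assumption about what the two flags would be consulted for.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical per-invocation context: built once per style resolution or
// querySelector call, since neither flag depends on the element.
struct MatchContext {
  bool isHTMLDocument;  // plays the role of the document check behind mIsHTML
  bool quirksMode;      // plays the role of mCompatMode
};

// Hypothetical stand-in for a DOM element.
struct Element {
  std::string tag;
  std::string id;
};

inline bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
  if (a.size() != b.size()) return false;
  for (std::string::size_type i = 0; i < a.size(); ++i) {
    if (std::tolower(static_cast<unsigned char>(a[i])) !=
        std::tolower(static_cast<unsigned char>(b[i]))) {
      return false;
    }
  }
  return true;
}

// Assumption for this sketch: tags compare case-insensitively in HTML
// documents and case-sensitively otherwise.
inline bool TagMatches(const Element& e, const std::string& tag,
                       const MatchContext& ctx) {
  return ctx.isHTMLDocument ? EqualsIgnoreCase(e.tag, tag) : e.tag == tag;
}

// Assumption for this sketch: IDs compare case-insensitively in quirks mode.
inline bool IdMatches(const Element& e, const std::string& id,
                      const MatchContext& ctx) {
  return ctx.quirksMode ? EqualsIgnoreCase(e.id, id) : e.id == id;
}

// Usage: the same element matched under two different contexts.
inline void UsageExample() {
  Element e{"div", "Main"};
  MatchContext html{true, false};
  MatchContext xml{false, false};
  assert(TagMatches(e, "DIV", html));
  assert(!TagMatches(e, "DIV", xml));
}
```

The point is only that the context is computed once by the caller and never stored per element, so nothing here needs the parent/prevsibling data chain.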

That leaves mNthIndices. As a first cut we could just not cache these, but that can lead to a bit of pain if multiple :nth-child() selectors are around that might all match a given node. That's the only case it helps us right now, though, except in querySelector where we use the previous sibling's indices to good effect. What Webkit does here is to cache a 17-bit index in nodes (using some spare bits they have); they only cache the :nth-child index, so there's no caching for nth-of-type or either of the *-last pseudos. If we wanted to, we could move this cache to either the element or to slots, but then invalidation becomes an issue. Webkit invalidates by simply triggering a reresolve on the parent on DOM mutations if one of these selectors was used; since they reresolve along the DOM this works fine (earlier kids' indices are recomputed before later ones, so the later ones can use the earlier cache to figure out theirs). We reresolve along the frame tree, and XBL makes it such that the order here does not match DOM order. So we would not in fact have the needed earlier/later guarantee.
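For illustration, a node-cached :nth-child index that reuses an earlier sibling's cached value could look roughly like the sketch below. Node, cachedIndex, NthChildIndex, and InvalidateFrom are hypothetical names; this is neither Webkit's code nor ours. It also assumes the sibling chain holds only elements, and the 17-bit limit is modelled as a plain cap on what gets stored.

```cpp
#include <cassert>

// Hypothetical element node with a small cached 1-based :nth-child index.
struct Node {
  Node* prevSibling = nullptr;
  Node* nextSibling = nullptr;
  int cachedIndex = 0;  // 0 means "not cached"
};

// Largest index that fits in a 17-bit cache slot.
constexpr int kMaxCachedIndex = (1 << 17) - 1;

// Walks backwards until it reaches a sibling with a cached index (or runs
// off the start of the list), then caches the result on this node.
inline int NthChildIndex(Node* node) {
  assert(node != nullptr);
  int index = 0;
  for (Node* cur = node; cur != nullptr; cur = cur->prevSibling) {
    if (cur->cachedIndex != 0) {
      index += cur->cachedIndex;  // reuse the earlier sibling's answer
      break;
    }
    ++index;
  }
  if (index <= kMaxCachedIndex) {
    node->cachedIndex = index;
  }
  return index;
}

// After a DOM mutation at or before `first`, every cached index from there
// to the end of the sibling list may be stale, so drop them all.
inline void InvalidateFrom(Node* first) {
  for (Node* cur = first; cur != nullptr; cur = cur->nextSibling) {
    cur->cachedIndex = 0;
  }
}
```

The backwards walk is cheap only when earlier siblings were resolved first. If siblings are visited out of DOM order, as with our frame-tree reresolve, each lookup can degrade to a full walk, and stale caches become a correctness problem unless invalidation is done eagerly, as InvalidateFrom does here.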

So apart from mNthIndices and maybe mContentState, I think this struct can just go away. If we can figure out a good plan for mNthIndices, I think we should just kill it off....


Thoughts?

-Boris
_______________________________________________
dev-tech-layout mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-tech-layout
