I've been doing some more profiling of selector matching and looking at
WebKit's implementation a bit. One difference is that they don't have
an equivalent of our RuleProcessorData setup: their selector matching
uses only data that's available on the node itself, or data cached on
the node or on its RenderStyle. In particular, any cache used there
persists across possibly multiple style resolutions and is not
dynamically allocated.
I tried hacking up SelectorMatches/SelectorMatchesTree to match directly
on the node instead of on RuleProcessorData, and changed various
callsites to (more or less) not allocate RuleProcessorData structs at
all. This sped up the SlickSpeed querySelector test by a good bit (40%
or so). That's expected: in that case we only match each
RuleProcessorData against one selector, so the caching is not worth it.
I also tried the patch on a complete reframe of the HTML5 single-page
spec, and it looks like a slight win there too (on the order of 50ms out
of 1200ms). This test is more interesting, since it's the situation
RuleProcessorData is supposed to help with. However, it might be that
namespace+tag+id+classes are fast enough to get, and eliminate enough
selectors, that in practice the more complicated caching isn't worth
it, at least on this page.
Now it _is_ possible to create testcases where such a non-caching
approach will be a lot slower than what we have now; I'm just not sure
how common they are in practice.
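To make the experiment concrete, here is a minimal sketch, in a deliberately simplified model, of matching straight off the element: the cheap namespace/tag/id/class checks read fields on the element, and a descendant combinator walks parent pointers instead of a chain of heap-allocated parent data. The Element and Selector types and both function signatures are hypothetical, not Gecko's actual SelectorMatches/SelectorMatchesTree; case sensitivity, attributes and pseudo-classes are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model; not Gecko's classes or signatures.
struct Element {
  int nameSpaceID;
  std::string tag;
  std::string id;                    // empty if the element has no id
  std::vector<std::string> classes;
  const Element* parent;             // nullptr for the root
};

// A compound selector like "div#main.b"; empty fields match anything.
struct Selector {
  int nameSpaceID;                   // -1 means "any namespace"
  std::string tag;
  std::string id;
  std::vector<std::string> classes;
};

// Cheap checks read directly off the element; nothing is allocated per element.
bool SelectorMatches(const Element& el, const Selector& sel) {
  if (sel.nameSpaceID != -1 && sel.nameSpaceID != el.nameSpaceID) return false;
  if (!sel.tag.empty() && sel.tag != el.tag) return false;
  if (!sel.id.empty() && sel.id != el.id) return false;
  for (const std::string& cls : sel.classes) {
    if (std::find(el.classes.begin(), el.classes.end(), cls) == el.classes.end())
      return false;
  }
  return true;
}

// "ancestor subject": walk parent pointers instead of a chain of parent data structs.
bool MatchesDescendant(const Element& el, const Selector& ancestor,
                       const Selector& subject) {
  if (!SelectorMatches(el, subject)) return false;
  for (const Element* p = el.parent; p != nullptr; p = p->parent) {
    if (SelectorMatches(*p, ancestor)) return true;
  }
  return false;
}
```

Nothing is cached between calls here, so a long descendant chain re-walks the ancestors for every element it is tested against; that is the kind of case where a non-caching approach can lose.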
Looking in detail at what RuleProcessorData stores (once sdwilsh lands
his async history stuff), we have:
* mPresContext -- only needed to allocate the parent/prevsibling data,
so can go away if RuleProcessorData does.
* mContent -- would become an argument
* mParentContent -- cheap to get
* mRuleWalker -- only needed for the rulehash enumeration, NOT for
selector matching. Rulehash enumeration could keep
using a struct that has the rulewalker, prescontext
and maybe a few other things.
* mScopedRoot -- Need to figure out the right place to stash this.
Most importantly, this is tied to the rule being
matched, not to the element or selector.
* mContentTag -- cheap to get
* mContentID -- cheap enough to get; can be made cheaper
* mIsHTMLContent -- cheap to get
* mIsHTML -- involves a check on the document, but the document boolean
here is invariant across a wide range of things (e.g. all
of style resolution for a node, or an entire querySelector
invocation), so can be passed around to selector-matching
code explicitly.
* mHasAttributes -- in practice, cheap enough to get
* mCompatMode -- can be passed around like the document HTML boolean;
doesn't depend on element.
* mNameSpaceID -- cheap to get
* mClasses -- cheap enough; can be made cheaper
* mPreviousSiblingData -- would go away
* mParentData -- would go away
* mLanguage -- could just be computed each time in the rare cases it's
needed, I think. It _would_ be possible to write
pathological testcases that are slower as a result, but
I don't think we care.
* mNthIndices -- see below
* mContentState -- could be cached in the node, I think... Or we could
stop using a bitfield here and use the webkit setup
of explicit boolean getters plus casts to subclasses
in some cases (e.g. for the form control states).
Or we could just get it each time we need it (for
pseudo-class matches only). It's usually not THAT
expensive to compute, and not needed that much.
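One way the leftover pieces could be split, sketched with hypothetical names: document-level values (the HTML-document boolean and compat mode) live in a small context struct built once per style resolution or querySelector call and passed by const reference, per-element values are read off the element, and content state is computed only when a pseudo-class actually asks for it. The case-insensitivity rules modeled here are simplified, the state bits are made up, and the rulehash enumeration struct is only declared, not implemented.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical sketch; none of these names are Gecko's.
enum class CompatMode { Standards, Quirks };

// Invariant for a whole style resolution or querySelector invocation,
// so it is computed once and passed explicitly instead of stored per element.
struct MatchContext {
  bool isHTMLDocument;
  CompatMode compatMode;
};

// Rulehash enumeration keeps its own struct; selector matching never sees it.
struct PresContext;  // opaque in this sketch
struct RuleWalker;   // opaque in this sketch
struct RuleHashEnumData {
  PresContext* presContext;
  RuleWalker* ruleWalker;
  MatchContext match;
};

struct Element {
  bool isHTMLElement;      // element is in the XHTML namespace
  std::string localName;
  std::string className;   // a single class, to keep the sketch short
  bool disabled;           // input to the (hypothetical) state computation
};

static bool EqualsIgnoreASCIICase(const std::string& a, const std::string& b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::tolower(static_cast<unsigned char>(a[i])) !=
        std::tolower(static_cast<unsigned char>(b[i])))
      return false;
  }
  return true;
}

// Simplified rule: HTML elements in HTML documents match type selectors
// case-insensitively; everything else matches case-sensitively.
bool TagMatches(const Element& el, const std::string& tag, const MatchContext& ctx) {
  if (ctx.isHTMLDocument && el.isHTMLElement)
    return EqualsIgnoreASCIICase(el.localName, tag);
  return el.localName == tag;
}

// Class selectors match case-insensitively in quirks mode.
bool ClassMatches(const Element& el, const std::string& cls, const MatchContext& ctx) {
  if (ctx.compatMode == CompatMode::Quirks)
    return EqualsIgnoreASCIICase(el.className, cls);
  return el.className == cls;
}

// Content state is computed on demand, only for pseudo-class matches.
constexpr unsigned kStateDisabled = 1u << 0;  // hypothetical state bits
constexpr unsigned kStateEnabled = 1u << 1;
int gStateComputations = 0;                   // instrumentation for the sketch

unsigned ComputeContentState(const Element& el) {
  ++gStateComputations;
  return el.disabled ? kStateDisabled : kStateEnabled;
}

bool MatchesDisabledPseudoClass(const Element& el) {
  return (ComputeContentState(el) & kStateDisabled) != 0;
}
```

The point of the split is that MatchContext is filled in once per invocation, so per-element matching never touches the document to find out whether it's HTML or in quirks mode.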
That leaves mNthIndices. As a first cut we could just not cache these,
but that can lead to a bit of pain if there are multiple :nth-child()
selectors that might all match a given node. That's the only case where
the cache helps us right now, though, except in querySelector, where we
use the previous sibling's indices to good effect. What WebKit does here
is cache a 17-bit index in nodes (using some spare bits they have); they
only cache the :nth-child index, so there's no caching for
:nth-of-type() or either of the *-last pseudos. If we wanted to, we
could move this cache to either the element or slots, but then
invalidation becomes an issue. WebKit invalidates by simply triggering a
reresolve on the parent on DOM mutations if one of these selectors was
used; since they reresolve in DOM order this works fine (earlier kids'
indices are recomputed before later ones, so the later ones can use the
earlier kids' cached indices to figure out theirs). We reresolve along
the frame tree, and XBL makes the frame-tree order not match DOM order,
so we would not in fact have the needed earlier/later guarantee.
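A minimal sketch of a node-resident :nth-child() index cache along the lines of what WebKit is described as doing. The 17-bit width comes from that description; the struct layout, the 15 "other" flag bits, the treatment of every sibling as an element, and all function names are hypothetical. Resolving siblings in DOM order lets each node stop at its previous sibling's cached index; resolving them in reverse order (standing in for any non-DOM order, such as a frame tree reordered by XBL) finds no cached earlier siblings and degrades to a full walk.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical node layout: 17 spare bits hold a cached 1-based :nth-child index.
struct Node {
  Node* prevSibling;
  std::uint32_t otherFlags : 15;     // stand-in for flags the node already has
  std::uint32_t nthIndexCache : 17;  // 0 means "not cached"
  Node() : prevSibling(nullptr), otherFlags(0), nthIndexCache(0) {}
};

constexpr std::uint32_t kMaxCachedIndex = (1u << 17) - 1;
int gSiblingSteps = 0;  // counts sibling hops, to show the cache's effect

// Index among siblings as used by :nth-child(an+b); reuses the nearest
// earlier sibling's cached index when one exists.
std::uint32_t NthChildIndex(Node& node) {
  std::uint32_t index = 1;
  for (Node* sib = node.prevSibling; sib != nullptr; sib = sib->prevSibling) {
    ++gSiblingSteps;
    if (sib->nthIndexCache != 0) {
      index += sib->nthIndexCache;
      break;
    }
    ++index;
  }
  if (index <= kMaxCachedIndex) node.nthIndexCache = index;  // skip if it won't fit
  return index;
}

// Invalidation on a DOM mutation under the parent: drop every child's
// cached index; the caller then re-resolves the children.
void InvalidateNthIndexCache(const std::vector<Node*>& children) {
  for (Node* child : children) child->nthIndexCache = 0;
}
```

With five siblings, resolving in DOM order takes 4 sibling hops in total; after invalidation, resolving the same siblings in reverse order takes 10 (4+3+2+1).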
So apart from mNthIndices and maybe mContentState, I think this struct
can just go away. If we can figure out a good plan for mNthIndices, I
think we should just kill it off.
Thoughts?
-Boris
_______________________________________________
dev-tech-layout mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-tech-layout