On 4/28/15 2:59 AM, Glen Huang wrote:
> Looking at the microbenchmark again, for Gecko, getElementById is around
> 300x faster than querySelector('#id'), and even getElementsByClassName is
> faster than it.
This is why one should not write microbenchmarks. ;) Or at least if
one does, examine the results very carefully.
The numbers you see for the getElementById benchmark there are on the
order of 2e9 operations per second, yes? And modern desktop/laptop CPUs
are clocked somewhere in the 2-4 GHz range. So what you're seeing is
that the benchmark claims the operation is performed in 1-2 clock
cycles. This should seem unlikely if you think the operation involves a
hashtable lookup!
What's happening there is that Gecko happens to know at JIT compile time
in this microbenchmark:
1) The bareword lookup is going to end up at the global, because there
is nothing on the scope chain that would shadow the "document" name.
2) The global has an own property named "document" whose getter is
side-effect-free.
3) The return value of the "document" property has only been observed
to be a Document.
4) Looking up "getElementById" on the return value of the "document"
property has consistently found it on Document.prototype.
5) Document.prototype.getElementById is known to be side-effect-free.
6) The return value of getElementById is not used (assigned to a
function-local variable that is then not used).
The upshot of all that is that with a few guards both the
"getElementById" call and the "document" get can be dead-code
eliminated here. And even if you stored the value somewhere persistent
they could both still be loop-hoisted in this case. So what this
getElementById benchmark measures is how fast a loop counter can be
decremented from some starting value to 0. It happens that this can be
done in about 1-2 clock cycles per loop iteration.
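To make the shape of this concrete, here is a sketch of what such a benchmark loop looks like and what it effectively becomes once the elimination happens. The "document" here is a hypothetical Map-backed stub (so the snippet runs outside a browser); in a real page the JIT observes the same pattern on the real document:

```javascript
// Hypothetical stand-in for the browser's id hashtable and document object.
const idTable = new Map([["list", { id: "list" }]]);
const document = {
  getElementById(id) {
    return idTable.get(id) ?? null; // the hashtable lookup
  },
};

// The microbenchmark shape: the result is assigned and never used.
function bench(iterations) {
  for (let i = 0; i < iterations; i++) {
    const el = document.getElementById("list"); // dead value
  }
}

// Once the six conditions above hold, the call and the "document" get can be
// dead-code eliminated, so the loop the CPU actually runs is just a counter:
function benchAfterDce(iterations) {
  for (let i = iterations; i > 0; i--) {} // ~1-2 cycles per iteration
}
```

So the measured "2e9 operations per second" is really the throughput of benchAfterDce, not of any hashtable lookup.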
OK, so what about querySelector("#id") vs getElementsByClassName?
In the former case, loop-hoisting and dead code elimination are
disallowed because querySelector can throw. That means that you can't
eliminate it, and you can't move it past other things that might have
observable side effects (like the counter increment). Arguably this is
a misfeature in the design of querySelector.
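The throwing behavior is easy to see: per spec, querySelector must throw on a selector string that fails to parse. A minimal stub of that contract (hypothetical; a real implementation parses full CSS selectors, and querySelectorStub is an invented name for illustration):

```javascript
// Accept only a simple "#id" selector here; anything else counts as invalid
// for this sketch, mirroring how an unparseable selector throws in a browser.
function querySelectorStub(selector) {
  if (!/^#[A-Za-z][\w-]*$/.test(selector)) {
    throw new SyntaxError("'" + selector + "' is not a valid selector");
  }
  return null; // no match in our empty "document"
}

let threw = false;
try {
  querySelectorStub("##oops"); // invalid selector
} catch (e) {
  threw = e instanceof SyntaxError;
}
```

Because any given call might take that throwing path, the engine cannot prove the call is side-effect-free, which is exactly what rules out eliminating or hoisting it.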
In the latter case, loop-hoisting or dead code elimination can't happen
because Gecko doesn't know enough about what [0] will do so assumes the
worst: that it can have side-effects that can affect what the "document"
getter returns as well as what the getElementsByClassName() call returns.
So there are no shortcuts here; you have to actually do the calls. What
do those calls do?
querySelector does a hashtable lookup for the selector to find a parsed
selector. Then it sets up some state that's needed for selector
matching. Then it detects that the selector's right-hand-most bit has a
simple ID selector and does a fast path that involves looking up that id
in the hashtable and then comparing the selector to the elements that
are returned until one of them matches.
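Roughly, the steps above look like this. Everything here is a toy stand-in for Gecko internals (parsedCache, idTable, and the matcher are all hypothetical names and simplifications):

```javascript
const parsedCache = new Map(); // selector text -> parsed selector
const idTable = new Map();     // id -> array of elements with that id

// Stand-in for full selector matching; a real matcher walks combinators,
// classes, attributes, etc. Here we just accept the candidate.
function fullSelectorMatches(el, selectorText) {
  return true;
}

function querySelectorFastPath(selectorText) {
  // 1) Hashtable lookup for an already-parsed selector.
  let parsed = parsedCache.get(selectorText);
  if (!parsed) {
    // Toy parse: just record whether the rightmost bit is a simple id selector.
    const m = selectorText.match(/#([\w-]+)$/);
    parsed = { text: selectorText, rightmostId: m ? m[1] : null };
    parsedCache.set(selectorText, parsed);
  }
  // 2) Rightmost bit is a simple id: look the id up in the hashtable instead
  //    of walking the whole tree (this also covers "#foo > #bar"-style cases).
  if (parsed.rightmostId !== null) {
    const candidates = idTable.get(parsed.rightmostId) || [];
    // 3) Compare the full selector against each candidate until one matches.
    for (const el of candidates) {
      if (fullSelectorMatches(el, parsed.text)) return el;
    }
    return null;
  }
  return null; // otherwise: fall back to a full tree walk (not sketched)
}

// Hypothetical usage:
idTable.set("list", [{ id: "list" }]);
```

The point of the sketch is the two hashtable lookups (parsed-selector cache, then id table) plus the candidate-matching loop; those are the pieces whose costs show up in the numbers below.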
getElementsByClassName has to do a hashtable lookup on the class name,
then return the result. Then it has to do the [0] (which is actually
surprisingly expensive, by the way, because of the proxy machinery
involved on the JS engine side).
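A toy version of those two steps, for contrast (hypothetical; a real HTMLCollection is a live, proxy-backed object, which is where the surprising cost of the [0] comes from):

```javascript
// Stand-in for the class-name hashtable.
const classTable = new Map([["item", [{ className: "item" }]]]);

function getElementsByClassNameStub(name) {
  return classTable.get(name) || []; // one hashtable lookup, result returned as-is
}

// The "[0]": in this sketch it's a plain array index, but on a real live
// collection it goes through the engine's proxy machinery.
const first = getElementsByClassNameStub("item")[0];
```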
So we _could_ make querySelector faster here by adding another special
case for selectors that are _just_ an id as opposed to the existing
optimization (which works for "#foo > #bar" and similar as well). And
of course the new special case would only work the way you want for
document.querySelector, not element.querySelector; the latter needs to
check for your result being a descendant of the element anyway. It's a
tradeoff between complexity of implementation (which has its own
maintenance _and_ performance costs) and real-life use cases.
Lastly, I'd like to put numbers to this. On this particular testcase,
the querySelector("#list") call takes about 100ns on my hardware: about
300 CPU cycles. We could add that other set of special-casing and get
it down to 70ns (I just checked by implementing it, so this is not a
random guess). At that point you've got two hashtable lookups (which we
could try to make faster, perhaps), the logic to detect that the
optimization can be done at all (which is not that trivial; our selector
representation requires a bunch of checks to ensure that it's just an id
selector), and whatever work is involved in the binding layer. In this
case, those all seem to have about the same cost; about 17-18ns (50 CPU
cycles) each.
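Sanity-checking those numbers, assuming a ~3 GHz clock (an assumption on my part; the figures above only pin the range at 2-4 GHz), so 1 ns is about 3 cycles:

```javascript
const GHZ = 3; // assumed clock speed, in cycles per nanosecond
const nsToCycles = (ns) => ns * GHZ;

// 100 ns for the unoptimized querySelector call -> 300 cycles, as stated.
const originalCycles = nsToCycles(100);

// Four roughly equal pieces (two hashtable lookups, the fast-path detection
// logic, and the binding layer) at ~17.5 ns each account for the 70 ns left
// after the extra special-casing:
const optimizedNs = 4 * 17.5;
```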
So is your use case one where the difference between querySelector
costing 100ns and it costing 70ns actually makes a difference?
> It doesn't look like it benefits much from an eagerly populated hash table?
It benefits a good bit for non-toy documents where avoiding walking the
entire DOM is the important part of the optimization. Again,
microbenchmarks mostly serve to highlight the costs of constant-time
overhead, which is a useful thing to do, as long as you know that's what
you're doing. But for real-life testcases algorithmic complexity can
often be much more important.
-Boris
P.S. Other engines make different complexity/speed tradeoffs here; for
example Safari's querySelector with ".foo" and "#list" is about 60ns on
my hardware. I know they have extra fast paths for both those cases.