On Thu, Oct 15, 2009 at 2:33 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> I don't think we do any branch tuning on the PQ insertion -- the ifs
> involved in re-heapifying the PQ are simply hard for the CPU to
> predict (though, apparently, not as hard as comparing strings ;).
>

But it does look like you do some nice short-circuting where you compare
to bottom instead of doing the naive insertWithOverflow() and check the
result, which is cool.


> > The single-PQ approach requires only O(K * log(K) * log(M))
> > insertions, but each one requires, instead of O(log(K)) int compares,
> > O(log(K)) *string* compares, and a segment conversion.
>
> Right, except, for a large segment, if it has enough insertions, as
> more entries in the PQ belong to that segment, more of the comparisons
> become ints.
>

Assuming you're processing the larger segments first, as Yonik said
was done, then this won't typically help very much, right?  If the
second segment is much smaller than the first, then not as many
replacements will happen, leaving the majority of the PQ having
entries from the previous segment.  Additionally - as you move to further
and further segments, whether they are the same size or smaller than
previous, the bottom element of the PQ is getting better and better, and
so you're chances of actually filling up even a significant minority of it
with
entries from the current segment is very small.

Looking back through LUCENE-1483, we were seeing much more sizable
> gains in the new API vs the old single-PQ-only-int-comparison approach
> (which should be faster than both the single and multi PQ approaches
> we are discussing here).  BUT: those tests combined searching by
> segment AND the new FieldCollector API, plus other more basic code
> speedups.  Though it'd be odd if the switch to searching by segment
> really was most of the gains here.  I'm confused (for now) on where
> the gains we were seeing in that issue came from...
>

You don't give yourself enough credit here: it could certainly that the
perf improvement was from more basic code-level speedups, as we're
only dealing with the sub-leading behavior here, and the constants
look like they most certainly matter!

  -jake

Reply via email to