Re: RangeFilter performance problem using MultiReader

Yonik Seeley Sat, 11 Apr 2009 14:22:26 -0700

OK, I think this will improve the situation:
https://issues.apache.org/jira/browse/LUCENE-1596


-Yonik
http://www.lucidimagination.com


On Fri, Apr 10, 2009 at 1:47 PM, Michael McCandless
<[email protected]> wrote:
> We never fully explained it, but we have some ideas...
>
> It's only if you iterate each term, and do a TermDocs.seek for each,
> that Multi*Reader seems to show the problem.  Just iterating the terms
> seems OK (I have a 51 segment index, and I can iterate ~ 10M unique
> terms in ~8 seconds).
>
> But loading FieldCache, or doing eg RangeQuery, also does a
> MultiTermDocs.seek on each term, which in turn calls
> SegmentTermDocs.seek for each of the sub-readers in sequence.  I
> *think* maybe for highly unique terms, where typically only one
> segment actually has the term, the cost of invoking seek on the
> segments without the term is high.  Really, somehow, we want to only
> call seek on those segments that have the term, which we know from the
> pqueue...
>
> Mike
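
To make the idea in the quoted message concrete, here is a rough sketch
of using the merge queue to decide which segments need a seek. It is not
Lucene code and not the LUCENE-1596 patch: SegmentCursor and countSeeks
are hypothetical names, and each "segment" is just a sorted list of
strings standing in for a term dictionary. It only counts how many seek
calls each strategy would issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SegmentMergeSketch {

    /** Hypothetical stand-in for one segment's sorted term enumeration. */
    static final class SegmentCursor {
        final int segment;
        final List<String> terms;
        int pos = 0;

        SegmentCursor(int segment, List<String> terms) {
            this.segment = segment;
            this.terms = terms;
        }

        String current() { return terms.get(pos); }

        /** Moves to the next term; returns false when the segment is exhausted. */
        boolean advance() { return ++pos < terms.size(); }
    }

    /**
     * Merges the per-segment term lists through a priority queue and counts
     * simulated seeks. Result[0]: one seek per segment for every unique term.
     * Result[1]: one seek only for the segments the queue shows hold the term.
     */
    public static int[] countSeeks(List<List<String>> segments) {
        PriorityQueue<SegmentCursor> queue =
            new PriorityQueue<>(Comparator.comparing(SegmentCursor::current));
        for (int i = 0; i < segments.size(); i++) {
            if (!segments.get(i).isEmpty()) {
                queue.add(new SegmentCursor(i, segments.get(i)));
            }
        }

        int seekAll = 0;
        int seekMatching = 0;
        while (!queue.isEmpty()) {
            String term = queue.peek().current();

            // Every cursor at the head of the queue with this term is a
            // segment that actually contains it.
            List<SegmentCursor> matching = new ArrayList<>();
            while (!queue.isEmpty() && queue.peek().current().equals(term)) {
                matching.add(queue.poll());
            }

            seekAll += segments.size();       // seek on every sub-reader
            seekMatching += matching.size();  // seek only where the term exists

            // Advance the matching cursors and put the live ones back.
            for (SegmentCursor cursor : matching) {
                if (cursor.advance()) {
                    queue.add(cursor);
                }
            }
        }
        return new int[] { seekAll, seekMatching };
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
            Arrays.asList("a", "d"),
            Arrays.asList("b", "d"),
            Arrays.asList("c"));
        int[] seeks = countSeeks(segments);
        System.out.println("seek on all segments:      " + seeks[0]);
        System.out.println("seek on matching segments: " + seeks[1]);
    }
}
```

With three segments and four unique terms, seeking everywhere costs 12
calls while seeking only on matching segments costs 5. The gap grows with
the segment count when most terms live in a single segment.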

