OK, I think this will improve the situation: https://issues.apache.org/jira/browse/LUCENE-1596
-Yonik
http://www.lucidimagination.com

On Fri, Apr 10, 2009 at 1:47 PM, Michael McCandless <[email protected]> wrote:
> We never fully explained it, but we have some ideas...
>
> It's only if you iterate each term, and do a TermDocs.seek for each,
> that Multi*Reader seems to show the problem. Just iterating the terms
> seems OK (I have a 51 segment index, and I can iterate ~10M unique
> terms in ~8 seconds).
>
> But loading FieldCache, or doing e.g. RangeQuery, also does a
> MultiTermDocs.seek on each term, which in turn calls
> SegmentTermDocs.seek for each of the sub-readers in sequence. I
> *think* maybe for highly unique terms, where typically all segments
> but one lack the term, the cost of invoking seek on those
> segments without the term is high. Really, somehow, we want to only
> call seek on those segments that have the term, which we know from the
> pqueue...
>
> Mike
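
To make the idea in the quoted message concrete, here is a rough, self-contained sketch. It does not use the Lucene API: Segment, Cursor, naiveSeeks and targetedSeeks are hypothetical names invented for illustration, and this is not necessarily what the LUCENE-1596 patch does. It only counts seek calls under two strategies: seeking every segment for every term, versus seeking only the segments that a priority-queue merge of the per-segment term lists reports as holding the current term.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeMap;
import java.util.TreeSet;

class SeekSketch {
    /** Hypothetical stand-in for one segment: sorted term -> doc ids. */
    static final class Segment {
        final TreeMap<String, int[]> postings = new TreeMap<>();
        Segment add(String term, int... docs) { postings.put(term, docs); return this; }
        /** Returns the doc ids for the term, or null if this segment lacks it. */
        int[] seek(String term) { return postings.get(term); }
    }

    /** Position of one segment in the merged term enumeration. */
    static final class Cursor {
        final Segment segment;
        final Iterator<String> terms;
        String current;
        Cursor(Segment s) { segment = s; terms = s.postings.keySet().iterator(); advance(); }
        boolean advance() { current = terms.hasNext() ? terms.next() : null; return current != null; }
    }

    /** Seeks every segment for every term in the union; returns the seek count. */
    static int naiveSeeks(List<Segment> segments) {
        TreeSet<String> allTerms = new TreeSet<>();
        for (Segment s : segments) allTerms.addAll(s.postings.keySet());
        int seeks = 0;
        for (String term : allTerms) {
            for (Segment s : segments) { s.seek(term); seeks++; }
        }
        return seeks;
    }

    /** Seeks only the segments the queue says hold the current term. */
    static int targetedSeeks(List<Segment> segments) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.current));
        for (Segment s : segments) {
            Cursor c = new Cursor(s);
            if (c.current != null) queue.add(c);
        }
        int seeks = 0;
        while (!queue.isEmpty()) {
            String term = queue.peek().current;
            // Every cursor at the head with this term belongs to a segment that has it.
            List<Cursor> matching = new ArrayList<>();
            while (!queue.isEmpty() && queue.peek().current.equals(term)) matching.add(queue.poll());
            for (Cursor c : matching) {
                c.segment.seek(term);
                seeks++;
                if (c.advance()) queue.add(c);
            }
        }
        return seeks;
    }

    /** Three small segments whose terms are mostly unique to one segment. */
    static List<Segment> sample() {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment().add("a", 0).add("b", 1));
        segments.add(new Segment().add("c", 0));
        segments.add(new Segment().add("b", 2).add("d", 0));
        return segments;
    }

    public static void main(String[] args) {
        System.out.println("naive seeks: " + naiveSeeks(sample()));
        System.out.println("targeted seeks: " + targetedSeeks(sample()));
    }
}
```

On this toy data the naive loop makes 12 seek calls (4 distinct terms times 3 segments) while the queue-driven loop makes 5, one per (segment, term) pair that actually exists. The gap grows with the number of segments when most terms live in a single segment.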
