[ https://issues.apache.org/jira/browse/LUCENE-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558386#action_12558386 ]
Paul Elschot commented on LUCENE-893:
-------------------------------------
I think the differing results of 26 May 2007 for conjunction queries and
disjunction queries may be caused by the use of TermScorer.skipTo() in
conjunctions and TermScorer.next() in disjunctions.
That suggests different optimal buffer sizes for conjunctions (smaller, because
of the skipping) and for disjunctions (larger, because all postings are going
to be needed).
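To make the skipTo()/next() difference concrete, here is a small in-memory model. It is a sketch under stated assumptions, not Lucene code: `Postings`, `conjunction`, `disjunction` and `entriesRead` are hypothetical names, postings are plain sorted int arrays, and skipTo() uses a binary search as a stand-in for real skip data. With a rare term in 3 documents and a common term in 1000, the leapfrogging conjunction touches only 4 of the common term's postings, while the merging disjunction touches all 1000. That asymmetry is the reason a large read-ahead buffer would mostly be wasted in the first case and pay off in the second.

```java
import java.util.Arrays;

/** Illustrative model, not Lucene code: hypothetical postings that count entries touched. */
public class ConjunctionVsDisjunction {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    static final class Postings {
        private final int[] docs; // sorted, distinct doc ids
        private int pos = -1;     // current index; -1 means not started
        int entriesRead = 0;      // postings actually touched so far

        Postings(int... docs) { this.docs = docs; }

        int doc() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }

        /** Step to the next posting, in the spirit of TermScorer.next(). */
        boolean next() {
            pos++;
            if (pos < docs.length) { entriesRead++; return true; }
            pos = docs.length;
            return false;
        }

        /** Jump to the first doc >= target beyond the current one, in the spirit of
         *  TermScorer.skipTo(); entries jumped over are not counted as read. */
        boolean skipTo(int target) {
            int from = pos + 1;
            if (from >= docs.length) { pos = docs.length; return false; }
            int idx = Arrays.binarySearch(docs, from, docs.length, target);
            pos = idx >= 0 ? idx : -idx - 1;
            if (pos < docs.length) { entriesRead++; return true; }
            return false;
        }
    }

    static int[] range(int n) {
        int[] docs = new int[n];
        for (int i = 0; i < n; i++) docs[i] = i;
        return docs;
    }

    /** Conjunction: leapfrog the two lists with skipTo(); returns the match count. */
    static int conjunction(Postings a, Postings b) {
        if (!a.next() || !b.next()) return 0;
        int matches = 0;
        while (true) {
            if (a.doc() == b.doc()) {
                matches++;
                if (!a.next()) return matches;
            } else if (a.doc() < b.doc()) {
                if (!a.skipTo(b.doc())) return matches;
            } else {
                if (!b.skipTo(a.doc())) return matches;
            }
        }
    }

    /** Disjunction: merge the two lists with next(), visiting every posting. */
    static int disjunction(Postings a, Postings b) {
        boolean moreA = a.next(), moreB = b.next();
        int matches = 0;
        while (moreA || moreB) {
            int da = moreA ? a.doc() : NO_MORE_DOCS;
            int db = moreB ? b.doc() : NO_MORE_DOCS;
            int d = Math.min(da, db);
            matches++;
            if (da == d) moreA = a.next();
            if (db == d) moreB = b.next();
        }
        return matches;
    }

    public static void main(String[] args) {
        Postings rare = new Postings(5, 500, 900), common = new Postings(range(1000));
        System.out.println("AND matches=" + conjunction(rare, common)
                + ", common-term postings read=" + common.entriesRead);
        rare = new Postings(5, 500, 900);
        common = new Postings(range(1000));
        System.out.println("OR matches=" + disjunction(rare, common)
                + ", common-term postings read=" + common.entriesRead);
    }
}
```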
LUCENE-430 is about reducing the term buffer size when the buffer would not be
used completely because only a small number of documents contain the term.
Overall, I think it makes sense to let the (conjunction/disjunction)Scorer
choose the maximum buffer size for the term, and to let the term itself choose
a lower value when it needs less than that.
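The proposed policy amounts to taking a minimum: the scorer proposes an upper bound and the term caps it at what its postings actually need. The sketch below is hypothetical throughout: `TermBufferPolicy`, `ScorerKind`, `bufferSizeFor` and the byte values are illustrative assumptions, not Lucene APIs or measured optima.

```java
/**
 * Hypothetical sketch of the proposed buffer-size policy; these names are
 * illustrative and do not exist in Lucene.
 */
public class TermBufferPolicy {

    /** Each scorer kind proposes a maximum buffer size (values are made up). */
    enum ScorerKind {
        CONJUNCTION(1024),   // skipTo()-driven: much of a large buffer would be skipped over
        DISJUNCTION(16384);  // next()-driven: all postings are read, so a larger buffer helps

        final int maxBufferBytes;

        ScorerKind(int maxBufferBytes) { this.maxBufferBytes = maxBufferBytes; }
    }

    /**
     * The scorer supplies the upper bound; the term lowers it when its postings
     * need fewer bytes (the LUCENE-430 case of a term in few documents).
     */
    static int bufferSizeFor(ScorerKind kind, long termPostingsBytes) {
        return (int) Math.min(kind.maxBufferBytes, Math.max(1L, termPostingsBytes));
    }

    public static void main(String[] args) {
        System.out.println(bufferSizeFor(ScorerKind.DISJUNCTION, 300));        // rare term: 300
        System.out.println(bufferSizeFor(ScorerKind.DISJUNCTION, 5_000_000L)); // common term: 16384
        System.out.println(bufferSizeFor(ScorerKind.CONJUNCTION, 5_000_000L)); // common term: 1024
    }
}
```

Keeping the decision in one place like this would let the scorer type and the term's document frequency be tuned independently.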
Another way to promote sequential reading for disjunction queries is to process
their terms sequentially, i.e. one term at a time. In Lucene this is currently
done by Filters for prefix and range queries. Unfortunately this cannot be done
when the combined frequency of the terms in each document is needed. In that
case DisjunctionSumScorer could be used, with larger buffers on the terms that
occur in many documents.
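A term-at-a-time evaluation of a disjunction can be sketched as below. This is not Lucene's Filter API; `TermAtATimeFilter` and `anyTermMatches` are hypothetical names, and postings are assumed to be plain int arrays of doc ids. Each term's postings are scanned to the end before the next term is started, which is what makes the reads sequential. The result only records membership, which illustrates why this approach does not fit when the combined per-document frequency is needed.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene's Filter API): a disjunction evaluated one
 * term at a time, as a filter for prefix or range queries might do it.
 */
public class TermAtATimeFilter {

    /**
     * Reads each term's postings completely before moving to the next term, so
     * every postings list is scanned sequentially. Only document membership is
     * recorded; how often, or by how many terms, a document matched is lost.
     */
    static BitSet anyTermMatches(List<int[]> postingsPerTerm, int maxDoc) {
        BitSet matches = new BitSet(maxDoc);
        for (int[] termDocs : postingsPerTerm) { // one term at a time
            for (int doc : termDocs) {           // sequential scan of that term
                matches.set(doc);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<int[]> postings = Arrays.asList(new int[] {1, 3}, new int[] {3, 7});
        BitSet hits = anyTermMatches(postings, 10);
        System.out.println(hits); // {1, 3, 7}: doc 3 matched two terms, but the bitset cannot show that
    }
}
```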
> Increase buffer sizes used during searching
> -------------------------------------------
>
> Key: LUCENE-893
> URL: https://issues.apache.org/jira/browse/LUCENE-893
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.1
> Reporter: Michael McCandless
>
> Spinoff of LUCENE-888.
> In LUCENE-888 we increased buffer sizes that impact indexing and found
> substantial (10-18%) overall performance gains.
> It's very likely that we can also gain some performance for searching
> by increasing the read buffers in BufferedIndexInput used by
> searching.
> We need to test performance impact to verify and then pick a good
> overall default buffer size, also being careful not to add too much
> overall HEAP RAM usage because a potentially very large number of
> BufferedIndexInput instances are created during searching
> (# segments X # index files per segment).
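The heap concern in the issue description is a simple product: instances = (# segments) × (# index files per segment), and buffer heap = instances × buffer size. The sketch below only does that arithmetic; `SearchBufferHeapEstimate`, `bufferHeapBytes` and the 20-segment, 10-file index are hypothetical, and it counts exactly one buffer per instance as the description does.

```java
/** Hypothetical back-of-the-envelope estimate of read-buffer heap for one open index. */
public class SearchBufferHeapEstimate {

    /** (# segments) x (# index files per segment) x buffer size, in bytes. */
    static long bufferHeapBytes(int segments, int filesPerSegment, int bufferSizeBytes) {
        return (long) segments * filesPerSegment * bufferSizeBytes;
    }

    public static void main(String[] args) {
        int segments = 20, filesPerSegment = 10; // illustrative index shape, not measured
        for (int size : new int[] {1024, 4096, 16384}) {
            System.out.printf("buffer %6d bytes -> %8d bytes of buffers%n",
                    size, bufferHeapBytes(segments, filesPerSegment, size));
        }
    }
}
```

For this made-up index, going from 1024-byte to 16384-byte buffers raises the total from 204800 to 3276800 bytes, which is the kind of growth the description warns to keep an eye on.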
--
This message is automatically generated by JIRA.