[jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean

Michael McCandless (JIRA) Tue, 19 May 2009 12:58:18 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710851#action_12710851 ]


Michael McCandless commented on LUCENE-1614:
--------------------------------------------

{quote}
> This would save CPU for scorers that merge multiple sub-scorers (like 
> BooleanScorer/2), because instead of having to check for -1 returned from 
> each sub-scorer, they could simply proceed with their normal logic and check 
> for Integer.MAX_VALUE just before collecting the doc.

But for scorers that use a priority queue, does checking and immediately 
removing from the queue (hence making the heap smaller) offer any advantages? I 
had assumed so since this is what current scorers do. Immediately removing 
scorers also causes early termination for minimumNrMatchers>1 in 
DisjunctionSumScorer.
{quote}

But that only helps at the tail end of the iteration, vs. saving an if
check for every sub-scorer on every next() call?

I.e., presumably much more CPU is spent iterating while the PQ is full
than while it's winding down, so saving the if on each sub-scorer
next() is the better trade?
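
To make the trade-off concrete, here is a rough sketch of the sentinel idea. It is not the actual Lucene code: the class names, the array-backed iterator and the linear min scan (used here instead of a PQ to keep it short) are all assumptions for illustration. The point is only that the merge loop never tests a sub-iterator for exhaustion; it checks for Integer.MAX_VALUE once, just before collecting.

```java
import java.util.ArrayList;
import java.util.List;

final class SentinelMergeSketch {
    // Sentinel assumed by this sketch: an exhausted iterator reports Integer.MAX_VALUE.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a sub-scorer: walks a sorted int[] of doc IDs. */
    static final class ArrayIterator {
        private final int[] docs;
        private int pos = -1;

        ArrayIterator(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        /** Advances and returns the new current doc, or NO_MORE_DOCS when exhausted. */
        int nextDoc() {
            pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Union of all posting lists in increasing doc order, without duplicates. */
    static List<Integer> union(int[][] postings) {
        int n = postings.length;
        ArrayIterator[] subs = new ArrayIterator[n];
        int[] current = new int[n];
        for (int i = 0; i < n; i++) {
            subs[i] = new ArrayIterator(postings[i]);
            current[i] = subs[i].nextDoc();
        }
        List<Integer> hits = new ArrayList<>();
        while (true) {
            // Normal merge logic: exhausted subs simply sort last, no per-sub check.
            int min = NO_MORE_DOCS;
            for (int i = 0; i < n; i++) {
                min = Math.min(min, current[i]);
            }
            // The only exhaustion check, made once just before collecting.
            if (min == NO_MORE_DOCS) {
                break;
            }
            hits.add(min);
            // Advance every sub positioned on the collected doc.
            for (int i = 0; i < n; i++) {
                if (current[i] == min) {
                    current[i] = subs[i].nextDoc();
                }
            }
        }
        return hits;
    }
}
```

With a real PQ the same property should hold: an exhausted sub-scorer just sinks to the bottom of the heap, and the loop stops when the top is the sentinel.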

Also, I think over time we should migrate away from the PQ (i.e., use
BooleanScorer's batch approach, not Disjunction*Scorer's PQ) since the
batch scoring approach gives better performance.  E.g., I think we
should extend BooleanScorer to handle MUST clauses.  BooleanScorer
handles doc=Integer.MAX_VALUE for a sub-scorer quite efficiently (the
chunk is always skipped for that sub-scorer, after one if check).
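
A simplified sketch of what I mean by the batch approach follows. This is an illustration only, not BooleanScorer's actual implementation: the class name, the tiny CHUNK size and the array-backed posting lists are assumptions, and doc IDs are assumed non-negative. Each sub-iterator fills a small bucket table for the current chunk; one whose current doc is Integer.MAX_VALUE fails the range test once per chunk and is skipped.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchUnionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // sentinel assumed by this sketch
    static final int CHUNK = 8;                        // arbitrary small window, for illustration

    /** Returns {doc, matchCount} pairs for every doc matched by at least one list. */
    static List<int[]> collect(int[][] postings) {
        int n = postings.length;
        int[] pos = new int[n];
        int[] current = new int[n];
        for (int i = 0; i < n; i++) {
            current[i] = postings[i].length > 0 ? postings[i][0] : NO_MORE_DOCS;
        }
        List<int[]> hits = new ArrayList<>();
        int[] counts = new int[CHUNK];
        while (true) {
            // Find the next chunk that contains at least one doc.
            int min = NO_MORE_DOCS;
            for (int c : current) {
                min = Math.min(min, c);
            }
            if (min == NO_MORE_DOCS) {
                break;
            }
            int base = min - (min % CHUNK);
            // Clamp so the sentinel itself is never inside a chunk.
            int end = (int) Math.min((long) base + CHUNK, NO_MORE_DOCS);
            for (int i = 0; i < n; i++) {
                // An exhausted sub has current[i] == NO_MORE_DOCS, so this
                // test fails immediately: one if check per chunk.
                while (current[i] < end) {
                    counts[current[i] - base]++;
                    pos[i]++;
                    current[i] = pos[i] < postings[i].length
                            ? postings[i][pos[i]]
                            : NO_MORE_DOCS;
                }
            }
            // Flush the bucket table in doc order and reset it for the next chunk.
            for (int b = 0; b < CHUNK; b++) {
                if (counts[b] > 0) {
                    hits.add(new int[] {base + b, counts[b]});
                    counts[b] = 0;
                }
            }
        }
        return hits;
    }
}
```

The per-doc match count in the bucket is what a MUST or minimum-match check could be evaluated against at flush time, though how that would fit into BooleanScorer is not worked out here.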


> Add next() and skipTo() variants to DocIdSetIterator that return the current 
> doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch
>
>
> See 
> http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
>  for the full discussion. The basic idea is to add variants to those two 
> methods that return the current doc they are at, to save successive calls to 
> doc(). If there are no more docs, return -1. A summary of what was discussed 
> so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI 
> (calls next() and skipTo() respectively, and will be changed to abstract in 
> 3.0).
> #* I actually would like to propose an alternative to the names: advance() 
> and advance(int) - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead 
> of comparing to -1 for improved performance.
> I will post a patch shortly

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
