[jira] Commented: (LUCENE-1652) Enhancements to Scorers following the changes to DocIdSetIterator

Michael McCandless (JIRA) Mon, 25 May 2009 09:29:11 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712745#action_12712745 ]


Michael McCandless commented on LUCENE-1652:
--------------------------------------------

bq. I'm not sure about it. In 3.0, we'll make nextDoc() abstract (for sure, 
since the default impl calls next()) and probably advance() also. So when you 
upgrade to 2.9, you can switch to calling nextDoc() and advance(), but if you 
implemented DISI, you won't be required to implement nextDoc() and/or 
advance(), so when you upgrade to 3.0 your code won't compile.

You're right: once nextDoc & advance become abstract in 3.0, your code
won't compile after upgrading and you'd have to go fix any custom
DISIs you have.  But if we leave doc() as is, nothing forces you to
revisit it.  You just implement nextDoc/advance and think you're
done...

bq. When upgrading, I think we should assume (or even require) that users read 
CHANGES. When they notice that DISI has changed and that they need to implement 
two new methods, they should also notice the change in semantics of doc().

Relying only on this (seeing CHANGES.txt) is what makes me nervous.

bq. I take it that by "catastrophic" you mean that you're ok with people 
upgrading to 3.0 and finding that their code doesn't compile, since that will 
force them to read CHANGES or javadocs and understand what they are now 
supposed to implement. Therefore if document() documents the new semantics, it 
is ok for us to rely on that, and if something fails, it's the user's problem.

Right, that's what I mean by "catastrophic" (note: Marvin used it
first, but I like it ;) ).  But I want the catastrophe to apply
specifically to doc() as well, so that you are forced to implement it
as a newly named method.  I.e., I'm hoping that the extra step of
having a newly named method is enough to get you to go and understand
that we subtly changed its semantics.

bq. If we add document() (note the longer method name, compared to doc()) we 
can implement it following the new semantics and take advantage of that in 2.9 
already (I think?).

Exactly.  Another benefit of this approach (besides bringing
catastrophe) is that we can do all of this in 2.9, including taking
advantage of the new semantics, which is great.
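
To make the renaming idea concrete, here is a minimal sketch of what a 3.0-style iterator could look like. Everything in it is an assumption made for illustration: SketchDocIterator and ArrayDocIterator are hypothetical names, not Lucene classes, and the choice of -1 as the "not started" value is only a convention of this sketch. The point is that a subclass which still only overrides doc() would fail to compile, because the abstract document() would be left unimplemented.

```java
// Hypothetical 3.0-style shape; the real DocIdSetIterator may differ.
abstract class SketchDocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Newly named accessor. In this sketch it returns -1 before iteration
    // starts (an assumption) and NO_MORE_DOCS once the iterator is exhausted.
    abstract int document();

    abstract int nextDoc();

    abstract int advance(int target);
}

// A custom iterator over a sorted array of doc IDs.
class ArrayDocIterator extends SketchDocIterator {
    private final int[] docs;
    private int idx = -1;

    ArrayDocIterator(int... sortedDocs) {
        this.docs = sortedDocs.clone();
    }

    // If this method were still named doc(), the class would not compile:
    // the abstract document() would have no implementation.
    @Override
    int document() {
        if (idx < 0) {
            return -1;
        }
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    @Override
    int nextDoc() {
        if (idx < docs.length) {
            idx++;
        }
        return document();
    }

    @Override
    int advance(int target) {
        // Always moves at least one step, then stops at the first doc >= target.
        // Terminates because NO_MORE_DOCS is never smaller than any target.
        int doc;
        do {
            doc = nextDoc();
        } while (doc < target);
        return doc;
    }
}
```

Callers can then loop on the returned value alone, without a separate "has more" flag, since the sentinel compares greater than every real doc ID.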

bq. If this indeed should work, where should I do it - in this issue (I need 
1614 to be committed first) or in 1614?

I think we should do this as another iteration of the patch on LUCENE-1614?


> Enhancements to Scorers following the changes to DocIdSetIterator
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1652
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1652
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 3.0
>
>
> In LUCENE-1614, we changed the semantics of DocIdSetIterator's methods to 
> return a sentinel NO_MORE_DOCS (= Integer.MAX_VALUE) when the iterator is 
> exhausted. Due to backward compatibility issues, we couldn't implement those 
> semantics in doc(). Therefore this issue, which can be introduced in 3.0 only, 
> will:
> # Implement the new semantics in all extending classes, such that doc() will 
> return NO_MORE_DOCS when the iterator is exhausted.
> # Change BooleanScorer to take advantage of that by removing sub.done from 
> SubScorer and operating under the assumption that NO_MORE_DOCS 
> (Integer.MAX_VALUE) is larger than any doc ID.
> # Change ConjunctionScorer to operate under the same assumptions and remove 
> 'more'.
> # Change ReqExclScorer to not rely on reqScorer in doc(), since the latter 
> may be null.
> # Make more changes to ConjunctionScorer's init() and remove 'firstTime' to 
> improve the performance of nextDoc(), score(), advance().
> # Add start()/finish() to DISI?
> A snippet from LUCENE-1614 regarding the change in BooleanScorer. Change this:
> {code}
> int doc = sub.done ? -1 : scorer.doc();
> while (!sub.done && doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
>   sub.done = doc < 0;
> }
> {code}
> To this:
> {code}
> int doc = scorer.doc();
> while (doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
> }
> {code}
> And in ConjunctionScorer, change this:
> {code}
> while (more && (firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   more = firstScorer.advance(lastDoc) >= 0;
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return more;
> {code}
> To this:
> {code}
> while ((firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   firstScorer.advance(lastDoc);
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return lastDoc != NO_MORE_DOCS;
> {code}
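
The ConjunctionScorer change quoted above can be tried out in isolation with a simplified model. The sketch below is not Lucene code: SentinelConjunctionSketch, Iter, and intersect are hypothetical names, the iterator starts positioned on its first doc for brevity, and its advance() may stay on the current doc if that doc already satisfies the target. It only illustrates how a sentinel larger than any doc ID lets the round-robin loop drop the 'more' flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Simplified model with hypothetical names; not Lucene's ConjunctionScorer.
class SentinelConjunctionSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Minimal iterator over sorted doc IDs, positioned on its first doc.
    // doc() reports NO_MORE_DOCS once the array is exhausted.
    static final class Iter {
        private final int[] docs;
        private int idx = 0;

        Iter(int... sortedDocs) {
            docs = sortedDocs.clone();
        }

        int doc() {
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        int nextDoc() {
            if (idx < docs.length) {
                idx++;
            }
            return doc();
        }

        int advance(int target) {
            // Stops on exhaustion because NO_MORE_DOCS >= any target.
            while (doc() < target) {
                idx++;
            }
            return doc();
        }
    }

    // Collects every doc ID that appears in all of the given iterators.
    static List<Integer> intersect(Iter... scorers) {
        // Sort by current doc so the last scorer holds the largest doc.
        Arrays.sort(scorers, Comparator.comparingInt(Iter::doc));
        List<Integer> hits = new ArrayList<>();
        int first = 0;
        Iter lastScorer = scorers[scorers.length - 1];
        while (true) {
            Iter firstScorer;
            int lastDoc;
            // Same shape as the quoted loop: no 'more' flag is needed.
            while ((firstScorer = scorers[first]).doc() < (lastDoc = lastScorer.doc())) {
                firstScorer.advance(lastDoc);
                lastScorer = firstScorer;
                first = (first == scorers.length - 1) ? 0 : first + 1;
            }
            if (lastDoc == NO_MORE_DOCS) {
                return hits;
            }
            hits.add(lastDoc);
            lastScorer.nextDoc(); // move past the match and realign
        }
    }
}
```

Once any iterator is exhausted, its doc() compares greater than every real doc ID, so the remaining iterators are advanced to the sentinel as well and the outer loop exits on the single comparison against NO_MORE_DOCS.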

