[
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848896#action_12848896
]
Shai Erera commented on LUCENE-2215:
------------------------------------
I've reviewed PagingCollector.java and the first thing I have to say about it
is that I really like it ! :) Saves lots of unnecessary heapify code, if the
application can allow itself to store the lowest last SD.
I have few comments/questions.
I don't understand what getLastScoreDoc is for? Is it just a utility method? Is
it something the app can compute by itself? Anyway, it lacks javadocs, so
perhaps if they existed I wouldn't need to ask ;).
In collect(), there's the following code:
{code}
} else if (score == previousPassLowest.score && doc <=
previousPassLowest.doc) {
// if the scores are the same and the doc is less than
or equal to
// the
// previous pass lowest hit doc then skip because this
collector
// favors
// lower number documents.
return;
{code}
I think there's a typo in the comment "favors lower number documents" .. while
it seems to prefer higher doc IDs? The way I understand it, irregardless of
whether docs are collected in/out of order, HitQueue ensures that when scores
are equals, the lowest IDs are favored. Thus the first round always keeps the
lowest IDs among the docs whose scores match. The next round will favor the
docs whose IDs come next, and so forth ... am I right? (just clarifying my
understanding).
If that's the case, I think it'll be good if it's spelled out in the comment,
and also mention that it means that document has already been returned
previously (like it's documented in the previous 'if').
The last 'else' really looks like TSDC's out-of-order version, which makes me
think whether PagingCollector can be viewed as a filter on top of TSDC (and
possibly even TopFieldCollector)? So if a hit should be collected, it just
calls super.collect? I realize though that a Collector is a hotspot and we want
to minimize 'if' let alone method call statements as much as possible. But it
just feels so strong that it should be a filter ... :). And you wouldn't need
to specifically handle in/out orderness ... and w/ the right design, it can
also wrap a TFC or any other TDC implementation ...
BTW, I've noticed that you don't track maxScore - is it assumed that the
application stores it from the first round? If so I'd document it, because the
application needs to know it should use TSDC the first round, and
PagingCollector the second round.
Also, PagingCollector offers a ctor which does not force the application to
pass in a ScoreDoc. See my comment from above - it might be misleading, because
if you use this collector right from the very first search, you lose the
maxScore tracking. I also don't see why it should be allowed - if a dummy
previousPassLowest ScoreDoc is used, collect() does a lot of unnecessary 'if's.
I think this collector should be used only from the second round, and a single
ctor which forces a ScoreDoc to be passed would make more sense. If the
application wishes to shoot itself in the leg (performance-wise), it can pass a
dummy SD itself.
> paging collector
> ----------------
>
> Key: LUCENE-2215
> URL: https://issues.apache.org/jira/browse/LUCENE-2215
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Affects Versions: 2.4, 3.0
> Reporter: Adam Heinz
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: IterablePaging.java, PagingCollector.java,
> TestingPagingCollector.java
>
>
> http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
> Somebody assign this to Aaron McCurry and we'll see if we can get enough
> votes on this issue to convince him to upload his patch. :)
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]