[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-04 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960055#comment-13960055
 ] 

ASF subversion and git services commented on LUCENE-5527:
-

Commit 1584747 from jpou...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1584747 ]

LUCENE-5527: Refactor Collector API to use a dedicated Collector per leaf.

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch, LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959340#comment-13959340
 ] 

Hoss Man commented on LUCENE-5527:
--

bq. I dont think we have to "give up" on the close() idea.

Well, I only brought it up because i didn't realize the full context of 
existing discussion that already took place in LUCENE-5299 and it seemed like 
an "easy" win if we were already going to change the api/lifecycle of Collector 
-- but i don't want to derail Adrien's current goal.  

If, as you say, ...

bq. ...it (adding Closable) just has its own issue and its own set of 
problems

...that seems like good grounds for it to be tackled in it's own Jira, and let 
this one (splitting out LeafCollector) move forward first.

(Adrien: sorry for stiring up so much shit :( )

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959326#comment-13959326
 ] 

Robert Muir commented on LUCENE-5527:
-

I dont think we have to "give up" on the close() idea. it just has its own 
issue and its own set of problems. probably a more annoying/larger issue than 
this one to get right. I think its ok to just tackle it there...

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959324#comment-13959324
 ] 

Hoss Man commented on LUCENE-5527:
--

bq. Its controversial. ...

Fair enough, i retract the suggestion.

Off my list of questions, that just leaves...

bq. is setNextReader really an appropriate name anymore? or should it be 
something like getLeafCollector(AtomicReaderContext)

...which Adrien seemed to be on board with.  Any other concerns?

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959300#comment-13959300
 ] 

Robert Muir commented on LUCENE-5527:
-

{quote}
 Even if parallelization is controversial, adding close() doesn't seem to be, 
so why not do all of the known desirable, non-controversial, Collector API 
changes at once?
{quote}

Its controversial. Its going to be some work to do it correctly, and its 
arguably completely unnecessary (after IS.search returns, you are done with the 
collector).

I described these problems here: LUCENE-4370

If we can add close() in a way thats consistent, fine. But adding this 
"lifecycle" stuff in a way that is inconsistent is very bad. We need semantics 
that can be relied upon.

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959280#comment-13959280
 ] 

Adrien Grand commented on LUCENE-5527:
--

I think there are things to discuss regarding proper wrapping and how to deal 
with exceptions that are thrown during collection: should close/finish be 
called in case an exception is thrown during collection (do we want to do the 
same in case of an IOException or a TimeExceededException)?

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959254#comment-13959254
 ] 

Hoss Man commented on LUCENE-5527:
--

bq. The patch only moves 3 methods of the Collector class into a new 
LeafCollector class that gets instantiated per-segment. It doesn't add any 
functionnality but just changes the way documents are collected.

sure ... but "in for a penny in for a pound" ... this change is going to 
require every existing implementation of "Collector" to change, so *if* we're 
already going to require that of all existing classes, why not also add a 
{{close()}} method (that can easily be a No-Op for most existing 
implementations) at the same time?

bq. I think having callbacks upon collection completion would be useful and 
help clean up code as Shikhar Bhushan mentionned. But this LeafCollector 
refactoring has already been stalled in LUCENE-5299 because LUCENE-5299 also 
discussed parallelizing collection so I would like to keep this change as small 
as possible and get it in.

Fair enough -- as i said, i don't have strong opinions about any of this, i 
just wanted to raise the question.  But personally i think that adding 
{{Closable}} here and now as part of this API change issue makes sense (since 
it's about the "life cycle" of the Collector/LeafCollector) independent of any 
question of parallelization.,  Even if parallelization is controversial, adding 
{{close()}} doesn't seem to be, so why not do all of the known desirable, 
non-controversial, Collector API changes at once?

If i'm wrong, and there is some dissent about whether {{close()}} makes sense 
on these classes, then by all means yes: ignore the suggestion and focus purely 
on what you've got in the current patch.

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959252#comment-13959252
 ] 

Adrien Grand commented on LUCENE-5527:
--

Uwe, are you talking about the {{LeafReader}} or the {{LeafCollector}} renaming?

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959246#comment-13959246
 ] 

Uwe Schindler commented on LUCENE-5527:
---

bq. That surprised me that it was marked as "experimental"! It's been around in 
it's current form for 3 years, and anyone doing anything "interesting" with 
Lucene would most likely be using it. Perhaps this change should only be 
targeted toward trunk (5.0)?

+1 - Yes, please. This would make upgrading a pain for lot of people. Like the 
renaming issue, I would only ever do this for 5.0. Why should we do it before 
5.0? It is mainly a refactoring.

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959225#comment-13959225
 ] 

Yonik Seeley commented on LUCENE-5527:
--

bq. I had initially thought about doing this change for 5.0 only but since the 
migration ends up being easy and since Collector is an experimental API, we 
could also have it for 4.8.

That surprised me that it was marked as "experimental"!  It's been around in 
it's current form for 3 years, and anyone doing anything "interesting" with 
Lucene would most likely be using it.  Perhaps this change should only be 
targeted toward trunk (5.0)?

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959218#comment-13959218
 ] 

Adrien Grand commented on LUCENE-5527:
--

The patch only moves 3 methods of the {{Collector}} class into a new 
{{LeafCollector}} class that gets instantiated per-segment. It doesn't add any 
functionnality but just changes the way documents are collected.

I think having callbacks upon collection completion would be useful and help 
clean up code as [~shikhar] mentionned. But this {{LeafCollector}} refactoring 
has already been stalled in LUCENE-5299 because LUCENE-5299 also discussed 
parallelizing collection so I would like to keep this change as small as 
possible and get it in.

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959116#comment-13959116
 ] 

Hoss Man commented on LUCENE-5527:
--

bq. Indeed there is! I think it would be nice to have it in Lucene, I would 
just like to keep this issue contained in order to make it easier to make 
progress. 

Understood -- but my point is that already in this issue:
* {{LeafCollector}} is completely new, so it can have whatever methods we think 
it needs -- if it should have {{close()}} let's give it close.
* we're discussing major changes to both the api and the functional semantics 
of {{Collector}} -- so if the semantics are on the table, and we think having 
{{close()}} is an appropriate semantic for Collector to have (in it's new 
context as a think that returns {{LeafCollectors}}) then lets make sure that's 
on the table as well.

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959109#comment-13959109
 ] 

Adrien Grand commented on LUCENE-5527:
--

bq. Thanks for picking this up Adrien! I always wanted to push forward at least 
the API refactoring but did not get the chance to do so.

You're welcome. :-)

bq. It's not just Solr with DelegatingCollector that has something like this, I 
think I remember seeing this pattern even in ES.

Indeed there is! I think it would be nice to have it in Lucene, I would just 
like to keep this issue contained in order to make it easier to make progress. 
We can iterate on LUCENE-4370 once this one is in!

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959101#comment-13959101
 ] 

Shikhar Bhushan commented on LUCENE-5527:
-

Thanks for picking this up Adrien! I always wanted to push forward at least the 
API refactoring but did not get the chance to do so.

+1 on adding a method like LeafCollector.done()  / finish() or such, and making 
that part of the usage contract.

It's not just Solr with DelegatingCollector that has something like this, I 
think I remember seeing this pattern even in ES.

LUCENE-5299 had this as a SubCollector.done() method and it led to a lot of 
code-cleanup at various places where we were trying to detect a transition to 
the next segment based on a call to setNextReader(). In some cases, the result 
finalization was being done lazily when result retrieval methods are being 
called, because there is no other good way of knowing that the last segment has 
been processed.

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959092#comment-13959092
 ] 

Adrien Grand commented on LUCENE-5527:
--

bq. is setNextReader really an appropriate name anymore? or should it be 
something like getLeafCollector(AtomicReaderContext)

I think this name is better indeed!

bq. Should we bite the bullet and make LeafCollector and Collector extend 
Closable ?

I would like having such callbacks too. LUCENE-4370 discusses such a change and 
the challenges.

bq. Should we go ahead and think about if/how this API should be tweaked (now 
or in the future) to allow a Collector to indicate that it's LeafCollectors can 
be used to collect documents from different segments in parallel threads

This was the main point of LUCENE-5299 that I forked this issue/patch from. But 
I think this is more controversial: I think that in general, it is a better 
idea to shard the index (even locally) and merge per-shard results using 
something like {{TopDocs.merge}} than trying to collect segments concurrently 
because it allows to have individual tasks of equal cost.

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959080#comment-13959080
 ] 

Hoss Man commented on LUCENE-5527:
--

This isn't an area of Lucene i tend to think much about, but if we're talking 
about changing the API, i have a few questions i wanted to put out there for 
discussion:

* is {{setNextReader}} really an appropriate name anymore? or should it be 
something like {{getLeafCollector(AtomicReaderContext)}}
* Should we bite the bullet and make {{LeafCollector}} and {{Collector}} extend 
{{Closable}} ?
** I believe this was discussed at one point before, and it was considered an 
obnoxious interface change with minimal gain - but if we're going to change the 
interface API now anyway, it may be worth reconsidering
** Solr works around this by always using {{DelegatingCollector}} which has a 
{{finish()}} method - i can't imagine other lucene apps don't have similar 
wishes for {{close()}} like this.
* Should we go ahead and think about if/how this API should be tweaked (now or 
in the future) to allow a {{Collector}} to indicate that it's 
{{LeafCollectors}}  can be used to collect documents from different segments in 
parallel threads? If we think a marker interface would be the best way to go, 
so we can easily add that later -- but i wanted to ask the question
** One possibility that just occurred to me: Document right now that 
{{LeafCollectors}} _may_ be processed in parallel, and that if you want to 
force single threaded collection you should implement 
{{Collector.getLeafCollector(AtomicReaderContext)}} such that it blocks until 
the previously returned {{LeafCollector}} has been closed (which should be easy 
enough to do in things like {{SimpleCollector}} using a basic sync lock, right?)

(I have no strong feelings about most of this)

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959078#comment-13959078
 ] 

Michael McCandless commented on LUCENE-5527:


+1 for LeafCollector and the patch.

I tested if there are search performance impacts from this:

{noformat}
Report after iter 10:
TaskQPS base  StdDevQPS comp  StdDev
Pct diff
 Respell   49.44  (3.3%)   48.10  (3.7%)   
-2.7% (  -9% -4%)
  Fuzzy2   46.74  (3.2%)   45.73  (3.1%)   
-2.2% (  -8% -4%)
  Fuzzy1   59.25  (3.7%)   58.08  (3.5%)   
-2.0% (  -8% -5%)
  IntNRQ3.42  (3.8%)3.40  (3.8%)   
-0.7% (  -7% -7%)
 Prefix3   86.67  (2.6%)   86.17  (2.6%)   
-0.6% (  -5% -4%)
 LowSloppyPhrase   44.44  (2.3%)   44.42  (2.5%)   
-0.1% (  -4% -4%)
Wildcard   19.08  (3.5%)   19.07  (3.0%)   
-0.1% (  -6% -6%)
  AndHighMed   34.38  (1.0%)   34.38  (1.0%)   
-0.0% (  -2% -2%)
 LowSpanNear   10.41  (3.1%)   10.41  (2.3%)
0.0% (  -5% -5%)
HighSloppyPhrase3.49  (7.9%)3.49  (6.6%)
0.1% ( -13% -   15%)
 AndHighHigh   28.35  (1.1%)   28.39  (1.0%)
0.1% (  -1% -2%)
 MedSpanNear   31.06  (2.8%)   31.12  (2.7%)
0.2% (  -5% -5%)
  AndHighLow  391.44  (2.9%)  392.73  (2.6%)
0.3% (  -5% -6%)
 MedSloppyPhrase3.54  (5.2%)3.56  (4.6%)
0.4% (  -8% -   10%)
   OrHighMed   26.51  (4.0%)   26.66  (5.7%)
0.6% (  -8% -   10%)
OrHighNotLow   24.84  (4.1%)   24.98  (5.8%)
0.6% (  -9% -   10%)
   LowPhrase   13.19  (1.6%)   13.27  (2.3%)
0.6% (  -3% -4%)
   OrHighLow   18.78  (4.1%)   18.91  (5.8%)
0.7% (  -8% -   11%)
   OrNotHighHigh8.87  (4.5%)8.93  (6.0%)
0.7% (  -9% -   11%)
OrHighNotMed   30.63  (4.1%)   30.85  (5.5%)
0.7% (  -8% -   10%)
  OrHighHigh8.21  (4.1%)8.27  (5.8%)
0.7% (  -8% -   11%)
   MedPhrase  203.10  (6.6%)  204.77  (6.3%)
0.8% ( -11% -   14%)
   OrHighNotHigh   11.09  (4.5%)   11.18  (5.9%)
0.8% (  -9% -   11%)
 LowTerm  322.74  (5.6%)  325.67  (5.6%)
0.9% (  -9% -   12%)
HighTerm   63.88 (12.8%)   64.55 (12.2%)
1.1% ( -21% -   29%)
 MedTerm  100.19  (9.8%)  101.31  (9.5%)
1.1% ( -16% -   22%)
HighSpanNear8.09  (4.0%)8.18  (4.9%)
1.1% (  -7% -   10%)
  HighPhrase4.27  (7.1%)4.32  (6.5%)
1.2% ( -11% -   15%)
OrNotHighMed   19.00  (7.0%)   19.30  (7.6%)
1.6% ( -12% -   17%)
OrNotHighLow   19.63  (7.4%)   19.96  (8.0%)
1.7% ( -12% -   18%)
{noformat}

Looks like just noise!

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> p

[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959075#comment-13959075
 ] 

Adrien Grand commented on LUCENE-5527:
--

I opened LUCENE-5569 to discuss the renaming.

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Ryan Ernst (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959036#comment-13959036
 ] 

Ryan Ernst commented on LUCENE-5527:


+1

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959030#comment-13959030
 ] 

Adrien Grand commented on LUCENE-5527:
--

I like {{LeafCollector}} better too. So let's name it this way now and open an 
issue to rename {{AtomicReader}} to {{LeafReader}} in 5.0?

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Ryan Ernst (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958984#comment-13958984
 ] 

Ryan Ernst commented on LUCENE-5527:


{quote}
I guess I wish AtomicReader were named LeafReader.
So maybe we shouldnt try to tackle that on this issue.
{quote}
I agree 'atomic' is a bad name.  Why not name this LeafCollector now, since it 
makes more sense on its own?

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958894#comment-13958894
 ] 

Robert Muir commented on LUCENE-5527:
-

That makes sense. Honestly I think my issue is not specific to Collector, but 
more general.

I guess I wish AtomicReader were named LeafReader.

So maybe we shouldnt try to tackle that on this issue.

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958891#comment-13958891
 ] 

Adrien Grand commented on LUCENE-5527:
--

I chose these names to make it clear what these classes relate to:
 - IndexReader -> Collector
 - AtomicReader -> AtomicCollector

But I'm totally open to better names!

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958871#comment-13958871
 ] 

Robert Muir commented on LUCENE-5527:
-

In my opinion we should rethink the naming here.

Collector now no longer actually collects anything, and there is no 'atomic' 
property about AtomicCollector...



> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-03-15 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936430#comment-13936430
 ] 

Shai Erera commented on LUCENE-5527:


Maybe we can get rid of setScorer, passing Scorer to 
{{setNextReader(AtomicReaderContext,Scorer)}}?

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-03-15 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936426#comment-13936426
 ] 

David Smiley commented on LUCENE-5527:
--

+1 I like it!

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org