[jira] [Updated] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5527:
-

Attachment: LUCENE-5527.patch

[~hossman_luc...@fucit.org] no worries!

[~mikemccand] Thanks for running this benchmark!

Here is an updated patch:
 - {{AtomicCollector}} has been renamed to {{LeafCollector}}
 - {{Collector.setNextReader}} has been renamed to 
{{Collector.getLeafCollector}}.

precommit and all tests pass. I plan to commit this patch to 5.0 only for the 
reasons that Yonik and Uwe mentioned.

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch, LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5527:
-

Attachment: LUCENE-5527.patch

Here is a patch, all tests pass.

As much as possible, I tried to only change the API, not implementations so 
top-docs collectors, etc. still work exactly the same way. The helper 
{{SimpleCollector}} implements both {{Collector}} and {{AtomicCollector}} and 
returns itself in {{setNextReader}}. It is useful for collectors that don't 
need to change their behavior based on the current reader context such as 
{{TotalHitCountCollector}}, but it is also quite handy for the migration since 
migrating a {{Collector}} to the new API is just a matter of extending 
{{SimpleCollector}} instead of {{Collector}} and renaming {{setNextReader}} to 
{{doSetNextReader}}.

I had initially thought about doing this change for 5.0 only but since the 
migration ends up being easy and since {{Collector}} is an experimental API, we 
could also have it for 4.8.

Feedback is highly welcome!

bq. Maybe we can get rid of setScorer, passing Scorer to 
setNextReader(AtomicReaderContext,Scorer)?

This can't be done today because there is a circular dependency between the 
{{Collector}} and the {{Scorer}}. If you look at 
{{IndexSearcher.searchsearch(List, Weight, Collector)}}, 
it needs to know whether the collector supports out-of-order collection in 
order to know which {{BulkScorer}} to instantiate, and then the {{BulkScorer}} 
needs to let the {{Collector}} know about the {{Scorer}} that is being used.

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5527) Make the Collector API work per-segment

2014-04-03 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5527:
-

Fix Version/s: 5.0

> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 5.0
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5527) Make the Collector API work per-segment

2014-03-14 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5527:
-

Description: 
Spin-off of LUCENE-5299.

LUCENE-5229 proposes different changes, some of them being controversial, but 
there is one of them that I really really like that consists in refactoring the 
{{Collector}} API in order to have a different Collector per segment.

The idea is, instead of having a single Collector object that needs to be able 
to take care of all segments, to have a top-level Collector:
{code}
public interface Collector {

  AtomicCollector setNextReader(AtomicReaderContext context) throws IOException;
  
}
{code}

and a per-AtomicReaderContext collector:

{code}
public interface AtomicCollector {

  void setScorer(Scorer scorer) throws IOException;

  void collect(int doc) throws IOException;

  boolean acceptsDocsOutOfOrder();

}
{code}

I think it makes the API clearer since it is now obious {{setScorer}} and 
{{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
otherwise unclear.

It also makes things more flexible. For example, a collector could much more 
easily decide to use different strategies on different segments. In particular, 
it makes the early-termination collector much cleaner since it can return 
different atomic collectors implementations depending on whether the current 
segment is sorted or not.

Even if we have lots of collectors all over the place, we could make it easier 
to migrate by having a Collector that would implement both Collector and 
AtomicCollector, return {{this}} in setNextReader and make current concrete 
Collector implementations extend this class instead of directly extending 
Collector.

  was:
Spin-off of LUCENE-5299.

LUCENE-5229 is large and proposes lots of different changes, some of them being 
controversial, but there is one of them that I really really like that consists 
in refactoring the {{Collector}} API in order to have a different Collector per 
segment.

The idea is, instead of having a single Collector object that needs to be able 
to take care of all segments, to have a top-level Collector:
{code}
public interface Collector {

  AtomicCollector setNextReader(AtomicReaderContext context) throws IOException;
  
}
{code}

and a per-AtomicReaderContext collector:

{code}
public interface AtomicCollector {

  void setScorer(Scorer scorer) throws IOException;

  void collect(int doc) throws IOException;

  boolean acceptsDocsOutOfOrder();

}
{code}

I think it makes the API clearer since it is now obious {{setScorer}} and 
{{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
otherwise unclear.

It also makes things more flexible. For example, a collector could much more 
easily decide to use different strategies on different segments. In particular, 
it makes the early-termination collector much cleaner since it can return 
different atomic collectors implementations depending on whether the current 
segment is sorted or not.

Even if we have lots of collectors all over the place, we could make it easier 
to migrate by having a Collector that would implement both Collector and 
AtomicCollector, return {{this}} in setNextReader and make current concrete 
Collector implementations extend this class instead of directly extending 
Collector.


> Make the Collector API work per-segment
> ---
>
> Key: LUCENE-5527
> URL: https://issues.apache.org/jira/browse/LUCENE-5527
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implemen