[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960055#comment-13960055 ] ASF subversion and git services commented on LUCENE-5527: - Commit 1584747 from jpou...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1584747 ] LUCENE-5527: Refactor Collector API to use a dedicated Collector per leaf. > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch, LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959340#comment-13959340 ] Hoss Man commented on LUCENE-5527: -- bq. I dont think we have to "give up" on the close() idea. Well, I only brought it up because i didn't realize the full context of existing discussion that already took place in LUCENE-5299 and it seemed like an "easy" win if we were already going to change the api/lifecycle of Collector -- but i don't want to derail Adrien's current goal. If, as you say, ... bq. ...it (adding Closable) just has its own issue and its own set of problems ...that seems like good grounds for it to be tackled in it's own Jira, and let this one (splitting out LeafCollector) move forward first. (Adrien: sorry for stiring up so much shit :( ) > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959326#comment-13959326 ] Robert Muir commented on LUCENE-5527: - I dont think we have to "give up" on the close() idea. it just has its own issue and its own set of problems. probably a more annoying/larger issue than this one to get right. I think its ok to just tackle it there... > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959324#comment-13959324 ] Hoss Man commented on LUCENE-5527: -- bq. Its controversial. ... Fair enough, i retract the suggestion. Off my list of questions, that just leaves... bq. is setNextReader really an appropriate name anymore? or should it be something like getLeafCollector(AtomicReaderContext) ...which Adrien seemed to be on board with. Any other concerns? > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959300#comment-13959300 ] Robert Muir commented on LUCENE-5527: - {quote} Even if parallelization is controversial, adding close() doesn't seem to be, so why not do all of the known desirable, non-controversial, Collector API changes at once? {quote} Its controversial. Its going to be some work to do it correctly, and its arguably completely unnecessary (after IS.search returns, you are done with the collector). I described these problems here: LUCENE-4370 If we can add close() in a way thats consistent, fine. But adding this "lifecycle" stuff in a way that is inconsistent is very bad. We need semantics that can be relied upon. > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959280#comment-13959280 ] Adrien Grand commented on LUCENE-5527: -- I think there are things to discuss regarding proper wrapping and how to deal with exceptions that are thrown during collection: should close/finish be called in case an exception is thrown during collection (do we want to do the same in case of an IOException or a TimeExceededException)? > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959254#comment-13959254 ] Hoss Man commented on LUCENE-5527: -- bq. The patch only moves 3 methods of the Collector class into a new LeafCollector class that gets instantiated per-segment. It doesn't add any functionnality but just changes the way documents are collected. sure ... but "in for a penny in for a pound" ... this change is going to require every existing implementation of "Collector" to change, so *if* we're already going to require that of all existing classes, why not also add a {{close()}} method (that can easily be a No-Op for most existing implementations) at the same time? bq. I think having callbacks upon collection completion would be useful and help clean up code as Shikhar Bhushan mentionned. But this LeafCollector refactoring has already been stalled in LUCENE-5299 because LUCENE-5299 also discussed parallelizing collection so I would like to keep this change as small as possible and get it in. Fair enough -- as i said, i don't have strong opinions about any of this, i just wanted to raise the question. But personally i think that adding {{Closable}} here and now as part of this API change issue makes sense (since it's about the "life cycle" of the Collector/LeafCollector) independent of any question of parallelization., Even if parallelization is controversial, adding {{close()}} doesn't seem to be, so why not do all of the known desirable, non-controversial, Collector API changes at once? If i'm wrong, and there is some dissent about whether {{close()}} makes sense on these classes, then by all means yes: ignore the suggestion and focus purely on what you've got in the current patch. > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959252#comment-13959252 ] Adrien Grand commented on LUCENE-5527: -- Uwe, are you talking about the {{LeafReader}} or the {{LeafCollector}} renaming? > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959246#comment-13959246 ] Uwe Schindler commented on LUCENE-5527: --- bq. That surprised me that it was marked as "experimental"! It's been around in it's current form for 3 years, and anyone doing anything "interesting" with Lucene would most likely be using it. Perhaps this change should only be targeted toward trunk (5.0)? +1 - Yes, please. This would make upgrading a pain for lot of people. Like the renaming issue, I would only ever do this for 5.0. Why should we do it before 5.0? It is mainly a refactoring. > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959225#comment-13959225 ] Yonik Seeley commented on LUCENE-5527: -- bq. I had initially thought about doing this change for 5.0 only but since the migration ends up being easy and since Collector is an experimental API, we could also have it for 4.8. That surprised me that it was marked as "experimental"! It's been around in it's current form for 3 years, and anyone doing anything "interesting" with Lucene would most likely be using it. Perhaps this change should only be targeted toward trunk (5.0)? > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959218#comment-13959218 ] Adrien Grand commented on LUCENE-5527: -- The patch only moves 3 methods of the {{Collector}} class into a new {{LeafCollector}} class that gets instantiated per-segment. It doesn't add any functionnality but just changes the way documents are collected. I think having callbacks upon collection completion would be useful and help clean up code as [~shikhar] mentionned. But this {{LeafCollector}} refactoring has already been stalled in LUCENE-5299 because LUCENE-5299 also discussed parallelizing collection so I would like to keep this change as small as possible and get it in. > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959116#comment-13959116 ] Hoss Man commented on LUCENE-5527: -- bq. Indeed there is! I think it would be nice to have it in Lucene, I would just like to keep this issue contained in order to make it easier to make progress. Understood -- but my point is that already in this issue: * {{LeafCollector}} is completely new, so it can have whatever methods we think it needs -- if it should have {{close()}} let's give it close. * we're discussing major changes to both the api and the functional semantics of {{Collector}} -- so if the semantics are on the table, and we think having {{close()}} is an appropriate semantic for Collector to have (in it's new context as a think that returns {{LeafCollectors}}) then lets make sure that's on the table as well. > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959109#comment-13959109 ] Adrien Grand commented on LUCENE-5527: -- bq. Thanks for picking this up Adrien! I always wanted to push forward at least the API refactoring but did not get the chance to do so. You're welcome. :-) bq. It's not just Solr with DelegatingCollector that has something like this, I think I remember seeing this pattern even in ES. Indeed there is! I think it would be nice to have it in Lucene, I would just like to keep this issue contained in order to make it easier to make progress. We can iterate on LUCENE-4370 once this one is in! > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959101#comment-13959101 ] Shikhar Bhushan commented on LUCENE-5527: - Thanks for picking this up Adrien! I always wanted to push forward at least the API refactoring but did not get the chance to do so. +1 on adding a method like LeafCollector.done() / finish() or such, and making that part of the usage contract. It's not just Solr with DelegatingCollector that has something like this, I think I remember seeing this pattern even in ES. LUCENE-5299 had this as a SubCollector.done() method and it led to a lot of code-cleanup at various places where we were trying to detect a transition to the next segment based on a call to setNextReader(). In some cases, the result finalization was being done lazily when result retrieval methods are being called, because there is no other good way of knowing that the last segment has been processed. > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959092#comment-13959092 ] Adrien Grand commented on LUCENE-5527: -- bq. is setNextReader really an appropriate name anymore? or should it be something like getLeafCollector(AtomicReaderContext) I think this name is better indeed! bq. Should we bite the bullet and make LeafCollector and Collector extend Closable ? I would like having such callbacks too. LUCENE-4370 discusses such a change and the challenges. bq. Should we go ahead and think about if/how this API should be tweaked (now or in the future) to allow a Collector to indicate that it's LeafCollectors can be used to collect documents from different segments in parallel threads This was the main point of LUCENE-5299 that I forked this issue/patch from. But I think this is more controversial: I think that in general, it is a better idea to shard the index (even locally) and merge per-shard results using something like {{TopDocs.merge}} than trying to collect segments concurrently because it allows to have individual tasks of equal cost. > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959080#comment-13959080 ] Hoss Man commented on LUCENE-5527: -- This isn't an area of Lucene i tend to think much about, but if we're talking about changing the API, i have a few questions i wanted to put out there for discussion: * is {{setNextReader}} really an appropriate name anymore? or should it be something like {{getLeafCollector(AtomicReaderContext)}} * Should we bite the bullet and make {{LeafCollector}} and {{Collector}} extend {{Closable}} ? ** I believe this was discussed at one point before, and it was considered an obnoxious interface change with minimal gain - but if we're going to change the interface API now anyway, it may be worth reconsidering ** Solr works around this by always using {{DelegatingCollector}} which has a {{finish()}} method - i can't imagine other lucene apps don't have similar wishes for {{close()}} like this. * Should we go ahead and think about if/how this API should be tweaked (now or in the future) to allow a {{Collector}} to indicate that it's {{LeafCollectors}} can be used to collect documents from different segments in parallel threads? If we think a marker interface would be the best way to go, so we can easily add that later -- but i wanted to ask the question ** One possibility that just occurred to me: Document right now that {{LeafCollectors}} _may_ be processed in parallel, and that if you want to force single threaded collection you should implement {{Collector.getLeafCollector(AtomicReaderContext)}} such that it blocks until the previously returned {{LeafCollector}} has been closed (which should be easy enough to do in things like {{SimpleCollector}} using a basic sync lock, right?) (I have no strong feelings about most of this) > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959078#comment-13959078 ] Michael McCandless commented on LUCENE-5527: +1 for LeafCollector and the patch. I tested if there are search performance impacts from this: {noformat} Report after iter 10: TaskQPS base StdDevQPS comp StdDev Pct diff Respell 49.44 (3.3%) 48.10 (3.7%) -2.7% ( -9% -4%) Fuzzy2 46.74 (3.2%) 45.73 (3.1%) -2.2% ( -8% -4%) Fuzzy1 59.25 (3.7%) 58.08 (3.5%) -2.0% ( -8% -5%) IntNRQ3.42 (3.8%)3.40 (3.8%) -0.7% ( -7% -7%) Prefix3 86.67 (2.6%) 86.17 (2.6%) -0.6% ( -5% -4%) LowSloppyPhrase 44.44 (2.3%) 44.42 (2.5%) -0.1% ( -4% -4%) Wildcard 19.08 (3.5%) 19.07 (3.0%) -0.1% ( -6% -6%) AndHighMed 34.38 (1.0%) 34.38 (1.0%) -0.0% ( -2% -2%) LowSpanNear 10.41 (3.1%) 10.41 (2.3%) 0.0% ( -5% -5%) HighSloppyPhrase3.49 (7.9%)3.49 (6.6%) 0.1% ( -13% - 15%) AndHighHigh 28.35 (1.1%) 28.39 (1.0%) 0.1% ( -1% -2%) MedSpanNear 31.06 (2.8%) 31.12 (2.7%) 0.2% ( -5% -5%) AndHighLow 391.44 (2.9%) 392.73 (2.6%) 0.3% ( -5% -6%) MedSloppyPhrase3.54 (5.2%)3.56 (4.6%) 0.4% ( -8% - 10%) OrHighMed 26.51 (4.0%) 26.66 (5.7%) 0.6% ( -8% - 10%) OrHighNotLow 24.84 (4.1%) 24.98 (5.8%) 0.6% ( -9% - 10%) LowPhrase 13.19 (1.6%) 13.27 (2.3%) 0.6% ( -3% -4%) OrHighLow 18.78 (4.1%) 18.91 (5.8%) 0.7% ( -8% - 11%) OrNotHighHigh8.87 (4.5%)8.93 (6.0%) 0.7% ( -9% - 11%) OrHighNotMed 30.63 (4.1%) 30.85 (5.5%) 0.7% ( -8% - 10%) OrHighHigh8.21 (4.1%)8.27 (5.8%) 0.7% ( -8% - 11%) MedPhrase 203.10 (6.6%) 204.77 (6.3%) 0.8% ( -11% - 14%) OrHighNotHigh 11.09 (4.5%) 11.18 (5.9%) 0.8% ( -9% - 11%) LowTerm 322.74 (5.6%) 325.67 (5.6%) 0.9% ( -9% - 12%) HighTerm 63.88 (12.8%) 64.55 (12.2%) 1.1% ( -21% - 29%) MedTerm 100.19 (9.8%) 101.31 (9.5%) 1.1% ( -16% - 22%) HighSpanNear8.09 (4.0%)8.18 (4.9%) 1.1% ( -7% - 10%) HighPhrase4.27 (7.1%)4.32 (6.5%) 1.2% ( -11% - 15%) OrNotHighMed 19.00 (7.0%) 19.30 (7.6%) 1.6% ( -12% - 17%) OrNotHighLow 19.63 (7.4%) 19.96 (8.0%) 1.7% ( -12% - 18%) {noformat} Looks like just noise! > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > p
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959075#comment-13959075 ] Adrien Grand commented on LUCENE-5527: -- I opened LUCENE-5569 to discuss the renaming. > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959036#comment-13959036 ] Ryan Ernst commented on LUCENE-5527: +1 > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959030#comment-13959030 ] Adrien Grand commented on LUCENE-5527: -- I like {{LeafCollector}} better too. So let's name it this way now and open an issue to rename {{AtomicReader}} to {{LeafReader}} in 5.0? > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958984#comment-13958984 ] Ryan Ernst commented on LUCENE-5527: {quote} I guess I wish AtomicReader were named LeafReader. So maybe we shouldnt try to tackle that on this issue. {quote} I agree 'atomic' is a bad name. Why not name this LeafCollector now, since it makes more sense on its own? > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958894#comment-13958894 ] Robert Muir commented on LUCENE-5527: - That makes sense. Honestly I think my issue is not specific to Collector, but more general. I guess I wish AtomicReader were named LeafReader. So maybe we shouldnt try to tackle that on this issue. > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958891#comment-13958891 ] Adrien Grand commented on LUCENE-5527: -- I chose these names to make it clear what these classes relate to: - IndexReader -> Collector - AtomicReader -> AtomicCollector But I'm totally open to better names! > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958871#comment-13958871 ] Robert Muir commented on LUCENE-5527: - In my opinion we should rethink the naming here. Collector now no longer actually collects anything, and there is no 'atomic' property about AtomicCollector... > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Fix For: 5.0 > > Attachments: LUCENE-5527.patch > > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936430#comment-13936430 ] Shai Erera commented on LUCENE-5527: Maybe we can get rid of setScorer, passing Scorer to {{setNextReader(AtomicReaderContext,Scorer)}}? > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment
[ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936426#comment-13936426 ] David Smiley commented on LUCENE-5527: -- +1 I like it! > Make the Collector API work per-segment > --- > > Key: LUCENE-5527 > URL: https://issues.apache.org/jira/browse/LUCENE-5527 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > > Spin-off of LUCENE-5299. > LUCENE-5229 proposes different changes, some of them being controversial, but > there is one of them that I really really like that consists in refactoring > the {{Collector}} API in order to have a different Collector per segment. > The idea is, instead of having a single Collector object that needs to be > able to take care of all segments, to have a top-level Collector: > {code} > public interface Collector { > AtomicCollector setNextReader(AtomicReaderContext context) throws > IOException; > > } > {code} > and a per-AtomicReaderContext collector: > {code} > public interface AtomicCollector { > void setScorer(Scorer scorer) throws IOException; > void collect(int doc) throws IOException; > boolean acceptsDocsOutOfOrder(); > } > {code} > I think it makes the API clearer since it is now obious {{setScorer}} and > {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is > otherwise unclear. > It also makes things more flexible. For example, a collector could much more > easily decide to use different strategies on different segments. In > particular, it makes the early-termination collector much cleaner since it > can return different atomic collectors implementations depending on whether > the current segment is sorted or not. > Even if we have lots of collectors all over the place, we could make it > easier to migrate by having a Collector that would implement both Collector > and AtomicCollector, return {{this}} in setNextReader and make current > concrete Collector implementations extend this class instead of directly > extending Collector. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org