[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838896#comment-16838896 ] Diego Ceccarelli commented on SOLR-8776: [~ichattopadhyaya] are you still interested in moving this forward? It is really very painful to update it upstream when it gets stale :D > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 6.0 >Reporter: Diego Ceccarelli >Priority: Minor > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826814#comment-16826814 ] Diego Ceccarelli commented on SOLR-8776: Good morning, I updated the patch fixing the comment by Ilaygit, now the number of groups retrieved is correct. While I was fixing the issue I realized that this patch won't work with pagination :( Make it work it with pagination requires also to modify the first step collector - I don't mind doing that but I would do it in a separate patch, because otherwise this will become very hard to review. I might throw an exception if someone try to use grouping + ltr + pagination.. > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 6.0 >Reporter: Diego Ceccarelli >Priority: Minor > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824409#comment-16824409 ] Diego Ceccarelli commented on SOLR-8776: I updated the patch to the latest version upstream: [https://github.com/apache/lucene-solr/pull/162/files] I spent several days hunting a bug :P {{ant precommit}} is happy and tests are successful - there is a comment by Ilaygit about the number of groups retrieved in the non distributed setting that I want to double check tomorrow, but all the rest is done. [~ichattopadhyaya] [~joel.bernstein] [~romseygeek] can you take a look? I would be able to work on it in these days.. Please lets close this! :D :D :D > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 6.0 >Reporter: Diego Ceccarelli >Priority: Minor > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > Time Spent: 1h > Remaining Estimate: 0h > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815399#comment-16815399 ] Diego Ceccarelli commented on SOLR-8776: I'm updating the patch > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 6.0 >Reporter: Diego Ceccarelli >Priority: Minor > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815373#comment-16815373 ] Joel Bernstein commented on SOLR-8776: -- +1 to moving this forward. > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 6.0 >Reporter: Diego Ceccarelli >Priority: Minor > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815174#comment-16815174 ] Diego Ceccarelli commented on SOLR-8776: I would like to move this forward too, I can start rebasing it to the current master, what do you think [~romseygeek] [~joel.bernstein] [~ichattopadhyaya] ? > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 6.0 >Reporter: Diego Ceccarelli >Priority: Minor > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815164#comment-16815164 ] Ishan Chattopadhyaya commented on SOLR-8776: I'm interested in moving this JIRA forward. [~romseygeek] / [~joel.bernstein], do you have any objection, in theory, to the Lucene changes here? > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 6.0 >Reporter: Diego Ceccarelli >Priority: Minor > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496289#comment-16496289 ] Alessandro Benedetti commented on SOLR-8776: Hi all, I am quite interested in this Jira, any update on this ? I believe it would quite handy to have the Learning To Rank module working with grouping ! > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 6.0 >Reporter: Diego Ceccarelli >Priority: Minor > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019698#comment-16019698 ] ASF GitHub Bot commented on SOLR-8776: -- Github user romseygeek commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/162#discussion_r117773779 --- Diff: lucene/grouping/src/java/org/apache/lucene/search/grouping/TopGroupsCollector.java --- @@ -39,6 +39,14 @@ private final Sort withinGroupSort; private final int maxDocsPerGroup; + protected TopGroupsCollector(GroupReducergroupReducer, GroupSelector groupSelector, Collection groups, Sort groupSort, Sort withinGroupSort, +int maxDocsPerGroup, boolean getScores, boolean getMaxScores, boolean fillSortFields) { +super(groupSelector, groups, groupReducer); --- End diff -- You can remove a few of these parameters now, I think? They're for the Reducer, and as such don't need to be passed separately. > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 6.0 >Reporter: Diego Ceccarelli >Priority: Minor > Fix For: master (7.0) > > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005060#comment-16005060 ] Diego Ceccarelli commented on SOLR-8776: Hi all, I updated the PR (https://github.com/apache/lucene-solr/pull/162), highlights: [~romseygeek],[~martijn.v.groningen] now the patch relies on the new grouping code :) I had to add a new {{protected}} constructor to {{TopGroupsCollector}} to inject my own {{GroupReducer}}. Could you please take a look at let me know if it makes sense? also in [SecondPassGroupingCollector#L54|https://github.com/bloomberg/lucene-solr/blob/c22a9017649406c5673c9b72878ad66a20d9b8d2/lucene/grouping/src/java/org/apache/lucene/search/grouping/SecondPassGroupingCollector.java#L54] {code:title=SecondPassGroupingCollector.java|borderStyle=solid} public SecondPassGroupingCollector(GroupSelector groupSelector, Collectiongroups, GroupReducer reducer) { //System.out.println("SP init"); //Do we want to check if groups is null here? instead of checking at line 62? if (groups.isEmpty()) { throw new IllegalArgumentException("no groups to collect (groups is empty)"); } this.groupSelector = Objects.requireNonNull(groupSelector); this.groupSelector.setGroups(groups); this.groups = Objects.requireNonNull(groups); {code} I would check if {{groups != null}} before {{groups.isEmpty()}}. 2. I changed the logic of reranking to rerank groups, for example if a user ask to rerank the top 100 documents: {{q=greetings=10=\{!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=3\}=(hi+hello+hey+hiya)}}: * the top 100 groups matching {{greeting}} are retrieved; * top 100 groups are reranked by {{rqq}}; * finally the top 10 reranked groups are returned; * inside each group documents will be reranked as well. IMO the patch is now complete and I've working unit tests. Please, can someone review my patch? > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 6.0 >Reporter: Diego Ceccarelli >Priority: Minor > Fix For: 6.0 > > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953586#comment-15953586 ] Diego Ceccarelli commented on SOLR-8776: Hi all, I updated the PR (https://github.com/apache/lucene-solr/pull/162) now it supports reranking by field and by function query, and reranking by field in distribute setting - I have working tests for all the use cases. There are still some details to fix: * I had to remove final from {{GroupDocs::maxScore}} and {{GroupDocs::score}} (I can easily fix this I think) * {{Rerank(Function|Group)SecondPassGroupingCollector}} have the number of documents to rerank hardcoded in the class because {{RankQuery}} doesn't expose that value (I think it should) [~joel.bernstein], [~martijn.v.groningen], [~romseygeek] any feedback? > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 6.0 >Reporter: Diego Ceccarelli >Priority: Minor > Fix For: 6.0 > > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888478#comment-15888478 ] ASF GitHub Bot commented on SOLR-8776: -- GitHub user diegoceccarelli opened a pull request: https://github.com/apache/lucene-solr/pull/162 SOLR-8776: Support RankQuery in grouping Update SOLR-8776 to current master - Reranking and grouping work together in non-distributed setting when grouping is done by field - Still have to fix for distribute setting and for grouping based on the unique values of a function query. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bloomberg/lucene-solr master-solr-8776 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/162.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #162 commit cd33172184c3889dfe95c631e7c30729f1c752a3 Author: diegoDate: 2017-02-28T10:28:32Z SOLR-8776: Support RankQuery in grouping > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 6.0 >Reporter: Diego Ceccarelli >Priority: Minor > Fix For: 6.0 > > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301877#comment-15301877 ] Ahmet Anil Pala commented on SOLR-8776: --- Hi, Which branch is this patch compatible with? I've tried branch_6x and branch_6_0. Although, the patch was successfully applied, it prevented the source from compiling. > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: 6.0 >Reporter: Diego Ceccarelli >Priority: Minor > Fix For: 6.0 > > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193892#comment-15193892 ] Diego Ceccarelli commented on SOLR-8776: Update: I found a way to change the behavior of the collectors without moving RankQuery into Lucene. This new patch performs the group reranking without changing Lucene. The only difference is that if the Query is a RankQuery instead of using {{TermSecondPassGroupingCollector}} I'll use a {{RerankTermSecondPassGroupingCollector}}. The new collector will scan the groups collectors and wrap them in 'rerank collectors': {code:java} for (SearchGroup group : groups) { if (query != null) { collector = groupMap.get(group.groupValue).collector; collector = query.getTopDocsCollector(collector, groupSort, searcher); groupMap.put(group.groupValue, new SearchGroupDocs(group.groupValue, collector)); } } {code} > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: master >Reporter: Diego Ceccarelli >Priority: Minor > Fix For: master > > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15190864#comment-15190864 ] Diego Ceccarelli commented on SOLR-8776: I uploaded a new patch, now groups are reranked according to the reranking max scores, in the {{finish()}} method of the grouping {{CommandField}} I added: {code:java} if (result != null && query instanceof RankQuery && groupSort == Sort.RELEVANCE){ // if we are sorting for relevance and query is a RankQuery, it may be that // the order of the groups changed, we need to reorder GroupDocs[] groups = result.groups; Arrays.sort(groups, new Comparator() { @Override public int compare(GroupDocs o1, GroupDocs o2) { if (o1.maxScore > o2.maxScore) return -1; if (o1.maxScore < o2.maxScore) return 1; return 0; }}); } {code} This will reorder the groups if we re-rank the documents with the rank query. The second test succeeds. I'm still thinking what it should be the correct semantic to implement reranking + grouping: When you apply a query {{q}} and then a rank-query {{rq}} , you first score all the documents and then rescore top-N documents with the rank-query. The problem with grouping is that in order to get the top-groups you first need to score the collection: you may have a document that scored really low with {{q}} but got a high score with {{rq}}, but the only way to find it is to rerank the whole collection (impracticable). There are two possible solutions then: - if we want to apply {{rq}} on the top 1000 documents, we can collect the groups in the top-1000 documents, and they will be the same obtained scoring directly with {{rq}}, but in a different order; - we can collect more groups than what we need, and then rerank the top documents in each group - I would call this solution: **Group Reranking**. In my opinion group reranking is a better solution: imagine we have a group containing the top-1000 documents ranked with {{q}} we will rerank them maybe just to return one document. I guess the best would be, assuming that we want to apply rerank query to N documents and return the top K groups you can retrieve top K*y groups and then rerank N/(K*y) documents in each group. > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: master >Reporter: Diego Ceccarelli >Priority: Minor > Fix For: master > > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189552#comment-15189552 ] Diego Ceccarelli commented on SOLR-8776: [~joel.bernstein] thanks for pointing out about the MergeStrategy. I uploaded a new patch with a first step. I agree that merge strategy must stay there, that's why I wrote "partially moved" :) as well as there's IndexSearcher and SolrIndexSearcher, I moved {RankQuery} in Lucene and created lucene {SolrRankQuery}. The reason is that the {RankQuery} works by manipulating the collector, through this method: {code:java} public abstract TopDocsCollector getTopDocsCollector(int len, QueryCommand cmd, IndexSearcher searcher) throws IOException; {code} At the moment what happens is that if the query is a RankQuery, and into the SolrIndexSearcher: {code:java} private TopDocsCollector buildTopDocsCollector(int len, QueryCommand cmd) throws IOException { Query q = cmd.getQuery(); if (q instanceof RankQuery) { RankQuery rq = (RankQuery) q; return rq.getTopDocsCollector(len, cmd, this); } .. {code} Instead of creating a topCollector using the {TopScoreDocCollector.create}, we wrap a topScoreCollector into a ReRanking collector. Let me remind that grouping works in two separate stages: 1. in the first stage, we iterate on the documents scoring them and keep a map { score>} where score is the highest score of a document in the group (the map contains only the TOP-k groups with the highest scores); 2. for each group in the top groups documents in the group are ranked and top documents for each group are returned. This logic is mainly implemented into {Abstract(First|Second)PassGroupingCollector} (within Lucene). We should probably discuss what means reranking for groups: in my opinion we should keep in mind that the idea behind RankQuery is that you don't want to apply the query to all the documents in the collection, so the "group-reranking" should: 1 in the first stage, we iterate on the documents scoring them as usual and keep a map {group -> score>}; 2 for each group, RankQuery is applied to the top documents in the group; 3 groups will be reranked according to the new scores. In this patch, I'm able to perform 2. I had to move RankQuery into Lucene, because what happens in the {AbstractSecondPassGroupingCollector} is that for each group a collector is created: {code:java} for (SearchGroup group : groups) { //System.out.println(" prep group=" + (group.groupValue == null ? "null" : group.groupValue.utf8ToString())); TopDocsCollector collector; if (withinGroupSort.equals(Sort.RELEVANCE)) { // optimize to use TopScoreDocCollector // Sort by score collector = TopScoreDocCollector.create(maxDocsPerGroup); ... {code} ... so no way to 'inject' the reranking collector from Solr. Moving the RankQuery into lucene I modified the code in: {code:java} collector = TopScoreDocCollector.create(maxDocsPerGroup); if (query != null && query instanceof RankQuery){ collector = ((RankQuery)query).getTopDocsCollector(collector, null, searcher); } {code} and now documents in groups are reranked. I'll work now on 3. i.e., reordering the groups based on the new rerank score (I added a new test that fails at the moment). Happy to discuss about this first change, if you have comments. Minor notes: - At the moment {SolrRankQuery} doesn't extend {ExtendedQueryBase}, I have to check if it is a problem. RankQuery could become an interface maybe. - I did some changes to the interface of {RankQuery.getTopDocsCollector}: {QueryCommand} was in solr but used only for getting {Sort}, len was never used. I added in input the previous collector, instead of creating a new TopDocScore collector inside {RankQuery}. > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: master >Reporter: Diego Ceccarelli >Priority: Minor > Fix For: master > > Attachments: 0001-SOLR-8776-Support-RankQuery-in-grouping.patch, > 0001-SOLR-8776-Support-RankQuery-in-grouping.patch > > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch
[jira] [Commented] (SOLR-8776) Support RankQuery in grouping
[ https://issues.apache.org/jira/browse/SOLR-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175817#comment-15175817 ] Joel Bernstein commented on SOLR-8776: -- Moving the RankQuery into Lucene is a showstopper I believe because of the MergeStrategy which is very specific to how Solr merges documents from the shards. There are portions of the grouping code in Solr. It might be best to first start looking to see if you can get the RankQuery added on the Solr side. > Support RankQuery in grouping > - > > Key: SOLR-8776 > URL: https://issues.apache.org/jira/browse/SOLR-8776 > Project: Solr > Issue Type: Improvement > Components: search >Affects Versions: master >Reporter: Diego Ceccarelli >Priority: Minor > Fix For: master > > > Currently it is not possible to use RankQuery [1] and Grouping [2] together > (see also [3]). In some situations Grouping can be replaced by Collapse and > Expand Results [4] (that supports reranking), but i) collapse cannot > guarantee that at least a minimum number of groups will be returned for a > query, and ii) in the Solr Cloud setting you will have constraints on how to > partition the documents among the shards. > I'm going to start working on supporting RankQuery in grouping. I'll start > attaching a patch with a test that fails because grouping does not support > the rank query and then I'll try to fix the problem, starting from the non > distributed setting (GroupingSearch). > My feeling is that since grouping is mostly performed by Lucene, RankQuery > should be refactored and moved (or partially moved) there. > Any feedback is welcome. > [1] https://cwiki.apache.org/confluence/display/solr/RankQuery+API > [2] https://cwiki.apache.org/confluence/display/solr/Result+Grouping > [3] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201507.mbox/%3ccahm-lpuvspest-sw63_8a6gt-wor6ds_t_nb2rope93e4+s...@mail.gmail.com%3E > [4] > https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org