From the reference guide: "group.ngroups and group.facet require that all documents in each group must be co-located on the same shard in order for accurate counts to be returned."
Can't give you a technical reason, but there's no expectation it is supported with composite ID routing.

Best,
Erick

On Thu, Apr 6, 2017 at 2:52 AM, [email protected] <[email protected]> wrote:
> Thanks for your help.
> When I use compositeId routing, I find that group.ngroups returns a wrong
> number. I would like to know what implementation mechanism leads to this.
> Why must we use implicit routing when we want grouping to work correctly?
>
> ________________________________
> [email protected]
>
>
> From: Diego Ceccarelli (BLOOMBERG/ LONDON)
> Date: 2017-04-06 17:16
> To: 380382856
> Subject: Re: Re: Question about grouping in distribute mode
> Dear 380382856,
> I would be happy to help you if you can provide more information: do you
> want to know why grouping requires a specific routing strategy? My point is
> that grouping usually involves 3 communications between the federator and
> the shards, but in the case of ngroup=1 it would be possible to obtain the
> same result with 2 communications.
>
> Can I please ask you to post your question on the Solr user mailing list [1]?
> That way my answer will be useful to all Solr users, and people more expert
> than me can also answer (or correct me if I say something wrong :))
>
> Have a good day!
> Diego
>
> [1] http://lucene.apache.org/solr/community.html#mailing-lists-irc
>
>
> From: [email protected] At: 04/06/17 08:38:20
> To: DIEGO CECCARELLI (BLOOMBERG/ LONDON)
> Subject: Re: Re: Question about grouping in distribute mode
>
> Hello, can you help me?
> There is a problem that has been bothering me: why does SolrCloud require
> the implicit routing strategy in order to use group.ngroups?
> [email protected]
>
>
> From: Diego Ceccarelli (BLOOMBERG/ LONDON)
> Date: 2017-03-30 22:09
> To: dev
> Subject: Re: Question about grouping in distribute mode
> Yes, I agree. And if there are no problems with the logic, it would improve
> the performance in both cases.
>
> From: [email protected] At: 03/30/17 14:59:31
> To: [email protected]
> Subject: Re: Question about grouping in distribute mode
>
> This is also the case for non-distributed, isn’t it? The Lucene-level
> FirstPassGroupingCollector doesn’t actually record the docid of the top doc
> for each group at the moment, but I don’t think there’s any reason it
> couldn’t: it’s stored in the relevant FieldComparator. And it would be a
> nice shortcut in GroupingSearch more generally.
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 30 Mar 2017, at 14:26, Diego Ceccarelli <[email protected]> wrote:
>
> Hello, I'm currently working on Solr grouping in order to support reranking
> [1]. I have a working patch for non-distributed search, and I'm now working
> on the distributed setting.
>
> Looking at the code of distributed grouping (top-k groups, top-n documents
> for each group), the search consists of:
>
> GROUPING_DISTRIBUTED_FIRST
> 1. Given the grouping query, each shard returns its top-k groups.
> 2. The federator merges the shard responses and produces the top-k groups
>    for the query.
>
> GROUPING_DISTRIBUTED_SECOND
> 1. Given the top-k groups, each shard returns its top-n documents for each
>    group.
> 2. The federator then computes the top-n documents for each group by
>    merging all the shard responses.
>
> GET_FIELDS
> As usual.
>
> My plan was to change the collector in GROUPING_DISTRIBUTED_SECOND and
> return the top documents for each group with a new score given by the
> reranking function (affecting maxScore for each group, and therefore also
> the order of the groups). Looking at the code, I then realized that
> TopGroups asserts that the order of the groups does not change, and that
> indeed _if the ranking function is the same, group order can't change
> after the first stage_.
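The two phases described above can be modeled with a small sketch. This is a simplified, hypothetical model, not Solr's actual grouping code: each shard is a dict from group value to (score, doc_id) pairs, higher scores rank first, and all function names are made up for illustration. It also shows why group order is fixed after the first phase when the ranking function is unchanged: a group's first-phase sort value is the score of its best document, which is exactly the top document the second phase returns for it.

```python
import heapq

# Simplified, hypothetical model (not Solr's real classes): a shard maps each
# group value to the (score, doc_id) pairs it holds; higher scores rank first.
Shard = dict[str, list[tuple[float, str]]]
GroupDocs = dict[str, list[tuple[float, str]]]


def shard_top_groups(shard: Shard, k: int) -> list[tuple[float, str]]:
    """FIRST phase, step 1: this shard's top-k groups, ranked by best local score."""
    best = [(max(score for score, _ in docs), group) for group, docs in shard.items() if docs]
    return heapq.nlargest(k, best)


def merge_top_groups(responses: list[list[tuple[float, str]]], k: int) -> list[str]:
    """FIRST phase, step 2: keep each group's best reported score, take the global top k."""
    best: dict[str, float] = {}
    for response in responses:
        for score, group in response:
            best[group] = max(score, best.get(group, float("-inf")))
    ranked = heapq.nlargest(k, ((score, group) for group, score in best.items()))
    return [group for _, group in ranked]


def shard_top_docs(shard: Shard, groups: list[str], n: int) -> GroupDocs:
    """SECOND phase, step 1: this shard's top-n documents for each selected group."""
    return {group: heapq.nlargest(n, shard.get(group, [])) for group in groups}


def merge_top_docs(responses: list[GroupDocs], groups: list[str], n: int) -> GroupDocs:
    """SECOND phase, step 2: merge the per-shard lists into a global top-n per group."""
    return {
        group: heapq.nlargest(n, [doc for response in responses for doc in response[group]])
        for group in groups
    }


def distributed_grouping(shards: list[Shard], k: int, n: int) -> tuple[list[str], GroupDocs]:
    """Run both phases; GET_FIELDS (fetching stored fields) is omitted."""
    groups = merge_top_groups([shard_top_groups(s, k) for s in shards], k)
    docs = merge_top_docs([shard_top_docs(s, groups, n) for s in shards], groups, n)
    return groups, docs
```

For example, with shards {"a": [(0.9, "d1"), (0.2, "d2")], "b": [(0.5, "d3")]} and {"a": [(0.7, "d4")], "c": [(0.8, "d5")]}, `distributed_grouping(shards, k=2, n=2)` selects groups a and c, and group a's merged documents are d1 and d4. With n=1 the second phase only re-derives each group's best document, which the first phase already used to rank the group; that is the redundancy behind the group.limit=1 question.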
>
> My question is: if the user is interested only in the top document for each
> group (i.e., the default, group.limit=1), do we really need
> GROUPING_DISTRIBUTED_SECOND, or could we skip it? Is there any reason to
> perform GROUPING_DISTRIBUTED_SECOND in this case, or could we just return
> the top docid together with the top groups in GROUPING_DISTRIBUTED_FIRST
> and then go directly to GET_FIELDS?
>
> Cheers,
> Diego
>
> [1] https://issues.apache.org/jira/browse/SOLR-8542
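The wrong group.ngroups count reported at the top of this thread can be illustrated with a minimal sketch. It assumes, consistent with the reference guide's co-location requirement, that the federator computes ngroups by adding up each shard's local group count. The router below is a hypothetical CRC32-modulo stand-in, not Solr's real hashing, and all names are made up for illustration. Under that assumption, a group whose documents land on several shards is counted once per shard.

```python
import zlib

# Hypothetical documents: (doc_id, group_value).
Doc = tuple[str, str]


def route(key: str, num_shards: int) -> int:
    """Stand-in router (CRC32 modulo shard count); NOT Solr's actual hash."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def distribute(docs: list[Doc], num_shards: int, by_group: bool) -> list[list[Doc]]:
    """Place each doc on a shard, routing on either its doc_id or its group value."""
    shards: list[list[Doc]] = [[] for _ in range(num_shards)]
    for doc_id, group in docs:
        key = group if by_group else doc_id
        shards[route(key, num_shards)].append((doc_id, group))
    return shards


def summed_ngroups(shards: list[list[Doc]]) -> int:
    """Assumed merge strategy: add up each shard's local distinct-group count."""
    return sum(len({group for _, group in shard}) for shard in shards)


def true_ngroups(docs: list[Doc]) -> int:
    """The count a single-node index would report."""
    return len({group for _, group in docs})
```

For instance, shards [("d1", "a"), ("d2", "b")] and [("d3", "a")] give a summed count of 3 even though only groups a and b exist. Routing on the group value instead places each group on exactly one shard, so the summed count matches `true_ngroups`.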
