Re: Re: Question about grouping in distribute mode

Erick Erickson Thu, 06 Apr 2017 08:17:07 -0700

From the reference guide:

group.ngroups and group.facet require that all documents in each group
must be co-located on the same shard in order for accurate counts to
be returned.


Can't give you a technical reason, but there's no expectation it is
supported with composite ID routing.
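
To illustrate the co-location requirement quoted above, here is a minimal sketch. It assumes each shard reports its local number of distinct group values and the coordinator simply adds those numbers up. That merge rule, the function names, and the sample data are assumptions for illustration, not taken from the Solr source.

```python
def ngroups_by_summing(shards, field):
    """Add up each shard's local count of distinct group values (assumed merge rule)."""
    return sum(len({doc[field] for doc in shard}) for shard in shards)


def true_ngroups(shards, field):
    """Count distinct group values across the whole collection."""
    return len({doc[field] for shard in shards for doc in shard})


# Hypothetical documents grouped by "brand".
docs = [
    {"id": "1", "brand": "acme"},
    {"id": "2", "brand": "acme"},
    {"id": "3", "brand": "globex"},
    {"id": "4", "brand": "globex"},
]

# Each brand is spread over both shards.
split = [[docs[0], docs[2]], [docs[1], docs[3]]]
# All documents of a brand live on one shard.
colocated = [[docs[0], docs[1]], [docs[2], docs[3]]]

print(ngroups_by_summing(split, "brand"))      # every group is counted once per shard
print(ngroups_by_summing(colocated, "brand"))  # agrees with the true count
print(true_ngroups(split, "brand"))
```

Under that assumption, a group that lives on several shards is counted once per shard, so the summed count is only accurate when each group is confined to a single shard.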

Best,
Erick

On Thu, Apr 6, 2017 at 2:52 AM, [email protected] <[email protected]> wrote:
> Thanks for your help.
> When I use the compositeId router, I find that group.ngroups returns a wrong
> number. I would like to know what implementation mechanism leads to this.
> Why must we use the implicit router when we want grouping to work
> correctly?
>
> ________________________________
> [email protected]
>
>
> From: Diego Ceccarelli (BLOOMBERG/ LONDON)
> Date: 2017-04-06 17:16
> To: 380382856
> Subject: Re: Re: Question about grouping in distribute mode
> Dear 380382856,
> I would be happy to help you if you can provide more information. Do you
> want to know why grouping needs a specific routing strategy? My point is
> that usually grouping involves 3 communications between the federator and
> the shards, but in case of ngroup=1 it would be possible to obtain the same
> result with 2 communications.
>
> Can I please ask you to post your question on the Solr user mailing list [1]?
> That way my answer will be useful to all Solr users, and people more expert
> than me can also answer (or correct me if I say something wrong :))
>
> Have a good day!
> Diego
>
> [1] http://lucene.apache.org/solr/community.html#mailing-lists-irc
>
>
> From: [email protected] At: 04/06/17 08:38:20
> To: DIEGO CECCARELLI (BLOOMBERG/ LONDON)
> Subject: Re: Re: Question about grouping in distribute mode
>
> Hello, can you help me?
> There is a problem that has been bothering me: why does using group.ngroups
> in SolrCloud require the implicit routing strategy?
> [email protected]
>
>
> From: Diego Ceccarelli (BLOOMBERG/ LONDON)
> Date: 2017-03-30 22:09
> To: dev
> Subject: Re: Question about grouping in distribute mode
> Yes, I agree. And if there are no problems with the logic, it would improve
> performance in both cases.
>
> From: [email protected] At: 03/30/17 14:59:31
> To: [email protected]
> Subject: Re: Question about grouping in distribute mode
>
> This is also the case for non-distributed, isn’t it?  The lucene-level
> FirstPassGroupingCollector doesn’t actually record the docid of the top doc
> for each group at the moment, but I don’t think there’s any reason it
> couldn’t - it’s stored in the relevant FieldComparator.  And it would be a
> nice shortcut in GroupingSearch more generally.
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 30 Mar 2017, at 14:26, Diego Ceccarelli <[email protected]>
> wrote:
>
> Hello, I'm currently working on Solr grouping in order to support reranking
> [1].
> I have a working patch for non-distributed search, and I'm now working on the
> distributed setting.
>
> Looking at the code, distributed grouping (top-k groups, top-n documents
> for each group) consists of:
>
> GROUPING_DISTRIBUTED_FIRST
> 1. Given the grouping query, each shard returns its top-k groups.
> 2. The federator merges them and produces the overall top-k groups
> for the query.
>
> GROUPING_DISTRIBUTED_SECOND
> 1. Given the top-k groups, each shard returns its top-n documents for
> each group.
> 2. The federator then computes the top-n documents for each group by
> merging all the shard responses.
>
> GET_FIELDS
> as usual
>
> My plan was to change the collector in GROUPING_DISTRIBUTED_SECOND and
> return the top documents for each group with a new score given by the
> function used to rerank (affecting maxScore for each group and then also
> the order of the groups). Looking at the code, I then realized that
> TopGroups asserts that the order of the groups does not change, and that
> indeed _if the ranking function is the same, group order can't change
> after the first stage_.
>
> My question is: if the user is interested only in the top document for each
> group (i.e., the default: group.limit = 1), do we really need
> GROUPING_DISTRIBUTED_SECOND, or could we skip it?
> Is there any reason to perform GROUPING_DISTRIBUTED_SECOND in this case? Or
> could we just return the top docid together with the top groups in
> GROUPING_DISTRIBUTED_FIRST and then go directly to GET_FIELDS?
>
> Cheers,
> Diego
>
> [1] https://issues.apache.org/jira/browse/SOLR-8542
>
>
>
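
The phases Diego lists in the quoted message, and the shortcut he asks about, can be sketched roughly as follows. This is a toy model, not Solr code: shards are plain lists of (doc_id, group, score) tuples, and all function names are hypothetical. Returning the best doc id from the first phase is the proposed change, not what FirstPassGroupingCollector does today.

```python
import heapq


def top_groups(entries, k):
    """Keep the best (score, doc_id) per group, then return the k best groups."""
    best = {}
    for doc_id, group, score in entries:
        if group not in best or score > best[group][0]:
            best[group] = (score, doc_id)
    return heapq.nlargest(k, ((s, g, d) for g, (s, d) in best.items()))


def first_phase(shards, k):
    """Each shard returns its top-k groups; the federator merges them into a global top-k."""
    per_shard = [top_groups(shard, k) for shard in shards]
    merged_input = [(d, g, s) for result in per_shard for s, g, d in result]
    return top_groups(merged_input, k)


def second_phase(shards, groups, n):
    """Each shard returns its top-n docs per group; the federator merges them per group."""
    merged = {}
    for g in groups:
        candidates = []
        for shard in shards:
            local = [(s, d) for d, grp, s in shard if grp == g]
            candidates.extend(heapq.nlargest(n, local))
        merged[g] = heapq.nlargest(n, candidates)
    return merged


# Hypothetical data: two shards, three groups, no score ties.
shard_a = [("a1", "red", 0.9), ("a2", "blue", 0.4), ("a3", "green", 0.2)]
shard_b = [("b1", "red", 0.5), ("b2", "blue", 0.8), ("b3", "green", 0.7)]
shards = [shard_a, shard_b]

groups_with_top_doc = first_phase(shards, k=2)
group_names = [g for _, g, _ in groups_with_top_doc]

# Shortcut: take the top doc id straight from the first phase.
shortcut = {g: d for _, g, d in groups_with_top_doc}
# Full path: run the second phase with n = 1 (group.limit = 1).
full = {g: docs[0][1] for g, docs in second_phase(shards, group_names, n=1).items()}

print(shortcut == full)
```

In this toy setting, with one ranking function and no ties, the top document per group is already known after the first phase, which is the case where the second phase could be skipped.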
