I have 12,000,000 documents and 6,500,000 groups.

With sort: around 1 sec without grouping, 2 sec with grouping, and
12 sec with setAllGroups(true).
Without sort: around 0.2 sec without grouping, 0.6 sec with
grouping, and 10 sec with setAllGroups(true).
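
To make the target behaviour concrete, here is a minimal plain-Java sketch of "keep only the best hit per dedup key, then return the top N under a custom sort". It is only an illustration of the desired result, not Lucene API and not a collector implementation: the names DedupTopN, Hit, groupKey, sortValue and dedupTopN are hypothetical, and it holds one entry per group in memory, so it says nothing about the timings above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DedupTopN {

    /** Hypothetical stand-in for a search hit: doc id, dedup key, one sort value. */
    public static final class Hit {
        public final int docId;
        public final String groupKey;
        public final long sortValue;

        public Hit(int docId, String groupKey, long sortValue) {
            this.docId = docId;
            this.groupKey = groupKey;
            this.sortValue = sortValue;
        }
    }

    /**
     * Keeps the best hit per groupKey according to 'order' (smaller compares
     * first), then returns at most n of those hits in 'order'.
     */
    public static List<Hit> dedupTopN(List<Hit> hits, Comparator<Hit> order, int n) {
        Map<String, Hit> bestPerKey = new HashMap<>();
        for (Hit h : hits) {
            // On a tie the hit seen first is kept.
            bestPerKey.merge(h.groupKey, h,
                    (oldHit, newHit) -> order.compare(oldHit, newHit) <= 0 ? oldHit : newHit);
        }
        List<Hit> result = new ArrayList<>(bestPerKey.values());
        result.sort(order);
        return new ArrayList<>(result.subList(0, Math.min(n, result.size())));
    }
}
```

With a comparator that sorts by sortValue descending, two hits sharing the same groupKey collapse to the one with the larger sortValue before the top N are taken.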

Thank you, Erick, I will look into it.

Fri, Oct 9, 2020 at 14:32, Erick Erickson <erickerick...@gmail.com> wrote:

> At the Solr level, see CollapsingQParserPlugin:
> https://lucene.apache.org/solr/guide/8_6/collapse-and-expand-results.html
>
> You could perhaps steal some ideas from that if you
> need this at the Lucene level.
>
> Best,
> Erick
>
> > On Oct 9, 2020, at 7:25 AM, Diego Ceccarelli (BLOOMBERG/ LONDON) <
> dceccarel...@bloomberg.net> wrote:
> >
> > Is the field that you are using to dedupe stored as a docvalue?
> >
> > From: java-user@lucene.apache.org At: 10/09/20 12:18:04 To: java-user@lucene.apache.org
> > Subject: Deduplication of search result with custom sort
> >
> > Hi,
> > I need to deduplicate search results by a specific field, and I have no idea
> > how to implement this properly.
> > I have tried grouping with setGroupDocsLimit(1); it gives the expected
> > results, but performance is not very good.
> > I think I need something like DiversifiedTopDocsCollector, but
> > suitable for collecting TopFieldDocs.
> > Is there any way to achieve deduplication with existing Lucene
> > components, or do I need to implement my own
> > DiversifiedTopFieldsCollector?
> >
