I have 12_000_000 documents and 6_500_000 groups.

With sort: it takes around 1 sec without grouping, 2 sec with grouping, and 12 sec with setAllGroups(true).
Without sort: it takes around 0.2 sec without grouping, 0.6 sec with grouping, and 10 sec with setAllGroups(true).

Thank you, Erick, I will look into it.

On Fri, Oct 9, 2020 at 14:32, Erick Erickson <[email protected]> wrote:

> At the Solr level, see CollapsingQParserPlugin:
> https://lucene.apache.org/solr/guide/8_6/collapse-and-expand-results.html
>
> You could perhaps steal some ideas from that if you
> need this at the Lucene level.
>
> Best,
> Erick
>
> > On Oct 9, 2020, at 7:25 AM, Diego Ceccarelli (BLOOMBERG/ LONDON) <[email protected]> wrote:
> >
> > Is the field that you are using to dedupe stored as a docvalue?
> >
> > From: [email protected] At: 10/09/20 12:18:04 To: [email protected]
> > Subject: Deduplication of search result with custom with custom sort
> >
> > Hi,
> > I need to deduplicate search results by specific field and I have no idea
> > how to implement this properly.
> > I have tried grouping with setGroupDocsLimit(1) and it gives me expected
> > results, but has not very good performance.
> > I think that I need something like DiversifiedTopDocsCollector, but
> > suitable for collecting TopFieldDocs.
> > Is there any possibility to achieve deduplication with existing lucene
> > components, or do I need to implement my own DiversifiedTopFieldsCollector?
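
For reference, the result the original question asks for (top N hits under a custom sort, at most one hit per value of the dedupe field) can be sketched in plain Java. This is only an illustration of the expected semantics, roughly what grouping with setGroupDocsLimit(1) returns. It does not use any Lucene API, and the names DedupTopN, Hit, dedupeKey, sortValue and topNDistinct are hypothetical. It also sorts every hit up front, which a real collector would presumably avoid.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: plain Java, not Lucene API. All names are hypothetical.
final class DedupTopN {

    /** A search hit reduced to the fields this sketch needs. */
    static final class Hit {
        final int docId;
        final String dedupeKey; // value of the field to deduplicate on
        final long sortValue;   // value of the custom sort field

        Hit(int docId, String dedupeKey, long sortValue) {
            this.docId = docId;
            this.dedupeKey = dedupeKey;
            this.sortValue = sortValue;
        }
    }

    /**
     * Returns up to n hits in the given sort order, keeping only the
     * best-ranked hit for each distinct dedupeKey.
     */
    static List<Hit> topNDistinct(List<Hit> hits, Comparator<Hit> sort, int n) {
        // Copy before sorting so the caller's list is left untouched.
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(sort);

        Set<String> seenKeys = new HashSet<>();
        List<Hit> result = new ArrayList<>();
        for (Hit hit : sorted) {
            if (result.size() >= n) {
                break; // enough distinct hits collected
            }
            // Set.add returns false if the key was already present,
            // so later (worse-ranked) duplicates are skipped.
            if (seenKeys.add(hit.dedupeKey)) {
                result.add(hit);
            }
        }
        return result;
    }
}
```

Because the hits are visited in sort order, the first hit seen for each key is the best-ranked one for that key, so a set of already-seen keys is enough to drop the duplicates.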
