Thanks for your feedback, Alessandro! I am using Lucene on its own, without Solr, OpenSearch, or Elasticsearch, but would like to combine different result sets using RRF, so I think Lucene itself could actually be a good place for it.
Looking forward to your additional elaboration!

Thanks

Michael

> On 13.05.2024 at 12:34, Alessandro Benedetti <a.benede...@sease.io> wrote:
>
> This is not strictly related to Lucene, but I'll give a talk at Berlin
> Buzzwords on how I am implementing Reciprocal Rank Fusion in Apache Solr.
> I'll resume my work on the contribution next week and have more to share
> later.
>
> Back in the day, I was reasoning on this and I didn't think Lucene was the
> right place for an interleaving algorithm, given that Reciprocal Rank Fusion
> is affected by distribution and it's not supposed to work per node.
> I think I evaluated the possibility of doing it as a Lucene query or a Lucene
> component but then ended up with a different approach.
> I'll elaborate more when I go back to the task!
>
> Cheers
> --------------------------
> Alessandro Benedetti
> Director @ Sease Ltd.
> Apache Lucene/Solr Committer
> Apache Solr PMC Member
>
> e-mail: a.benede...@sease.io
>
> Sease - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io <http://sease.io/>
> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
> <https://twitter.com/seaseltd> | Youtube
> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
> <https://github.com/seaseltd>
>
> On Sat, 11 May 2024 at 09:10, Michael Wechner <michael.wech...@wyona.com> wrote:
> sure, no problem!
>
> Maybe Adrien Grand and others might also have some feedback :-)
>
> Thanks
>
> Michael
>
> On 10.05.24 at 23:03, Chang Hank wrote:
>> Thank you for these useful resources, please allow me to spend some time
>> looking into them.
>> I’ll let you know asap!!
>>
>> Thanks
>>
>> Hank
>>
>>> On May 10, 2024, at 12:34 PM, Michael Wechner <michael.wech...@wyona.com> wrote:
>>>
>>> also we might want to consider how this relates to
>>>
>>> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/Rescorer.html
>>>
>>> In vector search reranking has become quite popular, e.g.
>>>
>>> https://docs.cohere.com/docs/reranking
>>>
>>> IIUC LangChain (python) for example adds the reranker as an argument to the
>>> searcher/retriever
>>>
>>> https://python.langchain.com/v0.1/docs/integrations/retrievers/cohere-reranker/
>>>
>>> So maybe the following might make sense as well
>>>
>>> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
>>> TopDocs topDocsVector = vectorSearcher.search(query, 50, new CohereReranker());
>>>
>>> TopDocs topDocs = TopDocs.merge(new RRFRanker(), topDocsKeyword, topDocsVector);
>>>
>>> WDYT?
>>>
>>> Thanks
>>>
>>> Michael
>>>
>>>
>>> On 10.05.24 at 21:08, Michael Wechner wrote:
>>>> great, yes, let's get started :-)
>>>>
>>>> What about the following pseudo code, assuming that there might be
>>>> alternative ranking algorithms to RRF
>>>>
>>>> StoredFields storedFieldsKeyword = indexReaderKeyword.storedFields();
>>>> StoredFields storedFieldsVector = indexReaderVector.storedFields();
>>>>
>>>> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
>>>> TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50);
>>>>
>>>> Ranker ranker = new RRFRanker();
>>>> TopDocs topDocs = TopDocs.rank(ranker, topDocsKeyword, topDocsVector);
>>>>
>>>> for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
>>>>     Document docK = storedFieldsKeyword.document(scoreDoc.doc);
>>>>     Document docV = storedFieldsVector.document(scoreDoc.doc);
>>>>     ....
>>>> }
>>>>
>>>> See also
>>>>
>>>> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/TopDocs.html
>>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html
>>>>
>>>> WDYT?
>>>>
>>>> Thanks
>>>>
>>>> Michael
>>>>
>>>>
>>>> On 10.05.24 at 20:01, Chang Hank wrote:
>>>>> Hi Michael,
>>>>>
>>>>> Sounds good to me.
>>>>> Let’s do it!!
>>>>>
>>>>> Cheers,
>>>>> Hank
>>>>>
>>>>>> On May 10, 2024, at 10:50 AM, Michael Wechner <michael.wech...@wyona.com> wrote:
>>>>>>
>>>>>> Hi Hank
>>>>>>
>>>>>> Very cool!
>>>>>>
>>>>>> Adrien Grand suggested implementing it as a utility method on the
>>>>>> TopDocs class, and since Adrien worked on Lucene for a decade
>>>>>> https://www.elastic.co/de/blog/author/adrien-grand
>>>>>> I guess it makes sense to follow his advice :-)
>>>>>>
>>>>>> We could create a PR and work together on it, WDYT?
>>>>>>
>>>>>> All the best
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>> On 10.05.24 at 18:51, Chang Hank wrote:
>>>>>>> Hi Michael,
>>>>>>>
>>>>>>> Thank you for the reply.
>>>>>>> This is a really cool issue, and I’m happy to work on it with
>>>>>>> you. I’ll try to do some research on RRF first.
>>>>>>> Also, are we going to implement this on the TopDocs class?
>>>>>>>
>>>>>>> Best,
>>>>>>> Hank
>>>>>>>
>>>>>>>
>>>>>>>> On May 9, 2024, at 11:08 PM, Michael Wechner <michael.wech...@wyona.com> wrote:
>>>>>>>>
>>>>>>>> Hi Hank
>>>>>>>>
>>>>>>>> Thanks for offering your help!
>>>>>>>>
>>>>>>>> I recently suggested implementing RRF (Reciprocal Rank Fusion)
>>>>>>>>
>>>>>>>> https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz
>>>>>>>>
>>>>>>>> but still have not found the time to really work on this.
>>>>>>>>
>>>>>>>> Maybe you would be interested in doing this, or we could work on it
>>>>>>>> together somehow?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Michael
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10.05.24 at 07:27, Chang Hank wrote:
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> I’m Hank Chang, currently studying Information Retrieval topics. I’m
>>>>>>>>> really interested in contributing to Apache Lucene and enhancing my
>>>>>>>>> understanding of the field.
>>>>>>>>> I’ve reviewed several issues posted on the GitHub repository but
>>>>>>>>> haven’t found a straightforward starting point.
>>>>>>>>> Could someone please recommend suitable issues for a newcomer like me
>>>>>>>>> or suggest areas I could assist with?
>>>>>>>>>
>>>>>>>>> Thank you for your time and guidance.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Hank Chang
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
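
The RRF merge proposed in this thread can be sketched without any Lucene dependency. The following is only a minimal illustration, not the API discussed above: the class RrfSketch and its fuse method are hypothetical names, it works on plain lists of doc IDs rather than TopDocs, and it assumes that the doc IDs in all input lists refer to the same documents. Each document scores the sum of 1 / (k + rank) over the lists it appears in, with rank starting at 1; k = 60 is a commonly used constant and is an assumption here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Lucene: fuses ranked doc ID lists with RRF. */
final class RrfSketch {

  /**
   * @param k rank constant (assumed value in the example below: 60)
   * @param topN maximum number of fused doc IDs to return
   * @param rankings ranked doc ID lists, best hit first
   */
  @SafeVarargs
  static List<Integer> fuse(int k, int topN, List<Integer>... rankings) {
    // Accumulate 1 / (k + rank) per doc ID across all input rankings.
    Map<Integer, Double> scores = new LinkedHashMap<>();
    for (List<Integer> ranking : rankings) {
      for (int i = 0; i < ranking.size(); i++) {
        int rank = i + 1; // RRF ranks are 1-based
        scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
      }
    }

    // Highest fused score first; ties are broken by doc ID for a stable order.
    List<Map.Entry<Integer, Double>> entries = new ArrayList<>(scores.entrySet());
    entries.sort((a, b) -> {
      int byScore = Double.compare(b.getValue(), a.getValue());
      return byScore != 0 ? byScore : Integer.compare(a.getKey(), b.getKey());
    });

    // Keep at most topN doc IDs.
    List<Integer> fused = new ArrayList<>();
    for (Map.Entry<Integer, Double> entry : entries) {
      if (fused.size() == topN) {
        break;
      }
      fused.add(entry.getKey());
    }
    return fused;
  }

  public static void main(String[] args) {
    List<Integer> keywordHits = List.of(1, 2, 3); // doc IDs from a keyword search
    List<Integer> vectorHits = List.of(3, 1, 4);  // doc IDs from a vector search
    System.out.println(fuse(60, 3, keywordHits, vectorHits)); // prints [1, 3, 2]
  }
}
```

In Lucene terms, the two input lists would correspond to the scoreDoc.doc values of topDocsKeyword and topDocsVector, taken in result order.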