This is not strictly related to Lucene, but I'll give a talk at Berlin Buzzwords on how I am implementing Reciprocal Rank Fusion in Apache Solr. I'll resume my work on the contribution next week and have more to share later.
Back in the day, I was reasoning about this, and I didn't think Lucene was the right place for an interleaving algorithm, given that Reciprocal Rank Fusion is affected by distribution and is not supposed to work per node.
I think I evaluated the possibility of doing it as a Lucene query or a Lucene component, but then ended up with a different approach.
I'll elaborate more when I go back to the task!

Cheers
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io

*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter <https://twitter.com/seaseltd> | Youtube <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github <https://github.com/seaseltd>


On Sat, 11 May 2024 at 09:10, Michael Wechner <michael.wech...@wyona.com> wrote:

> sure, no problem!
>
> Maybe Adrien Grand and others might also have some feedback :-)
>
> Thanks
>
> Michael
>
> On 10.05.24 at 23:03, Chang Hank wrote:
>
> Thank you for these useful resources, please allow me to spend some time
> looking into them.
> I’ll let you know asap!!
>
> Thanks
>
> Hank
>
> On May 10, 2024, at 12:34 PM, Michael Wechner <michael.wech...@wyona.com> wrote:
>
> also we might want to consider how this relates to
>
> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/Rescorer.html
>
> In vector search, reranking has become quite popular, e.g.
>
> https://docs.cohere.com/docs/reranking
>
> IIUC LangChain (python) for example adds the reranker as an argument to
> the searcher/retriever
>
> https://python.langchain.com/v0.1/docs/integrations/retrievers/cohere-reranker/
>
> So maybe the following might make sense as well
>
> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
> TopDocs topDocsVector = vectorSearcher.search(query, 50, new CohereReranker());
>
> TopDocs topDocs = TopDocs.merge(new RRFRanker(), topDocsKeyword, topDocsVector);
>
> WDYT?
>
> Thanks
>
> Michael
>
> On 10.05.24 at 21:08, Michael Wechner wrote:
>
> great, yes, let's get started :-)
>
> What about the following pseudo code, assuming that there might be
> alternative ranking algorithms to RRF
>
> // Pseudo code: Ranker, RRFRanker and TopDocs.rank are proposed names
> StoredFields storedFieldsKeyword = indexReaderKeyword.storedFields();
> StoredFields storedFieldsVector = indexReaderVector.storedFields();
>
> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
> TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50);
>
> Ranker ranker = new RRFRanker();
> TopDocs topDocs = TopDocs.rank(ranker, topDocsKeyword, topDocsVector);
>
> for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
>     Document docK = storedFieldsKeyword.document(scoreDoc.doc);
>     Document docV = storedFieldsVector.document(scoreDoc.doc);
>     ....
> }
>
> See also
>
> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/TopDocs.html
> https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html
>
> WDYT?
>
> Thanks
>
> Michael
>
> On 10.05.24 at 20:01, Chang Hank wrote:
>
> Hi Michael,
>
> Sounds good to me.
> Let’s do it!!
>
> Cheers,
> Hank
>
> On May 10, 2024, at 10:50 AM, Michael Wechner <michael.wech...@wyona.com> wrote:
>
> Hi Hank
>
> Very cool!
>
> Adrien Grand suggested implementing it as a utility method on the TopDocs
> class, and since Adrien worked for a decade on Lucene
> https://www.elastic.co/de/blog/author/adrien-grand I guess it makes sense
> to follow his advice :-)
>
> We could create a PR and work together on it, WDYT?
>
> All the best
>
> Michael
>
> On 10.05.24 at 18:51, Chang Hank wrote:
>
> Hi Michael,
>
> Thank you for the reply.
> This is really a cool issue to work on, I’m happy to work on this with
> you. I’ll try to do research on RRF first.
> Also, are we going to implement this on the TopDocs class?
>
> Best,
> Hank
>
> On May 9, 2024, at 11:08 PM, Michael Wechner <michael.wech...@wyona.com> wrote:
>
> Hi Hank
>
> Thanks for offering your help!
>
> I recently suggested implementing RRF (Reciprocal Rank Fusion)
>
> https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz
>
> but still have not found the time to really work on this.
>
> Maybe you would be interested in doing this, or we could work on it
> together somehow?
>
> Thanks
>
> Michael
>
> On 10.05.24 at 07:27, Chang Hank wrote:
>
> Hi everyone,
>
> I’m Hank Chang, currently studying Information Retrieval topics. I’m
> really interested in contributing to Apache Lucene and enhancing my
> understanding of the field.
> I’ve reviewed several issues posted on the GitHub repository but haven’t
> found a straightforward starting point. Could someone please recommend
> suitable issues for a newcomer like me or suggest areas I could assist with?
>
> Thank you for your time and guidance.
>
> Best regards,
> Hank Chang
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
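
For reference, the fusion step discussed in this thread can be sketched without any Lucene dependency. The following is only a minimal illustration under stated assumptions, not the proposed TopDocs utility or the Solr contribution: the class `RrfSketch` and its `fuse` method are hypothetical names, each result list is reduced to an array of doc IDs in rank order, and the IDs are assumed to identify the same documents in every list. Each document scores the sum of 1 / (k + rank) over the lists it appears in, with rank starting at 1; k = 60 is a commonly used default.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Lucene: fuses ranked lists of doc IDs with RRF. */
public class RrfSketch {

    /**
     * Returns up to topN doc IDs ordered by descending RRF score.
     * score(d) = sum over all lists containing d of 1 / (k + rank), rank starting at 1.
     */
    public static List<Integer> fuse(int k, int topN, List<int[]> rankedLists) {
        Map<Integer, Double> scores = new LinkedHashMap<>();
        for (int[] ranked : rankedLists) {
            for (int i = 0; i < ranked.length; i++) {
                // i is zero-based, so the one-based rank is i + 1
                scores.merge(ranked[i], 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<Map.Entry<Integer, Double>> entries = new ArrayList<>(scores.entrySet());
        // List.sort is stable, so tied documents keep their first-seen order
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<Integer> fused = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, entries.size()); i++) {
            fused.add(entries.get(i).getKey());
        }
        return fused;
    }

    public static void main(String[] args) {
        int[] keywordHits = {1, 2, 3}; // assumed doc IDs from a keyword search, best first
        int[] vectorHits = {3, 1, 4};  // assumed doc IDs from a vector search, best first
        System.out.println(fuse(60, 10, List.of(keywordHits, vectorHits))); // prints [1, 3, 2, 4]
    }
}
```

Because only ranks are used, the keyword and vector scores never need to be normalized against each other. The sketch does assume that every list ranks over the same document space, which is the difficulty raised above for distributed, per-node results.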