This is not strictly related to Lucene, but I'll give a talk at Berlin
Buzzwords on how I am implementing Reciprocal Rank Fusion in Apache Solr.
I'll resume my work on the contribution next week and have more to share
later.

A while back I reasoned about this and concluded that Lucene was not the
right place for an interleaving algorithm, given that Reciprocal Rank
Fusion is affected by distribution: it's not supposed to work per node.
I think I evaluated implementing it as a Lucene query or a Lucene
component, but then ended up with a different approach.
I'll elaborate more when I go back to the task!
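
For context, RRF itself is simple to state: each document gets the sum of
1 / (k + rank) over the result lists it appears in, with rank starting at 1.
Below is a minimal, Lucene-free sketch in Java. It assumes plain integer IDs
that are already global across nodes (which is the distribution concern
above). The class and method names are hypothetical, and k = 60 is just the
commonly used constant, not something fixed by this thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the Solr contribution and not a Lucene API.
public class RrfSketch {

    /**
     * RRF score per document: sum over all ranked lists of 1 / (k + rank),
     * where rank is the 1-based position of the document in that list.
     * Assumption: IDs are global identifiers, not per-index Lucene doc IDs.
     */
    public static Map<Integer, Double> rrfScores(int k, List<List<Integer>> rankedLists) {
        Map<Integer, Double> scores = new LinkedHashMap<>();
        for (List<Integer> ranked : rankedLists) {
            for (int i = 0; i < ranked.size(); i++) {
                int rank = i + 1;
                scores.merge(ranked.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        return scores;
    }

    /** Returns document IDs ordered by descending RRF score. */
    public static List<Integer> fuse(int k, List<List<Integer>> rankedLists) {
        Map<Integer, Double> scores = rrfScores(k, rankedLists);
        List<Integer> fused = new ArrayList<>(scores.keySet());
        fused.sort(Comparator.comparingDouble((Integer id) -> scores.get(id)).reversed());
        return fused;
    }

    public static void main(String[] args) {
        List<Integer> keywordRanking = List.of(1, 2, 3); // e.g. keyword search results
        List<Integer> vectorRanking = List.of(3, 1, 4);  // e.g. vector search results
        System.out.println(fuse(60, List.of(keywordRanking, vectorRanking)));
    }
}
```

With these two lists the program prints [1, 3, 2, 4]: documents 1 and 3
appear in both rankings and therefore outrank 2 and 4. Because the score
depends only on ranks, the input lists have to be the final, globally merged
rankings; fusing per node would use local ranks and give different results.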

Cheers
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io



On Sat, 11 May 2024 at 09:10, Michael Wechner <michael.wech...@wyona.com>
wrote:

> sure, no problem!
>
> Maybe Adrien Grand and others might also have some feedback :-)
>
> Thanks
>
> Michael
>
> On 10.05.24 at 23:03, Chang Hank wrote:
>
> Thank you for these useful resources. Please allow me some time to look
> into them.
> I’ll let you know ASAP!
>
> Thanks
>
> Hank
>
> On May 10, 2024, at 12:34 PM, Michael Wechner <michael.wech...@wyona.com> wrote:
>
> Also, we might want to consider how this relates to
>
>
> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/Rescorer.html
>
> In vector search, reranking has become quite popular, e.g.
>
> https://docs.cohere.com/docs/reranking
>
> IIUC, LangChain (Python), for example, adds the reranker as an argument
> to the searcher/retriever
>
>
> https://python.langchain.com/v0.1/docs/integrations/retrievers/cohere-reranker/
>
> So maybe the following might make sense as well
>
> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
> TopDocs topDocsVector = vectorSearcher.search(query, 50, new CohereReranker());
>
> TopDocs topDocs = TopDocs.merge(new RRFRanker(), topDocsKeyword, topDocsVector);
>
> WDYT?
>
> Thanks
>
> Michael
>
>
> On 10.05.24 at 21:08, Michael Wechner wrote:
>
> great, yes, let's get started :-)
>
> What about the following pseudocode, which assumes that there might be
> alternative ranking algorithms to RRF?
>
> StoredFields storedFieldsKeyword = indexReaderKeyword.storedFields();
> StoredFields storedFieldsVector = indexReaderVector.storedFields();
>
> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
> TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50);
>
> Ranker ranker = new RRFRanker();
> TopDocs topDocs = TopDocs.rank(ranker, topDocsKeyword, topDocsVector);
>
> for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
>     Document docK = storedFieldsKeyword.document(scoreDoc.doc);
>     Document docV = storedFieldsVector.document(scoreDoc.doc);
>     ....
> }
>
> See also
>
>
> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/TopDocs.html
> https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html
>
> WDYT?
>
> Thanks
>
> Michael
>
>
>
>
> On 10.05.24 at 20:01, Chang Hank wrote:
>
> Hi Michael,
>
> Sounds good to me.
> Let’s do it!!
>
> Cheers,
> Hank
>
> On May 10, 2024, at 10:50 AM, Michael Wechner <michael.wech...@wyona.com> wrote:
>
> Hi Hank
>
> Very cool!
>
> Adrien Grand suggested implementing it as a utility method on the TopDocs
> class, and since Adrien worked for a decade on Lucene
> https://www.elastic.co/de/blog/author/adrien-grand I guess it makes sense
> to follow his advice :-)
>
> We could create a PR and work together on it, WDYT?
>
> All the best
>
> Michael
>
> On 10.05.24 at 18:51, Chang Hank wrote:
>
> Hi Michael,
>
> Thank you for the reply.
> This is a really cool issue to work on, and I’m happy to work on it with
> you. I’ll try to do some research on RRF first.
> Also, are we going to implement this on the TopDocs class?
>
> Best,
> Hank
>
>
> On May 9, 2024, at 11:08 PM, Michael Wechner <michael.wech...@wyona.com> wrote:
>
> Hi Hank
>
> Thanks for offering your help!
>
> I recently suggested implementing RRF (Reciprocal Rank Fusion)
>
> https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz
>
> but still have not found the time to really work on this.
>
> Maybe you would be interested in doing this, or we could work on it
> together somehow?
>
> Thanks
>
> Michael
>
>
>
> On 10.05.24 at 07:27, Chang Hank wrote:
>
> Hi everyone,
>
> I’m Hank Chang, currently studying Information Retrieval topics. I’m
> really interested in contributing to Apache Lucene and enhancing my
> understanding of the field.
> I’ve reviewed several issues posted on the GitHub repository but haven’t
> found a straightforward starting point. Could someone please recommend
> suitable issues for a newcomer like me or suggest areas I could assist with?
>
> Thank you for your time and guidance.
>
> Best regards,
> Hank Chang
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org