sure, no problem!

Maybe Adrien Grand and others might also have some feedback :-)

Thanks

Michael

On 10.05.24 at 23:03, Chang Hank wrote:
Thank you for these useful resources. Please allow me some time to look into them.
I’ll let you know ASAP!

Thanks

Hank

On May 10, 2024, at 12:34 PM, Michael Wechner <michael.wech...@wyona.com> wrote:

Also, we might want to consider how this relates to

https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/Rescorer.html

In vector search, reranking has become quite popular, e.g.

https://docs.cohere.com/docs/reranking

IIUC, LangChain (Python), for example, adds the reranker as an argument to the searcher/retriever:

https://python.langchain.com/v0.1/docs/integrations/retrievers/cohere-reranker/

So maybe the following might make sense as well

TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
TopDocs topDocsVector = vectorSearcher.search(query, 50, new CohereReranker());

TopDocs topDocs = TopDocs.merge(new RRFRanker(), topDocsKeyword, topDocsVector);
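
To illustrate the reranking step on its own, here is a minimal, Lucene-free sketch of a second pass over first-pass hits. Everything in it is hypothetical (RerankSketch, Candidate, rerank); the relevance function merely stands in for an external model such as a hosted reranker, and this is not how Lucene's Rescorer or any Cohere client is actually wired up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Sketch of a second-pass reranker over first-pass hits; all names are hypothetical. */
final class RerankSketch {

    /** Hypothetical candidate: a doc ID plus the text a reranking model would read. */
    static final class Candidate {
        final int doc;
        final String text;

        Candidate(int doc, String text) {
            this.doc = doc;
            this.text = text;
        }
    }

    /**
     * Re-sorts first-pass candidates by a relevance function (best first) and keeps topN.
     * The relevance function is called exactly once per candidate, since a real model
     * call would be expensive.
     */
    static List<Candidate> rerank(
            List<Candidate> firstPass, ToDoubleFunction<Candidate> relevance, int topN) {
        int n = firstPass.size();
        double[] scores = new double[n];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            scores[i] = relevance.applyAsDouble(firstPass.get(i));
            order.add(i);
        }
        // Stable sort by descending score: ties keep their first-pass order.
        order.sort((a, b) -> Double.compare(scores[b], scores[a]));

        List<Candidate> result = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, n); i++) {
            result.add(firstPass.get(order.get(i)));
        }
        return result;
    }
}
```

The idea is that the first pass fetches more candidates than needed (50 in the example above) and the reranker trims them down before any merging happens.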

WDYT?

Thanks

Michael


On 10.05.24 at 21:08, Michael Wechner wrote:
Great, yes, let's get started :-)

What about the following pseudocode, assuming that there might be alternative ranking algorithms to RRF:

StoredFields storedFieldsKeyword = indexReaderKeyword.storedFields();
StoredFields storedFieldsVector = indexReaderVector.storedFields();

TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50);

Ranker ranker = new RRFRanker();
TopDocs topDocs = TopDocs.rank(ranker, topDocsKeyword, topDocsVector);

for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    Document docK = storedFieldsKeyword.document(scoreDoc.doc);
    Document docV = storedFieldsVector.document(scoreDoc.doc);
    ....
}

See also:

https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/TopDocs.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html
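
To make the ranking step concrete, here is a minimal, Lucene-free sketch of what an RRF ranker could compute. It is only an illustration under assumptions: RrfSketch, Hit and fuse are hypothetical names (Hit stands in for Lucene's ScoreDoc), both result lists are assumed to use the same doc ID space, and k is the RRF rank constant, for which 60 is a commonly used value. Each document's fused score is the sum of 1 / (k + rank) over all lists that contain it, with rank starting at 1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of Reciprocal Rank Fusion (RRF); not an existing Lucene API. */
final class RrfSketch {

    /** Hypothetical stand-in for Lucene's ScoreDoc: a doc ID plus a score. */
    static final class Hit {
        final int doc;
        final double score;

        Hit(int doc, double score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Fuses several ranked lists, each sorted best-first.
     * Only the rank positions are used; the original scores are ignored.
     */
    static List<Hit> fuse(int topN, int k, List<List<Hit>> rankings) {
        Map<Integer, Double> fused = new HashMap<>();
        for (List<Hit> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1; // ranks are 1-based
                fused.merge(ranking.get(i).doc, 1.0 / (k + rank), Double::sum);
            }
        }

        List<Hit> result = new ArrayList<>();
        fused.forEach((doc, score) -> result.add(new Hit(doc, score)));

        // Highest fused score first; ties broken by ascending doc ID for determinism.
        result.sort((a, b) -> {
            int byScore = Double.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
        });
        return result.size() > topN ? new ArrayList<>(result.subList(0, topN)) : result;
    }
}
```

Because only ranks matter, the keyword scores and the vector similarity scores never need to be normalized against each other, which is the main appeal of RRF for hybrid search.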

WDYT?

Thanks

Michael




On 10.05.24 at 20:01, Chang Hank wrote:
Hi Michael,

Sounds good to me.
Let’s do it!!

Cheers,
Hank

On May 10, 2024, at 10:50 AM, Michael Wechner <michael.wech...@wyona.com> wrote:

Hi Hank

Very cool!

Adrien Grand suggested implementing it as a utility method on the TopDocs class, and since Adrien has worked on Lucene for a decade

https://www.elastic.co/de/blog/author/adrien-grand

I guess it makes sense to follow his advice :-)

We could create a PR and work together on it, WDYT?

All the best

Michael

On 10.05.24 at 18:51, Chang Hank wrote:
Hi Michael,

Thank you for the reply.
This is a really cool issue to work on, and I’m happy to work on it with you. I’ll try to do some research on RRF first.
Also, are we going to implement this on the TopDocs class?

Best,
Hank


On May 9, 2024, at 11:08 PM, Michael Wechner <michael.wech...@wyona.com> wrote:

Hi Hank

Thanks for offering your help!

I recently suggested implementing RRF (Reciprocal Rank Fusion)

https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz

but still have not found the time to really work on this.

Maybe you would be interested in doing this, or we could work on it together somehow?

Thanks

Michael



On 10.05.24 at 07:27, Chang Hank wrote:
Hi everyone,

I’m Hank Chang, and I’m currently studying Information Retrieval topics. I’m really interested in contributing to Apache Lucene and enhancing my understanding of the field. I’ve reviewed several issues posted on the GitHub repository but haven’t found a straightforward starting point. Could someone please recommend suitable issues for a newcomer like me or suggest areas I could assist with?

Thank you for your time and guidance.

Best regards,
Hank Chang
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


