Re: Any recommended issues to work on for a newcomer?

Michael Wechner Sat, 11 May 2024 00:10:41 -0700

sure, no problem!

Maybe Adrien Grand and others might also have some feedback :-)


Thanks

Michael

On 10.05.24 at 23:03, Chang Hank wrote:

Thank you for these useful resources, please allow me to spend some time looking into them.
I’ll let you know asap!!

Thanks

Hank
On May 10, 2024, at 12:34 PM, Michael Wechner <[email protected]> wrote:
also we might want to consider how this relates to

https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/Rescorer.html

In vector search reranking has become quite popular, e.g.

https://docs.cohere.com/docs/reranking
IIUC LangChain (Python), for example, adds the reranker as an argument to the searcher/retriever
https://python.langchain.com/v0.1/docs/integrations/retrievers/cohere-reranker/

So maybe the following might make sense as well

TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
TopDocs topDocsVector = vectorSearcher.search(query, 50, new CohereReranker());
TopDocs topDocs = TopDocs.merge(new RRFRanker(), topDocsKeyword, topDocsVector);
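
The two-stage idea behind the reranker argument (fetch a larger candidate set, re-score it, keep the best few) could be sketched in plain Java roughly as follows. This is only an illustration: RerankSketch, rerank and the scorer parameter are hypothetical names and not Lucene API, and the scorer stands in for whatever model or service does the re-scoring. In Lucene itself, the Rescorer class linked above would probably be the natural extension point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical two-stage retrieval sketch; none of these names exist in Lucene. */
public class RerankSketch {

    /** Re-scores first-pass candidates and returns the topN best, highest score first. */
    public static <T> List<T> rerank(List<T> candidates, ToDoubleFunction<T> scorer, int topN) {
        int n = candidates.size();
        double[] scores = new double[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            // Score each candidate exactly once; a real reranker call may be expensive
            scores[i] = scorer.applyAsDouble(candidates.get(i));
            order[i] = i;
        }
        // Object sort is stable, so candidates with equal scores keep their first-pass order
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));

        List<T> result = new ArrayList<>();
        for (int i = 0; i < Math.min(topN, n); i++) {
            result.add(candidates.get(order[i]));
        }
        return result;
    }
}
```

With this shape, the first-pass search would ask for 50 hits and the reranker would cut them down to the 10 that are finally merged.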
WDYT?

Thanks

Michael


On 10.05.24 at 21:08, Michael Wechner wrote:
great, yes, let's get started :-)
What about the following pseudo code, assuming that there might be alternative ranking algorithms to RRF

StoredFields storedFieldsKeyword = indexReaderKeyword.storedFields();
StoredFields storedFieldsVector = indexReaderVector.storedFields();
TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50);

Ranker ranker = new RRFRanker();
TopDocs topDocs = TopDocs.rank(ranker, topDocsKeyword, topDocsVector);

for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    Document docK = storedFieldsKeyword.document(scoreDoc.doc);
    Document docV = storedFieldsVector.document(scoreDoc.doc);
    ....
}

Also see

https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/TopDocs.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html
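
To make the fusion step concrete, here is a minimal sketch of RRF itself in plain Java, independent of Lucene. RrfSketch and fuse are hypothetical names, not existing Lucene API. The sketch assumes each ranking is an array of doc IDs ordered best first, and that all rankings share one doc ID space (which would not hold automatically for two separate indexes). It uses the usual formula score(d) = sum over rankings of 1 / (k + rank(d)) with 1-based ranks; k = 60 is a commonly used default.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Lucene: fuses ranked lists of doc IDs with RRF. */
public class RrfSketch {

    /**
     * @param k        rank constant (60 is a common choice)
     * @param topN     maximum number of fused hits to return
     * @param rankings one array of doc IDs per result list, best hit first
     * @return (doc ID, fused score) entries, highest fused score first
     */
    public static List<Map.Entry<Integer, Double>> fuse(int k, int topN, List<int[]> rankings) {
        Map<Integer, Double> scores = new HashMap<>();
        for (int[] ranking : rankings) {
            for (int i = 0; i < ranking.length; i++) {
                // Ranks are 1-based: the first hit of a list contributes 1 / (k + 1)
                scores.merge(ranking[i], 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<Map.Entry<Integer, Double>> fused = new ArrayList<>(scores.entrySet());
        // Highest fused score first; ties are broken by doc ID to keep the order deterministic
        fused.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : Integer.compare(a.getKey(), b.getKey());
        });
        return new ArrayList<>(fused.subList(0, Math.min(topN, fused.size())));
    }
}
```

For example, fuse(60, 3, List.of(new int[] {1, 2, 3}, new int[] {3, 1, 4})) would put doc 1 first, because it sits near the top of both lists. Only the ranks matter here, so the raw keyword and vector scores never need to be normalized against each other.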

WDYT?

Thanks

Michael




On 10.05.24 at 20:01, Chang Hank wrote:
Hi Michael,

Sounds good to me.
Let’s do it!!

Cheers,
Hank
On May 10, 2024, at 10:50 AM, Michael Wechner <[email protected]> wrote:
Hi Hank

Very cool!
Adrien Grand suggested implementing it as a utility method on the TopDocs class, and since Adrien has worked on Lucene for a decade

https://www.elastic.co/de/blog/author/adrien-grand

I guess it makes sense to follow his advice :-)

We could create a PR and work together on it, WDYT?

All the best

Michael

On 10.05.24 at 18:51, Chang Hank wrote:
Hi Michael,

Thank you for the reply.
This is a really cool issue to work on, and I’m happy to work on this with you. I’ll try to do some research on RRF first.
Also, are we going to implement this on the TopDocs class?

Best,
Hank
On May 9, 2024, at 11:08 PM, Michael Wechner <[email protected]> wrote:
Hi Hank

Thanks for offering your help!

I recently suggested implementing RRF (Reciprocal Rank Fusion)

https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz

but still have not found the time to really work on this.
Maybe you would be interested in doing this, or we could work on it together somehow?

Thanks

Michael



On 10.05.24 at 07:27, Chang Hank wrote:
Hi everyone,
I’m Hank Chang, currently studying Information Retrieval topics. I’m really interested in contributing to Apache Lucene and enhancing my understanding of the field. I’ve reviewed several issues posted on the GitHub repository but haven’t found a straightforward starting point. Could someone please recommend suitable issues for a newcomer like me or suggest areas I could assist with?

Thank you for your time and guidance.

Best regards,
Hank Chang
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
