Also, we might want to consider how this relates to
https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/Rescorer.html
In vector search, reranking has become quite popular, e.g.
https://docs.cohere.com/docs/reranking
IIUC, LangChain (Python), for example, adds the reranker as an argument to
the searcher/retriever:
https://python.langchain.com/v0.1/docs/integrations/retrievers/cohere-reranker/
So maybe the following would make sense as well:
// Proposed API only: CohereReranker, RRFRanker and these overloads do not exist yet
TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50, new CohereReranker());
TopDocs topDocs = TopDocs.merge(new RRFRanker(), topDocsKeyword, topDocsVector);
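To make the two-stage idea a bit more concrete, here is a minimal plain-Java sketch of a reranking step that takes the scorer as an argument, similar in spirit to how Lucene's Rescorer re-scores first-pass hits. `RerankSketch`, `rerank` and the scoring function are hypothetical names for illustration only, not Lucene, Cohere or LangChain APIs; a real reranker would call a model where the toy scorer is used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical helper, not part of Lucene: re-orders first-pass candidates
// by a second-pass score supplied by the caller.
final class RerankSketch {
  static <T> List<T> rerank(List<T> candidates, ToDoubleFunction<T> secondPassScore, int topN) {
    // Score each candidate exactly once, since a real reranker call may be expensive.
    double[] scores = new double[candidates.size()];
    Integer[] order = new Integer[candidates.size()];
    for (int i = 0; i < order.length; i++) {
      scores[i] = secondPassScore.applyAsDouble(candidates.get(i));
      order[i] = i;
    }
    // Highest second-pass score first; the sort is stable, so ties keep first-pass order.
    Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));
    List<T> reranked = new ArrayList<>();
    for (int i = 0; i < Math.min(topN, order.length); i++) {
      reranked.add(candidates.get(order[i]));
    }
    return reranked;
  }
}
```

The reranked list could then be fed into whatever merge step is chosen, so the reranker stays independent of the fusion algorithm.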
WDYT?
Thanks
Michael
On 10.05.24 at 21:08, Michael Wechner wrote:
Great, yes, let's get started :-)
What about the following pseudo code, assuming that there might be
alternative ranking algorithms to RRF?
// Pseudo code: Ranker, RRFRanker and TopDocs.rank are proposed, not existing, APIs
StoredFields storedFieldsKeyword = indexReaderKeyword.storedFields();
StoredFields storedFieldsVector = indexReaderVector.storedFields();
TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50);
Ranker ranker = new RRFRanker();
TopDocs topDocs = TopDocs.rank(ranker, topDocsKeyword, topDocsVector);
for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
  Document docK = storedFieldsKeyword.document(scoreDoc.doc);
  Document docV = storedFieldsVector.document(scoreDoc.doc);
  ....
}
Also see:
https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/TopDocs.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html
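As a rough illustration of the fusion step itself, here is a minimal sketch of Reciprocal Rank Fusion in plain Java, without Lucene types. `RrfSketch`, `fuse` and `fuseScores` are hypothetical names, and k = 60 in the usage note is just a commonly used default. The sketch assumes that both rankings use the same doc ID space, which holds when both searches run against the same index; doc IDs from two separate IndexReaders are not comparable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: fuses several rankings of doc IDs with RRF.
final class RrfSketch {
  // Fused score per doc ID: sum over all rankings of 1 / (k + rank), with rank starting at 1.
  static Map<Integer, Double> fuseScores(int k, int[]... rankedDocIds) {
    Map<Integer, Double> scores = new HashMap<>();
    for (int[] ranking : rankedDocIds) {
      for (int i = 0; i < ranking.length; i++) {
        scores.merge(ranking[i], 1.0 / (k + i + 1), Double::sum);
      }
    }
    return scores;
  }

  // Doc IDs by descending fused score, ties broken by ascending doc ID, cut to topN.
  static List<Integer> fuse(int k, int topN, int[]... rankedDocIds) {
    Map<Integer, Double> scores = fuseScores(k, rankedDocIds);
    List<Integer> docIds = new ArrayList<>(scores.keySet());
    docIds.sort((a, b) -> {
      int byScore = Double.compare(scores.get(b), scores.get(a));
      return byScore != 0 ? byScore : Integer.compare(a, b);
    });
    return new ArrayList<>(docIds.subList(0, Math.min(topN, docIds.size())));
  }
}
```

For example, with a keyword ranking of {1, 2, 3} and a vector ranking of {3, 1, 4}, `RrfSketch.fuse(60, 3, ...)` returns [1, 3, 2]: docs 1 and 3 appear in both lists and therefore outrank doc 2, which only the keyword search found. Only ranks enter the formula, so the keyword and vector scores never need to be on the same scale.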
WDYT?
Thanks
Michael
On 10.05.24 at 20:01, Chang Hank wrote:
Hi Michael,
Sounds good to me.
Let’s do it!!
Cheers,
Hank
On May 10, 2024, at 10:50 AM, Michael Wechner
<michael.wech...@wyona.com> wrote:
Hi Hank
Very cool!
Adrien Grand suggested implementing it as a utility method on the
TopDocs class, and since Adrien worked on Lucene for a decade
https://www.elastic.co/de/blog/author/adrien-grand I guess it makes
sense to follow his advice :-)
We could create a PR and work together on it, WDYT?
All the best
Michael
On 10.05.24 at 18:51, Chang Hank wrote:
Hi Michael,
Thank you for the reply.
This is a really cool issue to work on, and I’m happy to work on it
with you. I’ll try to do some research on RRF first.
Also, are we going to implement this on the TopDocs class?
Best,
Hank
On May 9, 2024, at 11:08 PM, Michael Wechner
<michael.wech...@wyona.com> wrote:
Hi Hank
Thanks for offering your help!
I recently suggested implementing RRF (Reciprocal Rank Fusion)
https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz
but still have not found the time to really work on this.
Maybe you would be interested in doing this, or we could work on it
together somehow?
Thanks
Michael
On 10.05.24 at 07:27, Chang Hank wrote:
Hi everyone,
I’m Hank Chang, currently studying Information Retrieval topics.
I’m really interested in contributing to Apache Lucene and
enhancing my understanding of the field.
I’ve reviewed several issues posted on the GitHub repository but
haven’t found a straightforward starting point. Could someone
please recommend suitable issues for a newcomer like me or
suggest areas I could assist with?
Thank you for your time and guidance.
Best regards,
Hank Chang
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org