Re: Any recommended issues to work on for a newcomer?

Michael Wechner Mon, 13 May 2024 04:41:46 -0700

Thanks for your feedback Alessandro!

I am using Lucene independently of Solr, OpenSearch, or Elasticsearch, but 
would like to combine different result sets using RRF, so I think that Lucene 
itself could actually be a good place for it.
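
For concreteness, here is a minimal sketch of what combining two result sets 
with RRF could look like, written in plain Java so that it runs without Lucene 
on the classpath. The class name RrfSketch, the method fuse and the constant 
k = 60 are assumptions made for illustration; none of this is an existing 
Lucene API. It also assumes that both result lists identify documents by the 
same key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RrfSketch {

    /**
     * Fuses several ranked lists of document keys with Reciprocal Rank Fusion.
     * Each document receives the sum of 1 / (k + rank) over all lists that
     * contain it, where rank is the 1-based position in that list.
     */
    public static List<Map.Entry<String, Double>> fuse(
            int k, int topN, List<List<String>> rankedLists) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranked : rankedLists) {
            for (int i = 0; i < ranked.size(); i++) {
                int rank = i + 1; // ranks are 1-based
                scores.merge(ranked.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        List<Map.Entry<String, Double>> fused = new ArrayList<>(scores.entrySet());
        // Highest fused score first; ties are broken by key to keep the order stable.
        fused.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(fused.subList(0, Math.min(topN, fused.size())));
    }

    public static void main(String[] args) {
        // Hypothetical hits from a keyword search and a vector search.
        List<String> keywordHits = List.of("a", "b", "c");
        List<String> vectorHits = List.of("b", "d");
        for (Map.Entry<String, Double> hit : fuse(60, 3, List.of(keywordHits, vectorHits))) {
            System.out.println(hit.getKey() + " " + hit.getValue());
        }
    }
}
```

With the sample lists in main, "b" comes out first because it is the only 
document that appears in both rankings.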


Looking forward to your additional elaboration!

Thanks

Michael




> On 13.05.2024 at 12:34, Alessandro Benedetti <[email protected]> wrote:
> 
> This is not strictly related to Lucene, but I'll give a talk at Berlin 
> Buzzwords on how I am implementing Reciprocal Rank Fusion in Apache Solr.
> I'll resume my work on the contribution next week and have more to share 
> later.
> 
> Some time ago I thought this through and concluded that Lucene was not the 
> right place for an interleaving algorithm, given that Reciprocal Rank Fusion 
> is affected by how the index is distributed and is not supposed to run per node.
> I think I evaluated the possibility of doing it as a Lucene query or a Lucene 
> component but then ended up with a different approach.
> I'll elaborate more when I go back to the task!
> 
> Cheers
> --------------------------
> Alessandro Benedetti
> Director @ Sease Ltd.
> Apache Lucene/Solr Committer
> Apache Solr PMC Member
> 
> e-mail: [email protected] <mailto:[email protected]>
> 
> 
> Sease - Information Retrieval Applied
> Consulting | Training | Open Source
> 
> Website: Sease.io <http://sease.io/>
> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter 
> <https://twitter.com/seaseltd> | Youtube 
> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github 
> <https://github.com/seaseltd>
> 
> On Sat, 11 May 2024 at 09:10, Michael Wechner <[email protected] 
> <mailto:[email protected]>> wrote:
> sure, no problem!
> 
> Maybe Adrien Grand and others might also have some feedback :-)
> 
> Thanks
> 
> Michael
> 
> On 10.05.24 at 23:03, Chang Hank wrote:
>> Thank you for these useful resources. Please allow me to spend some time 
>> looking into them.
>> I’ll let you know asap!!
>> 
>> Thanks
>> 
>> Hank
>> 
>>> On May 10, 2024, at 12:34 PM, Michael Wechner <[email protected]> 
>>> <mailto:[email protected]> wrote:
>>> 
>>> also we might want to consider how this relates to
>>> 
>>> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/Rescorer.html
>>> 
>>> In vector search reranking has become quite popular, e.g.
>>> 
>>> https://docs.cohere.com/docs/reranking
>>> 
>>> IIUC LangChain (python) for example adds the reranker as an argument to the 
>>> searcher/retriever
>>> 
>>> https://python.langchain.com/v0.1/docs/integrations/retrievers/cohere-reranker/
>>> 
>>> So maybe the following might make sense as well
>>> 
>>> // CohereReranker and RRFRanker are proposed names, not existing Lucene classes
>>> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
>>> TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50, new CohereReranker());
>>> 
>>> TopDocs topDocs = TopDocs.merge(new RRFRanker(), topDocsKeyword, topDocsVector);
>>> 
>>> WDYT?
>>> 
>>> Thanks
>>> 
>>> Michael
>>> 
>>> 
>>> On 10.05.24 at 21:08, Michael Wechner wrote:
>>>> great, yes, let's get started :-)
>>>> 
>>>> What about the following pseudo code, assuming that there might be 
>>>> alternative ranking algorithms to RRF
>>>> 
>>>> StoredFields storedFieldsKeyword = indexReaderKeyword.storedFields();
>>>> StoredFields storedFieldsVector = indexReaderVector.storedFields();
>>>> 
>>>> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
>>>> TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50);
>>>> 
>>>> Ranker ranker = new RRFRanker();
>>>> TopDocs topDocs = TopDocs.rank(ranker, topDocsKeyword, topDocsVector);
>>>> 
>>>> for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
>>>>     Document docK = storedFieldsKeyword.document(scoreDoc.doc);
>>>>     Document docV = storedFieldsVector.document(scoreDoc.doc);
>>>>     ....
>>>> } 
>>>> 
>>>> See also:
>>>> 
>>>> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/TopDocs.html
>>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html
>>>> 
>>>> WDYT?
>>>> 
>>>> Thanks
>>>> 
>>>> Michael
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 10.05.24 at 20:01, Chang Hank wrote:
>>>>> Hi Michael,
>>>>> 
>>>>> Sounds good to me. 
>>>>> Let’s do it!!
>>>>> 
>>>>> Cheers,
>>>>> Hank
>>>>> 
>>>>>> On May 10, 2024, at 10:50 AM, Michael Wechner 
>>>>>> <[email protected]> <mailto:[email protected]> wrote:
>>>>>> 
>>>>>> Hi Hank
>>>>>> 
>>>>>> Very cool!
>>>>>> 
>>>>>> Adrien Grand suggested implementing it as a utility method on the 
>>>>>> TopDocs class, and since Adrien worked on Lucene for a decade
>>>>>> (https://www.elastic.co/de/blog/author/adrien-grand),
>>>>>> I guess it makes sense to follow his advice :-)
>>>>>> 
>>>>>> We could create a PR and work together on it, WDYT?
>>>>>> 
>>>>>> All the best
>>>>>> 
>>>>>> Michael
>>>>>> 
>>>>>> On 10.05.24 at 18:51, Chang Hank wrote:
>>>>>>> Hi Michael, 
>>>>>>> 
>>>>>>> Thank you for the reply.
>>>>>>> This is a really cool issue to work on, and I’m happy to work on it with 
>>>>>>> you. I’ll try to do some research on RRF first.
>>>>>>> Also, are we going to implement this on the TopDocs class?
>>>>>>> 
>>>>>>> Best,
>>>>>>> Hank
>>>>>>> 
>>>>>>> 
>>>>>>>> On May 9, 2024, at 11:08 PM, Michael Wechner 
>>>>>>>> <[email protected]> <mailto:[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hi Hank
>>>>>>>> 
>>>>>>>> Thanks for offering your help!
>>>>>>>> 
>>>>>>>> I recently suggested implementing RRF (Reciprocal Rank Fusion)
>>>>>>>> 
>>>>>>>> https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz
>>>>>>>> 
>>>>>>>> but still have not found the time to really work on this.
>>>>>>>> 
>>>>>>>> Maybe you would be interested in doing this, or we could work on it 
>>>>>>>> together somehow?
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> 
>>>>>>>> Michael
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 10.05.24 at 07:27, Chang Hank wrote:
>>>>>>>>> Hi everyone,
>>>>>>>>> 
>>>>>>>>> I’m Hank Chang, currently studying Information Retrieval topics. I’m 
>>>>>>>>> really interested in contributing to Apache Lucene and enhancing my 
>>>>>>>>> understanding of the field.
>>>>>>>>> I’ve reviewed several issues posted on the GitHub repository but 
>>>>>>>>> haven’t found a straightforward starting point. Could someone please 
>>>>>>>>> recommend suitable issues for a newcomer like me or suggest areas I 
>>>>>>>>> could assist with?
>>>>>>>>> 
>>>>>>>>> Thank you for your time and guidance.
>>>>>>>>> 
>>>>>>>>> Best regards,
>>>>>>>>> Hank Chang
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: [email protected] 
>>>>>>>>> <mailto:[email protected]>
>>>>>>>>> For additional commands, e-mail: [email protected] 
>>>>>>>>> <mailto:[email protected]>
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>
