Re: Any recommended issues to work on for a newcomer?

2024-05-31 Thread Michael Wechner

thank you very much for sharing!

Unfortunately, I have not found time to review Hank's work yet, but 
maybe Hank can already proceed based on your code.


Thanks

Michael

On 31.05.24 at 18:50, Alessandro Benedetti wrote:
Just for your curiosity, my Reciprocal Rank Fusion contribution to 
Solr is in decent shape now:

https://github.com/apache/solr/pull/2489
Everything is on the Solr side only, but maybe it can serve as 
inspiration if you want to do similar work in Lucene.


Cheers
--
*Alessandro Benedetti*
Director @ Sease Ltd.
Apache Lucene/Solr Committer
Apache Solr PMC Member

e-mail: a.benede...@sease.io

*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io
LinkedIn | Twitter | Youtube | Github





Re: Any recommended issues to work on for a newcomer?

2024-05-31 Thread Alessandro Benedetti
Just for your curiosity, my Reciprocal Rank Fusion contribution to Solr is
in decent shape now:
https://github.com/apache/solr/pull/2489
Everything is on the Solr side only, but maybe it can serve as
inspiration if you want to do similar work in Lucene.

Cheers
--
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io
LinkedIn | Twitter | Youtube | Github




Re: Any recommended issues to work on for a newcomer?

2024-05-20 Thread Michael Wechner

Hi Hank

Very cool, thank you, will try to do this asap!

All the best

Michael


On 19.05.24 at 01:42, Chang Hank wrote:

Hey Michael,

I wrote a first version of my idea for implementing RRF in Lucene; 
here is the link to the code: 
https://gist.github.com/hack4chang/ee2b37eab80bd82e574ff4f94ed204e9
Right now I have some questions: one is about the shardIndex to be 
returned, another is about the TotalHits value. Please take a look at 
the code and kindly leave some comments below.


Thanks,
Hank



Re: Any recommended issues to work on for a newcomer?

2024-05-18 Thread Chang Hank
Hey Michael,

I wrote a first version of my idea for implementing RRF in Lucene; here is the 
link to the code: 
https://gist.github.com/hack4chang/ee2b37eab80bd82e574ff4f94ed204e9
Right now I have some questions: one is about the shardIndex to be returned, 
another is about the TotalHits value. Please take a look at the code and kindly 
leave some comments below.

Thanks,
Hank


Re: Any recommended issues to work on for a newcomer?

2024-05-18 Thread Chang Hank
Or maybe we can first create an issue and PR based on the issue number?
WDYT?

Best,

Hank


Re: Any recommended issues to work on for a newcomer?

2024-05-18 Thread Chang Hank
Hey Michael, 

Sorry, I was a bit busy this week, but I’ve looked into the resources you 
provided and also the useful advice from Alessandro and Adrien.

I have a brief understanding of how RRF works, but I’m not quite sure how we 
should implement it. Based on the advice from Alessandro and Adrien, it seems 
we need to consider that the search results are located on different shards. 
According to Alessandro, we should aggregate the ranked lists from all 
distributed nodes and then apply RRF.
Are we going to implement this aggregation logic inside our RRF method? 

Also, could you please create a PR so we can discuss the details further?

All the best,

Hank


Re: Any recommended issues to work on for a newcomer?

2024-05-13 Thread Michael Wechner

Great, sounds like we have a plan :-)

Hank and I can get started trying to understand the internals better ...

Thanks

Michael

On 13.05.24 at 18:21, Alessandro Benedetti wrote:
Sure, we can make it work, but in a distributed environment you first 
have to run each query distributed (aggregating across all nodes) and 
then apply RRF on top of the aggregated ranked lists.
Doing RRF per node first and then aggregating per shard won't return 
the same results, I suspect.

When I go back to working on the task I'll be able to elaborate more!

Cheers




On Mon, 13 May 2024 at 14:12, Adrien Grand  wrote:

> Maybe Adrien Grand and others might also have some feedback :-)

I'd suggest the signature look something like `TopDocs
TopDocs#rrf(int topN, int k, TopDocs[] hits)`, to be consistent
with `TopDocs#merge`. Internally, it should look at
`ScoreDoc#shardIndex` and `ScoreDoc#doc` to figure out which hits map
to the same document.

> Back in the day, I was reasoning on this and I didn't think
> Lucene was the right place for an interleaving algorithm, given
> that Reciprocal Rank Fusion is affected by distribution and it's
> not supposed to work per node.

To me this is like `TopDocs#merge`. There are changes needed on
the application side to hook this call into the logic that
combines hits that come from multiple shards (multiple queries in
the case of RRF), but Lucene can still provide the merging logic.
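
A minimal sketch of the fusion step described above, assuming plain Java stand-ins rather than real Lucene classes. `RrfSketch`, `Hit` and `ScoredHit` are hypothetical names: `Hit` plays the role of a `ScoreDoc` identified by shard index and doc id, and `rrf` only mirrors the shape of the signature proposed in this thread. Each ranked list contributes 1 / (k + rank) per hit, with rank starting at 1; the tie-breaking rule is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RrfSketch {
    // Hypothetical stand-in for a Lucene hit: shard index plus doc id identify a document.
    record Hit(int shardIndex, int doc) {}

    // A fused hit together with its RRF score.
    record ScoredHit(Hit hit, double score) {}

    // Fuses ranked lists (best hit first): every list adds 1 / (k + rank) to a hit, rank starting at 1.
    static List<ScoredHit> rrf(int topN, int k, List<List<Hit>> rankedLists) {
        Map<Hit, Double> scores = new LinkedHashMap<>();
        for (List<Hit> ranked : rankedLists) {
            for (int i = 0; i < ranked.size(); i++) {
                scores.merge(ranked.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<ScoredHit> fused = new ArrayList<>();
        scores.forEach((hit, score) -> fused.add(new ScoredHit(hit, score)));
        // Highest fused score first; ties broken by shard index, then doc id (an assumption).
        Comparator<ScoredHit> order = Comparator.comparingDouble(ScoredHit::score).reversed();
        order = order.thenComparingInt((ScoredHit s) -> s.hit().shardIndex())
                     .thenComparingInt((ScoredHit s) -> s.hit().doc());
        fused.sort(order);
        return new ArrayList<>(fused.subList(0, Math.min(topN, fused.size())));
    }

    public static void main(String[] args) {
        // Two ranked lists for the same index, e.g. one keyword query and one vector query.
        List<Hit> keyword = List.of(new Hit(0, 1), new Hit(0, 2), new Hit(1, 5));
        List<Hit> vector = List.of(new Hit(1, 5), new Hit(0, 1), new Hit(1, 7));
        for (ScoredHit s : rrf(3, 60, List.of(keyword, vector))) {
            System.out.println(s.hit() + " " + s.score());
        }
    }
}
```

Keying the map on the (shard index, doc id) pair is what lets hits from different lists be recognised as the same document; the input lists are expected to be globally ranked already.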

On Mon, May 13, 2024 at 1:41 PM Michael Wechner
 wrote:

Thanks for your feedback Alessandro!

I am using Lucene independently of Solr, OpenSearch, or
Elasticsearch, but would like to combine different result sets
using RRF, and therefore think that Lucene itself could actually
be a good place.

Looking forward to your additional elaboration!

Thanks

Michael





On 13.05.2024 at 12:34, Alessandro Benedetti wrote:

This is not strictly related to Lucene, but I'll give a talk
at Berlin Buzzwords on how I am implementing Reciprocal Rank
Fusion in Apache Solr.
I'll resume my work on the contribution next week and have
more to share later.

Back in the day, I was reasoning on this and I didn't think
Lucene was the right place for an interleaving algorithm,
given that Reciprocal Rank Fusion is affected by distribution
and it's not supposed to work per node.
I think I evaluated the possibility of doing it as a Lucene
query or a Lucene component but then ended up with a
different approach.
I'll elaborate more when I go back to the task!

Cheers

On Sat, 11 May 2024 at 09:10, Michael Wechner
 wrote:

sure, no problem!

Maybe Adrien Grand and others might also have some
feedback :-)

Thanks

Michael

On 10.05.24 at 23:03, Chang Hank wrote:

Thank you for these useful resources, please allow me to
spend some time looking into them.
I’ll let you know asap!!

Thanks

Hank


On May 10, 2024, at 12:34 PM, Michael Wechner wrote:

also we might want to consider how this relates to


https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/Rescorer.html

In vector search reranking has become quite popular, e.g.

https://docs.cohere.com/docs/reranking

IIUC LangChain (python) for example adds the reranker
as an argument to the searcher/retriever


https://python.langchain.com/v0.1/docs/integrations/retrievers/cohere-reranker/

So maybe the following might make sense as well

TopDocs topDocsKeyword =

Re: Any recommended issues to work on for a newcomer?

2024-05-13 Thread Alessandro Benedetti
Sure, we can make it work, but in a distributed environment you first have
to run each query distributed (aggregating across all nodes) and then apply
RRF on top of the aggregated ranked lists.
Doing RRF per node first and then aggregating per shard won't return the
same results, I suspect.
When I go back to working on the task I'll be able to elaborate more!
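
A small constructed example of why the order of the two steps can matter, sketched in plain Java with no Lucene dependency. All names (`RrfOrderDemo`, `Scored`, the document ids and scores) are hypothetical, and the data is chosen to make the effect visible: document `c` is the only hit on shard 1, so it ranks first locally for both queries even though it is last globally.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RrfOrderDemo {
    // A document id with a score (a query score or a fused RRF score).
    record Scored(String id, double score) {}

    // Descending score, ties broken by id for determinism.
    static final Comparator<Scored> BY_SCORE_DESC =
        Comparator.comparingDouble(Scored::score).reversed().thenComparing(Scored::id);

    // Combines several lists into one list ranked by score (the role TopDocs#merge plays).
    static List<Scored> mergeByScore(List<List<Scored>> lists) {
        return lists.stream().flatMap(List::stream).sorted(BY_SCORE_DESC).collect(Collectors.toList());
    }

    // RRF with 1-based ranks: each list contributes 1 / (k + rank) per document.
    static List<Scored> rrf(int k, List<List<Scored>> rankedLists) {
        Map<String, Double> fused = new HashMap<>();
        for (List<Scored> ranked : rankedLists) {
            for (int i = 0; i < ranked.size(); i++) {
                fused.merge(ranked.get(i).id(), 1.0 / (k + i + 1), Double::sum);
            }
        }
        return fused.entrySet().stream()
            .map(e -> new Scored(e.getKey(), e.getValue()))
            .sorted(BY_SCORE_DESC)
            .collect(Collectors.toList());
    }

    static List<String> ids(List<Scored> hits) {
        return hits.stream().map(Scored::id).collect(Collectors.toList());
    }

    // Per-shard results for two queries, each already sorted by descending score.
    static final List<Scored> KEYWORD_SHARD0 = List.of(new Scored("a", 9.0), new Scored("b", 8.0), new Scored("e", 7.0));
    static final List<Scored> KEYWORD_SHARD1 = List.of(new Scored("c", 1.0));
    static final List<Scored> VECTOR_SHARD0 = List.of(new Scored("b", 0.9), new Scored("e", 0.8), new Scored("a", 0.7));
    static final List<Scored> VECTOR_SHARD1 = List.of(new Scored("c", 0.1));

    // Aggregate each query across shards first, then apply RRF to the global lists.
    static List<String> fuseAfterAggregation() {
        List<Scored> keyword = mergeByScore(List.of(KEYWORD_SHARD0, KEYWORD_SHARD1));
        List<Scored> vector = mergeByScore(List.of(VECTOR_SHARD0, VECTOR_SHARD1));
        return ids(rrf(60, List.of(keyword, vector)));
    }

    // Apply RRF on each shard separately, then merge the shards by fused score.
    static List<String> fusePerShardFirst() {
        List<Scored> shard0 = rrf(60, List.of(KEYWORD_SHARD0, VECTOR_SHARD0));
        List<Scored> shard1 = rrf(60, List.of(KEYWORD_SHARD1, VECTOR_SHARD1));
        return ids(mergeByScore(List.of(shard0, shard1)));
    }

    public static void main(String[] args) {
        System.out.println("aggregate, then RRF: " + fuseAfterAggregation());
        System.out.println("RRF per shard, then merge: " + fusePerShardFirst());
    }
}
```

With this data the two orderings disagree: fusing per shard promotes `c` because local ranks ignore how weak its scores are compared with the other shard. This is one hand-picked case, not a general characterisation of how large the difference is in practice.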

Cheers
--
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io
LinkedIn | Twitter | Youtube | Github



On Mon, 13 May 2024 at 14:12, Adrien Grand  wrote:

> > Maybe Adrien Grand and others might also have some feedback :-)
>
> I'd suggest the signature look something like `TopDocs TopDocs#rrf(int
> topN, int k, TopDocs[] hits)`, to be consistent with `TopDocs#merge`.
> Internally, it should look at `ScoreDoc#shardIndex` and `ScoreDoc#doc` to
> figure out which hits map to the same document.
>
> > Back in the day, I was reasoning on this and I didn't think Lucene was
> > the right place for an interleaving algorithm, given that Reciprocal Rank
> > Fusion is affected by distribution and it's not supposed to work per node.
>
> To me this is like `TopDocs#merge`. There are changes needed on the
> application side to hook this call into the logic that combines hits that
> come from multiple shards (multiple queries in the case of RRF), but Lucene
> can still provide the merging logic.
>
> On Mon, May 13, 2024 at 1:41 PM Michael Wechner 
> wrote:
>
>> Thanks for your feedback Alessandro!
>>
>> I am using Lucene independently of Solr, OpenSearch, or Elasticsearch, and
>> would like to combine different result sets using RRF, so I think that
>> Lucene itself could actually be a good place for it.
>>
>> Looking forward to your additional elaboration!
>>
>> Thanks
>>
>> Michael
>>
>>
>>
>>
>> On 13.05.2024 at 12:34, Alessandro Benedetti wrote:
>>
>> This is not strictly related to Lucene, but I'll give a talk at Berlin
>> Buzzwords on how I am implementing Reciprocal Rank Fusion in Apache Solr.
>> I'll resume my work on the contribution next week and have more to share
>> later.
>>
>> Back in the day, I was reasoning on this and I didn't think Lucene was
>> the right place for an interleaving algorithm, given that Reciprocal Rank
>> Fusion is affected by distribution and it's not supposed to work per node.
>> I think I evaluated the possibility of doing it as a Lucene query or a
>> Lucene component but then ended up with a different approach.
>> I'll elaborate more when I go back to the task!
>>
>> Cheers
>> --
>> *Alessandro Benedetti*
>> Director @ Sease Ltd.
>> *Apache Lucene/Solr Committer*
>> *Apache Solr PMC Member*
>>
>> e-mail: a.benede...@sease.io
>>
>>
>> *Sease* - Information Retrieval Applied
>> Consulting | Training | Open Source
>>
>> Website: Sease.io
>> LinkedIn | Twitter | Youtube | Github
>>
>>
>> On Sat, 11 May 2024 at 09:10, Michael Wechner 
>> wrote:
>>
>>> sure, no problem!
>>>
>>> Maybe Adrien Grand and others might also have some feedback :-)
>>>
>>> Thanks
>>>
>>> Michael
>>>
>>> On 10.05.24 at 23:03, Chang Hank wrote:
>>>
>>> Thank you for these useful resources. Please allow me some time to look
>>> into them.
>>> I’ll let you know asap!!
>>>
>>> Thanks
>>>
>>> Hank
>>>
>>> On May 10, 2024, at 12:34 PM, Michael Wechner
>>>   wrote:
>>>
>>> also we might want to consider how this relates to
>>>
>>>
>>> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/Rescorer.html
>>>
>>> In vector search reranking has become quite popular, e.g.
>>>
>>> https://docs.cohere.com/docs/reranking
>>>
>>> IIUC LangChain (python) for example adds the reranker as an argument to
>>> the searcher/retriever
>>>
>>>
>>> https://python.langchain.com/v0.1/docs/integrations/retrievers/cohere-reranker/
>>>
>>> So maybe the following might make sense as well
>>>
>>> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
>>> TopDocs topDocsVector = vectorSearcher.search(query, 50, new
>>> CohereReranker());
>>>
>>> TopDocs topDocs = TopDocs.merge(new RRFRanker(), topDocsKeyword,
>>> topDocsVector);
>>>
>>> WDYT?
>>>
>>> Thanks
>>>
>>> Michael
>>>
>>>
>>> On 10.05.24 at 21:08, Michael Wechner wrote:
>>>
>>> great, yes, let's get started :-)
>>>
>>> What about the following pseudo code, assuming that there might be
>>> alternative ranking algorithms to RRF
>>>
>>> StoredFieldsKeyword storedFieldsKeyword =
>>> indexReaderKeyword.storedFields();
>>> StoredFieldsVector 

Re: Any recommended issues to work on for a newcomer?

2024-05-13 Thread Adrien Grand
> Maybe Adrien Grand and others might also have some feedback :-)

I'd suggest a signature along the lines of `TopDocs TopDocs#rrf(int
topN, int k, TopDocs[] hits)` to be consistent with `TopDocs#merge`.
Internally, it should look at `ScoreDoc#shardIndex` and `ScoreDoc#doc` to
figure out which hits map to the same document.

> Back in the day, I was reasoning on this and I didn't think Lucene was
the right place for an interleaving algorithm, given that Reciprocal Rank
Fusion is affected by distribution and it's not supposed to work per node.

To me this is like `TopDocs#merge`. There are changes needed on the
application side to hook this call into the logic that combines hits that
come from multiple shards (multiple queries in the case of RRF), but Lucene
can still provide the merging logic.
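
A minimal sketch of what such a merging utility could look like, written against small stand-in types so that it runs without Lucene on the classpath (Java 16+ for records). `RrfSketch`, `Hit` and `DocKey` are hypothetical names; `Hit` only mirrors the `doc`, `score` and `shardIndex` fields of `ScoreDoc`, and the tie-breaking rule is an assumption, not something decided in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RrfSketch {
    /** Stand-in for Lucene's ScoreDoc: doc id, score, and the shard the hit came from. */
    public record Hit(int doc, float score, int shardIndex) {}

    /** Two hits refer to the same document when shard and doc id both match. */
    private record DocKey(int shardIndex, int doc) {}

    /**
     * Fuses ranked lists with Reciprocal Rank Fusion: each list contributes
     * 1 / (k + rank) per hit, with rank starting at 1. Returns the topN fused hits.
     */
    public static List<Hit> rrf(int topN, int k, List<List<Hit>> rankedLists) {
        Map<DocKey, Double> fused = new LinkedHashMap<>();
        for (List<Hit> list : rankedLists) {
            for (int i = 0; i < list.size(); i++) {
                Hit hit = list.get(i);
                fused.merge(new DocKey(hit.shardIndex(), hit.doc()), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<Hit> result = new ArrayList<>();
        for (Map.Entry<DocKey, Double> e : fused.entrySet()) {
            // The original scores are dropped; the fused RRF score takes their place.
            result.add(new Hit(e.getKey().doc(), e.getValue().floatValue(), e.getKey().shardIndex()));
        }
        // Highest fused score first; ties broken by shard, then doc id (an assumption).
        result.sort((a, b) -> {
            int byScore = Float.compare(b.score(), a.score());
            if (byScore != 0) {
                return byScore;
            }
            int byShard = Integer.compare(a.shardIndex(), b.shardIndex());
            return byShard != 0 ? byShard : Integer.compare(a.doc(), b.doc());
        });
        return new ArrayList<>(result.subList(0, Math.min(topN, result.size())));
    }

    public static void main(String[] args) {
        List<Hit> keyword = List.of(new Hit(1, 5.0f, 0), new Hit(2, 4.0f, 0), new Hit(3, 3.0f, 0));
        List<Hit> vector = List.of(new Hit(3, 0.9f, 0), new Hit(1, 0.8f, 0));
        for (Hit hit : rrf(2, 60, List.of(keyword, vector))) {
            System.out.println(hit);
        }
    }
}
```

In this example doc 1 is ranked 1st and 2nd and doc 3 is ranked 3rd and 1st, so doc 1 ends up ahead of doc 3, while doc 2 (found by only one query) falls out of the top 2. Note that the input scores play no role; only the ranks do.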

On Mon, May 13, 2024 at 1:41 PM Michael Wechner 
wrote:

> Thanks for your feedback Alessandro!
>
> I am using Lucene independently of Solr, OpenSearch, or Elasticsearch, and
> would like to combine different result sets using RRF, so I think that
> Lucene itself could actually be a good place for it.
>
> Looking forward to your additional elaboration!
>
> Thanks
>
> Michael
>
>
>
>
> On 13.05.2024 at 12:34, Alessandro Benedetti wrote:
>
> This is not strictly related to Lucene, but I'll give a talk at Berlin
> Buzzwords on how I am implementing Reciprocal Rank Fusion in Apache Solr.
> I'll resume my work on the contribution next week and have more to share
> later.
>
> Back in the day, I was reasoning on this and I didn't think Lucene was the
> right place for an interleaving algorithm, given that Reciprocal Rank
> Fusion is affected by distribution and it's not supposed to work per node.
> I think I evaluated the possibility of doing it as a Lucene query or a
> Lucene component but then ended up with a different approach.
> I'll elaborate more when I go back to the task!
>
> Cheers
> --
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
>
> e-mail: a.benede...@sease.io
>
>
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
>
> Website: Sease.io
> LinkedIn | Twitter | Youtube | Github
>
>
> On Sat, 11 May 2024 at 09:10, Michael Wechner 
> wrote:
>
>> sure, no problem!
>>
>> Maybe Adrien Grand and others might also have some feedback :-)
>>
>> Thanks
>>
>> Michael
>>
>> On 10.05.24 at 23:03, Chang Hank wrote:
>>
>> Thank you for these useful resources. Please allow me some time to look
>> into them.
>> I’ll let you know asap!!
>>
>> Thanks
>>
>> Hank
>>
>> On May 10, 2024, at 12:34 PM, Michael Wechner 
>>  wrote:
>>
>> also we might want to consider how this relates to
>>
>>
>> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/Rescorer.html
>>
>> In vector search reranking has become quite popular, e.g.
>>
>> https://docs.cohere.com/docs/reranking
>>
>> IIUC LangChain (python) for example adds the reranker as an argument to
>> the searcher/retriever
>>
>>
>> https://python.langchain.com/v0.1/docs/integrations/retrievers/cohere-reranker/
>>
>> So maybe the following might make sense as well
>>
>> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
>> TopDocs topDocsVector = vectorSearcher.search(query, 50, new
>> CohereReranker());
>>
>> TopDocs topDocs = TopDocs.merge(new RRFRanker(), topDocsKeyword,
>> topDocsVector);
>>
>> WDYT?
>>
>> Thanks
>>
>> Michael
>>
>>
>> On 10.05.24 at 21:08, Michael Wechner wrote:
>>
>> great, yes, let's get started :-)
>>
>> What about the following pseudo code, assuming that there might be
>> alternative ranking algorithms to RRF
>>
>> StoredFields storedFieldsKeyword = indexReaderKeyword.storedFields();
>> StoredFields storedFieldsVector = indexReaderVector.storedFields();
>>
>> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
>> TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50);
>>
>> Ranker ranker = new RRFRanker();
>> TopDocs topDocs = TopDocs.rank(ranker, topDocsKeyword, topDocsVector);
>>
>> for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
>> Document docK = storedFieldsKeyword.document(scoreDoc.doc);
>> Document docV = storedFieldsVector.document(scoreDoc.doc);
>> 
>> }
>>
>> whereas also see
>>
>>
>> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/TopDocs.html
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html
>>
>> WDYT?
>>
>> Thanks
>>
>> Michael
>>
>>
>>
>>
>> On 10.05.24 at 20:01, Chang Hank wrote:
>>
>> Hi Michael,
>>
>> Sounds good to me.
>> Let’s do it!!
>>
>> Cheers,
>> Hank
>>
>> On May 10, 2024, at 10:50 AM, Michael Wechner 
>>  wrote:
>>
>> Hi Hank
>>
>> Very cool!
>>
>> Adrien Grand suggested to implement it as a utility method 

Re: Any recommended issues to work on for a newcomer?

2024-05-13 Thread Michael Wechner
Thanks for your feedback Alessandro!

I am using Lucene independently of Solr, OpenSearch, or Elasticsearch, and would
like to combine different result sets using RRF, so I think that Lucene
itself could actually be a good place for it.

Looking forward to your additional elaboration!

Thanks

Michael




> On 13.05.2024 at 12:34, Alessandro Benedetti wrote:
> 
> This is not strictly related to Lucene, but I'll give a talk at Berlin 
> Buzzwords on how I am implementing Reciprocal Rank Fusion in Apache Solr.
> I'll resume my work on the contribution next week and have more to share 
> later.
> 
> Back in the day, I was reasoning on this and I didn't think Lucene was the 
> right place for an interleaving algorithm, given that Reciprocal Rank Fusion 
> is affected by distribution and it's not supposed to work per node.
> I think I evaluated the possibility of doing it as a Lucene query or a Lucene 
> component but then ended up with a different approach.
> I'll elaborate more when I go back to the task!
> 
> Cheers
> --
> Alessandro Benedetti
> Director @ Sease Ltd.
> Apache Lucene/Solr Committer
> Apache Solr PMC Member
> 
> e-mail: a.benede...@sease.io 
> 
> 
> Sease - Information Retrieval Applied
> Consulting | Training | Open Source
> 
> Website: Sease.io
> LinkedIn | Twitter | Youtube | Github
> 
> 
> On Sat, 11 May 2024 at 09:10, Michael Wechner wrote:
> sure, no problem!
> 
> Maybe Adrien Grand and others might also have some feedback :-)
> 
> Thanks
> 
> Michael
> 
> On 10.05.24 at 23:03, Chang Hank wrote:
>> Thank you for these useful resources. Please allow me some time to look
>> into them.
>> I’ll let you know asap!!
>> 
>> Thanks
>> 
>> Hank
>> 
>>> On May 10, 2024, at 12:34 PM, Michael Wechner  
>>>  wrote:
>>> 
>>> also we might want to consider how this relates to
>>> 
>>> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/Rescorer.html
>>>  
>>> 
>>> 
>>> In vector search reranking has become quite popular, e.g.
>>> 
>>> https://docs.cohere.com/docs/reranking 
>>> 
>>> 
>>> IIUC LangChain (python) for example adds the reranker as an argument to the 
>>> searcher/retriever
>>> 
>>> https://python.langchain.com/v0.1/docs/integrations/retrievers/cohere-reranker/
>>>  
>>> 
>>> 
>>> So maybe the following might make sense as well
>>> 
>>> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
>>> TopDocs topDocsVector = vectorSearcher.search(query, 50, new 
>>> CohereReranker());
>>> 
>>> TopDocs topDocs = TopDocs.merge(new RRFRanker(), topDocsKeyword, 
>>> topDocsVector);
>>> 
>>> WDYT?
>>> 
>>> Thanks
>>> 
>>> Michael
>>> 
>>> 
>>> On 10.05.24 at 21:08, Michael Wechner wrote:
>>>> great, yes, let's get started :-)
>>>>
>>>> What about the following pseudo code, assuming that there might be
>>>> alternative ranking algorithms to RRF
>>>>
>>>> StoredFields storedFieldsKeyword = indexReaderKeyword.storedFields();
>>>> StoredFields storedFieldsVector = indexReaderVector.storedFields();
>>>>
>>>> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
>>>> TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50);
>>>>
>>>> Ranker ranker = new RRFRanker();
>>>> TopDocs topDocs = TopDocs.rank(ranker, topDocsKeyword, topDocsVector);
>>>>
>>>> for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
>>>> Document docK = storedFieldsKeyword.document(scoreDoc.doc);
>>>> Document docV = storedFieldsVector.document(scoreDoc.doc);
>>>>
>>>> }
>>>>
>>>> See also
>>>>
>>>> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/TopDocs.html
>>>> https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html
>>>>
>>>> WDYT?
>>>>
>>>> Thanks
>>>>
>>>> Michael
>>>>
>>>> On 10.05.24 at 20:01, Chang Hank wrote:
>>>>> Hi Michael,
>>>>>
>>>>> Sounds good to me.
>>>>> Let’s do it!!
>>>>>
>>>>> Cheers,
>>>>> Hank
>>>>>
>>>>>> On May 10, 2024, at 10:50 AM, Michael Wechner wrote:
>>>>>>
>>>>>> Hi Hank
>>>>>>
>>>>>> Very cool!
>>>>>>
>>>>>> Adrien Grand suggested to implement it as a utility method on the
>>>>>> TopDocs class, and since Adrien worked for a decade on Lucene
>>>>>> https://www.elastic.co/de/blog/author/adrien-grand
>>>>>>

Re: Any recommended issues to work on for a newcomer?

2024-05-13 Thread Alessandro Benedetti
This is not strictly related to Lucene, but I'll give a talk at Berlin
Buzzwords on how I am implementing Reciprocal Rank Fusion in Apache Solr.
I'll resume my work on the contribution next week and have more to share
later.

Back in the day, I was reasoning on this and I didn't think Lucene was the
right place for an interleaving algorithm, given that Reciprocal Rank
Fusion is affected by distribution and it's not supposed to work per node.
I think I evaluated the possibility of doing it as a Lucene query or a
Lucene component but then ended up with a different approach.
I'll elaborate more when I go back to the task!

Cheers
--
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io
LinkedIn | Twitter | Youtube | Github



On Sat, 11 May 2024 at 09:10, Michael Wechner 
wrote:

> sure, no problem!
>
> Maybe Adrien Grand and others might also have some feedback :-)
>
> Thanks
>
> Michael
>
> On 10.05.24 at 23:03, Chang Hank wrote:
>
> Thank you for these useful resources. Please allow me some time to look
> into them.
> I’ll let you know asap!!
>
> Thanks
>
> Hank
>
> On May 10, 2024, at 12:34 PM, Michael Wechner 
>  wrote:
>
> also we might want to consider how this relates to
>
>
> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/Rescorer.html
>
> In vector search reranking has become quite popular, e.g.
>
> https://docs.cohere.com/docs/reranking
>
> IIUC LangChain (python) for example adds the reranker as an argument to
> the searcher/retriever
>
>
> https://python.langchain.com/v0.1/docs/integrations/retrievers/cohere-reranker/
>
> So maybe the following might make sense as well
>
> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
> TopDocs topDocsVector = vectorSearcher.search(query, 50, new
> CohereReranker());
>
> TopDocs topDocs = TopDocs.merge(new RRFRanker(), topDocsKeyword,
> topDocsVector);
>
> WDYT?
>
> Thanks
>
> Michael
>
>
> On 10.05.24 at 21:08, Michael Wechner wrote:
>
> great, yes, let's get started :-)
>
> What about the following pseudo code, assuming that there might be
> alternative ranking algorithms to RRF
>
> StoredFields storedFieldsKeyword = indexReaderKeyword.storedFields();
> StoredFields storedFieldsVector = indexReaderVector.storedFields();
>
> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
> TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50);
>
> Ranker ranker = new RRFRanker();
> TopDocs topDocs = TopDocs.rank(ranker, topDocsKeyword, topDocsVector);
>
> for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
> Document docK = storedFieldsKeyword.document(scoreDoc.doc);
> Document docV = storedFieldsVector.document(scoreDoc.doc);
> 
> }
>
> whereas also see
>
>
> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/TopDocs.html
> https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html
>
> WDYT?
>
> Thanks
>
> Michael
>
>
>
>
> On 10.05.24 at 20:01, Chang Hank wrote:
>
> Hi Michael,
>
> Sounds good to me.
> Let’s do it!!
>
> Cheers,
> Hank
>
> On May 10, 2024, at 10:50 AM, Michael Wechner 
>  wrote:
>
> Hi Hank
>
> Very cool!
>
> Adrien Grand suggested implementing it as a utility method on the TopDocs
> class, and since Adrien worked for a decade on Lucene
> https://www.elastic.co/de/blog/author/adrien-grand
> I guess it makes sense to follow his advice :-)
>
> We could create a PR and work together on it, WDYT?
>
> All the best
>
> Michael
>
> On 10.05.24 at 18:51, Chang Hank wrote:
>
> Hi Michael,
>
> Thank you for the reply.
> This is really a cool issue to work on,  I’m happy to work on this with
> you. I’ll try to do research on RRF first.
> Also, are we going to implement this on the TopDocs class?
>
> Best,
> Hank
>
>
> On May 9, 2024, at 11:08 PM, Michael Wechner 
>  wrote:
>
> Hi Hank
>
> Thanks for offering your help!
>
> I recently suggested to implement RRF (Reciprocal Rank Fusion)
>
> https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz
>
> but still have not found the time to really work on this.
>
> Maybe you would be interested to do this or that we work on it together
> somehow?
>
> Thanks
>
> Michael
>
>
>
> On 10.05.24 at 07:27, Chang Hank wrote:
>
> Hi everyone,
>
> I’m Hank Chang, currently studying Information Retrieval topics. I’m
> really interested in contributing to Apache Lucene and enhancing my
> understanding of the field.
> I’ve reviewed several issues posted on the Github repository but haven’t
> found a straightforward starting point. Could someone please recommend
> suitable issues for a newcomer like me or suggest areas I could assist with?
>
> 

Re: Any recommended issues to work on for a newcomer?

2024-05-11 Thread Michael Wechner

sure, no problem!

Maybe Adrien Grand and others might also have some feedback :-)

Thanks

Michael

On 10.05.24 at 23:03, Chang Hank wrote:
Thank you for these useful resources. Please allow me some time to
look into them.

I’ll let you know asap!!

Thanks

Hank

On May 10, 2024, at 12:34 PM, Michael Wechner 
 wrote:


also we might want to consider how this relates to

https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/Rescorer.html

In vector search reranking has become quite popular, e.g.

https://docs.cohere.com/docs/reranking

IIUC LangChain (python) for example adds the reranker as an argument 
to the searcher/retriever


https://python.langchain.com/v0.1/docs/integrations/retrievers/cohere-reranker/

So maybe the following might make sense as well

TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
TopDocs topDocsVector = vectorSearcher.search(query, 50, new 
CohereReranker());


TopDocs topDocs = TopDocs.merge(new RRFRanker(), topDocsKeyword, 
topDocsVector);


WDYT?

Thanks

Michael


On 10.05.24 at 21:08, Michael Wechner wrote:

great, yes, let's get started :-)

What about the following pseudo code, assuming that there might be 
alternative ranking algorithms to RRF


StoredFields storedFieldsKeyword = indexReaderKeyword.storedFields();
StoredFields storedFieldsVector = indexReaderVector.storedFields();


TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50);

Ranker ranker = new RRFRanker();
TopDocs topDocs = TopDocs.rank(ranker, topDocsKeyword, topDocsVector);

for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    Document docK = storedFieldsKeyword.document(scoreDoc.doc);
    Document docV = storedFieldsVector.document(scoreDoc.doc);
    
}

whereas also see

https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/TopDocs.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html

WDYT?

Thanks

Michael




On 10.05.24 at 20:01, Chang Hank wrote:

Hi Michael,

Sounds good to me.
Let’s do it!!

Cheers,
Hank

On May 10, 2024, at 10:50 AM, Michael Wechner 
 wrote:


Hi Hank

Very cool!

Adrien Grand suggested implementing it as a utility method on the
TopDocs class, and since Adrien worked for a decade on Lucene
https://www.elastic.co/de/blog/author/adrien-grand
I guess it makes sense to follow his advice :-)

We could create a PR and work together on it, WDYT?

All the best

Michael

On 10.05.24 at 18:51, Chang Hank wrote:

Hi Michael,

Thank you for the reply.
This is really a cool issue to work on, I’m happy to work on this 
with you. I’ll try to do research on RRF first.

Also, are we going to implement this on the TopDocs class?

Best,
Hank


On May 9, 2024, at 11:08 PM, Michael Wechner 
 wrote:


Hi Hank

Thanks for offering your help!

I recently suggested to implement RRF (Reciprocal Rank Fusion)

https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz

but still have not found the time to really work on this.

Maybe you would be interested to do this or that we work on it 
together somehow?


Thanks

Michael



On 10.05.24 at 07:27, Chang Hank wrote:

Hi everyone,

I’m Hank Chang, currently studying Information Retrieval
topics. I’m really interested in contributing to Apache Lucene
and enhancing my understanding of the field.
I’ve reviewed several issues posted on the Github repository 
but haven’t found a straightforward starting point. Could 
someone please recommend suitable issues for a newcomer like me 
or suggest areas I could assist with?


Thank you for your time and guidance.

Best regards,
Hank Chang
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org















Re: Any recommended issues to work on for a newcomer?

2024-05-10 Thread Chang Hank
Thank you for these useful resources. Please allow me some time to look
into them.
I’ll let you know asap!!

Thanks

Hank

> On May 10, 2024, at 12:34 PM, Michael Wechner  
> wrote:
> 
> also we might want to consider how this relates to
> 
> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/Rescorer.html
> 
> In vector search reranking has become quite popular, e.g.
> 
> https://docs.cohere.com/docs/reranking
> 
> IIUC LangChain (python) for example adds the reranker as an argument to the 
> searcher/retriever
> 
> https://python.langchain.com/v0.1/docs/integrations/retrievers/cohere-reranker/
> 
> So maybe the following might make sense as well
> 
> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
> TopDocs topDocsVector = vectorSearcher.search(query, 50, new 
> CohereReranker());
> 
> TopDocs topDocs = TopDocs.merge(new RRFRanker(), topDocsKeyword, 
> topDocsVector);
> 
> WDYT?
> 
> Thanks
> 
> Michael
> 
> 
> On 10.05.24 at 21:08, Michael Wechner wrote:
>> great, yes, let's get started :-)
>> 
>> What about the following pseudo code, assuming that there might be 
>> alternative ranking algorithms to RRF
>> 
>> StoredFields storedFieldsKeyword = indexReaderKeyword.storedFields();
>> StoredFields storedFieldsVector = indexReaderVector.storedFields();
>> 
>> TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
>> TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50);
>> 
>> Ranker ranker = new RRFRanker();
>> TopDocs topDocs = TopDocs.rank(ranker, topDocsKeyword, topDocsVector);
>> 
>> for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
>> Document docK = storedFieldsKeyword.document(scoreDoc.doc);
>> Document docV = storedFieldsVector.document(scoreDoc.doc);
>> 
>> } 
>> 
>> whereas also see 
>> 
>> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/TopDocs.html
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html
>> 
>> WDYT?
>> 
>> Thanks
>> 
>> Michael
>> 
>> 
>> 
>> 
>> On 10.05.24 at 20:01, Chang Hank wrote:
>>> Hi Michael,
>>> 
>>> Sounds good to me. 
>>> Let’s do it!!
>>> 
>>> Cheers,
>>> Hank
>>> 
>>>> On May 10, 2024, at 10:50 AM, Michael Wechner wrote:
>>>>
>>>> Hi Hank
>>>>
>>>> Very cool!
>>>>
>>>> Adrien Grand suggested to implement it as a utility method on the TopDocs
>>>> class, and since Adrien worked for a decade on Lucene
>>>> https://www.elastic.co/de/blog/author/adrien-grand
>>>> I guess it makes sense to follow his advice :-)
>>>>
>>>> We could create a PR and work together on it, WDYT?
>>>>
>>>> All the best
>>>>
>>>> Michael
>>>>
>>>> On 10.05.24 at 18:51, Chang Hank wrote:
>>>>> Hi Michael,
>>>>>
>>>>> Thank you for the reply.
>>>>> This is really a cool issue to work on, I’m happy to work on this with
>>>>> you. I’ll try to do research on RRF first.
>>>>> Also, are we going to implement this on the TopDocs class?
>>>>>
>>>>> Best,
>>>>> Hank
>>>>>
>>>>>
>>>>>> On May 9, 2024, at 11:08 PM, Michael Wechner wrote:
>>>>>>
>>>>>> Hi Hank
>>>>>>
>>>>>> Thanks for offering your help!
>>>>>>
>>>>>> I recently suggested to implement RRF (Reciprocal Rank Fusion)
>>>>>>
>>>>>> https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz
>>>>>>
>>>>>> but still have not found the time to really work on this.
>>>>>>
>>>>>> Maybe you would be interested to do this or that we work on it together
>>>>>> somehow?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 10.05.24 at 07:27, Chang Hank wrote:
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> I’m Hank Chang, currently studying Information Retrieval topics. I’m
>>>>>>> really interested in contributing to Apache Lucene and enhancing my
>>>>>>> understanding of the field.
>>>>>>> I’ve reviewed several issues posted on the Github repository but
>>>>>>> haven’t found a straightforward starting point. Could someone please
>>>>>>> recommend suitable issues for a newcomer like me or suggest areas I
>>>>>>> could assist with?
>>>>>>>
>>>>>>> Thank you for your time and guidance.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Hank Chang
>>>>>>> -
>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>>>>
>>>>>>
>>>>>>
>>>>>> -
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>>>
>>>>>
>>>>
>>> 
>> 
> 



Re: Any recommended issues to work on for a newcomer?

2024-05-10 Thread Michael Wechner

We might also want to consider how this relates to

https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/Rescorer.html

In vector search, reranking has become quite popular, e.g.

https://docs.cohere.com/docs/reranking

IIUC LangChain (Python), for example, adds the reranker as an argument to
the searcher/retriever


https://python.langchain.com/v0.1/docs/integrations/retrievers/cohere-reranker/

So maybe the following might make sense as well

// Pseudo code: CohereReranker, RRFRanker and these search/merge overloads are proposals
TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
TopDocs topDocsVector = vectorSearcher.search(query, 50, new CohereReranker());

TopDocs topDocs = TopDocs.merge(new RRFRanker(), topDocsKeyword, topDocsVector);


WDYT?

Thanks

Michael


On 10.05.24 at 21:08, Michael Wechner wrote:

great, yes, let's get started :-)

What about the following pseudo code, assuming that there might be 
alternative ranking algorithms to RRF


StoredFields storedFieldsKeyword = indexReaderKeyword.storedFields();
StoredFields storedFieldsVector = indexReaderVector.storedFields();

TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50);

Ranker ranker = new RRFRanker();
TopDocs topDocs = TopDocs.rank(ranker, topDocsKeyword, topDocsVector);

for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    Document docK = storedFieldsKeyword.document(scoreDoc.doc);
    Document docV = storedFieldsVector.document(scoreDoc.doc);
    
}

whereas also see

https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/TopDocs.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html

WDYT?

Thanks

Michael




On 10.05.24 at 20:01, Chang Hank wrote:

Hi Michael,

Sounds good to me.
Let’s do it!!

Cheers,
Hank

On May 10, 2024, at 10:50 AM, Michael Wechner 
 wrote:


Hi Hank

Very cool!

Adrien Grand suggested implementing it as a utility method on the
TopDocs class, and since Adrien worked for a decade on Lucene
https://www.elastic.co/de/blog/author/adrien-grand
I guess it makes sense to follow his advice :-)

We could create a PR and work together on it, WDYT?

All the best

Michael

On 10.05.24 at 18:51, Chang Hank wrote:

Hi Michael,

Thank you for the reply.
This is really a cool issue to work on, I’m happy to work on this 
with you. I’ll try to do research on RRF first.

Also, are we going to implement this on the TopDocs class?

Best,
Hank


On May 9, 2024, at 11:08 PM, Michael Wechner 
 wrote:


Hi Hank

Thanks for offering your help!

I recently suggested to implement RRF (Reciprocal Rank Fusion)

https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz

but still have not found the time to really work on this.

Maybe you would be interested to do this or that we work on it 
together somehow?


Thanks

Michael



On 10.05.24 at 07:27, Chang Hank wrote:

Hi everyone,

I’m Hank Chang, currently studying Information Retrieval topics.
I’m really interested in contributing to Apache Lucene and
enhancing my understanding of the field.
I’ve reviewed several issues posted on the Github repository but 
haven’t found a straightforward starting point. Could someone 
please recommend suitable issues for a newcomer like me or 
suggest areas I could assist with?


Thank you for your time and guidance.

Best regards,
Hank Chang
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org











Re: Any recommended issues to work on for a newcomer?

2024-05-10 Thread Michael Wechner

great, yes, let's get started :-)

What about the following pseudo code, assuming that there might be
alternative ranking algorithms to RRF


// Pseudo code: Ranker, RRFRanker and TopDocs.rank(...) are proposed names
StoredFields storedFieldsKeyword = indexReaderKeyword.storedFields();
StoredFields storedFieldsVector = indexReaderVector.storedFields();

TopDocs topDocsKeyword = keywordSearcher.search(keywordQuery, 10);
TopDocs topDocsVector = vectorSearcher.search(vectorQuery, 50);

Ranker ranker = new RRFRanker();
TopDocs topDocs = TopDocs.rank(ranker, topDocsKeyword, topDocsVector);

for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    Document docK = storedFieldsKeyword.document(scoreDoc.doc);
    Document docV = storedFieldsVector.document(scoreDoc.doc);
    
}

See also

https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/TopDocs.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html
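
To illustrate the "alternative ranking algorithms" idea, here is a rough, self-contained sketch of a pluggable ranker. Everything in it is hypothetical: `RankerSketch`, the `Ranker` interface and the static `rank` method only mimic the proposed `TopDocs.rank(ranker, ...)` and work on plain lists of doc ids instead of `TopDocs`. The Borda count implementation is included purely as an example of a second strategy; it has not been proposed in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RankerSketch {
    /** Hypothetical strategy: turns several ranked lists of doc ids into fused scores. */
    public interface Ranker {
        Map<Integer, Double> score(List<List<Integer>> rankedLists);
    }

    /** Reciprocal Rank Fusion: each list adds 1 / (k + rank), rank starting at 1. */
    public static final class RRFRanker implements Ranker {
        private final int k;

        public RRFRanker(int k) {
            this.k = k;
        }

        @Override
        public Map<Integer, Double> score(List<List<Integer>> rankedLists) {
            Map<Integer, Double> scores = new HashMap<>();
            for (List<Integer> list : rankedLists) {
                for (int i = 0; i < list.size(); i++) {
                    scores.merge(list.get(i), 1.0 / (k + i + 1), Double::sum);
                }
            }
            return scores;
        }
    }

    /** Borda count: in a list of n docs, the doc at 0-based position i gets n - i points. */
    public static final class BordaRanker implements Ranker {
        @Override
        public Map<Integer, Double> score(List<List<Integer>> rankedLists) {
            Map<Integer, Double> scores = new HashMap<>();
            for (List<Integer> list : rankedLists) {
                for (int i = 0; i < list.size(); i++) {
                    scores.merge(list.get(i), (double) (list.size() - i), Double::sum);
                }
            }
            return scores;
        }
    }

    /** Sorts doc ids by fused score, highest first; ties are broken by doc id. */
    public static List<Integer> rank(Ranker ranker, List<List<Integer>> rankedLists) {
        Map<Integer, Double> scores = ranker.score(rankedLists);
        List<Integer> docs = new ArrayList<>(scores.keySet());
        docs.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Integer.compare(a, b);
        });
        return docs;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = List.of(List.of(1, 2, 3), List.of(3, 1));
        System.out.println(rank(new RRFRanker(60), lists));
        System.out.println(rank(new BordaRanker(), lists));
    }
}
```

The point of the interface is that the calling code stays the same when the fusion strategy changes; only the object passed as `ranker` differs.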

WDYT?

Thanks

Michael




On 10.05.24 at 20:01, Chang Hank wrote:

Hi Michael,

Sounds good to me.
Let’s do it!!

Cheers,
Hank

On May 10, 2024, at 10:50 AM, Michael Wechner 
 wrote:


Hi Hank

Very cool!

Adrien Grand suggested implementing it as a utility method on the
TopDocs class, and since Adrien worked for a decade on Lucene
https://www.elastic.co/de/blog/author/adrien-grand
I guess it makes sense to follow his advice :-)

We could create a PR and work together on it, WDYT?

All the best

Michael

On 10.05.24 at 18:51, Chang Hank wrote:

Hi Michael,

Thank you for the reply.
This is really a cool issue to work on, I’m happy to work on this 
with you. I’ll try to do research on RRF first.

Also, are we going to implement this on the TopDocs class?

Best,
Hank


On May 9, 2024, at 11:08 PM, Michael Wechner 
 wrote:


Hi Hank

Thanks for offering your help!

I recently suggested to implement RRF (Reciprocal Rank Fusion)

https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz

but still have not found the time to really work on this.

Maybe you would be interested to do this or that we work on it 
together somehow?


Thanks

Michael



On 10.05.24 at 07:27, Chang Hank wrote:

Hi everyone,

I’m Hank Chang, currently studying Information Retrieval topics.
I’m really interested in contributing to Apache Lucene and enhancing
my understanding of the field.
I’ve reviewed several issues posted on the Github repository but 
haven’t found a straightforward starting point. Could someone 
please recommend suitable issues for a newcomer like me or suggest 
areas I could assist with?


Thank you for your time and guidance.

Best regards,
Hank Chang
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org









Re: Any recommended issues to work on for a newcomer?

2024-05-10 Thread Chang Hank
Hi Michael,

Sounds good to me. 
Let’s do it!!

Cheers,
Hank

> On May 10, 2024, at 10:50 AM, Michael Wechner wrote:
> 
> Hi Hank
> 
> Very cool!
> 
> Adrien Grand suggested implementing it as a utility method on the TopDocs 
> class, and since Adrien worked on Lucene for a decade
> (https://www.elastic.co/de/blog/author/adrien-grand),
> I guess it makes sense to follow his advice :-)
> 
> We could create a PR and work together on it, WDYT?
> 
> All the best
> 
> Michael
> 
> On 10.05.24 at 18:51, Chang Hank wrote:
>> Hi Michael, 
>> 
>> Thank you for the reply.
>> This is a really cool issue to work on, and I’m happy to work on it with you. 
>> I’ll do some research on RRF first.
>> Also, are we going to implement this on the TopDocs class?
>> 
>> Best,
>> Hank
>> 
>> 
>>> On May 9, 2024, at 11:08 PM, Michael Wechner wrote:
>>> 
>>> Hi Hank
>>> 
>>> Thanks for offering your help!
>>> 
>>> I recently suggested implementing RRF (Reciprocal Rank Fusion)
>>> 
>>> https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz
>>> 
>>> but still have not found the time to really work on this.
>>> 
>>> Maybe you would be interested in doing this, or we could work on it 
>>> together somehow?
>>> 
>>> Thanks
>>> 
>>> Michael
>>> 
>>> 
>>> 
>>> On 10.05.24 at 07:27, Chang Hank wrote:
>>>> Hi everyone,
>>>> 
>>>> I’m Hank Chang, currently studying Information Retrieval topics. I’m 
>>>> really interested in contributing to Apache Lucene and enhancing my 
>>>> understanding of the field.
>>>> I’ve reviewed several issues posted on the GitHub repository but haven’t 
>>>> found a straightforward starting point. Could someone please recommend 
>>>> suitable issues for a newcomer like me or suggest areas I could assist 
>>>> with?
>>>> 
>>>> Thank you for your time and guidance.
>>>> 
>>>> Best regards,
>>>> Hank Chang
>> 
> 



Re: Any recommended issues to work on for a newcomer?

2024-05-10 Thread Michael Wechner

Hi Hank

Very cool!

Adrien Grand suggested implementing it as a utility method on the 
TopDocs class, and since Adrien worked on Lucene for a decade 
(https://www.elastic.co/de/blog/author/adrien-grand), I guess it makes 
sense to follow his advice :-)

We could create a PR and work together on it, WDYT?

All the best

Michael
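
As a rough illustration of what such a utility method might compute, here 
is a minimal, self-contained sketch of Reciprocal Rank Fusion. It 
deliberately does not use Lucene's real TopDocs/ScoreDoc classes: the 
RrfSketch class, the Hit type, the rrf method and its parameters are all 
hypothetical names invented for this example, and k = 60 is only the 
constant commonly used with RRF. Each document's fused score is the sum, 
over the input rankings, of 1 / (k + rank), with rank starting at 1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are part of Lucene's API.
final class RrfSketch {

  // Stand-in for a scored hit (a doc id plus its fused score).
  static final class Hit {
    final int doc;
    final double score;

    Hit(int doc, double score) {
      this.doc = doc;
      this.score = score;
    }
  }

  // Fuses ranked lists of doc ids: score(doc) = sum of 1 / (k + rank), rank starting at 1.
  static List<Hit> rrf(List<List<Integer>> rankings, int topN, int k) {
    Map<Integer, Double> fused = new HashMap<>();
    for (List<Integer> ranking : rankings) {
      for (int i = 0; i < ranking.size(); i++) {
        // Position i in the list corresponds to rank i + 1.
        fused.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
      }
    }
    List<Hit> hits = new ArrayList<>();
    for (Map.Entry<Integer, Double> e : fused.entrySet()) {
      hits.add(new Hit(e.getKey(), e.getValue()));
    }
    // Highest fused score first; ties broken by the smaller doc id.
    hits.sort((a, b) -> {
      int byScore = Double.compare(b.score, a.score);
      return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    });
    return new ArrayList<>(hits.subList(0, Math.min(topN, hits.size())));
  }

  public static void main(String[] args) {
    // Two made-up rankings, e.g. one from a keyword query and one from a vector query.
    List<List<Integer>> rankings = List.of(List.of(1, 2, 3), List.of(3, 1, 4));
    for (Hit hit : rrf(rankings, 3, 60)) {
      System.out.println("doc=" + hit.doc + " score=" + hit.score);
    }
  }
}
```

A version built on TopDocs would additionally have to decide how to 
identify the same document across result lists and what to report as total 
hits; the sketch leaves those questions open.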

On 10.05.24 at 18:51, Chang Hank wrote:

Hi Michael,

Thank you for the reply.
This is a really cool issue to work on, and I’m happy to work on it with 
you. I’ll do some research on RRF first.

Also, are we going to implement this on the TopDocs class?

Best,
Hank


On May 9, 2024, at 11:08 PM, Michael Wechner wrote:


Hi Hank

Thanks for offering your help!

I recently suggested implementing RRF (Reciprocal Rank Fusion)

https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz

but still have not found the time to really work on this.

Maybe you would be interested in doing this, or we could work on it 
together somehow?


Thanks

Michael



On 10.05.24 at 07:27, Chang Hank wrote:

Hi everyone,

I’m Hank Chang, currently studying Information Retrieval topics. I’m 
really interested in contributing to Apache Lucene and enhancing my 
understanding of the field.
I’ve reviewed several issues posted on the GitHub repository but 
haven’t found a straightforward starting point. Could someone please 
recommend suitable issues for a newcomer like me or suggest areas I 
could assist with?


Thank you for your time and guidance.

Best regards,
Hank Chang





Re: Any recommended issues to work on for a newcomer?

2024-05-10 Thread Chang Hank
Hi Michael, 

Thank you for the reply.
This is a really cool issue to work on, and I’m happy to work on it with you. 
I’ll do some research on RRF first.
Also, are we going to implement this on the TopDocs class?

Best,
Hank


> On May 9, 2024, at 11:08 PM, Michael Wechner wrote:
> 
> Hi Hank
> 
> Thanks for offering your help!
> 
> I recently suggested implementing RRF (Reciprocal Rank Fusion)
> 
> https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz
> 
> but still have not found the time to really work on this.
> 
> Maybe you would be interested in doing this, or we could work on it together 
> somehow?
> 
> Thanks
> 
> Michael
> 
> 
> 
> On 10.05.24 at 07:27, Chang Hank wrote:
>> Hi everyone,
>> 
>> I’m Hank Chang, currently studying Information Retrieval topics. I’m really 
>> interested in contributing to Apache Lucene and enhancing my understanding of 
>> the field.
>> I’ve reviewed several issues posted on the GitHub repository but haven’t 
>> found a straightforward starting point. Could someone please recommend 
>> suitable issues for a newcomer like me or suggest areas I could assist with?
>> 
>> Thank you for your time and guidance.
>> 
>> Best regards,
>> Hank Chang



Re: Any recommended issues to work on for a newcomer?

2024-05-10 Thread Michael Wechner

Hi Hank

Thanks for offering your help!

I recently suggested implementing RRF (Reciprocal Rank Fusion)

https://lists.apache.org/thread/vvwvjl0gk67okn8z1wg33ogyf9qm07sz

but still have not found the time to really work on this.

Maybe you would be interested in doing this, or we could work on it together 
somehow?


Thanks

Michael



On 10.05.24 at 07:27, Chang Hank wrote:

Hi everyone,

I’m Hank Chang, currently studying Information Retrieval topics. I’m really 
interested in contributing to Apache Lucene and enhancing my understanding of the 
field.
I’ve reviewed several issues posted on the GitHub repository but haven’t found 
a straightforward starting point. Could someone please recommend suitable 
issues for a newcomer like me or suggest areas I could assist with?

Thank you for your time and guidance.

Best regards,
Hank Chang



Any recommended issues to work on for a newcomer?

2024-05-09 Thread Chang Hank
Hi everyone, 

I’m Hank Chang, currently studying Information Retrieval topics. I’m really 
interested in contributing to Apache Lucene and enhancing my understanding of the 
field.
I’ve reviewed several issues posted on the GitHub repository but haven’t found 
a straightforward starting point. Could someone please recommend suitable 
issues for a newcomer like me or suggest areas I could assist with?

Thank you for your time and guidance.

Best regards,
Hank Chang