[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-06-14 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863734#comment-16863734
 ] 

Atri Sharma commented on LUCENE-8791:
-

I will take a look today

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch, LUCENE-8791.patch, 
> LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-06-14 Thread Elbek Kamoliddinov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863723#comment-16863723
 ] 

Elbek Kamoliddinov commented on LUCENE-8791:


Happy to address feed-backs for the last patch if any. 



> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch, LUCENE-8791.patch, 
> LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-06-10 Thread Mike Sokolov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860382#comment-16860382
 ] 

Mike Sokolov commented on LUCENE-8791:
--

bq. We distribute total number of results we are looking from matching across 
segments evenly plus some static number for overhead

I think this is the same pro-rated idea from LUCENE-8681; when the documents 
are randomly distributed among segments, the prediction can be quite accurate. 
In the case of a time series index though (eg, or any index where the 
distribution among segments is correlated with the rank), then this approach to 
early termination is not directly applicable.

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch, LUCENE-8791.patch, 
> LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-06-10 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860335#comment-16860335
 ] 

Michael McCandless commented on LUCENE-8791:


{quote}the default interface takes an ExecutorManager.
{quote}
Hmm did you mean {{ExecutorService}} or {{CollectorManager}} here?  There is no 
{{ExecutorManger}} that I can see.

+1 to mark the {{ExecutorService}} ctor/setter as expert w/ javadocs that 
explain that it is not often needed to distribute collection work across 
concurrent threads.

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch, LUCENE-8791.patch, 
> LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-06-09 Thread Elbek Kamoliddinov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859708#comment-16859708
 ] 

Elbek Kamoliddinov commented on LUCENE-8791:


{quote}Do you just accept the decrease of precision if one slice didn't collect 
enough hits, or do you re-run the search on this slice with a greater number of 
hits?
{quote}
We don't rerun, just accept the result. This approach has been working fine.
{quote}I won't veto this change, but could we make the single-threaded 
constructor take a Collector rather than a CollectorManager and document that 
the one that takes an ExecutorService is expert and usually doesn't help 
significantly?
{quote}
Hmm, how do we return {{TopDocs}} from {{Collector}} interface? As I understand 
{{Collector}}/{{LeafCollector}} doesn't make any assumption on how collected 
docs sorted, scored etc. So is it even possible to build {{TopDocs}} from 
{{Collector}}? With {{CollectorManager}} user plugs in this logic. Taking 
{{Collector}} object in the constructor feels like bad api and feels confusing. 
 I added Expert comment in the last patch. So patch has a sole ctor:
 {code}
public CollectorRescorer(CollectorManager manager) {
  this.manager = manager;
}
{code}


> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch, LUCENE-8791.patch, 
> LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-06-08 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859239#comment-16859239
 ] 

Atri Sharma commented on LUCENE-8791:
-

bq. could we make the single-threaded constructor take a Collector rather than 
a CollectorManager and document that the one that takes an ExecutorService is 
expert and usually doesn't help significantly?

That has also been one of my main gripes with the approach that this patch 
takes -- the default interface takes an ExecutorManager. Maybe it is just a 
matter of personal taste, but it still catches the eye. It would be great if we 
could improve upon that

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch, LUCENE-8791.patch, 
> LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-06-08 Thread Michael Sokolov
I think this is the same pro-rated idea from LUCENE-8681; when the
documents are randomly distributed among segments, the prediction can
be quite accurate. In the case of a time series index though (eg, or
any index where the distribution among segments is correlated with the
rank), then this approach to early termination is not directly
applicable.

On Fri, Jun 7, 2019 at 4:27 AM Adrien Grand (JIRA)  wrote:
>
>
> [ 
> https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858391#comment-16858391
>  ]
>
> Adrien Grand commented on LUCENE-8791:
> --
>
> My concern is that we are introducing complexity for a use-case that seems to 
> be collecting a number of hits in the first pass that is above what I would 
> consider reasonable. I understand why you might want to set an executor 
> service on an IndexSearcher since it needs to do work over potentially 
> millions of hits. However, the order of magnitude of the number of documents 
> that I'm expecting rescoring to have to look at is more in the order of 
> hundreds and should run in a few milliseconds only already with a single 
> thread.
>
> That concern aside, I'm supportive of the ability of rescoring hits based on 
> a collector, which adds flexibility compared to what you can do with a query 
> rescorer. I won't veto this change, but could we make the single-threaded 
> constructor take a Collector rather than a CollectorManager and document that 
> the one that takes an ExecutorService is expert and usually doesn't help 
> significantly?
>
> bq. We distribute total number of results we are looking from matching across 
> segments evenly plus some static number for overhead
>
> This way of working would be a nice enhancement to IndexSearcher when 
> constructed with an ExecutorService! Do you just accept the decrease of 
> precision if one slice didn't collect enough hits, or do you re-run the 
> search on this slice with a greater number of hits?
>
> > Add CollectorRescorer
> > -
> >
> > Key: LUCENE-8791
> > URL: https://issues.apache.org/jira/browse/LUCENE-8791
> > Project: Lucene - Core
> >  Issue Type: Improvement
> >Reporter: Elbek Kamoliddinov
> >Priority: Major
> > Attachments: LUCENE-8791.patch, LUCENE-8791.patch, 
> > LUCENE-8791.patch, LUCENE-8791.patch
> >
> >
> > This is another implementation of query rescorer api (LUCENE-5489). It adds 
> > rescoring functionality based on provided CollectorManager.
> >
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-06-07 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858391#comment-16858391
 ] 

Adrien Grand commented on LUCENE-8791:
--

My concern is that we are introducing complexity for a use-case that seems to 
be collecting a number of hits in the first pass that is above what I would 
consider reasonable. I understand why you might want to set an executor service 
on an IndexSearcher since it needs to do work over potentially millions of 
hits. However, the order of magnitude of the number of documents that I'm 
expecting rescoring to have to look at is more in the order of hundreds and 
should run in a few milliseconds only already with a single thread.

That concern aside, I'm supportive of the ability of rescoring hits based on a 
collector, which adds flexibility compared to what you can do with a query 
rescorer. I won't veto this change, but could we make the single-threaded 
constructor take a Collector rather than a CollectorManager and document that 
the one that takes an ExecutorService is expert and usually doesn't help 
significantly?

bq. We distribute total number of results we are looking from matching across 
segments evenly plus some static number for overhead

This way of working would be a nice enhancement to IndexSearcher when 
constructed with an ExecutorService! Do you just accept the decrease of 
precision if one slice didn't collect enough hits, or do you re-run the search 
on this slice with a greater number of hits?

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch, LUCENE-8791.patch, 
> LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-06-06 Thread Elbek Kamoliddinov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858314#comment-16858314
 ] 

Elbek Kamoliddinov commented on LUCENE-8791:


After MIA, I am back at this. I was thinking along the line Mike, but even more 
brutal. So the change was just to have one sole ctor which takes single 
{{CollectorManager}} param. There is setter for setting {{ExecutorService}} 
object. This way single running single threaded will get first class treatment? 
I have attached an updated patch. 

[~atris]
{quote}
Interesting. How do you early terminate across multiple slices running 
concurrently?
{quote}
We distribute total number of results we are looking from matching across 
segments evenly plus some static number for overhead (I think) and terminate 
when each segment collects enough docs.   

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-30 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852068#comment-16852068
 ] 

Michael McCandless commented on LUCENE-8791:


{quote}In my opinion, rescorers should be used on the very top hits only.
{quote}
Is the concern that this rescorer optionally accepts {{ExecutorService}} to 
distribute the work across threads?

Maybe we could just add another ctor *not* taking {{ExecutorService}} for those 
use cases that want to run single-threaded?

 

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-27 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848908#comment-16848908
 ] 

Adrien Grand commented on LUCENE-8791:
--

bq. Sometimes this set is not that small, Documents may be enriched during 
indexing time, hence too many may match.

We don't do anything to prevent from using rescorers on lots of hits, but I'd 
rather not optimize for this use-case. In my opinion, rescorers should be used 
on the very top hits only.

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-26 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848359#comment-16848359
 ] 

Atri Sharma commented on LUCENE-8791:
-

bq. We do early termination when we have to many results, but still this upper 
bound is bit costly when running sequential

Interesting. How do you early terminate across multiple slices running 
concurrently?

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-26 Thread Elbek Kamoliddinov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848355#comment-16848355
 ] 

Elbek Kamoliddinov commented on LUCENE-8791:


{quote}
I understand how ML models can be costly to build, but running them against a 
small set of hits should perform reasonably fast
{quote}
Sometimes this set is not that small, Documents may be enriched during indexing 
time, hence too many may match. We do early termination when we have to many 
results, but still this upper bound is bit costly when running sequential. We 
tested whole query running pipeline running a query per thread and had mixed 
results, but overall multi threaded querying serving offered better result. 

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844029#comment-16844029
 ] 

Adrien Grand commented on LUCENE-8791:
--

bq. Yes, We run CPU intensive ML models to rescore hits, it always runs with an 
executor.

I understand how ML models can be costly to build, but running them against a 
small set of hits should perform reasonably fast?

bq. using LeafReaderContext.ord directly as positional parameter might be risky 
since LeafReaderContext has a constructor where it sets the ord to 0

This constructor is pkg-private and only used for single-segment reader 
contexts, such as the ones that you get from LeafReader#getContext. Reader 
context ords should be reliable.

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-17 Thread Elbek Kamoliddinov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842005#comment-16842005
 ] 

Elbek Kamoliddinov commented on LUCENE-8791:


Sorry, I was bit busy and could not work on feedback. I created another 
iteration with some changes. 
{quote}
1) In buildSegmentScorers, could we rename maxDoc as endDoc? Since maxDoc is 
what is used to typically denote the number of documents in a segment, the code 
can be conflicting to read.
{quote}
I left the name as maxDoc, this is denoting max doc and used to check if we 
went beyond it so we advance. Is it convincing? 
{quote}
When mapping LeafSlices to SegmentScorers, we do another loop post the 
construction of SegmentScorers. Is there a way where we could do the mapping in 
buildSegmentScorers itself at the time of scorer task allocation?
{quote}
The thing is slice may contain out of order segments, for example slice 0 may 
contain segment 0 and segment 3. So we have to do full iteration here first 
then map segments to the slices. 
{quote}
3) I am a bit unsure about how the rescorer method works. If IndexSearcher has 
non empty slices, then that implies that a non null ExecutorService was passed 
in to IndexSearcher. Is there any correlation to that and CollectorRescorer 
using an ExecutorService?
{quote}
I think it depends on use case, one may want to run index searcher with 
executor but wants this run sequential? I made some changes there, so now slice 
works only with executor, if executor is null then slice is ignored. Also if 
slice is null but executor is not null (and segment count is >1) then resocres 
each segment in a separate thread.  
{quote}
4) Could you please break up rescorer method into more modular objects? 
Ideally, a method each for sequential and parallel cases. Also, there is some 
code duplication at the tail where you do the sequential execution, that should 
get eliminated.
{quote}
Done, but I did it based on slice or segment based. Because in both new methods 
it maybe possible that execution happens in parallel. 

{quote}
Thinking more about this, using LeafReaderContext.ord directly as positional 
parameter might be risky since LeafReaderContext has a constructor where it 
sets the ord to 0. Ideally that should not happen given that these segments are 
already scored once, but that might be one case worth thinking about?
{quote}
I don't think I understood this concern. You are saying there might be 
duplicate leaf with 0 ord or there might gap in the leaf ord? I don't think 
either one is possible. A lot of internal classes uses ord in such way, for 
example {{OrdinalMap}}.   

Let me know what you think.




> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-13 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838448#comment-16838448
 ] 

Atri Sharma commented on LUCENE-8791:
-

Thinking more about this, using LeafReaderContext.ord directly as positional 
parameter might be risky since LeafReaderContext has a constructor where it 
sets the ord to 0. Ideally that should not happen given that these segments are 
already scored once, but that might be one case worth thinking about?

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-12 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838238#comment-16838238
 ] 

Atri Sharma commented on LUCENE-8791:
-

Hi [~elbek@gmail.com]

 

Thanks for another iteration. A few other comments:

 

1) In buildSegmentScorers, could we rename maxDoc as endDoc? Since maxDoc is 
what is used to typically denote the number of documents in a segment, the code 
can be conflicting to read.

 

2) When mapping LeafSlices to SegmentScorers, we do another loop post the 
construction of SegmentScorers. Is there a way where we could do the mapping in 
buildSegmentScorers itself at the time of scorer task allocation?

 

3) I am a bit unsure about how the rescorer method works. If IndexSearcher has 
non empty slices, then that implies that a non null ExecutorService was passed 
in to IndexSearcher. Is there any correlation to that and CollectorRescorer 
using an ExecutorService?

If yes, then a plain assert to check that ExecutorService is not null should 
suffice and there is no need for the if-else block.

If not, then we should ignore IndexSearcher's LeafSlices when CollectorRescorer 
is going single threaded i.e. we should not do the slices to segment mapping, 
since we anyways are not going to use that.

 

4) Could you please break up rescorer method into more modular objects? 
Ideally, a method each for sequential and parallel cases. Also, there is some 
code duplication at the tail where you do the sequential execution, that should 
get eliminated.

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-12 Thread Elbek Kamoliddinov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838225#comment-16838225
 ] 

Elbek Kamoliddinov commented on LUCENE-8791:


Yes, We run CPU intensive ML models to rescore hits, it always runs with an 
executor. Idea is we first collect top X hits with static scorer then run this 
rescorer to X to reduce it to say top 20 best hits. 

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-12 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838136#comment-16838136
 ] 

Adrien Grand commented on LUCENE-8791:
--

I like the idea of a collector-based rescorer so that you can do things that 
you can't do with queries, such as ensuring there is some diversity on the 
first page (we have a DiversifiedTopDocsCollector). However I'm curious about 
the fact that this rescorer takes an executor, are you running rescorers that 
are very CPU-intensive?

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-06 Thread Elbek Kamoliddinov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833974#comment-16833974
 ] 

Elbek Kamoliddinov commented on LUCENE-8791:


{{testMultiSegment}} is creating multiple segments, also {{testRandom}} does 
commit with conditioned on {{rarely()}} to have multisegment index. So both of 
these tests should be sufficient? 

Thanks again looking into it. 

 

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-06 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833610#comment-16833610
 ] 

Atri Sharma commented on LUCENE-8791:
-

Hi [~elbek@gmail.com]

 

bq. Yes, it takes CollectorManager instance and it is all it needs. Also It 
makes sense to me that constructor takes CollectorManager because without it, 
there is no point of creating CollectorRescorer. Also as you see It doesn't 
necessarily only run in parallel. if ExecutorService is null then it runs 
sequential, so calling it parallel would be wrong. JavaDoc on constructor 
highlights this.

Ok, I am fine with it as long as it is clearly advertised (maybe add it in the 
class heading's Javadocs itself?)

bq. Corrected it to use ord based assignment.

Could you add a test for this?

I will take another deeper look into the updated patch today and post comments.


> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-05 Thread Elbek Kamoliddinov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833258#comment-16833258
 ] 

Elbek Kamoliddinov commented on LUCENE-8791:


[~atris]

Hi Atri, Thanks for taking a look at the patch. So added more doc also renamed 
{{Job}} and {{Jobs}} to {{SegmentRescorer}} and {{RescorerTask}} respectively, 
hopefully this is more descriptive. I left {{docIDs}} as it is, this looks like 
pretty standard. 
 I am not sure if I got the point here:
{quote}In general, I am a bit unsure about the class name and description. The 
class advertises to be a Collector based rescorer implementation, whereas the 
only constructor of the class takes a CollectorManager instance, which is 
specific to parallel segments traversal. Should the class say that it is used 
for parallel rescoring of hits across multiple segments?
{quote}
Yes, it takes {{CollectorManager}} instance and it is all it needs. Also It 
makes sense to me that constructor takes {{CollectorManager}} because without 
it, there is no point of creating {{CollectorRescorer}}. Also as you see It 
doesn't necessarily only run in parallel. if {{ExecutorService}} is null then 
it runs sequential, so calling it parallel would be wrong. JavaDoc on 
constructor highlights this.
{quote}I am a bit wary of the way the patch maps the global jobs array to jobs 
associated with segments belonging to one slice. In particular, buildJobs 
method does not consider leaf ordinal during assigning values to the global 
array, but rescore depends on that during retrieval.
{quote}
Good one, Corrected it to use ord based assignment.

Thanks.

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-03 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832371#comment-16832371
 ] 

Atri Sharma commented on LUCENE-8791:
-

Apologies then, my formatter must be screwing things up :)

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-03 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832327#comment-16832327
 ] 

Michael McCandless commented on LUCENE-8791:


{quote}Please have a look again at the spacing. In general, it would be good if 
the code was a bit more readable w.r.t spacing around braces, breaking the code 
into logical paragraphes.
{quote}
The spacing/indenting looks correct to me – it seems to match Lucene's coding 
guidelines ([https://wiki.apache.org/lucene-java/DeveloperTips]).

 

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-03 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832281#comment-16832281
 ] 

Atri Sharma commented on LUCENE-8791:
-

Hi [~elbek@gmail.com],

 

A few comments:

 

1) A better naming scheme would be helpful (eg. exec could be renamed to 
execManager for better context, ditto for pending docIDs list, using verbose 
names for objects instead of "Collector c")

2) Please have a look again at the spacing. In general, it would be good if the 
code was a bit more readable w.r.t spacing around braces, breaking the code 
into logical paragraphes.

3) Could the class Job have a better name? Maybe RescoreTask?

4) Same for the class Jobs.

5) Please add a descriptive error message in the assert for ensuring that the 
docID post skipping a segment is higher than the maxDoc.

6) In general, I am a bit unsure about the class name and description. The 
class advertises to be a Collector based rescorer implementation, whereas the 
only constructor of the class takes a CollectorManager instance, which is 
specific to parallel segments traversal. Should the class say that it is used 
for parallel rescoring of hits across multiple segments?

7) I am a bit wary of the way the patch maps the global jobs array to jobs 
associated with segments belonging to one slice. In particular, buildJobs 
method does not consider leaf ordinal during assigning values to the global 
array, but rescore depends on that during retrieval.

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-02 Thread Elbek Kamoliddinov (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832121#comment-16832121
 ] 

Elbek Kamoliddinov commented on LUCENE-8791:


Attached the patch, adds CollectorRescorer, and test case. Test case became bit 
verbose, because of inlined CollectorManager, LeafCollector classes

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org