[jira] [Updated] (SOLR-16931) ReRankScaler explain only works when debugQuery=true, should also work with debug=query

Joel Bernstein (Jira) Fri, 15 Sep 2023 07:29:20 -0700


     [ 
https://issues.apache.org/jira/browse/SOLR-16931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Joel Bernstein updated SOLR-16931:
----------------------------------
    Description: 
The ReRankScaler collects the information it needs for the explain only when
debugQuery is set to true. The parameter *debug=query* does not trigger the
collection of this data, which causes an NPE in the explain.

The workaround is to always use debugQuery=true until this ticket is resolved
and released.
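
A minimal sketch of that workaround on the client side, assuming request
parameters are held in a plain map before being sent to Solr. The class and
method names (DebugWorkaround, applyWorkaround, toQueryString) are
hypothetical helpers for illustration and are not part of Solr or SolrJ; only
the debugQuery=true and debug=query parameters come from this ticket.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class DebugWorkaround {

    /**
     * Returns a copy of the request parameters with debug=query dropped
     * and debugQuery=true set, as the workaround suggests.
     * Hypothetical helper, not a Solr API.
     */
    static Map<String, String> applyWorkaround(Map<String, String> params) {
        Map<String, String> out = new LinkedHashMap<>(params);
        if ("query".equals(out.get("debug"))) {
            out.remove("debug");
        }
        out.put("debugQuery", "true");
        return out;
    }

    /** URL-encodes the parameters into a query string, keeping insertion order. */
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "title:solr");   // illustrative query only
        params.put("debug", "query");    // the setting that triggers the NPE
        System.out.println(toQueryString(applyWorkaround(params)));
    }
}
```

The rerank parameters themselves are left out on purpose; the sketch only
shows the debug parameter swap.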

It turned out that this ticket had two problems. The first one is described 
above. The second issue is that distributed explain is broken with the 
ReRankScaler.

The reason is that a proper explain for minMaxScaling requires the min and max
score in the result set. This state is maintained in the ReRankScaler itself,
which lives inside the ReRankQuery, and it is only populated once the query
has been run. In distributed mode, explain is called in the second pass, when
the ids query is run, so the state needed for the explain is not populated.
The PR attached to this ticket addresses the problem by doing a single-pass
distributed query if debugQuery is turned on and reRank score scaling is
applied. I'll add a distributed test for this as well.
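
To illustrate why the explain depends on state from the first pass, here is a
rough sketch of generic min-max scaling. It is not the ReRankScaler code: the
class MinMaxExplainSketch, its fields and its methods are hypothetical, and the
handling of the max == min case is an arbitrary choice made for the sketch.

```java
public class MinMaxExplainSketch {

    // Null until run() has seen the result set, mirroring state that is
    // only populated once the query has been executed.
    private Float min;
    private Float max;

    /** Records the min and max score of a result set. */
    void run(float[] scores) {
        if (scores.length == 0) {
            return;
        }
        float lo = Float.POSITIVE_INFINITY;
        float hi = Float.NEGATIVE_INFINITY;
        for (float s : scores) {
            lo = Math.min(lo, s);
            hi = Math.max(hi, s);
        }
        min = lo;
        max = hi;
    }

    /** Generic min-max scaling of score into [scaleMin, scaleMax]. */
    static float scale(float score, float min, float max,
                       float scaleMin, float scaleMax) {
        if (max == min) {
            return scaleMin; // arbitrary choice for this sketch
        }
        return scaleMin + (score - min) / (max - min) * (scaleMax - scaleMin);
    }

    /** Explains a scaled score; fails if the min/max state was never populated. */
    String explain(float score, float scaleMin, float scaleMax) {
        if (min == null || max == null) {
            throw new IllegalStateException(
                    "min/max not populated: the query has not been run");
        }
        float scaled = scale(score, min, max, scaleMin, scaleMax);
        return "scaled=" + scaled + " (min=" + min + ", max=" + max + ")";
    }

    public static void main(String[] args) {
        MinMaxExplainSketch scaler = new MinMaxExplainSketch();
        scaler.run(new float[] {2f, 4f, 10f});
        System.out.println(scaler.explain(4f, 0f, 1f));
    }
}
```

Calling explain on a fresh instance, before run, fails. That is analogous to
the second pass of a distributed query asking for an explain from a scaler
that never saw the result set.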

This change is very limited in scope because the single-pass distributed query
is only switched on in the very specific case where debugQuery=true and
reRankScaling is on.

  was:
The ReRankScaler collects specific information for the explain when debugQuery 
is set to true. But the parameter *debug=query* doesn't trigger the collection 
of this data which causes an NPE in the explain.

The work around is to always use debugQuery=true until this ticket is resolved 
and released.

It turned out that this ticket had two problems. The first one is described 
above. The second issue is that distributed explain was broken with the 
ReRankScaler.

The reason for this is that in order to do proper explain for minMaxScaling you 
need to know the min and max score in the result set. This piece of state is 
maintained in ReRankScaler itself which inside the ReRankQuery. But for this 
information to be populated the query must first be run. In distributed mode 
explain is called in the second pass when the ids query is run so the state 
needed for the explain is not populated. The PR attached to this addresses this 
problem by adding doing a single pass distributed query if debugQuery is turned 
on ad the reRankScaling score scaling is applied. I'll a distributed test for 
this as well.

This change is very limited in scope because the single pass distributed is 
only switched on in the very specific case of debugQuery=true and reRankScaling 
is on. 


> ReRankScaler explain only works when debugQuery=true, should also work with 
> debug=query
> ---------------------------------------------------------------------------------------
>
>                 Key: SOLR-16931
>                 URL: https://issues.apache.org/jira/browse/SOLR-16931
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: reranker
>    Affects Versions: 9.3
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The ReRankScaler collects the information it needs for the explain only when
> debugQuery is set to true. The parameter *debug=query* does not trigger the
> collection of this data, which causes an NPE in the explain.
> The workaround is to always use debugQuery=true until this ticket is
> resolved and released.
> It turned out that this ticket had two problems. The first one is described
> above. The second issue is that distributed explain is broken with the
> ReRankScaler.
> The reason is that a proper explain for minMaxScaling requires the min and
> max score in the result set. This state is maintained in the ReRankScaler
> itself, which lives inside the ReRankQuery, and it is only populated once the
> query has been run. In distributed mode, explain is called in the second
> pass, when the ids query is run, so the state needed for the explain is not
> populated. The PR attached to this ticket addresses the problem by doing a
> single-pass distributed query if debugQuery is turned on and reRank score
> scaling is applied. I'll add a distributed test for this as well.
> This change is very limited in scope because the single-pass distributed
> query is only switched on in the very specific case where debugQuery=true and
> reRankScaling is on.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

