[ 
https://issues.apache.org/jira/browse/SOLR-14758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398541#comment-17398541
 ] 

Uwe Schindler commented on SOLR-14758:
--------------------------------------

I think I know where the issue is: if it figures out that sort values are 
missing for one shard, it calls "continue". Somewhere at the end it sets the 
partial results flag. The problem is that if the last shard times out, it 
continues the loop, but as it is the last one, the code that sets the "partial 
results" flag is not executed.
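
To make the suspected control flow concrete, here is a toy sketch in Java. It 
is not the actual QueryComponent.mergeIds code: the class MergeLoopSketch, the 
ShardReply type and all method names are made up for illustration, and it 
assumes the flag is only updated after the point where the loop can "continue".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: class, field and method names are hypothetical, not Solr's.
class MergeLoopSketch {

    /** One shard's reply; sortValues is empty when the shard hit timeAllowed. */
    static final class ShardReply {
        final List<String> sortValues;
        final boolean partial;

        ShardReply(List<String> sortValues, boolean partial) {
            this.sortValues = sortValues;
            this.partial = partial;
        }

        boolean timedOutWithoutSortValues() {
            return partial && (sortValues == null || sortValues.isEmpty());
        }
    }

    /** A shard that answered completely. */
    static ShardReply ok(String... values) {
        return new ShardReply(Arrays.asList(values), false);
    }

    /** A shard that hit the time limit and returned no sort values. */
    static ShardReply timedOut() {
        return new ShardReply(Collections.<String>emptyList(), true);
    }

    /** Assumed buggy shape: the flag is written only after the early "continue". */
    static boolean partialFlagBuggy(List<ShardReply> replies) {
        boolean sawPartial = false;
        boolean partialResults = false;
        for (ShardReply reply : replies) {
            if (reply.partial) {
                sawPartial = true;
            }
            if (reply.timedOutWithoutSortValues()) {
                continue; // skips the flag update below
            }
            // ... merging of reply.sortValues would happen here ...
            partialResults |= sawPartial; // not reached if the last shard timed out
        }
        return partialResults;
    }

    /** Same loop, but the flag is recorded before any early "continue". */
    static boolean partialFlagFixed(List<ShardReply> replies) {
        boolean partialResults = false;
        for (ShardReply reply : replies) {
            partialResults |= reply.partial;
            if (reply.timedOutWithoutSortValues()) {
                continue;
            }
            // ... merging of reply.sortValues would happen here ...
        }
        return partialResults;
    }
}
```

In this model, a timeout on the first of two shards is still reported by both 
variants, while a timeout on the last shard is reported only by 
partialFlagFixed. That matches the "only sometimes" behaviour seen in the test.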

This is really a timing problem, but it is good that we hit this through 
extensive testing. I have seen the same thing locally once, but was not able 
to reproduce it after I changed the SolrJ code that checks for partial results 
in the test. I am out of office, but will check later, and possibly I can get 
this sorted out in the next few days.

Great that [~bvd]'s test found a bug in the merging code!

> NPE in QueryComponent.mergeIds when using timeAllowed and sorting
> -----------------------------------------------------------------
>
>                 Key: SOLR-14758
>                 URL: https://issues.apache.org/jira/browse/SOLR-14758
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 7.7.3, 8.6.3, main (9.0), 8.8.1, 8.8.2
>            Reporter: Bram Van Dam
>            Assignee: Uwe Schindler
>            Priority: Major
>             Fix For: main (9.0), 8.10
>
>         Attachments: SOLR-14758.patch, SOLR-14758.patch
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Only tested on 7.7.3 and master, but the offending code hasn't been changed 
> for a while, so this presumably affects other versions as well. 
> Steps to reproduce:
> # SolrCloud
> # Create a query which is complex enough to take a while
> # Add a sort clause to the query (e.g. &sort=creationTimestamp asc)
> # Add a short timeAllowed value (10ms in my test)
> Result: NPE in QueryComponent.mergeIds:935
> It may take a couple of attempts to hit the error.
> Offending code:
> {code:java}
>         NamedList sortFieldValues = (NamedList)(srsp.getSolrResponse().getResponse().get("sort_values"));
>         if (sortFieldValues.size()==0 && // we bypass merging this response only if it's partial itself
>                             thisResponseIsPartial) { // but not the previous one!!
>           continue; //fsv timeout yields empty sort_vlaues
>         }
> {code}
> sortFieldValues can apparently be null in some cases, depending on when the 
> query hits the timeAllowed. Adding an extra null check fixes the issue.
> {code:java}
>         NamedList sortFieldValues = (NamedList)(srsp.getSolrResponse().getResponse().get("sort_values"));
>         if ((null == sortFieldValues || sortFieldValues.size()==0) && // we bypass merging this response only if it's partial itself
>                             thisResponseIsPartial) { // but not the previous one!!
>           continue; //fsv timeout yields empty sort_vlaues
>         }
> {code}
> I'll attach a patch.


