[ https://issues.apache.org/jira/browse/SOLR-17819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Houston Putman resolved SOLR-17819.
-----------------------------------
    Fix Version/s: 9.9
       Resolution: Fixed

> HttpShardHandler non-tolerant request cancellation bleeds across requests
> -------------------------------------------------------------------------
>
>                 Key: SOLR-17819
>                 URL: https://issues.apache.org/jira/browse/SOLR-17819
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Houston Putman
>            Assignee: Houston Putman
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 9.9
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> However, after fixing that and beasting the tests, there is a really weird
> error around cancelling requests. The test does a non-tolerant search, then
> does a tolerant search. The error I described above was breaking the
> non-tolerant search. That is easily fixable. The second part of the test,
> which tests tolerant search, fails very occasionally, but only when the
> non-tolerant search is done first. When that is commented out, the tolerant
> search does not fail.
> When beasting {{DistributedDebugComponentTest.testTolerantSearch}} and
> adding a loop to do the requests 1,000 times, the tolerant search fails
> because all three shard requests fail instead of just one (the request to a
> non-existent endpoint). The bad shard has the failure that the test expects,
> but the good shards both fail with
> {{java.io.IOException: cancel_stream_error/unexpected_data_frame}}, meaning
> that the requests were cancelled, even though the request is "tolerant". I
> did a lot of debugging here and found that Solr is behaving correctly: we
> are not cancelling shard requests for tolerant Solr requests. The failures
> stop when the "non-tolerant search" request right before the tolerant search
> request is commented out, which tells us that the cancellations from the
> non-tolerant request are bleeding into the tolerant request. This is bad. I
> also confirmed this by commenting out the line that actually cancels the
> HTTP requests (when it is commented out, the test succeeds):
> [https://github.com/apache/solr/blob/branch_9_9/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L570-L574]
> This only happens on branch_9x (and presumably branch_9_9), not on main, so I
> believe it's a bug in Jetty 10 that Jetty 12 has solved. We are probably
> fine fixing this part only on branch_9x and branch_9_9 and leaving
> request cancellation enabled on main (10.x).
> Amazingly, when beasting, it makes a big difference whether the
> non-existent endpoint is put first or last in the list of shards. The failure
> rate is much higher when the bad shard is listed first rather than last.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
