[ 
https://issues.apache.org/jira/browse/SOLR-17819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18068111#comment-18068111
 ] 

Houston Putman commented on SOLR-17819:
---------------------------------------

I copied it from another Jira comment, sorry about that. It should be fixed up 
now.

> HttpShardHandler non-tolerant request cancellation bleeds across requests
> -------------------------------------------------------------------------
>
>                 Key: SOLR-17819
>                 URL: https://issues.apache.org/jira/browse/SOLR-17819
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Houston Putman
>            Assignee: Houston Putman
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 9.9
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When beasting {{DistributedDebugComponentTest.testTolerantSearch}}, there is 
> a really weird error around cancelling requests. 
> {{DistributedDebugComponentTest.testTolerantSearch}} does a non-tolerant 
> search and then a tolerant search. The second part of the test, the 
> tolerant search, fails very occasionally, but only when the non-tolerant 
> search is done first. When that is commented out, the tolerant search does 
> not fail.
> The tolerant search fails (occasionally) because all three shard requests 
> fail instead of just one of the shard requests failing (because of a 
> non-existent endpoint). The bad shard has the failure that the test 
> expects, but the good shards both fail with {{java.io.IOException: 
> cancel_stream_error/unexpected_data_frame}}, meaning that the requests were 
> cancelled, even though the request is "tolerant". I did a lot of debugging 
> here, and noticed that Solr is behaving correctly: we are not cancelling 
> shard requests for tolerant Solr requests. The fact that the failures stop 
> when the "non-tolerant search" request right before the tolerant search 
> request is commented out tells us that the cancellations from the 
> non-tolerant request are bleeding into the tolerant request. This is bad. I 
> also confirmed this by commenting out the line that actually cancels the HTTP 
> requests: 
> [https://github.com/apache/solr/blob/branch_9_9/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L570-L574]
> This only happens on branch_9x (and presumably branch_9_9), not on main. So I 
> believe it's a bug in Jetty 10, which Jetty 12 has solved. So we are probably 
> fine just fixing this part on branch_9x and branch_9_9, and leaving the 
> request cancellation enabled on main (10.x).
> Amazingly, when beasting, there is a big difference in whether the 
> non-existent endpoint is put first or last in the list of shards. The failure 
> rate is much higher when the bad shard is the first listed rather than the 
> last one listed.
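
For readers unfamiliar with the tolerant / non-tolerant distinction described above, the intended policy can be sketched roughly as follows. This is a hypothetical illustration, not Solr's HttpShardHandler or Http2SolrClient code: the class name ShardFanoutSketch and its methods are invented for this example, and plain CompletableFuture objects stand in for shard HTTP requests. It does not reproduce the Jetty-level bug; it only shows that cancellation is meant to stay scoped to the top-level request that triggered it.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch of per-request shard cancellation; not Solr code. */
public class ShardFanoutSketch {

  /** Outstanding shard requests belonging to ONE top-level request. */
  private final List<CompletableFuture<String>> pending = new ArrayList<>();

  /** If true, a failing shard must not cancel its siblings. */
  private final boolean tolerant;

  public ShardFanoutSketch(boolean tolerant) {
    this.tolerant = tolerant;
  }

  /** Registers a shard request and attaches the failure policy to it. */
  public CompletableFuture<String> submit(CompletableFuture<String> shardResponse) {
    pending.add(shardResponse);
    shardResponse.whenComplete((rsp, err) -> {
      // A real failure (not a cancellation we issued ourselves) on a
      // non-tolerant request cancels the remaining shard requests.
      boolean realFailure = err != null && !(err instanceof CancellationException);
      if (realFailure && !tolerant) {
        cancelOthers(shardResponse);
      }
    });
    return shardResponse;
  }

  /** Cancels every sibling of the failed request, and nothing else. */
  private void cancelOthers(CompletableFuture<String> failed) {
    for (CompletableFuture<String> f : pending) {
      if (f != failed) {
        f.cancel(true); // no-op for requests that already completed
      }
    }
  }

  /** Number of shard requests of this top-level request that were cancelled. */
  public long cancelledCount() {
    return pending.stream().filter(CompletableFuture::isCancelled).count();
  }

  public static void main(String[] args) {
    ShardFanoutSketch strict = new ShardFanoutSketch(false);
    CompletableFuture<String> bad = strict.submit(new CompletableFuture<>());
    strict.submit(new CompletableFuture<>());
    strict.submit(new CompletableFuture<>());
    bad.completeExceptionally(new IOException("non-existent endpoint"));
    System.out.println("non-tolerant cancelled: " + strict.cancelledCount());
  }
}
```

In this sketch each top-level request owns its own list of pending shard requests, so a non-tolerant failure can only cancel its own siblings. The failure described in the issue is the case where that scoping appears not to hold at the HTTP layer.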



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
