[
https://issues.apache.org/jira/browse/SOLR-17819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Houston Putman resolved SOLR-17819.
-----------------------------------
Fix Version/s: 9.9
Resolution: Fixed
> HttpShardHandler non-tolerant request cancellation bleeds across requests
> -------------------------------------------------------------------------
>
> Key: SOLR-17819
> URL: https://issues.apache.org/jira/browse/SOLR-17819
> Project: Solr
> Issue Type: Bug
> Reporter: Houston Putman
> Assignee: Houston Putman
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 9.9
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> However, after fixing that and beasting the tests, there is a really weird
> error around cancelling requests. The test does a non-tolerant search then does a
> tolerant search. The error I described above was breaking the non-tolerant
> search. That is easily fixable. The second part of the test, testing tolerant
> search, fails very occasionally (but only when the non-tolerant search is done
> first; when that is commented out, the tolerant search does not fail).
> When beasting {{DistributedDebugComponentTest.testTolerantSearch}}, and
> adding a loop to do the requests 1,000 times, the tolerant search fails
> because all three shard requests fail instead of just 1 of the shard requests
> failing (because of a non-existent endpoint). The bad shard has the failure
> that the test expects, but the good shards both fail with
> {{java.io.IOException: cancel_stream_error/unexpected_data_frame}}, meaning
> that the requests were cancelled, even though the request is "tolerant". I
> did a lot of debugging here, and noticed that Solr is behaving correctly and
> we are not cancelling shard requests for tolerant Solr requests. And the fact
> that the failures stop if the "non-tolerant search" request case right before
> the tolerant search request is commented out tells us that the
> cancellations from the non-tolerant request are bleeding into the tolerant
> request. This is bad. I also confirmed this by commenting out the line that
> actually cancels the HTTP requests (when commented out, the test succeeds):
> [https://github.com/apache/solr/blob/branch_9_9/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L570-L574]
> This only happens on branch_9x (and presumably branch_9_9), not on main. So I
> believe it's a bug in Jetty 10, which Jetty 12 has solved. So we are probably
> fine just fixing this part on branch_9x and branch_9_9, and leaving the
> request cancellation enabled on main (10.x).
> Amazingly, when beasting, there is a big difference in whether the
> non-existent endpoint is put first or last in the list of shards. The failure
> rate is much higher when the bad shard is the first listed rather than the last
> one listed.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)