[
https://issues.apache.org/jira/browse/SOLR-17819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18068111#comment-18068111
]
Houston Putman commented on SOLR-17819:
---------------------------------------
I copied it from another Jira comment, sorry about that. It should be fixed up
now.
> HttpShardHandler non-tolerant request cancellation bleeds across requests
> -------------------------------------------------------------------------
>
> Key: SOLR-17819
> URL: https://issues.apache.org/jira/browse/SOLR-17819
> Project: Solr
> Issue Type: Bug
> Reporter: Houston Putman
> Assignee: Houston Putman
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 9.9
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When beasting {{DistributedDebugComponentTest.testTolerantSearch}}, there is
> a really weird error around cancelling requests. The test does a
> non-tolerant search and then a tolerant search. The second part of the test,
> the tolerant search, fails very occasionally, but only when the
> non-tolerant search is done first; when that is commented out, the tolerant
> search does not fail.
> The tolerant search fails (occasionally) because all three shard requests
> fail instead of just one (the one sent to a non-existent endpoint). The bad
> shard has the failure that the test expects, but the good shards both fail
> with {{java.io.IOException: cancel_stream_error/unexpected_data_frame}},
> meaning that the requests were cancelled even though the request is
> "tolerant". I did a lot of debugging here and noticed that Solr is behaving
> correctly: we are not cancelling shard requests for tolerant Solr requests.
> The fact that the failures stop when the non-tolerant search right before
> the tolerant search is commented out tells us that the cancellations from
> the non-tolerant request are bleeding into the tolerant request. This is
> bad. I also confirmed this by commenting out the line that actually cancels
> the HTTP requests:
> [https://github.com/apache/solr/blob/branch_9_9/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L570-L574]
> This only happens on branch_9x (and presumably branch_9_9), not on main. So I
> believe it's a bug in Jetty 10 that Jetty 12 has solved. We are probably
> fine just fixing this on branch_9x and branch_9_9, and leaving the
> request cancellation enabled on main (10.x).
> Amazingly, when beasting, there is a big difference in whether the
> non-existent endpoint is put first or last in the list of shards. The failure
> rate is much higher when the bad shard is listed first rather than last.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)