James Dyer created SOLR-17234:
---------------------------------

             Summary: LBHttp2SolrClient does not skip "zombie" endpoints
                 Key: SOLR-17234
                 URL: https://issues.apache.org/jira/browse/SOLR-17234
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrJ
    Affects Versions: main (10.0)
            Reporter: James Dyer

While working on SOLR-14763, I found that *LBHttp2SolrClient* behaves differently on *branch_9x* and *main/10.x*.

If the first Endpoint in the list had previously failed, *branch_9x* will skip the failed Endpoint on subsequent requests and begin requesting with the second Endpoint. If all remaining Endpoints fail, it will then retry the first Endpoint.

If the first Endpoint in the list had previously failed, *main/10.x* will always try the first Endpoint, despite it being in the "Zombie List". When the first Endpoint fails again, it will then try the second Endpoint.

The *branch_9x* behavior seems more desirable, as it minimizes unnecessary work by avoiding Endpoints that are known to fail. Indeed, *main/10.x* has an obvious bug in *EndpointIterator#fetchNext*, where it attempts to look up the map holding the Zombies with the wrong type of key. I believe this difference is a regression bug in *main/10.x*.

The different behavior is recorded in test *LBHttp2SolrClientTest#testAsyncWithFailures*. This test was added after the fact with SOLR-14763. I needed to change its asserts when backporting to *branch_9x* to account for the changed behavior.
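
To make the two behaviors concrete, here is a minimal, self-contained sketch. It is not the SolrJ code: the class *ZombieSkipSketch*, the nested *Endpoint* type, and both methods are hypothetical, and it assumes for illustration that the zombie map is keyed by the base URL string while the faulty lookup passes the Endpoint object itself. The actual key types in *EndpointIterator#fetchNext* may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from SolrJ.
class ZombieSkipSketch {

    /** Stand-in for an endpoint, identified here by its base URL. */
    static final class Endpoint {
        final String baseUrl;

        Endpoint(String baseUrl) {
            this.baseUrl = baseUrl;
        }
    }

    /**
     * Order described for branch_9x: endpoints not in the zombie map come
     * first, and known zombies are tried only after every live endpoint.
     * Assumes the zombie map is keyed by base URL (an assumption of this sketch).
     */
    static List<Endpoint> tryOrder(List<Endpoint> endpoints, Map<String, Endpoint> zombies) {
        List<Endpoint> live = new ArrayList<>();
        List<Endpoint> skipped = new ArrayList<>();
        for (Endpoint e : endpoints) {
            // Look up with the same key type the map uses.
            if (zombies.containsKey(e.baseUrl)) {
                skipped.add(e);
            } else {
                live.add(e);
            }
        }
        live.addAll(skipped); // last resort: retry the zombies
        return live;
    }

    /**
     * Same loop, but the lookup passes the Endpoint object instead of its URL.
     * Map#containsKey accepts any Object, so this compiles, yet it can never
     * match a String key. No endpoint is ever skipped.
     */
    static List<Endpoint> tryOrderWrongKey(List<Endpoint> endpoints, Map<String, Endpoint> zombies) {
        List<Endpoint> live = new ArrayList<>();
        List<Endpoint> skipped = new ArrayList<>();
        for (Endpoint e : endpoints) {
            if (zombies.containsKey(e)) { // wrong key type: always false here
                skipped.add(e);
            } else {
                live.add(e);
            }
        }
        live.addAll(skipped);
        return live;
    }
}
```

With two endpoints where the first is in the zombie map, *tryOrder* would start with the second endpoint and fall back to the first, while *tryOrderWrongKey* would keep the original order and hit the known-bad endpoint first, which is the symptom described above.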