James Dyer created SOLR-17234:
---------------------------------

             Summary: LBHttp2SolrClient does not skip "zombie" endpoints
                 Key: SOLR-17234
                 URL: https://issues.apache.org/jira/browse/SOLR-17234
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrJ
    Affects Versions: main (10.0)
            Reporter: James Dyer


While working on SOLR-14763, I found different behavior with 
*LBHttp2SolrClient* between *branch_9x* and *main/10.x*.

If the first Endpoint in the list had previously failed, *branch_9x* skips 
the failed Endpoint on subsequent requests and starts with the second 
Endpoint. Only if all remaining Endpoints fail does it retry the first 
Endpoint.
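
As a rough illustration of the *branch_9x* ordering described above, here is a 
minimal sketch. It is not the actual LBHttp2SolrClient code: the class name, 
the method name, and the use of plain URL strings for Endpoints are all 
assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and types are not taken from SolrJ.
class ZombieSkipSketch {

  /**
   * Returns the order in which endpoints would be attempted:
   * endpoints not marked as zombies first (in list order),
   * then the zombies as a last resort.
   */
  static List<String> attemptOrder(List<String> endpoints, Set<String> zombies) {
    List<String> order = new ArrayList<>();
    for (String endpoint : endpoints) {
      if (!zombies.contains(endpoint)) {
        order.add(endpoint); // live endpoints are tried first
      }
    }
    for (String endpoint : endpoints) {
      if (zombies.contains(endpoint)) {
        order.add(endpoint); // zombies are retried only after all live endpoints
      }
    }
    return order;
  }

  public static void main(String[] args) {
    List<String> endpoints =
        List.of("http://host1/solr", "http://host2/solr", "http://host3/solr");
    Set<String> zombies = Set.of("http://host1/solr");
    // prints [http://host2/solr, http://host3/solr, http://host1/solr]
    System.out.println(attemptOrder(endpoints, zombies));
  }
}
```

With the first Endpoint in the zombie set, the second Endpoint is attempted 
first and the first Endpoint is reached only after the others.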

On *main/10.x*, by contrast, a first Endpoint that had previously failed is 
still tried first on every request, even though it is in the "Zombie List". 
Only after it fails again does the client move on to the second Endpoint.

The *branch_9x* behavior seems more desirable, as it minimizes unnecessary 
work by avoiding Endpoints that are known to fail. Indeed, *main/10.x* has an 
obvious bug in *EndpointIterator#fetchNext*, where it looks up the map holding 
the Zombies with the wrong type of key. I believe this difference is a 
regression in *main/10.x*.
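
To show how a lookup with the wrong key type can silently defeat a zombie 
check, here is a small sketch. It assumes a map keyed by URL string and a 
hypothetical Endpoint class; the real types and fields in 
*EndpointIterator#fetchNext* may differ. The lookup compiles because 
`Map#containsKey` accepts any `Object`, but it can never match.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the actual SolrJ classes or field names.
class ZombieKeyMismatchSketch {

  /** Stand-in for an endpoint type (assumption for illustration). */
  static final class Endpoint {
    final String baseUrl;

    Endpoint(String baseUrl) {
      this.baseUrl = baseUrl;
    }
  }

  /** Buggy check: passes the Endpoint object to a map keyed by String. */
  static boolean isZombieBuggy(Map<String, Long> zombies, Endpoint endpoint) {
    // Compiles because containsKey takes Object, but an Endpoint never equals a String key.
    return zombies.containsKey(endpoint);
  }

  /** Corrected check: looks up using the same key type the map was populated with. */
  static boolean isZombieFixed(Map<String, Long> zombies, Endpoint endpoint) {
    return zombies.containsKey(endpoint.baseUrl);
  }

  public static void main(String[] args) {
    Map<String, Long> zombies = new HashMap<>();
    zombies.put("http://host1/solr", System.nanoTime()); // time of failure
    Endpoint first = new Endpoint("http://host1/solr");
    System.out.println(isZombieBuggy(zombies, first)); // prints false
    System.out.println(isZombieFixed(zombies, first)); // prints true
  }
}
```

Under these assumptions the buggy check reports every Endpoint as healthy, 
which would match the observed behavior of always trying the first Endpoint.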

The different behavior is recorded in the test 
*LBHttp2SolrClientTest#testAsyncWithFailures*. This test was added after the 
fact with SOLR-14763. I needed to change its assertions when backporting to 
*branch_9x* to account for the different behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

