[ 
https://issues.apache.org/jira/browse/TINKERPOP-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422502#comment-17422502
 ] 

ASF GitHub Bot commented on TINKERPOP-2569:
-------------------------------------------

xiazcy commented on pull request #1476:
URL: https://github.com/apache/tinkerpop/pull/1476#issuecomment-930699150


   The new changes are as follows:
   1. Cluster no longer marks hosts as available in its `init()`, so all hosts 
start with the default `isAvailable` of `false`.
   2. For ClusteredClients, in `initializeImplementation`, a host is marked as 
available inside `initializeConnectionSetupForHost` if its connection pool 
initializes successfully. Once all hosts have been tried, if none is 
available, `NoHostAvailableException` is thrown. If at least one host is 
available, the unavailable hosts are passed to `handleUnavailableHosts` along 
with the executor, which retries connection initialization in the background 
through `host.makeUnavailable`, and the client initializes successfully with 
the available host(s). 
   3. For SessionedClients, a host is chosen at random unless a host previously 
marked available is present, in which case that host is chosen instead. 
Otherwise, the chosen host is tried and marked as available if its connection 
pool initializes successfully; if not, the error is logged and control 
returns to the Client `init()` function, which throws 
`NoHostAvailableException`. 
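
   To make points 1 and 2 concrete, here is a minimal, self-contained sketch 
of the control flow. It is not the driver code: `HostInitSketch`, `Host`, 
`NoHostAvailable`, `initialize` and the `tryInitPool` predicate are 
hypothetical stand-ins, and the background retry is reduced to returning the 
list of hosts that would be handed to it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative model only; none of these types are the real driver classes.
class HostInitSketch {

    // Hypothetical stand-in for the driver's Host.
    static final class Host {
        final String name;
        // Point 1: every host starts out unavailable.
        volatile boolean available = false;

        Host(String name) { this.name = name; }
    }

    // Hypothetical stand-in for NoHostAvailableException.
    static final class NoHostAvailable extends RuntimeException {
        NoHostAvailable() { super("no host could be initialized"); }
    }

    // Point 2: try every host, mark the ones whose pool could be set up,
    // fail only if none succeeded, and return the rest for background retry.
    static List<Host> initialize(List<Host> hosts, Predicate<Host> tryInitPool) {
        List<Host> unavailable = new ArrayList<>();
        for (Host host : hosts) {
            if (tryInitPool.test(host)) {
                host.available = true;
            } else {
                unavailable.add(host);
            }
        }
        if (unavailable.size() == hosts.size()) {
            throw new NoHostAvailable();
        }
        return unavailable;
    }

    public static void main(String[] args) {
        List<Host> hosts = List.of(new Host("up"), new Host("down"));
        // Assumption for the demo: only the host named "up" can be reached.
        List<Host> toRetry = initialize(hosts, h -> h.name.equals("up"));
        for (Host h : toRetry) {
            System.out.println("would retry in background: " + h.name);
        }
    }
}
```

   The sketch only throws when every host fails, which mirrors the behaviour 
described above: a single reachable host is enough for the client to 
initialize, and the remaining hosts are left to the reconnect logic.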
   
   Note that, because of the above changes, the Gremlin Console remote acceptor 
test now gets a `NoHostAvailableException` when there is no live server to 
connect to, so I have changed it to assert that remote throws an exception 
instead of succeeding. Please let me know if it should behave another way 
(e.g. catch the exception and return an error message instead).
   
   Please also let me know if there are any questions or concerns with the 
changes. Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


> Reconnect to server if Java driver fails to initialize
> ------------------------------------------------------
>
>                 Key: TINKERPOP-2569
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2569
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: driver
>    Affects Versions: 3.4.11
>            Reporter: Stephen Mallette
>            Priority: Minor
>
> As reported here on SO: 
> https://stackoverflow.com/questions/67586427/how-to-recover-with-a-retry-from-gremlin-nohostavailableexception
> If the host is unavailable at {{Client}} initialization then the host is not 
> put in a state where reconnect is possible. Essentially, this test for 
> {{GremlinServerIntegrateTest}} should pass:
> {code}
> @Test
>     public void shouldFailOnInitiallyDeadHost() throws Exception {
>         // start test with no server
>         this.stopServer();
>         final Cluster cluster = TestClientFactory.build().create();
>         final Client client = cluster.connect();
>         try {
>             // try to re-issue a request now that the server is down
>             client.submit("g").all().get(3000, TimeUnit.MILLISECONDS);
>             fail("Should throw an exception.");
>         } catch (RuntimeException re) {
>             // Client would have no active connections to the host, hence it would encounter a timeout
>             // trying to find an alive connection to the host.
>             assertThat(re.getCause(), instanceOf(NoHostAvailableException.class));
>             //
>             // should recover when the server comes back
>             //
>             // restart server
>             this.startServer();
>             // try a bunch of times to reconnect. on slower systems this may simply take longer...looking at you travis
>             for (int ix = 1; ix < 11; ix++) {
>                 // the retry interval is 1 second, wait a bit longer
>                 TimeUnit.SECONDS.sleep(5);
>                 try {
>                     final List<Result> results = client.submit("1+1").all().get(3000, TimeUnit.MILLISECONDS);
>                     assertEquals(1, results.size());
>                     assertEquals(2, results.get(0).getInt());
>                 } catch (Exception ex) {
>                     if (ix == 10)
>                         fail("Should have eventually succeeded");
>                 }
>             }
>         } finally {
>             cluster.close();
>         }
>     }
> {code}
> Note that there is a similar test, {{shouldFailOnDeadHost()}}, that first 
> connects to a host, then kills it and restarts it, which demonstrates that 
> reconnection works in that situation.
> I thought it might be an easy fix to simply call 
> {{considerHostUnavailable()}} in the {{ConnectionPool}} constructor in the 
> event of a {{CompletionException}} which should kickstart the reconnect 
> process. The reconnects started firing but they all failed for some reason. I 
> didn't have time to investigate further than that. 
> Currently the only workaround is to recreate the {{Client}} if this sort of 
> situation occurs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
