[
https://issues.apache.org/jira/browse/TINKERPOP-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418343#comment-17418343
]
ASF GitHub Bot commented on TINKERPOP-2569:
-------------------------------------------
xiazcy commented on pull request #1476:
URL: https://github.com/apache/tinkerpop/pull/1476#issuecomment-924443050
Hi Stephen, thank you for running the tests! I will use the command for my
local tests from now on.
For the failed test, I am not sure if there is a way to fix this
reconnection issue without moving away from “lazy connection”.
From my understanding, the issue comes from marking a host as “available”
and the client as “initialized” before confirming that the host is actually
reachable. After that point the host is never marked unavailable and the
client is never marked uninitialized, so the client thinks it has a valid
connection and does not initiate reconnection even when the host later
becomes available.
To avoid this, I changed the driver so that hosts are not marked as
available at cluster initialization, but only after their connection pools
initialize successfully. If the first connection attempt hits a dead host,
this results in a `NoHostAvailableException` and the client’s `initialized`
field stays false. On resubmit, the client calls `init()` again inside
`submitAsync()`, which reconnects successfully once the server/host has
restarted.
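To make the intended flow concrete, here is a minimal, self-contained sketch
of the idea. It is a toy model, not the actual driver code: `FakeServer`,
`SketchClient`, `NoHostException`, and the `ok:` reply are hypothetical
stand-ins for the server, `Client`, and `NoHostAvailableException`. The flags
are set only after initialization succeeds, so a failed first attempt leaves
the client uninitialized and the next submit retries `init()`.

```java
class LazyInitSketch {

    /** Hypothetical stand-in for a server that may be up or down. */
    static final class FakeServer {
        volatile boolean up = false;
    }

    /** Hypothetical stand-in for NoHostAvailableException. */
    static final class NoHostException extends RuntimeException {
        NoHostException() {
            super("no host available");
        }
    }

    /** Toy client; NOT the TinkerPop Client class. */
    static final class SketchClient {
        private final FakeServer server;
        private boolean hostAvailable = false;
        private boolean initialized = false;

        SketchClient(FakeServer server) {
            this.server = server;
        }

        /** Sets the flags only after the "connection pool" comes up. */
        private void init() {
            if (!server.up) {
                // Both flags stay false, so a later submit retries init().
                throw new NoHostException();
            }
            hostAvailable = true;
            initialized = true;
        }

        /** Re-runs init() on every submit until it has succeeded once. */
        String submit(String request) {
            if (!initialized) {
                init();
            }
            if (!server.up) {
                // Host died after a successful init; mark it unavailable.
                hostAvailable = false;
                throw new NoHostException();
            }
            return "ok:" + request;
        }

        boolean isInitialized() {
            return initialized;
        }

        boolean isHostAvailable() {
            return hostAvailable;
        }
    }

    public static void main(String[] args) {
        FakeServer server = new FakeServer();
        SketchClient client = new SketchClient(server);

        try {
            client.submit("1+1");
        } catch (NoHostException e) {
            System.out.println("first attempt failed, initialized="
                    + client.isInitialized());
        }

        server.up = true; // the server comes back
        System.out.println(client.submit("1+1"));
    }
}
```

The trade-off is the one raised in this comment: in this model a client
cannot count as initialized against a host that has never been reachable.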
This technically breaks the “lazy connection” idea, since it is no longer
possible to connect to a non-existent server/host, which is what caused that
test to fail. Confirming host availability at initialization was the most
intuitive change, but is “lazy connection” a concept we should keep? Thank you!
> Reconnect to server if Java driver fails to initialize
> ------------------------------------------------------
>
> Key: TINKERPOP-2569
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2569
> Project: TinkerPop
> Issue Type: Bug
> Components: driver
> Affects Versions: 3.4.11
> Reporter: Stephen Mallette
> Priority: Minor
>
> As reported here on SO:
> https://stackoverflow.com/questions/67586427/how-to-recover-with-a-retry-from-gremlin-nohostavailableexception
> If the host is unavailable at {{Client}} initialization then the host is not
> put in a state where reconnect is possible. Essentially, this test for
> {{GremlinServerIntegrateTest}} should pass:
> {code}
> @Test
> public void shouldFailOnInitiallyDeadHost() throws Exception {
>     // start test with no server
>     this.stopServer();
>     final Cluster cluster = TestClientFactory.build().create();
>     final Client client = cluster.connect();
>     try {
>         // try to re-issue a request now that the server is down
>         client.submit("g").all().get(3000, TimeUnit.MILLISECONDS);
>         fail("Should throw an exception.");
>     } catch (RuntimeException re) {
>         // Client would have no active connections to the host, hence it
>         // would encounter a timeout trying to find an alive connection
>         // to the host.
>         assertThat(re.getCause(), instanceOf(NoHostAvailableException.class));
>         //
>         // should recover when the server comes back
>         //
>         // restart server
>         this.startServer();
>         // try a bunch of times to reconnect. on slower systems this may
>         // simply take longer...looking at you travis
>         for (int ix = 1; ix < 11; ix++) {
>             // the retry interval is 1 second, wait a bit longer
>             TimeUnit.SECONDS.sleep(5);
>             try {
>                 final List<Result> results =
>                     client.submit("1+1").all().get(3000, TimeUnit.MILLISECONDS);
>                 assertEquals(1, results.size());
>                 assertEquals(2, results.get(0).getInt());
>             } catch (Exception ex) {
>                 if (ix == 10)
>                     fail("Should have eventually succeeded");
>             }
>         }
>     } finally {
>         cluster.close();
>     }
> }
> {code}
> Note that there is a similar test, {{shouldFailOnDeadHost()}}, that first
> allows a connection to a host, then kills the host and restarts it, which
> demonstrates that reconnection works in that situation.
> I thought it might be an easy fix to simply call
> {{considerHostUnavailable()}} in the {{ConnectionPool}} constructor in the
> event of a {{CompletionException}}, which should kickstart the reconnect
> process. The reconnects started firing but they all failed for some reason. I
> didn't have time to investigate further than that.
> Currently the only workaround is to recreate the {{Client}} if this sort of
> situation occurs.