[ https://issues.apache.org/jira/browse/TINKERPOP-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105380#comment-17105380 ]
Stephen Mallette commented on TINKERPOP-2369:
---------------------------------------------

Thanks for creating this issue. Of the two solutions you described, I seem to recall looking into this one before:

> by adding a listener for the close frame being sent to the underlying
> channel to replace the connection.

and not quite getting it to work for some reason. Perhaps there's a JIRA issue somewhere that would remind me of what happened.

> Connections in ConnectionPool are not replaced in background when underlying channel is closed
> ----------------------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-2369
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2369
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: driver
>    Affects Versions: 3.4.1
>            Reporter: Johannes Carlsen
>            Priority: Major
>
> Hi TinkerPop team!
>
> We are using the Gremlin Java Driver to connect to an Amazon Neptune cluster.
> We are using the IAM authentication feature provided by Neptune, which means
> that individual websocket connections are closed by the server every 36
> hours, when their credentials expire. The current implementation of the
> driver does not handle this situation well, as the Connection whose channel
> has been closed by the server remains in the ConnectionPool. The connection
> is only reported as dead and replaced when it is later chosen by the
> LoadBalancingStrategy to serve a client request, which inevitably fails when
> the connection attempts to write to the closed channel.
> A fix for this bug would cause the connection pool to be automatically
> refreshed in the background by either the keep-alive mechanism, which should
> replace a connection if a keep-alive request fails, or by adding a listener
> for the close frame being sent to the underlying channel to replace the
> connection.
> Without a fix, the only way to recover from a stale connection is
> to retry the request at the cluster level, which will allow the request to be
> directed to a different connection.
> I noticed a PR out for the .NET client to fix this behavior:
> [https://github.com/apache/tinkerpop/pull/1279]. We are hoping for something
> similar in the Gremlin Java Driver.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
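
For illustration, here is a minimal sketch of the close-listener idea from the issue description, written with plain JDK types rather than the driver's own classes. `PoolSketch`, `FakeConnection`, `FakePool`, and `serverClose` are hypothetical names invented for this example, and the `CompletableFuture` stands in for the close notification that Netty exposes through `Channel.closeFuture()`. It is not the driver's implementation, only a picture of the intended behavior: a connection is replaced as soon as its channel closes, not when a later request fails on it.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSketch {

    /** Hypothetical stand-in for a pooled connection and its underlying channel. */
    public static final class FakeConnection {
        public final int id;
        // Completes when the channel closes (assumed analogue of a channel close future).
        public final CompletableFuture<Void> closeFuture = new CompletableFuture<>();

        public FakeConnection(int id) { this.id = id; }

        public boolean isOpen() { return !closeFuture.isDone(); }

        /** Simulates the server closing the channel, e.g. on credential expiry. */
        public void serverClose() { closeFuture.complete(null); }
    }

    /** Hypothetical pool that replaces a connection when its channel closes. */
    public static final class FakePool {
        public final List<FakeConnection> connections = new CopyOnWriteArrayList<>();
        private final AtomicInteger nextId = new AtomicInteger();

        public FakePool(int size) {
            for (int i = 0; i < size; i++) {
                addConnection();
            }
        }

        private void addConnection() {
            FakeConnection c = new FakeConnection(nextId.getAndIncrement());
            // The listener: runs when the close future completes, without
            // waiting for a client request to hit the dead connection.
            c.closeFuture.thenRun(() -> replace(c));
            connections.add(c);
        }

        private void replace(FakeConnection dead) {
            // Only open a replacement if the dead connection was still pooled.
            if (connections.remove(dead)) {
                addConnection();
            }
        }
    }

    public static void main(String[] args) {
        FakePool pool = new FakePool(2);
        FakeConnection first = pool.connections.get(0);

        first.serverClose(); // server closes the channel

        System.out.println("pool size: " + pool.connections.size());
        System.out.println("stale connection still pooled: " + pool.connections.contains(first));
    }
}
```

In this sketch the pool stays at its configured size and the closed connection is gone before any request is routed to it. A real fix would also have to cope with replacement attempts that fail and with closes that happen while the pool itself is shutting down, which the sketch ignores.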