[ 
https://issues.apache.org/jira/browse/TINKERPOP-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681521#comment-16681521
 ] 

Greg Pepin commented on TINKERPOP-2090:
---------------------------------------

Here is a response I received from Microsoft regarding aborted/closed 
connections on Cosmos DB:

"Please find below a few scenarios in which connections are expected to be 
aborted/closed by the Cosmos DB Gremlin server.

If connections are being established, but then being closed, then the other 
possibilities are:
 # Cosmos DB Region is being upgraded:
  * This would see connections being forcibly closed as nodes are taken down.
  * Retrying requests/connections should be successful.
  * Typically the upgrade takes 2-3 hrs.

 # Idle connections disconnected:
  * This should only occur after 10 hrs.
  * There is a known issue where active connections will also get terminated 
after 10 hrs.

 # Firewall/Authentication failures:
  * If a firewall rule is changed, and an existing connection violates the new 
rule, then the connection will be closed.
  * If a connection was established with authentication keys which are 
invalidated by a key rotation, then the connection will be closed.

 # Protocol errors:
  * If a message received is malformed or does not adhere to the expected 
Gremlin handshake protocol, then the connection may be closed."

 

It sounds like the keep-alive functionality isn't enough to handle these 
scenarios. Can the client be enhanced to check whether a connection is active 
before using it? Or maybe close connections on websocket errors?

In the short term we can dispose of the client and spin up a new one when a 
websocket error is detected. However, my biggest concern is that it can take 
over 10 minutes in some scenarios for that exception to be thrown. How can we 
make the problematic connections fail fast?
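
For illustration, the short-term workaround (dispose of the client on a 
connection error, and bound each call with a timeout so a dead connection 
fails fast) could be sketched roughly as below. This is a language-neutral 
sketch written in Python, not Gremlin.Net code: `RecyclingClient`, 
`FakeClient`, the 0.1 second timeout and the two-attempt limit are all 
hypothetical names and values chosen for the example, and a real driver 
client would take the place of `FakeClient`.

```python
import asyncio
from typing import Awaitable, Callable, Generic, Optional, TypeVar

C = TypeVar("C")  # client type (hypothetical; any object with a close step)
R = TypeVar("R")  # result type of a single operation


class RecyclingClient(Generic[C]):
    """Wraps a long-lived client and replaces it when a call fails or stalls."""

    def __init__(self, factory: Callable[[], C], close: Callable[[C], None],
                 timeout_s: float, attempts: int = 2) -> None:
        if attempts < 1:
            raise ValueError("attempts must be at least 1")
        self._factory = factory      # builds a fresh client
        self._close = close          # disposes of a broken client
        self._timeout_s = timeout_s  # upper bound for one call (fail fast)
        self._attempts = attempts
        self._client: Optional[C] = None

    async def call(self, op: Callable[[C], Awaitable[R]]) -> R:
        last_error: Optional[BaseException] = None
        for _ in range(self._attempts):
            if self._client is None:
                self._client = self._factory()
            client = self._client
            try:
                # wait_for cancels the operation once the timeout elapses,
                # so a silently dead connection cannot block for minutes.
                return await asyncio.wait_for(op(client), self._timeout_s)
            except (asyncio.TimeoutError, OSError) as exc:
                last_error = exc
                # Discard the client only if nobody has replaced it already.
                if self._client is client:
                    self._client = None
                    self._close(client)
        assert last_error is not None
        raise last_error


class FakeClient:
    """Stand-in for a real driver client; the first instance never answers."""
    created = 0

    def __init__(self) -> None:
        FakeClient.created += 1
        self.hangs = FakeClient.created == 1
        self.closed = False

    async def submit(self, query: str) -> str:
        if self.hangs:
            await asyncio.sleep(3600)  # simulates a dead connection
        return f"ok: {query}"

    def close(self) -> None:
        self.closed = True


async def main() -> str:
    client = RecyclingClient(FakeClient, lambda c: c.close(), timeout_s=0.1)
    return await client.call(lambda c: c.submit("g.V().count()"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The first call times out after 0.1 seconds instead of hanging, the stalled 
client is closed, and the retry runs against a fresh client. The timeout has 
to be longer than the slowest legitimate query, otherwise healthy connections 
get recycled as well.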

> After running backend for a day or so System.IO.IOException keep throwing
> -------------------------------------------------------------------------
>
>                 Key: TINKERPOP-2090
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2090
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: dotnet
>    Affects Versions: 3.4.0
>         Environment: .NET Core 2.1.5
> Microsoft Azure
>            Reporter: Saber Karmous
>            Priority: Critical
>
> .NET Core 2.1.5
> Gremlin.NET 3.4.0-rc2 
> We're using the latest RC of the Gremlin client, and we have a Gremlin client 
> that's being injected as a singleton through our IoC container. After running 
> the backend for a day or two it keeps throwing System.IO.IOExceptions. If we 
> restart the application it works again.
> We use Polly for our retry strategy, retrying 9 times, but it keeps 
> failing.
> I added the stack trace below. Reproducing is a bit of a pain in the behind, 
> you have to wait for a day or two for the exception to occur.
> {noformat}
> *no* System.IO.IOException:
>  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
>  (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable+ConfiguredValueTaskAwaiter.GetResult
>  (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> System.Net.Security.SslStreamInternal+<<WriteSingleChunk>g__CompleteAsync|36_1>d`1.MoveNext
>  (System.Net.Security, Version=4.1.1.0, Culture=neutral, 
> PublicKeyToken=b03f5f7f11d50a3a)
>  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
>  (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable+ConfiguredValueTaskAwaiter.GetResult
>  (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> System.Net.Security.SslStreamInternal+<<WriteAsyncInternal>g__ExitWriteAsync|35_0>d`1.MoveNext
>  (System.Net.Security, Version=4.1.1.0, Culture=neutral, 
> PublicKeyToken=b03f5f7f11d50a3a)
>  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
>  (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at Gremlin.Net.Driver.WebSocketConnection+<SendMessageAsync>d__7.MoveNext 
> (Gremlin.Net, Version=3.4.0.0, Culture=neutral, 
> PublicKeyToken=d2035e9aa387a711)
>  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
>  (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at Gremlin.Net.Driver.Connection+<SendAsync>d__13.MoveNext (Gremlin.Net, 
> Version=3.4.0.0, Culture=neutral, PublicKeyToken=d2035e9aa387a711)
>  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
>  (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at Gremlin.Net.Driver.Connection+<SubmitAsync>d__8`1.MoveNext (Gremlin.Net, 
> Version=3.4.0.0, Culture=neutral, PublicKeyToken=d2035e9aa387a711)
>  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
>  (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at Gremlin.Net.Driver.ProxyConnection+<SubmitAsync>d__3`1.MoveNext 
> (Gremlin.Net, Version=3.4.0.0, Culture=neutral, 
> PublicKeyToken=d2035e9aa387a711)
>  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
>  (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at Gremlin.Net.Driver.GremlinClient+<SubmitAsync>d__6`1.MoveNext 
> (Gremlin.Net, Version=3.4.0.0, Culture=neutral, 
> PublicKeyToken=d2035e9aa387a711)
>  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
>  (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at Gremlin.Net.Driver.GremlinClientExtensions+<SubmitAsync>d__4`1.MoveNext 
> (Gremlin.Net, Version=3.4.0.0, Culture=neutral, 
> PublicKeyToken=d2035e9aa387a711)
>  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
>  (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> boskalis.world.data.repository.Services.AzureCosmosDBGremlinProvider+<>c__DisplayClass9_0`1+<<Query>b__5>d.MoveNext
>  (boskalis.world.data.logic, Version=1.0.0.0, Culture=neutral, 
> PublicKeyToken=nullboskalis.world.data.logic, Version=1.0.0.0, 
> Culture=neutral, PublicKeyToken=null: 
> D:\a\1\s\src\boskalis.world.data.logic\Services\AzureCosmosDBGremlinProvider.csboskalis.world.data.logic,
>  Version=1.0.0.0, Culture=neutral, PublicKeyToken=null: 222)
>  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
>  (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> Polly.Policy+<>c__DisplayClass181_0`1+<<ExecuteAsyncInternal>b__0>d.MoveNext 
> (Polly, Version=6.0.0.0, Culture=neutral, PublicKeyToken=c8a3ffc3f8f825cc)
>  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification
>  (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at 
> Polly.RetrySyntaxAsync+<>c__DisplayClass25_1+<<WaitAndRetryAsync>b__1>d.MoveNext
>  (Polly, Version=6.0.0.0, Culture=neutral, PublicKeyToken=c8a3ffc3f8f825cc)
>  at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess 
> (System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, 
> PublicKeyToken=7cec85d7bea7798e)
>  at Polly.Retry.RetryEngine+<ImplementationAsync>d__1`1.MoveNext (Polly, 
> Version=6.0.0.0, Culture=neutral, PublicKeyToken=c8a3ffc3f8f825cc)
> Inner exception System.Net.Sockets.SocketException handled at 
> System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw:
> {noformat}
> I'm going to implement a workaround by recycling the client every couple of 
> hours.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
