[
https://issues.apache.org/jira/browse/TINKERPOP-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569894#comment-17569894
]
ASF GitHub Bot commented on TINKERPOP-2769:
---
jroimartin commented on PR #1761:
URL: https://github.com/apache/tinkerpop/pull/1761#issuecomment-1192356654
@kenhuuu I tested the PR and I can confirm that both the gremlin-go and the
gremlin-python reproducers are not able to trigger the issue anymore. Thanks!
> gremlin-server does not reply with a timeout response to all timed out
> requests
> ---
>
> Key: TINKERPOP-2769
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2769
> Project: TinkerPop
> Issue Type: Bug
>Affects Versions: 3.6.0, 3.5.3, 3.6.1, 3.5.4
>Reporter: Roi Martin
>Priority: Major
>
> The following reproducer hangs because not all timed out requests are
> reported by gremlin-server.
> *main.go*
> {code:go}
> package main
> import (
> "log"
> "sync"
> "sync/atomic"
> gremlingo "github.com/apache/tinkerpop/gremlin-go/v3/driver"
> )
> const (
> gremlinServerEndpoint = "ws://localhost:8182/gremlin"
> nWorkers = 250
> )
> func main() {
> conn, err := gremlingo.NewDriverRemoteConnection(gremlinServerEndpoint,
> func(settings *gremlingo.DriverRemoteConnectionSettings) {
> settings.LogVerbosity = gremlingo.Off
> })
> if err != nil {
> log.Fatalf("could not create remote connection: %v", err)
> }
> defer conn.Close()
> g := gremlingo.Traversal_().WithRemote(conn)
> var (
> wg sync.WaitGroup
> nPendingRequests int32
> )
> for i := 0; i < nWorkers; i++ {
> wg.Add(1)
> go func(id int) {
> defer wg.Done()
> atomic.AddInt32(&nPendingRequests, 1)
> result, err := g.
> Inject(1).
>
> SideEffect(&gremlingo.Lambda{"Thread.sleep(5_000)", "gremlin-groovy"}).
> Next()
> npr := atomic.AddInt32(&nPendingRequests, -1)
> log.Printf("[%v] result=%v err=%v nPendingRequests=%v",
> id, result, err, npr)
> }(i)
> }
> log.Print("waiting")
> wg.Wait()
> log.Print("done")
> }
> {code}
> Steps to reproduce the issue:
> 1. Run gremlin-server:
> {noformat}
> docker run --rm -ti -p 8182:8182 tinkerpop/gremlin-server:3.5.4-SNAPSHOT 2>&1
> | tee log
> {noformat}
> 2. Execute the reproducer:
> {noformat}
> go run main.go
> {noformat}
> 3. Wait ~30s for the reproducer to hang.
> All these tests has been performed against the default gremlin-server
> configuration. It is important to note that in this machine the gremlin pool
> is set to 8. Higher values may require to adjust the number of workers in the
> reproducer to trigger the issue.
> After following these steps, the output of the reproducer should look like
> this:
> {noformat}
> 2022/06/29 16:08:47 waiting
> 2022/06/29 16:08:52 [22] result=result{object=1 class=int64} err=
> nPendingRequests=249
> ...
> 2022/06/29 16:09:17 [154] result=result{object=1 class=int64} err=
> nPendingRequests=208
> 2022/06/29 16:09:17 [243] result=result{object=1 class=int64} err=
> nPendingRequests=207
> 2022/06/29 16:09:17 [226] result= err=E0903: there are no results left
> nPendingRequests=206
> 2022/06/29 16:09:17 [164] result= err=E0903: there are no results left
> nPendingRequests=204
> ...
> 2022/06/29 16:09:17 [131] result= err=E0903: there are no results left
> nPendingRequests=113
> {noformat}
> On the other side, the gremlin-server logs should look like this:
> {noformat}
> [INFO] ? - Channel started at port 8182.
> [WARN] ? - A timeout occurred during traversal evaluation of
> [RequestMessage{, requestId=9d7ccc61-0e7e-4105-ae35-eb81a3895170,
> op='bytecode', processor='traversal', args={gremlin=[[], [inject(1),
> sideEffect(lambda[Thread.sleep(5_000)])]], aliases={g=g}}}] - consider
> increasing the limit given to evaluationTimeout
> [WARN] ? - A timeout occurred during traversal evaluation of
> [RequestMessage{, requestId=6f2d939a-31c9-48fd-bb54-b2bfa1fbd4de,
> op='bytecode', processor='traversal', args={gremlin=[[], [inject(1),
> sideEffect(lambda[Thread.sleep(5_000)])]], aliases={g=g}}}] - consider
> increasing the limit given to evaluationTimeout
> ...
> {noformat}
> In this specific case,
> - 43 requests finished successfully (250-207 = 43)
> - 94 requests timed out (207-113 = 94)
> - 113 requests never finished (making the reproducer hang)
> If we check the gremlin-server logs, the number of reported timeouts matches
> the output of t