[ 
https://issues.apache.org/jira/browse/TINKERPOP-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569894#comment-17569894
 ] 

ASF GitHub Bot commented on TINKERPOP-2769:
-------------------------------------------

jroimartin commented on PR #1761:
URL: https://github.com/apache/tinkerpop/pull/1761#issuecomment-1192356654

   @kenhuuu I tested the PR and I can confirm that neither the gremlin-go nor the 
gremlin-python reproducer is able to trigger the issue anymore. Thanks!




> gremlin-server does not reply with a timeout response to all timed out 
> requests
> -------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-2769
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2769
>             Project: TinkerPop
>          Issue Type: Bug
>    Affects Versions: 3.6.0, 3.5.3, 3.6.1, 3.5.4
>            Reporter: Roi Martin
>            Priority: Major
>
> The following reproducer hangs because not all timed out requests are 
> reported by gremlin-server.
> *main.go*
> {code:go}
> package main
> import (
>       "log"
>       "sync"
>       "sync/atomic"
>       gremlingo "github.com/apache/tinkerpop/gremlin-go/v3/driver"
> )
> const (
>       gremlinServerEndpoint = "ws://localhost:8182/gremlin"
>       nWorkers              = 250
> )
> func main() {
>       conn, err := gremlingo.NewDriverRemoteConnection(gremlinServerEndpoint, func(settings *gremlingo.DriverRemoteConnectionSettings) {
>               settings.LogVerbosity = gremlingo.Off
>       })
>       if err != nil {
>               log.Fatalf("could not create remote connection: %v", err)
>       }
>       defer conn.Close()
>       g := gremlingo.Traversal_().WithRemote(conn)
>       var (
>               wg               sync.WaitGroup
>               nPendingRequests int32
>       )
>       for i := 0; i < nWorkers; i++ {
>               wg.Add(1)
>               go func(id int) {
>                       defer wg.Done()
>                       atomic.AddInt32(&nPendingRequests, 1)
>                       result, err := g.
>                               Inject(1).
>                               SideEffect(&gremlingo.Lambda{"Thread.sleep(5_000)", "gremlin-groovy"}).
>                               Next()
>                       npr := atomic.AddInt32(&nPendingRequests, -1)
>                       log.Printf("[%v] result=%v err=%v nPendingRequests=%v", id, result, err, npr)
>               }(i)
>       }
>       log.Print("waiting")
>       wg.Wait()
>       log.Print("done")
> }
> {code}
> Steps to reproduce the issue:
> 1. Run gremlin-server:
> {noformat}
> docker run --rm -ti -p 8182:8182 tinkerpop/gremlin-server:3.5.4-SNAPSHOT 2>&1 
> | tee log
> {noformat}
> 2. Execute the reproducer:
> {noformat}
> go run main.go
> {noformat}
> 3. Wait ~30s for the reproducer to hang.
> All these tests have been performed against the default gremlin-server 
> configuration. It is important to note that on this machine the gremlin pool 
> is set to 8. Higher values may require adjusting the number of workers in the 
> reproducer to trigger the issue.
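> For reference, the Gremlin pool size is controlled by the gremlinPool setting in 
> the gremlin-server YAML configuration (when it is absent or 0 it defaults to the 
> number of available processors), and the default evaluationTimeout is 30000 ms, 
> which matches the ~30s wait above. An illustrative excerpt, not the exact shipped 
> configuration:
> {noformat}
> # conf/gremlin-server.yaml (excerpt, values shown for illustration)
> evaluationTimeout: 30000
> gremlinPool: 8
> {noformat}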
> After following these steps, the output of the reproducer should look like 
> this:
> {noformat}
> 2022/06/29 16:08:47 waiting
> 2022/06/29 16:08:52 [22] result=result{object=1 class=int64} err=<nil> nPendingRequests=249
> ...
> 2022/06/29 16:09:17 [154] result=result{object=1 class=int64} err=<nil> nPendingRequests=208
> 2022/06/29 16:09:17 [243] result=result{object=1 class=int64} err=<nil> nPendingRequests=207
> 2022/06/29 16:09:17 [226] result=<nil> err=E0903: there are no results left nPendingRequests=206
> 2022/06/29 16:09:17 [164] result=<nil> err=E0903: there are no results left nPendingRequests=204
> ...
> 2022/06/29 16:09:17 [131] result=<nil> err=E0903: there are no results left nPendingRequests=113
> {noformat}
> On the server side, the gremlin-server logs should look like this:
> {noformat}
> [INFO] ? - Channel started at port 8182.
> [WARN] ? - A timeout occurred during traversal evaluation of [RequestMessage{, requestId=9d7ccc61-0e7e-4105-ae35-eb81a3895170, op='bytecode', processor='traversal', args={gremlin=[[], [inject(1), sideEffect(lambda[Thread.sleep(5_000)])]], aliases={g=g}}}] - consider increasing the limit given to evaluationTimeout
> [WARN] ? - A timeout occurred during traversal evaluation of [RequestMessage{, requestId=6f2d939a-31c9-48fd-bb54-b2bfa1fbd4de, op='bytecode', processor='traversal', args={gremlin=[[], [inject(1), sideEffect(lambda[Thread.sleep(5_000)])]], aliases={g=g}}}] - consider increasing the limit given to evaluationTimeout
> ...
> {noformat}
> In this specific case,
>  - 43 requests finished successfully (250-207 = 43)
>  - 94 requests timed out (207-113 = 94)
>  - 113 requests never finished (making the reproducer hang)
> If we check the gremlin-server logs, the number of reported timeouts matches 
> the output of the reproducer, which makes me think that gremlin-server is the 
> culprit.
> {noformat}
> $ grep 'A timeout occurred' log | wc -l
> 94
> {noformat}
> Inspecting the hanging goroutines in the reproducer shows that they are blocked 
> in IO wait:
> {noformat}
> (dlv) grs
> ...
>   Goroutine 373 - User: /home/n/goroot/src/net/fd_posix.go:55 net.(*netFD).Read (0x684273) [IO wait]
> ...
> (dlv) gr 373
> Switched from 0 to 373 (thread 709858)
> (dlv) bt
>  0  0x000000000043d212 in runtime.gopark
>     at /home/n/goroot/src/runtime/proc.go:362
>  1  0x0000000000435a4a in runtime.netpollblock
>     at /home/n/goroot/src/runtime/netpoll.go:522
>  2  0x0000000000465e05 in internal/poll.runtime_pollWait
>     at /home/n/goroot/src/runtime/netpoll.go:302
>  3  0x00000000004ff2c8 in internal/poll.(*pollDesc).wait
>     at /home/n/goroot/src/internal/poll/fd_poll_runtime.go:83
>  4  0x00000000004ff377 in internal/poll.(*pollDesc).waitRead
>     at /home/n/goroot/src/internal/poll/fd_poll_runtime.go:88
>  5  0x0000000000500f8f in internal/poll.(*FD).Read
>     at /home/n/goroot/src/internal/poll/fd_unix.go:167
>  6  0x0000000000684273 in net.(*netFD).Read
>     at /home/n/goroot/src/net/fd_posix.go:55
>  7  0x000000000069ce49 in net.(*conn).Read
>     at /home/n/goroot/src/net/net.go:183
>  8  0x00000000005d9eb4 in bufio.(*Reader).fill
>     at /home/n/goroot/src/bufio/bufio.go:106
>  9  0x00000000005da278 in bufio.(*Reader).Peek
>     at /home/n/goroot/src/bufio/bufio.go:144
> 10  0x00000000008298a9 in github.com/gorilla/websocket.(*Conn).read
>     at /home/n/go/pkg/mod/github.com/gorilla/websocket@v1.5.0/conn.go:371
> 11  0x000000000082d5fd in github.com/gorilla/websocket.(*Conn).advanceFrame
>     at /home/n/go/pkg/mod/github.com/gorilla/websocket@v1.5.0/conn.go:809
> 12  0x000000000082f1d1 in github.com/gorilla/websocket.(*Conn).NextReader
>     at /home/n/go/pkg/mod/github.com/gorilla/websocket@v1.5.0/conn.go:1009
> 13  0x000000000082fccb in github.com/gorilla/websocket.(*Conn).ReadMessage
>     at /home/n/go/pkg/mod/github.com/gorilla/websocket@v1.5.0/conn.go:1093
> 14  0x00000000008418fb in github.com/apache/tinkerpop/gremlin-go/v3/driver.(*gorillaTransporter).Read
>     at /home/n/go/pkg/mod/github.com/apache/tinkerpop/gremlin-go/v3@v3.5.4-rc4.0.20220628161437-04f47d242010/driver/gorillaTransporter.go:116
> 15  0x000000000085a302 in github.com/apache/tinkerpop/gremlin-go/v3/driver.(*gremlinServerWSProtocol).readLoop
>     at /home/n/go/pkg/mod/github.com/apache/tinkerpop/gremlin-go/v3@v3.5.4-rc4.0.20220628161437-04f47d242010/driver/protocol.go:57
> 16  0x000000000085c2e7 in github.com/apache/tinkerpop/gremlin-go/v3/driver.newGremlinServerWSProtocol.func1
>     at /home/n/go/pkg/mod/github.com/apache/tinkerpop/gremlin-go/v3@v3.5.4-rc4.0.20220628161437-04f47d242010/driver/protocol.go:196
> 17  0x000000000046a841 in runtime.goexit
>     at /home/n/goroot/src/runtime/asm_amd64.s:1571
> {noformat}
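> For completeness, a similar dump can be captured without a debugger; a minimal 
> sketch (illustrative only, a helper that could be added to the reproducer's main 
> package) that prints every goroutine stack when the process receives SIGQUIT:
> {code:go}
> package main
>
> import (
>       "os"
>       "os/signal"
>       "runtime/pprof"
>       "syscall"
> )
>
> // dumpGoroutinesOnSignal prints the stack of every goroutine to stderr
> // whenever the process receives SIGQUIT (Ctrl-\), showing the same
> // IO-waiting readLoop goroutines that dlv reports above.
> func dumpGoroutinesOnSignal() {
>       c := make(chan os.Signal, 1)
>       signal.Notify(c, syscall.SIGQUIT)
>       go func() {
>               for range c {
>                       pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
>               }
>       }()
> }
> {code}
> (Sending SIGQUIT to an unmodified Go program also dumps all goroutine stacks, but 
> it terminates the process.)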
> The reason why the Go program does not finish is that gremlin-server keeps 
> responding to pings; as a result, gremlin-go's pong handler keeps extending the 
> read deadline of the connection.
> */tinkerpop/gremlin-go/driver/gorillaTransporter.go*
> {code:go}
>       transporter.connection.SetPongHandler(func(string) error {
>               err := transporter.connection.SetReadDeadline(time.Now().Add(2 * transporter.connSettings.keepAliveInterval))
>               if err != nil {
>                       return err
>               }
>               return nil
>       })
> {code}
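> To make the failure mode concrete, here is a stripped-down sketch of the 
> client-side behaviour (illustrative only; it is not the actual gremlin-go read 
> loop, and names such as keepAliveInterval and handle are placeholders). Because 
> the server keeps answering pings, the pong handler keeps pushing the read 
> deadline forward, so ReadMessage never fails with a timeout even though the 
> response it is waiting for will never arrive:
> {code:go}
> package sketch
>
> import (
>       "time"
>
>       "github.com/gorilla/websocket"
> )
>
> // readLoop mirrors how the gremlin-go transporter behaves: every pong
> // from the server extends the read deadline, so the blocking read never
> // times out and the goroutine stays in "IO wait" forever if the server
> // never sends a response (or timeout error) for a pending request.
> func readLoop(conn *websocket.Conn, keepAliveInterval time.Duration, handle func([]byte)) error {
>       conn.SetPongHandler(func(string) error {
>               // Each pong pushes the deadline 2*keepAliveInterval into the future.
>               return conn.SetReadDeadline(time.Now().Add(2 * keepAliveInterval))
>       })
>       for {
>               // Blocks until a message arrives or the read deadline expires;
>               // with the deadline constantly refreshed, it blocks indefinitely.
>               _, msg, err := conn.ReadMessage()
>               if err != nil {
>                       return err
>               }
>               handle(msg)
>       }
> }
> {code}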



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
