[ https://issues.apache.org/jira/browse/TINKERPOP-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stephen Mallette reassigned TINKERPOP-2405:
-------------------------------------------

    Assignee: Stephen Mallette

> gremlinpython: traversal hangs when the connection is established but the servers stops responding later
> --------------------------------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-2405
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2405
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 3.4.6
>         Environment: Ubuntu 18.04, Flask 1.1.1, python 3.8.1, Amazon Neptune, Gremlin Server
>            Reporter: Guilherme Quentel Melo
>            Assignee: Stephen Mallette
>            Priority: Major
>
> On an HTTP server that connects to Amazon Neptune, I've seen some situations where a request just hangs and never returns a response. While investigating this, I found that it hangs at the point where it queries Neptune.
> The problem is that if the connection to Gremlin/Neptune is established and the server then stops responding, the gremlin connection never times out, so the process/thread waits forever for a response that will never come.
> h1. How to reproduce
> # Start a local gremlin server on the default port 8182
> # On a terminal, run {{nc}} to listen on port 8183 with {{nc -lk 8183}}
> # Run the following python code to connect to the *8183* port:
> {code:python}
> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
> from gremlin_python.process.anonymous_traversal import traversal
>
> remote_connection = DriverRemoteConnection("ws://127.0.0.1:8183/gremlin", "g")
> g = traversal().withRemote(remote_connection)
>
> g.V().limit(1).toList()
> {code}
> # You will see the connection request in the {{nc}} output. The first time, don't do anything and it will time out, saying the connection couldn't be established.
> # Now repeat the steps, but make {{nc}} respond so that the connection is established. The quickest way I found is to manually relay the message to the real gremlin server:
> ## Copy the whole request from the {{nc -l}} output
> ## On another terminal, open a connection to the gremlin server with {{nc 127.0.0.1 8182}}
> ## Paste the request you copied before into the {{nc 127.0.0.1 8182}} terminal
> ## Copy the gremlin server response and paste it into the {{nc -l}} terminal
> ## The connection will be established and {{nc -l}} will receive some unprintable chars corresponding to {{g.V().limit(1).toList()}}
> ## Now, if there is no response from the {{nc -l}} process, the python code will hang forever.
> h1. Possible solution
> As far as I can tell, the problem is that the {{TornadoTransport}} implementation does not pass any timeout when reading (and writing) messages.
> So, passing a timeout to {{self._loop.run_sync}} can solve the issue, at least by raising an exception when the server does not respond.
> If I change the example above:
> {code:python}
> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
> from gremlin_python.driver.tornado.transport import TornadoTransport
> from gremlin_python.process.anonymous_traversal import traversal
>
>
> class CustomTornadoTransport(TornadoTransport):
>     def read(self):
>         # Give up if the server sends nothing for 5 seconds.
>         return self._loop.run_sync(lambda: self._ws.read_message(), timeout=5)
>
>
> remote_connection = DriverRemoteConnection("ws://127.0.0.1:8183/gremlin", "g", transport_factory=CustomTornadoTransport)
> g = traversal().withRemote(remote_connection)
>
> g.V().limit(1).toList()
> {code}
> and repeat the same steps, {{g.V().limit(1).toList()}} times out after not getting any response from the server for 5 seconds.
> I'm not sure whether there should be a timeout for writing, but it seems it should definitely be set for read operations.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
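
The manual {{nc}} relay in the reproduction steps could probably be replaced by a small script. The following is only a sketch, using nothing but the Python standard library: it listens on port 8183, answers the WebSocket upgrade request itself (computing {{Sec-WebSocket-Accept}} as described in RFC 6455) and then reads whatever the client sends without ever replying. The names {{websocket_accept}}, {{handle_client}} and {{start_silent_server}} are hypothetical helpers invented for this illustration, and whether gremlinpython accepts this minimal handshake is an assumption that has not been checked against 3.4.6.

```python
import base64
import hashlib
import socket
import threading

# Fixed GUID defined by RFC 6455 for deriving the Sec-WebSocket-Accept header.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a client Sec-WebSocket-Key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handle_client(conn: socket.socket) -> None:
    """Complete the WebSocket handshake, then stay silent (hypothetical helper)."""
    with conn:
        # Read the HTTP upgrade request up to the blank line ending the headers.
        request = b""
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(4096)
            if not chunk:
                return
            request += chunk

        # Extract the client's Sec-WebSocket-Key header (case-insensitive name).
        key = ""
        for line in request.decode("latin-1").split("\r\n")[1:]:
            name, _, value = line.partition(":")
            if name.strip().lower() == "sec-websocket-key":
                key = value.strip()

        response = (
            "HTTP/1.1 101 Switching Protocols\r\n"
            "Upgrade: websocket\r\n"
            "Connection: Upgrade\r\n"
            f"Sec-WebSocket-Accept: {websocket_accept(key)}\r\n"
            "\r\n"
        )
        conn.sendall(response.encode("ascii"))

        # Discard every frame the client sends and never answer, which
        # imitates a server that stops responding after the connection is up.
        while conn.recv(4096):
            pass


def start_silent_server(host: str = "127.0.0.1", port: int = 8183) -> socket.socket:
    """Start the silent server on a background thread and return its listener."""
    listener = socket.create_server((host, port))

    def accept_loop() -> None:
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:
                return  # listener was closed
            threading.Thread(target=handle_client, args=(conn,), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener
```

To try it, call {{start_silent_server()}} in a process that is kept alive (for example by waiting on a {{threading.Event}}) and point the reproduction code at {{ws://127.0.0.1:8183/gremlin}}. If the handshake is accepted, the unpatched transport would be expected to hang on {{g.V().limit(1).toList()}}, while the {{CustomTornadoTransport}} variant should raise after 5 seconds.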