Ken Hu created TINKERPOP-3114:
---------------------------------

             Summary: Update connection pool handling in Gremlin Python
                 Key: TINKERPOP-3114
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-3114
             Project: TinkerPop
          Issue Type: Improvement
          Components: python
    Affects Versions: 3.7.2, 3.6.7
            Reporter: Ken Hu

A Discord user (e8l) reported problems with the connection pool in Gremlin Python. The pool can't determine whether a connection is healthy and therefore can't remove problematic connections. As a result, the pool can fill up with unusable connections, which leaves the driver unresponsive while it waits for an available connection.

As reported by user e8l on Discord:
{quote}I am struggling to avoid problems after a connection error occur. And now, I suspect it might be led by something bug of gremlinpython... Are these bugs? Or just I use it wrongly? Please let me know.

Case 1: Script is hanged up when all pooled connections are consumed?

When I specify wrong url to simulate network error, gremlinpython might consume connections and do not return them into the pool. So, below script is hanged up after all pooled connections are consumed.

Python Script: see case1.py
The Output: see case1-output.txt

The result is changed when I specify different value to pool_size argument. My expectation is that error messages are shown in 9 times and the script ends.

Case 2: Manual transaction is never rolled back(closed)

Same as case 1, manual transaction is never ended. So, I cannot recover the error.

Python Script: see case2.py
The Output: see case2-output.py

My expectation is that this script is end after trying 9 times and all trials are failed.

Case 3: Once a connection error occurred, pooled connections are broken

After I stopped TinkerPop server(JanusGraph) temporary, some pooled connections are broken and will not be recovered.

Python Script: see case3.py
The Output: see case3-output.txt

My expectation is that connections are refreshed if they are not available when get them from the pool.
{quote}

case1:
{code:python}
from gremlin_python.driver.serializer import GraphSONMessageSerializer
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.aiohttp.transport import AiohttpTransport

connection = DriverRemoteConnection(
    'ws://localhost:18182/gremlin',  # this is wrong url
    'g',
    pool_size=4,
    message_serializer=GraphSONMessageSerializer(),
    transport_factory=lambda: AiohttpTransport())
g = traversal().with_remote(connection)

for count in range(1,10):
    try:
        print(f'{count=}')
        result = g.V().to_list()
        print(f'{result=}')
    except Exception as e:
        print(e)

connection.close()
{code}

case1-output:
{code:java}
count=1
Cannot connect to host localhost:18182 ssl:default [Connect call failed ('127.0.0.1', 18182)]
count=2
Cannot connect to host localhost:18182 ssl:default [Connect call failed ('127.0.0.1', 18182)]
count=3
Cannot connect to host localhost:18182 ssl:default [Connect call failed ('127.0.0.1', 18182)]
count=4
Cannot connect to host localhost:18182 ssl:default [Connect call failed ('127.0.0.1', 18182)]
count=5
# the script is hanged here and we need to send signal(Ctrl+C) to abort this.
{code}

case2:
{code:python}
from gremlin_python.driver.serializer import GraphSONMessageSerializer
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.aiohttp.transport import AiohttpTransport

connection = DriverRemoteConnection(
    'ws://localhost:18182/gremlin',  # this is wrong url
    'g',
    message_serializer=GraphSONMessageSerializer(),
    transport_factory=lambda: AiohttpTransport())
g = traversal().with_remote(connection)

for count in range(1,10):
    try:
        print(f'{count=}')
        tx = g.tx()
        gtx = tx.begin()
        result = gtx.add_v().next()
        tx.commit()
        print(f'{result=}')
    except Exception as e:
        print(e)
        if tx.is_open():
            print('rollback')
            try:
                tx.rollback()
            except Exception as re:
                print(re)
    finally:
        if tx.is_open():
            print('close transaction')
            tx.close()

connection.close()
{code}

case2-output:
{code:python}
count=1
Cannot connect to host localhost:18182 ssl:default [Connect call failed ('127.0.0.1', 18182)]
rollback
# the script is hanged here and we need to send signal(Ctrl+C) to abort this.
# note that we have to send signals twice due to tx.close() in finally block.
# tx.close() is also hanged, so we need to do that once more.
{code}

case3:
{code:python}
from gremlin_python.driver.serializer import GraphSONMessageSerializer
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.aiohttp.transport import AiohttpTransport
from time import sleep

connection = DriverRemoteConnection(
    'ws://localhost:8182/gremlin',
    'g',
    pool_size=4,
    message_serializer=GraphSONMessageSerializer(),
    transport_factory=lambda: AiohttpTransport())
g = traversal().with_remote(connection)

for count in range(1,10):
    try:
        print(f'{count=}')
        result = g.V().to_list()
        print(f'{result=}')
    except Exception as e:
        print(e)
    finally:
        sleep(15)

connection.close()
{code}

case3-output
{code:java}
count=1
result=[]
# I stopped JanusGraph here
count=2
[Errno 104] Connection reset by peer # <- expected
# I restart JanusGraph here
count=3
[Errno 104] Connection reset by peer # <- maybe timing issue
count=4
# sometimes exception is reported but I do not expect
Exception ignored in: <function AiohttpTransport.__del__ at 0x7f639f5d9f80>
Traceback (most recent call last):
  File "/home/xxx/venv/lib/python3.12/site-packages/gremlin_python/driver/aiohttp/transport.py", line 61, in __del__
    self.close()
  File "/home/xxx/venv/lib/python3.12/site-packages/gremlin_python/driver/aiohttp/transport.py", line 132, in close
    self._loop.run_until_complete(async_close())
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 663, in run_until_complete
    self._check_running()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 624, in _check_running
    raise RuntimeError(
RuntimeError: Cannot run the event loop while another loop is running
/usr/local/lib/python3.12/threading.py:293: RuntimeWarning: coroutine 'AiohttpTransport.close.<locals>.async_close' was never awaited
  self._waiters = _deque()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f639f5e2420>
Exception ignored in: <function AiohttpTransport.__del__ at 0x7f639f5d9f80>
Traceback (most recent call last):
  File "/home/xxx/venv/lib/python3.12/site-packages/gremlin_python/driver/aiohttp/transport.py", line 61, in __del__
    self.close()
  File "/home/xxx/venv/lib/python3.12/site-packages/gremlin_python/driver/aiohttp/transport.py", line 132, in close
    self._loop.run_until_complete(async_close())
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 663, in run_until_complete
    self._check_running()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 624, in _check_running
    raise RuntimeError(
RuntimeError: Cannot run the event loop while another loop is running
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f639f5e2d20>
result=[]
count=5
Connection was already closed. # <- unexpected
count=6
result=[]
count=7
Cannot write to closing transport # <- unexpected
count=8
result=[]
count=9
Cannot write to closing transport # <- unexpected
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)