Ken Hu created TINKERPOP-3114:
---------------------------------

             Summary: Update connection pool handling in Gremlin Python
                 Key: TINKERPOP-3114
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-3114
             Project: TinkerPop
          Issue Type: Improvement
          Components: python
    Affects Versions: 3.7.2, 3.6.7
            Reporter: Ken Hu


A Discord user (e8l) reported problems with the connection pool in Gremlin 
Python. The pool cannot determine whether a connection is healthy, so it 
cannot remove problematic connections. As a result, the pool can fill up with 
unusable connections, which leaves the driver unresponsive while it waits for 
an available connection.
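
One possible direction, sketched here with hypothetical names 
(FakeConnection, HealthCheckedPool, is_healthy) that are not part of 
gremlin_python: check each connection when it is borrowed, replace it if it is 
dead, and bound the wait for a free slot so the caller gets an error instead 
of hanging. This only illustrates the idea under those assumptions. It is not 
the driver's implementation or a proposed patch.

```python
import queue


class FakeConnection:
    """Stand-in for a pooled connection (hypothetical, not a driver class)."""

    def __init__(self):
        self.closed = False

    def is_healthy(self):
        return not self.closed

    def close(self):
        self.closed = True


class HealthCheckedPool:
    """Hypothetical pool that replaces dead connections on checkout."""

    def __init__(self, factory, size):
        self._factory = factory
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self, timeout=1.0):
        # Bounded wait: raises queue.Empty instead of blocking forever.
        conn = self._pool.get(timeout=timeout)
        if conn.is_healthy():
            return conn
        try:
            # Dead connection: hand out a fresh one in its place.
            return self._factory()
        except Exception:
            # Keep the slot so a later caller can retry the replacement.
            self._pool.put(conn)
            raise

    def release(self, conn):
        self._pool.put(conn)


pool = HealthCheckedPool(FakeConnection, size=2)
broken = pool.acquire()
broken.close()             # simulate a connection killed by a server restart
pool.release(broken)       # the dead connection goes back into the pool
healthy = pool.acquire()   # untouched connection, passes the check
replaced = pool.acquire()  # dead connection is detected and swapped out
print(replaced is not broken, replaced.is_healthy())
```

The important property is that a failed replacement puts the slot back, so a 
run of connection errors cannot shrink the pool to zero.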

As reported by user e8l on Discord:
{quote}I am struggling to avoid problems after a connection error occur.
And now, I suspect it might be led by something bug of gremlinpython...

Are these bugs? Or just I use it wrongly?
Please let me know.

Case 1: Script is hanged up when all pooled connections are consumed?
When I specify wrong url to simulate network error,
gremlinpython might consume connections and do not return them into the pool.
So, below script is hanged up after all pooled connections are consumed.

Python Script: see case1.py
The Output: see case1-output.txt

The result is changed when I specify different value to pool_size argument.
My expectation is that error messages are shown in 9 times and the script ends.

Case 2: Manual transaction is never rolled back(closed)
Same as case 1, manual transaction is never ended.
So, I cannot recover the error.

Python Script: see case2.py
The Output: see case2-output.py

My expectation is that this script is end after trying 9 times and all trials 
are failed.

Case 3: Once a connection error occurred, pooled connections are broken
After I stopped TinkerPop server(JanusGraph) temporary,
some pooled connections are broken and will not be recovered.

Python Script: see case3.py
The Output: see case3-output.txt

My expectation is that connections are refreshed if they are not available when 
get them from the pool.
{quote}
case1:
{code:python}
from gremlin_python.driver.serializer import GraphSONMessageSerializer
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.aiohttp.transport import AiohttpTransport

connection = DriverRemoteConnection(
    'ws://localhost:18182/gremlin', # this is wrong url
    'g',
    pool_size=4,
    message_serializer=GraphSONMessageSerializer(),
    transport_factory=lambda: AiohttpTransport())
g = traversal().with_remote(connection)

for count in range(1,10):
  try:
    print(f'{count=}')
    result = g.V().to_list()
    print(f'{result=}')
  except Exception as e:
    print(e)

connection.close()
{code}
case1-output:
{code:java}
count=1
Cannot connect to host localhost:18182 ssl:default [Connect call failed ('127.0.0.1', 18182)]
count=2
Cannot connect to host localhost:18182 ssl:default [Connect call failed ('127.0.0.1', 18182)]
count=3
Cannot connect to host localhost:18182 ssl:default [Connect call failed ('127.0.0.1', 18182)]
count=4
Cannot connect to host localhost:18182 ssl:default [Connect call failed ('127.0.0.1', 18182)]
count=5
# the script is hanged here and we need to send signal(Ctrl+C) to abort this. 
{code}
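
The hang at count=5 with pool_size=4 is consistent with each failed attempt 
taking a connection out of the pool and never returning it. The following is 
a minimal model of that behavior using only the standard library. It assumes 
the pool is a blocking queue, and run_query is a hypothetical stand-in for a 
driver request. A timeout is used so the model reports exhaustion instead of 
blocking forever.

```python
import queue

POOL_SIZE = 4  # mirrors pool_size=4 in case 1

pool = queue.Queue()
for slot in range(POOL_SIZE):
    pool.put(f"conn-{slot}")


def run_query(pool, timeout=0.1):
    """Hypothetical request: borrows a connection, fails, never returns it."""
    conn = pool.get(timeout=timeout)  # would block forever without a timeout
    # The simulated failure happens before the connection is put back.
    raise ConnectionError(f"{conn}: cannot connect")


results = []
for count in range(1, 10):
    try:
        run_query(pool)
    except ConnectionError:
        results.append("error")
    except queue.Empty:
        # Without the timeout, this is the point where the script would hang.
        results.append("pool exhausted")
        break

print(results)
```

In this model the fifth attempt finds the pool empty, which matches the 
attempt at which the reported script stops responding.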
case2:
{code:python}
from gremlin_python.driver.serializer import GraphSONMessageSerializer
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.aiohttp.transport import AiohttpTransport

connection = DriverRemoteConnection(
    'ws://localhost:18182/gremlin', # this is wrong url
    'g',
    message_serializer=GraphSONMessageSerializer(),
    transport_factory=lambda: AiohttpTransport())
g = traversal().with_remote(connection)

for count in range(1,10):
  try:
    print(f'{count=}')
    tx = g.tx()
    gtx = tx.begin()
    result = gtx.add_v().next()
    tx.commit()
    print(f'{result=}')
  except Exception as e:
    print(e)
    if tx.is_open():
      print('rollback')
      try:
        tx.rollback()
      except Exception as re:
        print(re)
  finally:
    if tx.is_open():
      print('close transaction')
      tx.close()

connection.close()
{code}
case2-output:
{code:python}
count=1
Cannot connect to host localhost:18182 ssl:default [Connect call failed ('127.0.0.1', 18182)]
rollback
# the script is hanged here and we need to send signal(Ctrl+C) to abort this.
# note that we have to send signals twice due to tx.close() in finally block.
# tx.close() is also hanged, so we need to do that once more. {code}
case3:
{code:python}
from gremlin_python.driver.serializer import GraphSONMessageSerializer
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.aiohttp.transport import AiohttpTransport
from time import sleep

connection = DriverRemoteConnection(
    'ws://localhost:8182/gremlin',
    'g',
    pool_size=4,
    message_serializer=GraphSONMessageSerializer(),
    transport_factory=lambda: AiohttpTransport())
g = traversal().with_remote(connection)

for count in range(1,10):
  try:
    print(f'{count=}')
    result = g.V().to_list()
    print(f'{result=}')
  except Exception as e:
    print(e)
  finally:
    sleep(15)

connection.close()
{code}
case3-output:
{code:java}
count=1
result=[]
# I stopped JanusGraph here
count=2
[Errno 104] Connection reset by peer # <- expected
# I restart JanusGraph here
count=3
[Errno 104] Connection reset by peer # <- maybe timing issue
count=4
# sometimes exception is reported but I do not expect
Exception ignored in: <function AiohttpTransport.__del__ at 0x7f639f5d9f80>
Traceback (most recent call last):
  File "/home/xxx/venv/lib/python3.12/site-packages/gremlin_python/driver/aiohttp/transport.py", line 61, in __del__
    self.close()
  File "/home/xxx/venv/lib/python3.12/site-packages/gremlin_python/driver/aiohttp/transport.py", line 132, in close
    self._loop.run_until_complete(async_close())
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 663, in run_until_complete
    self._check_running()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 624, in _check_running
    raise RuntimeError(
RuntimeError: Cannot run the event loop while another loop is running
/usr/local/lib/python3.12/threading.py:293: RuntimeWarning: coroutine 'AiohttpTransport.close.<locals>.async_close' was never awaited
  self._waiters = _deque()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f639f5e2420>
Exception ignored in: <function AiohttpTransport.__del__ at 0x7f639f5d9f80>
Traceback (most recent call last):
  File "/home/xxx/venv/lib/python3.12/site-packages/gremlin_python/driver/aiohttp/transport.py", line 61, in __del__
    self.close()
  File "/home/xxx/venv/lib/python3.12/site-packages/gremlin_python/driver/aiohttp/transport.py", line 132, in close
    self._loop.run_until_complete(async_close())
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 663, in run_until_complete
    self._check_running()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 624, in _check_running
    raise RuntimeError(
RuntimeError: Cannot run the event loop while another loop is running
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f639f5e2d20>
result=[]
count=5
Connection was already closed. # <- unexpected
count=6
result=[]
count=7
Cannot write to closing transport # <- unexpected
count=8
result=[]
count=9
Cannot write to closing transport # <- unexpected{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
