[
https://issues.apache.org/jira/browse/TINKERPOP-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717438#comment-17717438
]
ASF GitHub Bot commented on TINKERPOP-2820:
-------------------------------------------
kenhuuu commented on PR #2051:
URL: https://github.com/apache/tinkerpop/pull/2051#issuecomment-1526908912
VOTE +1
> gremlin-python _close_session race condition/FD leak
> ----------------------------------------------------
>
> Key: TINKERPOP-2820
> URL: https://issues.apache.org/jira/browse/TINKERPOP-2820
> Project: TinkerPop
> Issue Type: Bug
> Components: python
> Affects Versions: 3.6.1
> Reporter: Alex Hamilton
> Assignee: Valentyn Kahamlyk
> Priority: Critical
> Fix For: 3.7.0, 3.6.3, 3.5.6
>
>
> There is a race condition in gremlin-python when closing session-based
> connections that results in leaking file descriptors for event loops -
> eventually leading to an `OSError [Errno 24] too many open files` error after
> enough transactions occur.
> The problem stems from a race condition when closing session based
> connections that causes the event loop opened for the session's connection to
> be left open.
> The problem is completely contained in these two methods from
> `gremlin_python.driver.client.py`
> ```py
> def close(self):
> # prevent the Client from being closed more than once. it raises errors
> if new jobby jobs
> # get submitted to the executor when it is shutdown
> if self._closed:
> return
> if self._session_enabled:
> self._close_session() # 1. (see below)
> log.info("Closing Client with url '%s'", self._url)
> while not self._pool.empty(): # 3. (see below)
> conn = self._pool.get(True)
> conn.close()
> self._executor.shutdown()
> self._closed = True
> def _close_session(self):
> message = request.RequestMessage(
> processor='session', op='close',
> args=\{'session': str(self._session)})
> conn = self._pool.get(True)
> return conn.write(message).result() # 2. (see below)
> ```
> 1. `_close_session()` called
> 2. `.result()` waits for the _write_ to finish, but does *not* wait for the
> _read_ to finish. `conn` does not get put back into `self._pool` until AFTER
> the read finishes (`gremlin_python.driver.connection.Connection._receive()`).
> However, this method returns early and goes to 3.
> 3. this while loop is not entered to close out the connections. This leaves
> the conn's event loop running, never to be closed.
> I was able to solve this by modifying `_close_session` as follows:
> ```py
> def _close_session(self):
> message = request.RequestMessage(
> processor='session', op='close',
> args=\{'session': str(self._session)})
> conn = self._pool.get(True)
> try:
> write_result_set = conn.write(message).result()
> return write_result_set.all().result() # wait for _receive() to finish
> except protocol.GremlinServerError:
> pass
> ```
> I'm not sure if this is the correct solution, but wanted to point out the bug.
> In the meantime however, I wrote a context manager to handle this cleanup for
> me
> ```py
> @contextlib.contextmanager
> def transaction():
> tx = g.tx()
> gtx = tx.begin()
> try:
> yield tx, gtx
> tx.commit()
> except Exception as e:
> tx.rollback()
> finally:
> while not tx._session_based_connection._client._pool.empty():
> conn = tx._session_based_connection._client._pool.get(True)
> conn.close()
> logger.info("Closed abandoned session connection")
> with transaction() as (tx, gtx):
> foo = gtx.some_traversal().to_list()
> # do something with foo
> gtx.some_other_traversal().iterate()
> ```
> Cheers
--
This message was sent by Atlassian Jira
(v8.20.10#820010)