[ https://issues.apache.org/jira/browse/TINKERPOP-2820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716395#comment-17716395 ]
ASF GitHub Bot commented on TINKERPOP-2820: ------------------------------------------- kenhuuu commented on code in PR #2042: URL: https://github.com/apache/tinkerpop/pull/2042#discussion_r1176909962 ########## gremlin-python/src/main/python/gremlin_python/process/graph_traversal.py: ########## @@ -1092,6 +1092,7 @@ def __verify_transaction_state(self, state, error_message): def __close_session(self, session): self._session_based_connection.close() Review Comment: Would it make sense to put these two lines in the newly added remove_session()? > gremlin-python _close_session race condition/FD leak > ---------------------------------------------------- > > Key: TINKERPOP-2820 > URL: https://issues.apache.org/jira/browse/TINKERPOP-2820 > Project: TinkerPop > Issue Type: Bug > Components: python > Affects Versions: 3.6.1 > Reporter: Alex Hamilton > Assignee: Valentyn Kahamlyk > Priority: Critical > > There is a race condition in gremlin-python when closing session-based > connections that results in leaking file descriptors for event loops - > eventually leading to an `OSError [Errno 24] too many open files` error after > enough transactions occur. > The problem stems from a race condition when closing session based > connections that causes the event loop opened for the session's connection to > be left open. > The problem is completely contained in these two methods from > `gremlin_python.driver.client.py` > ```py > def close(self): > # prevent the Client from being closed more than once. it raises errors > if new jobby jobs > # get submitted to the executor when it is shutdown > if self._closed: > return > if self._session_enabled: > self._close_session() # 1. (see below) > log.info("Closing Client with url '%s'", self._url) > while not self._pool.empty(): # 3. (see below) > conn = self._pool.get(True) > conn.close() > self._executor.shutdown() > self._closed = True > def _close_session(self): > message = request.RequestMessage( > processor='session', op='close', > args=\{'session': str(self._session)}) > conn = self._pool.get(True) > return conn.write(message).result() # 2. (see below) > ``` > 1. `_close_session()` called > 2. `.result()` waits for the _write_ to finish, but does *not* wait for the > _read_ to finish. `conn` does not get put back into `self._pool` until AFTER > the read finishes (`gremlin_python.driver.connection.Connection._receive()`). > However, this method returns early and goes to 3. > 3. this while loop is not entered to close out the connections. This leaves > the conn's event loop running, never to be closed. > I was able to solve this by modifying `_close_session` as follows: > ```py > def _close_session(self): > message = request.RequestMessage( > processor='session', op='close', > args=\{'session': str(self._session)}) > conn = self._pool.get(True) > try: > write_result_set = conn.write(message).result() > return write_result_set.all().result() # wait for _receive() to finish > except protocol.GremlinServerError: > pass > ``` > I'm not sure if this is the correct solution, but wanted to point out the bug. > In the meantime however, I wrote a context manager to handle this cleanup for > me > ```py > @contextlib.contextmanager > def transaction(): > tx = g.tx() > gtx = tx.begin() > try: > yield tx, gtx > tx.commit() > except Exception as e: > tx.rollback() > finally: > while not tx._session_based_connection._client._pool.empty(): > conn = tx._session_based_connection._client._pool.get(True) > conn.close() > logger.info("Closed abandoned session connection") > with transaction() as (tx, gtx): > foo = gtx.some_traversal().to_list() > # do something with foo > gtx.some_other_traversal().iterate() > ``` > Cheers -- This message was sent by Atlassian Jira (v8.20.10#820010)