[ https://issues.apache.org/jira/browse/TINKERPOP-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17993119#comment-17993119 ]
Stephen Mallette commented on TINKERPOP-3113:
---------------------------------------------

Checking in on this ticket - I suppose the question is whether there's much point to adding fixes here at this point, given that this feature never quite made it to production-ready status and that 4.x renders it useless in any event. My inclination is that the {{UnifiedChannelizer}} and its related code should be deprecated in 3.8-dev, as it's already removed as infrastructure in 4.x along with other {{OpProcessor}} concepts.

> UnifiedChannelizer Retains Sessions Forever
> -------------------------------------------
>
>                 Key: TINKERPOP-3113
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-3113
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.7.2
>            Reporter: Allan Clements
>            Priority: Major
>         Attachments: dominator_tree.png, path_to_gc_root.png
>
>
> ([Prior Discord Thread|https://discord.com/channels/838910279550238720/1290792261117939722/1290792261117939722])
> I believe UnifiedChannelizer is causing sessions to be retained forever until
> either the channel is closed or an OOME occurs.
>
> My JanusGraph deployment would generally OOME after a couple of hours. After
> collecting some heap dumps I noticed that over 40k traversal bytecode
> submissions to the server were still in the dump according to the Eclipse
> Memory Analyzer Tool's dominator tree view, attached below.
>
> Inspecting further, I picked an instance of the bytecode and it appeared that
> the path to the GC root, preventing its collection, was through a CloseFuture
> associated with its initialChannel, also attached below.
>
> So if the channel were regularly closed it may duck the issue. In either case
> I was able to reproduce it in a toy Java application using the official
> driver by continuously submitting traversals with large payloads via an
> inject() step without closing the connection, as well as in a toy Rust
> application using the unofficial driver.
>
> I believe the issue originates
> [here|https://github.com/apache/tinkerpop/blob/b7c9ddda16a3d059b2a677f578f131a7124187b6/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/handler/AbstractSession.java#L166-L170].
> The code creates a future using a closure that just calls close(). However,
> in doing so it captures a reference to the current object. A child class of
> AbstractSession, SingleTaskSession, holds a reference to its
> [onlySessionTask here|https://github.com/apache/tinkerpop/blob/a6777d0c8394f95c5aa400b965e8f4f996db1abd/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/handler/SingleTaskSession.java#L33].
> [SessionTask|https://github.com/apache/tinkerpop/blob/a6777d0c8394f95c5aa400b965e8f4f996db1abd/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/handler/SessionTask.java]
> extends Context, which holds a reference to the traversal's
> [Bytecode here|https://github.com/apache/tinkerpop/blob/a6777d0c8394f95c5aa400b965e8f4f996db1abd/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/Context.java#L62].
>
> There doesn't appear to be a "happy path" removal of the anonymously created
> listener for the channel's closeFuture() once the session completes normally
> (like when a traversal is fully iterated).
>
> As a side note, that anonymously created listener may also be problematic
> because it exposes the object being constructed before the constructor
> completes, which can cause problems in multi-threaded environments, assuming
> the underlying asynchronous runtime permits multi-threaded operation. Imagine
> the channel closing concurrently with the constructor's execution, after the
> listener was added.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
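
For readers following the retention chain described in the issue, here is a minimal sketch of the pattern in plain Java. It is not TinkerPop or Netty code: CloseFuture, Session, completeLeaky() and completeAndDeregister() are hypothetical stand-ins, and the deregistration shown is only one possible shape of a fix, not something the project has adopted. The sketch assumes the long-lived channel keeps every listener until it closes, so a listener that captures the session also pins the session's payload. It registers the listener from a factory method rather than the constructor, which also avoids the "this escapes during construction" concern raised in the side note.

```java
import java.util.ArrayList;
import java.util.List;

public class SessionLeakSketch {

    /** Hypothetical stand-in for a channel's close future: it only stores listeners. */
    static final class CloseFuture {
        private final List<Runnable> listeners = new ArrayList<>();
        void addListener(Runnable listener) { listeners.add(listener); }
        void removeListener(Runnable listener) { listeners.remove(listener); }
        int listenerCount() { return listeners.size(); }
    }

    /** Hypothetical stand-in for a per-request session holding a large payload. */
    static final class Session {
        private final CloseFuture closeFuture;
        private final Runnable closeListener;
        private byte[] payload;

        private Session(CloseFuture closeFuture, byte[] payload) {
            this.closeFuture = closeFuture;
            this.payload = payload;
            // The listener captures `this`, but it is only stored in a field here.
            this.closeListener = this::close;
        }

        /** Registers the listener only after construction has finished. */
        static Session open(CloseFuture closeFuture, byte[] payload) {
            Session session = new Session(closeFuture, payload);
            closeFuture.addListener(session.closeListener);
            return session;
        }

        void close() { payload = null; }

        /** Normal completion without cleanup: the channel still references the session. */
        void completeLeaky() { }

        /** Normal completion that removes the listener, so the channel no longer pins the session. */
        void completeAndDeregister() {
            closeFuture.removeListener(closeListener);
            close();
        }
    }

    /** Runs `requests` sessions on one long-lived channel and reports the listeners left behind. */
    static int listenersAfter(int requests, boolean deregister) {
        CloseFuture channel = new CloseFuture();
        for (int i = 0; i < requests; i++) {
            Session session = Session.open(channel, new byte[1024]);
            if (deregister) {
                session.completeAndDeregister();
            } else {
                session.completeLeaky();
            }
        }
        return channel.listenerCount();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + listenersAfter(1000, false));       // prints "leaky: 1000"
        System.out.println("deregistered: " + listenersAfter(1000, true)); // prints "deregistered: 0"
    }
}
```

Each listener left on the channel keeps its session, and therefore the session's payload, reachable until the channel closes, which matches the growth seen in the heap dumps when a connection stays open across many requests.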