[ https://issues.apache.org/jira/browse/IGNITE-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409526#comment-17409526 ]
Mirza Aliev edited comment on IGNITE-15398 at 9/3/21, 2:54 PM:
---------------------------------------------------------------

I've investigated and come up with the following:
* The {{JRaft-Request-Processor}} leakage was fixed earlier, so we were left with only the {{NioEventLoopGroup}} leakage.
* {{NioEventLoopGroup}} threads from {{ClientHandlerModule}} weren't cleaned up properly because we missed the module in the stop flow.
* {{NioEventLoopGroup}} threads from {{RestModule}} weren't cleaned up properly because {{RestModule#stop}} didn't contain a Netty channel stopping mechanism.
* In general, the OOM is not connected to the thread leakage. The main reason is that we do not stop {{MetaStorageManager}} properly: we do not stop {{MetaStorageServiceImpl$WatchProcessor$Watcher}}, so a lot of objects stay referenced by the {{Watcher}} thread and the GC cannot collect them.

Ticket for MetaStorageManager: https://issues.apache.org/jira/browse/IGNITE-15444

!screenshot-1.png!
!screenshot-2.png!
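A minimal sketch of the stop-flow problem described in the comment above, under stated assumptions: {{Component}}, {{EndpointModule}} and {{NodeLifecycle}} are hypothetical names, not Ignite classes, and a plain {{ExecutorService}} stands in for Netty's {{NioEventLoopGroup}}. It shows the two things the findings point at: every started module has to be registered in the stop flow, and each module's stop has to shut down the thread pool it owns.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical component contract; the real Ignite interfaces may differ. */
interface Component {
    void start();

    void stop();
}

/** Stand-in for a module that owns an acceptor thread pool (assumption: JDK pool instead of Netty). */
final class EndpointModule implements Component {
    private ExecutorService acceptors;

    @Override
    public void start() {
        acceptors = Executors.newFixedThreadPool(1);
        // Submit a no-op task so the pool actually creates its worker thread.
        acceptors.submit(() -> { });
    }

    @Override
    public void stop() {
        // Without this call the worker thread stays parked after the node is closed.
        acceptors.shutdown();
        try {
            if (!acceptors.awaitTermination(5, TimeUnit.SECONDS)) {
                acceptors.shutdownNow();
            }
        } catch (InterruptedException e) {
            acceptors.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    boolean isTerminated() {
        return acceptors != null && acceptors.isTerminated();
    }
}

/** Remembers every started component and stops all of them in reverse start order. */
final class NodeLifecycle {
    private final Deque<Component> started = new ArrayDeque<>();

    void start(Component component) {
        component.start();
        started.push(component); // the last started component is stopped first
    }

    void stopAll() {
        RuntimeException first = null;
        while (!started.isEmpty()) {
            try {
                started.pop().stop();
            } catch (RuntimeException e) {
                // Keep stopping the remaining components even if one of them fails.
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

With Netty, the call corresponding to {{acceptors.shutdown()}} would be {{EventLoopGroup#shutdownGracefully()}} on each event loop group the module created.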
> NioEventLoopGroup threads leakage
> ---------------------------------
>
> Key: IGNITE-15398
> URL: https://issues.apache.org/jira/browse/IGNITE-15398
> Project: Ignite
> Issue Type: Bug
> Reporter: Andrey Mashenkov
> Assignee: Mirza Aliev
> Priority: Blocker
> Labels: ignite-3
> Fix For: 3.0.0-alpha3
>
> Attachments: screenshot-1.png, screenshot-2.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I've run a simple test and faced an OOM on 7 of 100 iterations.
> Thread leakage seems to be the reason.
> Use the JVM arg `-Xmx512M` to run the test; otherwise more iterations may be required.
> {code:java}
> @RepeatedTest(100)
> public void nodeRestart100Test() throws Exception {
>     List<Ignite> grid = startGrid();
>     IgniteUtils.closeAll(Lists.reverse(grid));
> }
> {code}
> A thread dump shows a huge number of parked NioEventLoopGroup and JRaft-Request-Processor threads.
> Further investigation shows that most of the NioEventLoopGroup threads are acceptor threads created in the startEndpoint() method of the RestModule and ClientModule classes.
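The thread-dump observation in the description could also be turned into an automated check. The following is only a sketch: {{ThreadLeakCheck}} is a hypothetical helper, not part of Ignite or its test framework, and the prefixes a test would pass (for example {{nioEventLoopGroup-}}, assumed here to be Netty's default thread naming) should be verified against a real thread dump.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not an Ignite class) that looks for leaked threads by name prefix. */
final class ThreadLeakCheck {
    private ThreadLeakCheck() {
    }

    /** Returns the number of live threads whose name starts with the given prefix. */
    static long countThreadsByPrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
                .count();
    }

    /**
     * Polls until no live thread with the given prefix remains or the timeout expires.
     * Returns true if all such threads are gone, false on timeout or interruption.
     */
    static boolean awaitNoThreads(String prefix, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (countThreadsByPrefix(prefix) > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

A restart test could call {{ThreadLeakCheck.awaitNoThreads(...)}} after closing the grid and fail when it returns false, which would report a leak long before the heap is exhausted.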