[ 
https://issues.apache.org/jira/browse/IGNITE-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409526#comment-17409526
 ] 

Mirza Aliev edited comment on IGNITE-15398 at 9/3/21, 2:54 PM:
---------------------------------------------------------------

I did some investigation and found the following: 
* The {{JRaft-Request-Processor}} leak had already been fixed; only the 
{{NioEventLoopGroup}} leak remained
* The {{NioEventLoopGroup}} from {{ClientHandlerModule}} wasn't shut down properly 
because the module was missing from the stop flow
* The {{NioEventLoopGroup}} from {{RestModule}} wasn't shut down properly because 
{{RestModule#stop}} didn't contain a mechanism for stopping the Netty channel
* In general, the OOM is not caused by the thread leak. The main reason 
is that we do not stop {{MetaStorageManager}} properly: we do not stop 
{{MetaStorageServiceImpl$WatchProcessor$Watcher}}, so a lot of objects are held by 
the {{Watcher}} thread and the GC cannot collect them. Ticket for {{MetaStorageManager}}: 
https://issues.apache.org/jira/browse/IGNITE-15444
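
The second and third findings above come down to the same rule: every module that starts threads must release them in its stop method, and every started module must be part of the node's stop flow. Below is a minimal sketch of that rule, not the actual Ignite code. {{PooledModule}} and {{Node}} are hypothetical names, and a plain {{ExecutorService}} stands in for the {{NioEventLoopGroup}} so the example needs no Netty dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a module that owns an event loop group. */
class PooledModule {
    // Assumption: a plain executor plays the role of the NioEventLoopGroup.
    private final ExecutorService workers = Executors.newFixedThreadPool(2);

    void start() {
        // Submitting a task forces the pool to create a worker thread.
        workers.submit(() -> { });
    }

    /** Releases the worker threads; without this call they stay parked. */
    void stop() {
        workers.shutdown();
        try {
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isStopped() {
        return workers.isTerminated();
    }
}

/** Hypothetical node that tracks every started module and stops all of them. */
class Node {
    private final List<PooledModule> started = new ArrayList<>();

    void start(PooledModule module) {
        module.start();
        started.add(module);
    }

    /** Stops modules in reverse start order so that none is missed. */
    void stop() {
        for (int i = started.size() - 1; i >= 0; i--) {
            started.get(i).stop();
        }
        started.clear();
    }
}
```

With Netty, the corresponding call in the module's stop method would be {{shutdownGracefully()}} on each event loop group the module created.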

 !screenshot-1.png!  !screenshot-2.png! 
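
The last finding relies on the fact that a running thread is a GC root: everything reachable from it stays in memory until the thread terminates. The sketch below illustrates this with a hypothetical {{DemoWatcher}} class. It is not the {{Watcher}} from {{MetaStorageServiceImpl}}, and the buffer of byte arrays is only an assumed stand-in for whatever the real watcher retains.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical watcher, not Ignite's class: polls in a loop and buffers what it sees. */
class DemoWatcher extends Thread {
    // Reachable from the running thread, so the GC cannot reclaim it while run() is active.
    private final List<byte[]> buffered = new ArrayList<>();

    private volatile boolean stopRequested;

    DemoWatcher() {
        super("demo-watcher");
        setDaemon(true);
    }

    @Override
    public void run() {
        while (!stopRequested) {
            if (buffered.size() < 64) {
                buffered.add(new byte[1024]); // stands in for buffered watch events
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                break; // the interrupt is used as the wake-up signal for stopping
            }
        }
    }

    /** Asks the loop to exit and waits until the thread has terminated. */
    void stopWatcher() {
        stopRequested = true;
        interrupt();
        try {
            join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

If the owning component's stop method never calls something like {{stopWatcher()}}, each node restart leaves one more live thread and its buffer behind, which matches the growth seen across test iterations.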


> NioEventLoopGroup threads leakage
> ---------------------------------
>
>                 Key: IGNITE-15398
>                 URL: https://issues.apache.org/jira/browse/IGNITE-15398
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Andrey Mashenkov
>            Assignee: Mirza Aliev
>            Priority: Blocker
>              Labels: ignite-3
>             Fix For: 3.0.0-alpha3
>
>         Attachments: screenshot-1.png, screenshot-2.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I ran a simple test and hit an OOM on 7 of 100 iterations.
> Thread leakage seems to be the cause.
> Use the JVM arg `-Xmx512M` to run the test; otherwise more iterations may be 
> required.
> {code:java}
> @RepeatedTest(100)
> public void nodeRestart100Test() throws Exception {
>     List<Ignite> grid = startGrid();
>     IgniteUtils.closeAll(Lists.reverse(grid));
> }
> {code}
> A thread dump shows a huge number of parked NioEventLoopGroup and 
> JRaft-Request-Processor threads.
> Further investigation shows that most of the NioEventLoopGroup threads are acceptor 
> threads created in the startEndpoint() method of the RestModule and ClientModule 
> classes.
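
The thread-dump check described in the report can also be done from inside a test by counting live threads whose names start with a given prefix. This is a minimal sketch using only the JDK. {{ThreadLeakCheck}} is a hypothetical helper, and the prefix to pass depends on how the threads are named in the build under test.

```java
/** Hypothetical helper for asserting that no threads with a given name prefix are left. */
final class ThreadLeakCheck {
    private ThreadLeakCheck() {
    }

    /** Returns the number of live threads whose name starts with the given prefix. */
    static long countLiveThreads(String namePrefix) {
        // getAllStackTraces() returns a snapshot keyed by every live thread in the JVM.
        return Thread.getAllStackTraces().keySet().stream()
            .filter(Thread::isAlive)
            .filter(t -> t.getName().startsWith(namePrefix))
            .count();
    }
}
```

Comparing the count before starting the grid with the count after closing it would show whether an iteration left threads behind, without waiting for an OOM.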



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
