[ 
https://issues.apache.org/jira/browse/HDFS-16668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tai Zhou updated HDFS-16668:
----------------------------
    Description: 
Hi, 

I have recently been working on an HDFS smart storage management project that 
is based on the Mover in the hadoop-hdfs project. I noticed that most of the 
code in Mover is similar to Balancer. However, Mover does not clean up its 
MoverExecutor the way Balancer does.

If we have multiple NameSystems behind the Namenode Connectors, or a large 
number of datanodes, Mover leaks threads, because it may need numerous 
iterations to process these namespaces. In our project, we modified some source 
code so that we can call mover.run() whenever we find blocks that do not match 
the expected storage policies. Our application therefore initializes Namenode 
Connectors and Movers continually. As a result, we end up with thousands of 
threads or thread pools for MoverExecutor.
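
The leak pattern described above can be illustrated with a minimal, 
self-contained sketch. It does not use any Hadoop code: the class 
ExecutorCleanupSketch, its methods, and the pool size of 4 are hypothetical 
names and values chosen for illustration, and a plain 
java.util.concurrent.ExecutorService stands in for MoverExecutor. It only 
assumes that each iteration creates its own thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch, not Hadoop code: each "iteration" creates its own
 * thread pool, the way a per-iteration mover executor would.
 */
public class ExecutorCleanupSketch {

  /** Runs the given number of iterations and returns the pools it created. */
  static List<ExecutorService> runIterations(int iterations, boolean cleanUp)
      throws InterruptedException {
    List<ExecutorService> pools = new ArrayList<>();
    for (int i = 0; i < iterations; i++) {
      ExecutorService pool = Executors.newFixedThreadPool(4);
      pools.add(pool);
      try {
        // Stand-in for scheduling block moves in one iteration.
        pool.submit(() -> { });
      } finally {
        if (cleanUp) {
          // Release the worker threads before the next iteration starts.
          pool.shutdown();
          pool.awaitTermination(10, TimeUnit.SECONDS);
        }
      }
    }
    return pools;
  }

  /** Counts pools that have not terminated, i.e. still hold idle workers. */
  static long countLeaked(List<ExecutorService> pools) {
    return pools.stream().filter(p -> !p.isTerminated()).count();
  }

  public static void main(String[] args) throws InterruptedException {
    List<ExecutorService> leaky = runIterations(5, false);
    List<ExecutorService> clean = runIterations(5, true);
    System.out.println("without cleanup, live pools: " + countLeaked(leaky));
    System.out.println("with cleanup, live pools: " + countLeaked(clean));
    // Shut down the deliberately leaked pools so the JVM can exit.
    for (ExecutorService p : leaky) {
      p.shutdownNow();
    }
  }
}
```

Without the shutdown in the finally block, every pool keeps its idle worker 
threads parked, so the number of waiting threads grows with the number of 
iterations.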

Here is what it looks like. We can see 9000+ threads like this in the WAIT 
state.

!screenshot-2.png|width=558,height=209!

I know that users generally do not use Mover the way we do; they are more 
likely to run it from the CLI. But more and more users are planning to adopt 
RBF or multiple NameSystems, or to run large clusters of datanodes. In those 
cases the Mover CLI has to keep thousands of threads alive after the user 
presses the Enter key.

I have prepared a quick fix. If you are interested, please take a look at it.
Thanks.

> Clean up MoverExecutor after each iteration to avoid potential thread leak
> --------------------------------------------------------------------------
>
>                 Key: HDFS-16668
>                 URL: https://issues.apache.org/jira/browse/HDFS-16668
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.3.3
>            Reporter: Tai Zhou
>            Priority: Major
>         Attachments: screenshot-1.png, screenshot-2.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
