[ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952229#comment-16952229
 ] 

Stephen O'Donnell commented on HDFS-14854:
------------------------------------------

For LowRedundancyBlocks, any changes certainly belong in a separate Jira. This 
one is already large enough. I am also wary of a major refactor in a critical 
area when what is there generally works quite well.

On your further comments:
 # I will move all the locks outside of the try blocks.
 # I will clean this part up.
 # I am going to keep what is there. The pattern was established this way in 
the default monitor, so both of them work in roughly the same way. The current 
approach also lends some flexibility to throttling the number of nodes which 
have their storage scanned in a pass of the check loop in a similar way to the 
default monitor, if we later decided that is needed.
 # I will add the Javadoc.
 # In general this loop will find blocks it needs to move to the pending list, 
and if there are no nodes decommissioning this code will never be called. On 
balance I feel the extra information we get from taking the lock is worth 
it. Additionally, in the BackOffMonitor the locking is much less aggressive than 
what it replaces, so we should be good there.
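
For reference, point 1 above is the usual Java idiom of acquiring a lock 
before entering the try block, so that the finally clause only ever releases a 
lock that is actually held. The sketch below is a hypothetical illustration, 
not code from the patch: the class and method names are invented, and a plain 
java.util.concurrent ReentrantReadWriteLock stands in for the namenode lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical illustration only; not taken from the HDFS-14854 patch. */
class LockOutsideTry {
  // Stand-in for the namenode lock (assumption for this sketch).
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private int pendingBlocks = 0;

  /** Adds to the pending count under the write lock. */
  void addPending(int count) {
    // Acquire outside the try: if lock() itself failed, the finally block
    // would otherwise call unlock() on a lock this thread does not hold,
    // which throws IllegalMonitorStateException and hides the real error.
    lock.writeLock().lock();
    try {
      if (count < 0) {
        throw new IllegalArgumentException("count must not be negative");
      }
      pendingBlocks += count;
    } finally {
      // Always released, including when the body throws.
      lock.writeLock().unlock();
    }
  }

  /** Reads the pending count under the read lock. */
  int getPending() {
    lock.readLock().lock();
    try {
      return pendingBlocks;
    } finally {
      lock.readLock().unlock();
    }
  }

  /** True if some thread currently holds the write lock. */
  boolean isWriteLocked() {
    return lock.isWriteLocked();
  }
}
```

With this shape, an exception thrown inside the guarded section still releases 
the lock, and a failure while acquiring it never reaches the finally clause.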

> Create improved decommission monitor implementation
> ---------------------------------------------------
>
>                 Key: HDFS-14854
>                 URL: https://issues.apache.org/jira/browse/HDFS-14854
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, 
> HDFS-14854.002.patch, HDFS-14854.003.patch, HDFS-14854.004.patch, 
> HDFS-14854.005.patch, HDFS-14854.006.patch, HDFS-14854.007.patch, 
> HDFS-14854.008.patch, HDFS-14854.009.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue, and under-replicated 
> blocks from a future node or disk failure may wait for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicated, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
