[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

Hudson (Jira) Thu, 12 Dec 2019 07:51:21 -0800


    [ 
https://issues.apache.org/jira/browse/HDFS-14854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994802#comment-16994802
 ]


Hudson commented on HDFS-14854:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17756 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17756/])
HDFS-15047. Document the new decommission monitor (HDFS-14854). (#1755) 
(github: rev bdd00f10b46c1c856433e2948906f36c70d3a0be)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsDataNodeAdminGuide.md


> Create improved decommission monitor implementation
> ---------------------------------------------------
>
>                 Key: HDFS-14854
>                 URL: https://issues.apache.org/jira/browse/HDFS-14854
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: 012_to_013_changes.diff, 
> Decommission_Monitor_V2_001.pdf, HDFS-14854.001.patch, HDFS-14854.002.patch, 
> HDFS-14854.003.patch, HDFS-14854.004.patch, HDFS-14854.005.patch, 
> HDFS-14854.006.patch, HDFS-14854.007.patch, HDFS-14854.008.patch, 
> HDFS-14854.009.patch, HDFS-14854.010.patch, HDFS-14854.011.patch, 
> HDFS-14854.012.patch, HDFS-14854.013.patch, HDFS-14854.014.patch
>
>
> In HDFS-13157, we discovered a series of problems with the current 
> decommission monitor implementation, such as:
>  * Blocks are replicated sequentially disk by disk and node by node, and 
> hence the load is not spread well across the cluster
>  * Adding a node for decommission can cause the namenode write lock to be 
> held for a long time.
>  * Decommissioning nodes floods the replication queue and under replicated 
> blocks from a future node or disk failure may way for a long time before they 
> are replicated.
>  * Blocks pending replication are checked many times under a write lock 
> before they are sufficiently replicate, wasting resources
> In this Jira I propose to create a new implementation of the decommission 
> monitor that resolves these issues. As it will be difficult to prove one 
> implementation is better than another, the new implementation can be enabled 
> or disabled giving the option of the existing implementation or the new one.
> I will attach a pdf with some more details on the design and then a version 1 
> patch shortly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14854) Create improved decommission monitor implementation

Reply via email to