[ 
https://issues.apache.org/jira/browse/HDFS-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-7541:
--------------------------
    Attachment: HDFS-7541-3.patch

We have been running the upgrade domain policy on one of our large production 
clusters. Here are the results:


* No perf impact on write operations, specifically the AddBlock RPC latency.
* All blocks have been migrated to the upgrade domain policy.


Here is the updated version of the patch. I would appreciate any high-level 
comments on the design. If people are OK with the approach, I will open 
subtasks.

During this work, we also found that the balancer has a hard-coded rack-based 
policy instead of leveraging the block placement policy, e.g. HDFS-1431. We 
should follow up on this so that the balancer doesn't need to be modified when 
we introduce a new block placement policy.
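
To illustrate the balancer point, here is a minimal sketch of what delegating 
the check to a pluggable policy could look like. Everything in it is 
hypothetical and is not taken from the patch or from the HDFS code base: the 
names BalancerPolicySketch, Replica, PlacementCheck and canMove are made up, 
and both rules are simplified assumptions (the rack rule only requires two 
racks, the upgrade domain rule requires every replica to be in a distinct 
upgrade domain).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BalancerPolicySketch {

    /** Hypothetical replica location: the rack and upgrade domain of its datanode. */
    static final class Replica {
        final String rack;
        final String upgradeDomain;

        Replica(String rack, String upgradeDomain) {
            this.rack = rack;
            this.upgradeDomain = upgradeDomain;
        }
    }

    /** Hypothetical hook: the balancer asks the policy instead of hard-coding rack rules. */
    interface PlacementCheck {
        boolean isSatisfied(List<Replica> replicas);
    }

    /** Rack-based rule (simplified): two or more replicas must span at least two racks. */
    static final PlacementCheck RACK_BASED = replicas -> {
        Set<String> racks = new HashSet<>();
        for (Replica r : replicas) {
            racks.add(r.rack);
        }
        return replicas.size() < 2 || racks.size() >= 2;
    };

    /** Upgrade-domain rule (simplified): every replica sits in a distinct upgrade domain. */
    static final PlacementCheck UPGRADE_DOMAIN = replicas -> {
        Set<String> domains = new HashSet<>();
        for (Replica r : replicas) {
            domains.add(r.upgradeDomain);
        }
        return domains.size() == replicas.size();
    };

    /** A move is allowed only if the placement after the move still satisfies the policy. */
    static boolean canMove(PlacementCheck policy, List<Replica> replicas,
                           Replica source, Replica target) {
        List<Replica> after = new ArrayList<>(replicas);
        after.remove(source); // matches by identity, since equals() is not overridden
        after.add(target);
        return policy.isSatisfied(after);
    }

    public static void main(String[] args) {
        Replica a = new Replica("rack1", "ud1");
        Replica b = new Replica("rack1", "ud2");
        Replica c = new Replica("rack2", "ud3");
        List<Replica> replicas = List.of(a, b, c);
        // The target is on a new rack but shares upgrade domain ud2 with replica b.
        Replica target = new Replica("rack3", "ud2");

        System.out.println(canMove(RACK_BASED, replicas, a, target));     // true
        System.out.println(canMove(UPGRADE_DOMAIN, replicas, a, target)); // false
    }
}
```

In this sketch the same move passes the rack rule but fails the upgrade domain 
rule, which is the kind of case a balancer with only a rack check would miss.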

> Upgrade Domains in HDFS
> -----------------------
>
>                 Key: HDFS-7541
>                 URL: https://issues.apache.org/jira/browse/HDFS-7541
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>         Attachments: HDFS-7541-3.patch, HDFS-7541.patch, 
> SupportforfastHDFSdatanoderollingupgrade.pdf, UpgradeDomains_design_v2.pdf
>
>
> The current HDFS DN rolling upgrade step requires sequential DN restarts to 
> minimize the impact on data availability and read/write operations. The side 
> effect is a longer upgrade duration for large clusters. This might be 
> acceptable for a quick DN JVM restart to update Hadoop code/configuration. 
> However, for an OS upgrade that requires a machine reboot, the overall upgrade 
> duration will be too long if we continue to do sequential DN rolling restarts.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
