[ https://issues.apache.org/jira/browse/HDFS-7541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ming Ma updated HDFS-7541:
--------------------------
    Attachment: HDFS-7541-3.patch

We have been running the upgrade domain policy on one of our large production clusters. Here are the results:

* No performance impact on write operations, specifically on the RPC AddBlock latency.
* All blocks have been migrated to the upgrade domain policy.

Here is the updated version of the patch. I would appreciate any high-level comments on the design. If people are OK with the approach, I will open subtasks.

During the work, we also found that the balancer has a hard-coded rack-based policy instead of leveraging the block placement policy, e.g. HDFS-1431. This is something we should follow up on so that the balancer doesn’t need to be modified when we introduce a new block placement policy.

> Upgrade Domains in HDFS
> -----------------------
>
>                 Key: HDFS-7541
>                 URL: https://issues.apache.org/jira/browse/HDFS-7541
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>         Attachments: HDFS-7541-3.patch, HDFS-7541.patch,
> SupportforfastHDFSdatanoderollingupgrade.pdf, UpgradeDomains_design_v2.pdf
>
>
> The current HDFS DN rolling upgrade step requires sequential DN restarts to
> minimize the impact on data availability and read/write operations. The side
> effect is a longer upgrade duration for large clusters. This might be
> acceptable for a quick DN JVM restart to update Hadoop code/configuration.
> However, for an OS upgrade that requires a machine reboot, the overall upgrade
> duration will be too long if we continue to do sequential DN rolling restarts.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)