[ https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz Wo Nicholas Sze updated HDFS-8578:
--------------------------------------
    Attachment: h8578_20151210.patch

h8578_20151210.patch: execute hardlink tasks in parallel. The patch is big because of many method signature changes. The change itself is simple and safe since the hard link code is almost entirely static, so there are no synchronization issues.

> On upgrade, Datanode should process all storage/data dirs in parallel
> ---------------------------------------------------------------------
>
>                 Key: HDFS-8578
>                 URL: https://issues.apache.org/jira/browse/HDFS-8578
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Raju Bairishetti
>            Assignee: Vinayakumar B
>            Priority: Critical
>         Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch,
> HDFS-8578-03.patch, HDFS-8578-04.patch, HDFS-8578-05.patch,
> HDFS-8578-06.patch, HDFS-8578-07.patch, HDFS-8578-08.patch,
> HDFS-8578-09.patch, HDFS-8578-10.patch, HDFS-8578-11.patch,
> HDFS-8578-12.patch, HDFS-8578-13.patch, HDFS-8578-14.patch,
> HDFS-8578-15.patch, HDFS-8578-branch-2.6.0.patch,
> HDFS-8578-branch-2.7-001.patch, HDFS-8578-branch-2.7-002.patch,
> HDFS-8578-branch-2.7-003.patch, h8578_20151210.patch
>
>
> Right now, during upgrades, the datanode processes all storage dirs
> sequentially. Assuming it takes ~20 minutes to process a single storage dir,
> a datanode with ~10 disks will take around 3 hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>    for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>      doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>      assert getCTime() == nsInfo.getCTime()
>          : "Data-node and name-node CTimes must be the same.";
>    }
> {code}
> It would save a lot of time during major upgrades if the datanode processed
> all storage dirs/disks in parallel.
> Can we make the datanode process all storage dirs in parallel?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
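
For illustration only, the parallel processing requested in the issue description could be sketched roughly as follows. This is not the code from any of the attached patches. The class ParallelStorageDirSketch, the DirTask interface, and the processAll method are hypothetical names, and the per-directory work (doTransition in the quoted loop) is replaced by a stand-in task. The sketch assumes the per-directory work is independent across directories.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelStorageDirSketch {

    /** Hypothetical stand-in for the per-directory upgrade work. */
    public interface DirTask {
        void process(String storageDir) throws Exception;
    }

    /**
     * Submits one task per storage dir to a fixed-size pool and waits for all
     * of them. Returns the dirs that completed, in submission order. If any
     * task failed, the first failure is rethrown after all tasks have finished.
     */
    public static List<String> processAll(List<String> storageDirs,
                                          int numThreads,
                                          DirTask task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String dir : storageDirs) {
                Callable<String> work = () -> {
                    task.process(dir);
                    return dir;
                };
                futures.add(pool.submit(work));
            }

            List<String> done = new ArrayList<>();
            Exception firstFailure = null;
            for (Future<String> f : futures) {
                try {
                    done.add(f.get()); // blocks until this dir is finished
                } catch (ExecutionException e) {
                    if (firstFailure == null) {
                        firstFailure = e;
                    }
                }
            }
            if (firstFailure != null) {
                throw firstFailure;
            }
            return done;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up directory names; the sleep simulates slow per-dir work.
        List<String> dirs = Arrays.asList("/data/1", "/data/2", "/data/3");
        List<String> done = processAll(dirs, dirs.size(), dir -> Thread.sleep(100));
        System.out.println("Processed: " + done);
    }
}
```

With one thread per directory, the total wall-clock time is roughly that of the slowest directory rather than the sum over all directories, which is the saving the description asks for. In a real datanode the pool size would probably need to be bounded and configurable.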