[ 
https://issues.apache.org/jira/browse/HDFS-8578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590925#comment-14590925
 ] 

Colin Patrick McCabe commented on HDFS-8578:
--------------------------------------------

bq. Just realized that, parallelism should be added at the datanode level in 
DataStorage#addStorageLocations(). Not in the BlockPoolSliceStorage.

OK, so your idea is that that will allow us to parallelize upgrades between 
different block pools.  Fair enough.

bq. And during startup, If any one of the blockpool storage directories failed 
to load/upgrade also, datanode continues to start with other directories 
available. Only if all directories are failed to load then only it will fail.

I don't think this is true.  The DN will not start up if more than 
{{dfs.datanode.failed.volumes.tolerated}} volumes have failed, as per this code:

{code}
  /**
   * An FSDataset has a directory where it loads its data files.
   */
  FsDatasetImpl(DataNode datanode, DataStorage storage, Configuration conf
      ) throws IOException {
...
    if (volsFailed > volFailuresTolerated) {
      throw new DiskErrorException("Too many failed volumes - "
          + "current valid volumes: " + storage.getNumStorageDirs()
          + ", volumes configured: " + volsConfigured
          + ", volumes failed: " + volsFailed
          + ", volume failures tolerated: " + volFailuresTolerated);
    }
{code}

bq. Updated the patch, please review.

ok

> On upgrade, Datanode should process all storage/data dirs in parallel
> ---------------------------------------------------------------------
>
>                 Key: HDFS-8578
>                 URL: https://issues.apache.org/jira/browse/HDFS-8578
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Raju Bairishetti
>            Priority: Critical
>         Attachments: HDFS-8578-01.patch, HDFS-8578-02.patch
>
>
> Right now, during upgrades datanode is processing all the storage dirs 
> sequentially. Assume it takes ~20 mins to process a single storage dir then  
> datanode which has ~10 disks will take around 3hours to come up.
> *BlockPoolSliceStorage.java*
> {code}
>    for (int idx = 0; idx < getNumStorageDirs(); idx++) {
>       doTransition(datanode, getStorageDir(idx), nsInfo, startOpt);
>       assert getCTime() == nsInfo.getCTime() 
>           : "Data-node and name-node CTimes must be the same.";
>     }
> {code}
> It would save lots of time during major upgrades if datanode process all 
> storagedirs/disks parallelly.
> Can we make datanode to process all storage dirs parallelly?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to