[ 
https://issues.apache.org/jira/browse/HDFS-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187857#comment-14187857
 ] 

Colin Patrick McCabe commented on HDFS-7035:
--------------------------------------------

I think it would be easier to simply:

* step #1: add the new DataStorages
* step #2: for all the DataStorages that we successfully added, call 
FsDatasetSpi#addVolumes
* step #3: for all the volumes that failed in step #2, remove the DataStorages

Then we don't have to change the {{FsDatasetSpi}} interface again.  We don't 
have to create a new callback interface with {{commit}}, and the patch gets a 
lot smaller.

I can see there may be some benefit to parallelizing searching the new 
directory for replica files, but this seems like an optimization we should do 
in a separate JIRA.  Of course, we can do that inside {{FsDatasetSpi}} without 
changing the interface, the same way you're doing it here... with Futures.  My 
impression is that the amount of work we do in DataStorage is really small, so 
there isn't any need to parallelize the DataStorage stuff, only the scanning of 
the replicas.

I'm having trouble figuring out the locking for the callback-based approach 
here.  Right now we're holding the dn lock when calling {{commit}}... but I 
don't see why.  Locking would be clearer if we just avoided callbacks and went 
for the straight-line approach, I think.  What do you think?

{code}
506    int numOldDataDirs = 0;
507         synchronized (this) {
508           numOldDataDirs = dataDirs.size();
509           dataDirs = locations;
510         }
{code}
Do we still need to set {{DataNode#dataDirs}} here?  And I don't see where we 
correct this on failure, if we fail.

> Make adding volume an atomic operation.
> ---------------------------------------
>
>                 Key: HDFS-7035
>                 URL: https://issues.apache.org/jira/browse/HDFS-7035
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>         Attachments: HDFS-7035.000.combo.patch, HDFS-7035.000.patch, 
> HDFS-7035.001.combo.patch, HDFS-7035.001.patch, HDFS-7035.002.patch, 
> HDFS-7035.003.patch, HDFS-7035.003.patch, HDFS-7035.004.patch, 
> HDFS-7035.005.patch, HDFS-7035.007.patch, HDFS-7035.008.patch, 
> HDFS-7035.009.patch, HDFS-7035.010.patch, HDFS-7035.010.patch, 
> HDFS-7035.011.patch
>
>
> It refactors {{DataStorage}} and {{BlockPoolSliceStorage}} to reduce the 
> duplicate code and supports atomic adding volume operations. Also it 
> parallels loading data volume operation: each thread loads one volume.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to