[ https://issues.apache.org/jira/browse/HDFS-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129441#comment-14129441 ]

Colin Patrick McCabe commented on HDFS-6727:
--------------------------------------------

Can we use a word other than "decommissioning" to describe getting rid of 
volumes?  How about "volume deactivation"?  People might confuse it with 
DataNode decommissioning, which is a separate feature.

{code}
 +        LOG.info("Adding new volumes: " + newLocations);
{code}
This should be {{Joiner.on(",").join(newLocations)}} or similar.  The default 
{{List#toString}} output may not be what we want here.
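For reference, a minimal version of what I mean (this assumes Guava's 
{{Joiner}}, which Hadoop already depends on):

{code}
import com.google.common.base.Joiner;

// Join the locations explicitly instead of relying on List#toString.
LOG.info("Adding new volumes: " + Joiner.on(",").join(newLocations));
{code}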

{code}
+        for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) {
+          data.addBlockPool(bpos.getNamespaceInfo().getBlockPoolID(), conf);
+        }
{code}

It seems like we should add any new block pool objects before we add the 
volumes that reference them?
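Concretely, I'd expect the ordering to look roughly like this (just a sketch 
that reuses the names from the patch):

{code}
// Register any new block pools first...
for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) {
  data.addBlockPool(bpos.getNamespaceInfo().getBlockPoolID(), conf);
}
// ...and only then add the volumes that reference those block pools.
storage.addStorageLocations(this, nsInfo, newLocations,
    getStartupOption(conf));
{code}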

{code}
+          storage.addStorageLocations(this, nsInfo, newLocations,
+              getStartupOption(conf));
{code}
I traced this code a bit, and it seems to be checking the startup option 
against {{StartupOption.FORMAT}} to see if it should format the new 
directories.  This raises an important question: are we going to allow users to 
add directories that already contain block files and so forth?  Or is this 
feature just for adding new empty directories (that need to be formatted)?  
Either way, using the startup option from the conf doesn't seem correct here.

I think we should only allow either empty directories, or directories at the 
same layout version as the current datanode, to be added.  Trying to do 
upgrades on the contents will lead to more complexity, and is something we 
should put off until a future JIRA.
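To make that concrete, the check could look something like the following 
(purely illustrative; {{readLayoutVersion}} is a stand-in for whatever 
{{Storage}} helper fits):

{code}
// Illustrative sketch: accept a new directory only if it is empty
// (we format it) or already at the DataNode's current layout version.
private void validateNewStorageDir(File dir, int currentLayoutVersion)
    throws IOException {
  String[] contents = dir.list();
  if (contents == null) {
    throw new IOException("Cannot read directory " + dir);
  }
  if (contents.length == 0) {
    return;  // empty: safe to format as part of the add
  }
  int lv = readLayoutVersion(dir);  // hypothetical helper
  if (lv != currentLayoutVersion) {
    throw new IOException(dir + " is at layout version " + lv
        + " but the DataNode is at " + currentLayoutVersion
        + "; in-place upgrade should wait for a future JIRA");
  }
}
{code}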

{code}
+    } catch (IOException e) {
+      LOG.warn("There is IOException when refreshing volumes! "
+          + "Recover configurations: " + DFS_DATANODE_DATA_DIR_KEY
+          + " = " + oldVolumes, e);
+      // TODO Roll back changes.
+      throw e;
+    }
{code}

I think we need a better error management story here.  I don't think we should 
try to roll back to the original state; if we fail to do that, we're in an even 
bigger mess.  Instead, we should have a catch block around each individual 
operation that can fail, and clearly point out in the log message which 
operation failed (removal, addition, etc.).  Interestingly, removing volumes 
seems to be declared not to throw IOE, so maybe it can't fail.
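For example (method names like {{addVolumes}} / {{removeVolumes}} are 
illustrative here, not taken from the patch):

{code}
// Sketch: catch around each operation so the log pinpoints what failed,
// rather than attempting a global rollback.
try {
  data.addVolumes(volumesToAdd);  // hypothetical method name
} catch (IOException e) {
  LOG.warn("Failed to add volumes: " + Joiner.on(",").join(volumesToAdd), e);
  throw e;
}
// Removal is currently declared not to throw IOException; if that ever
// changes, give it its own catch block with an equally specific message.
data.removeVolumes(volumesToRemove);  // hypothetical method name
{code}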

> Refresh data volumes on DataNode based on configuration changes
> ---------------------------------------------------------------
>
>                 Key: HDFS-6727
>                 URL: https://issues.apache.org/jira/browse/HDFS-6727
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: 2.5.0, 2.4.1
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>              Labels: datanode
>         Attachments: HDFS-6727.000.delta-HDFS-6775.txt, HDFS-6727.001.patch, 
> HDFS-6727.002.patch, HDFS-6727.combo.patch
>
>
> HDFS-1362 requires the DataNode to reload its configuration file at runtime, 
> so that the DN can change its data volumes dynamically. This JIRA reuses the 
> reconfiguration framework introduced by HADOOP-7001 to enable the DN to 
> reconfigure itself at runtime.


