[ https://issues.apache.org/jira/browse/HDFS-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778419#comment-13778419 ]
Tsz Wo (Nicholas), SZE commented on HDFS-4988:
----------------------------------------------

- In LayoutVersion, let's change DATANODEID to DATANODE_ID so that it is consistent with other versions such as SEQUENTIAL_BLOCK_ID.
- In the code below, the change is unnecessary since the DatanodeID constructor has already taken care of the empty case.
{code}
-    return new DatanodeID(dn.getIpAddr(), dn.getHostName(), dn.getDatanodeUuid(),
-        dn.getXferPort(), dn.getInfoPort(), dn.getIpcPort());
+    return new DatanodeID(
+        dn.getIpAddr(),
+        dn.getHostName(),
+        dn.getDatanodeUuid().isEmpty() ? null : dn.getDatanodeUuid(),
{code}
- In DatanodeManager.getDatanode(datanodeUuid), is there a legitimate case where datanodeUuid is null? If no, we should throw NPE.
-* If yes, let's change the code below to use getDatanode(nodeReg.getDatanodeUuid());
{code}
-    DatanodeDescriptor nodeS = datanodeMap.get(nodeReg.getDatanodeUuid());
+    DatanodeDescriptor nodeS =
+        nodeReg.getDatanodeUuid() == null ? null :
+        datanodeMap.get(nodeReg.getDatanodeUuid());
{code}
- It seems that StorageDirectory.storageID should be moved to DataStorage. StorageDirectory is a lower-level class which takes care of the directory in the OS, while DataStorage is a higher-level class. BTW, do you want to call it storageUuid?
- In BPServiceActor.blockReport(), we should add the type arguments when using Iterator and Map.Entry. Then we do not need to cast the key and value types.
{code}
+    Iterator it = perVolumeBlockLists.entrySet().iterator();
+    int i = 0;
+    while (it.hasNext()) {
+      Map.Entry kvPair = (Map.Entry) it.next();
+      String storageID = (String) kvPair.getKey();
+      BlockListAsLongs blockList = (BlockListAsLongs) kvPair.getValue();
{code}
BTW, the code above can be simplified with a for-each loop, i.e.
{code}
    for(Map.Entry<String, BlockListAsLongs> kvPair : perVolumeBlockLists.entrySet()) {
      final String storageID = kvPair.getKey();
      final BlockListAsLongs blockList = kvPair.getValue();
{code}
- In DataStorage, should cachedDatanodeUuid be moved to DataNode? Also, would calling it "datanodeUuid" be good enough? For example, we use storageID (or storageUuid) but not cachedStorageID.
- The new DataNode.getDataDirs() method is not used. Let's remove it.
- In the new FsVolumeSpi.getStorageID() method, do you want to call it getStorageUuid()?
- Why remove the following TODO comment?
{code}
-    // TODO: getStorageTypeFromLocations() is only a temporary workaround and
-    // should be replaced with getting storage type from DataStorage (missing
-    // storage type now) directly.
+    Storage.StorageDirectory sd = storage.getStorageDir(idx);
+    final File dir = sd.getCurrentDir();
     final StorageType storageType = getStorageTypeFromLocations(dataLocations, dir);
{code}
- For the new perVolumeReplicaMap, it may be better to change ReplicaMap or add a new class (say VolumeReplicaMap, which has a volume-to-ReplicaMap map) to support the per-volume feature. In that way, we won't have duplicated block info and don't have to worry about the consistency issue between two different maps. The following points are related to this.
-* For code like perVolumeReplicaMap.get(v).add(bpid, newReplicaInfo), would get(v) return null?
-* The volumeMap is passed around, e.g. in BlockPoolSlice.addToReplicasMap(..), and it may mutate. Would it be a problem if we have two maps?
-* The type of perVolumeReplicaMap should be Map<FsVolumeSpi, ReplicaMap>. Then we do not have to cast the key to FsVolumeImpl in FsDatasetImpl.checkAndUpdate(..) and other methods.
- The old FsDatasetSpi.getBlockReport(String bpid) method should be removed. It is only used in tests.
- I suggest not manually fixing the imports. We may use IDE tools to fix them (e.g. cmd-shift-o in Eclipse.)
{code}
-import java.util.ArrayList;
-import java.util.Collections;
-import java.util.Iterator;
-import java.util.List;
+import java.util.*;
{code}

> Datanode must support all the volumes as individual storages
> ------------------------------------------------------------
>
>         Key: HDFS-4988
>         URL: https://issues.apache.org/jira/browse/HDFS-4988
>     Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>    Reporter: Suresh Srinivas
>    Assignee: Arpit Agarwal
> Attachments: HDFS-4988.01.patch, HDFS-4988.02.patch, HDFS-4988.05.patch, HDFS-4988.06.patch, HDFS-4988.07.patch
>
> Currently all the volumes on a datanode are reported as a single storage. This change proposes reporting them as individual storages. This requires:
> # A unique storage ID for each storage
> #* This needs to be generated during formatting
> # There should be an option to allow existing disks to be reported as a single storage unit for backward compatibility.
> # Functionality is also needed to split all existing volumes from a single storage unit into individual storage units.
> # -Configuration must allow a storage type attribute for each storage unit. (Now HDFS-5000)-
> # Block reports must be sent on a per-storage basis. In some cases (such as a memory tier) block reports may need to be sent more frequently. That means the block reporting period must be on a per-storage-type basis.
> My proposal is for new clusters to configure volumes as separate storage units by default. Let's discuss.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
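As an aside, the typed for-each iteration suggested for BPServiceActor.blockReport() can be sketched in isolation. This is a minimal, compilable example, not the actual Hadoop code: the class name PerVolumeReportDemo, the storage IDs, and the use of long[] as a stand-in for BlockListAsLongs are all placeholders for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class PerVolumeReportDemo {
    public static void main(String[] args) {
        // Hypothetical per-volume map; long[] stands in for BlockListAsLongs.
        // TreeMap is used only to make the iteration order deterministic.
        Map<String, long[]> perVolumeBlockLists = new TreeMap<>();
        perVolumeBlockLists.put("DS-volume-1", new long[] { 101L, 102L });
        perVolumeBlockLists.put("DS-volume-2", new long[] { 103L });

        // Typed for-each over entrySet(): no raw Iterator, no key/value casts.
        for (Map.Entry<String, long[]> kvPair : perVolumeBlockLists.entrySet()) {
            final String storageID = kvPair.getKey();
            final long[] blockList = kvPair.getValue();
            System.out.println(storageID + ": " + blockList.length + " blocks");
        }
    }
}
```

Because the type arguments appear on Map.Entry, the compiler checks the key and value types and the casts from the raw-typed version disappear.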