[ 
https://issues.apache.org/jira/browse/HDFS-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Sirianni updated HDFS-5667:
--------------------------------

    Description: 
The fix for HDFS-5484 was accidentally regressed by the following change made 
via HDFS-5542
{code}
+  DatanodeStorageInfo updateStorage(DatanodeStorage s) {
     synchronized (storageMap) {
       DatanodeStorageInfo storage = storageMap.get(s.getStorageID());
       if (storage == null) {
@@ -670,8 +658,6 @@
                  " for DN " + getXferAddr());
         storage = new DatanodeStorageInfo(this, s);
         storageMap.put(s.getStorageID(), storage);
-      } else {
-        storage.setState(s.getState());
       }
       return storage;
     }
{code}

By removing the 'else' and no longer updating the state in the BlockReport 
processing path, we effectively get the bogus state & type that is set via the 
first heartbeat (see the fix for HDFS-5455):

{code}
+      if (storage == null) {
+        // This is seen during cluster initialization when the heartbeat
+        // is received before the initial block reports from each storage.
+        storage = updateStorage(new DatanodeStorage(report.getStorageID()));
{code}
Even reverting the change and reintroducing the 'else' leaves the state & type 
temporarily inaccurate until the first block report. 

As discussed with [~arpitagarwal], a better fix would be to simply include the 
full {{DatanodeStorage}} object in the {{StorageReport}} (as opposed to only 
the Storage ID).  This requires adding the {{DatanodeStorage}} object to 
{{StorageReportProto}}. It needs to be a new optional field and we cannot 
remove the existing {{StorageUuid}} for protocol compatibility.

  was:
The fix for HDFS-5484 was accidentally regressed by the following change made 
via HDFS-5542
{code}
+  DatanodeStorageInfo updateStorage(DatanodeStorage s) {
     synchronized (storageMap) {
       DatanodeStorageInfo storage = storageMap.get(s.getStorageID());
       if (storage == null) {
@@ -670,8 +658,6 @@
                  " for DN " + getXferAddr());
         storage = new DatanodeStorageInfo(this, s);
         storageMap.put(s.getStorageID(), storage);
-      } else {
-        storage.setState(s.getState());
       }
       return storage;
     }
{code}

By removing the 'else' and no longer updating the state in the BlockReport 
processing path, we effectively get the bogus state & type that is set via the 
first heartbeat (see the fix for HDFS-5455):

{code}
+      if (storage == null) {
+        // This is seen during cluster initialization when the heartbeat
+        // is received before the initial block reports from each storage.
+        storage = updateStorage(new DatanodeStorage(report.getStorageID()));
{code}
Even reverting the change and reintroducing the 'else' leaves the state & type 
temporarily inaccurate until the first block report. 

As discussed with [~arpitagarwal], a better fix would be to simply include the 
full DatanodeStorage object in the StorageReport (as opposed to only the 
Storage ID).  This requires adding the {{DatanodeStorage}} object to 
{{StorageReportProto}}. It needs to be a new optional field and we cannot 
remove the existing {{StorageUuid}} for protocol compatibility.


> StorageType and State in DatanodeStorageInfo in NameNode is not accurate
> ------------------------------------------------------------------------
>
>                 Key: HDFS-5667
>                 URL: https://issues.apache.org/jira/browse/HDFS-5667
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: Heterogeneous Storage (HDFS-2832)
>            Reporter: Eric Sirianni
>             Fix For: Heterogeneous Storage (HDFS-2832)
>
>
> The fix for HDFS-5484 was accidentally regressed by the following change made 
> via HDFS-5542
> {code}
> +  DatanodeStorageInfo updateStorage(DatanodeStorage s) {
>      synchronized (storageMap) {
>        DatanodeStorageInfo storage = storageMap.get(s.getStorageID());
>        if (storage == null) {
> @@ -670,8 +658,6 @@
>                   " for DN " + getXferAddr());
>          storage = new DatanodeStorageInfo(this, s);
>          storageMap.put(s.getStorageID(), storage);
> -      } else {
> -        storage.setState(s.getState());
>        }
>        return storage;
>      }
> {code}
> By removing the 'else' and no longer updating the state in the BlockReport 
> processing path, we effectively get the bogus state & type that is set via 
> the first heartbeat (see the fix for HDFS-5455):
> {code}
> +      if (storage == null) {
> +        // This is seen during cluster initialization when the heartbeat
> +        // is received before the initial block reports from each storage.
> +        storage = updateStorage(new DatanodeStorage(report.getStorageID()));
> {code}
> Even reverting the change and reintroducing the 'else' leaves the state & 
> type temporarily inaccurate until the first block report. 
> As discussed with [~arpitagarwal], a better fix would be to simply include 
> the full {{DatanodeStorage}} object in the {{StorageReport}} (as opposed to 
> only the Storage ID).  This requires adding the {{DatanodeStorage}} object to 
> {{StorageReportProto}}. It needs to be a new optional field and we cannot 
> remove the existing {{StorageUuid}} for protocol compatibility.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

Reply via email to