[ https://issues.apache.org/jira/browse/HDFS-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Sirianni updated HDFS-5667:
--------------------------------
    Description:
The fix for HDFS-5484 was accidentally regressed by the following change made via HDFS-5542
{code}
+  DatanodeStorageInfo updateStorage(DatanodeStorage s) {
     synchronized (storageMap) {
       DatanodeStorageInfo storage = storageMap.get(s.getStorageID());
       if (storage == null) {
@@ -670,8 +658,6 @@
             " for DN " + getXferAddr());
         storage = new DatanodeStorageInfo(this, s);
         storageMap.put(s.getStorageID(), storage);
-      } else {
-        storage.setState(s.getState());
       }
       return storage;
     }
{code}

By removing the 'else' and no longer updating the state in the BlockReport processing path, we effectively get the bogus state & type that is set via the first heartbeat (see the fix for HDFS-5455):
{code}
+      if (storage == null) {
+        // This is seen during cluster initialization when the heartbeat
+        // is received before the initial block reports from each storage.
+        storage = updateStorage(new DatanodeStorage(report.getStorageID()));
{code}

Even reverting the change and reintroducing the 'else' leaves the state & type temporarily inaccurate until the first block report.

As discussed with [~arpitagarwal], a better fix would be to simply include the full {{DatanodeStorage}} object in the {{StorageReport}} (as opposed to only the Storage ID). This requires adding the {{DatanodeStorage}} object to {{StorageReportProto}}. It needs to be a new optional field and we cannot remove the existing {{StorageUuid}} for protocol compatibility.
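To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is not the actual HDFS code or patch. All classes are simplified, hypothetical stand-ins that reuse the names from the description. The two-value enums, the public fields, the {{capacity}} field, and the {{handleHeartbeat}} helper are assumptions made for illustration. The sketch assumes the {{StorageReport}} carries the full {{DatanodeStorage}}, so the entry created by the first heartbeat already has the reported state and type, and it keeps the 'else' branch so later reports still refresh the state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for the HDFS classes named in the issue.
class StorageStateSketch {
  enum State { NORMAL, READ_ONLY }
  enum StorageType { DISK, SSD }

  /** Stand-in for DatanodeStorage: storage ID plus the state and type the DataNode reports. */
  static final class DatanodeStorage {
    final String storageID;
    final State state;
    final StorageType type;

    DatanodeStorage(String storageID, State state, StorageType type) {
      this.storageID = storageID;
      this.state = state;
      this.type = type;
    }
  }

  /** Stand-in for StorageReport, assumed to carry the full DatanodeStorage rather than only its ID. */
  static final class StorageReport {
    final DatanodeStorage storage;
    final long capacity; // illustrative payload only

    StorageReport(DatanodeStorage storage, long capacity) {
      this.storage = storage;
      this.capacity = capacity;
    }
  }

  /** Stand-in for the NameNode-side DatanodeStorageInfo. */
  static final class DatanodeStorageInfo {
    final String storageID;
    final StorageType type;
    State state;

    DatanodeStorageInfo(DatanodeStorage s) {
      this.storageID = s.storageID;
      this.type = s.type;
      this.state = s.state;
    }
  }

  /** Stand-in for the per-DataNode descriptor that owns storageMap. */
  static final class DatanodeDescriptor {
    private final Map<String, DatanodeStorageInfo> storageMap = new HashMap<>();

    DatanodeStorageInfo updateStorage(DatanodeStorage s) {
      synchronized (storageMap) {
        DatanodeStorageInfo storage = storageMap.get(s.storageID);
        if (storage == null) {
          storage = new DatanodeStorageInfo(s);
          storageMap.put(s.storageID, storage);
        } else {
          // The branch removed via HDFS-5542: refresh the state of a known storage.
          storage.state = s.state;
        }
        return storage;
      }
    }

    // Hypothetical heartbeat path: the report supplies the real state and type,
    // so no placeholder DatanodeStorage has to be built from the ID alone.
    DatanodeStorageInfo handleHeartbeat(StorageReport report) {
      return updateStorage(report.storage);
    }
  }

  public static void main(String[] args) {
    DatanodeDescriptor dn = new DatanodeDescriptor();
    DatanodeStorage reported = new DatanodeStorage("DS-1", State.READ_ONLY, StorageType.SSD);
    DatanodeStorageInfo info = dn.handleHeartbeat(new StorageReport(reported, 1024L));
    System.out.println(info.state + " " + info.type);
  }
}
```

In this sketch the first heartbeat for {{DS-1}} records {{READ_ONLY}} and {{SSD}} directly, instead of defaults that stay wrong until the first block report. A later call to {{updateStorage}} with a changed state goes through the 'else' branch and updates the existing entry in place.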
> StorageType and State in DatanodeStorageInfo in NameNode is not accurate
> ------------------------------------------------------------------------
>
>                 Key: HDFS-5667
>                 URL: https://issues.apache.org/jira/browse/HDFS-5667
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: Heterogeneous Storage (HDFS-2832)
>            Reporter: Eric Sirianni
>             Fix For: Heterogeneous Storage (HDFS-2832)