[jira] [Updated] (HDFS-5667) StorageType and State in DatanodeStorageInfo in NameNode is not accurate
[ https://issues.apache.org/jira/browse/HDFS-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Sirianni updated HDFS-5667: Description: The fix for HDFS-5484 was accidentally regressed by the following change made via HDFS-5542 {code} + DatanodeStorageInfo updateStorage(DatanodeStorage s) { synchronized (storageMap) { DatanodeStorageInfo storage = storageMap.get(s.getStorageID()); if (storage == null) { @@ -670,8 +658,6 @@ for DN + getXferAddr()); storage = new DatanodeStorageInfo(this, s); storageMap.put(s.getStorageID(), storage); - } else { -storage.setState(s.getState()); } return storage; } {code} By removing the 'else' and no longer updating the state in the BlockReport processing path, we effectively get the bogus state type that is set via the first heartbeat (see the fix for HDFS-5455): {code} + if (storage == null) { +// This is seen during cluster initialization when the heartbeat +// is received before the initial block reports from each storage. +storage = updateStorage(new DatanodeStorage(report.getStorageID())); {code} Even reverting the change and reintroducing the 'else' leaves the state type temporarily inaccurate until the first block report. As discussed with [~arpitagarwal], a better fix would be to simply include the full DatanodeStorage object in the StorageReport (as opposed to only the Storage ID). This requires adding the {{DatanodeStorage}} object to {{StorageReportProto}}. It needs to be a new optional field and we cannot remove the existing {{StorageUuid}} for protocol compatibility. was: The fields in DatanodeStorageInfo are updated from two distinct paths: # block reports # storage reports (via heartbeats) The {{state}} and {{storageType}} fields are updated via the Block Report. However, as seen in the code blow, these fields are populated from a dummy {{DatanodeStorage}} object constructed in the DataNode: {code} BPServiceActor.blockReport() { //... // Dummy DatanodeStorage object just for sending the block report. DatanodeStorage dnStorage = new DatanodeStorage(storageID); //... } {code} The net effect is that the {{state}} and {{storageType}} fields are always the default of {{NORMAL}} and {{DISK}} in the NameNode. The recommended fix is to change {{FsDatasetSpi.getBlockReports()}} from: {code} public MapString, BlockListAsLongs getBlockReports(String bpid); {code} to: {code} public MapDatanodeStorage, BlockListAsLongs getBlockReports(String bpid); {code} thereby allowing {{BPServiceActor}} to send the real {{DatanodeStorage}} object with the block report. StorageType and State in DatanodeStorageInfo in NameNode is not accurate Key: HDFS-5667 URL: https://issues.apache.org/jira/browse/HDFS-5667 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Eric Sirianni Fix For: Heterogeneous Storage (HDFS-2832) The fix for HDFS-5484 was accidentally regressed by the following change made via HDFS-5542 {code} + DatanodeStorageInfo updateStorage(DatanodeStorage s) { synchronized (storageMap) { DatanodeStorageInfo storage = storageMap.get(s.getStorageID()); if (storage == null) { @@ -670,8 +658,6 @@ for DN + getXferAddr()); storage = new DatanodeStorageInfo(this, s); storageMap.put(s.getStorageID(), storage); - } else { -storage.setState(s.getState()); } return storage; } {code} By removing the 'else' and no longer updating the state in the BlockReport processing path, we effectively get the bogus state type that is set via the first heartbeat (see the fix for HDFS-5455): {code} + if (storage == null) { +// This is seen during cluster initialization when the heartbeat +// is received before the initial block reports from each storage. +storage = updateStorage(new DatanodeStorage(report.getStorageID())); {code} Even reverting the change and reintroducing the 'else' leaves the state type temporarily inaccurate until the first block report. As discussed with [~arpitagarwal], a better fix would be to simply include the full DatanodeStorage object in the StorageReport (as opposed to only the Storage ID). This requires adding the {{DatanodeStorage}} object to {{StorageReportProto}}. It needs to be a new optional field and we cannot remove the existing {{StorageUuid}} for protocol compatibility. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5667) StorageType and State in DatanodeStorageInfo in NameNode is not accurate
[ https://issues.apache.org/jira/browse/HDFS-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Sirianni updated HDFS-5667: Description: The fix for HDFS-5484 was accidentally regressed by the following change made via HDFS-5542 {code} + DatanodeStorageInfo updateStorage(DatanodeStorage s) { synchronized (storageMap) { DatanodeStorageInfo storage = storageMap.get(s.getStorageID()); if (storage == null) { @@ -670,8 +658,6 @@ for DN + getXferAddr()); storage = new DatanodeStorageInfo(this, s); storageMap.put(s.getStorageID(), storage); - } else { -storage.setState(s.getState()); } return storage; } {code} By removing the 'else' and no longer updating the state in the BlockReport processing path, we effectively get the bogus state type that is set via the first heartbeat (see the fix for HDFS-5455): {code} + if (storage == null) { +// This is seen during cluster initialization when the heartbeat +// is received before the initial block reports from each storage. +storage = updateStorage(new DatanodeStorage(report.getStorageID())); {code} Even reverting the change and reintroducing the 'else' leaves the state type temporarily inaccurate until the first block report. As discussed with [~arpitagarwal], a better fix would be to simply include the full {{DatanodeStorage}} object in the {{StorageReport}} (as opposed to only the Storage ID). This requires adding the {{DatanodeStorage}} object to {{StorageReportProto}}. It needs to be a new optional field and we cannot remove the existing {{StorageUuid}} for protocol compatibility. was: The fix for HDFS-5484 was accidentally regressed by the following change made via HDFS-5542 {code} + DatanodeStorageInfo updateStorage(DatanodeStorage s) { synchronized (storageMap) { DatanodeStorageInfo storage = storageMap.get(s.getStorageID()); if (storage == null) { @@ -670,8 +658,6 @@ for DN + getXferAddr()); storage = new DatanodeStorageInfo(this, s); storageMap.put(s.getStorageID(), storage); - } else { -storage.setState(s.getState()); } return storage; } {code} By removing the 'else' and no longer updating the state in the BlockReport processing path, we effectively get the bogus state type that is set via the first heartbeat (see the fix for HDFS-5455): {code} + if (storage == null) { +// This is seen during cluster initialization when the heartbeat +// is received before the initial block reports from each storage. +storage = updateStorage(new DatanodeStorage(report.getStorageID())); {code} Even reverting the change and reintroducing the 'else' leaves the state type temporarily inaccurate until the first block report. As discussed with [~arpitagarwal], a better fix would be to simply include the full DatanodeStorage object in the StorageReport (as opposed to only the Storage ID). This requires adding the {{DatanodeStorage}} object to {{StorageReportProto}}. It needs to be a new optional field and we cannot remove the existing {{StorageUuid}} for protocol compatibility. StorageType and State in DatanodeStorageInfo in NameNode is not accurate Key: HDFS-5667 URL: https://issues.apache.org/jira/browse/HDFS-5667 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Eric Sirianni Fix For: Heterogeneous Storage (HDFS-2832) The fix for HDFS-5484 was accidentally regressed by the following change made via HDFS-5542 {code} + DatanodeStorageInfo updateStorage(DatanodeStorage s) { synchronized (storageMap) { DatanodeStorageInfo storage = storageMap.get(s.getStorageID()); if (storage == null) { @@ -670,8 +658,6 @@ for DN + getXferAddr()); storage = new DatanodeStorageInfo(this, s); storageMap.put(s.getStorageID(), storage); - } else { -storage.setState(s.getState()); } return storage; } {code} By removing the 'else' and no longer updating the state in the BlockReport processing path, we effectively get the bogus state type that is set via the first heartbeat (see the fix for HDFS-5455): {code} + if (storage == null) { +// This is seen during cluster initialization when the heartbeat +// is received before the initial block reports from each storage. +storage = updateStorage(new DatanodeStorage(report.getStorageID())); {code} Even reverting the change and reintroducing the 'else' leaves the state type temporarily inaccurate until the first block report.