[
https://issues.apache.org/jira/browse/HDFS-10372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275456#comment-15275456
]
Masatake Iwasaki commented on HDFS-10372:
-----------------------------------------
I'm +1 on this too, though I think it would be better to create and write
another file after one of the volumes is removed, to make sure that the
datanode is still available.
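A minimal sketch of that follow-up check (assuming the test's existing
MiniDFSCluster instance is in scope as {{cluster}}; the file name and the
written content are illustrative, not the actual test code):
{code:java}
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import static org.junit.Assert.assertTrue;

// After the test removes data1, writing a fresh file with replication 1
// should still succeed because data2 is alive.
FileSystem fs = cluster.getFileSystem();
Path afterFailure = new Path("/written-after-volume-failure");
try (FSDataOutputStream out2 = fs.create(afterFailure, (short) 1)) {
  out2.writeBytes("the datanode should still accept writes on data2");
}
// If the datanode itself had gone down (rather than losing one volume),
// the create/close above would have failed instead.
assertTrue(fs.exists(afterFailure));
{code}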
I've put the error logs from my environment below as a reference.
The DataNode in the mini cluster was started with 2 volumes (data1 and data2).
{noformat}
2016-05-08 10:00:39,003 [DataNode:
[[[DISK]file:/home/iwasakims/srcs/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
[DISK]file:/home/iwasakims/srcs/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]]
heartbeating to localhost/127.0.0.1:37720] INFO common.Storage
(DataStorage.java:createStorageID(158)) - Generated new storageID
DS-040e0757-ea7f-4465-80e3-9f8c00abeb83 for directory
/home/iwasakims/srcs/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
...(snip)
2016-05-08 10:00:39,109 [DataNode:
[[[DISK]file:/home/iwasakims/srcs/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
[DISK]file:/home/iwasakims/srcs/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]]
heartbeating to localhost/127.0.0.1:37720] INFO common.Storage
(DataStorage.java:createStorageID(158)) - Generated new storageID
DS-8f82ba58-61ae-4cb1-b019-0c387d25b5d2 for directory
/home/iwasakims/srcs/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2
{noformat}
The data1 volume was removed because the test had broken it.
{noformat}
2016-05-08 10:00:42,321 [IPC Server handler 4 on 37720] INFO
blockmanagement.DatanodeDescriptor
(DatanodeDescriptor.java:updateHeartbeatState(453)) - Number of failed storages
changes from 0 to 1
2016-05-08 10:00:42,321 [IPC Server handler 4 on 37720] INFO
blockmanagement.DatanodeDescriptor
(DatanodeDescriptor.java:updateFailedStorage(539)) -
[DISK]DS-040e0757-ea7f-4465-80e3-9f8c00abeb83:NORMAL:127.0.0.1:59604 failed.
2016-05-08 10:00:42,321 [IPC Server handler 4 on 37720] INFO
blockmanagement.DatanodeDescriptor
(DatanodeDescriptor.java:pruneStorageMap(525)) - Removed storage
[DISK]DS-040e0757-ea7f-4465-80e3-9f8c00abeb83:FAILED:127.0.0.1:59604 from
DataNode 127.0.0.1:59604
{noformat}
The test expected that the exception message from {{out.close()}} would
contain the name of the failed volume (to which the replica was written),
but it contained only information about the live volume (data2).
{noformat}
testCleanShutdownOfVolume(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)
Time elapsed: 8.468 sec <<< FAILURE!
java.lang.AssertionError: Expected to find
'DatanodeInfoWithStorage[127.0.0.1:59604,DS-040e0757-ea7f-4465-80e3-9f8c00abeb83,DISK]'
but got unexpected exception:java.io.IOException: All datanodes
[DatanodeInfoWithStorage[127.0.0.1:59604,DS-8f82ba58-61ae-4cb1-b019-0c387d25b5d2,DISK]]
are bad. Aborting...
at
org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1395)
at
org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1338)
at
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1325)
at
org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1122)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:549)
{noformat}
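For reference, a hedged sketch of the failing check (the exact test code may
differ; {{lb}} and {{out}} are assumed to be the test's LocatedBlock and the
stream being closed):
{code:java}
import java.io.IOException;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.test.GenericTestUtils;
import static org.junit.Assert.fail;

// The expected substring is built from the replica's location, so it
// carries the storage ID of the failed volume (data1), while the actual
// exception only names the surviving volume (data2).
DatanodeInfo info = lb.getLocations()[0];
try {
  out.close();
  fail("close() should have failed after the volume was removed");
} catch (IOException ioe) {
  // Produces the "Expected to find ... but got unexpected exception"
  // failure above when the message mentions data2's storage instead.
  GenericTestUtils.assertExceptionContains(info.toString(), ioe);
}
{code}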
> Fix for failing TestFsDatasetImpl#testCleanShutdownOfVolume
> -----------------------------------------------------------
>
> Key: HDFS-10372
> URL: https://issues.apache.org/jira/browse/HDFS-10372
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: test
> Affects Versions: 2.7.3
> Reporter: Rushabh S Shah
> Assignee: Rushabh S Shah
> Attachments: HDFS-10372.patch
>
>
> TestFsDatasetImpl#testCleanShutdownOfVolume fails very often.
> We added more debug information in HDFS-10260 to find out why this test is
> failing.
> Now I think I know the root cause of the failure.
> I thought that {{LocatedBlock#getLocations()}} returns an array of
> DatanodeInfo, but now I realize that it actually returns an array of
> DatanodeInfoWithStorage (which is a subclass of DatanodeInfo).
> In the test, I intended to check whether the exception contains the xfer
> address of the DatanodeInfo. Since the {{DatanodeInfo#toString()}} method
> returns the xfer address, I checked whether the exception contains
> {{DatanodeInfo#toString()}} or not.
> But since {{LocatedBlock#getLocations()}} actually returned an array of
> DatanodeInfoWithStorage, the {{toString()}} implementation includes the
> storage info as well, so the assertion was effectively tied to a specific
> storage ID rather than just the xfer address.
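> A minimal illustration of the mismatch (a sketch with values taken from the
> log above; {{lb}} and {{ioe}} are assumed to be in scope):
> {code:java}
> // DatanodeInfo#toString() renders just the xfer address, e.g.
> //   "127.0.0.1:59604"
> // but the DatanodeInfoWithStorage elements returned by
> // LocatedBlock#getLocations() render with the storage ID as well, e.g.
> //   "DatanodeInfoWithStorage[127.0.0.1:59604,DS-...,DISK]"
> // so asserting on toString() ties the check to one storage ID. Matching
> // on the xfer address alone keeps the check storage-agnostic:
> DatanodeInfo info = lb.getLocations()[0];
> GenericTestUtils.assertExceptionContains(info.getXferAddr(), ioe);
> {code}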