[ https://issues.apache.org/jira/browse/HDFS-10372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275456#comment-15275456 ]

Masatake Iwasaki commented on HDFS-10372:
-----------------------------------------

I'm +1 on this too, though I think it would be better to create and write 
another file after one of the volumes is removed, in order to make sure that 
the datanode is still available; see the sketch below.
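
Something like the following could work (a rough sketch only, assuming the 
test's existing {{MiniDFSCluster}} field {{cluster}}; the path and message 
are illustrative, not part of any patch):

{code:java}
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// ...after the test has broken/removed one volume...
FileSystem fs = cluster.getFileSystem();
// Hypothetical probe path; replication 1 so the single remaining volume
// on this datanode can hold the replica.
Path probe = new Path("/probe-after-volume-failure");
try (FSDataOutputStream probeOut = fs.create(probe, (short) 1)) {
  // If the datanode survived losing data1, this write should succeed
  // using the remaining volume (data2).
  probeOut.writeBytes("datanode is still writable");
}
{code}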

I put the error logs from my environment below as a reference.

The DataNode in the mini cluster was started with 2 volumes (data1 and data2).

{noformat}
2016-05-08 10:00:39,003 [DataNode: 
[[[DISK]file:/home/iwasakims/srcs/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
 
[DISK]file:/home/iwasakims/srcs/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]]
  heartbeating to localhost/127.0.0.1:37720] INFO  common.Storage 
(DataStorage.java:createStorageID(158)) - Generated new storageID 
DS-040e0757-ea7f-4465-80e3-9f8c00abeb83 for directory 
/home/iwasakims/srcs/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1
...(snip)
2016-05-08 10:00:39,109 [DataNode: 
[[[DISK]file:/home/iwasakims/srcs/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/,
 
[DISK]file:/home/iwasakims/srcs/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]]
  heartbeating to localhost/127.0.0.1:37720] INFO  common.Storage 
(DataStorage.java:createStorageID(158)) - Generated new storageID 
DS-8f82ba58-61ae-4cb1-b019-0c387d25b5d2 for directory 
/home/iwasakims/srcs/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2
{noformat}

The data1 volume was removed because the test had broken it.

{noformat}
2016-05-08 10:00:42,321 [IPC Server handler 4 on 37720] INFO  
blockmanagement.DatanodeDescriptor 
(DatanodeDescriptor.java:updateHeartbeatState(453)) - Number of failed storages 
changes from 0 to 1
2016-05-08 10:00:42,321 [IPC Server handler 4 on 37720] INFO  
blockmanagement.DatanodeDescriptor 
(DatanodeDescriptor.java:updateFailedStorage(539)) - 
[DISK]DS-040e0757-ea7f-4465-80e3-9f8c00abeb83:NORMAL:127.0.0.1:59604 failed.
2016-05-08 10:00:42,321 [IPC Server handler 4 on 37720] INFO  
blockmanagement.DatanodeDescriptor 
(DatanodeDescriptor.java:pruneStorageMap(525)) - Removed storage 
[DISK]DS-040e0757-ea7f-4465-80e3-9f8c00abeb83:FAILED:127.0.0.1:59604 from 
DataNode 127.0.0.1:59604
{noformat}

The test expected the exception message from {{out.close()}} to contain the 
name of the failed volume (the one the replica was written to), but it 
contained only info about the live volume (data2).

{noformat}
testCleanShutdownOfVolume(org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl)
  Time elapsed: 8.468 sec  <<< FAILURE!
java.lang.AssertionError: Expected to find 
'DatanodeInfoWithStorage[127.0.0.1:59604,DS-040e0757-ea7f-4465-80e3-9f8c00abeb83,DISK]'
 but got unexpected exception:java.io.IOException: All datanodes 
[DatanodeInfoWithStorage[127.0.0.1:59604,DS-8f82ba58-61ae-4cb1-b019-0c387d25b5d2,DISK]]
 are bad. Aborting...
        at 
org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1395)
        at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1338)
        at 
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1325)
        at 
org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1122)
        at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:549)
{noformat}
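
If the assertion is kept, it might be made more robust by matching only the 
xfer address instead of the full {{DatanodeInfoWithStorage}} string. A 
minimal sketch, assuming the test's existing {{out}} stream and a 
{{LocatedBlock}} {{lb}} for the replica (illustrative names, not the actual 
patch):

{code:java}
import static org.junit.Assert.fail;

import java.io.IOException;

import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.test.GenericTestUtils;

// ...inside the test method...
DatanodeInfo dnInfo = lb.getLocations()[0];
try {
  out.close();
  fail("Expected IOException because the volume failed");
} catch (IOException ioe) {
  // Match only the xfer address (host:port); the storage ID reported in
  // the exception may be that of the surviving volume, which is what made
  // the original full-string assertion flaky.
  GenericTestUtils.assertExceptionContains(dnInfo.getXferAddr(), ioe);
}
{code}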


> Fix for failing TestFsDatasetImpl#testCleanShutdownOfVolume
> -----------------------------------------------------------
>
>                 Key: HDFS-10372
>                 URL: https://issues.apache.org/jira/browse/HDFS-10372
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.7.3
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>         Attachments: HDFS-10372.patch
>
>
> TestFsDatasetImpl#testCleanShutdownOfVolume fails very often.
> We added more debug information in HDFS-10260 to find out why this test is 
> failing.
> Now I think I know the root cause of the failure.
> I thought that {{LocatedBlock#getLocations()}} returns an array of 
> DatanodeInfo, but now I realize that it returns an array of 
> DatanodeInfoWithStorage (which is a subclass of DatanodeInfo).
> In the test I intended to check whether the exception contains the xfer 
> address of the DatanodeInfo. Since the {{DatanodeInfo#toString()}} method 
> returns the xfer address, I checked whether the exception contains 
> {{DatanodeInfo#toString()}} or not.
> But since {{LocatedBlock#getLocations()}} returned an array of 
> DatanodeInfoWithStorage, its {{toString()}} implementation also includes 
> the storage info.
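
For reference, the two {{toString()}} forms described above differ as in 
this sketch ({{lb}} is a hypothetical {{LocatedBlock}} for the replica; the 
example values are taken from the logs above):

{code:java}
DatanodeInfo[] locs = lb.getLocations();  // actually DatanodeInfoWithStorage[]
locs[0].toString();
// -> "DatanodeInfoWithStorage[127.0.0.1:59604,DS-8f82ba58-61ae-4cb1-b019-0c387d25b5d2,DISK]"
locs[0].getXferAddr();
// -> "127.0.0.1:59604" (what the test actually meant to match against)
{code}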


