[ 
https://issues.apache.org/jira/browse/HDFS-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546654#comment-15546654
 ] 

Manoj Govindassamy commented on HDFS-10960:
-------------------------------------------


Looking at the code, removing volumes at the DataNode can potentially interrupt
a {{BlockReceiver}}. If the {{BlockReceiver}} happens to be in the middle of an
IO operation, such as flushing or setting the channel position for the new
checksum, it can throw an IOException. On getting an IOException,
{{BlockReceiver}} starts a thread to check for disk errors.
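
As a rough illustration of how an interrupt can surface as an IOException, here
is a plain JDK sketch (not Hadoop code). It relies on the documented behavior
of interruptible channels: a blocking channel operation invoked with the
thread's interrupt status set closes the channel and throws
ClosedByInterruptException, which is an IOException. The class and method
names are hypothetical, and whether {{BlockReceiver}} hits exactly this path
is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; it only mimics a writer thread that gets interrupted.
public class InterruptedWriteDemo {

    /** Returns true if a write issued with the interrupt status set fails with an IOException. */
    public static boolean writeWhileInterrupted() {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("block", ".data");
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                // Simulate the interrupt delivered to the writer during volume removal.
                Thread.currentThread().interrupt();
                try {
                    ch.write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
                    return false; // the write unexpectedly succeeded
                } catch (ClosedByInterruptException e) {
                    return true;  // the interrupt surfaced as an IOException subclass
                } finally {
                    Thread.interrupted(); // clear the interrupt status before cleanup
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("IOException on interrupted write: " + writeWhileInterrupted());
    }
}
```

In this sketch the write never touches a faulty disk, yet the caller still sees
an IOException, which is why a disk error check can start after a volume
removal.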

TestDataNodeHotSwapVolumes#testRemoveVolumeBeingWritten fails its verification
if the DataNode ever started a disk error check thread. This verification does
not seem useful, as we already have another verification that checks the block
replication factor. So the proposal here is to replace it with a verification
that the disk removal completed successfully and that the replication factor
of the block caught up after the volume removal.
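
Since the replication factor only catches up eventually, such a verification
would most likely poll with a timeout rather than assert once. Below is a
minimal, self-contained sketch of that pattern. It does not use Hadoop APIs:
the class name, the waitFor helper, and the liveReplicas counter are all
hypothetical stand-ins for the real cluster state.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical helper; not part of Hadoop or of the actual test.
public class EventualCheck {

    /**
     * Polls the check every intervalMs until it passes or timeoutMs elapses.
     * Returns true if the check passed, false on timeout or interruption.
     */
    public static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!check.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for the replica count after one volume has been removed.
        AtomicInteger liveReplicas = new AtomicInteger(2);
        final int expectedReplicas = 3;

        // Stand-in for re-replication completing a little later.
        Thread recovery = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
                return;
            }
            liveReplicas.set(expectedReplicas);
        });
        recovery.start();

        boolean caughtUp = waitFor(() -> liveReplicas.get() == expectedReplicas, 10, 5000);
        System.out.println("replication caught up: " + caughtUp);
    }
}
```

Unlike the lastTimeDiskErrorCheck assertion, a check of this shape is
insensitive to whether a disk error check thread ran in the meantime.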

> TestDataNodeHotSwapVolumes#testRemoveVolumeBeingWritten fails at disk error 
> verification after volume remove
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10960
>                 URL: https://issues.apache.org/jira/browse/HDFS-10960
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>            Priority: Minor
>
> TestDataNodeHotSwapVolumes#testRemoveVolumeBeingWritten fails occasionally in 
> the following verification.
> {code}
>   700     // If an IOException thrown from BlockReceiver#run, it triggers
>   701     // DataNode#checkDiskError(). So we can test whether checkDiskError() is called,
>   702     // to see whether there is IOException in BlockReceiver#run().
>   703     assertEquals(lastTimeDiskErrorCheck, dn.getLastDiskErrorCheck());
>   704 
> {code}
> {noformat}
> Error Message
> expected:<0> but was:<6498109>
> Stacktrace
> java.lang.AssertionError: expected:<0> but was:<6498109>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.junit.Assert.assertEquals(Assert.java:555)
>       at org.junit.Assert.assertEquals(Assert.java:542)
>       at org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWrittenForDatanode(TestDataNodeHotSwapVolumes.java:703)
>       at org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten(TestDataNodeHotSwapVolumes.java:620)
> {noformat}



