[ https://issues.apache.org/jira/browse/HDFS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433292#comment-15433292 ]
Yongjun Zhang commented on HDFS-6804: ------------------------------------- Thanks for working on [~jojochuang]. For HDFS-10652, I was able to see HDFS-4660 symptom after reverting the HDFS-4660 / HDFS-9220 fix. If doing that help reproducing the issue here, maybe it's worth a trial. > race condition between transferring block and appending block causes > "Unexpected checksum mismatch exception" > -------------------------------------------------------------------------------------------------------------- > > Key: HDFS-6804 > URL: https://issues.apache.org/jira/browse/HDFS-6804 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Affects Versions: 2.2.0 > Reporter: Gordon Wang > Assignee: Wei-Chiu Chuang > > We found some error log in the datanode. like this > {noformat} > 2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Ex > ception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 > java.io.IOException: Terminating due to a checksum error.java.io.IOException: > Unexpected checksum mismatch while writing > BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 from > /192.168.2.101:39495 > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221) > at java.lang.Thread.run(Thread.java:744) > {noformat} > While on the source datanode, the log says the block is transmitted. > {noformat} > 2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Da > taTransfer: Transmitted > BP-2072804351-192.168.2.104-1406008383435:blk_1073741997 > _9248 (numBytes=16188152) to /192.168.2.103:50010 > {noformat} > When the destination datanode gets the checksum mismatch, it reports bad > block to NameNode and NameNode marks the replica on the source datanode as > corrupt. But actually, the replica on the source datanode is valid. Because > the replica can pass the checksum verification. > In all, the replica on the source data is wrongly marked as corrupted. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org