[ https://issues.apache.org/jira/browse/HDFS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16311238#comment-16311238 ]
Brahma Reddy Battula commented on HDFS-6804: -------------------------------------------- [~jojochuang] can you please look once.Hope it's ready for commit. > race condition between transferring block and appending block causes > "Unexpected checksum mismatch exception" > -------------------------------------------------------------------------------------------------------------- > > Key: HDFS-6804 > URL: https://issues.apache.org/jira/browse/HDFS-6804 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Affects Versions: 2.2.0 > Reporter: Gordon Wang > Assignee: Brahma Reddy Battula > Attachments: HDFS-6804-branch-2.8-002.patch, > HDFS-6804-branch-2.8-003.patch, HDFS-6804-branch-2.8.patch, > Testcase_append_transfer_block.patch > > > We found some error log in the datanode. like this > {noformat} > 2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Ex > ception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 > java.io.IOException: Terminating due to a checksum error.java.io.IOException: > Unexpected checksum mismatch while writing > BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 from > /192.168.2.101:39495 > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221) > at java.lang.Thread.run(Thread.java:744) > {noformat} > While on the source datanode, the log says the block is transmitted. > {noformat} > 2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Da > taTransfer: Transmitted > BP-2072804351-192.168.2.104-1406008383435:blk_1073741997 > _9248 (numBytes=16188152) to /192.168.2.103:50010 > {noformat} > When the destination datanode gets the checksum mismatch, it reports bad > block to NameNode and NameNode marks the replica on the source datanode as > corrupt. But actually, the replica on the source datanode is valid. Because > the replica can pass the checksum verification. > In all, the replica on the source data is wrongly marked as corrupted. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org