[ https://issues.apache.org/jira/browse/HDFS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081972#comment-14081972 ]
Gordon Wang commented on HDFS-6804:
-----------------------------------

Some thoughts about how to fix this issue. As I see it, there are two ways to fix it.

* Option1
When the block is opened for appending, check whether any DataTransfer threads are transferring the block to other DNs, and stop those threads. We can stop them because the generation timestamp of the block is increased when it is opened for appending, so the DataTransfer threads are sending an outdated block.
* Option2
In the DataTransfer thread, if the replica of the block is finalized, the thread can read the checksum of the last data chunk into memory and record the replica length in memory as well. Then, when sending the last data chunk, it uses the in-memory checksum instead of reading it from disk. This is similar to how we handle an RBW replica in DataTransfer.

For Option1, it is hard to stop the DataTransfer threads unless we add some code in DataNode to manage them.
For Option2, we should lock the FsDatasetImpl object in DataNode while reading the last chunk checksum from disk. Otherwise, the last block might be overwritten. But reading from disk takes time, and doing expensive disk I/O while holding the FsDatasetImpl lock might degrade DataNode performance.

Any opinions or comments are welcome!
Thanks.

> race condition between transferring block and appending block causes
> "Unexpected checksum mismatch exception"
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6804
>                 URL: https://issues.apache.org/jira/browse/HDFS-6804
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.2.0
>            Reporter: Gordon Wang
>
> We found an error log like this in the datanode:
> {noformat}
> 2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248
> java.io.IOException: Terminating due to a checksum error.java.io.IOException: Unexpected checksum mismatch while writing BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 from /192.168.2.101:39495
> 	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> 	at java.lang.Thread.run(Thread.java:744)
> {noformat}
> Meanwhile, on the source datanode, the log says the block was transmitted.
> {noformat}
> 2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer: Transmitted BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 (numBytes=16188152) to /192.168.2.103:50010
> {noformat}
> When the destination datanode gets the checksum mismatch, it reports a bad
> block to the NameNode, and the NameNode marks the replica on the source
> datanode as corrupt. But the replica on the source datanode is actually
> valid, because it passes checksum verification.
> In short, the replica on the source datanode is wrongly marked as corrupt.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
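
To make Option2 from the comment above more concrete, here is a minimal, self-contained Java sketch of the idea: snapshot the replica length and the last chunk checksum under a lock, then use that snapshot when sending the last chunk even if an append rewrites the checksum afterwards. Everything in it is an assumption for illustration only: the names Replica, Snapshot and checksumToSend are hypothetical and are not Hadoop classes, the replica lives in memory rather than on disk, the chunk size is 4 bytes, and CRC32 stands in for whatever checksum type the block uses.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

// Illustration only: hypothetical names, not the DataNode or FsDatasetImpl API.
class LastChunkSnapshotSketch {
    static final int BYTES_PER_CHUNK = 4; // tiny chunk size, assumed for the demo

    /** Length and last chunk checksum captured before the transfer starts. */
    static final class Snapshot {
        final int length;
        final long lastChunkChecksum;
        Snapshot(int length, long lastChunkChecksum) {
            this.length = length;
            this.lastChunkChecksum = lastChunkChecksum;
        }
    }

    /** In-memory stand-in for a replica: block data plus one CRC32 per chunk. */
    static final class Replica {
        byte[] data;
        long[] checksums;

        Replica(byte[] initial) {
            data = initial.clone();
            recompute();
        }

        // Simulates an append: the data grows and the checksum of the
        // formerly partial last chunk is rewritten.
        synchronized void append(byte[] more) {
            byte[] grown = Arrays.copyOf(data, data.length + more.length);
            System.arraycopy(more, 0, grown, data.length, more.length);
            data = grown;
            recompute();
        }

        // Taken under the replica lock, mirroring the FsDatasetImpl lock in
        // Option2. In a real DataNode this step would be a disk read.
        synchronized Snapshot snapshot() {
            long last = checksums.length == 0 ? 0L : checksums[checksums.length - 1];
            return new Snapshot(data.length, last);
        }

        private void recompute() {
            int chunks = (data.length + BYTES_PER_CHUNK - 1) / BYTES_PER_CHUNK;
            checksums = new long[chunks];
            for (int i = 0; i < chunks; i++) {
                int from = i * BYTES_PER_CHUNK;
                int len = Math.min(data.length, from + BYTES_PER_CHUNK) - from;
                checksums[i] = crc(data, from, len);
            }
        }
    }

    static long crc(byte[] buf, int off, int len) {
        CRC32 c = new CRC32();
        c.update(buf, off, len);
        return c.getValue();
    }

    /** Checksum the sender uses for a chunk: the snapshot for the last chunk, stored value otherwise. */
    static long checksumToSend(Replica r, Snapshot s, int chunkIndex) {
        int lastChunk = (s.length - 1) / BYTES_PER_CHUNK;
        if (chunkIndex == lastChunk) {
            return s.lastChunkChecksum;
        }
        synchronized (r) {
            return r.checksums[chunkIndex];
        }
    }

    public static void main(String[] args) {
        Replica replica = new Replica(new byte[] {1, 2, 3, 4, 5, 6});
        Snapshot snap = replica.snapshot();      // transfer starts here
        replica.append(new byte[] {7, 8, 9});    // concurrent append rewrites chunk 1

        long sent = checksumToSend(replica, snap, 1);
        long expected = crc(new byte[] {5, 6}, 0, 2);
        System.out.println("sent checksum matches the snapshotted bytes: " + (sent == expected));
        System.out.println("stored checksum still matches them: " + (replica.checksums[1] == expected));
    }
}
```

In this sketch the sender transmits only snap.length bytes, so the checksum it sends for the last chunk describes exactly the bytes on the wire. The cost discussed in the comment shows up in snapshot(): in a real DataNode that read would hit the disk while the lock is held.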