Wei-Chiu Chuang created HDFS-11472:
--------------------------------------

             Summary: Fix inconsistent replica size after a data pipeline failure
                 Key: HDFS-11472
                 URL: https://issues.apache.org/jira/browse/HDFS-11472
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
            Reporter: Wei-Chiu Chuang
            Assignee: Wei-Chiu Chuang
We observed a case where a replica's on-disk length is less than its acknowledged length, breaking an assumption in the recovery code.

{noformat}
2017-01-08 01:41:03,532 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to obtain replica info for block (=BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394519586) from datanode (=DatanodeInfoWithStorage[10.204.138.17:1004,null,null])
java.io.IOException: THIS IS NOT SUPPOSED TO HAPPEN: getBytesOnDisk() < getVisibleLength(), rip=ReplicaBeingWritten, blk_2526438952_1101394519586, RBW
  getNumBytes()     = 27530
  getBytesOnDisk()  = 27006
  getVisibleLength()= 27268
  getVolume()       = /data/6/hdfs/datanode/current
  getBlockFile()    = /data/6/hdfs/datanode/current/BP-947993742-10.204.0.136-1362248978912/current/rbw/blk_2526438952
  bytesAcked=27268
  bytesOnDisk=27006
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2284)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2260)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:2566)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.callInitReplicaRecovery(DataNode.java:2577)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2645)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.access$400(DataNode.java:245)
        at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2551)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

It turns out that if an exception is thrown within {{BlockReceiver#receivePacket}}, the replica's in-memory on-disk length may not be updated, even though the data has already been written to disk. For example, here is one exception we observed:

{noformat}
2017-01-08 01:40:59,512 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394499067
java.nio.channels.ClosedByInterruptException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
        at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:269)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.adjustCrcChannelPosition(FsDatasetImpl.java:1484)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.adjustCrcFilePosition(BlockReceiver.java:994)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:670)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:857)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:797)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

An exception can be thrown at other points within {{BlockReceiver#receivePacket}} and for other causes, so it may not make much sense to work around this particular exception. Instead, we should improve the replica recovery code to handle the case where the on-disk size is less than the acknowledged size, and update the in-memory checksum accordingly.
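To make the proposed direction concrete, here is a minimal, self-contained Java sketch (not the actual DataNode patch) of that reconciliation: when an RBW replica comes up for recovery with bytesOnDisk < bytesAcked, clamp the acknowledged length down to what is actually on disk and refresh the last partial-chunk checksum, instead of throwing. {{RbwReplica}}, {{reconcile}} and {{readLastPartialChunkChecksum}} are hypothetical stand-ins for the real internals ({{ReplicaBeingWritten}}, {{FsDatasetImpl#initReplicaRecovery}}, and the block's .meta file).

{code:java}
import java.io.IOException;

/** Toy model of an RBW (replica-being-written) replica; a stand-in for
 *  the real ReplicaBeingWritten in the DataNode. */
class RbwReplica {
  long bytesAcked;          // length acknowledged to the client
  long bytesOnDisk;         // length actually persisted in the block file
  byte[] lastChunkChecksum; // checksum of the trailing partial chunk

  RbwReplica(long bytesAcked, long bytesOnDisk) {
    this.bytesAcked = bytesAcked;
    this.bytesOnDisk = bytesOnDisk;
  }
}

class ReplicaRecoverySketch {
  /**
   * Reconcile a replica whose on-disk length fell behind its acknowledged
   * length after a pipeline failure. Instead of failing recovery with
   * "THIS IS NOT SUPPOSED TO HAPPEN", shrink the acknowledged length to
   * the persisted length and refresh the in-memory checksum to match.
   */
  static void reconcile(RbwReplica rbw) throws IOException {
    if (rbw.bytesOnDisk < rbw.bytesAcked) {
      // Bytes past bytesOnDisk were acked but never persisted; the safe
      // visible length is what is actually on disk, and block recovery
      // can then truncate the other replicas to a consistent length.
      rbw.bytesAcked = rbw.bytesOnDisk;
      // Re-read the checksum of the last partial chunk so the in-memory
      // checksum agrees with the shortened length.
      rbw.lastChunkChecksum = readLastPartialChunkChecksum(rbw);
    }
  }

  /** Hypothetical helper: in the real DataNode this would seek into the
   *  block's .meta file and read the trailing checksum. Stubbed here. */
  static byte[] readLastPartialChunkChecksum(RbwReplica rbw)
      throws IOException {
    return new byte[4]; // placeholder CRC32 slot
  }

  public static void main(String[] args) throws IOException {
    // Values from the log above: acked 27268, but only 27006 on disk.
    RbwReplica rbw = new RbwReplica(27268, 27006);
    reconcile(rbw);
    System.out.println("bytesAcked after reconcile = " + rbw.bytesAcked);
  }
}
{code}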