[ https://issues.apache.org/jira/browse/HDFS-11472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei-Chiu Chuang updated HDFS-11472:
-----------------------------------
    Attachment: HDFS-11472.testcase.patch

IMHO, this boils down to the fact that replica recovery does not consider the case where the on-disk length can be less than the acknowledged length. Attaching a sample test to reproduce the replica recovery bug.

> Fix inconsistent replica size after a data pipeline failure
> -----------------------------------------------------------
>
>                 Key: HDFS-11472
>                 URL: https://issues.apache.org/jira/browse/HDFS-11472
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDFS-11472.testcase.patch
>
>
> We observed a case where a replica's on-disk length is less than its acknowledged length, breaking an assumption in the recovery code.
> {noformat}
> 2017-01-08 01:41:03,532 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to obtain replica info for block (=BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394519586) from datanode (=DatanodeInfoWithStorage[10.204.138.17:1004,null,null])
> java.io.IOException: THIS IS NOT SUPPOSED TO HAPPEN: getBytesOnDisk() < getVisibleLength(), rip=ReplicaBeingWritten, blk_2526438952_1101394519586, RBW
>   getNumBytes()     = 27530
>   getBytesOnDisk()  = 27006
>   getVisibleLength()= 27268
>   getVolume()       = /data/6/hdfs/datanode/current
>   getBlockFile()    = /data/6/hdfs/datanode/current/BP-947993742-10.204.0.136-1362248978912/current/rbw/blk_2526438952
>   bytesAcked=27268
>   bytesOnDisk=27006
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2284)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.initReplicaRecovery(FsDatasetImpl.java:2260)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.initReplicaRecovery(DataNode.java:2566)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.callInitReplicaRecovery(DataNode.java:2577)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2645)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.access$400(DataNode.java:245)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2551)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> It turns out that if an exception is thrown within {{BlockReceiver#receivePacket}}, the in-memory replica's on-disk length may not be updated, but the data is written to disk anyway.
> For example, here is one exception we observed:
> {noformat}
> 2017-01-08 01:40:59,512 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-947993742-10.204.0.136-1362248978912:blk_2526438952_1101394499067
> java.nio.channels.ClosedByInterruptException
>         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>         at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:269)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.adjustCrcChannelPosition(FsDatasetImpl.java:1484)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.adjustCrcFilePosition(BlockReceiver.java:994)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:670)
>         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:857)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:797)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> There are potentially other places and causes where an exception can be thrown within {{BlockReceiver#receivePacket}}, so it may not make much sense to work around this particular exception. Instead, we should improve the replica recovery code to handle the case where the on-disk size is less than the acknowledged size, and update the in-memory checksum accordingly.
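To make the proposed handling concrete, here is a minimal, self-contained Java sketch using the lengths from the log above. It only models the idea; the class and method names (ReplicaState, recoverLengthStrict, recoverLengthTolerant) are hypothetical and are not the actual HDFS classes or the attached patch, and the chunk-boundary truncation is just one possible way to keep the recovered length and checksum consistent.

{noformat}
// Simplified model of the recovery-length problem described above.
// NOT the actual HDFS implementation; names and logic are illustrative only.
public class ReplicaRecoverySketch {

  /** Minimal stand-in for the per-replica lengths tracked by the DataNode. */
  static class ReplicaState {
    final long numBytes;     // bytes received (getNumBytes)
    final long bytesOnDisk;  // bytes recorded as flushed to the block file (getBytesOnDisk)
    final long bytesAcked;   // bytes acknowledged downstream (visible length)

    ReplicaState(long numBytes, long bytesOnDisk, long bytesAcked) {
      this.numBytes = numBytes;
      this.bytesOnDisk = bytesOnDisk;
      this.bytesAcked = bytesAcked;
    }
  }

  /**
   * Current behavior (per the first stack trace): recovery assumes
   * bytesOnDisk >= bytesAcked and fails otherwise.
   */
  static long recoverLengthStrict(ReplicaState r) {
    if (r.bytesOnDisk < r.bytesAcked) {
      throw new IllegalStateException("getBytesOnDisk() < getVisibleLength(): "
          + r.bytesOnDisk + " < " + r.bytesAcked);
    }
    return r.bytesAcked;
  }

  /**
   * One possible tolerant handling: recover only the data known to be on disk,
   * truncated to a full checksum chunk so the last chunk's checksum can be
   * recomputed from the block file during recovery.
   */
  static long recoverLengthTolerant(ReplicaState r, int bytesPerChecksum) {
    long usable = Math.min(r.bytesOnDisk, r.bytesAcked);
    return (usable / bytesPerChecksum) * bytesPerChecksum;
  }

  public static void main(String[] args) {
    // Numbers taken from the log message in the description.
    ReplicaState r = new ReplicaState(27530, 27006, 27268);
    try {
      recoverLengthStrict(r);
    } catch (IllegalStateException e) {
      System.out.println("strict recovery fails: " + e.getMessage());
    }
    System.out.println("tolerant recovery length: " + recoverLengthTolerant(r, 512));
  }
}
{noformat}

With the values above, the strict path throws the same "THIS IS NOT SUPPOSED TO HAPPEN" style error, while the tolerant path would recover 26624 bytes (27006 rounded down to a 512-byte chunk boundary) and recompute the trailing checksum from the data actually on disk.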