[ https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xupeng updated HDFS-10587:
--------------------------
    Comment: was deleted

(was: And here are the logs:

HBase log
{noformat}
2016-07-13 11:48:29,475 WARN [ResponseProcessor for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642
java.io.IOException: Bad response ERROR for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 from datanode DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD]
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:909)
2016-07-13 11:48:29,476 WARN [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642] hdfs.DFSClient: Error Recovery for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 in pipeline DatanodeInfoWithStorage[10.6.134.228:5080,DS-ad10b254-5803-4109-a550-e07444a129c9,SSD], DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD], DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD]: bad datanode DatanodeInfoWithStorage[10.6.128.208:5080,DS-b20d6263-ef6b-46ba-9613-faf6d24231da,SSD]
2016-07-13 11:49:01,499 WARN [ResponseProcessor for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656
java.io.IOException: Bad response ERROR for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 from datanode DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD]
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:909)
2016-07-13 11:49:01,500 WARN [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: Error Recovery for block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656 in pipeline DatanodeInfoWithStorage[10.6.134.228:5080,DS-ad10b254-5803-4109-a550-e07444a129c9,SSD], DatanodeInfoWithStorage[10.6.134.229:5080,DS-8c209fca-9b34-4a6b-919b-6b4d24a3e13a,SSD], DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD]: bad datanode DatanodeInfoWithStorage[10.6.128.215:5080,DS-0f4dfb1f-225c-44cd-928a-f7420bcd96b9,SSD]
2016-07-13 11:49:01,566 INFO [DataStreamer for file /ssd2/hbase_tsdb22/WALs/n6-130-044.byted.org,31356,1468326625039/n6-130-044.byted.org%2C31356%2C1468326625039.null1.1468381657104 block BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42906656] hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 10.6.129.77:5080
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1472)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1293)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1016)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:560)
{noformat}

10.6.128.215 log
{noformat}
2016-07-13 11:48:29,555 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642 src: /10.6.134.229:19009 dest: /10.6.128.215:5080
2016-07-13 11:48:29,555 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-448958278-10.6.130.96-1457941856632:blk_1116167880_42905642
2016-07-13 11:48:29,555 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_1116167880_42905642, RBW
  getNumBytes()     = 9912487
  getBytesOnDisk()  = 9912487
  getVisibleLength()= 9911790
  getVolume()       = /data12/yarn/dndata/current
  getBlockFile()    = /data12/yarn/dndata/current/BP-448958278-10.6.130.96-1457941856632/current/rbw/blk_1116167880
  bytesAcked=9911790
  bytesOnDisk=9912487
2016-07-13 11:48:29,555 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: truncateBlock: blockFile=/data12/yarn/dndata/current/BP-448958278-10.6.130.96-1457941856632/current/rbw/blk_1116167880, metaFile=/data12/yarn/dndata/current/BP-448958278-10.6.130.96-1457941856632/current/rbw/blk_1116167880_42905642.meta, oldlen=9912487, newlen=9911790
{noformat}

10.6.134.229)
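The truncateBlock entry in the 10.6.128.215 log above is the normal RBW-recovery path: the replica file is cut back from bytesOnDisk (9912487) to the acknowledged length (9911790). A minimal, purely illustrative sketch of what that truncation amounts to (plain Java, not the actual FsDatasetImpl code; the class and method names are made up, and the real implementation also has to trim and re-checksum the .meta file, which is omitted here):

{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class RbwTruncateSketch {
  // Illustrative only: cut a replica file back to the last acknowledged byte,
  // as the "truncateBlock: oldlen=9912487, newlen=9911790" entry above reports.
  static void truncateToAckedLength(Path blockFile, long bytesAcked) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(blockFile.toFile(), "rw")) {
      if (raf.length() > bytesAcked) {
        raf.setLength(bytesAcked); // drop the unacknowledged tail
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path f = Files.createTempFile("blk_demo", null);
    Files.write(f, new byte[9912487]);   // pretend bytesOnDisk
    truncateToAckedLength(f, 9911790L);  // pretend bytesAcked
    System.out.println(Files.size(f));   // prints 9911790
    Files.deleteIfExists(f);
  }
}
{code}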
> Incorrect offset/length calculation in pipeline recovery causes block
> corruption
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-10587
>                 URL: https://issues.apache.org/jira/browse/HDFS-10587
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDFS-10587.001.patch
>
>
> We found that incorrect offset and length calculation in pipeline recovery may cause block corruption and result in missing blocks under a very unfortunate scenario.
> (1) A client established a pipeline and started writing data to it.
> (2) One of the datanodes in the pipeline restarted, closing the socket, and some written data was left unacknowledged.
> (3) The client replaced the failed datanode with a new one, initiating a block transfer to copy the existing data in the block to the new datanode.
> (4) The block was transferred to the new node. Crucially, the entire block, including the unacknowledged data, was transferred.
> (5) The last chunk (512 bytes) was not a full chunk, but the destination still reserved the whole chunk in its buffer and wrote the entire buffer to disk, so part of what was written is garbage.
> (6) When the transfer was done, the destination datanode converted the replica from temporary to rbw, which set its visible length to the number of bytes on disk. That is to say, it treated everything that was transferred as acknowledged. However, the visible length of the replica differs from that of the transfer source: it is rounded up to the next multiple of 512. [1]
> (7) The client then truncated the block in an attempt to remove unacknowledged data. However, because the visible length equals the bytes on disk, the truncation did not remove the unacknowledged data.
> (8) When new data was appended at the destination, it skipped the bytes already on disk, so whatever was written as garbage was never replaced.
> (9) The volume scanner detected the corrupt replica, but due to HDFS-10512 it would not tell the NameNode to mark the replica as corrupt, so the client continued to form a pipeline using the corrupt replica.
> (10) Finally, the DN that had the only healthy replica was restarted. The NameNode then updated the pipeline to contain only the corrupt replica.
> (11) The client continued to write to the corrupt replica, because neither the client nor the datanode itself knows the replica is corrupt. When the restarted datanodes come back, their replicas are stale, although not corrupt. Therefore, none of the replicas is both healthy and up to date.
> The sequence of events was reconstructed from DataNode/NameNode logs and my understanding of the code.
> Incidentally, we have observed the same sequence of events on two independent clusters.
> [1]
> The sender has the replica as follows:
> 2016-04-15 22:03:05,066 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW
>   getNumBytes()     = 41381376
>   getBytesOnDisk()  = 41381376
>   getVisibleLength()= 41186444
>   getVolume()       = /hadoop-i/data/current
>   getBlockFile()    = /hadoop-i/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324
>   bytesAcked=41186444
>   bytesOnDisk=41381376
> while the receiver has the replica as follows:
> 2016-04-15 22:03:05,068 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW
>   getNumBytes()     = 41186816
>   getBytesOnDisk()  = 41186816
>   getVisibleLength()= 41186816
>   getVolume()       = /hadoop-g/data/current
>   getBlockFile()    = /hadoop-g/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324
>   bytesAcked=41186816
>   bytesOnDisk=41186816
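To make the offset/length bookkeeping in steps (4) through (8) concrete, here is a minimal arithmetic sketch (plain Java, not HDFS code; the class and variable names are illustrative, and the 512-byte chunk size is the one quoted in the description, i.e. the default bytes-per-checksum) of how the receiver's lengths in footnote [1] come about and why the subsequent truncation removes nothing:

{code:java}
public class PipelineRecoveryLengthSketch {
  static final long CHUNK = 512; // checksum chunk size quoted in the description

  static long roundUpToChunk(long len) {
    return ((len + CHUNK - 1) / CHUNK) * CHUNK;
  }

  public static void main(String[] args) {
    // Sender-side acknowledged (visible) length from footnote [1].
    long senderBytesAcked = 41186444L;

    // Per steps (5)/(6): the destination writes out a whole last chunk, so its replica
    // length ends up as the source's visible length rounded up to the next chunk
    // boundary; the tail of that chunk is garbage, not acknowledged data.
    long receiverBytesOnDisk = roundUpToChunk(senderBytesAcked);
    System.out.println(receiverBytesOnDisk); // 41186816, matching the receiver's replica above

    // Step (6): converting the replica from temporary to rbw sets visible length = bytes
    // on disk, so the receiver believes all 41186816 bytes were acknowledged.
    long receiverVisibleLength = receiverBytesOnDisk;

    // Steps (7)/(8): truncating to the visible length is therefore a no-op, and later
    // appends skip the bytes already on disk, so the garbage tail is never overwritten.
    System.out.println(receiverVisibleLength - senderBytesAcked); // 372 bytes of garbage survive
  }
}
{code}

The computed 41186816 matches the receiver's bytesAcked/bytesOnDisk in footnote [1], and the 372-byte tail is exactly the unacknowledged region that the truncation in step (7) was supposed to remove.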