[ https://issues.apache.org/jira/browse/HDFS-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590334#comment-14590334 ]
Hudson commented on HDFS-4660:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2177 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2177/])
HDFS-4660. Block corruption can happen during pipeline recovery. Contributed by Kihwal Lee. (kihwal: rev c74517c46bf00af408ed866b6577623cdec02de1)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java

> Block corruption can happen during pipeline recovery
> ----------------------------------------------------
>
>                 Key: HDFS-4660
>                 URL: https://issues.apache.org/jira/browse/HDFS-4660
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 2.0.3-alpha
>            Reporter: Peng Zhang
>            Assignee: Kihwal Lee
>            Priority: Blocker
>             Fix For: 2.7.1
>
>         Attachments: HDFS-4660.patch, HDFS-4660.patch, HDFS-4660.v2.patch
>
>
> Pipeline: DN1 DN2 DN3
> Stop DN2; pipeline recovery adds DN4 at the 2nd position:
> DN1 DN4 DN3
> Recover RBW.
> DN4 after RBW recovery:
> 2013-04-01 21:02:31,570 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004
> 2013-04-01 21:02:31,570 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
>   getNumBytes()     = 134144
>   getBytesOnDisk()  = 134144
>   getVisibleLength()= 134144
> The replica ends exactly at a chunk boundary (134144/512 = 262 chunks).
> DN3 after RBW recovery:
> 2013-04-01 21:02:31,575 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover RBW replica BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004
> 2013-04-01 21:02:31,575 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
>   getNumBytes()     = 134028
>   getBytesOnDisk()  = 134028
>   getVisibleLength()= 134028
> The client sends a packet after pipeline recovery: offset=133632, len=1008.
> DN4 after flush:
> 2013-04-01 21:02:31,779 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file offset:134640; meta offset:1063
> // The meta end position should be ceil(134640/512)*4 + 7 == 263*4 + 7 == 1059, but it is 1063.
> DN3 after flush:
> 2013-04-01 21:02:31,782 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005, type=LAST_IN_PIPELINE, downstreams=0:[]: enqueue Packet(seqno=219, lastPacketInBlock=false, offsetInBlock=134640, ackEnqueueNanoTime=8817026136871545)
> 2013-04-01 21:02:31,782 DEBUG org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Changing meta file offset of block BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005 from 1055 to 1051
> 2013-04-01 21:02:31,782 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file offset:134640; meta offset:1059
> After checking the meta file on DN4, I found that the checksum of chunk 262 is duplicated, but the data is not.
> Later, after the block was finalized, DN4's block scanner detected the bad block and reported it to the NN. The NN sent a command to delete this block and to re-replicate it from another DN in the pipeline to satisfy the replication factor.
> I think this happens because BlockReceiver skips the data bytes that were already written, but does not skip the checksum bytes that were already written. And the function adjustCrcFilePosition is only used for the last incomplete chunk, not for this situation.
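A minimal sketch (not Hadoop source; the class and method names below are illustrative) of the meta-file length arithmetic used in the analysis above. The constants are the values cited in this issue: 512 data bytes per checksum chunk, a 4-byte CRC32 per chunk, and a 7-byte header at the start of the meta file.

{code:java}
// Hypothetical helper, not part of Hadoop: computes the expected meta-file
// length for a given number of data bytes on disk.
public class MetaOffsetCheck {
    static final int BYTES_PER_CHECKSUM = 512; // data bytes per checksum chunk
    static final int CHECKSUM_SIZE = 4;        // CRC32 is 4 bytes
    static final int HEADER_SIZE = 7;          // meta-file header size

    // One checksum per full or partial chunk, plus the header.
    static long expectedMetaLength(long bytesOnDisk) {
        long numChunks = (bytesOnDisk + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM; // ceil
        return HEADER_SIZE + numChunks * CHECKSUM_SIZE;
    }

    public static void main(String[] args) {
        // Value from the DN4 log above: 134640 data bytes flushed.
        System.out.println(expectedMetaLength(134640)); // prints 1059
        // DN4 actually reports meta offset 1063 = 1059 + 4, i.e. exactly one
        // extra 4-byte checksum: the duplicated checksum of chunk 262.
    }
}
{code}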
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)