Dave Latham created HDFS-5042:
---------------------------------

             Summary: Completed files lost after power failure
                 Key: HDFS-5042
                 URL: https://issues.apache.org/jira/browse/HDFS-5042
             Project: Hadoop HDFS
          Issue Type: Bug
         Environment: ext3 on CentOS 5.7 (kernel 2.6.18-274.el5)
            Reporter: Dave Latham
            Priority: Critical


We suffered a cluster-wide power failure, after which HDFS lost data that it had already acknowledged as closed and complete.

The client was HBase, which compacted a set of HFiles into a new HFile and then, after closing the new file successfully, deleted the previous versions of the file. The cluster then lost power, and when it was brought back up the newly created file was marked CORRUPT.

Based on the logs, it looks like the replicas were created by the DataNodes in the 'blocksBeingWritten' directory and then moved to the 'current' directory when the file was closed.  After the power cycle, those replicas were back in the blocksBeingWritten directory of the underlying file system (ext3), i.e. the rename never became durable.  When those DataNodes reported in to the NameNode, it rejected and deleted those replicas, and the file was lost.
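
To make the failure mode concrete, here is a minimal sketch of a finalize step that renames a replica from blocksBeingWritten to current.  This is not the actual DataNode code; the paths and names below are made up for illustration.  The point is that the rename touches only directory metadata, so on ext3 it can sit in the page cache and be rolled back by a power failure even though the block data itself was flushed:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class FinalizeSketch {
    // Hypothetical layout, loosely modeled on the DataNode data directory.
    static final Path BBW = Paths.get("/data/hadoop/dfs/data/blocksBeingWritten");
    static final Path CURRENT = Paths.get("/data/hadoop/dfs/data/current");

    static void finalizeReplica(String blockFileName) throws IOException {
        Path src = BBW.resolve(blockFileName);
        Path dst = CURRENT.resolve(blockFileName);
        // The block file contents may already be on disk, but this rename only
        // updates the two parent directories.  Without an explicit fsync of
        // BBW and CURRENT, a power failure can undo the move and leave the
        // replica back in blocksBeingWritten, as seen in the logs below.
        Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
    }
}
{code}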

Some possible fixes: have the DataNode fsync the affected directory(ies) after moving a block from blocksBeingWritten to current, so that the rename is durable; or have the NameNode accept replicas reported from blocksBeingWritten under certain circumstances.
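
As an illustration of the first option only (a hedged sketch under the assumptions of the example above, not a proposed patch; the helper name is hypothetical), the finalize step could be followed by an fsync of both parent directories.  On Linux a directory can be synced by opening it read-only and calling force() on the channel:

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirSync {
    /**
     * Fsync a directory so that recent renames/creates in it survive a crash.
     * Works on Linux; some platforms do not allow opening a directory this way.
     */
    static void fsyncDir(Path dir) throws IOException {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);  // flush the directory's metadata to stable storage
        }
    }

    // After Files.move(src, dst, ATOMIC_MOVE) in the sketch above:
    //   fsyncDir(FinalizeSketch.BBW);      // persists removal of the old entry
    //   fsyncDir(FinalizeSketch.CURRENT);  // persists creation of the new entry
}
{code}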

Log snippets from RS (RegionServer), NN (NameNode), DN (DataNode):
{noformat}
RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c with permission=rwxrwxrwx
NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c. blk_1395839728632046111_357084589
DN 2013-06-29 11:16:06,832 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: /10.0.5.237:50010
NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to blk_1395839728632046111_357084589 size 25418340
NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to blk_1395839728632046111_357084589 size 25418340
NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to blk_1395839728632046111_357084589 size 25418340
DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327
DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_1395839728632046111_357084589 terminating
NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on  file /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c from client DFSClient_hb_rs_hs745,60020,1372470111932
NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c is closed by DFSClient_hb_rs_hs745,60020,1372470111932
RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c to hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c
RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 7 file(s) in n of users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m

-------  CRASH, RESTART ---------

NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.6.1:50010 size 21978112 but was rejected: Reported as block being written but is a block of closed file.
NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet of 10.0.6.1:50010
NN 2013-06-29 12:01:20,155 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.5.237:50010 size 16971264 but was rejected: Reported as block being written but is a block of closed file.
NN 2013-06-29 12:01:20,155 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet of 10.0.5.237:50010
NN 2013-06-29 12:01:20,175 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.6.24:50010 size 21913088 but was rejected: Reported as block being written but is a block of closed file.
NN 2013-06-29 12:01:20,175 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet of 10.0.6.24:50010
(Note: the clock on the server running the DN is wrong after the restart; I believe the timestamps below are off by 6 hours.)
DN 2013-06-29 06:07:22,877 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Scheduling block blk_1395839728632046111_357084589 file /data/hadoop/dfs/data/blocksBeingWritten/blk_1395839728632046111 for deletion
DN 2013-06-29 06:07:24,952 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleted block blk_1395839728632046111_357084589 at file /data/hadoop/dfs/data/blocksBeingWritten/blk_1395839728632046111
{noformat}

There was some additional discussion of this issue in a thread on the hdfs-user mailing list:

http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201307.mbox/%3CCA+qbEUPuf19PL_EVeWi1104+scLVrcS0LTFUvBPw=qcuxnz...@mail.gmail.com%3E
