Saw a couple more references to that block before the "to delete blk" messages:

2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-4237880568969698703 is added to invalidSet of <our ip prefix>.169:50010 2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-4237880568969698703 is added to invalidSet of <our ip prefix>.173:50010 2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-4237880568969698703 is added to invalidSet of <our ip prefix>.176:50010

Again, I'm not sure why this is happening though.


BTW, I appreciate your comments below (about it getting moved out of the temp directory and then getting removed in another pass). But I grepped the logs as you suggested, and I still don't see how it got moved/deleted:

[r...@hdmaster hadoop-0.20]# grep _attempt_201010221550_0418_r_000001_0 hadoop-hadoop-namenode-hdmaster.log.2010-11-10 2010-11-10 21:42:28,442 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/<our ip root>.176 cmd=create src=<our root dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state dst=null perm=root:supergroup:rw-r--r-- 2010-11-10 21:42:29,802 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: <our root dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state. blk_-4237880568969698703_13404582 2010-11-10 21:42:30,360 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on file <our root dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state from client DFSClient_attempt_201010221550_0418_r_000001_0 2010-11-10 21:42:30,360 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file <our root dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state is closed by DFSClient_attempt_201010221550_0418_r_000001_0


I have to say, I'm really quite stumped. Been looking at this all afternoon, and I still have no idea how/why that file got purged. :-(

Thanks,

DR

On 11/11/2010 02:13 PM, Todd Lipcon wrote:
Given that it's an MR output, my guess is it got moved out of the temporary
directory when the job "Committed" and then was removed as another pass. I'd
grep for the containing directory name in the audit logs to see where it got
moved to and how it was eventually deleted.

Would be great if someone wrote some tools that, given a block ID, tracked
the life of the file that contained it (including renames of containing
dirs, etc). Shouldn't be too difficult.

-Todd

On Thu, Nov 11, 2010 at 9:38 AM, David Rosenstrauch<dar...@darose.net>wrote:
Sorry, I stand corrected.  When I grep the name node logs for the block ID
there are some additional lines:

[r...@hdmaster hadoop-0.20]# grep 4237880568969698703_13404582
hadoop-hadoop-namenode-hdmaster.log.2010-11-10

2010-11-10 21:42:29,802 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.allocateBlock:<our root
dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state.
blk_-4237880568969698703_13404582
  2010-11-10 21:42:30,355 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated:<our ip prefix>.169:50010 is
added to blk_-4237880568969698703_13404582 size 326522
2010-11-10 21:42:30,356 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated:<our ip prefix>.173:50010 is
added to blk_-4237880568969698703_13404582 size 326522
2010-11-10 21:42:30,357 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated:<our ip prefix>.176:50010 is
added to blk_-4237880568969698703_13404582 size 326522
2010-11-10 21:42:33,585 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask
<our ip prefix>.176:50010 to delete  blk_-4237880568969698703_13404582
2010-11-10 21:42:42,598 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask
<our ip prefix>.173:50010 to delete  blk_719995703651674050_13393533
blk_2844104197159113873_13393516 blk_6711472063024831893_13393527
blk_240592081553699046_13393054 blk_6933038421638165347_13393232
blk_-1288897671244436593_13393351 blk_1408176921409258101_13393152
blk_-4237880568969698703_13404582 blk_-4188858742780299895_13393410
blk_-2670318277937301225_13393450 blk_-1858397614006984730_13393480
blk_6292330466651227068_13393505
2010-11-10 21:42:42,598 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask
<our ip prefix>.169:50010 to delete  blk_719995703651674050_13393533
blk_6711472063024831893_13393527 blk_-514463419649319075_13404582
blk_-1980639884885154260_13404582 blk_240592081553699046_13393054
blk_8520588884031249451_13393428 blk_-6620589068796293265_13393089
blk_6933038421638165347_13393232 blk_-1288897671244436593_13393351
blk_-3581082286710707012_13393556 blk_1408176921409258101_13393152
blk_-1858397614006984730_13393480 blk_6292330466651227068_13393505
blk_2844104197159113873_13393516 blk_-4237880568969698703_13404582
blk_-4188858742780299895_13393410 blk_-2670318277937301225_13393450


I'm still not clear why those deletes are happening though.  Will read
through the logs again at those locations.  But if anyone's got any ideas,
pls chime in.

DR

Reply via email to