These are the audit log entries immediately preceding that timestamp:

2010-11-10 21:42:33,034 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/<our ip prefix>.178 cmd=listStatus src=<our root dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000000_0 dst=null perm=null 2010-11-10 21:42:33,043 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/<our ip prefix>.178 cmd=mkdirs src=<our root dir>/2010.11.10-21.05.29/output dst=null perm=root:supergroup:rwxr-xr-x 2010-11-10 21:42:33,046 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/<our ip prefix>.178 cmd=listStatus src=<our root dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000000_0/shard1 dst=null perm=null 2010-11-10 21:42:33,047 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/<our ip prefix>.178 cmd=mkdirs src=<our root dir>/2010.11.10-21.05.29/output/shard1 dst=null perm=root:supergroup:rwxr-xr-x 2010-11-10 21:42:33,070 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/<our ip prefix>.178 cmd=rename src=<our root dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000000_0/shard1/IntentTrait.state dst=<our root dir>/2010.11.10-21.05.29/output/shard1/IntentTrait.state perm=root:supergroup:rw-r--r-- 2010-11-10 21:42:33,093 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/<our ip prefix>.178 cmd=delete src=<our root dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000000_0 dst=null perm=null 2010-11-10 21:42:33,341 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=root,root,bin,daemon,sys,adm,disk,wheel ip=/<our ip prefix>.178 cmd=open src=<our root dir>/conf/ShardingTable.txt dst=null perm=null

Thanks,

DR

On 11/11/2010 07:14 PM, Todd Lipcon wrote:
What's the last audit log entry prior to 2010-11-10 21:42:33,389?

-Todd

On Thu, Nov 11, 2010 at 2:10 PM, David Rosenstrauch<dar...@darose.net>wrote:

Saw a couple more references to that block before the "to delete  blk"
messages:

2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addToInvalidates: blk_-4237880568969698703 is added to invalidSet
of<our ip prefix>.169:50010
2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addToInvalidates: blk_-4237880568969698703 is added to invalidSet
of<our ip prefix>.173:50010
2010-11-10 21:42:33,389 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addToInvalidates: blk_-4237880568969698703 is added to invalidSet
of<our ip prefix>.176:50010

Again, I'm not sure why this is happening though.


BTW, I appreciate your comments below (about it getting moved out of the
temp directory and then getting removed in another pass).  But I grepped the
logs as you suggested, and I still don't see how it got moved/deleted:

[r...@hdmaster hadoop-0.20]# grep _attempt_201010221550_0418_r_000001_0
hadoop-hadoop-namenode-hdmaster.log.2010-11-10
2010-11-10 21:42:28,442 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
ugi=root,root,bin,daemon,sys,adm,disk,wheel     ip=/<our ip root>.176
cmd=create      src=<our root
dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
  dst=null        perm=root:supergroup:rw-r--r--

2010-11-10 21:42:29,802 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.allocateBlock:<our root
dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state.
blk_-4237880568969698703_13404582
2010-11-10 21:42:30,360 INFO org.apache.hadoop.hdfs.StateChange: Removing
lease on  file<our root
dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
from client DFSClient_attempt_201010221550_0418_r_000001_0
2010-11-10 21:42:30,360 INFO org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.completeFile: file<our root
dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state
is closed by DFSClient_attempt_201010221550_0418_r_000001_0


I have to say, I'm really quite stumped.  Been looking at this all
afternoon, and I still have no idea how/why that file got purged.  :-(

Thanks,

DR


On 11/11/2010 02:13 PM, Todd Lipcon wrote:

Given that it's an MR output, my guess is it got moved out of the
temporary
directory when the job "Committed" and then was removed as another pass.
I'd
grep for the containing directory name in the audit logs to see where it
got
moved to and how it was eventually deleted.

Would be great if someone wrote some tools that, given a block ID, tracked
the life of the file that contained it (including renames of containing
dirs, etc). Shouldn't be too difficult.

-Todd

On Thu, Nov 11, 2010 at 9:38 AM, David Rosenstrauch<dar...@darose.net
wrote:

Sorry, I stand corrected.  When I grep the name node logs for the block
ID

there are some additional lines:

[r...@hdmaster hadoop-0.20]# grep 4237880568969698703_13404582
hadoop-hadoop-namenode-hdmaster.log.2010-11-10

2010-11-10 21:42:29,802 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.allocateBlock:<our root

dir>/2010.11.10-21.05.29/output/_temporary/_attempt_201010221550_0418_r_000001_0/shard2/IntentTrait.state.
blk_-4237880568969698703_13404582
  2010-11-10 21:42:30,355 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated:<our ip prefix>.169:50010 is
added to blk_-4237880568969698703_13404582 size 326522
2010-11-10 21:42:30,356 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated:<our ip prefix>.173:50010 is
added to blk_-4237880568969698703_13404582 size 326522
2010-11-10 21:42:30,357 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated:<our ip prefix>.176:50010 is
added to blk_-4237880568969698703_13404582 size 326522
2010-11-10 21:42:33,585 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
ask
<our ip prefix>.176:50010 to delete  blk_-4237880568969698703_13404582
2010-11-10 21:42:42,598 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
ask
<our ip prefix>.173:50010 to delete  blk_719995703651674050_13393533
blk_2844104197159113873_13393516 blk_6711472063024831893_13393527
blk_240592081553699046_13393054 blk_6933038421638165347_13393232
blk_-1288897671244436593_13393351 blk_1408176921409258101_13393152
blk_-4237880568969698703_13404582 blk_-4188858742780299895_13393410
blk_-2670318277937301225_13393450 blk_-1858397614006984730_13393480
blk_6292330466651227068_13393505
2010-11-10 21:42:42,598 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
ask
<our ip prefix>.169:50010 to delete  blk_719995703651674050_13393533
blk_6711472063024831893_13393527 blk_-514463419649319075_13404582
blk_-1980639884885154260_13404582 blk_240592081553699046_13393054
blk_8520588884031249451_13393428 blk_-6620589068796293265_13393089
blk_6933038421638165347_13393232 blk_-1288897671244436593_13393351
blk_-3581082286710707012_13393556 blk_1408176921409258101_13393152
blk_-1858397614006984730_13393480 blk_6292330466651227068_13393505
blk_2844104197159113873_13393516 blk_-4237880568969698703_13404582
blk_-4188858742780299895_13393410 blk_-2670318277937301225_13393450


I'm still not clear why those deletes are happening though.  Will read
through the logs again at those locations.  But if anyone's got any
ideas,
pls chime in.

DR

Reply via email to