Hi guys,
I am using Scribe to write data directly to HDFS. It works well most of the
time, but sporadically some of the files (1-2 out of ~6000 per day) written
to HDFS turn out empty. Interestingly, this only happens when HDFS is busy
handling lots of other write requests at the same time. Looking at the
NameNode's log output, it seems that for these files the allocateBlock
operation that normally follows the create operation is missing. Here is a
comparison of a normal file and an abnormal one (a small detection sketch
follows the log excerpts):

----------- *Normal* one (file:/prefix-2012-02-18_00410) ---------------------------------
2012-02-18 02:32:15,873 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=user ip=/
10.16.35.236        cmd=create      src=/prefix-2012-02-18_00410   dst=null
       perm=user:supergroup:rw-r--r--
2012-02-18 02:32:15,905 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK*
NameSystem.allocateBlock*: /prefix-2012-02-18_00410.
blk_-2839230806711337016_129091175
2012-02-18 02:32:17,422 INFO org.apache.hadoop.hdfs.StateChange: Removing
lease on  file /prefix-2012-02-18_00410 from client DFSClient_-36442571
2012-02-18 02:32:17,422 INFO org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.completeFile: file /prefix-2012-02-18_00410 is closed by
DFSClient_-36442571
----------- End --------------
----------- *Problematic* one (file:/prefix-2012-02-18_00409) ----------------------------
2012-02-18 02:30:47,287 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=user ip=/
10.16.35.236        cmd=create      src=/prefix-2012-02-18_00409   dst=null
       perm=user:supergroup:rw-r--r--
*<comment: no allocateBlock is found, and there is a 60-90 sec gap between
the create and the lease removal>*
2012-02-18 02:32:15,648 INFO org.apache.hadoop.hdfs.StateChange: Removing
lease on  file /prefix-2012-02-18_00409 from client DFSClient_748605466
2012-02-18 02:32:15,648 INFO org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.completeFile: file /prefix-2012-02-18_00409 is closed by
DFSClient_748605466
----------- End -------------------
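
For reference, here is roughly how I look for the empty files after the fact.
This is only a minimal sketch against the Hadoop FileSystem Java API, not my
actual Scribe setup; the directory argument and the assumption that it holds
only plain files are mine:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FindEmptyFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // args[0] is the directory holding the day's output,
        // e.g. the parent of the /prefix-2012-02-18_* files above
        for (FileStatus status : fs.listStatus(new Path(args[0]))) {
            // assumes the directory holds only plain files, no subdirectories
            if (status.getLen() == 0) {
                System.out.println("EMPTY: " + status.getPath());
            }
        }
        fs.close();
    }
}

Of course this only tells me which files ended up empty, not why.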

I am not sure whether this is more Scribe-related or HDFS-related, so let me
try asking the question on this mailing list first.

The ~90-second delay looks suspicious to me, and I guess it might be the
cause of this issue. Have you run into this situation before? Any light you
can shed on why it happens, and how to work around it so I can guarantee the
entire file has been successfully saved into HDFS, would be very welcome.
Are there any places I should focus on for improvements, either on the HDFS
side (e.g. HDFS tuning, GC tuning, etc.) or on the application side (e.g.
timeout settings on the HDFS API calls in Scribe)? Your suggestions/thoughts
are greatly appreciated.
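
On the application side, one workaround I am considering is to verify each
file right after it is closed and rewrite it if the length does not match.
Something along these lines (a rough sketch in Java against the FileSystem
API; in reality the writer is Scribe going through libhdfs, and the path and
retry count here are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VerifiedWrite {
    // Write the payload, then re-check the file length on the NameNode.
    // If the file came back shorter than expected (e.g. zero-length, as in
    // the bad case above), delete it and let the caller retry.
    public static boolean writeAndVerify(FileSystem fs, Path path, byte[] payload)
            throws Exception {
        FSDataOutputStream out = fs.create(path);
        try {
            out.write(payload);
        } finally {
            out.close();
        }
        long len = fs.getFileStatus(path).getLen();
        if (len != payload.length) {
            fs.delete(path, false);   // drop the truncated/empty file
            return false;             // caller can retry the write
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        byte[] data = "hello".getBytes("UTF-8");
        // hypothetical path, retry up to 3 times
        Path p = new Path("/prefix-2012-02-18_test");
        for (int attempt = 0; attempt < 3; attempt++) {
            if (writeAndVerify(fs, p, data)) break;
        }
        fs.close();
    }
}

That said, I would much rather understand why the allocateBlock never
happens than just retry around it.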

Regards,
Peter
