Hi Edward,

I copied the userlogs folder that caused the error.
Two things speak against the too-many-files theory:
a) I can still add new files to this folder (touch userlogsOLD/a, etc.).
b) sysctl fs.file-max shows 817874, whereas the first level of userlogsOLD
holds 31999 files and the whole tree 107400 files (counted recursively).
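Point (a) only exercises regular files. fs.file-max caps the number of open file handles system-wide, not the number of files on disk. On ext3 the relevant cap applies to subdirectories: a directory inode allows at most 32000 links, which leaves room for 31998 subdirectories, and regular files do not count against it. Below is a minimal Python sketch that splits the first-level count into files and subdirectories. It assumes the folder lives on ext3. The helper names and the EXT3_MAX_SUBDIRS constant are illustrative, not part of Hadoop.

```python
import os

# Assumption: the folder lives on ext3, where a directory inode allows at most
# 32000 links; "." and the parent's entry use two, leaving 31998 subdirectories.
EXT3_MAX_SUBDIRS = 31998


def count_entries(path):
    """Count first-level files and subdirectories of path, plus all files below it."""
    top_files = top_dirs = 0
    with os.scandir(path) as it:
        for entry in it:
            # follow_symlinks=False: a symlink to a directory counts as a non-directory
            if entry.is_dir(follow_symlinks=False):
                top_dirs += 1
            else:
                top_files += 1
    # os.walk lists the non-directory entries of every directory under path
    total_files = sum(len(files) for _, _, files in os.walk(path))
    return {"top_files": top_files, "top_dirs": top_dirs, "total_files": total_files}


def at_ext3_subdir_cap(counts):
    # True when another mkdir in this folder would fail with EMLINK on ext3;
    # regular files do not raise the directory's link count, so they are ignored
    return counts["top_dirs"] >= EXT3_MAX_SUBDIRS
```

For example, call `count_entries("logs/userlogsOLD")` and pass the result to `at_ext3_subdir_cap`. If nearly all first-level entries turn out to be directories, the ext3 cap would explain why `touch` still succeeds while the folder is unusable for new task attempts.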

Any thoughts?
Johannes


On Jun 14, 2010, at 7:47 PM, Edward Capriolo wrote:

> On Mon, Jun 14, 2010 at 1:15 PM, Johannes Zillmann <jzillm...@googlemail.com> wrote:
> 
>> Hi,
>> 
>> I am running a 4-node cluster with hadoop-0.20.2. Suddenly every task
>> scheduled on 2 of the 4 nodes fails. It seems the child JVM crashes.
>> There are no child logs under logs/userlogs. The TaskTracker log shows this:
>> 
>> 2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: In
>> JvmRunner constructed JVM ID: jvm_201006091425_0049_m_-946174604
>> 2010-06-14 09:34:12,714 INFO org.apache.hadoop.mapred.JvmManager: JVM
>> Runner jvm_201006091425_0049_m_-946174604 spawned.
>> 2010-06-14 09:34:12,727 INFO org.apache.hadoop.mapred.JvmManager: JVM :
>> jvm_201006091425_0049_m_-946174604 exited. Number of tasks it ran: 0
>> 2010-06-14 09:34:12,727 WARN org.apache.hadoop.mapred.TaskRunner:
>> attempt_201006091425_0049_m_003179_0 Child Error
>> java.io.IOException: Task process exit with nonzero status of 1.
>>       at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
>> 
>> 
>> At some point I simply renamed logs/userlogs to logs/userlogsOLD. A new job
>> recreated logs/userlogs, and the error no longer occurred on this host.
>> The permissions of userlogs and userlogsOLD are exactly the same.
>> userlogsOLD contains about 378M in 132747 files. When I copy the contents of
>> userlogsOLD back into userlogs, the tasks on that node start failing again.
>> 
>> Some questions:
>> - This looks to me like a problem with too many files in one folder. Any
>> thoughts on this?
>> - Does Hadoop clean up the contents of logs/userlogs regularly?
>> - The tasks' logs/stdout files do not exist, and the TaskTracker's logs/out
>> files contain no specific message (other than the one posted above). Is
>> there any other log file where an error message could be found?
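The thread does not settle how Hadoop itself purges logs/userlogs. An external job that removes old per-attempt directories is one way to keep the folder small. The minimal sketch below assumes that each task attempt writes into its own first-level subdirectory of userlogs. purge_old_logs and its retain_hours parameter are hypothetical, not part of Hadoop.

```python
import os
import shutil
import time


def purge_old_logs(userlogs_dir, retain_hours=24.0, now=None):
    """Remove first-level subdirectories of userlogs_dir older than retain_hours.

    Hypothetical helper, not part of Hadoop. Age is judged by each directory's
    own mtime. Returns the sorted names of the removed directories.
    """
    now = time.time() if now is None else now
    cutoff = now - retain_hours * 3600
    # Collect candidates first so the directory is not modified while being read
    with os.scandir(userlogs_dir) as it:
        stale = [
            entry
            for entry in it
            if entry.is_dir(follow_symlinks=False)
            and entry.stat(follow_symlinks=False).st_mtime < cutoff
        ]
    for entry in stale:
        shutil.rmtree(entry.path)  # deletes the attempt directory and its log files
    return sorted(entry.name for entry in stale)
```

A directory's mtime changes only when entries are added, removed, or renamed inside it. Appending to an existing log file does not update it. A long-running attempt could therefore look stale, so keep retain_hours well above the longest task duration.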
>> 
>> 
>> Best regards
>> Johannes
> 
> 
> Most file systems have an upper limit on the number of files/subfolders in a
> folder. You have probably hit the ext3 limit. If you launch lots and lots of
> jobs, you can hit the limit before any cleanup happens.
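One way to test the ext3 theory directly is to create subdirectories in a scratch folder on the same filesystem until mkdir fails. Below is a minimal sketch of that experiment. probe_subdir_limit is a hypothetical helper, and its cap argument keeps it from running indefinitely on filesystems without such a limit. On ext3, which allows at most 32000 links per directory inode, an initially empty directory should stop at 31998 subdirectories with EMLINK.

```python
import errno
import os


def probe_subdir_limit(parent, cap=40000):
    """Create numbered subdirectories in parent until mkdir fails or cap is reached.

    Hypothetical helper. Returns (created, reason): reason is "EMLINK" if the
    filesystem refused another link to parent, or None if cap was reached first.
    Use an empty scratch directory; the created directories are left behind.
    """
    created = 0
    while created < cap:
        try:
            os.mkdir(os.path.join(parent, "d%06d" % created))
        except OSError as exc:
            # EMLINK: the parent's link count would exceed the filesystem maximum
            if exc.errno == errno.EMLINK:
                return created, "EMLINK"
            raise
        created += 1
    return created, None
```

Point it at an empty directory on the same partition as logs/, and remove the scratch directory afterwards (for example with shutil.rmtree).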
> 
> You can experiment with cleanup and with other filesystems. The following
> log-related issue might be relevant:
> 
> https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877614#action_12877614
> 
> Regards,
> Edward
