It was failing on all the nodes, both new and old.
The problem was that there were too many subdirectories under
$HADOOP_HOME/logs/userlogs.
The fix was just to delete the subdirs and change this setting from 24
hours (the default) to 2 hours:
mapred.userlog.retain.hours
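
For anyone who hits the same thing, here is a minimal sketch of a cleanup
script along the lines of the fix above. It is not part of Hadoop: the
function names (stale_userlog_dirs, purge_userlogs) are made up for
illustration, it assumes each task attempt's logs sit in their own
subdirectory directly under $HADOOP_HOME/logs/userlogs, and it uses the
directory modification time as a rough stand-in for age. It removes only
directories older than the retention window rather than everything, which
is a variation on what I did by hand.

```python
import os
import shutil
import time


def stale_userlog_dirs(userlogs_dir, retain_hours, now=None):
    """Return sorted paths of subdirectories last modified more than
    retain_hours ago (modification time is an assumed proxy for age)."""
    now = time.time() if now is None else now
    cutoff = now - retain_hours * 3600
    stale = []
    for entry in os.scandir(userlogs_dir):
        if not entry.is_dir(follow_symlinks=False):
            continue  # only subdirectories count here
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            stale.append(entry.path)
    return sorted(stale)


def purge_userlogs(userlogs_dir, retain_hours, dry_run=True):
    """List stale subdirectories and delete them unless dry_run is set."""
    stale = stale_userlog_dirs(userlogs_dir, retain_hours)
    if not dry_run:
        for path in stale:
            shutil.rmtree(path)
    return stale


if __name__ == "__main__":
    # Hypothetical layout: falls back to the current directory if
    # HADOOP_HOME is not set.
    logs = os.path.join(os.environ.get("HADOOP_HOME", "."), "logs", "userlogs")
    if os.path.isdir(logs):
        total = sum(1 for e in os.scandir(logs) if e.is_dir(follow_symlinks=False))
        candidates = purge_userlogs(logs, retain_hours=2, dry_run=True)
        print(f"{total} subdirectories under {logs}, "
              f"{len(candidates)} older than 2 hours")
    else:
        print(f"{logs} not found")
```

With dry_run=True (the default) it only reports what would be removed. It
is probably safest to do the actual delete with the tasktracker stopped,
though I have not verified that this is required.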

It would have been nice if there were an error message that pointed to this.


Aaron Kimball wrote:
> Hi David,
>
> If your tasks are failing on only the new nodes, it's likely that you're
> missing a library or something on those machines. See this Hadoop tutorial
> http://public.yahoo.com/gogate/hadoop-tutorial/html/module5.html about
> "distributing debug scripts." These will allow you to capture stdout/err and
> the syslog from tasks that fail.
>
> - Aaron
>
> On Wed, Jan 28, 2009 at 9:40 AM, Sagar Naik <sn...@attributor.com> wrote:
>
>   
>> Please check which nodes have these failures.
>>
>> I guess the new tasktrackers/machines are not configured correctly.
>> As a result, the map-task will die and the remaining map-tasks will be
>> sucked onto these machines.
>>
>>
>> -Sagar
>>
>>
>> David J. O'Dell wrote:
>>
>>     
>>> We've been running 0.18.2 for over a month on an 8 node cluster.
>>> Last week we added 4 more nodes to the cluster and have experienced 2
>>> failures of the tasktrackers since then.
>>> The namenodes are running fine, but every job dies on submission
>>> with this error on the tasktrackers.
>>>
>>> 2009-01-28 08:07:55,556 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: attempt_200901280756_0012_m_000074_2
>>> 2009-01-28 08:07:55,682 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200901280756_0012_m_000074_2 Child Error
>>> java.io.IOException: Task process exit with nonzero status of 1.
>>>        at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462)
>>>        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403)
>>>
>>> I tried running the tasktrackers in debug mode but the entries above are
>>> all that show up in the logs.
>>> As of now my cluster is down.
>>>
>>>
>>>
>>>       

-- 
David O'Dell
Director, Operations
e: dod...@videoegg.com
t:  (415) 738-5152
180 Townsend St., Third Floor
San Francisco, CA 94107 
