It was failing on all the nodes, both new and old. The problem was that there were too many subdirectories under $HADOOP_HOME/logs/userlogs. The fix was to delete the subdirs and change mapred.userlog.retain.hours from 24 hours (the default) to 2 hours.
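For anyone who wants to check for the same condition before tasks start dying, here is a minimal Python sketch of the manual cleanup. It assumes each task attempt leaves its own subdirectory under the userlogs directory and that directory mtime is a reasonable proxy for age. The function names `count_subdirs` and `prune_old_subdirs` are made up for illustration and are not part of Hadoop; Hadoop's own cleanup is governed by mapred.userlog.retain.hours.

```python
import os
import shutil
import time


def count_subdirs(path):
    """Return the number of immediate subdirectories of path."""
    with os.scandir(path) as entries:
        return sum(1 for e in entries if e.is_dir(follow_symlinks=False))


def prune_old_subdirs(path, retain_hours, now=None):
    """Delete immediate subdirectories last modified more than retain_hours ago.

    Hypothetical helper, not a Hadoop API. Returns the sorted names of the
    directories that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - retain_hours * 3600
    # Collect the candidates first so nothing is deleted while iterating.
    with os.scandir(path) as entries:
        candidates = [e.path for e in entries if e.is_dir(follow_symlinks=False)]
    removed = []
    for subdir in candidates:
        if os.stat(subdir).st_mtime < cutoff:
            shutil.rmtree(subdir)
            removed.append(os.path.basename(subdir))
    return sorted(removed)
```

Something like `prune_old_subdirs(os.path.join(os.environ["HADOOP_HOME"], "logs", "userlogs"), retain_hours=2)` would mirror the 2 hour setting above. Run `count_subdirs` on the same path first to see how bad it is, and try it on a copy before pointing it at a live tasktracker.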
Would have been nice if there was an error message that pointed to this.

Aaron Kimball wrote:
> Hi David,
>
> If your tasks are failing on only the new nodes, it's likely that you're
> missing a library or something on those machines. See this Hadoop tutorial
> http://public.yahoo.com/gogate/hadoop-tutorial/html/module5.html about
> "distributing debug scripts." These will allow you to capture stdout/err and
> the syslog from tasks that fail.
>
> - Aaron
>
> On Wed, Jan 28, 2009 at 9:40 AM, Sagar Naik <sn...@attributor.com> wrote:
>
>> Pl check which nodes have these failures.
>>
>> I guess the new tasktrackers/machines are not configured correctly.
>> As a result, the map-task will die and the remaining map-tasks will be
>> sucked onto these machines
>>
>> -Sagar
>>
>> David J. O'Dell wrote:
>>
>>> We've been running 0.18.2 for over a month on an 8 node cluster.
>>> Last week we added 4 more nodes to the cluster and have experienced 2
>>> failures to the tasktrackers since then.
>>> The namenodes are running fine but all jobs submitted will die when
>>> submitted with this error on the tasktrackers.
>>>
>>> 2009-01-28 08:07:55,556 INFO org.apache.hadoop.mapred.TaskTracker:
>>> LaunchTaskAction: attempt_200901280756_0012_m_000074_2
>>> 2009-01-28 08:07:55,682 WARN org.apache.hadoop.mapred.TaskRunner:
>>> attempt_200901280756_0012_m_000074_2 Child Error
>>> java.io.IOException: Task process exit with nonzero status of 1.
>>>     at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462)
>>>     at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403)
>>>
>>> I tried running the tasktrackers in debug mode but the entries above are
>>> all that show up in the logs.
>>> As of now my cluster is down.

--
David O'Dell
Director, Operations
e: dod...@videoegg.com
t: (415) 738-5152
180 Townsend St., Third Floor
San Francisco, CA 94107