We've been running 0.18.2 for over a month on an 8-node cluster. Last week we added 4 more nodes to the cluster, and since then the tasktrackers have failed twice. The namenodes are running fine, but every job we submit dies with this error on the tasktrackers:

2009-01-28 08:07:55,556 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: attempt_200901280756_0012_m_000074_2
2009-01-28 08:07:55,682 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200901280756_0012_m_000074_2 Child Error
java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403)

I tried running the tasktrackers in debug mode but the entries above are all that show up in the logs. As of now my cluster is down.

--
David O'Dell
Director, Operations
e: dod...@videoegg.com
t: (415) 738-5152
180 Townsend St., Third Floor
San Francisco, CA 94107