Re: reduce task failing after 24 hours waiting

Amareshwari Sriramadasu Wed, 25 Mar 2009 20:45:41 -0700

Set mapred.jobtracker.retirejob.interval and mapred.userlog.retain.hours to higher values. By default, their values are 24 hours. These might be the reason for the failure, though I'm not sure.
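
As a rough sketch, the overrides could look like the fragment below in the site configuration file (hadoop-site.xml, or mapred-site.xml on releases that split the configuration). The 72-hour figure is only an assumed value chosen to outlast a 50-60 hour job. I believe the retire interval is given in milliseconds and the userlog retention in hours, but please check both against the defaults file of your release. The daemons most likely need a restart to pick up the change.

```xml
<!-- Assumed values: 72 hours, picked to exceed a 50-60 hour job. -->
<property>
  <name>mapred.jobtracker.retirejob.interval</name>
  <!-- 72 h * 60 min * 60 s * 1000 ms = 259200000 ms (unit assumed to be milliseconds) -->
  <value>259200000</value>
</property>
<property>
  <name>mapred.userlog.retain.hours</name>
  <!-- unit is hours, as the property name suggests -->
  <value>72</value>
</property>
```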


Thanks
Amareshwari


Billy Pearson wrote:

On one of my long-running jobs (about 50-60 hours), I am seeing that after 24 hours all
active reduce tasks fail with the error message:

java.io.IOException: Task process exit with nonzero status of 255.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)

Is there something in the config that I can change to stop this?

Every time, within 1 minute of the 24-hour mark, they all fail at the same time.
This wastes a lot of resources downloading the map outputs and merging them again.
Billy
