mapred.jobtracker.retirejob.interval
is not in the default config.
Shouldn't it be listed there?
Billy
"Amar Kamat" <ama...@yahoo-inc.com> wrote in
message news:49caff11.8070...@yahoo-inc.com...
Amar Kamat wrote:
Amareshwari Sriramadasu wrote:
Set mapred.jobtracker.retirejob.interval
This is used to retire completed jobs.
and mapred.userlog.retain.hours to a higher value.
This is used to discard user logs.
As Amareshwari pointed out, this might be the cause. Can you increase this
value and try?
Amar
By default, their values are 24 hours. These might be the reason for
failure, though I'm not sure.
Thanks
Amareshwari
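
For reference, a minimal sketch of how the two properties named above might be raised for a job that runs 50-60 hours. The 72-hour figure is only an illustrative choice, and the placement in the site configuration file (hadoop-site.xml on older releases) is an assumption to verify against your Hadoop version. The retire interval is assumed here to be expressed in milliseconds, which the thread does not state; the user-log retention is in hours, as its name suggests.

```xml
<!-- Sketch only: place inside the <configuration> element of the site config file. -->
<property>
  <name>mapred.jobtracker.retirejob.interval</name>
  <!-- Assumed unit: milliseconds. 72 h * 60 min * 60 s * 1000 ms = 259200000 -->
  <value>259200000</value>
</property>
<property>
  <name>mapred.userlog.retain.hours</name>
  <!-- Unit: hours. 72 is an illustrative value longer than the job's runtime. -->
  <value>72</value>
</property>
```

Both values are chosen to exceed the longest expected job runtime. The daemons would most likely need a restart to pick up the change.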
Billy Pearson wrote:
On one of my long-running jobs (about 50-60 hours), I am seeing that after
24 hours all
active reduce tasks fail with the error message:
java.io.IOException: Task process exit with nonzero status of 255.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
Is there something in the config that I can change to stop this?
Every time, within 1 minute of the 24-hour mark, they all fail at the same time.
This wastes a lot of resources downloading the map outputs and merging them
again.
What is the state of the reducer (copy or sort)? Check
jobtracker/task-tracker logs to see what is the state of these reducers
and whether it issued a kill signal. Either jobtracker/tasktracker is
issuing a kill signal or the reducers are committing suicide. Were there
any failures on the reducer side while pulling the map output? Also what
is the nature of the job? How fast do the maps finish?
Amar
Billy