Nathan Marz wrote:
I have a large job operating on over 2 TB of data, with about 50,000
input splits. For some reason (as yet unknown), tasks started failing
on two of the machines, which got blacklisted. 13 map tasks failed in
total. Of those 13, 8 were re-executed on other machines without any
issues, but 5 *did not* get re-executed, and their status is marked
as "FAILED_UNCLEAN". Anyone have any idea what's going on? Why isn't
Hadoop re-running these tasks on other machines?
Had the job already failed, been killed, or succeeded when you saw this
situation? Once the job completes, the unclean attempts will not get
scheduled.
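You can confirm which state each attempt ended in by dumping the job's
task completion events. A rough, untested sketch against the old mapred
API (the job ID string is a placeholder for your real one):

  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.JobID;
  import org.apache.hadoop.mapred.RunningJob;
  import org.apache.hadoop.mapred.TaskCompletionEvent;

  public class DumpTaskAttempts {
    public static void main(String[] args) throws Exception {
      JobClient client = new JobClient(new JobConf());
      // Placeholder job ID -- substitute yours from the JobTracker UI.
      RunningJob job = client.getJob(JobID.forName("job_200901010000_0001"));

      // Completion events come back in pages (10 per call by default);
      // each event describes one task attempt and its final status.
      int from = 0;
      TaskCompletionEvent[] events;
      while ((events = job.getTaskCompletionEvents(from)).length > 0) {
        for (TaskCompletionEvent e : events) {
          System.out.println(e.getTaskAttemptId() + "\t"
              + e.getTaskStatus() + "\t" + e.getTaskTrackerHttp());
        }
        from += events.length;
      }
    }
  }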
If not, are there other jobs of higher priority running at the same time,
preventing the cleanup attempts from being launched?
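If priority turns out to be the problem, you can inspect the job and bump
its priority from the shell, for example (again, the job ID is a
placeholder):

  hadoop job -status job_200901010000_0001
  hadoop job -set-priority job_200901010000_0001 VERY_HIGH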
What version of Hadoop are you using? The latest trunk?
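If you're not sure, running

  hadoop version

on one of the nodes will print the exact build.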
Thanks
Amareshwari