I have a large job operating on over 2 TB of data, with about 50,000 input splits. For some reason (as yet unknown), tasks started failing on two of the machines, which got blacklisted. 13 mappers failed in total. Of those 13, 8 were re-executed on other machines without any issues. The remaining 5 were *not* re-executed, and their status is marked as "FAILED_UNCLEAN". Anyone have any idea what's going on? Why isn't Hadoop running these tasks on other machines?
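
In case it helps, here's a minimal sketch of the retry-related settings I'd expect to control re-execution (written against the classic mapred JobConf API; the values shown are just the stock defaults, not anything specific to my job):

    import org.apache.hadoop.mapred.JobConf;

    public class RetrySettingsSketch {
        public static void main(String[] args) {
            // Knobs that (as I understand it) govern whether a failed
            // map attempt gets rescheduled on another TaskTracker.
            JobConf conf = new JobConf();

            // Max attempts per map task before the task is declared failed
            // (mapred.map.max.attempts; stock default is 4).
            conf.setMaxMapAttempts(4);

            // Per-job failure count after which a TaskTracker is
            // blacklisted for that job (stock default is 4).
            conf.setInt("mapred.max.tracker.failures", 4);
        }
    }

If there's some other setting that governs the FAILED_UNCLEAN cleanup path, I'd appreciate a pointer.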

Thanks,
Nathan Marz

