We are running a job that uses more than 80% of the available DFS space. Many reduce tasks fail because of write errors, and there seems to be a self-reinforcing feedback loop: the space used by failed task attempts is not cleaned up, which makes write errors for new tasks even more likely.
For example, out of 9,000 reduce tasks, 5,500 have completed and the remaining 3,500 fail repeatedly, because a good portion of the DFS space they need is still occupied by the output of their failed attempts. Is there a configuration option to have the temporary directories of failed task attempts cleaned up?

Thanks,
Christian

Sample output of 'hadoop fs -du':
.../_temporary/_task_200811051109_0002_r_000080_0    8589934731
.../_temporary/_task_200811051109_0002_r_000080_1    8455717003
.../_temporary/_task_200811051109_0002_r_000080_2    5771362443
.../_temporary/_task_200811051109_0002_r_000080_3    7784628363
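In the meantime, the only stopgap I can think of is to delete the leftover attempt directories by hand, roughly along these lines (a sketch only; <job-output-dir> is a placeholder for our job's output directory, and the attempt shown is one that the JobTracker web UI reports as failed and that is no longer being written to):

  # check how much space each leftover attempt directory still holds
  hadoop fs -du <job-output-dir>/_temporary
  # recursively remove one failed attempt's directory to reclaim its space
  hadoop fs -rmr <job-output-dir>/_temporary/_task_200811051109_0002_r_000080_1

That obviously doesn't scale to thousands of failed attempts, hence the question about a configuration option.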
