We are running a job that uses more than 80% of the available DFS space. Many reduce tasks fail because of write errors, and there seems to be a self-reinforcing feedback loop: the space used by failed task attempts is not cleaned up, which makes write errors for new tasks even more likely.
For example, out of 9,000 reduce tasks, 5,500 have completed and the remaining 3,500 fail repeatedly, because a good portion of the DFS space they need is still occupied by the output of their failed attempts. Is there a configuration option to have the temporary directories of failed task attempts cleaned up?

Thanks,
Christian

Sample output of 'hadoop fs -du':
.../_temporary/_task_200811051109_0002_r_000080_0    8589934731
.../_temporary/_task_200811051109_0002_r_000080_1    8455717003
.../_temporary/_task_200811051109_0002_r_000080_2    5771362443
.../_temporary/_task_200811051109_0002_r_000080_3    7784628363
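In the meantime, the only stopgap I can think of is to delete the leftover attempt directories by hand, roughly along these lines (a sketch only; <job-output-dir> is a placeholder for our job's output directory, and the attempt shown is one that the JobTracker web UI reports as failed and that is no longer being written to):

  # check how much space each leftover attempt directory still holds
  hadoop fs -du <job-output-dir>/_temporary
  # recursively remove one failed attempt's directory to reclaim its space
  hadoop fs -rmr <job-output-dir>/_temporary/_task_200811051109_0002_r_000080_1

That obviously doesn't scale to thousands of failed attempts, hence the question about a configuration option.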
