Re: intermediate files of killed tasks not purged

Sandhya E Tue, 28 Apr 2009 03:46:08 -0700

The attempt directories are directly under <hadoop-tmp>/mapred/local.
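
To see which leftovers exist and how much disk they take, something like the
following could be run on a TaskTracker node. This is only a sketch:
stray_attempt_dirs and dir_size are hypothetical helper names, the directory
passed in is whatever <hadoop-tmp>/mapred/local resolves to on the node, and
the name pattern is inferred from the attempt ids quoted in this thread.

```python
import os
import re

# Assumption: attempt directory names follow the ids seen in this thread,
# e.g. attempt_200904262046_0026_m_000002_0 (m = map, r = reduce).
ATTEMPT_RE = re.compile(r"^attempt_\d+_\d+_[mr]_\d+_\d+$")


def dir_size(path):
    """Return the total size in bytes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # The file vanished or is unreadable; skip it.
                pass
    return total


def stray_attempt_dirs(local_dir):
    """List (name, bytes) for attempt_* directories directly under local_dir.

    Directories below taskTracker/jobCache are deliberately not visited;
    only the top level of local_dir is inspected.
    """
    found = []
    for name in sorted(os.listdir(local_dir)):
        path = os.path.join(local_dir, name)
        if os.path.isdir(path) and ATTEMPT_RE.match(name):
            found.append((name, dir_size(path)))
    return found
```

Calling stray_attempt_dirs on the mapred/local path returns one (name, size)
pair per leftover directory. It only reports; it does not delete anything.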

I grepped the TaskTracker logs for one of the attempts that has leftovers in
mapred/local:
09/04/27 21:07:19 INFO mapred.TaskTracker: LaunchTaskAction: attempt_200902120108_44218_r_000000_0
09/04/27 21:07:29 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:32 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:38 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:41 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:47 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:53 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:56 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:08:02 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:08:08 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:08:11 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:08:17 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:08:23 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:08:26 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:08:29 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:08:32 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:08:39 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:08:45 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:08:48 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:08:54 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:09:00 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:09:06 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.33333334% reduce > sort
09/04/27 21:09:09 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.33333334% reduce > sort
09/04/27 21:09:12 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.7029736% reduce > reduce
09/04/27 21:09:15 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.771893% reduce > reduce
09/04/27 21:09:18 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.8495109% reduce > reduce
09/04/27 21:09:21 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.9042134% reduce > reduce
09/04/27 21:09:24 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.98041093% reduce > reduce
09/04/27 21:09:26 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.99415195% reduce > reduce
09/04/27 21:09:26 INFO mapred.TaskTracker: Task attempt_200902120108_44218_r_000000_0 is done.
09/04/27 21:09:31 INFO mapred.TaskRunner: attempt_200902120108_44218_r_000000_0 done; removing files.
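
To check whether an attempt ever received a KillTaskAction or its job a
KillJobAction, as asked in the reply quoted below, the log can be scanned
along these lines. This is a rough sketch, not a definitive tool: the helper
names join_wrapped and attempt_events are hypothetical, the timestamp pattern
is taken from the excerpt above, and the kill actions are matched only as
substrings because their exact log wording is not shown in this thread. It
also assumes the job id is the attempt id's first two numeric fields prefixed
with "job_".

```python
import re

# Assumption: every log entry starts with a timestamp like "09/04/27 21:07:19 ".
TS_RE = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")


def join_wrapped(lines):
    """Re-join entries that were wrapped across lines (e.g. by a mail client)."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if TS_RE.match(line) or not entries:
            entries.append(line)
        else:
            # No timestamp: treat as a continuation of the previous entry.
            entries[-1] += " " + line
    return entries


def attempt_events(lines, attempt_id):
    """Return the ordered lifecycle events found for one attempt id."""
    # Assumption: attempt_<cluster>_<job>_... belongs to job_<cluster>_<job>.
    job_id = "job_" + "_".join(attempt_id.split("_")[1:3])
    events = []
    for entry in join_wrapped(lines):
        if attempt_id in entry:
            if "LaunchTaskAction" in entry:
                events.append("launch")
            elif "KillTaskAction" in entry:
                events.append("kill-task")
            elif "removing files" in entry:
                events.append("cleanup")
            elif "is done" in entry:
                events.append("done")
            # Progress lines carry no lifecycle keyword and are ignored.
        elif job_id in entry and "KillJobAction" in entry:
            events.append("kill-job")
    return events
```

Passing the lines of the TaskTracker log and an attempt id returns only the
lifecycle events, which makes a missing kill or cleanup step easy to spot.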


Regards
Sandhya

On Tue, Apr 28, 2009 at 2:39 PM, Amareshwari Sriramadasu
<amar...@yahoo-inc.com> wrote:
> Again, where are you seeing the attemptid directories? Are they at
> mapred/local/<attemptid> or at
> mapred/local/taskTracker/jobCache/<jobid>/<attemptid>?
> If you are seeing files at mapred/local/<attemptid>, then it is a bug. Please
> raise a JIRA and attach the TaskTracker logs if possible.
> If not, mapred/local/taskTracker/jobCache/<jobid>/<attemptid> directories are
> cleaned up on a KillTaskAction, and mapred/local/taskTracker/jobCache/<jobid>
> directories are cleaned up on a KillJobAction. Can you verify from the
> TaskTracker logs whether the attemptid got a KillTaskAction or the jobid got
> a KillJobAction? If not, this is fixed by HADOOP-5247.
>
> Thanks
> Amareshwari
>
> Sandhya E wrote:
>>
>> Hi Amareshwari
>>
>> We are on version 0.18. I verified from the JobTracker web page that not
>> all killed tasks have leftovers in mapred/local. Also, some tasks that
>> were successful have left their tmp folders in mapred/local.
>>
>> Can you please give some pointers on how to debug this further?
>>
>> Regards
>> Sandhya
>>
>> On Tue, Apr 28, 2009 at 2:02 PM, Amareshwari Sriramadasu
>> <amar...@yahoo-inc.com> wrote:
>>
>>>
>>> Hi Sandhya,
>>>
>>>  Which version of Hadoop are you using? There could be <attempt_id>
>>> directories in mapred/local before 0.17. Now, there should not be any such
>>> directories.
>>> From version 0.17 onwards, the attempt directories will be present only
>>> at mapred/local/taskTracker/jobCache/<jobid>/<attemptid>. If you are
>>> seeing the directories in any other location, then it seems like a bug.
>>>
>>> HADOOP-4654 is about cleaning up temporary data in DFS for failed tasks;
>>> it does not change local FileSystem files.
>>>
>>> Thanks
>>> Amareshwari
>>>
>>> Edward J. Yoon wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> It seems related to https://issues.apache.org/jira/browse/HADOOP-4654.
>>>>
>>>> On Tue, Apr 28, 2009 at 4:01 PM, Sandhya E <sandhyabhas...@gmail.com>
>>>> wrote:
>>>>
>>>>
>>>>>
>>>>> Hi
>>>>>
>>>>> Under <hadoop-tmp-dir>/mapred/local there are directories like
>>>>> "attempt_200904262046_0026_m_000002_0".
>>>>> Each of these directories contains files named intermediate.1,
>>>>> intermediate.2, intermediate.3, intermediate.4, intermediate.5.
>>>>> There are many such directories. All of them correspond to
>>>>> killed task attempts. As they contain huge intermediate files, we
>>>>> ran into disk space issues.
>>>>>
>>>>> They are cleaned up when the mapred cluster is restarted. But otherwise,
>>>>> how can they be cleaned up without having to restart the cluster?
>>>>>
>>>>> Conf parameter "keep.failed.task.files" is set to "false" in our case.
>>>>>
>>>>> Many Thanks
>>>>> Sandhya
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>
>
