[ https://issues.apache.org/jira/browse/PIG-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich resolved PIG-116.
--------------------------------
Resolution: Won't Fix
We do not see this causing problems in practice, since the temp data fails
to get cleaned up only under very rare circumstances.
> pig leaves temp files behind
> ----------------------------
>
> Key: PIG-116
> URL: https://issues.apache.org/jira/browse/PIG-116
> Project: Pig
> Issue Type: Bug
> Reporter: Olga Natkovich
> Assignee: Olga Natkovich
>
> Currently, Pig creates temp dirs via a call to FileLocalizer.getTemporaryPath.
> They are created on the client and are mainly used to store data between 2
> M-R jobs. Pig then attempts to clean them up in the client's shutdown hook.
> The problem with this approach is that, because there is no way to order the
> shutdown hooks, in some cases the DFS is already closed when we try to
> delete the files, in which case a substantial amount of data can be left in
> DFS. I see this issue more frequently with Hadoop 0.16, perhaps because I had
> to add an extra shutdown hook to handle HOD disconnects.
> In the short term, I would like to propose the approach below:
> (1) If trash is configured on the cluster, use the trash location to create a
> temp directory that will expire in 7 days. The hope is that most jobs don't run
> longer than 7 days. The user can specify a longer interval via a command line
> switch.
> (2) If trash is not enabled on the cluster, the location that we use now will
> be used.
> (3) In the shutdown hook, we will attempt to clean up. If the attempt fails
> and trash is enabled, we let trash handle it; otherwise we provide the list
> of locations to the user to clean. (I realize that this is not ideal but
> could not figure out a better way.)
> Longer term, I am talking with the Hadoop team to get better temp file support:
> https://issues.apache.org/jira/browse/HADOOP-2815
> Comments? Suggestions?
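
For illustration, the three-step proposal quoted above could be sketched
roughly as follows. This is not Pig's code: it uses plain java.nio.file on a
local filesystem instead of the Hadoop FileSystem API, and the class
TempDirPolicy, its methods, and the trashRoot/defaultRoot parameters are
hypothetical names chosen for the example. The 7-day expiry is assumed to be
handled by the cluster's trash mechanism and is not modeled here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of the proposed temp-directory policy (not Pig code). */
final class TempDirPolicy {
    private final boolean trashEnabled;
    private final Path trashRoot;    // assumed: a location the cluster expires on its own
    private final Path defaultRoot;  // assumed: the temp location used today
    private final List<Path> created = new ArrayList<>();

    TempDirPolicy(boolean trashEnabled, Path trashRoot, Path defaultRoot) {
        this.trashEnabled = trashEnabled;
        this.trashRoot = trashRoot;
        this.defaultRoot = defaultRoot;
    }

    /** Steps (1) and (2): use the trash location if available, else the current one. */
    Path tempRoot() {
        return trashEnabled ? trashRoot : defaultRoot;
    }

    /** Creates a temp directory under the chosen root and remembers it for cleanup. */
    synchronized Path createTempDir(String prefix) {
        try {
            Files.createDirectories(tempRoot());
            Path dir = Files.createTempDirectory(tempRoot(), prefix);
            created.add(dir);
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Step (3): try to delete everything we created. If a delete fails and trash
     * is enabled, trash expiry is expected to handle it; otherwise the path is
     * returned so that it can be reported to the user.
     */
    synchronized List<Path> cleanup() {
        List<Path> needsManualCleanup = new ArrayList<>();
        for (Path dir : created) {
            try {
                deleteRecursively(dir);
            } catch (IOException | UncheckedIOException e) {
                if (!trashEnabled) {
                    needsManualCleanup.add(dir);
                }
            }
        }
        created.clear();
        return needsManualCleanup;
    }

    /** Registers the best-effort cleanup as a JVM shutdown hook. */
    void installShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            for (Path p : cleanup()) {
                System.err.println("Could not remove temp dir, please delete: " + p);
            }
        }));
    }

    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Because the JVM does not guarantee any ordering between shutdown hooks, the
cleanup in this sketch is still best effort; the fallback list only makes a
failure visible to the user instead of silent.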