[ https://issues.apache.org/jira/browse/HADOOP-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574351#action_12574351 ]

Hemanth Yamijala commented on HADOOP-2815:
------------------------------------------

While Devaraj and I were trying to fix HADOOP-2899, we came up with a solution 
that exactly matches Dhruba's comment above. In HADOOP-2899, the JobTracker 
needs to clean up the directory pointed to by mapred.system.dir when it shuts 
down. We experimented with code in the JobTracker that adds its own shutdown 
hook and deletes the directory there (roughly the sketch shown after the list 
below). Deleting the directory itself works, but we ran into the following 
problems:
- adding / removing extra shutdown hooks sometimes causes 
IllegalStateExceptions in the FileSystem code
- the non-deterministic order in which the JVM runs shutdown hooks complicates 
the whole thing and seemed unsafe.
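
For concreteness, what we experimented with looks roughly like the sketch 
below. This is a simplified illustration, not the actual JobTracker patch, and 
the exact FileSystem delete signature varies a bit across Hadoop versions:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NaiveSystemDirCleanup {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    final Path systemDir = new Path(conf.get("mapred.system.dir",
                                             "/tmp/hadoop/mapred/system"));
    final FileSystem fs = FileSystem.get(conf);

    // Our own JVM shutdown hook, independent of FileSystem's internal one.
    Runtime.getRuntime().addShutdownHook(new Thread() {
      public void run() {
        try {
          // Races with FileSystem's own shutdown hook: if that hook has
          // already closed the cached client, this delete fails with
          // "Filesystem closed" and the directory is left behind.
          fs.delete(systemDir, true);
        } catch (IOException ioe) {
          // Nothing useful can be done this late in shutdown.
        }
      }
    });

    // ... normal JobTracker work would go here ...
  }
}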

Dhruba's solution would handle this scenario perfectly and seems like a very 
clean approach. The JobTracker would just register a shutdown hook with the 
FileSystem and clean up the system directory there. So, +1 for the suggestion.
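
Concretely, the idea amounts to something like the sketch below. This is a 
minimal, self-contained illustration, not Hadoop's actual API: in the real fix 
the registration would live inside FileSystem itself, so registered cleanups 
run in FileSystem's existing shutdown hook while the client is still open.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Sketch: run registered deletes, then close, inside one shutdown hook. */
public class FsCleanupRegistry {
  private final FileSystem fs;
  private final List<Path> pending = new ArrayList<Path>();

  public FsCleanupRegistry(final FileSystem fs) {
    this.fs = fs;
    // A single hook owns both steps, so a delete can never run after close().
    Runtime.getRuntime().addShutdownHook(new Thread() {
      public void run() {
        synchronized (pending) {
          for (Path p : pending) {
            try {
              fs.delete(p, true);  // the client is still open here
            } catch (IOException ioe) {
              // best effort during shutdown
            }
          }
        }
        try {
          fs.close();
        } catch (IOException ioe) {
          // ignore; the JVM is exiting anyway
        }
      }
    });
  }

  /** The JobTracker would call this once with mapred.system.dir. */
  public void deleteAtShutdown(Path p) {
    synchronized (pending) {
      pending.add(p);
    }
  }
}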

Is there any chance this code can make it into Hadoop 0.16.1? HADOOP-2899 
should ideally be fixed in 0.16.1.


> Allowing processes to cleanup dfs on shutdown
> ---------------------------------------------
>
>                 Key: HADOOP-2815
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2815
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Olga Natkovich
>            Assignee: dhruba borthakur
>
> Pig creates temp files that it wants removed at the end of processing. The 
> code that removes the temp files runs in a shutdown hook so that they get 
> removed both on normal shutdown and when the process gets killed.
> The problem we are seeing is that by the time this code is called, the DFS 
> might already be closed, so the delete fails and leaves temp files behind. 
> Since we have no control over the shutdown order, we have no way to make 
> sure the files get removed.
> One way to solve this issue would be the ability to mark files as temp 
> files so that Hadoop can remove them during its own shutdown.
> The stack trace I am seeing is:
>         at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:158)
>         at org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:417)
>         at org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:144)
>         at org.apache.pig.backend.hadoop.datastorage.HPath.delete(HPath.java:96)
>         at org.apache.pig.impl.io.FileLocalizer$1.run(FileLocalizer.java:275)
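
For illustration, the "mark files as temp files" idea in the description above 
amounts to a deleteOnExit for DFS paths. A minimal sketch of how calling code 
could use it, assuming a FileSystem.deleteOnExit(Path)-style method (the call 
shown is illustrative; it is not available in 0.16):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TempFileExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical Pig temp path, for illustration only.
    Path tmp = new Path("/tmp/temp-pig-0001");
    fs.create(tmp).close();

    // Mark the path so the FileSystem deletes it as part of its own
    // shutdown, instead of the caller racing the client's shutdown hook.
    fs.deleteOnExit(tmp);

    // ... run the job; no explicit cleanup needed on exit ...
  }
}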

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
