[ 
https://issues.apache.org/jira/browse/PIG-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-4916:
----------------------------
    Status: Patch Available  (was: Open)

I don't see a way to write a test case since this is non-deterministic in 
nature.

> Pig on Tez fail to remove temporary HDFS files in some cases
> ------------------------------------------------------------
>
>                 Key: PIG-4916
>                 URL: https://issues.apache.org/jira/browse/PIG-4916
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.17.0, 0.16.1
>
>         Attachments: PIG-4916-1.patch
>
>
> We saw the following stack trace when running Pig on S3:
> {code}
> 2016-06-01 22:04:22,714 [Thread-19] INFO  org.apache.hadoop.service.AbstractService - Service org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl failed in state STOPPED; cause: java.io.IOException: Filesystem closed
> java.io.IOException: Filesystem closed
>       at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>       at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2034)
>       at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1980)
>       at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
>       at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFD.flush(FileSystemTimelineWriter.java:370)
>       at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFDsCache.flush(FileSystemTimelineWriter.java:485)
>       at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter.close(FileSystemTimelineWriter.java:271)
>       at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceStop(TimelineClientImpl.java:326)
>       at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>       at org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager.close(ATSV15HistoryACLPolicyManager.java:259)
>       at org.apache.tez.client.TezClient.stop(TezClient.java:582)
>       at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager.shutdown(TezSessionManager.java:308)
>       at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager$1.run(TezSessionManager.java:53)
> 2016-06-01 22:04:22,718 [Thread-19] ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Error shutting down Tez session org.apache.tez.client.TezClient@48bf833a
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Filesystem closed
>       at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>       at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:225)
>       at org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager.close(ATSV15HistoryACLPolicyManager.java:259)
>       at org.apache.tez.client.TezClient.stop(TezClient.java:582)
>       at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager.shutdown(TezSessionManager.java:308)
>       at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager$1.run(TezSessionManager.java:53)
> Caused by: java.io.IOException: Filesystem closed
>       at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>       at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2034)
>       at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1980)
>       at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
>       at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFD.flush(FileSystemTimelineWriter.java:370)
>       at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFDsCache.flush(FileSystemTimelineWriter.java:485)
>       at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter.close(FileSystemTimelineWriter.java:271)
>       at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceStop(TimelineClientImpl.java:326)
>       at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>       ... 4 more
> {code}
> The job ran successfully, but the temporary HDFS files were not removed.
> [~cnauroth] points out that FileSystem also uses a shutdown hook to close
> FileSystem instances, and that hook might run before Pig's shutdown hook in
> Main. By switching to Hadoop's ShutdownHookManager, we can impose an order on
> the shutdown hooks.
> This has been verified by testing the following code in Main:
> {code}
>         ShutdownHookManager.get().addShutdownHook(new Runnable() {
>             @Override
>             public void run() {
>                 FileLocalizer.deleteTempResourceFiles();
>             }
>         }, priority);
> {code}
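> Note that FileSystem.SHUTDOWN_HOOK_PRIORITY=10, and ShutdownHookManager runs
> hooks with a higher priority first. With priority=9, Pig's cleanup runs after
> the FileSystem is closed and fails. With priority=11, it runs before the
> FileSystem is closed and succeeds.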
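
The ordering argument above can be illustrated with a small, self-contained sketch. It does not use Hadoop at all: `ShutdownOrderSketch` is a hypothetical stand-in that only assumes hooks run from highest priority to lowest, and the event strings and the `simulate` helper are invented for illustration. The value 10 is taken from the FileSystem.SHUTDOWN_HOOK_PRIORITY figure quoted in the description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a priority-ordered hook registry.
// This is NOT Hadoop's ShutdownHookManager; it only models "higher priority runs first".
final class ShutdownOrderSketch {
    // Mirrors the FileSystem.SHUTDOWN_HOOK_PRIORITY value quoted in the issue.
    static final int FS_CLOSE_PRIORITY = 10;

    private static final class Hook {
        final Runnable action;
        final int priority;

        Hook(Runnable action, int priority) {
            this.action = action;
            this.priority = priority;
        }
    }

    private final List<Hook> hooks = new ArrayList<>();

    void addShutdownHook(Runnable action, int priority) {
        hooks.add(new Hook(action, priority));
    }

    // Run every registered hook, highest priority first.
    void runAll() {
        List<Hook> ordered = new ArrayList<>(hooks);
        ordered.sort(Comparator.comparingInt((Hook h) -> h.priority).reversed());
        for (Hook h : ordered) {
            h.action.run();
        }
    }

    // Simulate shutdown with a filesystem-close hook and a temp-file cleanup hook,
    // returning the events in the order they happened.
    static List<String> simulate(int cleanupPriority) {
        List<String> events = new ArrayList<>();
        boolean[] fsOpen = {true};
        ShutdownOrderSketch manager = new ShutdownOrderSketch();

        manager.addShutdownHook(() -> {
            fsOpen[0] = false;
            events.add("filesystem closed");
        }, FS_CLOSE_PRIORITY);

        // Cleanup only works while the simulated filesystem is still open.
        manager.addShutdownHook(() ->
            events.add(fsOpen[0] ? "temp files deleted" : "cleanup failed: Filesystem closed"),
            cleanupPriority);

        manager.runAll();
        return events;
    }

    public static void main(String[] args) {
        System.out.println("priority 11: " + simulate(11));
        System.out.println("priority 9:  " + simulate(9));
    }
}
```

In this model, a cleanup priority of 11 deletes the temporary files before the filesystem is closed, while a priority of 9 finds the filesystem already closed, which matches the behavior reported in the description.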



