Daniel Dai created PIG-4916:
-------------------------------

             Summary: Pig on Tez fail to remove temporary HDFS files in some cases
                 Key: PIG-4916
                 URL: https://issues.apache.org/jira/browse/PIG-4916
             Project: Pig
          Issue Type: Bug
            Reporter: Daniel Dai
            Assignee: Daniel Dai
             Fix For: 0.16.1, 0.17.0


We saw the following stack trace when running Pig on S3:
{code}
2016-06-01 22:04:22,714 [Thread-19] INFO  org.apache.hadoop.service.AbstractService - Service org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl failed in state STOPPED; cause: java.io.IOException: Filesystem closed
java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
        at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2034)
        at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1980)
        at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
        at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFD.flush(FileSystemTimelineWriter.java:370)
        at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFDsCache.flush(FileSystemTimelineWriter.java:485)
        at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter.close(FileSystemTimelineWriter.java:271)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceStop(TimelineClientImpl.java:326)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        at org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager.close(ATSV15HistoryACLPolicyManager.java:259)
        at org.apache.tez.client.TezClient.stop(TezClient.java:582)
        at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager.shutdown(TezSessionManager.java:308)
        at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager$1.run(TezSessionManager.java:53)
2016-06-01 22:04:22,718 [Thread-19] ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Error shutting down Tez session org.apache.tez.client.TezClient@48bf833a
org.apache.hadoop.service.ServiceStateException: java.io.IOException: Filesystem closed
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:225)
        at org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager.close(ATSV15HistoryACLPolicyManager.java:259)
        at org.apache.tez.client.TezClient.stop(TezClient.java:582)
        at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager.shutdown(TezSessionManager.java:308)
        at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager$1.run(TezSessionManager.java:53)
Caused by: java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
        at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2034)
        at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1980)
        at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
        at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFD.flush(FileSystemTimelineWriter.java:370)
        at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFDsCache.flush(FileSystemTimelineWriter.java:485)
        at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter.close(FileSystemTimelineWriter.java:271)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceStop(TimelineClientImpl.java:326)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        ... 4 more
{code}
The job ran successfully, but the temporary HDFS files were not removed.

[~cnauroth] points out that FileSystem also uses a shutdown hook to close 
FileSystem instances, and that hook might run before Pig's shutdown hook in 
Main. By switching to Hadoop's ShutdownHookManager, we can impose an order on 
the shutdown hooks.

This has been verified by testing the following code in Main:
{code}
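        // Register the temp-file cleanup through Hadoop's
        // org.apache.hadoop.util.ShutdownHookManager, where hooks with a
        // higher priority run earlier. "priority" is the value under test.
        ShutdownHookManager.get().addShutdownHook(new Runnable() {
            @Override
            public void run() {
                FileLocalizer.deleteTempResourceFiles();
            }
        }, priority);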
{code}

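Note that FileSystem.SHUTDOWN_HOOK_PRIORITY=10, and ShutdownHookManager runs 
hooks with a higher priority first. With priority=9, Pig's hook runs after the 
FileSystem has been closed and the cleanup fails. With priority=11, Pig's hook 
runs first and the cleanup succeeds.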
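
To illustrate why the priority value matters, here is a minimal, Hadoop-free sketch of the ordering. It is a simulation, not Hadoop's ShutdownHookManager and not the actual patch: the class HookOrderSketch, the Hook type, and the log strings are all hypothetical. It relies only on the rule that higher-priority hooks run first and on the value 10 quoted above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class HookOrderSketch {
    // Mirrors FileSystem.SHUTDOWN_HOOK_PRIORITY as quoted in this report.
    static final int FS_PRIORITY = 10;

    /** Hypothetical stand-in for a registered shutdown hook. */
    static final class Hook {
        final int priority;
        final Runnable action;

        Hook(int priority, Runnable action) {
            this.priority = priority;
            this.action = action;
        }
    }

    /** Runs a simulated shutdown and returns what happened, in order. */
    static List<String> simulate(int pigPriority) {
        List<String> log = new ArrayList<>();
        boolean[] fsOpen = {true};

        List<Hook> hooks = new ArrayList<>();
        // Simulated FileSystem hook: closes the file system.
        hooks.add(new Hook(FS_PRIORITY, () -> {
            fsOpen[0] = false;
            log.add("filesystem closed");
        }));
        // Simulated Pig hook: cleanup only works while the file system is open.
        hooks.add(new Hook(pigPriority, () ->
            log.add(fsOpen[0] ? "temp files deleted"
                              : "cleanup failed: Filesystem closed")));

        // Higher priority runs first.
        hooks.sort(Comparator.comparingInt((Hook h) -> h.priority).reversed());
        for (Hook h : hooks) {
            h.action.run();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println("priority=9:  " + simulate(9));
        System.out.println("priority=11: " + simulate(11));
    }
}
```

In this sketch, a Pig priority of 9 sorts the cleanup after the simulated FileSystem hook, so the cleanup sees a closed file system. A priority of 11 sorts it before, which matches the behavior observed in the test above.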




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
