[ https://issues.apache.org/jira/browse/PIG-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai updated PIG-4916:
----------------------------
    Status: Patch Available  (was: Open)

I don't see a way to write a test case since this is non-deterministic in nature.

> Pig on Tez fail to remove temporary HDFS files in some cases
> ------------------------------------------------------------
>
>                 Key: PIG-4916
>                 URL: https://issues.apache.org/jira/browse/PIG-4916
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.17.0, 0.16.1
>
>         Attachments: PIG-4916-1.patch
>
>
> We saw the following stack trace when running Pig on S3:
> {code}
> 2016-06-01 22:04:22,714 [Thread-19] INFO  org.apache.hadoop.service.AbstractService - Service org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl failed in state STOPPED; cause: java.io.IOException: Filesystem closed
> java.io.IOException: Filesystem closed
>     at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>     at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2034)
>     at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1980)
>     at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
>     at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFD.flush(FileSystemTimelineWriter.java:370)
>     at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFDsCache.flush(FileSystemTimelineWriter.java:485)
>     at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter.close(FileSystemTimelineWriter.java:271)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceStop(TimelineClientImpl.java:326)
>     at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     at org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager.close(ATSV15HistoryACLPolicyManager.java:259)
>     at org.apache.tez.client.TezClient.stop(TezClient.java:582)
>     at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager.shutdown(TezSessionManager.java:308)
>     at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager$1.run(TezSessionManager.java:53)
> 2016-06-01 22:04:22,718 [Thread-19] ERROR org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Error shutting down Tez session org.apache.tez.client.TezClient@48bf833a
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: Filesystem closed
>     at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
>     at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:225)
>     at org.apache.tez.dag.history.ats.acls.ATSV15HistoryACLPolicyManager.close(ATSV15HistoryACLPolicyManager.java:259)
>     at org.apache.tez.client.TezClient.stop(TezClient.java:582)
>     at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager.shutdown(TezSessionManager.java:308)
>     at org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager$1.run(TezSessionManager.java:53)
> Caused by: java.io.IOException: Filesystem closed
>     at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>     at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2034)
>     at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1980)
>     at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
>     at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFD.flush(FileSystemTimelineWriter.java:370)
>     at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter$LogFDsCache.flush(FileSystemTimelineWriter.java:485)
>     at org.apache.hadoop.yarn.client.api.impl.FileSystemTimelineWriter.close(FileSystemTimelineWriter.java:271)
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceStop(TimelineClientImpl.java:326)
>     at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     ... 4 more
> {code}
> The job runs successfully, but the temporary HDFS files are not removed.
> [~cnauroth] points out that FileSystem also uses a shutdown hook to close
> FileSystem instances, and that hook might run before Pig's shutdown hook in
> Main. By switching to Hadoop's ShutdownHookManager, we can impose an order on
> the shutdown hooks.
> This has been verified by testing the following code in Main:
> {code}
> ShutdownHookManager.get().addShutdownHook(new Runnable() {
>     @Override
>     public void run() {
>         FileLocalizer.deleteTempResourceFiles();
>     }
> }, priority);
> {code}
> Note that FileSystem.SHUTDOWN_HOOK_PRIORITY=10. ShutdownHookManager runs
> higher-priority hooks first, so with priority=9 Pig's cleanup runs after the
> FileSystem is closed and fails, while with priority=11 it runs before the
> close and succeeds.
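
To illustrate the ordering argument in the description, here is a minimal, self-contained sketch. It does not use Hadoop: PriorityHooksSketch, its addShutdownHook method, and the hook names are hypothetical stand-ins that only imitate the "higher priority runs first" rule of ShutdownHookManager. The constant 10 is the FileSystem.SHUTDOWN_HOOK_PRIORITY value quoted in the issue. The filesystem is simulated with a boolean flag, so this shows the ordering effect only, not the actual patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a priority-ordered shutdown hook registry.
// This is NOT Hadoop's ShutdownHookManager; it only imitates the rule
// that hooks with a higher priority run first.
class PriorityHooksSketch {

    // Value quoted in the issue for FileSystem.SHUTDOWN_HOOK_PRIORITY.
    static final int FS_SHUTDOWN_HOOK_PRIORITY = 10;

    private static final class Hook {
        final String name;
        final int priority;
        final Runnable body;

        Hook(String name, int priority, Runnable body) {
            this.name = name;
            this.priority = priority;
            this.body = body;
        }
    }

    private final List<Hook> hooks = new ArrayList<>();

    void addShutdownHook(String name, Runnable body, int priority) {
        hooks.add(new Hook(name, priority, body));
    }

    // Runs every registered hook, highest priority first, and returns
    // the hook names in the order they ran.
    List<String> runAll() {
        List<Hook> ordered = new ArrayList<>(hooks);
        ordered.sort(Comparator.comparingInt((Hook h) -> h.priority).reversed());
        List<String> order = new ArrayList<>();
        for (Hook h : ordered) {
            h.body.run();
            order.add(h.name);
        }
        return order;
    }

    // Simulates shutdown with the cleanup hook registered at the given
    // priority. Returns true if the cleanup ran while the simulated
    // filesystem was still open.
    static boolean cleanupSucceeds(int cleanupPriority) {
        boolean[] fsOpen = {true};
        boolean[] cleaned = {false};
        PriorityHooksSketch manager = new PriorityHooksSketch();
        manager.addShutdownHook("filesystem-close",
                () -> fsOpen[0] = false, FS_SHUTDOWN_HOOK_PRIORITY);
        manager.addShutdownHook("temp-file-cleanup",
                () -> cleaned[0] = fsOpen[0], cleanupPriority);
        manager.runAll();
        return cleaned[0];
    }

    public static void main(String[] args) {
        System.out.println("priority 9:  cleanup succeeds = " + cleanupSucceeds(9));
        System.out.println("priority 11: cleanup succeeds = " + cleanupSucceeds(11));
    }
}
```

In this sketch the cleanup only works when its priority is above 10, which matches the behaviour reported for priority=9 and priority=11. With plain Runtime.addShutdownHook the JVM starts hooks in an unspecified order, which is consistent with the failure being intermittent.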