Kousuke Saruta created SPARK-3900:
-------------------------------------

             Summary: ApplicationMaster's shutdown hook fails to cleanup 
staging directory.
                 Key: SPARK-3900
                 URL: https://issues.apache.org/jira/browse/SPARK-3900
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.2.0
         Environment: Hadoop 0.23
            Reporter: Kousuke Saruta
            Priority: Critical


ApplicationMaster registers a shutdown hook and it calls 
ApplicationMaster#cleanupStagingDir.

cleanupStagingDir invokes FileSystem.get(yarnConf) and it invokes 
FileSystem.getInternal. FileSystem.getInternal also registers shutdown hook.
In FileSystem of hadoop 0.23, the shutdown hook registration does not consider 
whether shutdown is in progress or not (In 2.2, it's considered).

{code}
// 0.23 
if (map.isEmpty() ) {
  ShutdownHookManager.get().addShutdownHook(clientFinalizer, 
SHUTDOWN_HOOK_PRIORITY);
}
{code}

{code}
// 2.2
if (map.isEmpty()
            && !ShutdownHookManager.get().isShutdownInProgress()) {
   ShutdownHookManager.get().addShutdownHook(clientFinalizer, 
SHUTDOWN_HOOK_PRIORITY);
}
{code}

Thus, in 0.23, another shutdown hook can be registered when ApplicationMaster's 
shutdown hook run.

This issue cause IllegalStateException as follows.

{code}
java.lang.IllegalStateException: Shutdown in progress, cannot add a shutdownHook
        at 
org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:152)
        at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2306)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2278)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:316)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:162)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$cleanupStagingDir(ApplicationMaster.scala:307)
        at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:118)
        at 
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to