[
https://issues.apache.org/jira/browse/HDFS-16852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xing Lin updated HDFS-16852:
----------------------------
Summary: Register the shutdown hook only when not in shutdown for
KeyProviderCache constructor (was: HDFS-16852 Register the shutdown hook only
when not in shutdown for KeyProviderCache constructor)
> Register the shutdown hook only when not in shutdown for KeyProviderCache
> constructor
> -------------------------------------------------------------------------------------
>
> Key: HDFS-16852
> URL: https://issues.apache.org/jira/browse/HDFS-16852
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Reporter: Xing Lin
> Assignee: Xing Lin
> Priority: Minor
> Labels: pull-request-available
>
> When an HDFS client is created, it registers a shutdown hook with
> ShutdownHookManager. ShutdownHookManager does not allow adding a new
> shutdown hook while the process is already in shutdown, and instead throws an
> IllegalStateException.
> This behavior is not ideal when a Spark program fails during pre-launch. In
> that case, during shutdown, Spark calls cleanupStagingDir() to clean up the
> staging dir. cleanupStagingDir() creates a FileSystem object to talk to HDFS.
> However, since this is the first use of a FileSystem object in that process,
> it needs to create an HDFS client, which tries to register the shutdown hook
> and hits the IllegalStateException. This IllegalStateException masks the
> actual exception that caused the Spark program to fail during pre-launch.
> We propose to catch the IllegalStateException in KeyProviderCache and log a
> warning instead. The TCP connection between the client and the NameNode is
> closed by the OS when the process shuts down.
> Example stacktrace:
> {code:java}
> 13-09-2022 14:39:42 PDT INFO - 22/09/13 21:39:41 ERROR util.Utils: Uncaught exception in thread shutdown-hook-0
> 13-09-2022 14:39:42 PDT INFO - java.lang.IllegalStateException: Shutdown in progress, cannot add a shutdownHook
> 13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.util.ShutdownHookManager.addShutdownHook(ShutdownHookManager.java:299)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.KeyProviderCache.<init>(KeyProviderCache.java:71)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.ClientContext.<init>(ClientContext.java:130)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.ClientContext.get(ClientContext.java:167)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:383)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:287)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:159)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3261)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:121)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3310)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3278)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:475)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.spark.deploy.yarn.ApplicationMaster.cleanupStagingDir(ApplicationMaster.scala:675)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.spark.deploy.yarn.ApplicationMaster.$anonfun$run$2(ApplicationMaster.scala:259)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
> 13-09-2022 14:39:42 PDT INFO - at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2023)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
> 13-09-2022 14:39:42 PDT INFO - at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> 13-09-2022 14:39:42 PDT INFO - at scala.util.Try$.apply(Try.scala:213)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
> 13-09-2022 14:39:42 PDT INFO - at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
> 13-09-2022 14:39:42 PDT INFO - at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 13-09-2022 14:39:42 PDT INFO - at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 13-09-2022 14:39:42 PDT INFO - at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 13-09-2022 14:39:42 PDT INFO - at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 13-09-2022 14:39:42 PDT INFO - at java.lang.Thread.run(Thread.java:748)
> 13-09-2022 14:39:42 PDT INFO - 22/09/13 21:39:41 INFO util.ShutdownHookManager: Shutdown hook called
> {code}
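> The proposed behavior can be sketched with a JDK-only example: Runtime.addShutdownHook() throws the same IllegalStateException during shutdown that Hadoop's ShutdownHookManager.addShutdownHook() does, so it stands in here. The class and method names below are hypothetical illustrations, not the actual KeyProviderCache patch:
> {code:java}
> // Hypothetical sketch: attempt to register a shutdown hook, but tolerate
> // the IllegalStateException thrown when the JVM is already shutting down.
> public class SafeHookRegistration {
>     /** Returns true if the hook was registered, false if shutdown was in progress. */
>     public static boolean registerHook(Runnable cleanup) {
>         try {
>             Runtime.getRuntime().addShutdownHook(new Thread(cleanup));
>             return true;
>         } catch (IllegalStateException e) {
>             // Already in shutdown: log a warning and continue. Open TCP
>             // connections are closed by the OS when the process exits anyway.
>             System.err.println("WARN: could not register shutdown hook: "
>                     + e.getMessage());
>             return false;
>         }
>     }
>
>     public static void main(String[] args) {
>         // Outside of shutdown, registration succeeds and prints "registered".
>         System.out.println(registerHook(() -> { }) ? "registered" : "skipped");
>     }
> }
> {code}
> With this pattern the caller proceeds either way, so a failure during pre-launch surfaces the original exception rather than the registration error.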
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]