[ https://issues.apache.org/jira/browse/SPARK-41313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xing Lin updated SPARK-41313:
-----------------------------
    Description:
SPARK-3900 fixed the IllegalStateException thrown by cleanupStagingDir in the ApplicationMaster's shutdown hook. However, SPARK-21138 accidentally reverted that change while fixing the "Wrong FS" bug. We are now seeing SPARK-3900 reported again by our users at LinkedIn, so we need to bring back the fix for SPARK-3900.

The IllegalStateException arises from a Hadoop limitation: a shutdown hook cannot be registered once shutdown has begun. When a Spark job fails during pre-launch, cleanupStagingDir is called as part of shutdown. If that call creates a filesystem object for the first time, Hadoop tries to register a hook to shut down KeyProviderCache while creating a ClientContext for DFSClient, and we hit the IllegalStateException. We should avoid creating a new filesystem object in cleanupStagingDir() when it is called from a shutdown hook. SPARK-3900 introduced that safeguard, but SPARK-21138 accidentally reverted it. We need to bring the fix back to Spark to avoid the IllegalStateException.

  was:SPARK-3900 fixed the IllegalStateException in cleanupStagingDir in the ApplicationMaster's shutdown hook. However, SPARK-21138 reverted that change when fixing the "Wrong FS" bug. We need both fixes.

> Combine fixes for SPARK-3900 and SPARK-21138
> --------------------------------------------
>
>                 Key: SPARK-41313
>                 URL: https://issues.apache.org/jira/browse/SPARK-41313
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 3.4.0
>            Reporter: Xing Lin
>            Priority: Major
>
> SPARK-3900 fixed the IllegalStateException thrown by cleanupStagingDir in
> the ApplicationMaster's shutdown hook. However, SPARK-21138 accidentally
> reverted that change while fixing the "Wrong FS" bug. We are now seeing
> SPARK-3900 reported again by our users at LinkedIn, so we need to bring
> back the fix for SPARK-3900.
>
> The IllegalStateException arises from a Hadoop limitation: a shutdown hook
> cannot be registered once shutdown has begun. When a Spark job fails during
> pre-launch, cleanupStagingDir is called as part of shutdown. If that call
> creates a filesystem object for the first time, Hadoop tries to register a
> hook to shut down KeyProviderCache while creating a ClientContext for
> DFSClient, and we hit the IllegalStateException. We should avoid creating a
> new filesystem object in cleanupStagingDir() when it is called from a
> shutdown hook. SPARK-3900 introduced that safeguard, but SPARK-21138
> accidentally reverted it. We need to bring the fix back to Spark to avoid
> the IllegalStateException.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
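
For illustration only, the failure mode described in this issue can be modeled in plain Java without Hadoop or Spark on the classpath. In the sketch below, HookRegistry and FakeFileSystem are hypothetical stand-ins, not Hadoop classes: the registry rejects registrations once shutdown has started, and the fake filesystem registers a hook the first time it is created, loosely mirroring what the description says happens when a ClientContext is created for DFSClient. The sketch contrasts creating the filesystem lazily inside the cleanup hook with creating it before shutdown begins. It is not the actual Spark patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class ShutdownHookSketch {

    /** Hypothetical hook registry: refuses new hooks once shutdown has begun. */
    static final class HookRegistry {
        private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
        private final List<Runnable> hooks = new ArrayList<>();

        void register(Runnable hook) {
            if (shuttingDown.get()) {
                throw new IllegalStateException("Shutdown in progress, cannot add a hook");
            }
            hooks.add(hook);
        }

        void shutdown() {
            shuttingDown.set(true);
            // Iterate over a copy so the hook list is never mutated mid-iteration.
            for (Runnable hook : new ArrayList<>(hooks)) {
                hook.run();
            }
        }
    }

    /** Hypothetical filesystem client: creating it registers a hook as a side effect. */
    static final class FakeFileSystem {
        final List<String> deleted = new ArrayList<>();

        FakeFileSystem(HookRegistry registry) {
            registry.register(() -> { /* stand-in for cache cleanup */ });
        }

        void delete(String path) {
            deleted.add(path);
        }
    }

    /** Cleanup hook creates the filesystem for the first time during shutdown. */
    static boolean lazyCleanupSucceeds() {
        HookRegistry registry = new HookRegistry();
        AtomicBoolean ok = new AtomicBoolean(false);
        registry.register(() -> {
            try {
                FakeFileSystem fs = new FakeFileSystem(registry); // registers a hook too late
                fs.delete("/staging");
                ok.set(true);
            } catch (IllegalStateException e) {
                ok.set(false);
            }
        });
        registry.shutdown();
        return ok.get();
    }

    /** Filesystem is created before shutdown; the cleanup hook only reuses it. */
    static boolean eagerCleanupSucceeds() {
        HookRegistry registry = new HookRegistry();
        FakeFileSystem fs = new FakeFileSystem(registry); // hook registered while still allowed
        AtomicBoolean ok = new AtomicBoolean(false);
        registry.register(() -> {
            fs.delete("/staging");
            ok.set(true);
        });
        registry.shutdown();
        return ok.get();
    }

    public static void main(String[] args) {
        System.out.println("lazy creation inside hook succeeds:  " + lazyCleanupSucceeds());
        System.out.println("eager creation before hook succeeds: " + eagerCleanupSucceeds());
    }
}
```

In this toy model only the eager variant completes its cleanup, which is the general shape of the safeguard the issue asks to restore: the cleanup path should not be the first place a filesystem object gets created.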