[ https://issues.apache.org/jira/browse/SPARK-31029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
shanyu zhao updated SPARK-31029:
--------------------------------
    Description:
*Problem:*
When running the TPC-DS test (https://github.com/databricks/spark-sql-perf), we occasionally see a class-not-found error:

2020-02-04 20:00:26,673 ERROR yarn.ApplicationMaster: User class threw exception: scala.ScalaReflectionException: class com.databricks.spark.sql.perf.ExperimentRun in JavaMirror with sun.misc.Launcher$AppClassLoader@28ba21f3 of type class sun.misc.Launcher$AppClassLoader with classpath [...] and parent being sun.misc.Launcher$ExtClassLoader@3ff5d147 of type class sun.misc.Launcher$ExtClassLoader with classpath [...] and parent being primordial classloader with boot classpath [...] not found.

*Root cause:*
The Spark driver starts ApplicationMaster in the main thread, which starts a user thread and sets MutableURLClassLoader as that thread's ContextClassLoader:

userClassThread = startUserApplication()

The main thread then sets up the YarnSchedulerBackend RPC endpoints, which handle these calls using Scala Futures on the default global ExecutionContext:
- doRequestTotalExecutors
- doKillExecutors

If the main thread starts a future to handle doKillExecutors() before the user thread starts any future, the default thread pool thread's ContextClassLoader is the default one (AppClassLoader).
If the user thread starts a future first, the thread pool thread has MutableURLClassLoader instead.

So if the user's code uses a future that references a user-provided class (which only MutableURLClassLoader can load), and executors are lost before that future starts, you will see class-not-found errors.
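The ordering dependence described in the root cause can be reproduced outside Spark. The following is a minimal sketch under stated assumptions, not Spark's actual code: it uses a single-thread fixed pool instead of "ExecutionContext.Implicits.global" (whose thread factory may behave differently across Scala versions), and a plain URLClassLoader as a stand-in for MutableURLClassLoader. The names ContextClassLoaderDemo, userLoader, runOnThread and loaderSeenBySecondTask are hypothetical. It relies only on the JVM behavior that a new thread inherits the context class loader of the thread that creates it, and that a pool creates its worker lazily on the first submission.

```scala
import java.net.{URL, URLClassLoader}
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ContextClassLoaderDemo {
  // Hypothetical stand-in for Spark's MutableURLClassLoader (empty classpath).
  val userLoader: ClassLoader =
    new URLClassLoader(Array.empty[URL], getClass.getClassLoader)

  // Runs `body` on a fresh thread whose context class loader is `loader`.
  def runOnThread[T](loader: ClassLoader)(body: => T): T = {
    var result: Option[T] = None
    val t = new Thread(new Runnable {
      def run(): Unit = { result = Some(body) }
    })
    t.setContextClassLoader(loader)
    t.start()
    t.join() // join() makes the write to `result` visible to this thread
    result.get
  }

  // A thread using `firstLoader` submits the very first task to a fresh pool,
  // which creates the pool's only worker thread. A thread using `secondLoader`
  // then submits a task that reports the context class loader it runs under.
  def loaderSeenBySecondTask(firstLoader: ClassLoader,
                             secondLoader: ClassLoader): ClassLoader = {
    val pool = Executors.newFixedThreadPool(1)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      runOnThread(firstLoader) { Await.result(Future { () }, 10.seconds) }
      runOnThread(secondLoader) {
        Await.result(Future { Thread.currentThread().getContextClassLoader }, 10.seconds)
      }
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    val appLoader = ClassLoader.getSystemClassLoader
    // "Main thread" submits first: the user's later task runs under appLoader.
    println(loaderSeenBySecondTask(appLoader, userLoader) eq appLoader)
    // "User thread" submits first: the later task runs under userLoader.
    println(loaderSeenBySecondTask(userLoader, appLoader) eq userLoader)
  }
}
```

In both cases the loader seen by the second task is the one belonging to whichever thread submitted first, which is why the error in this issue only shows up occasionally.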
*Proposed Solution:*
We can potentially solve this problem in one of two ways:
1) Set the same class loader (userClassLoader) on both the main thread and the user thread in ApplicationMaster.scala
2) Do not use "ExecutionContext.Implicits.global" in YarnSchedulerBackend

  was:
[...]
*Proposed Solution:*
Set the same class loader (userClassLoader) on both the main thread and the user thread in ApplicationMaster.scala


> Occasional class not found error in user's Future code using global ExecutionContext
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-31029
>                 URL: https://issues.apache.org/jira/browse/SPARK-31029
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.4.5
>            Reporter: shanyu zhao
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)