[GitHub] spark pull request #21185: [SPARK-23894][CORE][SQL] Defensively clear Active...
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21185#discussion_r186772234

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -229,6 +229,23 @@ private[spark] class Executor(
     ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionTime).sum
   }

+  /**
+   * Only in local mode, we have to prevent the driver from setting the active SparkSession
+   * in the executor threads. See SPARK-23894.
+   */
+  private lazy val clearActiveSparkSessionMethod = if (Utils.isLocalMaster(conf)) {
--- End diff --

agreed, I closed this PR

---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
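The `clearActiveSparkSessionMethod` val in the diff above is resolved reflectively because spark-core cannot depend on spark-sql. A minimal sketch of that lookup pattern in plain Java (the PR itself is Scala); `SparkSession.clearActiveSession` is the real Spark method, but this standalone class and its `findClearMethod` helper are illustrative, not the PR's code:

```java
// Sketch of the reflective-lookup pattern in the snippet above: look up the
// static clearActiveSession method only if the SQL classes are on the
// classpath, and invoke it with a null receiver because it is static.
import java.lang.reflect.Method;
import java.util.Optional;

public class ReflectiveClear {
    // Returns the static clearActiveSession method if SparkSession is present.
    static Optional<Method> findClearMethod() {
        try {
            Class<?> cls = Class.forName("org.apache.spark.sql.SparkSession");
            return Optional.of(cls.getMethod("clearActiveSession"));
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return Optional.empty();   // no spark-sql on the classpath: nothing to clear
        }
    }

    public static void main(String[] args) {
        // invoke(null) because the target is a static method.
        findClearMethod().ifPresent(m -> {
            try {
                m.invoke(null);
            } catch (ReflectiveOperationException e) {
                // swallow or log: clearing is best-effort in local mode
            }
        });
    }
}
```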
Github user squito closed the pull request at:
https://github.com/apache/spark/pull/21185
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21185#discussion_r186358733

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -229,6 +229,23 @@ private[spark] class Executor(
     ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionTime).sum
   }

+  /**
+   * Only in local mode, we have to prevent the driver from setting the active SparkSession
+   * in the executor threads. See SPARK-23894.
+   */
+  private lazy val clearActiveSparkSessionMethod = if (Utils.isLocalMaster(conf)) {
--- End diff --

I've added this check in https://github.com/apache/spark/pull/21190
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21185#discussion_r185612118

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -229,6 +229,23 @@ private[spark] class Executor(
     ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionTime).sum
   }

+  /**
+   * Only in local mode, we have to prevent the driver from setting the active SparkSession
+   * in the executor threads. See SPARK-23894.
+   */
+  private lazy val clearActiveSparkSessionMethod = if (Utils.isLocalMaster(conf)) {
--- End diff --

Do we still need this call if we resolve the root cause in https://github.com/apache/spark/pull/21190 and raise an error promptly, as you suggested in https://github.com/apache/spark/pull/21185#issuecomment-385996077?
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/21185#discussion_r184993529

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -299,6 +316,9 @@ private[spark] class Executor(
     Thread.currentThread.setContextClassLoader(replClassLoader)
     val ser = env.closureSerializer.newInstance()
     logInfo(s"Running $taskName (TID $taskId)")
+    // When running in local mode, we might end up with the active session from the driver set on
+    // this thread, though we never should, so we defensively clear it. See SPARK-23894.
+    clearActiveSparkSessionMethod.foreach(_.invoke(null))
--- End diff --

ThreadFactories only create threads, they don't start them, so you can't do it directly there, as I need this to run *inside* the thread. I thought about trying to do something fancier, like having the thread pool also start the thread and submit one runnable to do this, or having threads get promoted to the correct thread pool after this ran, but I felt like it was too error-prone for minimal gain.
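The constraint squito describes can be sketched in plain Java (the PR itself is Scala): a `ThreadFactory` runs in the *submitting* thread, so an inherited thread-local can only be cleared by code that executes inside the pool thread, i.e. at the start of each task, which mirrors where the PR places the call in `run()`. The `activeSession` holder below is a hypothetical stand-in for `SparkSession`'s internal thread-local, not Spark code:

```java
// Demonstrates why the clear must happen inside the task: the pool thread
// inherits the submitter's InheritableThreadLocal value at creation time,
// and only code running on that thread can remove it.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ClearOnRunDemo {
    // Stand-in for SparkSession's active-session holder (hypothetical name).
    static final InheritableThreadLocal<String> activeSession =
        new InheritableThreadLocal<>();

    public static void main(String[] args) throws Exception {
        activeSession.set("driver-session");        // the "driver" thread sets it
        ExecutorService pool = Executors.newFixedThreadPool(1);

        // The pool thread is created on first submit and inherits the value...
        String before = pool.submit(() -> activeSession.get()).get();

        // ...so each task defensively clears it on entry.
        String after = pool.submit(() -> {
            activeSession.remove();                 // runs inside the pool thread
            return activeSession.get();
        }).get();

        System.out.println(before + " / " + after); // prints "driver-session / null"
        pool.shutdown();
    }
}
```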
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21185#discussion_r184813348

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -299,6 +316,9 @@ private[spark] class Executor(
     Thread.currentThread.setContextClassLoader(replClassLoader)
     val ser = env.closureSerializer.newInstance()
     logInfo(s"Running $taskName (TID $taskId)")
+    // When running in local mode, we might end up with the active session from the driver set on
+    // this thread, though we never should, so we defensively clear it. See SPARK-23894.
+    clearActiveSparkSessionMethod.foreach(_.invoke(null))
--- End diff --

Can this be done in the thread pool's thread factory instead?
Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21185#discussion_r184813243

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -229,6 +229,23 @@ private[spark] class Executor(
     ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionTime).sum
   }

+  /**
+   * Only in local mode, we have to prevent the driver from setting the active SparkSession
+   * in the executor threads. See SPARK-23894.
+   */
+  lazy val clearActiveSparkSessionMethod = if (Utils.isLocalMaster(conf)) {
--- End diff --

private?
GitHub user squito opened a pull request:
https://github.com/apache/spark/pull/21185

[SPARK-23894][CORE][SQL] Defensively clear ActiveSession in Executors

Because SparkSession.getActiveSession uses an InheritableThreadLocal, the thread pool in executors might end up inheriting the SparkSession from a driver thread. This leads to test failures, as executors should never have an active SparkSession. So in local mode, defensively clear the active session.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/squito/spark SPARK-23894

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21185.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21185

commit 2a4944ffe5836408b80f9aa06e9b28e57aa16649
Author: Imran Rashid
Date: 2018-04-27T20:03:14Z

    [SPARK-23894][CORE][SQL] Defensively clear ActiveSession in Executors

    Because SparkSession.getActiveSession uses an InheritableThreadLocal, the thread pool in executors might end up inheriting the SparkSession from a driver thread. This leads to test failures, as executors should never have an active SparkSession. So in local mode, defensively clear the active session.
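The root cause the PR description names is `InheritableThreadLocal` semantics: every child thread copies the parent's value at creation time. A minimal reproduction in plain Java (the PR itself is Scala); `activeSession` here is a hypothetical stand-in for the thread-local behind `SparkSession.getActiveSession`, not Spark code:

```java
// Minimal sketch of why executor threads in local mode can see the driver's
// active session: InheritableThreadLocal copies the parent thread's value
// into every child thread when the child is created.
public class InheritDemo {
    // Stand-in for SparkSession's active-session holder (hypothetical name).
    static final InheritableThreadLocal<String> activeSession =
        new InheritableThreadLocal<>();

    public static void main(String[] args) throws Exception {
        activeSession.set("driver-session");   // the "driver" thread sets its session

        final String[] seenInChild = new String[1];
        // A thread created now inherits the value, even though it never set it.
        Thread worker = new Thread(() -> seenInChild[0] = activeSession.get());
        worker.start();
        worker.join();

        System.out.println(seenInChild[0]);    // prints "driver-session"
    }
}
```

This is exactly the situation the PR guards against: in local mode the executor's task threads descend from driver threads, so they silently inherit a session they should never hold.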