[jira] [Commented] (SPARK-23894) Flaky Test: BucketedWriteWithoutHiveSupportSuite
[ https://issues.apache.org/jira/browse/SPARK-23894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467570#comment-16467570 ]

Imran Rashid commented on SPARK-23894:
--

After discussion in the related PRs, SPARK-22938 should cover the main problem, and the PR for that will include the appropriate defensive checks to prevent this in the future.

> Flaky Test: BucketedWriteWithoutHiveSupportSuite
> ------------------------------------------------
>
>                 Key: SPARK-23894
>                 URL: https://issues.apache.org/jira/browse/SPARK-23894
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Imran Rashid
>            Priority: Minor
>         Attachments: unit-tests.log
>
> Flaky test observed here:
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88991/
> I'll attach a snippet of the unit-tests logs, for this suite and the
> preceding one. Here's a snippet of the exception.
> {noformat}
> 08:36:34.694 Executor task launch worker for task 436 ERROR Executor: Exception in task 0.0 in stage 402.0 (TID 436)
> java.lang.IllegalStateException: LiveListenerBus is stopped.
>         at org.apache.spark.scheduler.LiveListenerBus.addToQueue(LiveListenerBus.scala:97)
>         at org.apache.spark.scheduler.LiveListenerBus.addToStatusQueue(LiveListenerBus.scala:80)
>         at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:93)
>         at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
>         at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:117)
>         at scala.Option.getOrElse(Option.scala:121)
>         at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:117)
>         at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:116)
>         at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:286)
>         at org.apache.spark.sql.test.TestSparkSession.sessionState$lzycompute(TestSQLContext.scala:42)
>         at org.apache.spark.sql.test.TestSparkSession.sessionState(TestSQLContext.scala:41)
>         at org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
>         at org.apache.spark.sql.SparkSession$$anonfun$1$$anonfun$apply$1.apply(SparkSession.scala:92)
>         at scala.Option.map(Option.scala:146)
>         at org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:92)
>         at org.apache.spark.sql.SparkSession$$anonfun$1.apply(SparkSession.scala:91)
>         at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:110)
>         at org.apache.spark.sql.types.DataType.sameType(DataType.scala:84)
>         at org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:105)
>         at org.apache.spark.sql.catalyst.analysis.TypeCoercion$$anonfun$1.apply(TypeCoercion.scala:86)
> {noformat}
> I doubt this is actually because of BucketedWriteWithoutHiveSupportSuite. I
> think it has something more to do with {{SparkSession}}'s lazy evaluation of
> {{SharedState}} doing something funny with the way we set up the test spark
> context etc. ... though I don't really understand it yet.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23894) Flaky Test: BucketedWriteWithoutHiveSupportSuite
[ https://issues.apache.org/jira/browse/SPARK-23894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457003#comment-16457003 ]

Imran Rashid commented on SPARK-23894:
--

I believe this issue has existed since SPARK-10810 / https://github.com/apache/spark/commit/3390b400d04e40f767d8a51f1078fcccb4e64abd, though originally the SQLContext was what was held in the InheritableThreadLocal.
[jira] [Commented] (SPARK-23894) Flaky Test: BucketedWriteWithoutHiveSupportSuite
[ https://issues.apache.org/jira/browse/SPARK-23894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456984#comment-16456984 ]

Apache Spark commented on SPARK-23894:
--

User 'squito' has created a pull request for this issue:
https://github.com/apache/spark/pull/21185
[jira] [Commented] (SPARK-23894) Flaky Test: BucketedWriteWithoutHiveSupportSuite
[ https://issues.apache.org/jira/browse/SPARK-23894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456969#comment-16456969 ]

Imran Rashid commented on SPARK-23894:
--

I think I understand what is happening here, but I don't know how to fix it. Normally, there is no active spark session for the executor threads. I added some debugging code where an executor might call {{SQLConf.get}} to show the active session, and under my test runs there isn't an active session:

{noformat}
12:49:35.801 dispatcher-event-loop-0 INFO Executor: Creating task runner thread with activeSession = None
...
getting conf, activeSession = None in Executor task launch worker for task 24
java.lang.Exception: getting conf in thread Executor task launch worker for task 23
        at org.apache.spark.sql.catalyst.plans.QueryPlan.conf(QueryPlan.scala:35)
        at org.apache.spark.sql.execution.columnar.InMemoryTableScanExec.org$apache$spark$sql$execution$columnar$InMemoryTableScanExec$$createAndDecompressColumn(InMemoryTableScanExec.scala:84)
        ...
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
{noformat}

So why is it sometimes defined? Note that activeSession is an *Inheritable* thread local. Normally the executor threads are created before activeSession is defined, so they don't inherit anything. But a thread pool is free to create more threads at any time, and when it does, the new executor threads suddenly inherit the active session from their parent: a thread in the driver with activeSession defined.

I'll submit a PR to defensively always clear the active session in the executor thread.
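The inheritance behavior described above can be demonstrated outside of Spark. The following is a minimal sketch (not Spark's actual code; the `activeSession` name and `readFromNewThread` helper are illustrative): a thread created *before* the InheritableThreadLocal is set sees nothing, while a thread created *after* inherits the parent's value at construction time — exactly what happens when a task-launch-worker pool spawns a new thread while the driver thread has an active session.

```java
// Sketch of InheritableThreadLocal inheritance: a child thread copies the
// parent's value when the child is *constructed*, not when it is run.
public class InheritDemo {
    // Stand-in for SparkSession's activeSession thread local (illustrative name).
    static final InheritableThreadLocal<String> activeSession = new InheritableThreadLocal<>();

    // Spawn a fresh thread and report what value (if any) it inherited.
    static String readFromNewThread() throws InterruptedException {
        final String[] seen = new String[1];
        Thread t = new Thread(() -> seen[0] = activeSession.get());
        t.start();
        t.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        // Created before set(): inherits nothing.
        System.out.println(readFromNewThread());   // prints "null"
        activeSession.set("session-1");
        // Created after set(): inherits the parent's value.
        System.out.println(readFromNewThread());   // prints "session-1"
    }
}
```

The defensive fix mentioned above amounts to calling something like `activeSession.remove()` at the top of each worker thread's run loop, so an inherited value can never leak into task execution.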
[jira] [Commented] (SPARK-23894) Flaky Test: BucketedWriteWithoutHiveSupportSuite
[ https://issues.apache.org/jira/browse/SPARK-23894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456919#comment-16456919 ]

Imran Rashid commented on SPARK-23894:
--

One thing I've noticed from looking at more instances of this: normally, we don't see any log lines from {{SharedState}} on the executor threads. Normally we see this:

{noformat}
09:37:38.203 pool-1-thread-1-ScalaTest-running-ParquetQuerySuite INFO SharedState: Warehouse path is 'file:/Users/irashid/github/pub/spark/sql/core/spark-warehouse/'.
{noformat}

but in failures, we see

{noformat}
23:37:56.728 Executor task launch worker for task 48 INFO SharedState: Warehouse path is 'file:/home/jenkins/workspace/spark-branch-2.3-test-sbt-hadoop-2.6/sql/core/spark-warehouse'.
{noformat}

(notice the thread). I don't understand why this happens yet, nor can I reproduce it locally.
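The log-line observation above follows from how lazy initialization works on the JVM: a lazily-initialized singleton is constructed by whichever thread touches it first, so its "init" log line carries that thread's name. A minimal sketch (illustrative names, not Spark code) of why the warehouse-path line can show up on an executor thread:

```java
// Sketch: lazy init happens on the first-touching thread, so an "init" log
// line is attributed to that thread -- a worker thread, if it gets there first.
public class LazyInitDemo {
    private static String initThread;  // name of the thread that ran the init

    // Synchronized lazy initialization; records (i.e. "logs") the initializer.
    static synchronized String initOnce() {
        if (initThread == null) {
            initThread = Thread.currentThread().getName();
        }
        return initThread;
    }

    public static void main(String[] args) throws InterruptedException {
        // A worker thread touches the lazy singleton first...
        Thread worker = new Thread(LazyInitDemo::initOnce, "task-launch-worker");
        worker.start();
        worker.join();
        // ...so even when the main thread asks later, the recorded initializer
        // is the worker, mirroring the failing runs' SharedState log line.
        System.out.println(initOnce());   // prints "task-launch-worker"
    }
}
```

In the failing runs, the inherited active session makes an executor thread the first to force {{SharedState}}'s lazy construction, which is why its log line (and, when the listener bus is already stopped, the IllegalStateException) appears on a task-launch-worker thread.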