[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work
[ https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185528#comment-15185528 ]

Shixiong Zhu commented on SPARK-10548:
--------------------------------------

Opened SPARK-13747 for further discussion.

> Concurrent execution in SQL does not work
> -----------------------------------------
>
> Key: SPARK-10548
> URL: https://issues.apache.org/jira/browse/SPARK-10548
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0
> Reporter: Andrew Or
> Assignee: Andrew Or
> Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> From the mailing list:
> {code}
> future { df1.count() }
> future { df2.count() }
>
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>   at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904)
>   at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385)
> {code}
> === edit ===
> Simple reproduction:
> {code}
> (1 to 100).par.foreach { _ =>
>   sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work
[ https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184075#comment-15184075 ]

nicerobot commented on SPARK-10548:
-----------------------------------

Thanks [~zsxwing]. What's the recommended way to accomplish that?
[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work
[ https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183812#comment-15183812 ]

Shixiong Zhu commented on SPARK-10548:
--------------------------------------

[~nicerobot] The issue was reintroduced by https://github.com/apache/spark/pull/9264, which calls Await.ready in "runJob" to wait for results. "par" uses a ForkJoin thread pool by default, and a ForkJoin thread pool may run another task on the same thread while Await.ready is blocking. In that case, the new task sees the blocked task's "spark.sql.execution.id". For now, just don't use a ForkJoin thread pool to launch Spark jobs until a fix is out.
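The mechanism described in this comment can be modelled without Spark. The sketch below is a hypothetical, simplified stand-in, not Spark code: a plain `ThreadLocal` plays the role of the "spark.sql.execution.id" local property, `withExecutionId` imitates the "already set" guard seen in the stack trace, and running a second job inline inside the first mimics a ForkJoin worker picking up another task on the same thread while it is blocked in Await.ready. The last function sketches the suggested workaround, assuming jobs are launched as Futures on a fixed-size `ThreadPoolExecutor` instead of a ForkJoin pool (note that `scala.concurrent.ExecutionContext.Implicits.global` is also ForkJoin-based).

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for one entry of SparkContext's per-thread local properties.
object LocalProps {
  val executionId = new ThreadLocal[String]
}

// Hypothetical guard modelled on the "already set" check in the stack trace:
// refuse to start a second execution on a thread that already carries an id,
// and always clear the id when the body finishes.
def withExecutionId[T](id: String)(body: => T): T = {
  if (LocalProps.executionId.get != null)
    throw new IllegalArgumentException("spark.sql.execution.id is already set")
  LocalProps.executionId.set(id)
  try body finally LocalProps.executionId.remove()
}

// Simulates a ForkJoin worker that, while job "1" is blocked waiting for results,
// runs another queued job on the same thread: the nested job sees job 1's id.
def nestedOnSameThread(): Either[String, Int] =
  try {
    withExecutionId("1") {
      val inner: Int = withExecutionId("2")(42)
      Right(inner)
    }
  } catch {
    case e: IllegalArgumentException => Left(e.getMessage)
  }

// Workaround sketch: launch each job from a fixed-size pool. A ThreadPoolExecutor
// thread runs one task at a time, so a blocked job never hosts another job.
def onFixedPool(n: Int): Seq[Int] = {
  val pool = Executors.newFixedThreadPool(4)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
  try {
    val jobs = (1 to n).map(i => Future(withExecutionId(i.toString)(i * 2)))
    Await.result(Future.sequence(jobs), 10.seconds)
  } finally pool.shutdown()
}
```

In this model `nestedOnSameThread()` fails with the same message as the report, while `onFixedPool` completes every job because no thread ever carries two ids at once.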
[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work
[ https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183660#comment-15183660 ]

Shixiong Zhu commented on SPARK-10548:
--------------------------------------

Thanks for reporting it. Looking at it.
[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work
[ https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182265#comment-15182265 ]

nicerobot commented on SPARK-10548:
-----------------------------------

I might be misunderstanding the solution, but I'm not clear on how the implementation addresses the problem. The issue appears to be that the ThreadLocal property "spark.sql.execution.id" is not handled properly in thread-pooled environments. The implemented solution is essentially this, in {{SparkContext}}:

{code}
// Note: make a clone such that changes in the parent properties aren't reflected in
// those of the children threads, which has confusing semantics (SPARK-10563).
SerializationUtils.clone(parent).asInstanceOf[Properties]
{code}

But from what I can tell, the problem isn't related to parent/child threads. It's that the {{"spark.sql.execution.id"}} key in {{localProperties}} is retained after a thread completes. When that thread is returned to the pool and reused by another execution, the execution id remains because it's part of the SparkContext's {{localProperties}}.

It seems like {{"spark.sql.execution.id"}} should be local to an execution context instance (a {{QueryExecution}}?), not global to a thread, nor specifically a property of a SQLContext/SparkContext.
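The two situations contrasted in this comment can be illustrated with a small model. This is a sketch under stated assumptions, not Spark's implementation: `localProperties` is a hypothetical `InheritableThreadLocal[Properties]` whose `childValue` returns a copy (using `Hashtable.clone` as a shallow stand-in for `SerializationUtils.clone`), and `setLocal`, `getLocal`, `inChildThread` and `staleInPool` are illustrative helpers. `inChildThread` shows the parent/child isolation the quoted fix provides; `staleInPool` shows the scenario described above, where a pooled worker created while the key was set keeps its own copy after the submitting thread clears it.

```scala
import java.util.Properties
import java.util.concurrent.{Callable, Executors}

// Hypothetical model of per-thread local properties inherited by child threads.
// childValue hands each new thread a copy, so later changes stay private to one
// thread. Hashtable.clone is a shallow copy, which is enough for String entries.
val localProperties = new InheritableThreadLocal[Properties] {
  override def initialValue(): Properties = new Properties()
  override def childValue(parent: Properties): Properties =
    parent.clone().asInstanceOf[Properties]
}

val ExecIdKey = "spark.sql.execution.id"

// Setting a null value removes the key, as a way to "clear" it.
def setLocal(key: String, value: String): Unit =
  if (value == null) localProperties.get.remove(key)
  else localProperties.get.setProperty(key, value)

def getLocal(key: String): String = localProperties.get.getProperty(key)

// Runs body on a freshly created thread (which inherits a copy) and returns its result.
def inChildThread[T](body: => T): T = {
  var result: Option[T] = None
  val t = new Thread(new Runnable { def run(): Unit = result = Some(body) })
  t.start()
  t.join()
  result.get
}

// The pooled scenario: the worker inherits the key when it is created and keeps
// its own copy, even after the submitting thread clears the key.
def staleInPool(): (String, String) = {
  setLocal(ExecIdKey, "7")
  val pool = Executors.newSingleThreadExecutor() // worker is created on first submit
  try {
    val read = new Callable[String] { def call(): String = getLocal(ExecIdKey) }
    val first = pool.submit(read).get()  // worker inherited "7" at creation
    setLocal(ExecIdKey, null)            // submitting thread clears its own copy...
    val second = pool.submit(read).get() // ...but the reused worker still reports "7"
    (first, second)
  } finally pool.shutdown()
}
```

In this model, cloning keeps a child's changes from leaking back to its parent, but it does not help a reused pool thread: what that thread sees depends on when it was created, not on which execution is currently submitting work to it.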
[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work
[ https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134146#comment-15134146 ]

Akshay Harale commented on SPARK-10548:
---------------------------------------

We are still facing this issue while repeatedly querying a Cassandra database using spark-cassandra-connector.

Spark version: 1.5.1
spark-cassandra-connector version: 1.5.0-M3
[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work
[ https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741294#comment-14741294 ]

Apache Spark commented on SPARK-10548:
--------------------------------------

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/8721
[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work
[ https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739938#comment-14739938 ]

Apache Spark commented on SPARK-10548:
--------------------------------------

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/8710