[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work

2016-03-08 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185528#comment-15185528
 ] 

Shixiong Zhu commented on SPARK-10548:
--

Open SPARK-13747 for further discussion

> Concurrent execution in SQL does not work
> -
>
> Key: SPARK-10548
> URL: https://issues.apache.org/jira/browse/SPARK-10548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> From the mailing list:
> {code}
> future { df1.count() } 
> future { df2.count() } 
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set 
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>  
> at 
> org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) 
> at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) 
> {code}
> === edit ===
> Simple reproduction:
> {code}
> (1 to 100).par.foreach { _ =>
>   sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work

2016-03-07 Thread nicerobot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184075#comment-15184075
 ] 

nicerobot commented on SPARK-10548:
---

Thanks [~zsxwing]. What's the recommended way to accomplish that?

> Concurrent execution in SQL does not work
> -
>
> Key: SPARK-10548
> URL: https://issues.apache.org/jira/browse/SPARK-10548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> From the mailing list:
> {code}
> future { df1.count() } 
> future { df2.count() } 
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set 
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>  
> at 
> org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) 
> at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) 
> {code}
> === edit ===
> Simple reproduction:
> {code}
> (1 to 100).par.foreach { _ =>
>   sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work

2016-03-07 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183812#comment-15183812
 ] 

Shixiong Zhu commented on SPARK-10548:
--

[~nicerobot] The issue was reintroduced by 
https://github.com/apache/spark/pull/9264

It will call Await.ready in "runJob" to wait for results. "par" uses a ForkJoin 
thread pool by default and ForkJoin thread pool will try to run new task when  
Await.ready is called. In this case, new task will see other task's 
"spark.sql.execution.id".

Right now just don't use ForkJoin thread pool to launch Spark jobs until a fix 
is out.

> Concurrent execution in SQL does not work
> -
>
> Key: SPARK-10548
> URL: https://issues.apache.org/jira/browse/SPARK-10548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> From the mailing list:
> {code}
> future { df1.count() } 
> future { df2.count() } 
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set 
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>  
> at 
> org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) 
> at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) 
> {code}
> === edit ===
> Simple reproduction:
> {code}
> (1 to 100).par.foreach { _ =>
>   sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work

2016-03-07 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183660#comment-15183660
 ] 

Shixiong Zhu commented on SPARK-10548:
--

Thanks for reporting it. Looking at it.

> Concurrent execution in SQL does not work
> -
>
> Key: SPARK-10548
> URL: https://issues.apache.org/jira/browse/SPARK-10548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> From the mailing list:
> {code}
> future { df1.count() } 
> future { df2.count() } 
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set 
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>  
> at 
> org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) 
> at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) 
> {code}
> === edit ===
> Simple reproduction:
> {code}
> (1 to 100).par.foreach { _ =>
>   sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work

2016-03-06 Thread nicerobot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182265#comment-15182265
 ] 

nicerobot commented on SPARK-10548:
---

I might be misunderstanding the solution but i'm not clear how the 
implementation addresses the problem. The issue appears to be that the 
ThreadLocal property "spark.sql.execution.id" is not handled properly in 
thread-pooled environments. The implemented solution is essentially in 
{{SparkContext}}

{code}
  // Note: make a clone such that changes in the parent properties aren't 
reflected in
  // the those of the children threads, which has confusing semantics 
(SPARK-10563).
  SerializationUtils.clone(parent).asInstanceOf[Properties]
{code}

But from what I can tell, the problem isn't related to parent/child threads. 
It's that {{localProperties}}' {{"spark.sql.execution.id"}} key is retained 
after a thread completes. When that thread is returned to the pool and reused 
by another execution, the execution id will remain because it's part of the 
SparkContext's {{localProperties}}. It seems like a 
{{"spark.sql.execution.id"}} should be local to an execution context instance 
(a {{QueryExecution}}?), not global to a thread nor specifically a property of 
a SQLContext/SparkContext.

> Concurrent execution in SQL does not work
> -
>
> Key: SPARK-10548
> URL: https://issues.apache.org/jira/browse/SPARK-10548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> From the mailing list:
> {code}
> future { df1.count() } 
> future { df2.count() } 
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set 
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>  
> at 
> org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) 
> at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) 
> {code}
> === edit ===
> Simple reproduction:
> {code}
> (1 to 100).par.foreach { _ =>
>   sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work

2016-02-05 Thread Akshay Harale (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134146#comment-15134146
 ] 

Akshay Harale commented on SPARK-10548:
---

We are still facing this issue while repeatedly querying cassandra database 
using spark-cassandra-connector.

Spark version 1.5.1
spark-cassandra-connector 1.5.0-M3

> Concurrent execution in SQL does not work
> -
>
> Key: SPARK-10548
> URL: https://issues.apache.org/jira/browse/SPARK-10548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Blocker
> Fix For: 1.5.1, 1.6.0
>
>
> From the mailing list:
> {code}
> future { df1.count() } 
> future { df2.count() } 
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set 
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>  
> at 
> org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) 
> at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) 
> {code}
> === edit ===
> Simple reproduction:
> {code}
> (1 to 100).par.foreach { _ =>
>   sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work

2015-09-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741294#comment-14741294
 ] 

Apache Spark commented on SPARK-10548:
--

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/8721

> Concurrent execution in SQL does not work
> -
>
> Key: SPARK-10548
> URL: https://issues.apache.org/jira/browse/SPARK-10548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Blocker
>
> From the mailing list:
> {code}
> future { df1.count() } 
> future { df2.count() } 
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set 
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>  
> at 
> org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) 
> at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) 
> {code}
> === edit ===
> Simple reproduction:
> {code}
> (1 to 100).par.foreach { _ =>
>   sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10548) Concurrent execution in SQL does not work

2015-09-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739938#comment-14739938
 ] 

Apache Spark commented on SPARK-10548:
--

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/8710

> Concurrent execution in SQL does not work
> -
>
> Key: SPARK-10548
> URL: https://issues.apache.org/jira/browse/SPARK-10548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Blocker
>
> From the mailing list:
> {code}
> future { df1.count() } 
> future { df2.count() } 
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set 
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>  
> at 
> org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) 
> at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) 
> {code}
> === edit ===
> Simple reproduction:
> {code}
> (1 to 100).par.foreach { _ =>
>   sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org