[ https://issues.apache.org/jira/browse/SPARK-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182265#comment-15182265 ]

nicerobot edited comment on SPARK-10548 at 3/6/16 6:47 PM:
-----------------------------------------------------------

I might be misunderstanding the solution, but I'm not clear on how the
implementation addresses the problem. The issue appears to be that the
ThreadLocal property {{"spark.sql.execution.id"}} is not handled properly
(cleaned up) in thread-pooled environments. The implemented solution is
essentially this, in {{SparkContext}}:

{code}
      // Note: make a clone such that changes in the parent properties aren't reflected in
      // the those of the children threads, which has confusing semantics (SPARK-10563).
      SerializationUtils.clone(parent).asInstanceOf[Properties]
{code}
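
For context, here is a minimal, self-contained sketch of what that clone covers, as I read it (plain {{InheritableThreadLocal}}, illustrative names, no Spark involved; the shallow {{clone()}} stands in for the {{SerializationUtils.clone}} above): {{childValue}} only runs when a new child thread is constructed, so the clone isolates a freshly spawned thread from its parent and vice versa, nothing more.

{code}
import java.util.Properties

object InheritanceSketch {
  // Stand-in for SparkContext.localProperties.
  val localProps = new InheritableThreadLocal[Properties] {
    // Runs once, when a child thread is constructed, not when a pooled thread is reused.
    override protected def childValue(parent: Properties): Properties =
      parent.clone().asInstanceOf[Properties]
    override protected def initialValue(): Properties = new Properties()
  }

  def main(args: Array[String]): Unit = {
    localProps.get().setProperty("spark.sql.execution.id", "1")
    val child = new Thread(new Runnable {
      def run(): Unit = {
        // The child mutates its own copy, taken when the thread was constructed...
        localProps.get().setProperty("spark.sql.execution.id", "2")
      }
    })
    child.start()
    child.join()
    // ...so the clone keeps parent and child independent: this still prints 1.
    println(localProps.get().getProperty("spark.sql.execution.id"))
  }
}
{code}

As far as I can tell, that addresses inheritance at thread-creation time (the SPARK-10563 scenario), which is a different lifecycle than returning a thread to a pool.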

But from what I can tell, the problem isn't related to parent/child threads.
It's that the {{"spark.sql.execution.id"}} key in {{localProperties}} is retained
after a thread completes. When that thread is returned to the pool and reused
by another execution, the stale execution id is still there, because it lives in
the SparkContext's {{localProperties}}. It seems like {{"spark.sql.execution.id"}}
should be local to an execution instance (a {{QueryExecution}}?), neither global
to a thread nor a property of a SQLContext/SparkContext.
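
To make that concrete, here is a rough sketch of the failure mode as I understand it. None of this is the actual {{SQLExecution}} code; {{withNewExecutionId}} below is an illustrative stand-in, and the point is only what happens when the thread-local id is never cleared before the thread goes back to the pool:

{code}
import java.util.Properties
import java.util.concurrent.Executors

object PoolReuseSketch {
  // Stand-in for the per-thread properties (SparkContext.localProperties).
  val localProps = new ThreadLocal[Properties] {
    override protected def initialValue(): Properties = new Properties()
  }

  // Illustrative stand-in for SQLExecution.withNewExecutionId: refuse to run if an
  // execution id is already set on the current thread, and deliberately never clean up,
  // modelling an id that outlives the execution it belongs to.
  def withNewExecutionId[T](id: Long)(body: => T): T = {
    val props = localProps.get()
    if (props.getProperty("spark.sql.execution.id") != null) {
      throw new IllegalArgumentException("spark.sql.execution.id is already set")
    }
    props.setProperty("spark.sql.execution.id", id.toString)
    try {
      body
    } finally {
      // No cleanup here: the stale id stays attached to the (pooled) thread.
    }
  }

  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(1) // a single thread, reused for both queries

    val first = pool.submit(new Runnable {
      def run(): Unit = withNewExecutionId(1) { /* query 1 */ }
    })
    first.get() // query 1 completes, but its id was never cleared from the thread

    val second = pool.submit(new Runnable {
      def run(): Unit = withNewExecutionId(2) { /* query 2 */ }
    })
    second.get() // throws ExecutionException caused by "spark.sql.execution.id is already set"

    pool.shutdown()
  }
}
{code}

If the id were scoped to the execution itself (e.g. carried by the {{QueryExecution}}) rather than to the thread, the second query would be unaffected by whichever thread it happens to land on.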



> Concurrent execution in SQL does not work
> -----------------------------------------
>
>                 Key: SPARK-10548
>                 URL: https://issues.apache.org/jira/browse/SPARK-10548
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Andrew Or
>            Assignee: Andrew Or
>            Priority: Blocker
>             Fix For: 1.5.1, 1.6.0
>
>
> From the mailing list:
> {code}
> future { df1.count() } 
> future { df2.count() } 
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set
>         at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>         at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904)
>         at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385)
> {code}
> === edit ===
> Simple reproduction:
> {code}
> (1 to 100).par.foreach { _ =>
>   sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count()
> }
> {code}


