[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001356#comment-16001356 ]
Saif Addin edited comment on SPARK-13747 at 5/8/17 7:12 PM:
------------------------------------------------------------
Sorry for the confusion. No, it doesn't work. I am currently trying out different execution contexts. My issue happens every time; it is 100% reproducible. To simplify what I am doing:
1. An akka-http server is started and the REST DSL is set up.
2. Inside a get directive, I call a Spark dataframe which invokes the collect action from within a Future.
3. The object containing the future calls Await.result, since I need the dataframe in order to answer the HTTP request with a 200.
4. The collect method is passed through as an anonymous function; the runtime exception points at that anonymous function as the callback where my exception starts.
5. The future call is handled by a thread pool managed by Spark, using FAIR scheduling.
When my website starts, 4 collects are called simultaneously. Only one get call returns 200. The others are internal server errors.
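The steps above can be sketched as follows. This is a hypothetical reconstruction of the reported setup, not code from the report: the route path, parquet location, timeout, and session name are all illustrative, and it assumes a working Spark and akka-http environment.

```scala
// Hypothetical reconstruction of the described setup; names and paths are illustrative.
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global // backed by a ForkJoinPool

import akka.http.scaladsl.server.Directives._
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder.appName("rest-example").getOrCreate()

val route = path("rows") {
  get {
    // A Spark action (collect) runs inside a Future on the global ForkJoinPool...
    val rows: Future[Array[Row]] = Future {
      spark.read.parquet("/data/example").collect()
    }
    // ...and the route blocks on the result so it can answer with a 200.
    complete(Await.result(rows, 30.seconds).length.toString)
  }
}
```

With several such requests arriving at once, multiple collects block inside the same ForkJoinPool, which is the pattern the quoted issue below identifies as unsafe.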
> Concurrent execution in SQL doesn't work with Scala ForkJoinPool
> ----------------------------------------------------------------
>
>                 Key: SPARK-13747
>                 URL: https://issues.apache.org/jira/browse/SPARK-13747
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1
>            Reporter: Shixiong Zhu
>            Assignee: Shixiong Zhu
>
> Running the following code may fail:
> {code}
> (1 to 100).par.foreach { _ =>
>   println(sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count())
> }
>
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>   at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904)
>   at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385)
> {code}
> This is because SparkContext.runJob can be suspended when using a ForkJoinPool (e.g., scala.concurrent.ExecutionContext.Implicits.global), as it calls Await.ready (introduced by https://github.com/apache/spark/pull/9264). So when SparkContext.runJob is suspended, the ForkJoinPool will run another task in the same thread; however, that thread's local properties have already been polluted.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
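The failure mode quoted above can be modeled with a few lines of plain Scala. The sketch below is a minimal, hypothetical model of the guard in SQLExecution.withNewExecutionId, not Spark's actual implementation; the object and method names are illustrative:

```scala
// Hypothetical minimal model of the thread-local guard; not Spark's real internals.
import java.util.UUID

object ExecutionIdModel {
  private val executionId = new ThreadLocal[String]

  def withNewExecutionId[T](body: => T): T = {
    // The guard that produces "spark.sql.execution.id is already set":
    if (executionId.get != null)
      throw new IllegalArgumentException("spark.sql.execution.id is already set")
    executionId.set(UUID.randomUUID.toString)
    try body
    finally executionId.remove()      // cleared only when body finishes normally
  }
}
```

If `body` suspends in Await.ready on a ForkJoinPool, the pool may run another queued task on the very same thread before the finally clause clears the thread-local; that second task then sees the stale value and throws. A commonly suggested mitigation, under the assumption that callers control their own execution context, is to run such futures on a dedicated pool, e.g. ExecutionContext.fromExecutor(Executors.newFixedThreadPool(n)), rather than the global ForkJoinPool.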