[ https://issues.apache.org/jira/browse/SPARK-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001356#comment-16001356 ]
Saif Addin edited comment on SPARK-13747 at 5/8/17 7:12 PM:
------------------------------------------------------------
Sorry for the confusion. No, it doesn't work. I am currently trying out different execution contexts. My issue happens every time; it is 100% reproducible. To simplify what I am doing:
1. An akka-http server is started and the REST DSL is set up.
2. Inside a get directive, I call a Spark dataframe which invokes the collect action from within a Future.
3. The object containing the future calls Await.result, since I need the dataframe in order to answer the HTTP request with a 200.
4. The collect method is passed through as an anonymous function; the runtime exception points at that anonymous function as the callback where my exception starts.
5. The future call is handled by a thread pool managed by Spark, using FAIR scheduling.
When my website starts, 4 collects are called simultaneously. Only one get call returns 200. The others are internal server errors.
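The steps above can be sketched as follows. This is a hypothetical reconstruction of the reported setup, not code from the report: the route path, parquet location, timeout, and session name are all illustrative, and it assumes a working Spark and akka-http environment.

```scala
// Hypothetical reconstruction of the described setup; names and paths are illustrative.
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global // backed by a ForkJoinPool

import akka.http.scaladsl.server.Directives._
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder.appName("rest-example").getOrCreate()

val route = path("rows") {
  get {
    // A Spark action (collect) runs inside a Future on the global ForkJoinPool...
    val rows: Future[Array[Row]] = Future {
      spark.read.parquet("/data/example").collect()
    }
    // ...and the route blocks on the result so it can answer with a 200.
    complete(Await.result(rows, 30.seconds).length.toString)
  }
}
```

With several such requests arriving at once, multiple collects block inside the same ForkJoinPool, which is the pattern the quoted issue below identifies as unsafe.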
> Concurrent execution in SQL doesn't work with Scala ForkJoinPool
> ----------------------------------------------------------------
>
>                 Key: SPARK-13747
>                 URL: https://issues.apache.org/jira/browse/SPARK-13747
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1
>            Reporter: Shixiong Zhu
>            Assignee: Shixiong Zhu
>
> Running the following code may fail:
> {code}
> (1 to 100).par.foreach { _ =>
>   println(sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count())
> }
>
> java.lang.IllegalArgumentException: spark.sql.execution.id is already set
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>   at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904)
>   at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385)
> {code}
> This is because SparkContext.runJob can be suspended when using a ForkJoinPool (e.g., scala.concurrent.ExecutionContext.Implicits.global), as it calls Await.ready (introduced by https://github.com/apache/spark/pull/9264). So when SparkContext.runJob is suspended, the ForkJoinPool will run another task in the same thread; however, that thread's local properties have already been polluted.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
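The failure mode quoted above can be modeled with a few lines of plain Scala. The sketch below is a minimal, hypothetical model of the guard in SQLExecution.withNewExecutionId, not Spark's actual implementation; the object and method names are illustrative:

```scala
// Hypothetical minimal model of the thread-local guard; not Spark's real internals.
import java.util.UUID

object ExecutionIdModel {
  private val executionId = new ThreadLocal[String]

  def withNewExecutionId[T](body: => T): T = {
    // The guard that produces "spark.sql.execution.id is already set":
    if (executionId.get != null)
      throw new IllegalArgumentException("spark.sql.execution.id is already set")
    executionId.set(UUID.randomUUID.toString)
    try body
    finally executionId.remove()      // cleared only when body finishes normally
  }
}
```

If `body` suspends in Await.ready on a ForkJoinPool, the pool may run another queued task on the very same thread before the finally clause clears the thread-local; that second task then sees the stale value and throws. A commonly suggested mitigation, under the assumption that callers control their own execution context, is to run such futures on a dedicated pool, e.g. ExecutionContext.fromExecutor(Executors.newFixedThreadPool(n)), rather than the global ForkJoinPool.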