Github user gaborgsomogyi commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21214#discussion_r185792166
  
    --- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala ---
    @@ -153,23 +153,17 @@ class DataFrameRangeSuite extends QueryTest with 
SharedSQLContext with Eventuall
     
       test("Cancelling stage in a query with Range.") {
         val listener = new SparkListener {
    -      override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    -        eventually(timeout(10.seconds), interval(1.millis)) {
    -          assert(DataFrameRangeSuite.stageToKill > 0)
    -        }
    -        sparkContext.cancelStage(DataFrameRangeSuite.stageToKill)
    +      override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
    +        sparkContext.cancelStage(taskStart.stageId)
           }
         }
     
         sparkContext.addSparkListener(listener)
         for (codegen <- Seq(true, false)) {
           withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> 
codegen.toString()) {
    -        DataFrameRangeSuite.stageToKill = -1
             val ex = intercept[SparkException] {
    -          spark.range(0, 100000000000L, 1, 1).map { x =>
    -            DataFrameRangeSuite.stageToKill = TaskContext.get().stageId()
    -            x
    -          }.toDF("id").agg(sum("id")).collect()
    +          spark.range(0, 100000000000L, 1, 1)
    --- End diff --
    
    Yeah, after extensive testing and experimentation I was considering this as well, but I hesitated to stop the executor thread in any way. The reasons:
    - Executor-side code can be synchronized only through `object` (singleton) variables; anything else ends in `NotSerializableException`. Introducing an object-level `CountDownLatch` would reintroduce a race similar to the one where `DataFrameRangeSuite.stageToKill` was overwritten.
    - `spark.range(0, 100000000000L, 1, 1)` takes ~1-2 minutes to compute, which is a big enough time window that the test is not flaky this way.
    - Minor concern (rather a note): if no timeout-based synchronization is used, the test code gets stuck in `collect` and never reaches `waitUntilEmpty` and `eventually...`.
    
    All in all I would not recommend doing it, because it could end up in a 5-hour timeout.
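
    The first point can be sketched without Spark (a minimal, hypothetical illustration; `Shared`, `SuiteLike`, and `SerializationDemo` are made-up names, not from the PR): a closure that touches an instance field captures the enclosing `this`, so Java serialization of the closure fails when the enclosing class is not serializable, whereas a closure that only touches a member of a top-level `object` compiles to a static access and captures nothing.
    
    ```scala
    import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
    import java.util.concurrent.CountDownLatch
    
    // Hypothetical stand-in for a shared singleton (like a suite's companion
    // object): members are accessed statically, so closures using them
    // capture no enclosing instance.
    object Shared {
      val latch = new CountDownLatch(1)
    }
    
    // Hypothetical stand-in for a non-serializable test suite instance.
    class SuiteLike {
      val instanceLatch = new CountDownLatch(1)
    
      // Referencing an instance field makes the lambda capture `this`,
      // dragging the whole non-serializable SuiteLike along.
      def badClosure: Long => Long = { x => instanceLatch.countDown(); x }
    
      // Only touches the singleton, so nothing non-serializable is captured.
      def goodClosure: Long => Long = { x => Shared.latch.countDown(); x }
    }
    
    object SerializationDemo {
      // Mimics the serialization Spark performs before shipping a closure
      // to executors.
      def serializable(f: Long => Long): Boolean =
        try {
          new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(f)
          true
        } catch {
          case _: NotSerializableException => false
        }
    
      def main(args: Array[String]): Unit = {
        val s = new SuiteLike
        println(s"instance-field closure serializable: ${serializable(s.badClosure)}")
        println(s"object-member closure serializable:  ${serializable(s.goodClosure)}")
      }
    }
    ```
    
    This is why the test can only coordinate through something like `DataFrameRangeSuite.stageToKill` (an object member) in the first place, and why swapping in an object-level latch merely trades one shared-mutable-state race for another.
    
    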
