Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20244#discussion_r165761207

--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2399,6 +2424,115 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
     }
   }

+  /**
+   * This test simulates concurrent jobs sharing the same RDD that is marked
+   * for checkpointing:
+   * Job one has already finished its Spark job and has started doCheckpoint;
+   * Job two is then submitted, and submitMissingTasks is called.
+   * In submitMissingTasks, if task serialization happens before doCheckpoint
+   * completes, while the partitions are read from stage.rdd.partitions after
+   * doCheckpoint completes, we may get a ClassCastException when executing
+   * the task, because some RDDs cast their Partition type.
+   *
+   * This test case demonstrates that task serialization and the partition
+   * computation in submitMissingTasks must observe the same RDD checkpoint
+   * status.
+   */
+  test("SPARK-23053: avoid ClassCastException in concurrent execution with checkpoint") {
+    // Set checkpointDir.
+    val tempDir = Utils.createTempDir()
+    val checkpointDir = File.createTempFile("temp", "", tempDir)
+    checkpointDir.delete()
--- End diff --

Why do you make a tempfile for the checkpoint dir and then delete it? Why not just `checkpointDir = new File(tempDir, "checkpointing")`? Or even just `checkpointDir = Utils.createTempDir()`? (CheckpointSuite does this so it can call `sc.setCheckpointDir`, but you're not doing that here.)
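A minimal, self-contained sketch of the two alternatives the reviewer suggests. This is illustrative, not the PR's actual code: `java.nio.file.Files.createTempDirectory` stands in for Spark's `Utils.createTempDir` (which lives in the Spark test harness), and the directory name `"checkpointing"` is only an example.

```scala
import java.io.File
import java.nio.file.Files

// Stand-in for Spark's Utils.createTempDir in this sketch.
val tempDir: File = Files.createTempDirectory("spark-test").toFile

// Alternative 1: a named subdirectory of the temp dir. Unlike
// createTempFile + delete, this never creates and discards a file.
val checkpointDir = new File(tempDir, "checkpointing")
checkpointDir.mkdir()

// Alternative 2: make the checkpoint dir itself a fresh temp directory.
val checkpointDir2: File = Files.createTempDirectory("checkpoint").toFile
```

Either variant yields a real directory up front, avoiding the create-a-file-then-delete-it dance in the quoted test.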