GitHub user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21758#discussion_r205100534
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -1647,6 +1647,14 @@ abstract class RDD[T: ClassTag](
         }
       }
     
    +  /**
    +   * :: Experimental ::
    +   * Indicates that Spark must launch the tasks together for the current stage.
    +   */
    +  @Experimental
    +  @Since("2.4.0")
    +  def barrier(): RDDBarrier[T] = withScope(new RDDBarrier[T](this))
    --- End diff ---
    
    Em, thanks for raising this question. IMO we do require users to be aware of how many tasks they will launch for a barrier stage, and since tasks may exchange internal data with each other in the middle of the stage, users really care about the task count. I agree it would be very useful to let users specify the number of tasks in a barrier stage; maybe we can have `RDDBarrier.coalesce(numPartitions: Int)` to enforce the number of tasks to be launched together in a barrier stage.
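
    To make the idea concrete, here is a rough sketch of how it could look from the user side. Note that `RDDBarrier.coalesce(numPartitions: Int)` is only a proposal in this thread and does not exist in Spark, while `barrier()`, `RDDBarrier.mapPartitions` and `BarrierTaskContext` come from the barrier execution mode work (SPARK-24374):

    ```scala
    // Rough sketch only: RDDBarrier.coalesce(numPartitions: Int) is a
    // proposed API from this comment thread and does NOT exist in Spark.
    import org.apache.spark.{BarrierTaskContext, SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("barrier-sketch").setMaster("local[4]"))

    val rdd = sc.parallelize(1 to 1000, numSlices = 16)

    val result = rdd
      .barrier()    // mark the current stage as a barrier stage (this PR)
      .coalesce(4)  // hypothetical: enforce exactly 4 tasks launched together
      .mapPartitions { iter =>
        val ctx = BarrierTaskContext.get() // per-task handle for coordination
        ctx.barrier()                      // block until all 4 tasks reach here
        iter
      }
      .collect()
    ```

    That would also make it explicit up front how many slots the scheduler needs to reserve to launch the whole barrier stage at once.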

