Jiang Xingbo created SPARK-24941:
------------------------------------

Summary: Add RDDBarrier.coalesce() function
Key: SPARK-24941
URL: https://issues.apache.org/jira/browse/SPARK-24941
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.0.0
Reporter: Jiang Xingbo
https://github.com/apache/spark/pull/21758#discussion_r204917245

The number of partitions in the input data can be unexpectedly large. For example, if you do

{code}
sc.textFile(...).barrier().mapPartitions()
{code}

the number of input partitions is determined by the HDFS input splits. We should provide a way in RDDBarrier to let users specify the number of tasks in a barrier stage, perhaps something like RDDBarrier.coalesce(numPartitions: Int).