Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21638#discussion_r197988075

--- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
@@ -45,7 +45,8 @@ private[spark] abstract class StreamFileInputFormat[T]
    * which is set through setMaxSplitSize
    */
   def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
-    val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
+    val defaultMaxSplitBytes = Math.max(
+      sc.getConf.get(config.FILES_MAX_PARTITION_BYTES), minPartitions)
     val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
     val defaultParallelism = sc.defaultParallelism
--- End diff --

hmm, shouldn't `minPartitions` be used like this?

```scala
val defaultParallelism = Math.max(sc.defaultParallelism, if (minPartitions == 0) 1 else minPartitions)
```
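
To make the difference concrete, here is a rough, Spark-free sketch of the suggestion. It assumes the rest of `setMinPartitions` (not shown in the hunk) divides the total input size by `defaultParallelism` to get a per-core target and then clamps it between the open cost and the configured maximum. That formula, the object name, and the helper names `effectiveParallelism` and `maxSplitSize` are all hypothetical and only for illustration.

```scala
// Hypothetical sketch: not the actual Spark implementation.
object MinPartitionsSketch {
  // The guard suggested in the comment: treat minPartitions == 0 as 1,
  // then take the larger of the two values.
  def effectiveParallelism(defaultParallelism: Int, minPartitions: Int): Int =
    math.max(defaultParallelism, if (minPartitions == 0) 1 else minPartitions)

  // Assumed downstream use: one split per unit of parallelism, bounded
  // below by the open cost and above by the configured maximum.
  def maxSplitSize(
      totalBytes: Long,
      openCostInBytes: Long,
      defaultMaxSplitBytes: Long,
      parallelism: Int): Long = {
    val bytesPerCore = totalBytes / parallelism
    math.min(defaultMaxSplitBytes, math.max(openCostInBytes, bytesPerCore))
  }

  def main(args: Array[String]): Unit = {
    val totalBytes = 1024L * 1024 * 1024 // 1 GiB of input (made-up figure)
    val openCost = 4L * 1024 * 1024      // 4 MiB (made-up figure)
    val maxBytes = 128L * 1024 * 1024    // 128 MiB (made-up figure)

    // minPartitions = 0: falls back to the default parallelism of 2.
    val low = effectiveParallelism(2, 0)
    // minPartitions = 64: raises the parallelism, which shrinks the split size.
    val high = effectiveParallelism(2, 64)

    println(maxSplitSize(totalBytes, openCost, maxBytes, low))  // 134217728
    println(maxSplitSize(totalBytes, openCost, maxBytes, high)) // 16777216
  }
}
```

Under these assumptions, a larger `minPartitions` lowers the split size and so produces more partitions, whereas taking the max of a byte count and a partition count (as in the `+` lines of the diff) mixes units and would barely change the result.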