GitHub user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21859

I don't think this optimization should be done at the SQL layer. The `ShuffleWriter` should treat `RangePartitioner` specially and consume the sampled data held by `RangePartitioner` instead of the input iterator. That way the SQL layer (as well as all other components) can benefit from it.
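
To make the suggestion concrete, here is a toy sketch of one reading of it, in plain Python rather than against Spark's real classes. Everything in it is an assumption for illustration: `CountingSource`, `ToyRangePartitioner`, `toy_shuffle_write`, and the `sample_is_complete` flag are hypothetical names, and the rule "reuse the sample only when it covered the whole input" is my interpretation, not something the actual `RangePartitioner` or `ShuffleWriter` exposes.

```python
import bisect
import random


class CountingSource:
    """Hypothetical input that counts how often it is scanned."""

    def __init__(self, data):
        self._data = list(data)
        self.scans = 0

    def scan(self):
        self.scans += 1
        return iter(self._data)


class ToyRangePartitioner:
    """Toy range partitioner (not Spark's): derives bounds from a sample."""

    def __init__(self, records, num_partitions, sample_size, seed=0):
        records = list(records)  # the sampling pass reads the input once
        if len(records) <= sample_size:
            # Assumption: a small input is sampled in full, so keep it.
            self.sample = records
            self.sample_is_complete = True
        else:
            self.sample = random.Random(seed).sample(records, sample_size)
            self.sample_is_complete = False
        ordered = sorted(self.sample)
        # num_partitions - 1 upper bounds taken at evenly spaced sample ranks.
        self.bounds = [
            ordered[(i * len(ordered)) // num_partitions]
            for i in range(1, num_partitions)
        ] if ordered else []
        self.num_partitions = num_partitions

    def get_partition(self, key):
        # Keys less than or equal to bounds[i] go to partition i or lower.
        return bisect.bisect_left(self.bounds, key)


def toy_shuffle_write(source, partitioner):
    """Bucket records by partition, reusing the sample when it is complete."""
    if partitioner.sample_is_complete:
        records = partitioner.sample  # no second scan of the input
    else:
        records = source.scan()
    buckets = [[] for _ in range(partitioner.num_partitions)]
    for key in records:
        buckets[partitioner.get_partition(key)].append(key)
    return buckets


small = CountingSource([5, 3, 9, 1, 7])
part = ToyRangePartitioner(small.scan(), num_partitions=2, sample_size=10)
print(toy_shuffle_write(small, part), small.scans)
```

Because the special case lives in the writer, any caller that shuffles with a range partitioner would skip the second scan in this toy model, which is the point of doing it below the SQL layer.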