GitHub user vgankidi commented on the issue:

https://github.com/apache/spark/pull/19634

We will end up with fewer combined splits. That reduces the number of files the job produces and also the number of tasks in downstream jobs. In some tests I noticed about a 10% reduction in the number of combined splits. However, the simple implementation of first-fit decreasing (FFD) has O(n^2) run time.
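To illustrate the trade-off, here is a minimal sketch of simple FFD packing next to a next-fit baseline. It is not the code from the pull request: the function names, the `capacity` parameter, and the sample sizes are hypothetical, and the next-fit variant is assumed here only as a point of comparison.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack sizes into bins of at most `capacity` with simple FFD.

    Hypothetical helper for illustration. The inner scan over all open
    bins is what gives the simple version its O(n^2) worst-case run time.
    """
    bins = []       # each bin is a list of the sizes placed in it
    remaining = []  # free capacity left in each bin
    for size in sorted(sizes, reverse=True):
        for i, free in enumerate(remaining):
            if size <= free:
                # Place the item in the first bin that still has room.
                bins[i].append(size)
                remaining[i] -= size
                break
        else:
            # No open bin fits (or the item exceeds capacity): open a new bin.
            bins.append([size])
            remaining.append(capacity - size)
    return bins


def next_fit_decreasing(sizes, capacity):
    """Baseline assumed for comparison: only the current bin is considered."""
    bins = []
    current, used = [], 0
    for size in sorted(sizes, reverse=True):
        if current and used + size > capacity:
            # Close the current bin and never revisit it.
            bins.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        bins.append(current)
    return bins


if __name__ == "__main__":
    sample = [6, 5, 4, 3, 2]  # made-up split sizes
    print(first_fit_decreasing(sample, 10))  # [[6, 4], [5, 3, 2]]
    print(next_fit_decreasing(sample, 10))   # [[6], [5, 4], [3, 2]]
```

On this made-up input FFD fills two bins where next-fit needs three, because FFD can go back to an earlier bin that still has room. The price is the scan over every open bin for each item.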