Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15835

Ah, I missed the last comment on the old PR. Okay, we can shape this more nicely. BTW, Spark combines small partitions into each task, so I guess this would not always introduce a lot of tasks, but yes, I guess reducing the number of tasks is still a valid point.

Right, I am fine with this. I had thought the original PR was taken over without the courtesy of a notification or a discussion beforehand.

For the benchmark, I have a PR that includes one, 14660, which I also referred to from other PRs.

I have just a few minor notes. First, maybe `Closes #14649` can be added at the end of the PR description so that the committers' merge script can close the original PR if this one gets merged. Second, there are some style guidelines I usually refer to: the [wiki](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) and [databricks/scala-style-guide](https://github.com/databricks/scala-style-guide).

I will leave some comments on the changed files.