GitHub user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/20414
  
    > Actually for the first case, you shall use coalesce() instead of repartition() to get a similar effect, without need of another shuffle!
    
    Not quite: coalesce() will not combine partitions across executors (doing so would require a shuffle), so you could still end up with many, many files.
    
    I have seen that quite a bit with large-scale ML. But FWIW, my earlier comment applied to both "regular" use cases and ML use cases.
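
    To make the point about file counts concrete, here is a rough toy model in plain Python. It is not Spark code and does not reproduce Spark's actual partition coalescer. It simply assumes, as described above, that a shuffle-free coalesce only merges partitions living on the same executor, while a repartition moves rows freely. The function names and the (executor_id, rows) layout are made up for illustration.

```python
from collections import defaultdict


def coalesce_without_shuffle(partitions, target):
    """Toy model (assumption): merge partitions only within one executor.

    partitions is a list of (executor_id, rows) pairs.
    """
    by_executor = defaultdict(list)
    for executor, rows in partitions:
        by_executor[executor].append(rows)

    # Every executor that holds data keeps at least one output partition,
    # so the result cannot drop below the number of such executors.
    per_executor = max(1, target // len(by_executor))

    merged = []
    for executor, parts in sorted(by_executor.items()):
        buckets = [[] for _ in range(min(per_executor, len(parts)))]
        for i, rows in enumerate(parts):
            buckets[i % len(buckets)].extend(rows)
        merged.extend((executor, bucket) for bucket in buckets)
    return merged


def repartition_with_shuffle(partitions, target):
    """Toy model: a full shuffle redistributes rows round-robin."""
    all_rows = [row for _, rows in partitions for row in rows]
    buckets = [[] for _ in range(target)]
    for i, row in enumerate(all_rows):
        buckets[i % target].append(row)
    return buckets


# Hypothetical layout: 4 executors, each holding 3 single-row partitions.
partitions = [(e, [f"e{e}p{p}"]) for e in range(4) for p in range(3)]

coalesced = coalesce_without_shuffle(partitions, target=1)
shuffled = repartition_with_shuffle(partitions, target=1)

print(len(coalesced))  # one output partition per executor in this model
print(len(shuffled))   # exactly the requested number of partitions
```

    Under this model, asking for a single partition still leaves one partition (and hence one output file) per executor, whereas the shuffle-based version yields exactly one. In real Spark code the corresponding calls are `df.coalesce(n)` and `df.repartition(n)`.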



