[ https://issues.apache.org/jira/browse/SPARK-13365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181827#comment-15181827 ]
Josh Rosen commented on SPARK-13365:
------------------------------------

If coalesce is called with {{shuffle == true}}, then we might actually want to run the coalesce, because the user's intent might be to produce more evenly balanced partitions. If {{shuffle == false}}, though, then it seems fine to skip the coalesce, since it would be a no-op. I believe that Spark SQL performs a similar optimization.

> should coalesce do anything if coalescing to same number of partitions without shuffle
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-13365
>                 URL: https://issues.apache.org/jira/browse/SPARK-13365
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.6.0
>            Reporter: Thomas Graves
>
> Currently, if a user does a coalesce to the same number of partitions as
> already exist, it spends a bunch of time doing work when it seems like it
> shouldn't do anything.
> For instance, I have an RDD with 100 partitions; if I run coalesce(100), it
> seems like it should skip any computation, since it already has 100
> partitions. One case where I've seen this is when users do coalesce(1000)
> without the shuffle, which really turns into a coalesce(100).
> I'm presenting this as a question, as I'm not sure if there are use cases I
> haven't thought of where this would break.
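
The rule discussed in this thread can be sketched as a small decision function. This is plain Python, not Spark code: the name `plan_coalesce` and the labels it returns are hypothetical and chosen only for illustration. It models the proposal as described here (skip when there is no shuffle and the partition count would not change) and should not be read as Spark's actual implementation.

```python
from typing import Tuple


def plan_coalesce(current: int, requested: int, shuffle: bool) -> Tuple[str, int]:
    """Model the proposal in this issue (hypothetical helper, not a Spark API).

    Returns (action, resulting_partition_count).
    """
    if current <= 0 or requested <= 0:
        raise ValueError("partition counts must be positive")

    if shuffle:
        # With a shuffle, records are redistributed, so even an unchanged
        # partition count may be what the user wants (better balance).
        return ("shuffle", requested)

    # Without a shuffle, coalesce cannot increase the partition count,
    # so a larger request is effectively capped at the current count.
    effective = min(requested, current)
    if effective == current:
        # Same number of partitions and no shuffle: treat as a no-op.
        return ("skip", current)

    # Fewer partitions without a shuffle: merge existing partitions.
    return ("narrow", effective)


if __name__ == "__main__":
    for args in [(100, 100, False), (100, 1000, False), (100, 100, True), (100, 10, False)]:
        print(args, "->", plan_coalesce(*args))
```

Under these assumptions, both coalesce(100) and coalesce(1000) on a 100-partition RDD without shuffle fall into the "skip" case, while the same calls with shuffle enabled still run.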