srowen commented on a change in pull request #23986: [SPARK-27070] Fix performance bug in DefaultPartitionCoalescer
URL: https://github.com/apache/spark/pull/23986#discussion_r262935136
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala
 ##########
 @@ -297,9 +298,8 @@ private class DefaultPartitionCoalescer(val balanceSlack: Double = 0.10)
       partitionLocs: PartitionLocations): PartitionGroup = {
     val slack = (balanceSlack * prev.partitions.length).toInt
     // least loaded pref locs
-    val pref = currPrefLocs(p, prev).map(getLeastGroupHash(_)).sortWith(compare)
-    val prefPart = if (pref == Nil) None else pref.head
-
+    val pref = currPrefLocs(p, prev).flatMap(getLeastGroupHash)
+    val prefPart = if (pref.isEmpty) None else Some(pref.min)
 
 Review comment:
   Likewise, `pref.minBy(_.numPartitions)`?
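
   For illustration, a minimal sketch of what the suggestion amounts to. `Group` and `MinBySketch` are hypothetical stand-ins rather than Spark code, and the sketch assumes the ordering of interest is simply the partition count. Sorting only to take the head costs O(n log n); `minBy` makes one pass and states the comparison key explicitly, but it throws on an empty collection, so the emptiness guard from the diff is still needed.

```scala
// Hypothetical stand-in for a partition group; only the partition count matters here.
final case class Group(name: String, numPartitions: Int)

object MinBySketch {
  // Old shape: sort the whole sequence, then take the first element.
  def leastLoadedBySort(groups: Seq[Group]): Option[Group] =
    groups.sortWith(_.numPartitions < _.numPartitions).headOption

  // Suggested shape: a single pass with an explicit key.
  // minBy throws UnsupportedOperationException on an empty collection, hence the guard.
  def leastLoadedByMin(groups: Seq[Group]): Option[Group] =
    if (groups.isEmpty) None else Some(groups.minBy(_.numPartitions))
}
```

   Both helpers return `None` for an empty input and the group with the fewest partitions otherwise.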

