Hi, I'm wondering whether I can replace the RangePartitioner used in sortBy with another partitioner, such as HashPartitioner. My first thought is that it cannot be replaced, because RangePartitioner is part of the sort algorithm itself. If we want to call mapPartitions on key-based partitions after sorting, we need to repartition or coalesce the dataset, because it is range-partitioned. In that case, we cannot avoid shuffling the dataset twice: once for sorting and once for repartitioning. This causes performance problems on large datasets.
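To illustrate why the range partitioner is tied to the global sort, here is a small plain-Scala model with no Spark dependency. The names `hashPartition`, `rangePartition`, `rangeSort` and `hashSortWithin` are hypothetical, simplified stand-ins, not Spark's actual implementations. In particular, the range bounds are passed in directly instead of being sampled from the data as Spark does. Range partitioning followed by a per-partition sort gives a globally sorted result when the partitions are read in order. Hash partitioning followed by a per-partition sort only gives sorted partitions.

```scala
// Plain-Scala model (no Spark) of two ways to lay out key-value records.
// These partition functions are simplified stand-ins, not Spark's classes.
object PartitionModel {
  type Rec = (Int, String)

  // Hash-style assignment: non-negative hashCode modulo n,
  // similar in spirit to HashPartitioner.
  def hashPartition(key: Int, n: Int): Int = {
    val raw = key.hashCode % n
    if (raw < 0) raw + n else raw
  }

  // Range-style assignment: bounds(i) is the largest key sent to partition i;
  // keys above every bound go to the last partition. (Hypothetical: real
  // range bounds would be chosen by sampling the data.)
  def rangePartition(key: Int, bounds: Vector[Int]): Int = {
    val idx = bounds.indexWhere(key <= _)
    if (idx < 0) bounds.length else idx
  }

  // Model of sortBy: range-partition, then sort each partition by key.
  def rangeSort(data: Seq[Rec], bounds: Vector[Int]): Vector[Vector[Rec]] = {
    val grouped = data.groupBy(r => rangePartition(r._1, bounds))
    Vector.tabulate(bounds.length + 1)(i =>
      grouped.getOrElse(i, Seq.empty).sortBy(_._1).toVector)
  }

  // Model of hash partitioning plus sorting inside each partition.
  def hashSortWithin(data: Seq[Rec], n: Int): Vector[Vector[Rec]] = {
    val grouped = data.groupBy(r => hashPartition(r._1, n))
    Vector.tabulate(n)(i =>
      grouped.getOrElse(i, Seq.empty).sortBy(_._1).toVector)
  }

  def isSorted(keys: Seq[Int]): Boolean =
    keys.zip(keys.drop(1)).forall { case (a, b) => a <= b }

  def main(args: Array[String]): Unit = {
    val data = Seq(5 -> "e", 1 -> "a", 8 -> "h", 3 -> "c",
                   6 -> "f", 2 -> "b", 7 -> "g", 4 -> "d")
    val ranged = rangeSort(data, Vector(3, 6)) // partitions: <=3, 4..6, >=7
    val hashed = hashSortWithin(data, 3)
    println(ranged.map(_.map(_._1))) // Vector(Vector(1, 2, 3), Vector(4, 5, 6), Vector(7, 8))
    println(hashed.map(_.map(_._1))) // Vector(Vector(3, 6), Vector(1, 4, 7), Vector(2, 5, 8))
    println(isSorted(ranged.flatten.map(_._1))) // true: globally sorted
    println(isSorted(hashed.flatten.map(_._1))) // false: only each partition is sorted
  }
}
```

If mapPartitions only needs hash partitions that are sorted internally, the pair-RDD method `repartitionAndSortWithinPartitions(partitioner)` (available since Spark 1.2) does the hash variant in a single shuffle. A true global order still requires range partitioning.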
Thanks,
Kevin

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Partitioner-in-sortBy-tp20614.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.