Partitioner in sortBy

Kevin Jung Wed, 10 Dec 2014 17:44:57 -0800

Hi,
I'm wondering whether I can replace the RangePartitioner used in sortBy with
another partitioner, such as HashPartitioner.
My first thought is that it cannot be replaced, because RangePartitioner is
part of the sort algorithm itself.
If we call mapPartitions on key-based partitions after sorting, we need to
repartition or coalesce the dataset, because it is range-partitioned.
In that case we cannot avoid shuffling the dataset twice: once for sorting
and once for repartitioning.
This causes performance problems on large datasets.
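
To make the concern concrete, here is a minimal pure-Python sketch of the two
partitioning schemes. It is a toy model, not Spark code: the function names
(range_partition, hash_partition, sort_within), the split points [3, 6] and
the sample records are all made up for illustration, and plain lists stand in
for RDD partitions. It only shows why range partitioning yields a total order
once each partition is sorted, while hash partitioning yields key-based
partitions that are sorted only locally.

```python
from bisect import bisect_right


def range_partition(records, bounds):
    """Put each (key, value) into the partition whose key range holds the key."""
    parts = [[] for _ in range(len(bounds) + 1)]
    for key, value in records:
        parts[bisect_right(bounds, key)].append((key, value))
    return parts


def hash_partition(records, num_parts):
    """Put each (key, value) into partition key % num_parts.

    Keys are integers here, so key % num_parts stands in for a hash function.
    """
    parts = [[] for _ in range(num_parts)]
    for key, value in records:
        parts[key % num_parts].append((key, value))
    return parts


def sort_within(parts):
    """Sort every partition locally by key; nothing moves between partitions."""
    return [sorted(p, key=lambda kv: kv[0]) for p in parts]


def flatten(parts):
    """Concatenate the partitions in partition order."""
    return [kv for p in parts for kv in p]


# Hypothetical sample data: ten records with unique integer keys.
records = [(k, "v%d" % k) for k in (7, 2, 9, 4, 1, 8, 3, 6, 5, 0)]

ranged = sort_within(range_partition(records, bounds=[3, 6]))
hashed = sort_within(hash_partition(records, num_parts=3))

print(flatten(ranged) == sorted(records))  # True: partitions line up in key order
print(flatten(hashed) == sorted(records))  # False: each partition is sorted, the whole is not
```

If per-partition order is enough for the mapPartitions step, the hashed layout
above needs only one redistribution of the data. In Spark,
repartitionAndSortWithinPartitions on pair RDDs (available from Spark 1.2)
appears to target this case; whether it fits a given job is an assumption to
check.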


Thanks,
Kevin



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Partitioner-in-sortBy-tp20614.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
