RangePartitioning skewed data

jluan Mon, 25 Jan 2016 00:47:33 -0800

Let's say I have a dataset of (K, V) pairs where the keys are heavily skewed:

myDataRDD =
[(8, 1), (8, 13), (1, 1), (2, 4)]
[(8, 12), (8, 15), (8, 7), (8, 6), (8, 4), (8, 3), (8, 4), (10, 2)]


If I applied a RangePartitioner to this data, say val rangePart = new
RangePartitioner(4, myDataRDD), and then repartitioned the data with it, would
I get back 4 evenly distributed partitions, with key 8 split across several of
them, or would all the records with key 8 end up in a single partition?
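To make the question concrete, here is a simplified, plain-Scala model (no Spark dependency) of how a range partitioner could assign a record once its boundaries are known: a record goes to the first partition whose upper bound is greater than or equal to its key. The bounds (1, 2, 8) for 4 partitions are an assumption for illustration, and RangeLookupSketch and partitionFor are hypothetical names, not Spark's API. Because the index is computed from the key alone (Spark's Partitioner.getPartition likewise receives only the key), every record with key 8 lands in the same partition under this model.

```scala
// Simplified model of range partitioning; NOT Spark's RangePartitioner source.
object RangeLookupSketch {
  // Hypothetical helper: index of the first bound >= key,
  // or bounds.length if the key is larger than every bound.
  def partitionFor(key: Int, bounds: IndexedSeq[Int]): Int = {
    val i = bounds.indexWhere(b => key <= b)
    if (i == -1) bounds.length else i
  }

  def main(args: Array[String]): Unit = {
    // Same records as myDataRDD, flattened into one local Seq.
    val data = Seq((8, 1), (8, 13), (1, 1), (2, 4),
      (8, 12), (8, 15), (8, 7), (8, 6), (8, 4), (8, 3), (8, 4), (10, 2))
    // Assumed bounds for 4 ranges: (-inf, 1], (1, 2], (2, 8], (8, +inf)
    val bounds = IndexedSeq(1, 2, 8)
    // Group records by the partition index their key maps to.
    val byPartition = data.groupBy { case (k, _) => partitionFor(k, bounds) }
    for (p <- 0 to bounds.length)
      println(s"partition $p: ${byPartition.getOrElse(p, Seq.empty)}")
  }
}
```

In this model partition 2 receives all nine key-8 records while the other three partitions hold one record each; no choice of bounds can split a single key, because the lookup never looks at the value.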

Also, does myDataRDD need to be sorted for the range partitioner to be created
correctly? My research suggests this may be the case.
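The following sketch models one way the boundaries could be chosen: take a sample of keys, sort the sample, and pick keys at evenly spaced positions. Because the sample is sorted inside the helper, the input can arrive in any order in this model; Spark's RangePartitioner also derives its bounds from a sample of the RDD's keys, though its sampling and weighting differ from this simplification. BoundsSketch and chooseBounds are hypothetical names. With the skewed keys above, all three candidate bounds come out as 8 and collapse into one, leaving only two ranges.

```scala
// Simplified model of choosing range bounds; NOT Spark's implementation.
object BoundsSketch {
  // Hypothetical helper: up to (numPartitions - 1) strictly increasing
  // upper bounds taken from a sorted copy of the sampled keys.
  def chooseBounds(sampleKeys: Seq[Int], numPartitions: Int): IndexedSeq[Int] = {
    if (sampleKeys.isEmpty) return IndexedSeq.empty
    val sorted = sampleKeys.sorted.toIndexedSeq // input order does not matter
    val step = sorted.length.toDouble / numPartitions
    // Pick the key at the end of each evenly sized slice of the sorted sample.
    val candidates = (1 until numPartitions).map { i =>
      sorted(math.min(sorted.length - 1, math.ceil(i * step).toInt - 1))
    }
    // Candidates are non-decreasing; a heavily repeated key can be picked
    // several times, so keep each bound only once.
    candidates.distinct
  }

  def main(args: Array[String]): Unit = {
    val keys = Seq(8, 8, 1, 2, 8, 8, 8, 8, 8, 8, 8, 10) // unsorted, as in myDataRDD
    println(chooseBounds(keys, 4)) // prints Vector(8)
  }
}
```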





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/RangePartitioning-skewed-data-tp26055.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
