Let's say I have a dataset of (K, V) pairs where the keys are heavily skewed. myDataRDD has two partitions:

  [(8, 1), (8, 13), (1, 1), (2, 4)]
  [(8, 12), (8, 15), (8, 7), (8, 6), (8, 4), (8, 3), (8, 4), (10, 2)]
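To make the setup concrete, here is a minimal sketch of the data and of the range-partitioning step I am asking about. It assumes a local Spark installation; the object name RangePartitionSkewCheck and the helper partitionContents are hypothetical names made up for this example, and sc.parallelize will not necessarily reproduce the exact two-partition layout shown above.

```scala
import org.apache.spark.{RangePartitioner, SparkConf, SparkContext}

// Hypothetical helper object, for illustration only.
object RangePartitionSkewCheck {
  // The sample records from the two partitions listed above.
  val part0 = Seq((8, 1), (8, 13), (1, 1), (2, 4))
  val part1 = Seq((8, 12), (8, 15), (8, 7), (8, 6), (8, 4), (8, 3), (8, 4), (10, 2))

  /** Range-partitions the sample data and returns the contents of each resulting partition. */
  def partitionContents(sc: SparkContext): Array[Array[(Int, Int)]] = {
    // Assumption: two slices. Spark picks the split points itself, so the
    // initial layout may differ from the one in the post.
    val myDataRDD = sc.parallelize(part0 ++ part1, numSlices = 2)

    // Constructing the partitioner samples the RDD to choose range bounds.
    val rangePart = new RangePartitioner(4, myDataRDD)

    // glom() turns each partition into an array so the layout can be inspected.
    myDataRDD.partitionBy(rangePart).glom().collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("range-partition-skew").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      partitionContents(sc).zipWithIndex.foreach { case (rows, i) =>
        println(s"partition $i: ${rows.mkString(", ")}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Printing the partitions this way shows directly where the records with key 8 end up. Since a Partitioner's getPartition method takes only the key, I would expect identical keys to be routed to the same partition, but I would appreciate confirmation.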
If I applied a RangePartitioner to this data, say

  val rangePart = new RangePartitioner(4, myDataRDD)

and then repartitioned the data with it, would I get back 4 evenly distributed partitions, with the records for key 8 split across several of them? Or would all of the key-8 records end up in a single partition?

Also, does myDataRDD need to be sorted for the range partitioner to be created correctly? From what I have read, this may be the case.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RangePartitioning-skewed-data-tp26055.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.