Hm, I don't think the sort-based shuffle is going to make the result ordered by (c1, c2) when you only partition on c1. As far as I know, spark.shuffle.manager=SORT sorts shuffle data by partition ID so the output files can be merged efficiently; it doesn't guarantee any ordering of keys within a partition. I mean, it's not even guaranteed that the type of your key has an ordering, right?
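If you're on Spark 1.2 or later, the usual way to get a secondary sort is repartitionAndSortWithinPartitions: keep (c1, c2) as a composite key, partition on c1 only, and let the shuffle sort each partition on the full key. Here is a rough, untested sketch against your data; the partitioner name is made up, and inputFile is your own variable:

import org.apache.spark.{Partitioner, SparkConf, SparkContext}

// Partition on c1 only; the (c1, c2) tuple stays the key so the
// shuffle can sort each partition by the full composite key.
class C1Partitioner(override val numPartitions: Int) extends Partitioner {
  def getPartition(key: Any): Int = key match {
    case (c1: Int, _) => math.abs(c1.hashCode) % numPartitions
  }
}

val conf = new SparkConf().setAppName("SecondarySortSketch")
val sc = new SparkContext(conf)

val rows = sc.textFile(inputFile).map { line =>
  val arr = line.split("\t")
  ((arr(0).toInt, arr(1).toInt), line)  // composite key (c1, c2)
}

// Tuple2[Int, Int] has an implicit lexicographic Ordering in Scala, so
// within each partition the rows come out sorted by c1, then c2.
val sorted = rows.repartitionAndSortWithinPartitions(new C1Partitioner(10))
val output = sorted.map { case (_, line) => line }

That way the ordering is an explicit part of the API contract instead of relying on shuffle internals.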
On Tue, Feb 3, 2015 at 3:38 PM, nitinkak001 <nitinkak...@gmail.com> wrote:
> I am trying to implement secondary sort in Spark as we do in MapReduce.
>
> Here is my data (tab separated; the header c1, c2, c3 is not part of the
> data):
>
> c1 c2 c3
> 1  2  4
> 1  3  6
> 2  4  7
> 2  6  8
> 3  5  5
> 3  1  8
> 3  2  0
>
> To do the secondary sort, I create a paired RDD as
>
> ((c1 + "," + c2), row)
>
> and then use a custom partitioner to partition only on c1. I have set
> spark.shuffle.manager = SORT so the keys per partition are sorted. For
> the key "3" I am expecting to get
>
> (3, 1)
> (3, 2)
> (3, 5)
>
> but I am still getting the original order:
>
> 3,5
> 3,1
> 3,2
>
> Here is the custom partitioner code:
>
> class StraightPartitioner(p: Int) extends org.apache.spark.Partitioner {
>   def numPartitions = p
>   def getPartition(key: Any) = {
>     key.asInstanceOf[String].split(",")(0).toInt
>   }
> }
>
> and the driver code. Please tell me what I am doing wrong:
>
> val conf = new SparkConf().setAppName("MapInheritanceExample")
> conf.set("spark.shuffle.manager", "SORT")
> val sc = new SparkContext(conf)
> val pF = sc.textFile(inputFile)
>
> val log = LogFactory.getLog("MapFunctionTest")
> val partitionedRDD = pF.map { x =>
>   val arr = x.split("\t")
>   (arr(0) + "," + arr(1), null)
> }.partitionBy(new StraightPartitioner(10))
>
> var outputRDD = partitionedRDD.mapPartitions(p => {
>   p.map { case (o, n) => o }
> })