Akhil, I think Aniket uses the word "persisted" in a different way than what you mean. I.e. not in the RDD.persist() way. Aniket asks if running combineByKey on a sorted RDD will result in a sorted RDD. (I.e. the sorting is preserved.)
I think the answer is no. combineByKey uses AppendOnlyMap, which is a hashmap. That will shuffle your keys. You can quickly verify it in spark-shell: scala> sc.parallelize(7 to 8).map(_ -> 1).reduceByKey(_ + _).collect res0: Array[(Int, Int)] = Array((8,1), (7,1)) (The initial size of the AppendOnlyMap seems to be 8, so 8 is the first number that demonstrates this.) On Wed, Nov 19, 2014 at 9:05 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote: > If something is persisted you can easily see them under the Storage tab in > the web ui. > > Thanks > Best Regards > > On Tue, Nov 18, 2014 at 7:26 PM, Aniket Bhatnagar < > aniket.bhatna...@gmail.com> wrote: > >> I am trying to figure out if sorting is persisted after applying Pair RDD >> transformations and I am not able to decisively tell after reading the >> documentation. >> >> For example: >> val numbers = .. // RDD of numbers >> val pairedNumbers = numbers.map(number => (number % 100, number)) >> val sortedPairedNumbers = pairedNumbers.sortBy(pairedNumber => >> pairedNumber._2) // Sort by values in the pair >> val aggregates = sortedPairedNumbers.combineByKey(..) >> >> In this example, will the combine functions see values in sorted order? >> What if I had done groupByKey and then combineByKey? What transformations >> can unsort an already sorted data? >> > >