sortByKey() is probably the easiest way:

import org.apache.spark.SparkContext._

joinedRdd
  .map { case (word, (file1Counts, file2Counts)) =>
    (file1Counts, (word, file1Counts, file2Counts))
  }
  .sortByKey()
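To see the idea end to end on the sample data below, here is a minimal sketch using plain Scala collections in place of RDDs (the joined pairs and the descending sort direction are assumptions; on a real RDD you would call sortByKey(false) for descending order):

```scala
// Joined output of file1.join(file2): (word, (file1Counts, file2Counts))
val joined = Seq(("ddd", (3, 2)), ("bbb", (6, 6)))

// Re-key each record by the file1 count, then sort on that key
// (descending here), mirroring map{...}.sortByKey(false) on an RDD.
val sorted = joined
  .map { case (word, (c1, c2)) => (c1, (word, c1, c2)) }
  .sortBy(-_._1)

sorted.foreach(println)
// (6,(bbb,6,6))
// (3,(ddd,3,2))
```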
On Mon, Feb 23, 2015 at 10:41 AM, Anupama Joshi <anupama.jo...@gmail.com> wrote:
> Hi,
> To simplify my problem -
> I have 2 files from which I am reading words.
> The o/p is like
> file 1
> aaa 4
> bbb 6
> ddd 3
>
> file 2
> ddd 2
> bbb 6
> ttt 5
>
> If I do file1.join(file2)
> I get (ddd,(3,2))
> (bbb,(6,6))
>
> If I want to sort the output by the number of occurrences of the word in
> file1, how do I achieve that?
> Any help would be appreciated.
> Thanks
> AJ
>