
Re: sorting output of join operation

2015-02-23 Thread Imran Rashid

sortByKey() is probably the easiest way:

    import org.apache.spark.SparkContext._

    joinedRdd.map { case (word, (file1Counts, file2Counts)) => (file1Counts, (word, file1Counts, file2Counts)) }.sortByKey()

On Mon, Feb 23, 2015 at 10:41 AM, Anupama Joshi anupama.jo...@gmail.com wrote: Hi, To
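A slightly fuller sketch of the same idea, in case it helps. It assumes the joined RDD has type RDD[(String, (Int, Int))], which is what joining two (word, count) RDDs would give. The object and helper names (JoinSort, sortByFile1Count) are made up for illustration. It keys each record by the file1 count, sorts, then drops the temporary key so the result keeps the original shape.

```scala
import org.apache.spark.SparkContext._ // pair-RDD implicits; needed on Spark 1.2 and earlier
import org.apache.spark.rdd.RDD

object JoinSort {
  // Hypothetical helper: sorts joined (word, (file1Count, file2Count)) pairs
  // by file1Count. Pass ascending = false for largest counts first.
  def sortByFile1Count(joined: RDD[(String, (Int, Int))],
                       ascending: Boolean = true): RDD[(String, (Int, Int))] =
    joined
      // Re-key by the file1 count so sortByKey can order on it.
      .map { case (word, (file1Count, file2Count)) =>
        (file1Count, (word, (file1Count, file2Count)))
      }
      .sortByKey(ascending)
      // Drop the temporary sort key, keeping the original records.
      .values
}
```

Calling JoinSort.sortByFile1Count(joinedRdd, ascending = false) should give the most frequent file1 words first.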

sorting output of join operation

2015-02-23 Thread Anupama Joshi

Hi, To simplify my problem: I have 2 files from which I am reading words. The output is like

    file 1: aaa 4, bbb 6, ddd 3
    file 2: ddd 2, bbb 6, ttt 5

If I do file1.join(file2) I get (ddd,(3,2)) (bbb,(6,6)). I want to sort the output by the number of occurrences of the word in file1. How do I achieve
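For reference, the sample counts above can be reproduced in a small local job. This is only a sketch under some assumptions: the counts are hard-coded with parallelize instead of being read from files, the object name JoinSortExample and the local[*] master are placeholders, and it uses RDD.sortBy (which I believe has been available since Spark 1.0) as an alternative to re-keying and calling sortByKey.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; needed on Spark 1.2 and earlier

object JoinSortExample {
  // Joins the two sample word-count sets and sorts by the file1 count.
  def run(sc: SparkContext): Array[(String, (Int, Int))] = {
    // Hard-coded stand-ins for the per-file word counts in the question.
    val file1 = sc.parallelize(Seq(("aaa", 4), ("bbb", 6), ("ddd", 3)))
    val file2 = sc.parallelize(Seq(("ddd", 2), ("bbb", 6), ("ttt", 5)))

    // Inner join: only words present in both files survive (bbb and ddd).
    val joined = file1.join(file2)

    // Sort ascending on the first element of the value pair (the file1 count).
    joined.sortBy { case (_, (file1Count, _)) => file1Count }.collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("join-sort-example")
    val sc = new SparkContext(conf)
    try run(sc).foreach(println)
    finally sc.stop()
  }
}
```

With the sample data this prints (ddd,(3,2)) followed by (bbb,(6,6)); aaa and ttt are dropped because an inner join keeps only keys found on both sides.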