sortByKey() is probably the easiest way:

import org.apache.spark.SparkContext._

joinedRdd
  .map { case (word, (file1Counts, file2Counts)) =>
    (file1Counts, (word, file1Counts, file2Counts))
  }
  .sortByKey()
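To see the idea end to end on the sample data below, here is a minimal sketch using plain Scala collections in place of RDDs (the joined pairs and the descending sort direction are assumptions; on a real RDD you would call sortByKey(false) for descending order):

```scala
// Joined output of file1.join(file2): (word, (file1Counts, file2Counts))
val joined = Seq(("ddd", (3, 2)), ("bbb", (6, 6)))

// Re-key each record by the file1 count, then sort on that key
// (descending here), mirroring map{...}.sortByKey(false) on an RDD.
val sorted = joined
  .map { case (word, (c1, c2)) => (c1, (word, c1, c2)) }
  .sortBy(-_._1)

sorted.foreach(println)
// (6,(bbb,6,6))
// (3,(ddd,3,2))
```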
On Mon, Feb 23, 2015 at 10:41 AM, Anupama Joshi <anupama.jo...@gmail.com> wrote:
> Hi,
> To simplify my problem -
> I have 2 files from which I am reading words.
> The o/p is like
> file 1
> aaa 4
> bbb 6
> ddd 3
>
> file 2
> ddd 2
> bbb 6
> ttt 5
>
> If I do file1.join(file2)
> I get (ddd,(3,2))
> (bbb,(6,6))
>
> If I want to sort the output by the number of occurrences of the word in
> file1, how do I achieve that?
> Any help would be appreciated.
> Thanks
> AJ
>