cogroup could be useful to you, since all three are PairRDDs: it groups several RDDs by key in a single call, and their value types may differ.

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
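
To make the cogroup suggestion concrete, here is a minimal plain-Java sketch of what cogroup produces per key. It does not use Spark: the class CogroupSketch, the holder Grouped, and the sample strings are hypothetical stand-ins for the three datasets. In Spark's Java API the corresponding call would be first.cogroup(second, third), whose values have type Tuple3<Iterable<V>, Iterable<W1>, Iterable<W2>>.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of cogroup's per-key result; it does not use Spark.
class CogroupSketch {

    // Hypothetical holder: one key's values from each of the three datasets.
    static final class Grouped<A, B, C> {
        final List<A> first = new ArrayList<>();
        final List<B> second = new ArrayList<>();
        final List<C> third = new ArrayList<>();
    }

    // Groups three keyed datasets by key. A key missing from one dataset
    // simply keeps an empty list on that side, so no key is dropped.
    static <K extends Comparable<K>, A, B, C> Map<K, Grouped<A, B, C>> cogroup(
            List<Map.Entry<K, A>> as,
            List<Map.Entry<K, B>> bs,
            List<Map.Entry<K, C>> cs) {
        Map<K, Grouped<A, B, C>> out = new TreeMap<>();
        for (Map.Entry<K, A> e : as) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<A, B, C>()).first.add(e.getValue());
        }
        for (Map.Entry<K, B> e : bs) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<A, B, C>()).second.add(e.getValue());
        }
        for (Map.Entry<K, C> e : cs) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<A, B, C>()).third.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Strings stand in for TransactionInfo, TransactionRaw and TransactionAggr.
        List<Map.Entry<Long, String>> infos = new ArrayList<>();
        infos.add(new SimpleEntry<>(1L, "info-a"));
        infos.add(new SimpleEntry<>(1L, "info-b"));
        infos.add(new SimpleEntry<>(2L, "info-c"));

        List<Map.Entry<Long, String>> raws = new ArrayList<>();
        raws.add(new SimpleEntry<>(1L, "raw-a"));

        List<Map.Entry<Long, String>> aggrs = new ArrayList<>();
        aggrs.add(new SimpleEntry<>(1L, "aggr-1"));
        aggrs.add(new SimpleEntry<>(2L, "aggr-2"));

        // One entry per customer id, carrying that customer's data from all three inputs.
        for (Map.Entry<Long, Grouped<String, String, String>> e
                : cogroup(infos, raws, aggrs).entrySet()) {
            Grouped<String, String, String> g = e.getValue();
            System.out.println(e.getKey() + " -> " + g.first + " " + g.second + " " + g.third);
        }
    }
}
```

Since cogroup wraps each side's values in an Iterable, cogrouping the already-grouped RDDs from the question would give nested types such as Iterable<Iterable<TransactionInfo>>. It may therefore be simpler to cogroup the pair RDDs as they were before groupByKey(), if those are still available.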

Best Regards,
Praveen


On 01.12.2015 10:47, Shams ul Haque wrote:
Hi All,

I have made 3 RDDs from 3 different datasets. All RDDs are grouped by CustomerID: 2 RDDs have values of Iterable type and one has a single bean. All RDDs have a key of Long type as the CustomerId. Below are the models for the 3 RDDs:
JavaPairRDD<Long, Iterable<TransactionInfo>>
JavaPairRDD<Long, Iterable<TransactionRaw>>
JavaPairRDD<Long, TransactionAggr>

Now I have to merge these 3 RDDs into a single one so that I can generate an Excel report per customer using the data in all 3 RDDs. I tried using the join transformation, but it needs RDDs of the same type and it works only for two RDDs.
So my questions are:
1. Is there any way to do this merge efficiently, so that I can get all 3 datasets by CustomerId?
2. If I merge the first two using the join transformation, do I need to run groupByKey() before the join so that all data related to a single customer ends up on one node?


Thanks
Shams

