Re: merge 3 different types of RDDs in one

Praveen Chundi Tue, 01 Dec 2015 02:01:01 -0800

cogroup could be useful to you, since all three are PairRDDs keyed by CustomerId.


https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
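
To make the suggestion concrete: in Spark's Java API, JavaPairRDD.cogroup(other1, other2) accepts two other pair RDDs with the same key type and returns one record per key whose value is a Tuple3 of three Iterables, one per input, so the value types do not have to match. The block below is only a plain-Java sketch of that grouping behaviour, written without a Spark dependency so it can be run on its own. CogroupSketch, Grouped and the sample strings are hypothetical names, not part of Spark or of the original question.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of cogroup semantics (assumption: this mimics,
// but is not, Spark's implementation). All names here are hypothetical.
final class CogroupSketch {

    // One row per key: the values found for that key in each of the three inputs.
    static final class Grouped<A, B, C> {
        final List<A> first = new ArrayList<>();
        final List<B> second = new ArrayList<>();
        final List<C> third = new ArrayList<>();
    }

    // Groups three keyed collections with different value types by their shared key.
    // A key that is missing from one input simply gets an empty list for that input.
    static <K extends Comparable<K>, A, B, C> Map<K, Grouped<A, B, C>> cogroup(
            List<Map.Entry<K, A>> a,
            List<Map.Entry<K, B>> b,
            List<Map.Entry<K, C>> c) {
        Map<K, Grouped<A, B, C>> out = new TreeMap<>();
        for (Map.Entry<K, A> e : a) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<>()).first.add(e.getValue());
        }
        for (Map.Entry<K, B> e : b) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<>()).second.add(e.getValue());
        }
        for (Map.Entry<K, C> e : c) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<>()).third.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-ins for the three datasets, keyed by a Long customer id.
        List<Map.Entry<Long, String>> infos = new ArrayList<>();
        infos.add(new SimpleEntry<Long, String>(1L, "info-a"));
        infos.add(new SimpleEntry<Long, String>(1L, "info-b"));

        List<Map.Entry<Long, String>> raws = new ArrayList<>();
        raws.add(new SimpleEntry<Long, String>(2L, "raw-a"));

        List<Map.Entry<Long, Integer>> aggrs = new ArrayList<>();
        aggrs.add(new SimpleEntry<Long, Integer>(1L, 42));

        // Print everything known about each customer in one pass.
        cogroup(infos, raws, aggrs).forEach((id, g) ->
                System.out.println(id + " -> " + g.first + " " + g.second + " " + g.third));
    }
}
```

With the real API the call would look roughly like first.cogroup(second, third). Note that if the inputs are already grouped, as in the question, each Iterable in the result would itself hold an Iterable, so it may be simpler to cogroup the RDDs before calling groupByKey().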

Best Regards,
Praveen


On 01.12.2015 10:47, Shams ul Haque wrote:

Hi All,
I have made 3 RDDs from 3 different datasets. All RDDs are grouped by CustomerID; 2 of them have values of Iterable type and one has a single bean. All RDDs use an id of Long type as the CustomerId. Below are the models for the 3 RDDs:
JavaPairRDD<Long, Iterable<TransactionInfo>>
JavaPairRDD<Long, Iterable<TransactionRaw>>
JavaPairRDD<Long, TransactionAggr>
Now I have to merge all 3 RDDs into a single one so that I can generate an Excel report per customer using the data in all 3 RDDs. I tried using the join transformation, but it needs RDDs of the same type and it works for only two RDDs.
So my questions are:
1. Is there any way to do my merging task efficiently, so that I can get all 3 datasets by CustomerId?
2. If I merge the first two using the join transformation, do I need to run groupByKey() before the join so that all data related to a single customer will be on one node?
Thanks
Shams



