Re: merge 3 different types of RDDs in one

2015-12-01 Thread Shams ul Haque
Hi Jacek,

Thanks for the suggestion, I am going to try union.
And what is your opinion on the 2nd question?


Thanks
Shams

On Tue, Dec 1, 2015 at 3:23 PM, Jacek Laskowski wrote:

> Hi,
>
> Never done it before, but just yesterday I found out about the
> SparkContext.union method that could help in your case.
>
> def union[T](rdds: Seq[RDD[T]])(implicit arg0: ClassTag[T]): RDD[T]
>
>
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext
>
> Regards,
> Jacek
>
> --
> Jacek Laskowski | https://medium.com/@jaceklaskowski/ |
> http://blog.jaceklaskowski.pl
> Mastering Spark https://jaceklaskowski.gitbooks.io/mastering-apache-spark/
> Follow me at https://twitter.com/jaceklaskowski
> Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski
>
>
> On Tue, Dec 1, 2015 at 10:47 AM, Shams ul Haque wrote:
> > Hi All,
> >
> > I have made 3 RDDs from 3 different datasets. All RDDs are grouped by
> > CustomerID; 2 of them have values of Iterable type and one has a single
> > bean. All RDDs use an id of Long type as CustomerId. Below are the
> > models for the 3 RDDs:
> > JavaPairRDD
> > JavaPairRDD
> > JavaPairRDD
> >
> > Now I have to merge all 3 RDDs into a single one so that I can generate
> > an Excel report per customer using the data in the 3 RDDs.
> > I tried using the join transformation, but it needs RDDs of the same
> > type and it works on only two RDDs.
> > So my questions are:
> > 1. Is there any way to do this merge efficiently, so that I can get all
> > 3 datasets by CustomerId?
> > 2. If I merge the first two using the join transformation, do I need to
> > run groupByKey() before the join so that all data related to a single
> > customer will be on one node?
> >
> >
> > Thanks
> > Shams
>


Re: merge 3 different types of RDDs in one

2015-12-01 Thread Praveen Chundi

cogroup could be useful to you, since all three are PairRDDs.

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
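
A sketch of what cogroup does here, under stated assumptions: on JavaPairRDD, a.cogroup(b, c) takes the other two pair RDDs at once and returns, for every key present in any of the three, a Tuple3 of Iterables holding that key's values from each input. It groups by key itself, so the inputs do not need a groupByKey first (if they were already grouped, each side simply arrives as an Iterable of Iterables). Because real Spark code needs a SparkContext and the Spark jars, the block below is a plain-Java model of those semantics over in-memory (key, value) lists, not Spark code; the order, payment and profile values are hypothetical stand-ins for the three bean types, which the thread does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model (no Spark) of a three-way cogroup keyed by CustomerId.
// In Spark, a.cogroup(b, c) on JavaPairRDDs yields one Tuple3 of Iterables per key.
class CogroupModel {

    // One row per customer: all values each input had for that customer.
    public record Grouped<A, B, C>(List<A> first, List<B> second, List<C> third) {}

    public static <A, B, C> Map<Long, Grouped<A, B, C>> cogroup(
            List<Map.Entry<Long, A>> a,
            List<Map.Entry<Long, B>> b,
            List<Map.Entry<Long, C>> c) {
        // TreeMap only makes the demo output ordered; Spark promises no key order.
        Map<Long, Grouped<A, B, C>> out = new TreeMap<>();
        for (Map.Entry<Long, A> e : a) {
            out.computeIfAbsent(e.getKey(), k -> empty()).first().add(e.getValue());
        }
        for (Map.Entry<Long, B> e : b) {
            out.computeIfAbsent(e.getKey(), k -> empty()).second().add(e.getValue());
        }
        for (Map.Entry<Long, C> e : c) {
            out.computeIfAbsent(e.getKey(), k -> empty()).third().add(e.getValue());
        }
        return out;
    }

    // A key seen in only some inputs keeps empty lists for the others.
    private static <A, B, C> Grouped<A, B, C> empty() {
        return new Grouped<>(new ArrayList<>(), new ArrayList<>(), new ArrayList<>());
    }

    public static void main(String[] args) {
        // Hypothetical sample data: orders, payments and one profile per customer.
        List<Map.Entry<Long, String>> orders =
                List.of(Map.entry(1L, "order-a"), Map.entry(1L, "order-b"), Map.entry(2L, "order-c"));
        List<Map.Entry<Long, Double>> payments = List.of(Map.entry(1L, 10.0));
        List<Map.Entry<Long, String>> profiles =
                List.of(Map.entry(1L, "Alice"), Map.entry(2L, "Bob"), Map.entry(3L, "Carol"));

        Map<Long, Grouped<String, Double, String>> report = cogroup(orders, payments, profiles);
        report.forEach((id, row) -> System.out.println(id + " -> " + row));
    }
}
```

In this model customer 3, which has only a profile, still gets a row with empty order and payment lists: cogroup keeps every key from every input, unlike an inner join.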

Best Regards,
Praveen


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: merge 3 different types of RDDs in one

2015-12-01 Thread Sonal Goyal
I think you should be able to join different RDDs with the same key. Have you
tried that?
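
As a rough illustration of this point: Spark's join only requires both pair RDDs to share the key type, not the value type, so joining a JavaPairRDD<Long, V> with a JavaPairRDD<Long, W> gives a JavaPairRDD<Long, Tuple2<V, W>>, and a third dataset can be attached with a second join. The plain-Java model below (no Spark; the Pair record is a hypothetical stand-in for scala.Tuple2 and the sample data is made up) mirrors orders.join(payments).join(profiles) to show the nested value shape.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model (no Spark) of chaining two inner joins on the same Long key.
class JoinChainModel {

    // Stand-in for scala.Tuple2: the value pair produced by one join.
    public record Pair<L, R>(L left, R right) {}

    // Inner join: value types may differ, only the key type is shared.
    public static <V, W> List<Map.Entry<Long, Pair<V, W>>> join(
            List<Map.Entry<Long, V>> left, List<Map.Entry<Long, W>> right) {
        // Index the right side by key.
        Map<Long, List<W>> rightByKey = new HashMap<>();
        for (Map.Entry<Long, W> e : right) {
            rightByKey.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        // One output row per matching (left, right) combination; unmatched keys are dropped.
        List<Map.Entry<Long, Pair<V, W>>> out = new ArrayList<>();
        for (Map.Entry<Long, V> e : left) {
            for (W w : rightByKey.getOrDefault(e.getKey(), List.of())) {
                out.add(Map.entry(e.getKey(), new Pair<>(e.getValue(), w)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical inputs with different value types but the same Long key.
        List<Map.Entry<Long, String>> orders = List.of(Map.entry(1L, "order-a"), Map.entry(2L, "order-b"));
        List<Map.Entry<Long, Double>> payments = List.of(Map.entry(1L, 10.0));
        List<Map.Entry<Long, String>> profiles = List.of(Map.entry(1L, "Alice"), Map.entry(2L, "Bob"));

        // Mirrors orders.join(payments).join(profiles): values nest as ((order, payment), profile).
        List<Map.Entry<Long, Pair<Pair<String, Double>, String>>> rows =
                join(join(orders, payments), profiles);
        rows.forEach(System.out::println);
    }
}
```

Customer 2 has no payment, so the inner join drops it. If every customer must appear in the report, leftOuterJoin (which wraps the right-hand value in an Optional) or cogroup fits better.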
On Dec 1, 2015 3:30 PM, "Praveen Chundi" wrote:

> cogroup could be useful to you, since all three are PairRDDs.
>
>
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
>
> Best Regards,
> Praveen


Re: merge 3 different types of RDDs in one

2015-12-01 Thread Sushrut Ikhar
Hi,
I have used union myself in a similar case and then applied reduceByKey on it.
Union + reduceByKey can do the job of a join, but you will first have to use
map so that all values are of the same data type.
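
The union route can be sketched the same way. union needs all inputs to have the same element type, which is why the map step comes first: each value is wrapped in one common type, the three pair RDDs are unioned, and reduceByKey merges everything that shares a CustomerId. In Spark this would read roughly a.mapValues(f).union(b.mapValues(g)).union(c.mapValues(h)).reduceByKey(merge), where mapValues is a map that leaves the key untouched, and the wrapper class would have to be Serializable. The block below is a plain-Java model of those three steps (no Spark); CustomerData, its fields and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java model (no Spark) of "map to one common type, union, then reduceByKey".
class UnionReduceModel {

    // Hypothetical common value type; each input fills in only its own field.
    public record CustomerData(List<String> orders, List<Double> payments, List<String> profiles) {

        static CustomerData ofOrder(String order) {
            return new CustomerData(List.of(order), List.of(), List.of());
        }

        static CustomerData ofPayment(Double payment) {
            return new CustomerData(List.of(), List.of(payment), List.of());
        }

        static CustomerData ofProfile(String profile) {
            return new CustomerData(List.of(), List.of(), List.of(profile));
        }

        // Spark documents reduceByKey's function as associative and commutative;
        // concatenation is associative, and only element order inside a list may vary.
        CustomerData merge(CustomerData other) {
            return new CustomerData(
                    concat(orders, other.orders),
                    concat(payments, other.payments),
                    concat(profiles, other.profiles));
        }
    }

    static <T> List<T> concat(List<T> x, List<T> y) {
        List<T> out = new ArrayList<>(x);
        out.addAll(y);
        return out;
    }

    // The "map" step: wrap each value of one input in the common type.
    static <V> List<Map.Entry<Long, CustomerData>> wrap(
            List<Map.Entry<Long, V>> in, Function<V, CustomerData> toCommon) {
        List<Map.Entry<Long, CustomerData>> out = new ArrayList<>();
        for (Map.Entry<Long, V> e : in) {
            out.add(Map.entry(e.getKey(), toCommon.apply(e.getValue())));
        }
        return out;
    }

    public static Map<Long, CustomerData> unionAndReduce(
            List<Map.Entry<Long, String>> orders,
            List<Map.Entry<Long, Double>> payments,
            List<Map.Entry<Long, String>> profiles) {
        // The "union" step: one combined list of (CustomerId, CustomerData) records.
        List<Map.Entry<Long, CustomerData>> all = new ArrayList<>();
        all.addAll(wrap(orders, CustomerData::ofOrder));
        all.addAll(wrap(payments, CustomerData::ofPayment));
        all.addAll(wrap(profiles, CustomerData::ofProfile));

        // The "reduceByKey" step: merge every record that shares a CustomerId.
        Map<Long, CustomerData> byCustomer = new TreeMap<>();
        for (Map.Entry<Long, CustomerData> e : all) {
            byCustomer.merge(e.getKey(), e.getValue(), CustomerData::merge);
        }
        return byCustomer;
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        List<Map.Entry<Long, String>> orders = List.of(Map.entry(1L, "order-a"), Map.entry(2L, "order-b"));
        List<Map.Entry<Long, Double>> payments = List.of(Map.entry(1L, 10.0));
        List<Map.Entry<Long, String>> profiles = List.of(Map.entry(1L, "Alice"), Map.entry(2L, "Bob"));

        unionAndReduce(orders, payments, profiles)
                .forEach((id, data) -> System.out.println(id + " -> " + data));
    }
}
```

Compared with cogroup, this needs the extra wrapper type, but it ends with one flat value per customer, which maps directly onto one report entry per customer.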

Regards,

Sushrut Ikhar
about.me/sushrutikhar



On Tue, Dec 1, 2015 at 3:34 PM, Sonal Goyal wrote:

> I think you should be able to join different RDDs with the same key. Have you
> tried that?


Re: merge 3 different types of RDDs in one

2015-12-01 Thread Jacek Laskowski
On Tue, Dec 1, 2015 at 10:57 AM, Shams ul Haque wrote:

> Thanks for the suggestion, I am going to try union.

...and please report your findings back.

> And what is your opinion on the 2nd question?

Dunno. If you find a solution, let us know.

Jacek
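
On the second question, which the thread leaves open: join already shuffles both inputs by key, typically with a HashPartitioner, and an input that is already partitioned by the partitioner chosen for the join is not reshuffled. So all records for one customer meet in the same partition without a groupByKey beforehand. The plain-Java sketch below (no Spark; the IDs and values are made up) mimics the rule Spark's HashPartitioner uses, null keys to partition 0 and otherwise a non-negative key.hashCode() modulo the number of partitions, to show that the partition depends only on the key.

```java
import java.util.List;
import java.util.Map;

// Plain-Java model (no Spark) of hash partitioning by key.
class HashPartitionModel {

    // Mirrors the rule of Spark's HashPartitioner: null keys go to partition 0,
    // other keys to key.hashCode() modulo numPartitions, made non-negative.
    static int partitionFor(Object key, int numPartitions) {
        if (key == null) {
            return 0;
        }
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // Hypothetical records from two datasets that share CustomerId keys.
        List<Map.Entry<Long, String>> orders = List.of(Map.entry(7L, "order-a"), Map.entry(42L, "order-b"));
        List<Map.Entry<Long, String>> profiles = List.of(Map.entry(42L, "Bob"), Map.entry(7L, "Alice"));

        // The partition depends only on the key, so matching IDs from both sides coincide.
        for (Map.Entry<Long, String> e : orders) {
            System.out.println("orders   " + e + " -> partition " + partitionFor(e.getKey(), numPartitions));
        }
        for (Map.Entry<Long, String> e : profiles) {
            System.out.println("profiles " + e + " -> partition " + partitionFor(e.getKey(), numPartitions));
        }
    }
}
```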