to split an RDD to multiple ones?

2015-05-02 Thread Yifan LI
Hi,

I have an RDD, srdd, containing unordered data like this:
s1_0, s3_0, s2_1, s2_2, s3_1, s1_3, s1_2, …

What I want is one RDD per prefix (ideally each in ascending order):
srdd_s1:
s1_0, s1_1, s1_2, …, s1_n
srdd_s2:
s2_0, s2_1, s2_2, …, s2_n
srdd_s3:
s3_0, s3_1, s3_2, …, s3_n
…
…

Any ideas? Thanks in advance! :)


Best,
Yifan LI







Re: to split an RDD to multiple ones?

2015-05-02 Thread Olivier Girardot
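I guess:

// Note: sorting the strings themselves is lexicographic, so s1_10 comes before s1_2.
val srdd_s1 = srdd.filter(_.startsWith("s1_")).sortBy(x => x)
val srdd_s2 = srdd.filter(_.startsWith("s2_")).sortBy(x => x)
val srdd_s3 = srdd.filter(_.startsWith("s3_")).sortBy(x => x)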
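
If the set of prefixes is not known in advance, the same filter-and-sort idea can be generalized. The following is only a sketch, assuming every element has the form <prefix>_<integer> and that the number of distinct prefixes is small enough to collect to the driver. The names SplitByPrefix, prefixOf, indexOf and splitByPrefix are made up for illustration and are not part of Spark.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SplitByPrefix {
  // "s1_10" -> "s1" (assumes the element contains at least one '_')
  def prefixOf(s: String): String = s.substring(0, s.lastIndexOf('_'))

  // "s1_10" -> 10 (assumes the part after the last '_' is an integer)
  def indexOf(s: String): Int = s.substring(s.lastIndexOf('_') + 1).toInt

  // Returns one RDD per distinct prefix, each sorted by its numeric suffix.
  def splitByPrefix(srdd: RDD[String]): Map[String, RDD[String]] = {
    // Cache the keyed data because it is scanned once per prefix.
    val keyed = srdd.map(s => (prefixOf(s), s)).cache()
    // Assumption: few distinct prefixes, so collecting them is cheap.
    val prefixes = keyed.keys.distinct().collect()
    prefixes.map { p =>
      p -> keyed.filter(_._1 == p).values.sortBy(indexOf)
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Local master chosen only so the sketch can run on one machine.
    val conf = new SparkConf().setAppName("split-rdd").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val srdd = sc.parallelize(
      Seq("s1_0", "s3_0", "s2_1", "s2_2", "s3_1", "s1_3", "s1_2", "s1_10"))
    splitByPrefix(srdd).toSeq.sortBy(_._1).foreach { case (p, rdd) =>
      println(s"srdd_$p: " + rdd.collect().mkString(", "))
    }
    sc.stop()
  }
}
```

Sorting by the parsed integer rather than by the string keeps s1_10 after s1_3. Each resulting RDD still triggers its own pass over the cached data, so this approach is probably only reasonable for a modest number of prefixes.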

Regards,

Olivier.


Re: to split an RDD to multiple ones?

2015-05-02 Thread Yifan LI
Thanks, Olivier and Franz. :)


Best,
Yifan LI




