Re: how to print RDD by key into file with groupByKey

2015-03-16 Thread Akhil Das
If you want more partitions, then you have to specify the number explicitly:

Rdd.groupByKey(10).mapValues...

I think if you don't specify anything, the number of partitions will be the
number of cores that you have for processing.
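
Note that more partitions still does not guarantee one file per key, since
several keys can hash into the same partition. Here is a minimal plain-Scala
sketch of that assignment (a simulation, not Spark code). The object name,
helper names, and sample keys are hypothetical; it assumes the default hash
partitioning, which picks a partition from the key's hashCode modulo the
partition count.

```scala
// Plain-Scala simulation; no Spark dependency is needed to run it.
object HashPartitionSketch {
  // Hash partitioning idea: hashCode modulo the partition count,
  // shifted into the non-negative range.
  def partitionFor(key: String, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Groups keys by the partition (and therefore the output file) they land in.
  def assign(keys: Seq[String], numPartitions: Int): Map[Int, Seq[String]] =
    keys.groupBy(k => partitionFor(k, numPartitions))

  def main(args: Array[String]): Unit = {
    val keys = Seq("alpha", "beta", "gamma", "delta") // hypothetical keys
    for (n <- Seq(2, 10)) {
      println(s"numPartitions = $n")
      assign(keys, n).toSeq.sortBy(_._1).foreach { case (p, ks) =>
        println(s"  partition $p: ${ks.mkString(", ")}")
      }
    }
  }
}
```

With 2 partitions and four keys, at least two keys must share a partition, so
they end up in the same output file. With 10 partitions some partitions may
stay empty while others can still hold more than one key.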


Thanks
Best Regards

On Sat, Mar 14, 2015 at 12:28 AM, Adrian Mocanu amoc...@verticalscope.com
wrote:

  Hi

 I have an RDD: RDD[(String, scala.Iterable[(Long, Int)])] which I want to
 print into a file, a file for each key string.

 I tried to trigger a repartition of the RDD by doing group by on it. The
 grouping gives RDD[(String, scala.Iterable[Iterable[(Long, Int)]])] so I
 flattened that:

   Rdd.groupByKey().mapValues(x => x.flatten)



 However, when I print with saveAsTextFile I get only 2 files.



 I was under the impression that groupBy repartitions the data by key and
 saveAsTextFile makes a file per partition.

 What am I doing wrong here?





 Thanks

 Adrian



how to print RDD by key into file with groupByKey

2015-03-13 Thread Adrian Mocanu
Hi
I have an RDD: RDD[(String, scala.Iterable[(Long, Int)])] which I want to print 
into a file, a file for each key string.
I tried to trigger a repartition of the RDD by doing group by on it. The 
grouping gives RDD[(String, scala.Iterable[Iterable[(Long, Int)]])] so I
flattened that:
  Rdd.groupByKey().mapValues(x => x.flatten)

However, when I print with saveAsTextFile I get only 2 files.

I was under the impression that groupBy repartitions the data by key and
saveAsTextFile makes a file per partition.
What am I doing wrong here?


Thanks
Adrian