If you want more partitions, you have to specify the number explicitly, e.g.: Rdd.groupByKey(10).mapValues...
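For illustration, here is a minimal sketch of that call in a small standalone program. The object name, the `regroup` helper, the sample data, the local master setting and the output path are all assumptions made up for the example; only `groupByKey(numPartitions)`, `mapValues` and `saveAsTextFile` come from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object GroupByKeyPartitions {
  // Hypothetical helper: group by key into `numPartitions` partitions,
  // then flatten the nested Iterables produced by the grouping.
  def regroup(rdd: RDD[(String, Iterable[(Long, Int)])],
              numPartitions: Int): RDD[(String, Iterable[(Long, Int)])] =
    rdd.groupByKey(numPartitions).mapValues(_.flatten)

  def main(args: Array[String]): Unit = {
    // App name and local master are placeholders for a quick local run.
    val conf = new SparkConf().setAppName("groupByKey-partitions").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Made-up sample data with the same shape as the RDD in the question.
    val data: Seq[(String, Iterable[(Long, Int)])] = Seq(
      ("a", Iterable((1L, 1), (2L, 2))),
      ("b", Iterable((3L, 3))),
      ("a", Iterable((4L, 4)))
    )
    val rdd = sc.parallelize(data)

    val grouped = regroup(rdd, 10)

    // mapValues keeps the partitioning set by groupByKey(10).
    println(grouped.partitions.length)

    // Writes one part file per partition; the path is an assumption and
    // must not exist yet, otherwise saveAsTextFile fails.
    grouped.saveAsTextFile("/tmp/grouped-output")

    sc.stop()
  }
}
```

Note that the keys are hash-partitioned, so several keys can land in the same partition and some partitions can be empty. Asking for 10 partitions therefore gives 10 part files, not necessarily one file per key. On Spark versions before 1.3 you would likely also need `import org.apache.spark.SparkContext._` to get the pair-RDD methods.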
I think if you don't specify anything, the # partitions will be the # cores that you have for processing.

Thanks
Best Regards

On Sat, Mar 14, 2015 at 12:28 AM, Adrian Mocanu <amoc...@verticalscope.com> wrote:

> Hi
>
> I have an RDD: RDD[(String, scala.Iterable[(Long, Int)])] which I want to
> print into a file, a file for each key string.
>
> I tried to trigger a repartition of the RDD by doing group by on it. The
> grouping gives RDD[(String, scala.Iterable[Iterable[(Long, Int)]])] so I
> flattened that:
>
> Rdd.groupByKey().mapValues(x=>x.flatten)
>
> However, when I print with saveAsTextFile I get only 2 files
>
> I was under the impression that groupBy repartitions the data by key and
> saveAsTextFile make a file per partition.
>
> What am I doing wrong here?
>
> Thanks
> Adrian