To do that, first key the RDD by customerId, and then use saveAsHadoopFile like this:
We can call saveAsHadoopFile(location, classOf[KeyClass], classOf[ValueClass], classOf[PartitionOutputFormat]), where PartitionOutputFormat extends MultipleTextOutputFormat. A sample is below:

  class PartitionOutputFormat extends MultipleTextOutputFormat[Any, Any] {
    override def generateActualKey(key: Any, value: Any): Any = {
      // Add logic here if you want to derive the written key from the key and value
      // (return NullWritable.get() to drop the key from the output lines).
    }
    override def generateFileNameForKeyValue(key: Any, value: Any, basePath: String): String = {
      // Add logic to generate the file name from the key and value. Generally we
      // take basePath and append the key to it, so each key gets its own file.
    }
  }

On Wed, Sep 7, 2016 at 10:58 AM, Vikash Kumar <vikashsp...@gmail.com> wrote:
> I need to split an RDD[key, Iterable[Value]] to save each key into a
> different file.
>
> e.g. I have records like: customerId, name, age, sex
>
> 111,abc,34,M
> 122, xyz,32,F
> 111,def,31,F
> 122.trp,30,F
> 133,jkl,35,M
>
> I need to write 3 different files based on customerId
> file1:
> 111,abc,34,M
> 111,def,31,F
>
> file2:
> 122, xyz,32,F
> 122.trp,30,F
>
> file3:
> 133,jkl,35,M
>
> How can I achieve this in Spark Scala code?
>
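Putting the pieces together, a minimal end-to-end sketch might look like the following. The input/output paths, the class name PartitionByCustomerFormat, and the assumption that customerId is the first CSV column are all illustrative, not from the original post; this also uses the old mapred API that MultipleTextOutputFormat belongs to.

```scala
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Routes each record into a file named after its key (the customerId).
class PartitionByCustomerFormat extends MultipleTextOutputFormat[Any, Any] {
  // Drop the key from the written lines; only the value (the full record) is emitted.
  override def generateActualKey(key: Any, value: Any): Any =
    NullWritable.get()

  // The returned string is the file name (relative to the output directory),
  // so every distinct customerId ends up in its own file.
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.toString
}

object SplitByCustomer {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("split-by-customer"))

    // Key each line by customerId, assumed to be the first comma-separated field.
    val keyed = sc.textFile("/input/customers.csv")
      .map(line => (line.split(",")(0).trim, line))

    keyed.saveAsHadoopFile(
      "/output/by-customer",
      classOf[String],
      classOf[String],
      classOf[PartitionByCustomerFormat])

    sc.stop()
  }
}
```

With the sample data this would produce files named 111, 122, and 133 under /output/by-customer, each holding only that customer's records. Note that the key appears once per record inside the line itself, since we keep the whole line as the value.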