Re: Split RDD by key and save to different files

2016-09-07 Thread Dhaval Patel
To do that, you first need to key the RDD by the key, and then call saveAsHadoopFile like this:

    saveAsHadoopFile(location, classOf[KeyClass], classOf[ValueClass], classOf[PartitionOutputFormat])

where PartitionOutputFormat extends MultipleTextOutputFormat. Sample for
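A minimal sketch of the saveAsHadoopFile approach, assuming Spark's Scala API and Hadoop's old org.apache.hadoop.mapred output formats are on the classpath. CustomerOutputFormat, SplitByCustomer, saveByCustomer and the default output path are hypothetical names. The sketch also assumes the key is the first CSV field (customerId).

```scala
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical name: any subclass of MultipleTextOutputFormat works the same way.
class CustomerOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  // Return NullWritable so TextOutputFormat writes only the value (the original record).
  override def generateActualKey(key: Any, value: Any): Any = NullWritable.get()

  // Route each record to <outputDir>/<key>/<part-file>, i.e. one directory per customerId.
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    s"$key/$name"
}

object SplitByCustomer {
  // Key each CSV line by its first field, then let the output format choose the file.
  def saveByCustomer(lines: RDD[String], outputDir: String): Unit =
    lines
      .keyBy(_.split(",", 2)(0).trim)
      .saveAsHadoopFile(outputDir, classOf[String], classOf[String],
        classOf[CustomerOutputFormat])

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("split-by-key").setMaster("local[*]"))
    try {
      // Sample rows from the question, with typos normalised.
      val lines = sc.parallelize(Seq(
        "111,abc,34,M", "122,xyz,32,F", "111,def,31,F", "122,trp,30,F", "133,jkl,35,M"))
      // Hypothetical default path; pass a different one as the first argument.
      saveByCustomer(lines, args.headOption.getOrElse("/tmp/split-by-customer"))
    } finally {
      sc.stop()
    }
  }
}
```

Using s"$key/$name" as the file name gives one directory per customerId, with one part file for each partition that contained that key. Returning only the key could let different tasks produce the same file name, and one task's output could then overwrite another's. By default, Spark also refuses to write into an output directory that already exists.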

Split RDD by key and save to different files

2016-09-07 Thread Vikash Kumar
I need to split an RDD[(key, Iterable[Value])] so that each key is saved to a different file. For example, I have records like:

    customerId, name, age, sex
    111,abc,34,M
    122,xyz,32,F
    111,def,31,F
    122,trp,30,F
    133,jkl,35,M

I need to write 3 different files based on customerId:

    file1:
    111,abc,34,M
    111,def,31,F
    file2:
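To pin down the expected result, here is a minimal local sketch of the same split using plain Scala collections and java.nio, without Spark. It groups the sample rows by customerId and writes one file per key. The object and method names and the "customer-<id>.csv" file naming are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object LocalSplitByCustomer {
  // Sample rows from the question, with the "122.trp" and "122, xyz" typos normalised.
  val rows: Seq[String] = Seq(
    "111,abc,34,M",
    "122,xyz,32,F",
    "111,def,31,F",
    "122,trp,30,F",
    "133,jkl,35,M"
  )

  // Group rows by their first CSV field (customerId); groupBy keeps input order within a group.
  def splitByCustomer(lines: Seq[String]): Map[String, Seq[String]] =
    lines.groupBy(_.split(",", 2)(0).trim)

  // Write one file per customerId into dir, sorted by id; the file naming is hypothetical.
  def writeByCustomer(lines: Seq[String], dir: Path): Seq[Path] =
    splitByCustomer(lines).toSeq.sortBy(_._1).map { case (id, group) =>
      val file = dir.resolve(s"customer-$id.csv")
      Files.write(file, group.mkString("", "\n", "\n").getBytes(StandardCharsets.UTF_8))
    }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("split-by-customer")
    writeByCustomer(rows, dir).foreach(p => println(p.getFileName))
  }
}
```

This only works when the data fits on one machine. On an RDD, the grouping and per-key file naming have to happen inside the output format instead.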