Re: Convert RDD[Iterrable[MyCaseClass]] to RDD[MyCaseClass]

2018-12-03 Thread Gerard Maas
James,
How do you create an instance of `RDD[Iterable[MyCaseClass]]`? Is it in that first code snippet?
> new SparkContext(sc).parallelize(seq)?
kr, Gerard
On Fri, Nov 30, 2018 at 3:02 PM James Starks wrote:
> When processing data, I create an instance of RDD[Iterable[MyCaseClass]] and

Re: Convert RDD[Iterrable[MyCaseClass]] to RDD[MyCaseClass]

2018-12-03 Thread Shahab Yunus
Curious why you think this is not smart code?
On Mon, Dec 3, 2018 at 8:04 AM James Starks wrote:
> Following your advice to use flatMap, I can now convert the result from RDD[Iterable[MyCaseClass]] to RDD[MyCaseClass]. Basically, I just perform flatMap at the end, before converting the RDD

Re: Convert RDD[Iterrable[MyCaseClass]] to RDD[MyCaseClass]

2018-12-03 Thread James Starks
Following your advice to use flatMap, I can now convert the result from RDD[Iterable[MyCaseClass]] to RDD[MyCaseClass]. Basically, I just perform flatMap at the end, before converting the RDD back to a DataFrame (i.e. SparkSession.createDataFrame(rddRecordsOfMyCaseClass)). For instance, df.map {
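The round trip described in this message (DataFrame to RDD, flatten, back to DataFrame) might look roughly like the sketch below. The fields of MyCaseClass, the sample input, and the per-row expansion logic are hypothetical, since the original snippet is cut off; it also goes through df.rdd.map rather than df.map, which is an assumption made to keep everything in RDD form. It is meant to be pasted into spark-shell; in a compiled application the case class would need to be defined at top level.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type; the real MyCaseClass is not shown in the thread.
case class MyCaseClass(id: Long, token: String)

// In spark-shell this returns the session that already exists.
val spark = SparkSession.builder()
  .appName("rdd-roundtrip-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical input: one row can expand into several records.
val df: DataFrame = Seq((1L, "a b"), (2L, "c")).toDF("id", "text")

// Step 1: each row produces an Iterable of records, so the result is nested.
val nested: RDD[Iterable[MyCaseClass]] = df.rdd.map { row =>
  val id = row.getAs[Long]("id")
  val records: Iterable[MyCaseClass] =
    row.getAs[String]("text").split(" ").toList.map(t => MyCaseClass(id, t))
  records
}

// Step 2: flatten the nested RDD with flatMap, as suggested in the thread.
val flat: RDD[MyCaseClass] = nested.flatMap(_.toList)

// Step 3: convert the flat RDD of case classes back to a DataFrame.
val result: DataFrame = spark.createDataFrame(flat)
result.show()
```

If the intermediate nested RDD is not needed for anything else, steps 1 and 2 could probably be fused by calling flatMap directly on df.rdd.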

Re: Convert RDD[Iterrable[MyCaseClass]] to RDD[MyCaseClass]

2018-12-01 Thread Chris Teoh
Hi James,
Try flatMap(_.toList). See the example below:
scala> case class MyClass(i:Int)
defined class MyClass
scala> val r = 1 to 100
r: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
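Since the REPL transcript above is cut off, here is a minimal, self-contained sketch of the same flatMap(_.toList) idea, assuming a spark-shell session. The grouping into chunks of ten is an illustrative assumption, not something taken from the original message; only MyClass and the 1 to 100 range come from it.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

case class MyClass(i: Int)

// In spark-shell this returns the session that already exists.
val spark = SparkSession.builder()
  .appName("flatten-iterables-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Assumed sample data: ten groups of ten records each.
val groups: Seq[Iterable[MyClass]] =
  (1 to 100).map(MyClass(_)).grouped(10).toList

// An RDD whose elements are themselves collections.
val nested: RDD[Iterable[MyClass]] = spark.sparkContext.parallelize(groups)

// flatMap emits every element of each inner collection as its own record.
val flat: RDD[MyClass] = nested.flatMap(_.toList)

// With the implicits in scope, the flat RDD converts to a Dataset.
val ds = flat.toDS()
println(flat.count()) // 100
```

The point is that flatMap runs on the executors against data the RDD already holds, so no second SparkContext is needed to do the flattening.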

Re: Convert RDD[Iterrable[MyCaseClass]] to RDD[MyCaseClass]

2018-12-01 Thread Chris Teoh
Do you have the full code example? I think this would be similar to the mapPartitions code flow, something like flatMap(_.toList). I haven't tested this yet, but it is what I'd try first.
On Sat, 1 Dec 2018 at 01:02, James Starks wrote:
> When processing data, I create an instance

Convert RDD[Iterrable[MyCaseClass]] to RDD[MyCaseClass]

2018-11-30 Thread James Starks
When processing data, I create an instance of RDD[Iterable[MyCaseClass]] and I want to convert it to RDD[MyCaseClass] so that it can be further converted to a Dataset or DataFrame with the toDS() function. But I run into the problem that a SparkContext cannot be instantiated within SparkSession.map