James,
How do you create an instance of `RDD[Iterable[MyCaseClass]]`?
Is it in that first code snippet, `new SparkContext(sc).parallelize(seq)`?
kr, Gerard
On Fri, Nov 30, 2018 at 3:02 PM James Starks wrote:
> When processing data, I create an instance of RDD[Iterable[MyCaseClass]] and
I'm curious: why do you think this is not smart code?
On Mon, Dec 3, 2018 at 8:04 AM James Starks wrote:
> By taking your advice to use flatMap, I can now convert the result from
> RDD[Iterable[MyCaseClass]] to RDD[MyCaseClass]. Basically I just perform
> flatMap at the end before converting the RDD
By taking your advice to use flatMap, I can now convert the result from
RDD[Iterable[MyCaseClass]] to RDD[MyCaseClass]. Basically I just perform
flatMap at the end before converting the RDD object back to a DF (i.e.
SparkSession.createDataFrame(rddRecordsOfMyCaseClass)). For instance,
df.map {
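The flattening step described above can be sketched with plain Scala collections; `RDD.flatMap` behaves analogously, and `MyCaseClass` here is a stand-in for the real case class from the thread:

```scala
// Stand-in for the case class used in the thread.
case class MyCaseClass(id: Int, name: String)

object FlattenSketch {
  def main(args: Array[String]): Unit = {
    // A Seq[Iterable[MyCaseClass]] mirrors the RDD[Iterable[MyCaseClass]] shape.
    val nested: Seq[Iterable[MyCaseClass]] = Seq(
      Iterable(MyCaseClass(1, "a"), MyCaseClass(2, "b")),
      Iterable(MyCaseClass(3, "c"))
    )

    // flatMap with identity flattens one level, yielding Seq[MyCaseClass].
    // On an RDD the same call yields RDD[MyCaseClass], which can then be
    // passed to SparkSession.createDataFrame or converted with toDS().
    val flat: Seq[MyCaseClass] = nested.flatMap(identity)

    println(flat.map(_.id).mkString(",")) // prints 1,2,3
  }
}
```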
Hi James,
Try flatMap(_.toList). See the example below:
scala> case class MyClass(i:Int)
defined class MyClass
scala> val r = 1 to 100
r: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29,
Do you have the full code example?
I think this would be similar to the mapPartitions code flow, something
like flatMap(_.toList).
I haven't tested this yet, but it's what I'd try first.
On Sat, 1 Dec 2018 at 01:02, James Starks wrote:
> When processing data, I create an instance
When processing data, I create an instance of RDD[Iterable[MyCaseClass]] and I
want to convert it to RDD[MyCaseClass] so that it can be further converted to a
dataset or dataframe with the toDS() function. But I encounter a problem:
SparkContext cannot be instantiated within SparkSession.map
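The fix the thread converges on is to flatten with flatMap(_.toList) rather than touching SparkContext inside a closure. A minimal sketch using plain Scala collections (the case class and values are illustrative; on an RDD the same flatMap call produces RDD[MyCaseClass]):

```scala
// Illustrative stand-in for the thread's case class.
case class MyCaseClass(i: Int)

object ToListSketch {
  def main(args: Array[String]): Unit = {
    // Mirrors the RDD[Iterable[MyCaseClass]] shape from the question.
    val grouped: Seq[Iterable[MyCaseClass]] =
      Seq(Iterable(MyCaseClass(1)), Iterable(MyCaseClass(2), MyCaseClass(3)))

    // _.toList materialises each Iterable so flatMap can concatenate them.
    // On an RDD this yields RDD[MyCaseClass] without ever needing a
    // SparkContext inside the mapped function.
    val records: Seq[MyCaseClass] = grouped.flatMap(_.toList)

    println(records.map(_.i).sum) // prints 6
  }
}
```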