Re: Row Encoder For DataSet

2017-12-10 Thread Tomasz Dudek
> …(Row(20, 11), Row(10, 2, 11)))
>
> The first tuple is used to group the relevant records for aggregation. I
> have used the following to create the dataset:
>
> val s = StructType(Seq(
>   StructField("x", IntegerType, true),
>   StructField("y", IntegerType, true)
> ))
> val s1 = StructType(Seq(
>   StructField("u", IntegerType, true),
>   StructField("v", IntegerType, true),
>   StructField("z", IntegerType, true)
> ))
>
> val ds = sparkSession.sqlContext.createDataset(
>   sparkSession.sparkContext.parallelize(values)
> )(Encoders.tuple(RowEncoder(s), RowEncoder(s1)))
>
> Is this the correct way of representing this?
>
> How do I create a dataset and row encoder for such a use case, for doing
> groupByKey on this?
>
> Regards
> Sandeep
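For reference, one way the groupByKey step asked about above could be written. This is a sketch, assuming Spark 2.x, where `RowEncoder(schema)` yields an `ExpressionEncoder[Row]`; `ds`, `s`, and `s1` are as in the quoted message, and the aggregation (summing the "z" column) is illustrative, not taken from the original post:

```scala
import org.apache.spark.sql.{Encoders, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// groupByKey needs an Encoder for the key type; since the key is a generic
// Row, supply the RowEncoder for schema `s` explicitly.
val grouped = ds.groupByKey(_._1)(RowEncoder(s))

// mapGroups likewise needs an encoder for its result. Here the output pairs
// each key Row with a single-column Row holding the sum of "z" (index 2 in s1).
val outSchema = StructType(Seq(StructField("zsum", IntegerType, true)))
val aggregated = grouped.mapGroups { (key, rows) =>
  (key, Row(rows.map(_._2.getInt(2)).sum))
}(Encoders.tuple(RowEncoder(s), RowEncoder(outSchema)))
```

The key point is that nothing here is implicit: every transformation on a `Dataset` of generic `Row`s must be handed its encoder explicitly, which is why the case-class route is usually less painful when the schema is fixed.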

Re: Row Encoder For DataSet

2017-12-07 Thread Georg Heiler

Re: Row Encoder For DataSet

2017-12-07 Thread Sandip Mehta

Re: Row Encoder For DataSet

2017-12-07 Thread Weichen Xu

Row Encoder For DataSet

2017-12-07 Thread Sandip Mehta
…(Row(20, 11), Row(10, 2, 11)))

The first tuple is used to group the relevant records for aggregation. I have
used the following to create the dataset:

val s = StructType(Seq(
  StructField("x", IntegerType, true),
  StructField("y", IntegerType, true)
))
val s1 = StructType(Seq(
  StructField("u", IntegerType, true),
  StructField("v", IntegerType, true),
  StructField("z", IntegerType, true)
))

val ds = sparkSession.sqlContext.createDataset(
  sparkSession.sparkContext.parallelize(values)
)(Encoders.tuple(RowEncoder(s), RowEncoder(s1)))

Is this the correct way of representing this?

How do I create a dataset and row encoder for such a use case, for doing
groupByKey on this?

Regards
Sandeep
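When the column structure is known up front, a common alternative to `RowEncoder` is to model the two row shapes as case classes, for which Spark derives product encoders automatically. A sketch under that assumption — the names `Key` and `Value` are made up here, and only the one pair from the message is used as sample data:

```scala
import sparkSession.implicits._

// Hypothetical case classes mirroring schemas s and s1; the implicits import
// gives Spark encoders for them with no RowEncoder plumbing.
case class Key(x: Int, y: Int)
case class Value(u: Int, v: Int, z: Int)

val typedValues: Seq[(Key, Value)] = Seq((Key(20, 11), Value(10, 2, 11)))
val ds = typedValues.toDS()

// groupByKey now resolves its key encoder implicitly; e.g. sum z per key.
val sums = ds.groupByKey(_._1).mapValues(_._2.z).reduceGroups(_ + _)
```

The trade-off is flexibility: the `RowEncoder`/`Encoders.tuple` approach works when schemas are only known at runtime, while case classes buy implicit encoders and compile-time field access when they are not.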