Re: question about the new Dataset API

2016-10-19 Thread Yang
I even added a fake groupByKey on the entire Dataset:

scala> a_ds.groupByKey(k => 1).agg(typed.count[(Long, Long)](_._1)).show
+-----+------------------------+
|value|TypedCount(scala.Tuple2)|
+-----+------------------------+
|    1|                       2|
+-----+------------------------+
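The idea above is to map every row to one constant key so that a grouped aggregation effectively runs over the whole Dataset. A minimal sketch of the same trick with plain Scala collections (no Spark required); `countAll` is a hypothetical helper introduced here for illustration, not part of the thread:

```scala
// Hypothetical helper: aggregate an entire collection by grouping
// everything under a single constant key, mirroring
// ds.groupByKey(k => 1).agg(...) in the Dataset API.
object FakeGroupBy {
  def countAll[A](xs: Seq[A]): Long =
    xs.groupBy(_ => 1)                          // every element maps to key 1
      .map { case (_, group) => group.size.toLong } // one count per (single) group
      .headOption
      .getOrElse(0L)                            // empty input: zero groups

  def main(args: Array[String]): Unit = {
    val pairs = Seq((1L, 2L), (3L, 4L))
    println(FakeGroupBy.countAll(pairs))        // prints 2, matching the .show output
  }
}
```

The constant key collapses all rows into one group, so the grouped aggregator sees the full data, which is why the `.show` output above has a single row with `value = 1` and a count of 2.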

question about the new Dataset API

2016-10-19 Thread Yang
scala> val a = sc.parallelize(Array((1,2),(3,4)))
a: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[243] at parallelize at :38

scala> val a_ds = hc.di.createDataFrame(a).as[(Long,Long)]
a_ds: org.apache.spark.sql.Dataset[(Long, Long)] = [_1: int, _2: int]