Re: Dataset API agg question

2016-06-07 Thread Reynold Xin
Take a look at the implementation of typed sum/avg:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/expressions/scalalang/typed.scala
You can implement a typed max/min the same way.

On Tue, Jun 7, 2016 at 4:31 PM, Alexander Pivovarov wrote: >
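Following Reynold's pointer, a typed max could be modeled on `typed.sum`/`typed.avg` from the linked file. This is a minimal sketch, assuming the Spark 2.x `Aggregator` API; the `TypedMax` name and the `f: IN => Long` extractor are hypothetical, not part of Spark:

```scala
import org.apache.spark.sql.{Encoder, Encoders, TypedColumn}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical typed max, shaped like typed.sum in scalalang/typed.scala.
// IN is the input record type; f extracts the Long value to maximize.
class TypedMax[IN](f: IN => Long) extends Aggregator[IN, Long, Long] {
  def zero: Long = Long.MinValue                         // identity element for max
  def reduce(b: Long, a: IN): Long = math.max(b, f(a))   // fold one record into the buffer
  def merge(b1: Long, b2: Long): Long = math.max(b1, b2) // combine partial buffers
  def finish(b: Long): Long = b
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

object TypedMax {
  // Convenience factory mirroring typed.sum's shape.
  def max[IN](f: IN => Long): TypedColumn[IN, Long] = new TypedMax(f).toColumn
}
```

It would then plug into the typed aggregation from the original question, e.g. `ds.groupByKey(_._1).agg(TypedMax.max[(Int, Int)](_._2.toLong))`.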

Re: Dataset API agg question

2016-06-07 Thread Alexander Pivovarov
Ted, It does not work like that. You have to .map(toAB).toDS

On Tue, Jun 7, 2016 at 4:07 PM, Ted Yu wrote:
> Have you tried the following ?
>
> Seq(1->2, 1->5, 3->6).toDS("a", "b")
>
> then you can refer to columns by name.
>
> FYI
>
> On Tue, Jun 7, 2016 at 3:58 PM,
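Alexander's point is that `toDS` takes no column-name arguments; to get named columns in a typed Dataset you map the tuples through a case class first. A sketch of that approach, assuming Spark 2.x and a local session (the `AB` case class and `spark` setup are illustrative, not from the thread):

```scala
import org.apache.spark.sql.SparkSession

case class AB(a: Int, b: Int) // hypothetical record type; its fields become the column names

val spark = SparkSession.builder.master("local[*]").appName("example").getOrCreate()
import spark.implicits._

// Seq(...).toDS() alone yields Dataset[(Int, Int)] with columns _1/_2;
// mapping through the case class is what names them a/b.
val ds = Seq(1 -> 2, 1 -> 5, 3 -> 6).map { case (a, b) => AB(a, b) }.toDS()
ds.printSchema() // schema shows fields a and b
```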

Re: Dataset API agg question

2016-06-07 Thread Ted Yu
Have you tried the following ?

Seq(1->2, 1->5, 3->6).toDS("a", "b")

then you can refer to columns by name.

FYI

On Tue, Jun 7, 2016 at 3:58 PM, Alexander Pivovarov wrote:
> I'm trying to switch from RDD API to Dataset API
> My question is about reduceByKey method
>

Dataset API agg question

2016-06-07 Thread Alexander Pivovarov
I'm trying to switch from the RDD API to the Dataset API. My question is about the reduceByKey method.

e.g. in the following example I'm trying to rewrite

sc.parallelize(Seq(1->2, 1->5, 3->6)).reduceByKey(math.max).take(10)

using the DS API. That is what I have so far: Seq(1->2, 1->5,
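For reference, the usual Dataset-API counterpart of `reduceByKey` is `groupByKey` followed by `reduceGroups`. A minimal sketch, assuming Spark 2.x with implicits in scope (variable names are illustrative):

```scala
// Equivalent of rdd.reduceByKey(math.max): group the pairs by their first
// element, then keep the pair with the larger second element in each group.
val ds = Seq(1 -> 2, 1 -> 5, 3 -> 6).toDS()

val maxPerKey = ds
  .groupByKey(_._1)                                  // KeyValueGroupedDataset[Int, (Int, Int)]
  .reduceGroups((x, y) => if (x._2 >= y._2) x else y) // Dataset[(Int, (Int, Int))]
  .map { case (_, pair) => pair }                     // drop the duplicated key

// maxPerKey.collect() should contain (1,5) and (3,6), order not guaranteed
```

This avoids writing a custom Aggregator at the cost of carrying the whole record through the reduction; the typed max/min suggested later in the thread is the alternative when only the aggregated value is needed.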