In Spark 2.0 we are planning to combine DataFrame and Dataset so that all the methods will be available on either class.
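In the meantime, the same grouping logic can be expressed with plain Scala collections, which is often handy for checking expected results locally before running on a cluster. This is only a sketch of the grouping semantics (no Spark involved); the `counts` name is just for illustration:

```scala
case class LogRow(id: String, location: String, time: Long)
case class KeyValue(key: (String, String), value: Long)

val log = List(
  LogRow("1", "a", 1), LogRow("1", "a", 2), LogRow("1", "b", 3),
  LogRow("1", "a", 4), LogRow("1", "b", 5), LogRow("1", "b", 6),
  LogRow("1", "c", 7), LogRow("2", "a", 1), LogRow("2", "b", 2),
  LogRow("2", "b", 3), LogRow("2", "a", 4), LogRow("2", "a", 5),
  LogRow("2", "a", 6), LogRow("2", "c", 7))

// Group by (id, location) and count the rows in each group,
// producing the target case class directly -- no column renaming needed.
val counts: Seq[KeyValue] =
  log.groupBy(l => (l.id, l.location))
     .map { case (key, rows) => KeyValue(key, rows.size.toLong) }
     .toSeq
```

This mirrors what the Dataset version computes: six keys, with counts 3, 3, 1, 4, 2, 1.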
On Tue, Jan 19, 2016 at 3:42 AM, Milad khajavi <khaj...@gmail.com> wrote:
> Hi Spark users,
>
> When I want to map the result of count() on a groupBy, I have to convert
> the result to a DataFrame, rename the columns, and then map the result to
> a new case class. Why doesn't the Spark Dataset API offer this directly?
>
> case class LogRow(id: String, location: String, time: Long)
> case class KeyValue(key: (String, String), value: Long)
>
> val log = LogRow("1", "a", 1) :: LogRow("1", "a", 2) :: LogRow("1", "b", 3) ::
>   LogRow("1", "a", 4) :: LogRow("1", "b", 5) :: LogRow("1", "b", 6) ::
>   LogRow("1", "c", 7) :: LogRow("2", "a", 1) :: LogRow("2", "b", 2) ::
>   LogRow("2", "b", 3) :: LogRow("2", "a", 4) :: LogRow("2", "a", 5) ::
>   LogRow("2", "a", 6) :: LogRow("2", "c", 7) :: Nil
>
> log.toDS().groupBy(l => (l.id, l.location)).count()
>   .toDF("key", "value").as[KeyValue].show
>
> +-----+-----+
> |  key|value|
> +-----+-----+
> |[1,a]|    3|
> |[1,b]|    3|
> |[1,c]|    1|
> |[2,a]|    4|
> |[2,b]|    2|
> |[2,c]|    1|
> +-----+-----+
>
> --
> Milād Khājavi
> http://blog.khajavi.ir
> Having the source means you can do it yourself.
> I tried to change the world, but I couldn't find the source code.