Hi Spark users, when I want to map the result of count on groupBy, I need to convert the result to a DataFrame, rename the columns, and then map the result back into a new case class. Why doesn't the Spark Dataset API have direct functionality for this?
case class LogRow(id: String, location: String, time: Long)
case class KeyValue(key: (String, String), value: Long)

val log =
  LogRow("1", "a", 1) :: LogRow("1", "a", 2) :: LogRow("1", "b", 3) ::
  LogRow("1", "a", 4) :: LogRow("1", "b", 5) :: LogRow("1", "b", 6) ::
  LogRow("1", "c", 7) :: LogRow("2", "a", 1) :: LogRow("2", "b", 2) ::
  LogRow("2", "b", 3) :: LogRow("2", "a", 4) :: LogRow("2", "a", 5) ::
  LogRow("2", "a", 6) :: LogRow("2", "c", 7) :: Nil

log.toDS()
  .groupBy(l => (l.id, l.location))
  .count()
  .toDF()
  .toDF("key", "value")
  .as[KeyValue]
  .show

+-----+-----+
|  key|value|
+-----+-----+
|[1,a]|    3|
|[1,b]|    3|
|[1,c]|    1|
|[2,a]|    4|
|[2,b]|    2|
|[2,c]|    1|
+-----+-----+

--
Milād Khājavi
http://blog.khajavi.ir

Having the source means you can do it yourself.
I tried to change the world, but I couldn’t find the source code.
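For what it's worth, a sketch of one way to skip the DataFrame round-trip, assuming the typed `groupByKey` API (the counterpart of the functional `groupBy` above in Spark 2.x): `KeyValueGroupedDataset.count()` already yields a `Dataset[(K, Long)]`, so the pairs can be mapped straight into the case class.

```scala
// Sketch, assuming a SparkSession named `spark` and the LogRow/KeyValue
// case classes from the post are in scope.
import spark.implicits._

val counts: Dataset[KeyValue] =
  log.toDS()
    .groupByKey(l => (l.id, l.location)) // KeyValueGroupedDataset[(String, String), LogRow]
    .count()                             // Dataset[((String, String), Long)] -- no toDF needed
    .map { case (key, value) => KeyValue(key, value) }

counts.show()
```

The column renaming disappears because the tuple produced by `count()` is destructured directly into `KeyValue`; the only cost is that the grouping key and count still arrive as a pair rather than as named columns.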