Hi Spark users,

when I want to map the result of count() after a groupBy, I have to convert
the result to a DataFrame, rename the columns, and then map the result back
to a new case class. Why doesn't the Spark Dataset API offer this
functionality directly?

case class LogRow(id: String, location: String, time: Long)
case class KeyValue(key: (String, String), value: Long)

val log = LogRow("1", "a", 1) :: LogRow("1", "a", 2) :: LogRow("1", "b", 3)
:: LogRow("1", "a", 4) :: LogRow("1", "b", 5) :: LogRow("1", "b", 6) ::
LogRow("1", "c", 7) :: LogRow("2", "a", 1) :: LogRow("2", "b", 2) ::
LogRow("2", "b", 3) :: LogRow("2", "a", 4) :: LogRow("2", "a", 5) ::
LogRow("2", "a", 6) :: LogRow("2", "c", 7) :: Nil
log.toDS()
  .groupBy(l => (l.id, l.location))
  .count()
  .toDF("key", "value")
  .as[KeyValue]
  .show

+-----+-----+
|  key|value|
+-----+-----+
|[1,a]|    3|
|[1,b]|    3|
|[1,c]|    1|
|[2,a]|    4|
|[2,b]|    2|
|[2,c]|    1|
+-----+-----+
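For what it's worth, the typed result of count() is already a
Dataset[((String, String), Long)], so one workaround is to map it straight
into the case class and skip the DataFrame round trip. A minimal sketch,
assuming Spark 2.x (where Dataset.groupBy(func) was renamed groupByKey) and
a local SparkSession:

```scala
import org.apache.spark.sql.SparkSession

object GroupCountDemo extends App {
  case class LogRow(id: String, location: String, time: Long)
  case class KeyValue(key: (String, String), value: Long)

  // Sketch only: assumes a local Spark 2.x session is acceptable here.
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("group-count-demo")
    .getOrCreate()
  import spark.implicits._

  val log = Seq(
    LogRow("1", "a", 1), LogRow("1", "a", 2), LogRow("1", "b", 3),
    LogRow("1", "a", 4), LogRow("1", "b", 5), LogRow("1", "b", 6),
    LogRow("1", "c", 7), LogRow("2", "a", 1), LogRow("2", "b", 2),
    LogRow("2", "b", 3), LogRow("2", "a", 4), LogRow("2", "a", 5),
    LogRow("2", "a", 6), LogRow("2", "c", 7))

  // groupByKey(...).count() yields Dataset[((String, String), Long)],
  // which maps directly into KeyValue without toDF/column renaming.
  log.toDS()
    .groupByKey(l => (l.id, l.location))
    .count()
    .map { case (key, value) => KeyValue(key, value) }
    .show()

  spark.stop()
}
```

The map still pays the deserialize/serialize cost of a typed transformation,
so it is a convenience rather than a performance win over the DataFrame
rename.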


-- 
Milād Khājavi
http://blog.khajavi.ir
Having the source means you can do it yourself.
I tried to change the world, but I couldn’t find the source code.
