Re: Spark Dataset doesn't have api for changing columns
How can I request this API? See this closed issue: https://issues.apache.org/jira/browse/SPARK-12863

On Tue, Jan 19, 2016 at 10:12 PM, Michael Armbrust wrote:
> In Spark 2.0 we are planning to combine DataFrame and Dataset so that all
> the methods will be available on either class.

--
Milād Khājavi
http://blog.khajavi.ir
Having the source means you can do it yourself.
I tried to change the world, but I couldn’t find the source code.
Re: Spark Dataset doesn't have api for changing columns
In Spark 2.0 we are planning to combine DataFrame and Dataset so that all the methods will be available on either class.

On Tue, Jan 19, 2016 at 3:42 AM, Milad khajavi wrote:
> Hi Spark users,
>
> When I want to map the result of count on groupBy, I need to convert the
> result to a DataFrame, then change the column names and map the result to a
> new case class. Why doesn't the Spark Dataset API have direct functionality
> for this?
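The unified API mentioned in the reply above can be sketched roughly as follows. This is an illustration rather than code from the thread. It assumes Spark 2.x, where the function-based `groupBy` on a Dataset became `groupByKey` and `SparkSession` replaced `SQLContext` as the entry point. The object name `GroupCountSketch` and the helper `countByIdAndLocation` are made up for this example; the case classes are the ones from the original question.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Same case classes as in the original question.
case class LogRow(id: String, location: String, time: Long)
case class KeyValue(key: (String, String), value: Long)

// Hypothetical wrapper object, used only for this sketch.
object GroupCountSketch {

  // Counts rows per (id, location) and returns a typed Dataset[KeyValue]
  // without going through toDF() and renaming columns.
  def countByIdAndLocation(spark: SparkSession, rows: Seq[LogRow]): Dataset[KeyValue] = {
    import spark.implicits._ // encoders for case classes and tuples

    rows.toDS()
      .groupByKey(l => (l.id, l.location))               // typed grouping (Spark 2.x)
      .count()                                           // Dataset[((String, String), Long)]
      .map { case (key, value) => KeyValue(key, value) } // field names come from KeyValue
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption: no cluster needed).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("group-count-sketch")
      .getOrCreate()

    val log = Seq(
      LogRow("1", "a", 1), LogRow("1", "a", 2), LogRow("1", "b", 3),
      LogRow("1", "a", 4), LogRow("1", "b", 5), LogRow("1", "b", 6),
      LogRow("1", "c", 7), LogRow("2", "a", 1), LogRow("2", "b", 2),
      LogRow("2", "b", 3), LogRow("2", "a", 4), LogRow("2", "a", 5),
      LogRow("2", "a", 6), LogRow("2", "c", 7)
    )

    countByIdAndLocation(spark, log).show()
    spark.stop()
  }
}
```

Mapping into the case class sidesteps column renaming, because the resulting columns take their names from the fields of `KeyValue`. The `.toDF("key", "value").as[KeyValue]` chain from the question should also keep working in Spark 2.x, since a DataFrame there is a `Dataset[Row]`.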
Spark Dataset doesn't have api for changing columns
Hi Spark users,

When I want to map the result of count on groupBy, I need to convert the result to a DataFrame, then change the column names and map the result to a new case class. Why doesn't the Spark Dataset API have direct functionality for this?

    // Spark 1.6; assumes spark-shell, where sqlContext.implicits._ is already in scope
    case class LogRow(id: String, location: String, time: Long)
    case class KeyValue(key: (String, String), value: Long)

    val log = LogRow("1", "a", 1) :: LogRow("1", "a", 2) :: LogRow("1", "b", 3) ::
      LogRow("1", "a", 4) :: LogRow("1", "b", 5) :: LogRow("1", "b", 6) ::
      LogRow("1", "c", 7) :: LogRow("2", "a", 1) :: LogRow("2", "b", 2) ::
      LogRow("2", "b", 3) :: LogRow("2", "a", 4) :: LogRow("2", "a", 5) ::
      LogRow("2", "a", 6) :: LogRow("2", "c", 7) :: Nil

    // Group by (id, location), count, then rename the columns via a DataFrame round trip
    log.toDS().groupBy(l => {
      (l.id, l.location)
    }).count().toDF().toDF("key", "value").as[KeyValue].show

    +-----+-----+
    |  key|value|
    +-----+-----+
    |[1,a]|    3|
    |[1,b]|    3|
    |[1,c]|    1|
    |[2,a]|    4|
    |[2,b]|    2|
    |[2,c]|    1|
    +-----+-----+

--
Milād Khājavi
http://blog.khajavi.ir