In Spark 2.0 we are planning to combine DataFrame and Dataset so that all the methods will be available on either class.
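In the meantime, the same grouping logic can be expressed with plain Scala collections, which is often handy for checking expected results locally before running on a cluster. This is only a sketch of the grouping semantics (no Spark involved); the `counts` name is just for illustration:

```scala
case class LogRow(id: String, location: String, time: Long)
case class KeyValue(key: (String, String), value: Long)

val log = List(
  LogRow("1", "a", 1), LogRow("1", "a", 2), LogRow("1", "b", 3),
  LogRow("1", "a", 4), LogRow("1", "b", 5), LogRow("1", "b", 6),
  LogRow("1", "c", 7), LogRow("2", "a", 1), LogRow("2", "b", 2),
  LogRow("2", "b", 3), LogRow("2", "a", 4), LogRow("2", "a", 5),
  LogRow("2", "a", 6), LogRow("2", "c", 7))

// Group by (id, location) and count the rows in each group,
// producing the target case class directly -- no column renaming needed.
val counts: Seq[KeyValue] =
  log.groupBy(l => (l.id, l.location))
     .map { case (key, rows) => KeyValue(key, rows.size.toLong) }
     .toSeq
```

This mirrors what the Dataset version computes: six keys, with counts 3, 3, 1, 4, 2, 1.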
On Tue, Jan 19, 2016 at 3:42 AM, Milad khajavi <khaj...@gmail.com> wrote:
> Hi Spark users,
>
> When I want to map the result of count() on a groupBy, I have to convert
> the result to a DataFrame, rename the columns, and then map the result to
> a new case class. Why doesn't the Spark Dataset API offer this directly?
>
> case class LogRow(id: String, location: String, time: Long)
> case class KeyValue(key: (String, String), value: Long)
>
> val log = LogRow("1", "a", 1) :: LogRow("1", "a", 2) :: LogRow("1", "b", 3) ::
>   LogRow("1", "a", 4) :: LogRow("1", "b", 5) :: LogRow("1", "b", 6) ::
>   LogRow("1", "c", 7) :: LogRow("2", "a", 1) :: LogRow("2", "b", 2) ::
>   LogRow("2", "b", 3) :: LogRow("2", "a", 4) :: LogRow("2", "a", 5) ::
>   LogRow("2", "a", 6) :: LogRow("2", "c", 7) :: Nil
>
> log.toDS().groupBy(l => (l.id, l.location)).count()
>   .toDF("key", "value").as[KeyValue].show
>
> +-----+-----+
> |  key|value|
> +-----+-----+
> |[1,a]|    3|
> |[1,b]|    3|
> |[1,c]|    1|
> |[2,a]|    4|
> |[2,b]|    2|
> |[2,c]|    1|
> +-----+-----+
>
> --
> Milād Khājavi
> http://blog.khajavi.ir
> Having the source means you can do it yourself.
> I tried to change the world, but I couldn't find the source code.