[jira] [Commented] (SPARK-18393) DataFrame pivot output column names should respect aliases

Hyukjin Kwon (JIRA) Fri, 18 Nov 2016 01:41:16 -0800

    [ https://issues.apache.org/jira/browse/SPARK-18393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15676315#comment-15676315 ]


Hyukjin Kwon commented on SPARK-18393:
--------------------------------------

Hi [~ekhliang], it seems this was fixed in SPARK-17458:

{code}
scala> val df = spark.range(100).selectExpr("id % 5 as x", "id % 2 as a", "id as b")
df: org.apache.spark.sql.DataFrame = [x: bigint, a: bigint ... 1 more field]

scala> df.groupBy('x).pivot("a", Seq(0, 1)).agg(expr("sum(b)").as("blah"), expr("count(b)").as("foo")).show()
+---+------+-----+------+-----+
|  x|0_blah|0_foo|1_blah|1_foo|
+---+------+-----+------+-----+
|  0|   450|   10|   500|   10|
|  1|   510|   10|   460|   10|
|  3|   530|   10|   480|   10|
|  2|   470|   10|   520|   10|
|  4|   490|   10|   540|   10|
+---+------+-----+------+-----+
{code}
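
For builds that predate that fix, one possible workaround is to rename the columns after the pivot. The following is only a minimal sketch: `PivotNames.stripToAlias` is a hypothetical helper, not part of Spark, and it assumes the column names follow the "<pivot value>_<expression> AS `<alias>`" pattern shown in the issue description, with no underscore inside the pivot value itself.

```scala
// Hypothetical helper, not part of Spark.
// Maps a pivot column name such as "0_sum(`b`) AS `blah`" to "0_blah".
// Names that do not match the pattern (e.g. the grouping column "x")
// are returned unchanged.
object PivotNames {
  // Group 1: pivot value (assumed to contain no underscore).
  // Group 2: the alias between the trailing backticks.
  private val Aliased = """^(.*?)_.* AS `([^`]+)`$""".r

  def stripToAlias(name: String): String = name match {
    case Aliased(pivotValue, alias) => s"${pivotValue}_$alias"
    case other                      => other
  }
}
```

With a pivoted DataFrame `pivoted`, the renaming could then be applied as `pivoted.toDF(pivoted.columns.map(PivotNames.stripToAlias): _*)`.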

> DataFrame pivot output column names should respect aliases
> ----------------------------------------------------------
>
>                 Key: SPARK-18393
>                 URL: https://issues.apache.org/jira/browse/SPARK-18393
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Eric Liang
>            Priority: Minor
>
> For example
> {code}
> val df = spark.range(100).selectExpr("id % 5 as x", "id % 2 as a", "id as b")
> df
>   .groupBy('x)
>   .pivot("a", Seq(0, 1))
>   .agg(expr("sum(b)").as("blah"), expr("count(b)").as("foo"))
>   .show()
> +---+--------------------+---------------------+--------------------+---------------------+
> |  x|0_sum(`b`) AS `blah`|0_count(`b`) AS `foo`|1_sum(`b`) AS `blah`|1_count(`b`) AS `foo`|
> +---+--------------------+---------------------+--------------------+---------------------+
> |  0|                 450|                   10|                 500|                   10|
> |  1|                 510|                   10|                 460|                   10|
> |  3|                 530|                   10|                 480|                   10|
> |  2|                 470|                   10|                 520|                   10|
> |  4|                 490|                   10|                 540|                   10|
> +---+--------------------+---------------------+--------------------+---------------------+
> {code}
> The column names here are quite hard to read. Ideally we would respect the 
> aliases and generate column names like 0_blah, 0_foo, 1_blah, 1_foo instead.



