[ https://issues.apache.org/jira/browse/SPARK-18393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-18393.
-------------------------------
    Resolution: Duplicate

> DataFrame pivot output column names should respect aliases
> ----------------------------------------------------------
>
>                 Key: SPARK-18393
>                 URL: https://issues.apache.org/jira/browse/SPARK-18393
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Eric Liang
>            Priority: Minor
>
> For example
> {code}
> val df = spark.range(100).selectExpr("id % 5 as x", "id % 2 as a", "id as b")
> df
>   .groupBy('x)
>   .pivot("a", Seq(0, 1))
>   .agg(expr("sum(b)").as("blah"), expr("count(b)").as("foo"))
>   .show()
> +---+--------------------+---------------------+--------------------+---------------------+
> |  x|0_sum(`b`) AS `blah`|0_count(`b`) AS `foo`|1_sum(`b`) AS `blah`|1_count(`b`) AS `foo`|
> +---+--------------------+---------------------+--------------------+---------------------+
> |  0|                 450|                   10|                 500|                   10|
> |  1|                 510|                   10|                 460|                   10|
> |  3|                 530|                   10|                 480|                   10|
> |  2|                 470|                   10|                 520|                   10|
> |  4|                 490|                   10|                 540|                   10|
> +---+--------------------+---------------------+--------------------+---------------------+
> {code}
> The column names here are quite hard to read. Ideally we would respect the
> aliases and generate column names like 0_blah, 0_foo, 1_blah, 1_foo instead.
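
On Spark versions that still produce the long names shown in the issue, one possible workaround is to rename the pivoted columns after the fact. The sketch below is only an illustration, not the fix that went into Spark: the object name PivotNames and the function cleanPivotName are hypothetical, and it assumes every generated name has the exact shape shown in the output above (pivot value, underscore, aggregate expression, then AS and the alias in backticks) and that the pivot value itself contains no underscore.

```scala
object PivotNames {
  // Hypothetical helper, not part of Spark.
  // Matches generated names such as "0_sum(`b`) AS `blah`":
  //   group 1 = pivot value (text before the first underscore)
  //   group 2 = the alias between the trailing backticks
  private val Aliased = """^(.*?)_.* AS `([^`]+)`$""".r

  // Returns "<pivotValue>_<alias>" for names of the shape above and
  // leaves every other name (for example the grouping column "x") unchanged.
  def cleanPivotName(name: String): String = name match {
    case Aliased(pivotValue, alias) => s"${pivotValue}_$alias"
    case other                      => other
  }
}

// Applied to the column names from the issue:
val original = Seq(
  "x",
  "0_sum(`b`) AS `blah`", "0_count(`b`) AS `foo`",
  "1_sum(`b`) AS `blah`", "1_count(`b`) AS `foo`")
val cleaned = original.map(PivotNames.cleanPivotName)
// cleaned == Seq("x", "0_blah", "0_foo", "1_blah", "1_foo")

// With a live SparkSession, the same mapping could be applied to the
// pivoted DataFrame (here called `pivoted`, an assumed name):
//   val renamed = pivoted.toDF(pivoted.columns.map(PivotNames.cleanPivotName): _*)
```

Keeping the renaming as a plain String => String function means it can be checked without a cluster, and toDF with the mapped names then renames all columns in one pass.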