[ 
https://issues.apache.org/jira/browse/SPARK-15696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15696:
------------------------------------

    Assignee:     (was: Apache Spark)

> Improve `crosstab` to have a consistent column order 
> -----------------------------------------------------
>
>                 Key: SPARK-15696
>                 URL: https://issues.apache.org/jira/browse/SPARK-15696
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Dongjoon Hyun
>
> Currently, `crosstab` returns its columns in **random order** because they
> are obtained with a plain `distinct`. The documentation of `crosstab`,
> however, shows the result in sorted order, which differs from the
> implementation.
> {code}
> scala> spark.createDataFrame(Seq((1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3))).toDF("key", "value").stat.crosstab("key", "value").show()
> +---------+---+---+---+
> |key_value|  3|  2|  1|
> +---------+---+---+---+
> |        2|  1|  0|  2|
> |        1|  0|  1|  1|
> |        3|  1|  1|  0|
> +---------+---+---+---+
> scala> spark.createDataFrame(Seq((1, "a"), (1, "b"), (2, "a"), (2, "a"), (2, "c"), (3, "b"), (3, "c"))).toDF("key", "value").stat.crosstab("key", "value").show()
> +---------+---+---+---+
> |key_value|  c|  a|  b|
> +---------+---+---+---+
> |        2|  1|  2|  0|
> |        1|  0|  1|  1|
> |        3|  1|  0|  1|
> +---------+---+---+---+
> {code}
> This issue explicitly constructs the columns in sorted order to improve the
> user experience. With this change, the output also matches the
> documentation.
> {code}
> scala> spark.createDataFrame(Seq((1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3, 2), (3, 3))).toDF("key", "value").stat.crosstab("key", "value").show()
> +---------+---+---+---+
> |key_value|  1|  2|  3|
> +---------+---+---+---+
> |        2|  2|  0|  1|
> |        1|  1|  1|  0|
> |        3|  0|  1|  1|
> +---------+---+---+---+
> scala> spark.createDataFrame(Seq((1, "a"), (1, "b"), (2, "a"), (2, "a"), (2, "c"), (3, "b"), (3, "c"))).toDF("key", "value").stat.crosstab("key", "value").show()
> +---------+---+---+---+
> |key_value|  a|  b|  c|
> +---------+---+---+---+
> |        2|  2|  0|  1|
> |        1|  1|  1|  0|
> |        3|  0|  1|  1|
> +---------+---+---+---+
> {code}
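
The idea of building the columns in sorted order, rather than taking them as `distinct` returns them, can be illustrated without Spark. The following is only a minimal sketch on plain Scala collections, not the actual change made for this issue: the object `SortedCrosstab` and its `crosstab` helper are hypothetical names introduced for illustration, and the value column is assumed to hold strings.

```scala
// Hypothetical helper, NOT Spark's implementation: a plain-collections
// crosstab whose columns are emitted in sorted order.
object SortedCrosstab {

  /** Returns (header, rows); each row is the key followed by one count per column. */
  def crosstab(pairs: Seq[(Int, String)]): (Seq[String], Seq[Seq[String]]) = {
    // Count how often each (key, value) pair occurs.
    val counts: Map[(Int, String), Int] =
      pairs.groupBy(identity).map { case (pair, group) => pair -> group.size }

    // Sorting the distinct values is what makes the column order deterministic.
    val columns: Seq[String] = pairs.map(_._2).distinct.sorted

    // Row order is left as first appearance; the issue only concerns columns.
    val keys: Seq[Int] = pairs.map(_._1).distinct

    val header = "key_value" +: columns
    val rows = keys.map { k =>
      k.toString +: columns.map(c => counts.getOrElse((k, c), 0).toString)
    }
    (header, rows)
  }

  def main(args: Array[String]): Unit = {
    // Same data as the string example in the issue description.
    val data = Seq((1, "a"), (1, "b"), (2, "a"), (2, "a"), (2, "c"), (3, "b"), (3, "c"))
    val (header, rows) = crosstab(data)
    println(header.mkString(" | "))
    rows.foreach(r => println(r.mkString(" | ")))
  }
}
```

Because the values here are strings, `sorted` orders them lexicographically, so numeric-looking values such as "10" and "2" would not come out in numeric order in this sketch.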


