Reynold Xin created SPARK-7982:
----------------------------------
Summary: crosstab should use 0 instead of null for pairs that don't appear
Key: SPARK-7982
URL: https://issues.apache.org/jira/browse/SPARK-7982
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin
Assignee: Burak Yavuz
See the following PySpark session:
{code}
In [79]: sqlContext.range(0, 10).stat.crosstab('id', 'id').show()
+-----+----+----+----+----+----+----+----+----+----+----+
|id_id|   0|   5|   1|   6|   9|   2|   7|   3|   8|   4|
+-----+----+----+----+----+----+----+----+----+----+----+
|    0|   1|null|null|null|null|null|null|null|null|null|
|    5|null|null|null|null|null|   1|null|null|null|null|
|    1|null|   1|null|null|null|null|null|null|null|null|
|    6|null|null|null|null|null|null|   1|null|null|null|
|    9|null|null|null|null|null|null|null|null|null|   1|
|    2|null|null|   1|null|null|null|null|null|null|null|
|    7|null|null|null|null|null|null|null|   1|null|null|
|    3|null|null|null|   1|null|null|null|null|null|null|
|    8|null|null|null|null|null|null|null|null|   1|null|
|    4|null|null|null|null|   1|null|null|null|null|null|
+-----+----+----+----+----+----+----+----+----+----+----+
{code}
I think we should use 0 instead of null for the cells of pairs that never occur together.
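
The proposed behavior can be illustrated with a minimal pure-Python sketch. This is not Spark's implementation: the `crosstab` helper below is hypothetical and only shows the intended semantics, where a pair that never occurs gets a count of 0 rather than a missing value. It relies on `collections.Counter`, which returns 0 for keys it has not seen.

```python
from collections import Counter


def crosstab(rows, col1, col2):
    """Hypothetical helper: build a contingency table as nested dicts.

    Pairs that never occur are reported as 0, not None.
    """
    # Count how often each (col1 value, col2 value) pair occurs.
    counts = Counter((row[col1], row[col2]) for row in rows)
    row_keys = sorted({a for a, _ in counts})
    col_keys = sorted({b for _, b in counts})
    # Counter returns 0 for unseen pairs, so absent cells become 0.
    return {a: {b: counts[(a, b)] for b in col_keys} for a in row_keys}


# Same shape of input as range(0, 10) crossed with itself on 'id'.
rows = [{"id": i} for i in range(10)]
table = crosstab(rows, "id", "id")
```

With this sketch, `table[0][0]` is 1 and every off-diagonal cell such as `table[0][5]` is 0, so downstream arithmetic on the table needs no null handling.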