[ https://issues.apache.org/jira/browse/SPARK-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566680#comment-14566680 ]
Burak Yavuz commented on SPARK-7982:
------------------------------------

The reason we used nulls instead of 0L was to decrease storage. Making them 0 isn't hard at all though; it should be a two-line change.

> crosstab should use 0 instead of null for pairs that don't appear
> -----------------------------------------------------------------
>
>                 Key: SPARK-7982
>                 URL: https://issues.apache.org/jira/browse/SPARK-7982
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>              Labels: starter
>
> See the following
> {code}
> In [79]: sqlContext.range(0, 10).stat.crosstab('id', 'id').show()
> +-----+----+----+----+----+----+----+----+----+----+----+
> |id_id|   0|   5|   1|   6|   9|   2|   7|   3|   8|   4|
> +-----+----+----+----+----+----+----+----+----+----+----+
> |    0|   1|null|null|null|null|null|null|null|null|null|
> |    5|null|null|null|null|null|   1|null|null|null|null|
> |    1|null|   1|null|null|null|null|null|null|null|null|
> |    6|null|null|null|null|null|null|   1|null|null|null|
> |    9|null|null|null|null|null|null|null|null|null|   1|
> |    2|null|null|   1|null|null|null|null|null|null|null|
> |    7|null|null|null|null|null|null|null|   1|null|null|
> |    3|null|null|null|   1|null|null|null|null|null|null|
> |    8|null|null|null|null|null|null|null|null|   1|null|
> |    4|null|null|null|null|   1|null|null|null|null|null|
> +-----+----+----+----+----+----+----+----+----+----+----+
> {code}
> I think we should use 0 instead of null for these columns.
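
For illustration only, here is a rough sketch in plain Python of the behaviour the issue asks for: count each (row, column) pair and report 0 for pairs that never occur. This is not Spark's implementation. The `crosstab` helper and its `fill` parameter are hypothetical names made up for this example, and passing `fill=None` merely mimics the null-filled output quoted in the issue.

```python
from collections import Counter


def crosstab(pairs, fill=0):
    """Hypothetical helper: count (row, col) pairs into a nested dict.

    Pairs that never occur get `fill` (0 by default, None to mimic nulls).
    """
    counts = Counter(pairs)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    # Look up every (row, col) combination; absent pairs fall back to `fill`.
    return {r: {c: counts.get((r, c), fill) for c in cols} for r in rows}


# Same shape of input as range(0, 10) crossed with itself: only (i, i) occurs.
table = crosstab((i, i) for i in range(10))
print(table[0])
```

On the Spark side, a similar result can probably be had without any change to crosstab by calling `.na.fill(0)` on the DataFrame it returns, at the cost of materialising the zeros that the null representation was meant to avoid.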