[jira] [Commented] (SPARK-10588) Saving a DataFrame containing only nulls to JSON doesn't work

Yin Huai (JIRA) Mon, 14 Sep 2015 09:22:15 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-10588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743770#comment-14743770
 ]


Yin Huai commented on SPARK-10588:
----------------------------------

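This is expected behavior. When we write a row out, we skip null values, 
which saves space when writing sparse data to JSON. As a result, a row that 
contains only nulls is written as an empty JSON object, so no columns can be 
inferred when the data is read back.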

One possible way to address this issue is to write null values only for the 
first row generated by a writer.
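
A minimal sketch of the two behaviors described above, in plain Scala with 
no Spark dependency. The names NullSkippingWriterSketch, Row, toJson and 
writeAll are hypothetical and used only for illustration; this is not 
Spark's actual JSON writer, just a model of the idea under the assumption 
that each row is serialized independently.

```scala
// Hypothetical model of a JSON row writer; NOT Spark's implementation.
object NullSkippingWriterSketch {
  // A row is an ordered list of (column name, nullable double value).
  type Row = Seq[(String, Option[Double])]

  // Serialize one row. Null fields are dropped unless keepNulls is set.
  def toJson(row: Row, keepNulls: Boolean): String =
    row.collect {
      case (name, Some(v))           => s""""$name":$v"""
      case (name, None) if keepNulls => s""""$name":null"""
    }.mkString("{", ",", "}")

  // Proposed variant: write nulls explicitly for the first row only,
  // so a reader can still see the column names, and skip them afterwards.
  def writeAll(rows: Seq[Row]): Seq[String] =
    rows.zipWithIndex.map { case (row, i) => toJson(row, keepNulls = i == 0) }

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq(Seq("c0" -> None), Seq("c0" -> None))

    // Current behavior: every all-null row becomes "{}".
    rows.map(toJson(_, keepNulls = false)).foreach(println)

    // Proposed behavior: first row is {"c0":null}, second row is {}.
    writeAll(rows).foreach(println)
  }
}
```

With only empty objects on disk, a reader that infers the schema from the 
data has no field names to work with, which matches the empty tables in the 
report below. Keeping the nulls in the first row costs a few bytes per 
writer while leaving the rest of the output sparse.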

> Saving a DataFrame containing only nulls to JSON doesn't work
> -------------------------------------------------------------
>
>                 Key: SPARK-10588
>                 URL: https://issues.apache.org/jira/browse/SPARK-10588
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Cheng Lian
>
> Snippets to reproduce this issue:
> {noformat}
> val path = "file:///tmp/spark/null"
> // A single row containing a single null double, saving to JSON, wrong
> sqlContext.
>   range(1).selectExpr("CAST(NULL AS DOUBLE) AS c0").
>   write.mode("overwrite").json(path)
> sqlContext.read.json(path).show()
> ++
> ||
> ++
> ||
> ++
> // Two rows each containing a single null double, saving to JSON, wrong
> sqlContext.
>   range(2).selectExpr("CAST(NULL AS DOUBLE) AS c0").
>   write.mode("overwrite").json(path)
> sqlContext.read.json(path).show()
> ++
> ||
> ++
> ||
> ||
> ++
> // A single row containing two null doubles, saving to JSON, wrong
> sqlContext.
>   range(1).selectExpr("CAST(NULL AS DOUBLE) AS c0", "CAST(NULL AS DOUBLE) AS c1").
>   write.mode("overwrite").json(path)
> sqlContext.read.json(path).show()
> ++
> ||
> ++
> ||
> ++
> // A single row containing a single null double, saving to Parquet, OK
> sqlContext.
>   range(1).selectExpr("CAST(NULL AS DOUBLE) AS c0").
>   write.mode("overwrite").parquet(path)
> sqlContext.read.parquet(path).show()
> +----+
> |   d|
> +----+
> |null|
> +----+
> // Two rows, one containing a single null double, one containing a non-null
> // double, saving to JSON, OK
> sqlContext.
>   range(2).selectExpr("IF(id % 2 = 0, CAST(NULL AS DOUBLE), id) AS c0").
>   write.mode("overwrite").json(path)
> sqlContext.read.json(path).show()
> +----+
> |   d|
> +----+
> |null|
> | 1.0|
> +----+
> {noformat}



