[GitHub] spark pull request #22938: [SPARK-25935][SQL] Prevent null rows from JSON pa...

cloud-fan Wed, 07 Nov 2018 18:08:48 -0800

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22938#discussion_r231745125
  
    --- Diff: docs/sql-migration-guide-upgrade.md ---
    @@ -15,6 +15,8 @@ displayTitle: Spark SQL Upgrading Guide
     
       - Since Spark 3.0, the `from_json` functions supports two modes - 
`PERMISSIVE` and `FAILFAST`. The modes can be set via the `mode` option. The 
default mode became `PERMISSIVE`. In previous versions, behavior of `from_json` 
did not conform to either `PERMISSIVE` nor `FAILFAST`, especially in processing 
of malformed JSON records. For example, the JSON string `{"a" 1}` with the 
schema `a INT` is converted to `null` by previous versions but Spark 3.0 
converts it to `Row(null)`.
     
    +  - In Spark version 2.4 and earlier, JSON data source and the `from_json` 
function produced `null`s if there is no valid root JSON token in its input (` 
` for example). Since Spark 3.0, such input is treated as a bad record and 
handled according to specified mode. For example, in the `PERMISSIVE` mode the 
` ` input is converted to `Row(null, null)` if specified schema is `key STRING, 
value INT`. 
    --- End diff --
    
    just for curiosity, how can the json data source return null rows?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22938: [SPARK-25935][SQL] Prevent null rows from JSON pa...

Reply via email to