[GitHub] spark pull request #23253: [SPARK-26303][SQL] Return partial results for bad...

MaxGekk Sat, 08 Dec 2018 00:43:47 -0800

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23253#discussion_r239998397
  
    --- Diff: docs/sql-migration-guide-upgrade.md ---
    @@ -37,6 +37,8 @@ displayTitle: Spark SQL Upgrading Guide
     
      - In Spark version 2.4 and earlier, CSV datasource converts a malformed CSV string to a row with all `null`s in the PERMISSIVE mode. Since Spark 3.0, returned row can contain non-`null` fields if some of CSV column values were parsed and converted to desired types successfully.
     
    +  - In Spark version 2.4 and earlier, JSON datasource and JSON functions like `from_json` convert a bad JSON record to a row with all `null`s in the PERMISSIVE mode when specified schema is `StructType`. Since Spark 3.0, returned row can contain non-`null` fields if some of JSON column values were parsed and converted to desired types successfully.
    +
    --- End diff --
    
    In the `PERMISSIVE` mode there is no way to do that anyway: even now
    (without this PR), you cannot distinguish a row produced from a bad record
    from a row produced from a JSON object whose fields are all `null`.
    
    A row with all `null`s cannot, by itself, indicate a bad record; an
    additional flag is needed. Whether the corrupt-record column is `null` or
    non-`null` plays that role.
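To make the argument concrete, here is a minimal pure-Python model of PERMISSIVE parsing. It is a sketch, not Spark's implementation: the two-field `SCHEMA`, the helper `parse_permissive`, and its `partial` flag are hypothetical, and "conversion to the desired type" is reduced to an exact Python type match. With `partial=False` it mimics the Spark 2.4 behavior (all fields `null` for a bad record); with `partial=True` it mimics the proposed Spark 3.0 behavior. In both cases only the corrupt-record column (`_corrupt_record` is Spark's default name) tells a bad record apart from a valid JSON object whose fields are all `null`.

```python
import json
from typing import Any, Dict

# Hypothetical two-column schema: field name -> exact Python type expected.
SCHEMA: Dict[str, type] = {"a": int, "b": str}
# Spark's default corrupt-record column name; configurable in Spark itself.
CORRUPT_COLUMN = "_corrupt_record"


def parse_permissive(line: str, partial: bool = True) -> Dict[str, Any]:
    """Toy model of PERMISSIVE parsing of one JSON line (not Spark's code)."""
    # Start with every schema field and the corrupt column set to None (null).
    row: Dict[str, Any] = {name: None for name in SCHEMA}
    row[CORRUPT_COLUMN] = None
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        # Unparseable text: keep the raw line, all data fields stay None.
        row[CORRUPT_COLUMN] = line
        return row
    if not isinstance(obj, dict):
        # Valid JSON but not an object, so it cannot match a struct schema.
        row[CORRUPT_COLUMN] = line
        return row
    bad = False
    for name, expected in SCHEMA.items():
        value = obj.get(name)
        if value is None:
            continue  # a missing or explicit null field is not an error
        if type(value) is expected:
            row[name] = value  # converted successfully
        else:
            bad = True  # conversion failed; the field stays None
    if bad:
        # Record is bad: the non-null corrupt column is the only reliable flag.
        row[CORRUPT_COLUMN] = line
        if not partial:
            # Spark 2.4-style result: discard the fields that did convert.
            for name in SCHEMA:
                row[name] = None
    return row


if __name__ == "__main__":
    print(parse_permissive('{"a": null, "b": null}'))
    print(parse_permissive('{"a": "oops", "b": null}', partial=False))
    print(parse_permissive('{"a": "oops", "b": "x"}'))
```

For example, `parse_permissive('{"a": null, "b": null}')` and `parse_permissive('{"a": "oops", "b": null}', partial=False)` have identical data fields (both all `None`); only `_corrupt_record` differs, which is exactly why the all-`null` row alone cannot serve as the indicator.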


