Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/22237
  
    Hi @MaxGekk ,
    I just reviewed this PR and noticed one behavior change: the column value of `from_json(corrupt_record...)` becomes `Row(null, null, ...)` instead of `null`.
    
    ```
    import org.apache.spark.sql.functions.from_json
    import org.apache.spark.sql.types.{IntegerType, StructType}
    import spark.implicits._

    // `"a" 1` is malformed JSON (missing colon), so this is a corrupt record
    val df = Seq("""{"a" 1, "b": 2}""").toDS()
    val schema = new StructType().add("a", IntegerType).add("b", IntegerType)
    ```
    
    Before the code change:
    ```
    scala> df.select(from_json($"value", schema).as("col")).where("col is null").show()
    +----+
    | col|
    +----+
    |null|
    +----+

    scala> df.select(from_json($"value", schema).as("col")).where("col.a is null").show()
    +----+
    | col|
    +----+
    |null|
    +----+
    ```
    
    After the code change:
    ```
    scala> df.select(from_json($"value", schema).as("col")).where("col is null").show()
    +---+
    |col|
    +---+
    +---+

    scala> df.select(from_json($"value", schema).as("col")).where("col.a is null").show()
    +---+
    |col|
    +---+
    |[,]|
    +---+
    ```
    
    The main difference is that we can no longer filter out the null `col` in the result: the corrupt record now shows up as `[,]` (a struct whose fields are all null) instead of a `null` row. Is there any reason for changing this?
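
    For what it's worth, a possible workaround under the new behavior (just a sketch, assuming an all-null struct indicates a corrupt record) would be to filter on the individual struct fields instead of the struct itself:

    ```
    // Hypothetical workaround, not part of this PR: treat a row as corrupt
    // when every field of the parsed struct is null. Note this also matches
    // valid records whose fields all happen to be null.
    val parsed = df.select(from_json($"value", schema).as("col"))
    parsed.where("col.a is null and col.b is null").show()
    ```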

