Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22814#discussion_r228115771
  
    --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroFunctionsSuite.scala ---
    @@ -61,6 +59,24 @@ class AvroFunctionsSuite extends QueryTest with SharedSQLContext {
         checkAnswer(avroStructDF.select(from_avro('avro, avroTypeStruct)), df)
       }
     
    +  test("handle invalid input in from_avro") {
    +    val count = 10
    +    val df = spark.range(count).select(struct('id, 'id.as("id2")).as("struct"))
    +    val avroStructDF = df.select(to_avro('struct).as("avro"))
    +    val avroTypeStruct = s"""
    +      |{
    +      |  "type": "record",
    +      |  "name": "struct",
    +      |  "fields": [
    +      |    {"name": "col1", "type": "long"},
    +      |    {"name": "col2", "type": "double"}
    +      |  ]
    +      |}
    +    """.stripMargin
    +    val expected = (0 until count).map(_ => Row(Row(null, null)))
    +    checkAnswer(avroStructDF.select(from_avro('avro, avroTypeStruct)), expected)
    --- End diff --
    
    BTW, when would there be malformed records — is this usually only when the schema is different? The main purpose of parse modes in CSV and JSON is to cover the limitations of semi-structured input data. I was wondering how useful this is in Avro if most of the cases arise only when the input schema is different.
    
    Also, one good thing about PERMISSIVE mode is that it lets us fill invalid records into the column named by `columnNameOfCorruptRecord`. Here it looks like it doesn't quite make sense to add that functionality to Avro.
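    For context, PERMISSIVE mode in the CSV/JSON readers keeps malformed rows by nulling the data columns and storing the raw input in the corrupt-record column (`_corrupt_record` by default). A minimal Python sketch of that behavior, illustrative only — `parse_permissive` is a hypothetical helper, not Spark's actual reader code:
    
    ```python
    import json
    
    def parse_permissive(lines, corrupt_col="_corrupt_record"):
        """Parse each JSON line; on failure, null the data fields and
        keep the raw text in the corrupt-record column (hypothetical
        sketch of what Spark's PERMISSIVE mode does for JSON input)."""
        rows = []
        for line in lines:
            try:
                record = json.loads(line)
                record[corrupt_col] = None
            except json.JSONDecodeError:
                # Malformed record: data columns become null, raw text kept
                record = {"id": None, corrupt_col: line}
            rows.append(record)
        return rows
    
    rows = parse_permissive(['{"id": 1}', 'not-json'])
    print(rows[0])  # {'id': 1, '_corrupt_record': None}
    print(rows[1])  # {'id': None, '_corrupt_record': 'not-json'}
    ```
    
    The point of the sketch: with semi-structured text input there is a raw string to preserve, which is what makes `columnNameOfCorruptRecord` useful for CSV/JSON; with Avro the input is binary and schema-driven, so there is no analogous raw record to surface.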



---
