Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21371#discussion_r189454251
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala ---
    @@ -66,8 +69,12 @@ private[sql] object JsonInferSchema {
                     s"Parse Mode: ${FailFastMode.name}.", e)
               }
             }
    -      }
    -    }.fold(StructType(Nil))(
    +      }.fold(StructType(Nil))(
    +        compatibleRootType(columnNameOfCorruptRecord, parseMode))
    +      Iterator(typeInPartition)
    +    }.collect()
    --- End diff --
    
    > good catch! but wondering how the test passed in my PR...
    
    It is somewhat flaky. If all types are folded on the executor side, the 
local fold merely merges `StructType()` and 
`StructType(StructField("id"), StructField("ID"))`, so you still get the 
correct schema back.
    
    But if you are unlucky and one partition contains only the `id` column, 
the local fold has to merge `StructType(StructField("id"))` and 
`StructType(StructField("ID"))`. That is when the problem shows up.
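
    The two cases can be sketched in plain Scala, without Spark. This is only a toy model under stated assumptions: a schema is reduced to a list of field names, `merge` is a hypothetical stand-in (not `compatibleRootType`), and the local fold is assumed to compare names case-insensitively while the per-partition fold is case-sensitive. The names `FoldOrderSketch`, `merge` and `infer` are made up for illustration.

```scala
object FoldOrderSketch {
  // Toy stand-in for StructType: just the ordered field names.
  type Schema = Vector[String]

  // Hypothetical merge (NOT Spark's compatibleRootType): keeps every field of
  // `a` and appends the fields of `b` that `a` does not already contain.
  def merge(caseSensitive: Boolean): (Schema, Schema) => Schema = { (a, b) =>
    def key(name: String): String = if (caseSensitive) name else name.toLowerCase
    val seen = a.map(key).toSet
    a ++ b.filterNot(name => seen.contains(key(name)))
  }

  // Fold inside each partition first, then fold the per-partition results locally.
  // Assumption: only the local fold uses the case-insensitive comparison.
  def infer(partitions: Seq[Seq[Schema]]): Schema = {
    val perPartition =
      partitions.map(_.fold(Vector.empty[String])(merge(caseSensitive = true)))
    perPartition.fold(Vector.empty[String])(merge(caseSensitive = false))
  }

  def main(args: Array[String]): Unit = {
    // Both spellings land in one partition; the local fold only adds the empty schema.
    println(infer(Seq(Seq(Vector("id"), Vector("ID")), Seq.empty))) // Vector(id, ID)
    // One partition sees only `id`, the other only `ID`; the local fold must merge them.
    println(infer(Seq(Seq(Vector("id")), Seq(Vector("ID")))))       // Vector(id)
  }
}
```

    In this model the result depends on how the records are spread over partitions, which would explain why a test can pass or fail depending on the data layout.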


