GitHub user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21439#discussion_r204300575
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -101,6 +102,17 @@ class JacksonParser(
         }
       }
     
    +  private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = {
    +    val elemConverter = makeConverter(at.elementType)
    +    (parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) {
    +      case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter)))
    +      case START_OBJECT if at.elementType.isInstanceOf[StructType] =>
    --- End diff ---
    
    I think what was done in that PR is quite different; it is actually the opposite of what we are doing here. There, an array of structs is returned when a struct is specified as the schema and the JSON contains an array. Here, we return an array containing a single struct when the schema is an array of structs and the JSON contains a struct instead of an array.
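    For reference, here is a minimal spark-shell sketch of that earlier behavior (the data and schema are made up for illustration):

        import org.apache.spark.sql.types._

        val spark = org.apache.spark.sql.SparkSession.builder()
          .master("local[*]").getOrCreate()
        import spark.implicits._

        // The schema is a struct, but the JSON root is an array: each element
        // of the array comes back as a row ([1] and [2] here).
        val ds = Seq("""[{"a": 1}, {"a": 2}]""").toDS()
        spark.read.schema(new StructType().add("a", IntegerType)).json(ds).show()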
    
    Although I don't really like the behavior introduced in the PR you mentioned, I can understand it: it was a way to support arrays of structs (the only option at the moment), and I don't think we can change it before 3.0, at least for backward compatibility. But since we are introducing a new behavior here, when an array is required and a struct is found, I think returning a one-element array is wrong/unexpected behavior, and returning null is what I'd expect as a user.
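    To make the difference concrete, here is a sketch of the case under discussion (same spark-shell session as above; it relies on the array-schema support this PR adds to from_json):

        import org.apache.spark.sql.functions.{col, from_json}

        // The schema is an array of structs, but the JSON value is a single
        // object rather than an array.
        val arraySchema = ArrayType(new StructType().add("a", IntegerType))
        val df = Seq("""{"a": 1}""").toDF("json")

        // With the change under review this yields [[1]] (the struct wrapped
        // in a one-element array); as a user I would expect null instead.
        df.select(from_json(col("json"), arraySchema)).show(false)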

