Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21439#discussion_r204300575

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
@@ -101,6 +102,17 @@ class JacksonParser(
     }
   }

+  private def makeArrayRootConverter(at: ArrayType): JsonParser => Seq[InternalRow] = {
+    val elemConverter = makeConverter(at.elementType)
+    (parser: JsonParser) => parseJsonToken[Seq[InternalRow]](parser, at) {
+      case START_ARRAY => Seq(InternalRow(convertArray(parser, elemConverter)))
+      case START_OBJECT if at.elementType.isInstanceOf[StructType] =>
--- End diff --

I think what was done in that PR is quite different; it is actually the opposite of what we are doing here. There, an array of structs is returned when a struct is specified as the schema and the JSON contains an array. Here, an array with one struct is returned when the schema is an array of structs and the JSON contains a struct instead of an array.

Although I don't really like the behavior introduced in the PR you mentioned, I can understand it: it was a way to support arrays of structs (the only one at the moment), and I don't think we can change it before 3.0, at least for backward compatibility. But since we are introducing a new behavior here, when an array is required and a struct is found, I think returning a one-element array is wrong/unexpected behavior; returning null is what I'd expect as a user.
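To make the two behaviors under discussion concrete, here is a minimal Python sketch (not Spark code; the function name and flag are hypothetical) contrasting what happens when a parser with an array-of-structs schema meets a bare JSON object: wrapping it into a one-element array, versus returning null as the comment suggests.

```python
import json

def parse_with_array_schema(raw, wrap_bare_struct):
    """Parse JSON that the schema says should be an array of objects.

    wrap_bare_struct=True mimics wrapping a bare object into a
    one-element array (the behavior the diff introduces);
    wrap_bare_struct=False mimics returning null instead,
    which is the behavior the reviewer argues for.
    """
    value = json.loads(raw)
    if isinstance(value, list):
        # An actual JSON array matches the schema under both behaviors.
        return value
    if isinstance(value, dict):
        # A bare struct where an array was expected: wrap or reject.
        return [value] if wrap_bare_struct else None
    return None

print(parse_with_array_schema('[{"a": 1}]', True))   # [{'a': 1}]
print(parse_with_array_schema('{"a": 1}', True))     # [{'a': 1}]
print(parse_with_array_schema('{"a": 1}', False))    # None
```

Under the null-returning variant, a document that does not match the declared root type is treated as unparseable rather than silently coerced.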