Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22938#discussion_r231783277

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---
    @@ -550,15 +550,33 @@ case class JsonToStructs(
             s"Input schema ${nullableSchema.catalogString} must be a struct, an array or a map.")
       }
     
    -  // This converts parsed rows to the desired output by the given schema.
       @transient
    -  lazy val converter = nullableSchema match {
    -    case _: StructType =>
    -      (rows: Iterator[InternalRow]) => if (rows.hasNext) rows.next() else null
    -    case _: ArrayType =>
    -      (rows: Iterator[InternalRow]) => if (rows.hasNext) rows.next().getArray(0) else null
    -    case _: MapType =>
    -      (rows: Iterator[InternalRow]) => if (rows.hasNext) rows.next().getMap(0) else null
    +  private lazy val castRow = nullableSchema match {
    +    case _: StructType => (row: InternalRow) => row
    +    case _: ArrayType => (row: InternalRow) =>
    +      if (row.isNullAt(0)) {
    +        new GenericArrayData(Array())
    --- End diff --
    
    I also wondered which is better to return here: `null` or an empty `Array`/`MapData`. In the `StructType` case we return a `Row` in `PERMISSIVE` mode; for consistency, should we return an empty array/map in that mode too? Maybe we could introduce a special mode that returns `null` for a bad record. For now that would be easy to do, since we use `FailureSafeParser`.
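The two policies being weighed (substitute an empty collection vs. propagate `null` for a bad record) can be sketched outside Spark. This is a minimal, hypothetical Scala sketch, not the actual `JsonToStructs` code: the names `BadRecordPolicy`, `castToEmpty`, and `castToNull` are invented, and an `Option` stands in for a nullable field at ordinal 0 of an `InternalRow`:

```scala
object BadRecordPolicy {
  // A parsed "row" whose only field may be null; None models row.isNullAt(0).
  type ParsedField = Option[Seq[Int]]

  // Policy taken in the diff: substitute an empty collection for a null field,
  // analogous to returning new GenericArrayData(Array()).
  def castToEmpty(field: ParsedField): Seq[Int] = field.getOrElse(Seq.empty)

  // Alternative raised in the comment: propagate null for the bad record,
  // leaving it to the caller (or a dedicated parse mode) to handle.
  def castToNull(field: ParsedField): Seq[Int] = field.orNull
}
```

The difference only shows up downstream: with the empty-collection policy a bad record is indistinguishable from a genuinely empty array, while the `null` policy preserves that distinction at the cost of forcing every consumer to null-check.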