Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22237#discussion_r222554954

    --- Diff: docs/sql-programming-guide.md ---
    @@ -1877,6 +1877,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see
     
     # Migration Guide
     
    +## Upgrading From Spark SQL 2.4 to 3.0
    +
    +  - Since Spark 3.0, the `from_json` functions supports two modes - `PERMISSIVE` and `FAILFAST`. The modes can be set via the `mode` option. The default mode became `PERMISSIVE`. In previous versions, behavior of `from_json` did not conform to either `PERMISSIVE` nor `FAILFAST`, especially in processing of malformed JSON records. For example, the JSON string `{"a" 1}` with the schema `a INT` is converted to `null` by previous versions but Spark 3.0 converts it to `Row(null)`. In version 2.4 and earlier, arrays of JSON objects are considered as invalid and converted to `null` if specified schema is `StructType`. Since Spark 3.0, the input is considered as a valid JSON array and only its first element is parsed if it conforms to the specified `StructType`.
    +
    --- End diff --

    > Do we have a clear definition of the current behavior? It's important to let user know how the behavior changes.

    I added a new mode in which the behavior is the same as the current one.
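
To make the behavior change in the diff concrete, here is a toy model in plain Python, using only the standard `json` module. It is not Spark code and not how `from_json` is implemented; it only mirrors the cases spelled out in the migration note for the schema `a INT`. The names `from_json_2_4`, `from_json_3_0`, `MalformedRecordError` and `_to_row` are hypothetical, a tuple stands in for a Spark `Row`, and the handling of records that parse but do not match the schema is an assumption, since the note does not describe it.

```python
import json
from typing import Optional, Tuple

# Stand-in for a Spark Row with the single-column schema `a INT`.
Row = Tuple[Optional[int]]


class MalformedRecordError(Exception):
    """Hypothetical error raised by the FAILFAST branch of this toy model."""


def _to_row(obj: object) -> Optional[Row]:
    """Return a one-field row if `obj` fits the schema `a INT`, else None."""
    if isinstance(obj, dict):
        a = obj.get("a")
        if a is None or (isinstance(a, int) and not isinstance(a, bool)):
            return (a,)
    return None


def from_json_2_4(text: str) -> Optional[Row]:
    """Model of 2.4 and earlier: malformed input and arrays both become null."""
    try:
        value = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if isinstance(value, list):
        return None  # arrays of objects are treated as invalid
    return _to_row(value)


def from_json_3_0(text: str, mode: str = "PERMISSIVE") -> Row:
    """Model of the 3.0 behavior described in the diff."""
    if mode not in ("PERMISSIVE", "FAILFAST"):
        raise ValueError(f"unsupported mode: {mode}")
    try:
        value = json.loads(text)
        if isinstance(value, list) and value:
            value = value[0]  # only the first array element is parsed
        row = _to_row(value)
        if row is None:
            # Assumption: a schema mismatch is handled like a malformed record.
            raise ValueError("record does not match schema `a INT`")
        return row
    except ValueError as exc:
        if mode == "FAILFAST":
            raise MalformedRecordError(text) from exc
        return (None,)  # PERMISSIVE: Row(null) instead of null


if __name__ == "__main__":
    malformed = '{"a" 1}'  # missing colon, as in the migration note
    array = '[{"a": 1}, {"a": 2}]'
    print(from_json_2_4(malformed), from_json_3_0(malformed))
    print(from_json_2_4(array), from_json_3_0(array))
```

In Spark itself the mode is chosen through the `mode` option of `from_json`, as the note says; the sketch only separates the old and new outcomes so that the difference between `null` and `Row(null)` is easy to see.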