This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 982fbc5b63e6 [SPARK-47370][DOC] Add migration doc: TimestampNTZ type inference on Parquet files

982fbc5b63e6 is described below

commit 982fbc5b63e61cbc280f8049caf60fbb6e178423
Author: Gengliang Wang <gengli...@apache.org>
AuthorDate: Tue Mar 12 15:11:34 2024 -0700

    [SPARK-47370][DOC] Add migration doc: TimestampNTZ type inference on Parquet files

    ### What changes were proposed in this pull request?

    Add migration doc: TimestampNTZ type inference on Parquet files

    ### Why are the changes needed?

    Update docs. The behavior change was not mentioned in the SQL migration guide.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    It's just a doc change.

    ### Was this patch authored or co-authored using generative AI tooling?

    No

    Closes #45482 from gengliangwang/ntzMigrationDoc.

    Authored-by: Gengliang Wang <gengli...@apache.org>
    Signed-off-by: Gengliang Wang <gengli...@apache.org>
    (cherry picked from commit 621f2c88f3e56257ee517d65e093d32fb44b783e)
    Signed-off-by: Gengliang Wang <gengli...@apache.org>
---
 docs/sql-migration-guide.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 1ad6c8faa3db..b83745e75c79 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -43,6 +43,7 @@ license: |
 - Since Spark 3.4, vectorized readers are enabled by default for the nested data types (array, map and struct). To restore the legacy behavior, set `spark.sql.orc.enableNestedColumnVectorizedReader` and `spark.sql.parquet.enableNestedColumnVectorizedReader` to `false`.
 - Since Spark 3.4, `BinaryType` is not supported in CSV datasource. In Spark 3.3 or earlier, users can write binary columns in CSV datasource, but the output content in CSV files is `Object.toString()` which is meaningless; meanwhile, if users read CSV tables with binary columns, Spark will throw an `Unsupported type: binary` exception.
 - Since Spark 3.4, bloom filter joins are enabled by default. To restore the legacy behavior, set `spark.sql.optimizer.runtime.bloomFilter.enabled` to `false`.
+ - Since Spark 3.4, when schema inference on external Parquet files, INT64 timestamps with annotation `isAdjustedToUTC=false` will be inferred as TimestampNTZ type instead of Timestamp type. To restore the legacy behavior, set `spark.sql.parquet.inferTimestampNTZ.enabled` to `false`.

 ## Upgrading from Spark SQL 3.2 to 3.3

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
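As a side note for readers applying the migration entries quoted in the diff above: each of the flags mentioned there can also be flipped per session rather than in `spark-defaults.conf`. A minimal, illustrative sketch using only the configuration names that appear in the diff (the `SET` statements are an assumption about how a user might apply them, not part of the commit):

```sql
-- Illustrative sketch: restore pre-3.4 defaults for the behaviors
-- listed in the migration guide entries above.

-- Disable vectorized readers for nested ORC/Parquet columns:
SET spark.sql.orc.enableNestedColumnVectorizedReader=false;
SET spark.sql.parquet.enableNestedColumnVectorizedReader=false;

-- Disable runtime bloom filter joins:
SET spark.sql.optimizer.runtime.bloomFilter.enabled=false;

-- Infer INT64 isAdjustedToUTC=false Parquet columns as Timestamp
-- again instead of TimestampNTZ (the flag this commit documents):
SET spark.sql.parquet.inferTimestampNTZ.enabled=false;
```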