This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 982fbc5b63e6 [SPARK-47370][DOC] Add migration doc: TimestampNTZ type inference on Parquet files

982fbc5b63e6 is described below

commit 982fbc5b63e61cbc280f8049caf60fbb6e178423
Author: Gengliang Wang <gengli...@apache.org>
AuthorDate: Tue Mar 12 15:11:34 2024 -0700

    [SPARK-47370][DOC] Add migration doc: TimestampNTZ type inference on Parquet files

    ### What changes were proposed in this pull request?

    Add migration doc: TimestampNTZ type inference on Parquet files

    ### Why are the changes needed?

    Update docs. The behavior change was not mentioned in the SQL migration guide.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    It's just a doc change.

    ### Was this patch authored or co-authored using generative AI tooling?

    No

    Closes #45482 from gengliangwang/ntzMigrationDoc.

    Authored-by: Gengliang Wang <gengli...@apache.org>
    Signed-off-by: Gengliang Wang <gengli...@apache.org>
    (cherry picked from commit 621f2c88f3e56257ee517d65e093d32fb44b783e)
    Signed-off-by: Gengliang Wang <gengli...@apache.org>
---
 docs/sql-migration-guide.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 1ad6c8faa3db..b83745e75c79 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -43,6 +43,7 @@ license: |
 - Since Spark 3.4, vectorized readers are enabled by default for the nested data types (array, map and struct). To restore the legacy behavior, set `spark.sql.orc.enableNestedColumnVectorizedReader` and `spark.sql.parquet.enableNestedColumnVectorizedReader` to `false`.
 - Since Spark 3.4, `BinaryType` is not supported in CSV datasource. In Spark 3.3 or earlier, users can write binary columns in CSV datasource, but the output content in CSV files is `Object.toString()` which is meaningless; meanwhile, if users read CSV tables with binary columns, Spark will throw an `Unsupported type: binary` exception.
 - Since Spark 3.4, bloom filter joins are enabled by default. To restore the legacy behavior, set `spark.sql.optimizer.runtime.bloomFilter.enabled` to `false`.
+ - Since Spark 3.4, when schema inference on external Parquet files, INT64 timestamps with annotation `isAdjustedToUTC=false` will be inferred as TimestampNTZ type instead of Timestamp type. To restore the legacy behavior, set `spark.sql.parquet.inferTimestampNTZ.enabled` to `false`.

 ## Upgrading from Spark SQL 3.2 to 3.3

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
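As a side note for readers applying the migration entries quoted in the diff above: each of the flags mentioned there can also be flipped per session rather than in `spark-defaults.conf`. A minimal, illustrative sketch using only the configuration names that appear in the diff (the `SET` statements are an assumption about how a user might apply them, not part of the commit):

```sql
-- Illustrative sketch: restore pre-3.4 defaults for the behaviors
-- listed in the migration guide entries above.

-- Disable vectorized readers for nested ORC/Parquet columns:
SET spark.sql.orc.enableNestedColumnVectorizedReader=false;
SET spark.sql.parquet.enableNestedColumnVectorizedReader=false;

-- Disable runtime bloom filter joins:
SET spark.sql.optimizer.runtime.bloomFilter.enabled=false;

-- Infer INT64 isAdjustedToUTC=false Parquet columns as Timestamp
-- again instead of TimestampNTZ (the flag this commit documents):
SET spark.sql.parquet.inferTimestampNTZ.enabled=false;
```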