This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 430a407c3963 [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3

430a407c3963 is described below

commit 430a407c39633637dba738482877edf806561ba7
Author: Gengliang Wang <gengli...@apache.org>
AuthorDate: Wed Mar 20 15:17:23 2024 -0700

    [SPARK-47494][DOC] Add migration doc for the behavior change of Parquet timestamp inference since Spark 3.3

    ### What changes were proposed in this pull request?

    Add a migration doc entry for the behavior change of Parquet timestamp inference since Spark 3.3.

    ### Why are the changes needed?

    Show the behavior change to users.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    It's just a doc change.

    ### Was this patch authored or co-authored using generative AI tooling?

    Yes, there are some doc suggestions from Copilot in docs/sql-migration-guide.md.

    Closes #45623 from gengliangwang/SPARK-47494.

    Authored-by: Gengliang Wang <gengli...@apache.org>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
    (cherry picked from commit 11247d804cd370aaeb88736a706c587e7f5c83b3)
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 docs/sql-migration-guide.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 0e54c33c6d12..f788d89c4999 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -99,6 +99,8 @@ license: |
 
 - Since Spark 3.3, the `unbase64` function throws an error for a malformed `str` input. Use `try_to_binary(<str>, 'base64')` to tolerate malformed input and return NULL instead. In Spark 3.2 and earlier, the `unbase64` function returns a best-efforts result for a malformed `str` input.
 
+- Since Spark 3.3, when reading Parquet files that were not produced by Spark, Parquet timestamp columns with annotation `isAdjustedToUTC = false` are inferred as TIMESTAMP_NTZ type during schema inference. In Spark 3.2 and earlier, these columns are inferred as TIMESTAMP type. To restore the behavior before Spark 3.3, you can set `spark.sql.parquet.inferTimestampNTZ.enabled` to `false`.
+
 - Since Spark 3.3.1 and 3.2.3, for `SELECT ... GROUP BY a GROUPING SETS (b)`-style SQL statements, `grouping__id` returns different values from Apache Spark 3.2.0, 3.2.1, 3.2.2, and 3.3.0. It computes based on user-given group-by expressions plus grouping set columns. To restore the behavior before 3.3.1 and 3.2.3, you can set `spark.sql.legacy.groupingIdWithAppendedUserGroupBy`. For details, see [SPARK-40218](https://issues.apache.org/jira/browse/SPARK-40218) and [SPARK-40562](https:/ [...]
 
 ## Upgrading from Spark SQL 3.1 to 3.2

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
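For reference, the inference rule that the new migration note describes can be sketched as a small decision function. This is a minimal pure-Python illustration, not Spark's actual implementation: the function name and signature are hypothetical, and the boolean parameter merely models the `spark.sql.parquet.inferTimestampNTZ.enabled` flag (default `true` since Spark 3.3).

```python
def infer_parquet_timestamp_type(is_adjusted_to_utc: bool,
                                 infer_timestamp_ntz: bool = True) -> str:
    """Model of the Spark SQL type inferred for a Parquet timestamp column.

    is_adjusted_to_utc:  the Parquet `isAdjustedToUTC` timestamp annotation.
    infer_timestamp_ntz: models `spark.sql.parquet.inferTimestampNTZ.enabled`
                         (hypothetical parameter; defaults to true, matching
                         Spark 3.3+ behavior).
    """
    if not is_adjusted_to_utc and infer_timestamp_ntz:
        # Behavior since Spark 3.3 for files not produced by Spark.
        return "TIMESTAMP_NTZ"
    # Behavior in Spark 3.2 and earlier, or with the flag set to false.
    return "TIMESTAMP"


# Since Spark 3.3 (flag defaults to true):
print(infer_parquet_timestamp_type(is_adjusted_to_utc=False))  # TIMESTAMP_NTZ
# Restoring the pre-3.3 behavior by disabling the flag:
print(infer_parquet_timestamp_type(False, infer_timestamp_ntz=False))  # TIMESTAMP
```

Note that columns annotated with `isAdjustedToUTC = true` are inferred as TIMESTAMP either way; only the unadjusted (local-semantics) timestamps are affected by the flag.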