(spark) branch branch-3.5 updated: [SPARK-42285][DOC] Update Parquet data source doc on the timestamp_ntz inference option

gengliang Fri, 16 Feb 2024 18:21:55 -0800

This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 5067447bf9a4 [SPARK-42285][DOC] Update Parquet data source doc on the timestamp_ntz inference option
5067447bf9a4 is described below

commit 5067447bf9a420b2f972a03351058ebfa61e0e41
Author: Gengliang Wang <gengli...@apache.org>
AuthorDate: Fri Feb 16 18:21:19 2024 -0800

    [SPARK-42285][DOC] Update Parquet data source doc on the timestamp_ntz inference option
    
    ### What changes were proposed in this pull request?
    
    This is a follow-up of https://github.com/apache/spark/pull/39856. The
    configuration changes should be reflected in the Parquet data source doc.
    
    ### Why are the changes needed?
    
    To fix the doc.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Preview:
    <img width="1010" alt="image" src="https://github.com/apache/spark/assets/1097932/618df731-49ad-49e7-afa2-22381cb3bbef">
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #45145 from gengliangwang/changeConfigName.
    
    Authored-by: Gengliang Wang <gengli...@apache.org>
    Signed-off-by: Gengliang Wang <gengli...@apache.org>
    (cherry picked from commit dc2f2673a73ccde44b59cada00e95e869ad64c01)
    Signed-off-by: Gengliang Wang <gengli...@apache.org>
---
 docs/sql-data-sources-parquet.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md
index f49bbd7a9d04..707871e79802 100644
--- a/docs/sql-data-sources-parquet.md
+++ b/docs/sql-data-sources-parquet.md
@@ -616,14 +616,15 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
   <td>3.3.0</td>
 </tr>
 <tr>
-  <td><code>spark.sql.parquet.timestampNTZ.enabled</code></td>
+  <td><code>spark.sql.parquet.inferTimestampNTZ.enabled</code></td>
   <td>true</td>
   <td>
-    Enables <code>TIMESTAMP_NTZ</code> support for Parquet reads and writes.
-    When enabled, <code>TIMESTAMP_NTZ</code> values are written as Parquet timestamp
-    columns with annotation isAdjustedToUTC = false and are inferred in a similar way.
-    When disabled, such values are read as <code>TIMESTAMP_LTZ</code> and have to be
-    converted to <code>TIMESTAMP_LTZ</code> for writes.
+    When enabled, Parquet timestamp columns with annotation <code>isAdjustedToUTC = false</code>
+    are inferred as TIMESTAMP_NTZ type during schema inference. Otherwise, all the Parquet
+    timestamp columns are inferred as TIMESTAMP_LTZ types. Note that Spark writes the
+    output schema into Parquet's footer metadata on file writing and leverages it on file
+    reading. Thus this configuration only affects the schema inference on Parquet files
+    which are not written by Spark.
   </td>
   <td>3.4.0</td>
 </tr>
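
The inference rule described in the updated doc text can be sketched in plain Python. This is only an illustrative model of the documented behavior, not Spark's implementation: the function `infer_timestamp_type`, its parameters, and the `spark_footer_type` argument (standing in for the schema Spark records in the Parquet footer) are hypothetical names introduced here.

```python
from typing import Optional


def infer_timestamp_type(
    is_adjusted_to_utc: bool,
    infer_ntz_enabled: bool = True,
    spark_footer_type: Optional[str] = None,
) -> str:
    """Model how a Parquet timestamp column's Spark type is decided.

    is_adjusted_to_utc: the Parquet timestamp annotation of the column.
    infer_ntz_enabled:  stands in for spark.sql.parquet.inferTimestampNTZ.enabled
                        (default true, per the doc table).
    spark_footer_type:  hypothetical stand-in for the type recorded in the
                        footer metadata when Spark itself wrote the file.
    """
    # Files written by Spark carry their schema in the footer, so the
    # configuration has no effect on them.
    if spark_footer_type is not None:
        return spark_footer_type
    # For other files, only isAdjustedToUTC = false columns become
    # TIMESTAMP_NTZ, and only when the option is enabled.
    if infer_ntz_enabled and not is_adjusted_to_utc:
        return "TIMESTAMP_NTZ"
    # Otherwise every Parquet timestamp column is inferred as TIMESTAMP_LTZ.
    return "TIMESTAMP_LTZ"


if __name__ == "__main__":
    for adjusted in (False, True):
        for enabled in (True, False):
            print(adjusted, enabled, infer_timestamp_type(adjusted, enabled))
```

In an actual PySpark session the option would presumably be toggled with the key from this diff, for example `spark.conf.set("spark.sql.parquet.inferTimestampNTZ.enabled", "false")`, before reading Parquet files that were not written by Spark.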


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
