This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new 5067447bf9a4 [SPARK-42285][DOC] Update Parquet data source doc on the timestamp_ntz inference option
5067447bf9a4 is described below

commit 5067447bf9a420b2f972a03351058ebfa61e0e41
Author: Gengliang Wang <gengli...@apache.org>
AuthorDate: Fri Feb 16 18:21:19 2024 -0800

    [SPARK-42285][DOC] Update Parquet data source doc on the timestamp_ntz inference option

    ### What changes were proposed in this pull request?

    This is a follow-up of https://github.com/apache/spark/pull/39856. The configuration changes should be reflected in the Parquet data source doc

    ### Why are the changes needed?

    To fix doc

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Preview:
    <img width="1010" alt="image" src="https://github.com/apache/spark/assets/1097932/618df731-49ad-49e7-afa2-22381cb3bbef">

    ### Was this patch authored or co-authored using generative AI tooling?

    No

    Closes #45145 from gengliangwang/changeConfigName.

    Authored-by: Gengliang Wang <gengli...@apache.org>
    Signed-off-by: Gengliang Wang <gengli...@apache.org>
    (cherry picked from commit dc2f2673a73ccde44b59cada00e95e869ad64c01)
    Signed-off-by: Gengliang Wang <gengli...@apache.org>
---
 docs/sql-data-sources-parquet.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md
index f49bbd7a9d04..707871e79802 100644
--- a/docs/sql-data-sources-parquet.md
+++ b/docs/sql-data-sources-parquet.md
@@ -616,14 +616,15 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
   <td>3.3.0</td>
 </tr>
 <tr>
-  <td><code>spark.sql.parquet.timestampNTZ.enabled</code></td>
+  <td><code>spark.sql.parquet.inferTimestampNTZ.enabled</code></td>
   <td>true</td>
   <td>
-    Enables <code>TIMESTAMP_NTZ</code> support for Parquet reads and writes.
-    When enabled, <code>TIMESTAMP_NTZ</code> values are written as Parquet timestamp
-    columns with annotation isAdjustedToUTC = false and are inferred in a similar way.
-    When disabled, such values are read as <code>TIMESTAMP_LTZ</code> and have to be
-    converted to <code>TIMESTAMP_LTZ</code> for writes.
+    When enabled, Parquet timestamp columns with annotation <code>isAdjustedToUTC = false</code>
+    are inferred as TIMESTAMP_NTZ type during schema inference. Otherwise, all the Parquet
+    timestamp columns are inferred as TIMESTAMP_LTZ types. Note that Spark writes the
+    output schema into Parquet's footer metadata on file writing and leverages it on file
+    reading. Thus this configuration only affects the schema inference on Parquet files
+    which are not written by Spark.
   </td>
   <td>3.4.0</td>
 </tr>
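
For readers of this notification, the rule described in the updated table cell can be written out as a small decision function. This is only an illustrative model in plain Python, not Spark's implementation: the function name `inferred_timestamp_type`, its parameters, and the string constants are hypothetical and exist only to restate the documented behaviour of `spark.sql.parquet.inferTimestampNTZ.enabled`.

```python
from typing import Optional

# Type names as they appear in the doc text (plain strings here, not Spark types).
TIMESTAMP_NTZ = "TIMESTAMP_NTZ"
TIMESTAMP_LTZ = "TIMESTAMP_LTZ"


def inferred_timestamp_type(
    is_adjusted_to_utc: bool,
    infer_ntz_enabled: bool = True,
    spark_footer_type: Optional[str] = None,
) -> str:
    """Hypothetical model of the documented inference rule.

    is_adjusted_to_utc: the Parquet timestamp annotation of the column.
    infer_ntz_enabled:  value of spark.sql.parquet.inferTimestampNTZ.enabled
                        (the doc lists its default as true).
    spark_footer_type:  type recorded in the footer metadata when the file
                        was written by Spark, or None for other writers.
    """
    # Spark-written files carry the output schema in the footer, so the
    # recorded type is reused and the option has no effect on them.
    if spark_footer_type is not None:
        return spark_footer_type
    # For other files, only isAdjustedToUTC = false columns can become NTZ,
    # and only while the option is enabled.
    if infer_ntz_enabled and not is_adjusted_to_utc:
        return TIMESTAMP_NTZ
    return TIMESTAMP_LTZ
```

Under this model, `inferred_timestamp_type(False)` gives `TIMESTAMP_NTZ`, while passing `infer_ntz_enabled=False` gives `TIMESTAMP_LTZ` for every column of a file that was not written by Spark.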