[ https://issues.apache.org/jira/browse/SPARK-36182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-36182.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 34495
[https://github.com/apache/spark/pull/34495]

> Support TimestampNTZ type in Parquet file source
> ------------------------------------------------
>
>                 Key: SPARK-36182
>                 URL: https://issues.apache.org/jira/browse/SPARK-36182
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>             Fix For: 3.3.0
>
>
> As per
> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp,
> Parquet supports both TIMESTAMP_NTZ and TIMESTAMP_LTZ (Spark's current
> default timestamp type):
> * A TIMESTAMP with isAdjustedToUTC=true => TIMESTAMP_LTZ
> * A TIMESTAMP with isAdjustedToUTC=false => TIMESTAMP_NTZ
> In Spark 3.1 or prior, the Parquet writer follows the definition and sets
> the field `isAdjustedToUTC` to `true`, while the Parquet reader doesn’t
> respect the `isAdjustedToUTC` flag and converts any Parquet timestamp type
> to TIMESTAMP_LTZ.
> Since 3.2, with the support of the timestamp without time zone type:
> * The Parquet writer follows the definition and sets the field
> `isAdjustedToUTC` to `false` when writing TIMESTAMP_NTZ.
> * Parquet reader
> ** For schema inference, Spark converts the Parquet timestamp type to the
> corresponding catalyst timestamp type according to the timestamp annotation
> flag `isAdjustedToUTC`.
> ** If merge schema is enabled in schema inference and some of the files are
> inferred as TIMESTAMP_NTZ while the others are TIMESTAMP_LTZ, the result type
> is TIMESTAMP_LTZ, which is considered the “wider” type.
> ** If a column of a user-provided schema is TIMESTAMP_LTZ and the column was
> written as TIMESTAMP_NTZ type, Spark allows the read operation.
> ** If a column of a user-provided schema is TIMESTAMP_NTZ and the column was
> written as TIMESTAMP_LTZ type, the read operation is not allowed, since
> TIMESTAMP_NTZ is considered narrower than TIMESTAMP_LTZ.
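
The reader rules in the description can be modelled in a few lines of plain Python. This is only an illustrative sketch of the rules as stated above, not Spark's implementation: `TimestampType`, `infer_type`, `merge_types` and `read_allowed` are hypothetical names made up for this example and do not exist in Spark or Parquet.

```python
from enum import Enum


class TimestampType(Enum):
    """Hypothetical stand-ins for the two timestamp types named in the issue."""
    LTZ = "TIMESTAMP_LTZ"
    NTZ = "TIMESTAMP_NTZ"


def infer_type(is_adjusted_to_utc: bool) -> TimestampType:
    # Schema inference: isAdjustedToUTC=true => TIMESTAMP_LTZ,
    # isAdjustedToUTC=false => TIMESTAMP_NTZ.
    return TimestampType.LTZ if is_adjusted_to_utc else TimestampType.NTZ


def merge_types(file_types) -> TimestampType:
    # Merge schema: if any file is inferred as TIMESTAMP_LTZ, the result is
    # TIMESTAMP_LTZ, which the issue treats as the "wider" type.
    types = set(file_types)
    if not types:
        raise ValueError("at least one file type is required")
    return TimestampType.LTZ if TimestampType.LTZ in types else TimestampType.NTZ


def read_allowed(requested: TimestampType, written: TimestampType) -> bool:
    # User-provided schema: only reading TIMESTAMP_LTZ data as TIMESTAMP_NTZ
    # is rejected; every other combination is allowed.
    return not (requested is TimestampType.NTZ and written is TimestampType.LTZ)
```

For example, `merge_types([infer_type(False), infer_type(True)])` yields the LTZ member, and `read_allowed(TimestampType.NTZ, TimestampType.LTZ)` is false, matching the last bullet of the description.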