[ https://issues.apache.org/jira/browse/SPARK-30875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040042#comment-17040042 ]
Wenchen Fan commented on SPARK-30875:
-------------------------------------

cc [~maxgekk] [~hyukjin.kwon]

> Revisit the decision of writing parquet TIMESTAMP_MICROS by default
> -------------------------------------------------------------------
>
>                 Key: SPARK-30875
>                 URL: https://issues.apache.org/jira/browse/SPARK-30875
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Wenchen Fan
>            Priority: Major
>
> In Spark 3.0, we write out timestamp values as parquet TIMESTAMP_MICROS by
> default, instead of INT96. This is good in general, as Spark can read all
> kinds of parquet timestamps but works best with TIMESTAMP_MICROS.
> However, this causes some trouble with Hive compatibility. Spark can use its
> native parquet writer to write Hive parquet tables, which may break Hive
> compatibility if Spark writes TIMESTAMP_MICROS.
> We can either switch back to INT96 by default, or fix it:
> 1. When using the native parquet writer to write Hive parquet tables, write
> timestamps as INT96.
> 2. When creating tables in `HiveExternalCatalog.createTable`, don't claim the
> parquet table is Hive compatible if it has timestamp columns.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
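For context, the compatibility gap comes down to the on-disk encoding of a timestamp column. Below is a minimal sketch in plain Python (no Spark required; the helper names are my own, and it assumes UTC timestamps) of the two representations discussed in the ticket: legacy INT96 packs 8 bytes of nanoseconds-of-day plus a 4-byte Julian day number, while TIMESTAMP_MICROS is a single int64 of microseconds since the Unix epoch. Hive's parquet reader has historically expected the INT96 form for timestamp columns, which is why writing TIMESTAMP_MICROS can break it.

```python
import struct
from datetime import datetime, timezone

JULIAN_EPOCH_DAY = 2_440_588     # Julian day number of 1970-01-01
MICROS_PER_DAY = 86_400_000_000

def to_timestamp_micros(ts: datetime) -> int:
    """Parquet TIMESTAMP_MICROS: int64 microseconds since the Unix epoch."""
    return int(ts.timestamp() * 1_000_000)

def to_int96(ts: datetime) -> bytes:
    """Legacy Parquet INT96 timestamp: little-endian 8-byte
    nanoseconds-of-day followed by a 4-byte Julian day number."""
    micros = to_timestamp_micros(ts)
    days, micros_of_day = divmod(micros, MICROS_PER_DAY)
    return struct.pack("<qI", micros_of_day * 1_000, JULIAN_EPOCH_DAY + days)

ts = datetime(2020, 2, 20, 12, 0, 0, tzinfo=timezone.utc)
print(to_timestamp_micros(ts))   # 1582200000000000
print(to_int96(ts).hex())        # 12-byte INT96 encoding of the same instant
```

Note that users can already choose the writer behavior per session via the `spark.sql.parquet.outputTimestampType` conf (e.g. setting it to `INT96`); the question here is only what the default should be and what the Hive code path should do.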