[ https://issues.apache.org/jira/browse/SPARK-38140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-38140:
------------------------------------

    Assignee: Apache Spark

> Desc column stats (min, max) for timestamp type is not consistent with the value due to time zone difference
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-38140
>                 URL: https://issues.apache.org/jira/browse/SPARK-38140
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.2, 3.2.1
>            Reporter: Zhenhua Wang
>            Assignee: Apache Spark
>            Priority: Minor
>
> Currently, a timestamp column's stats (min/max) are stored in UTC in the metastore, and when the column's min/max stats are displayed with desc, they are also shown in UTC.
> As a result, for users whose session time zone is not UTC, the column stats shown do not match the actual values, which causes confusion.
> For example:
> {noformat}
> spark-sql> create table tab_ts_master (ts timestamp) using parquet;
> spark-sql> insert into tab_ts_master values make_timestamp(2022, 1, 1, 0, 0, 1.123456), make_timestamp(2022, 1, 3, 0, 0, 2.987654);
> spark-sql> select * from tab_ts_master;
> 2022-01-01 00:00:01.123456
> 2022-01-03 00:00:02.987654
> spark-sql> set spark.sql.session.timeZone;
> spark.sql.session.timeZone    Asia/Shanghai
> spark-sql> analyze table tab_ts_master compute statistics for all columns;
> spark-sql> desc formatted tab_ts_master ts;
> col_name          ts
> data_type         timestamp
> comment           NULL
> min               2021-12-31 16:00:01.123456
> max               2022-01-02 16:00:02.987654
> num_nulls         0
> distinct_count    2
> avg_col_len       8
> max_col_len       8
> histogram         NULL
> {noformat}
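
The 8-hour gap between the inserted values and the reported min/max can be checked with a small sketch outside Spark. This is only an illustration of the time zone arithmetic, not Spark's implementation or the fix for this issue. The helper name `utc_stat_to_session_tz` is hypothetical, and Asia/Shanghai is assumed to be a fixed UTC+8 offset (it observes no DST in 2022).

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Shanghai is modeled as a fixed UTC+8 offset.
SHANGHAI = timezone(timedelta(hours=8), "Asia/Shanghai")

FMT = "%Y-%m-%d %H:%M:%S.%f"


def utc_stat_to_session_tz(stat: str, tz: timezone = SHANGHAI) -> str:
    """Read a stats string as a UTC timestamp and render it in the session time zone.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    utc_value = datetime.strptime(stat, FMT).replace(tzinfo=timezone.utc)
    return utc_value.astimezone(tz).strftime(FMT)


# min/max as reported by "desc formatted tab_ts_master ts" in the report above
print(utc_stat_to_session_tz("2021-12-31 16:00:01.123456"))  # 2022-01-01 00:00:01.123456
print(utc_stat_to_session_tz("2022-01-02 16:00:02.987654"))  # 2022-01-03 00:00:02.987654
```

Converted to the session time zone, the reported min and max match the two rows returned by the select, which suggests the stats themselves are correct and only their display is in UTC.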