[ https://issues.apache.org/jira/browse/SPARK-26654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751634#comment-16751634 ]
Maxim Gekk commented on SPARK-26654:
------------------------------------

[~cloud_fan] [~hvanhovell] [~srowen] I believe that saving statistics for TimestampType columns without a time zone can produce inaccurate results if the statistics are read back in a Spark session with a different time zone, which in turn can hurt query planning. There are two ways to fix this. One is to add the time zone when serializing TimestampType column statistics, but this changes the timestamp format (and old versions of Spark will not be able to read the statistics back unless the versions are changed). The other is to store the original time zone separately, alongside the statistics.

> Use Timestamp/DateFormatter in CatalogColumnStat
> ------------------------------------------------
>
>                 Key: SPARK-26654
>                 URL: https://issues.apache.org/jira/browse/SPARK-26654
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Maxim Gekk
>            Priority: Major
>
> Need to switch fromExternalString to Timestamp/DateFormatters, in particular:
> https://github.com/apache/spark/blob/3b7395fe025a4c9a591835e53ac6ca05be6868f1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala#L481-L482
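
The time zone concern raised in the comment can be illustrated with a minimal sketch. It uses only Python's standard library, not Spark's code. The helper names, the string format, and the sample value are hypothetical and chosen for illustration; they are not how CatalogColumnStat actually serializes statistics. The sketch shows a timestamp written without a zone in one session and read back in a session three hours ahead, next to the first proposed fix (keeping the offset in the serialized string).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical zoneless format, standing in for a statistic stored
# without any time zone information.
ZONELESS_FORMAT = "%Y-%m-%d %H:%M:%S"


def serialize_without_zone(instant: datetime, session_zone: timezone) -> str:
    """Render the instant as wall-clock time in the writer's zone, dropping the zone."""
    return instant.astimezone(session_zone).strftime(ZONELESS_FORMAT)


def parse_without_zone(text: str, session_zone: timezone) -> datetime:
    """Interpret the zoneless string as wall-clock time in the reader's zone."""
    return datetime.strptime(text, ZONELESS_FORMAT).replace(tzinfo=session_zone)


def serialize_with_zone(instant: datetime, session_zone: timezone) -> str:
    """Keep the UTC offset in the string, so the instant is unambiguous."""
    return instant.astimezone(session_zone).isoformat()


def parse_with_zone(text: str) -> datetime:
    """Recover the exact instant; the reader's session zone is not needed."""
    return datetime.fromisoformat(text)


writer_zone = timezone.utc
reader_zone = timezone(timedelta(hours=3))

# A made-up "max" statistic for a timestamp column.
max_value = datetime(2019, 1, 24, 12, 0, 0, tzinfo=timezone.utc)

# Zoneless round trip: the reader silently shifts the instant.
stored = serialize_without_zone(max_value, writer_zone)
read_back = parse_without_zone(stored, reader_zone)
drift = max_value - read_back

# Zone-aware round trip: the instant survives a change of session zone.
stored_with_zone = serialize_with_zone(max_value, writer_zone)
read_back_with_zone = parse_with_zone(stored_with_zone)

print("zoneless drift:", drift)
print("zone-aware round trip equal:", read_back_with_zone == max_value)
```

In this sketch the zoneless round trip moves the stored maximum by the difference between the two session zones, which is the kind of skew that could mislead range-based estimates during planning. The zone-aware variant avoids the skew but changes the stored string, which is the compatibility cost mentioned in the comment.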