[jira] [Commented] (SPARK-26654) Use Timestamp/DateFormatter in CatalogColumnStat
[ https://issues.apache.org/jira/browse/SPARK-26654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757814#comment-16757814 ]

Apache Spark commented on SPARK-26654:
--------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/23662

> Use Timestamp/DateFormatter in CatalogColumnStat
> ------------------------------------------------
>
>                 Key: SPARK-26654
>                 URL: https://issues.apache.org/jira/browse/SPARK-26654
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Maxim Gekk
>            Priority: Major
>
> Need to switch fromExternalString to Timestamp/DateFormatter, in particular:
> https://github.com/apache/spark/blob/3b7395fe025a4c9a591835e53ac6ca05be6868f1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala#L481-L482

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26654) Use Timestamp/DateFormatter in CatalogColumnStat
[ https://issues.apache.org/jira/browse/SPARK-26654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751855#comment-16751855 ]

Wenchen Fan commented on SPARK-26654:
-------------------------------------

+1. I think we store the string form instead of the actual long value so that the stats are human-readable. For correctness, the string form must convert back to the actual long value without ambiguity.
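
The round-trip requirement in the comment above can be illustrated with a small sketch. It is written in plain Python rather than against Spark's formatter classes, and it assumes the stored long value is microseconds since the epoch. The helper names, the two format strings, and the sample value are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def micros_to_string(micros: int, fmt: str) -> str:
    """Render microseconds since the epoch as a UTC string (hypothetical helper)."""
    return (EPOCH + timedelta(microseconds=micros)).strftime(fmt)

def string_to_micros(text: str, fmt: str) -> int:
    """Parse a UTC string back to microseconds since the epoch (hypothetical helper)."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return (parsed - EPOCH) // timedelta(microseconds=1)

LOSSY_FMT = "%Y-%m-%d %H:%M:%S"     # drops the fractional second
EXACT_FMT = "%Y-%m-%d %H:%M:%S.%f"  # keeps all six fractional digits

# Assumed sample statistic: a column maximum in microseconds since the epoch.
max_value = 1_548_000_000_123_456

exact_round_trip = string_to_micros(micros_to_string(max_value, EXACT_FMT), EXACT_FMT)
lossy_round_trip = string_to_micros(micros_to_string(max_value, LOSSY_FMT), LOSSY_FMT)

print(exact_round_trip == max_value)  # the full-precision format round-trips
print(lossy_round_trip == max_value)  # the truncated format does not
```

Both strings are human-readable, but only the format that keeps every fractional digit maps back to the same long value.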
[jira] [Commented] (SPARK-26654) Use Timestamp/DateFormatter in CatalogColumnStat
[ https://issues.apache.org/jira/browse/SPARK-26654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751634#comment-16751634 ]

Maxim Gekk commented on SPARK-26654:
------------------------------------

[~cloud_fan] [~hvanhovell] [~srowen] I believe that saving statistics for TimestampType columns without a time zone can produce inaccurate results if the statistics are read back in a Spark session with a different time zone, which can badly affect query planning. There are two ways to fix this: add the time zone when serializing TimestampType column statistics, which changes the timestamp format (old versions of Spark could not read the statistics back unless the versions are changed), or store the original time zone separately, alongside the statistics.
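
The time zone problem described above can be sketched as follows. This is a plain Python illustration under stated assumptions, not Spark code. The writer and reader session zones are modeled as fixed UTC offsets, and the function names, the sample string, and the offsets are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def parse_in_zone(text: str, session_zone: timezone) -> int:
    """Interpret a zone-less timestamp string in the given session zone."""
    local = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=session_zone)
    return (local - EPOCH) // timedelta(microseconds=1)

def parse_with_offset(text: str) -> int:
    """Parse a string that carries its own UTC offset, e.g. '+0100'."""
    aware = datetime.strptime(text, "%Y-%m-%d %H:%M:%S%z")
    return (aware - EPOCH) // timedelta(microseconds=1)

# Assumed session zones, modeled as fixed offsets for simplicity.
writer_zone = timezone(timedelta(hours=1))
reader_zone = timezone(timedelta(hours=-8))

stored = "2019-01-25 10:00:00"  # hypothetical statistic saved without a zone

written = parse_in_zone(stored, writer_zone)
read_back = parse_in_zone(stored, reader_zone)
shift_hours = (read_back - written) / 3_600_000_000

# With the offset serialized next to the value, the reader's zone no longer matters.
unambiguous = parse_with_offset(stored + "+0100")

print(shift_hours)             # gap between what was written and what is read back
print(unambiguous == written)
```

The second variant corresponds to the first option in the comment (serializing the zone with the value). Storing the writer's zone separately would let the reader call the zone-aware parse with that zone instead of its own.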