[jira] [Commented] (SPARK-26654) Use Timestamp/DateFormatter in CatalogColumnStat

2019-01-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757814#comment-16757814
 ] 

Apache Spark commented on SPARK-26654:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/23662

> Use Timestamp/DateFormatter in CatalogColumnStat
> 
>
> Key: SPARK-26654
> URL: https://issues.apache.org/jira/browse/SPARK-26654
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Maxim Gekk
> Priority: Major
>
> Need to switch fromExternalString to use Timestamp/DateFormatters, in particular:
> https://github.com/apache/spark/blob/3b7395fe025a4c9a591835e53ac6ca05be6868f1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala#L481-L482



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26654) Use Timestamp/DateFormatter in CatalogColumnStat

2019-01-24 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751855#comment-16751855
 ] 

Wenchen Fan commented on SPARK-26654:
-

+1. I think we store the string format rather than the actual long value so 
that the stats are human-readable. For correctness, the string must convert 
back to the actual long value without ambiguity.
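
The round-trip requirement above can be sketched as follows. This is a minimal
illustration in Python, not Spark's implementation: the pattern, the helper
names and the choice of UTC as the fixed zone are all assumptions made for the
example. The point is only that the writer and the reader must agree on one
pattern and one zone for the long value to survive.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Hypothetical pattern with microsecond precision; not taken from Spark.
PATTERN = "%Y-%m-%d %H:%M:%S.%f"


def micros_to_string(micros: int) -> str:
    """Render microseconds since the epoch as a UTC wall-clock string."""
    return (EPOCH + timedelta(microseconds=micros)).strftime(PATTERN)


def string_to_micros(text: str) -> int:
    """Parse the string back, interpreting it in the same fixed zone (UTC)."""
    parsed = datetime.strptime(text, PATTERN).replace(tzinfo=timezone.utc)
    return (parsed - EPOCH) // timedelta(microseconds=1)


# Example statistic: 2019-01-24 00:00:00.123456 UTC as a long value.
stat_max = 1_548_288_000_123_456
text = micros_to_string(stat_max)

# The stored string is human-readable and maps back to exactly one long value.
assert string_to_micros(text) == stat_max
```

Because both helpers pin the zone and keep full microsecond precision, the
string is readable and still lossless. Dropping either property (for example,
truncating to seconds) would break the round trip.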




[jira] [Commented] (SPARK-26654) Use Timestamp/DateFormatter in CatalogColumnStat

2019-01-24 Thread Maxim Gekk (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751634#comment-16751634
 ] 

Maxim Gekk commented on SPARK-26654:


[~cloud_fan][~hvanhovell][~srowen] I believe that saving statistics for 
TimestampType columns without a time zone can give inaccurate results if the 
statistics are read back in a Spark session with a different time zone, and so 
can hurt query planning. This can be fixed in one of two ways: add the time 
zone when serializing TimestampType column statistics, which changes the 
timestamp format (so old versions of Spark cannot read the statistics back 
unless the version is changed), or store the original time zone separately, 
alongside the statistics.
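
The mismatch described above, and the first proposed fix, can be sketched as
follows. This is a minimal Python illustration under stated assumptions, not
Spark code: the two fixed offsets stand in for the writer's and reader's
session time zones, and all helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICRO = timedelta(microseconds=1)
PLAIN = "%Y-%m-%d %H:%M:%S"  # no zone information in the string

# Fixed offsets stand in for two session time zones (assumed values).
WRITER_TZ = timezone(timedelta(hours=-8))
READER_TZ = timezone(timedelta(hours=1))


def to_datetime(micros: int, tz: timezone) -> datetime:
    """Convert microseconds since the epoch to a datetime in zone tz."""
    return (EPOCH + timedelta(microseconds=micros)).astimezone(tz)


def to_micros(dt: datetime) -> int:
    """Convert an aware datetime to microseconds since the epoch."""
    return (dt - EPOCH) // ONE_MICRO


def write_plain(micros: int, tz: timezone) -> str:
    return to_datetime(micros, tz).strftime(PLAIN)


def read_plain(text: str, tz: timezone) -> int:
    # The reader has to guess the zone; here it uses its own session zone.
    return to_micros(datetime.strptime(text, PLAIN).replace(tzinfo=tz))


def write_with_offset(micros: int, tz: timezone) -> str:
    return to_datetime(micros, tz).isoformat()  # keeps the UTC offset


def read_with_offset(text: str) -> int:
    return to_micros(datetime.fromisoformat(text))  # zone comes from the string


stat_min = 1_548_288_000_000_000  # 2019-01-24 00:00:00 UTC

# Without a zone, the value drifts by the difference between the two offsets.
drift = read_plain(write_plain(stat_min, WRITER_TZ), READER_TZ) - stat_min
assert drift == -9 * 3600 * 1_000_000

# With the offset embedded, any reader recovers the original value.
assert read_with_offset(write_with_offset(stat_min, WRITER_TZ)) == stat_min
```

The second option from the comment, storing the writer's zone next to the
statistics, would amount to passing WRITER_TZ to read_plain instead of the
reader's own zone, which leaves the string format unchanged.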
