[jira] [Commented] (SPARK-36604) timestamp type column analyze result is wrong
[ https://issues.apache.org/jira/browse/SPARK-36604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695869#comment-17695869 ]

Ritika Maheshwari commented on SPARK-36604:
-------------------------------------------

Seems to be working correctly in Spark 3.3.0

spark-sql> insert into a values(cast('2021-08-15 15:30:01' as timestamp)
         > );
23/03/02 11:04:11 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Time taken: 3.278 seconds
spark-sql> select * from a;
2021-08-15 15:30:01
Time taken: 0.782 seconds, Fetched 1 row(s)
spark-sql> analyze table a compute statistics for columns a;
Time taken: 1.882 seconds
spark-sql> desc formatted a a;
col_name a
data_type timestamp
comment NULL
min 2021-08-15 15:30:01.00 -0700
max 2021-08-15 15:30:01.00 -0700
num_nulls 0
distinct_count 1
avg_col_len 8
max_col_len 8
histogram NULL
Time taken: 0.095 seconds, Fetched 10 row(s)
spark-sql> desc a;
a timestamp
Time taken: 0.059 seconds, Fetched 1 row(s)
spark-sql>

> timestamp type column analyze result is wrong
> ---------------------------------------------
>
>                 Key: SPARK-36604
>                 URL: https://issues.apache.org/jira/browse/SPARK-36604
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.1, 3.1.2
>         Environment: Spark 3.1.1
>            Reporter: YuanGuanhu
>            Priority: Major
>
> when we create table with timestamp column type, the min and max data of the
> analyze result for the timestamp column is wrong
> eg:
> {code}
> > select * from a;
> {code}
> {code}
> 2021-08-15 15:30:01
> Time taken: 2.789 seconds, Fetched 1 row(s)
> spark-sql> desc formatted a a;
> col_name a
> data_type timestamp
> comment NULL
> min 2021-08-15 07:30:01.00
> max 2021-08-15 07:30:01.00
> num_nulls 0
> distinct_count 1
> avg_col_len 8
> max_col_len 8
> histogram NULL
> Time taken: 0.278 seconds, Fetched 10 row(s)
> spark-sql> desc a;
> a timestamp NULL
> Time taken: 1.432 seconds, Fetched 1 row(s)
> {code}
>
> reproduce step:
> {code}
> create table a(a timestamp);
> insert into a select '2021-08-15 15:30:01';
> analyze table a compute statistics for columns a;
> desc formatted a a;
> select * from a;
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
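For readers comparing the Spark 3.3.0 output with the original report: in the 3.3.0 session the min/max statistic carries an explicit UTC offset (-0700), and its wall-clock part equals the inserted value. The sketch below checks this in plain Python (standard library only, not Spark). The format string is an assumption about how the printed statistic can be parsed; it is not taken from Spark's code.

```python
from datetime import datetime, timedelta

# Statistic string as printed by `desc formatted a a` in the Spark 3.3.0 session.
reported_min = "2021-08-15 15:30:01.00 -0700"

# Assumed layout: %f accepts the two-digit fraction, %z the numeric UTC offset.
parsed = datetime.strptime(reported_min, "%Y-%m-%d %H:%M:%S.%f %z")

# Drop the offset to compare against the literal that was inserted.
wall_clock = parsed.replace(tzinfo=None)
inserted = datetime(2021, 8, 15, 15, 30, 1)

print(wall_clock == inserted)                      # True
print(parsed.utcoffset() == timedelta(hours=-7))   # True
```

Because the offset is printed with the value, the statistic identifies one instant unambiguously, unlike the offset-free `07:30:01` shown in the original report.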
[jira] [Commented] (SPARK-36604) timestamp type column analyze result is wrong
[ https://issues.apache.org/jira/browse/SPARK-36604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521999#comment-17521999 ]

YuanGuanhu commented on SPARK-36604:
------------------------------------

[~senthh] What is your session time zone? I tested with Spark 3.2.1 and it also has the issue. The inserted value is '2021-08-15 15:30:01', while the min/max values differ from it by 8 hours.

scala> spark.sql("insert into c select '2021-08-15 15:30:01'")
22/04/14 09:23:36 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
res3: org.apache.spark.sql.DataFrame = []

scala> spark.sql("analyze table c compute statistics for columns a")
res4: org.apache.spark.sql.DataFrame = []

scala> spark.sql("desc formatted c a").show(true)
+--------------+--------------------+
|     info_name|          info_value|
+--------------+--------------------+
|      col_name|                   a|
|     data_type|           timestamp|
|       comment|                NULL|
|           min|2021-08-15 07:30:...|
|           max|2021-08-15 07:30:...|
|     num_nulls|                   0|
|distinct_count|                   1|
|   avg_col_len|                   8|
|   max_col_len|                   8|
|     histogram|                NULL|
+--------------+--------------------+

scala> sql("set spark.sql.session.timeZone").show
+--------------------+-------------+
|                 key|        value|
+--------------------+-------------+
|spark.sql.session...|Asia/Shanghai|
+--------------------+-------------+
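The 8-hour difference reported above equals the offset between Asia/Shanghai and UTC, so the numbers are consistent with the statistic being displayed in UTC while the query result is displayed in the session time zone. This is an observation about the arithmetic only, not a confirmed diagnosis of the Spark code. A minimal sketch in plain Python (standard library only, not Spark), assuming the session zone can be modeled as a fixed UTC+8 offset, which holds for Asia/Shanghai in 2021:

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Shanghai modeled as a fixed UTC+8 offset (no DST in 2021).
shanghai = timezone(timedelta(hours=8), "Asia/Shanghai")

# The inserted literal, read as wall-clock time in the session time zone.
inserted = datetime(2021, 8, 15, 15, 30, 1, tzinfo=shanghai)

# The same instant rendered in UTC.
as_utc = inserted.astimezone(timezone.utc)

fmt = "%Y-%m-%d %H:%M:%S"
session_text = inserted.strftime(fmt)
utc_text = as_utc.strftime(fmt)

print(session_text)  # 2021-08-15 15:30:01  (matches `select * from a`)
print(utc_text)      # 2021-08-15 07:30:01  (matches the reported min/max)
```

Both strings describe the same instant; only the zone used for rendering differs.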
[jira] [Commented] (SPARK-36604) timestamp type column analyze result is wrong
[ https://issues.apache.org/jira/browse/SPARK-36604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407918#comment-17407918 ]

Apache Spark commented on SPARK-36604:
--------------------------------------

User 'fhygh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33886
[jira] [Commented] (SPARK-36604) timestamp type column analyze result is wrong
[ https://issues.apache.org/jira/browse/SPARK-36604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407099#comment-17407099 ]

YuanGuanhu commented on SPARK-36604:
------------------------------------

[~senthh] I tested with Spark 2.4.5 and it does not have this issue either. I checked the code; the issue may have been introduced by this change:
https://github.com/apache/spark/pull/23662/files
[jira] [Commented] (SPARK-36604) timestamp type column analyze result is wrong
[ https://issues.apache.org/jira/browse/SPARK-36604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407068#comment-17407068 ]

Senthil Kumar commented on SPARK-36604:
---------------------------------------

[~yghu] I tested this scenario in Spark 2.4 and do not see this issue occurring. Are you seeing this issue only in Spark 3.1.1?

{panel}
scala> spark.sql("create table c(a timestamp)")
res16: org.apache.spark.sql.DataFrame = []

scala> spark.sql("insert into c select '2021-08-15 15:30:01'")
res17: org.apache.spark.sql.DataFrame = []

scala> spark.sql("analyze table c compute statistics for columns a")
res18: org.apache.spark.sql.DataFrame = []

scala> spark.sql("desc formatted c a").show(true)
+--------------+--------------------+
|     info_name|          info_value|
+--------------+--------------------+
|      col_name|                   a|
|     data_type|           timestamp|
|       comment|                NULL|
|           min|2021-08-15 15:30:...|
|           max|2021-08-15 15:30:...|
|     num_nulls|                   0|
|distinct_count|                   1|
|   avg_col_len|                   8|
|   max_col_len|                   8|
|     histogram|                NULL|
+--------------+--------------------+
{panel}
[jira] [Commented] (SPARK-36604) timestamp type column analyze result is wrong
[ https://issues.apache.org/jira/browse/SPARK-36604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406158#comment-17406158 ]

YuanGuanhu commented on SPARK-36604:
------------------------------------

I'd like to work on this.