[jira] [Commented] (SPARK-36604) timestamp type column analyze result is wrong

2023-03-02 Thread Ritika Maheshwari (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17695869#comment-17695869
 ] 

Ritika Maheshwari commented on SPARK-36604:
---

This seems to be working correctly in Spark 3.3.0:

spark-sql> insert into a values(cast('2021-08-15 15:30:01' as timestamp)
         > );
23/03/02 11:04:11 WARN ObjectStore: Failed to get database global_temp, 
returning NoSuchObjectException
Time taken: 3.278 seconds
spark-sql> select * from a;
2021-08-15 15:30:01
Time taken: 0.782 seconds, Fetched 1 row(s)
spark-sql> analyze table a compute statistics for columns a;
Time taken: 1.882 seconds
spark-sql> desc formatted a a;
col_name        a
data_type       timestamp
comment NULL
min     2021-08-15 15:30:01.00 -0700
max     2021-08-15 15:30:01.00 -0700
num_nulls       0
distinct_count  1
avg_col_len     8
max_col_len     8
histogram       NULL
Time taken: 0.095 seconds, Fetched 10 row(s)
spark-sql> desc a;
a                       timestamp                                   
Time taken: 0.059 seconds, Fetched 1 row(s)
spark-sql>
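
The `-0700` suffix on the min/max values above pins them to a specific instant, so the displayed statistics can be compared against the stored value without guessing the zone. The Python sketch below illustrates this under the assumption that the session time zone was a fixed UTC-7 offset (inferred only from the printed suffix, not stated in the comment); the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the session time zone is a fixed -07:00 offset, as the "-0700" suffix suggests.
session_tz = timezone(timedelta(hours=-7))

# The value selected from the table, interpreted in the session time zone.
stored = datetime(2021, 8, 15, 15, 30, 1, tzinfo=session_tz)

# Rendering with an explicit offset keeps the wall-clock time and names the zone.
with_offset = stored.strftime("%Y-%m-%d %H:%M:%S %z")

# The same instant rendered in UTC shows a different wall-clock time.
as_utc = stored.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S %z")

print(with_offset)  # 2021-08-15 15:30:01 -0700
print(as_utc)       # 2021-08-15 22:30:01 +0000
```

With the offset printed, the min/max wall-clock time matches the selected value (15:30:01) and there is no ambiguity about which zone it is expressed in.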

> timestamp type column analyze result is wrong
> -
>
> Key: SPARK-36604
> URL: https://issues.apache.org/jira/browse/SPARK-36604
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1, 3.1.2
> Environment: Spark 3.1.1
>Reporter: YuanGuanhu
>Priority: Major
>
> When we create a table with a timestamp column, the min and max values in the 
> analyze result for that column are wrong.
> For example:
> {code}
> > select * from a;
> {code}
> {code}
> 2021-08-15 15:30:01
> Time taken: 2.789 seconds, Fetched 1 row(s)
> spark-sql> desc formatted a a;
> col_name a
> data_type timestamp
> comment NULL
> min 2021-08-15 07:30:01.00
> max 2021-08-15 07:30:01.00
> num_nulls 0
> distinct_count 1
> avg_col_len 8
> max_col_len 8
> histogram NULL
> Time taken: 0.278 seconds, Fetched 10 row(s)
> spark-sql> desc a;
> a timestamp NULL
> Time taken: 1.432 seconds, Fetched 1 row(s)
> {code}
>  
> Steps to reproduce:
> {code}
> create table a(a timestamp);
> insert into a select '2021-08-15 15:30:01';
> analyze table a compute statistics for columns a;
> desc formatted a a;
> select * from a;
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36604) timestamp type column analyze result is wrong

2022-04-13 Thread YuanGuanhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521999#comment-17521999
 ] 

YuanGuanhu commented on SPARK-36604:


[~senthh] What is the session time zone?

I tested with Spark 3.2.1 and it also has the issue. The inserted value is 
'2021-08-15 15:30:01', while the min/max values differ from it by 8 hours.

scala>  spark.sql("insert into c select '2021-08-15 15:30:01'")
22/04/14 09:23:36 WARN ObjectStore: Failed to get database global_temp, 
returning NoSuchObjectException
res3: org.apache.spark.sql.DataFrame = []

scala> spark.sql("analyze table c compute statistics for columns a")
res4: org.apache.spark.sql.DataFrame = []                                       

scala> spark.sql("desc formatted c a").show(true)
+--------------+--------------------+
|     info_name|          info_value|
+--------------+--------------------+
|      col_name|                   a|
|     data_type|           timestamp|
|       comment|                NULL|
|           min|2021-08-15 07:30:...|
|           max|2021-08-15 07:30:...|
|     num_nulls|                   0|
|distinct_count|                   1|
|   avg_col_len|                   8|
|   max_col_len|                   8|
|     histogram|                NULL|
+--------------+--------------------+


scala> sql("set spark.sql.session.timeZone").show
+--------------------+-------------+
|                 key|        value|
+--------------------+-------------+
|spark.sql.session...|Asia/Shanghai|
+--------------------+-------------+
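
The 8-hour gap reported above equals the offset between Asia/Shanghai (UTC+8) and UTC: 15:30:01 local time is 07:30:01 UTC. The thread does not confirm that this is the cause, so the Python sketch below only checks the arithmetic. It uses a fixed +08:00 offset as a stand-in for Asia/Shanghai (an assumption; it holds for this date because the zone observes no daylight saving time), and the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +08:00 offset stands in for the Asia/Shanghai session time zone.
SESSION_TZ = timezone(timedelta(hours=8))

# The value inserted in the thread, interpreted in the session time zone.
inserted = datetime(2021, 8, 15, 15, 30, 1, tzinfo=SESSION_TZ)

# The same instant rendered in UTC.
rendered_utc = inserted.astimezone(timezone.utc)

# Wall-clock difference between the two renderings.
gap = inserted.replace(tzinfo=None) - rendered_utc.replace(tzinfo=None)

print(rendered_utc.strftime("%Y-%m-%d %H:%M:%S"))  # 2021-08-15 07:30:01
print(gap)  # 8:00:00
```

The UTC rendering matches the `07:30` min/max shown by `desc formatted`, which is consistent with the statistics being displayed in UTC rather than in the session time zone.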




[jira] [Commented] (SPARK-36604) timestamp type column analyze result is wrong

2021-09-01 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407918#comment-17407918
 ] 

Apache Spark commented on SPARK-36604:
--

User 'fhygh' has created a pull request for this issue:
https://github.com/apache/spark/pull/33886




[jira] [Commented] (SPARK-36604) timestamp type column analyze result is wrong

2021-08-31 Thread YuanGuanhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407099#comment-17407099
 ] 

YuanGuanhu commented on SPARK-36604:


[~senthh] I tested with Spark 2.4.5 and it does not have this issue either. 
After checking the code, I think it may be caused by this commit: 
https://github.com/apache/spark/pull/23662/files




[jira] [Commented] (SPARK-36604) timestamp type column analyze result is wrong

2021-08-30 Thread Senthil Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17407068#comment-17407068
 ] 

Senthil Kumar commented on SPARK-36604:
---

[~yghu] I tested this scenario in Spark 2.4, but I don't see this issue 
occurring. Are you seeing this issue only in Spark 3.1.1?

 

 
{panel}
scala> spark.sql("create table c(a timestamp)")
res16: org.apache.spark.sql.DataFrame = []

scala> spark.sql("insert into c select '2021-08-15 15:30:01'")
res17: org.apache.spark.sql.DataFrame = []

scala> spark.sql("analyze table c compute statistics for columns a")
res18: org.apache.spark.sql.DataFrame = []

scala> spark.sql("desc formatted c a").show(true)
+--------------+--------------------+
|     info_name|          info_value|
+--------------+--------------------+
|      col_name|                   a|
|     data_type|           timestamp|
|       comment|                NULL|
|           min|2021-08-15 15:30:...|
|           max|2021-08-15 15:30:...|
|     num_nulls|                   0|
|distinct_count|                   1|
|   avg_col_len|                   8|
|   max_col_len|                   8|
|     histogram|                NULL|
+--------------+--------------------+
{panel}
 




[jira] [Commented] (SPARK-36604) timestamp type column analyze result is wrong

2021-08-28 Thread YuanGuanhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406158#comment-17406158
 ] 

YuanGuanhu commented on SPARK-36604:


I'd like to work on this.
