[jira] [Updated] (SPARK-37471) spark-sql support nested bracketed comment
[ https://issues.apache.org/jira/browse/SPARK-37471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

angerszhu updated SPARK-37471:
------------------------------
Description:
{code}
/*
select
select /* BROADCAST(b) */ 4\\;
*/
select 1
;
{code}
failed in spark-sql -f
{code}
/usr/share/spark-3.2/bin/spark-sql --verbose -f test.sql
{code}
{code}
Spark master: yarn, Application Id: application_1632999510150_6968442
/* sielect /* BROADCAST(b) */ 4
Error in query:
mismatched input '4' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 30)

== SQL ==
/* sielect /* BROADCAST(b) */ 4
------------------------------^^^
{code}

was:
{code}
/*
select
select /* BROADCAST(b) */ 4\\;
*/
select 1
;
{code}
failed in spark-sql
{code}
Spark master: yarn, Application Id: application_1632999510150_6968442
/* sielect /* BROADCAST(b) */ 4
Error in query:
mismatched input '4' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 30)

== SQL ==
/* sielect /* BROADCAST(b) */ 4
------------------------------^^^
{code}
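The parse failure above suggests the CLI splits its input on ';' while recognizing only one level of bracketed comments, so the inner {{*/}} closes the whole comment. A minimal sketch of nesting-aware statement splitting (a hypothetical helper, not the actual SparkSQLCLIDriver code; string literals and {{--}} line comments are ignored for brevity):

{code:scala}
// Track /* ... */ nesting depth so that ';' inside a (possibly nested)
// bracketed comment is not treated as a statement separator.
def splitStatements(script: String): Seq[String] = {
  val out = scala.collection.mutable.ArrayBuffer.empty[String]
  val cur = new StringBuilder
  var depth = 0 // current bracketed-comment nesting depth
  var i = 0
  while (i < script.length) {
    if (script.startsWith("/*", i)) { depth += 1; cur ++= "/*"; i += 2 }
    else if (script.startsWith("*/", i) && depth > 0) { depth -= 1; cur ++= "*/"; i += 2 }
    else if (script.charAt(i) == ';' && depth == 0) { out += cur.toString; cur.clear(); i += 1 }
    else { cur += script.charAt(i); i += 1 }
  }
  if (cur.nonEmpty) out += cur.toString
  out.toSeq.map(_.trim).filter(_.nonEmpty)
}
{code}

With depth tracking, the whole {{/* select ... /* BROADCAST(b) */ ... */}} block stays inside one comment and {{select 1}} is the only statement executed.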
[jira] [Updated] (SPARK-37471) spark-sql support nested bracketed comment
[ https://issues.apache.org/jira/browse/SPARK-37471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

angerszhu updated SPARK-37471:
------------------------------
Description:
{code}
/*
select
select /* BROADCAST(b) */ 4\\;
*/
select 1
;
{code}
failed in spark-sql
{code}
Spark master: yarn, Application Id: application_1632999510150_6968442
/* sielect /* BROADCAST(b) */ 4
Error in query:
mismatched input '4' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 30)

== SQL ==
/* sielect /* BROADCAST(b) */ 4
------------------------------^^^
{code}

was:
{code}
/*
select
select /* BROADCAST(b) */ 4\\;
*/
select 1
;
{code}
failed in spark-sql
{code}
[info] 2021-11-25 23:30:18.727 - stderr>
[info] 2021-11-25 23:30:18.757 - stderr> Setting default log level to "WARN".
[info] 2021-11-25 23:30:18.758 - stderr> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[info] 2021-11-25 23:30:28.226 - stderr> Spark master: local, Application Id: local-1637911820513
[info] 2021-11-25 23:30:29.506 - stdout> spark-sql>
[info] 2021-11-25 23:30:29.535 - stdout>          > /*
[info] 2021-11-25 23:30:29.566 - stdout>          > SELECT /* BROADCAST(b) */ 4;
[info] 2021-11-25 23:30:29.592 - stdout> /*
[info] 2021-11-25 23:30:29.592 - stdout> SELECT /* BROADCAST(b) */ 4
[info] 2021-11-25 23:30:30.239 - stderr> Error in query:
[info] 2021-11-25 23:30:30.239 - stderr> Unclosed bracketed comment(line 1, pos 0)
[info] 2021-11-25 23:30:30.239 - stderr>
[info] 2021-11-25 23:30:30.24 - stderr> == SQL ==
[info] 2021-11-25 23:30:30.24 - stderr> /*
[info] 2021-11-25 23:30:30.24 - stderr> ^^^
[info] 2021-11-25 23:30:30.24 - stderr> SELECT /* BROADCAST(b) */ 4
[info] 2021-11-25 23:30:30.24 - stderr>
[info] 2021-11-25 23:30:30.28 - stdout> spark-sql> */
[info] 2021-11-25 23:30:30.308 - stdout>          > SELECT 1
[info] 2021-11-25 23:30:30.336 - stdout>          > ;
[info] 2021-11-25 23:30:30.337 - stdout> */
[info] 2021-11-25 23:30:30.337 - stdout> SELECT 1
[info] 2021-11-25 23:30:30.337 - stdout>
[info] 2021-11-25 23:30:30.339 - stderr> Error in query:
[info] 2021-11-25 23:30:30.339 - stderr> extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)
[info] 2021-11-25 23:30:30.339 - stderr>
[info] 2021-11-25 23:30:30.339 - stderr> == SQL ==
[info] 2021-11-25 23:30:30.339 - stderr> */
[info] 2021-11-25 23:30:30.339 - stderr> ^^^
[info] 2021-11-25 23:30:30.339 - stderr> SELECT 1
[info] 2021-11-25 23:30:30.339 - stderr>
[info] 2021-11-25 23:30:30.368 - stdout> spark-sql>
[info] 2021-11-25 23:30:30.605 - stdout>
{code}
[jira] [Updated] (SPARK-37471) spark-sql support nested bracketed comment
[ https://issues.apache.org/jira/browse/SPARK-37471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

angerszhu updated SPARK-37471:
------------------------------
Description:
{code}
/*
select
select /* BROADCAST(b) */ 4\\;
*/
select 1
;
{code}
failed in spark-sql
{code}
[info] 2021-11-25 23:30:18.727 - stderr>
[info] 2021-11-25 23:30:18.757 - stderr> Setting default log level to "WARN".
[info] 2021-11-25 23:30:18.758 - stderr> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[info] 2021-11-25 23:30:28.226 - stderr> Spark master: local, Application Id: local-1637911820513
[info] 2021-11-25 23:30:29.506 - stdout> spark-sql>
[info] 2021-11-25 23:30:29.535 - stdout>          > /*
[info] 2021-11-25 23:30:29.566 - stdout>          > SELECT /* BROADCAST(b) */ 4;
[info] 2021-11-25 23:30:29.592 - stdout> /*
[info] 2021-11-25 23:30:29.592 - stdout> SELECT /* BROADCAST(b) */ 4
[info] 2021-11-25 23:30:30.239 - stderr> Error in query:
[info] 2021-11-25 23:30:30.239 - stderr> Unclosed bracketed comment(line 1, pos 0)
[info] 2021-11-25 23:30:30.239 - stderr>
[info] 2021-11-25 23:30:30.24 - stderr> == SQL ==
[info] 2021-11-25 23:30:30.24 - stderr> /*
[info] 2021-11-25 23:30:30.24 - stderr> ^^^
[info] 2021-11-25 23:30:30.24 - stderr> SELECT /* BROADCAST(b) */ 4
[info] 2021-11-25 23:30:30.24 - stderr>
[info] 2021-11-25 23:30:30.28 - stdout> spark-sql> */
[info] 2021-11-25 23:30:30.308 - stdout>          > SELECT 1
[info] 2021-11-25 23:30:30.336 - stdout>          > ;
[info] 2021-11-25 23:30:30.337 - stdout> */
[info] 2021-11-25 23:30:30.337 - stdout> SELECT 1
[info] 2021-11-25 23:30:30.337 - stdout>
[info] 2021-11-25 23:30:30.339 - stderr> Error in query:
[info] 2021-11-25 23:30:30.339 - stderr> extraneous input '*/' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)
[info] 2021-11-25 23:30:30.339 - stderr>
[info] 2021-11-25 23:30:30.339 - stderr> == SQL ==
[info] 2021-11-25 23:30:30.339 - stderr> */
[info] 2021-11-25 23:30:30.339 - stderr> ^^^
[info] 2021-11-25 23:30:30.339 - stderr> SELECT 1
[info] 2021-11-25 23:30:30.339 - stderr>
[info] 2021-11-25 23:30:30.368 - stdout> spark-sql>
[info] 2021-11-25 23:30:30.605 - stdout>
{code}

was:
{code}
/*
select
select /* BROADCAST(b) */ 4\\;
*/
select 1
;
{code}
failed in spark-sql
[jira] [Updated] (SPARK-37099) Impl a rank-based filter to optimize top-k computation
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengruifeng updated SPARK-37099:
---------------------------------
Description:
In JD, we found that more than 90% of window function usage follows this pattern:
{code:java}
select (... (row_number|rank|dense_rank) () over( [partition by ...] order by ... ) as rn)
where rn (==|<|<=) k and other conditions{code}
However, the existing physical plan is not optimal:

1. We should select local top-k records within each partition and then compute the global top-k; this helps reduce the shuffle amount. For these three rank functions (row_number|rank|dense_rank), the rank of a key computed on a partial dataset is always <= its final rank computed on the whole dataset, so we can safely discard rows with partial rank > k anywhere.

2. Skewed windows: some partitions are skewed and take a long time to finish computation.

A real-world skewed-window case in our system is attached.

was:
In JD, we found that more than 90% of window function usage follows this pattern:
{code:java}
select (... [row_number|rank|dense_rank]() over([partition by ...] order by ...) as rn)
where rn ==[\<=] k and other conditions{code}
However, the existing physical plan is not optimal:

1. We should select local top-k records within each partition and then compute the global top-k; this helps reduce the shuffle amount. For these three rank functions (row_number|rank|dense_rank), the rank of a key computed on a partial dataset is always <= its final rank computed on the whole dataset, so we can safely discard rows with partial rank > k anywhere.

2. Skewed windows: some partitions are skewed and take a long time to finish computation.

A real-world skewed-window case in our system is attached.
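For reference, a minimal self-contained illustration of the pattern the description targets (the SparkSession setup and the employees data are hypothetical, not taken from the report):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Toy data standing in for a large partitioned table.
Seq(("a", 1), ("a", 2), ("a", 3), ("b", 4), ("b", 5))
  .toDF("dept", "salary")
  .createOrReplaceTempView("employees")

// The rank-then-filter top-k pattern: a rank-based filter could push a
// local top-k selection below the shuffle, because a row's rank on any
// subset of the data never exceeds its final rank on the whole dataset.
spark.sql("""
  SELECT * FROM (
    SELECT dept, salary,
           row_number() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
    FROM employees
  ) WHERE rn <= 2
""").show()
{code}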
[jira] [Created] (SPARK-37471) spark-sql support nested bracketed comment
angerszhu created SPARK-37471:
------------------------------
             Summary: spark-sql support nested bracketed comment
                 Key: SPARK-37471
                 URL: https://issues.apache.org/jira/browse/SPARK-37471
             Project: Spark
          Issue Type: Task
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: angerszhu

{code}
/*
select
select /* BROADCAST(b) */ 4\\;
*/
select 1
;
{code}
failed in spark-sql
[jira] [Commented] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
[ https://issues.apache.org/jira/browse/SPARK-37469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449384#comment-17449384 ]

Apache Spark commented on SPARK-37469:
--------------------------------------

User 'toujours33' has created a pull request for this issue:
https://github.com/apache/spark/pull/34720

> Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
> -----------------------------------------------------------
>
>                 Key: SPARK-37469
>                 URL: https://issues.apache.org/jira/browse/SPARK-37469
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 3.2.0
>            Reporter: Yazhi Wang
>            Priority: Minor
>         Attachments: executor-page.png, sql-page.png
>
> Metrics in the Executor/Task page are shown as "Shuffle Read Block Time", while the SQL page shows "fetch wait time", which is confusing.
> !executor-page.png!
> !sql-page.png!
[jira] [Commented] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
[ https://issues.apache.org/jira/browse/SPARK-37469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449383#comment-17449383 ]

Apache Spark commented on SPARK-37469:
--------------------------------------

User 'toujours33' has created a pull request for this issue:
https://github.com/apache/spark/pull/34720
[jira] [Assigned] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
[ https://issues.apache.org/jira/browse/SPARK-37469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37469:
------------------------------------
    Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
[ https://issues.apache.org/jira/browse/SPARK-37469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37469:
------------------------------------
    Assignee: Apache Spark
[jira] [Commented] (SPARK-37470) A new created table gets duplicated after ALTER DATABASE SET LOCATION command
[ https://issues.apache.org/jira/browse/SPARK-37470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449379#comment-17449379 ]

Yuto Akutsu commented on SPARK-37470:
-------------------------------------

I'm working on this.

> A new created table gets duplicated after ALTER DATABASE SET LOCATION command
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-37470
>                 URL: https://issues.apache.org/jira/browse/SPARK-37470
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuto Akutsu
>            Priority: Major
>
> Creating and saving a new table after changing the location of a database with the ALTER DATABASE SET LOCATION command generates a duplicate of the table in the default location (which can be defined by spark.sql.warehouse.dir).
[jira] [Created] (SPARK-37470) A new created table gets duplicated after ALTER DATABASE SET LOCATION command
Yuto Akutsu created SPARK-37470:
--------------------------------
             Summary: A new created table gets duplicated after ALTER DATABASE SET LOCATION command
                 Key: SPARK-37470
                 URL: https://issues.apache.org/jira/browse/SPARK-37470
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Yuto Akutsu

Creating and saving a new table after changing the location of a database with the ALTER DATABASE SET LOCATION command generates a duplicate of the table in the default location (which can be defined by spark.sql.warehouse.dir).
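A minimal repro sketch of the reported behavior; the database, table, and path names are hypothetical, and the trailing comments restate the report rather than verified output:

{code:scala}
// Change the database location, then create and populate a managed table.
spark.sql("CREATE DATABASE mydb")
spark.sql("ALTER DATABASE mydb SET LOCATION '/tmp/new_warehouse/mydb.db'")
spark.sql("CREATE TABLE mydb.t (id INT) USING parquet")
spark.sql("INSERT INTO mydb.t VALUES (1)")
// Expected: table files only under /tmp/new_warehouse/mydb.db/t
// Reported: a duplicate copy also appears under spark.sql.warehouse.dir
{code}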
[jira] [Updated] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
[ https://issues.apache.org/jira/browse/SPARK-37469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yazhi Wang updated SPARK-37469:
-------------------------------
    Attachment: executor-page.png
                sql-page.png
[jira] [Updated] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
[ https://issues.apache.org/jira/browse/SPARK-37469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yazhi Wang updated SPARK-37469:
-------------------------------
Description:
Metrics in the Executor/Task page are shown as "Shuffle Read Block Time", while the SQL page shows "fetch wait time", which is confusing.
!executor-page.png!
!sql-page.png!

was:
Metrics in the Executor/Task page are shown as "Shuffle Read Block Time", while the SQL page shows "fetch wait time", which is confusing.
!executor-page.png!
[jira] [Commented] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
[ https://issues.apache.org/jira/browse/SPARK-37469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449371#comment-17449371 ]

Yazhi Wang commented on SPARK-37469:
------------------------------------

I'm working on it.
[jira] [Updated] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
[ https://issues.apache.org/jira/browse/SPARK-37469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yazhi Wang updated SPARK-37469:
-------------------------------
Description:
Metrics in the Executor/Task page are shown as "Shuffle Read Block Time", while the SQL page shows "fetch wait time", which is confusing.
!executor-page.png!

was:
Metrics in the Executor/Task page are shown as "Shuffle Read Block Time", while the SQL page shows "fetch wait time", which is confusing.
!image-2021-11-26-12-12-46-896.png!
!image-2021-11-26-12-15-28-204.png!
[jira] [Created] (SPARK-37469) Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
Yazhi Wang created SPARK-37469:
-------------------------------
             Summary: Unified "fetchWaitTime" and "shuffleReadTime" metrics On UI
                 Key: SPARK-37469
                 URL: https://issues.apache.org/jira/browse/SPARK-37469
             Project: Spark
          Issue Type: Improvement
          Components: Web UI
    Affects Versions: 3.2.0
            Reporter: Yazhi Wang

Metrics in the Executor/Task page are shown as "Shuffle Read Block Time", while the SQL page shows "fetch wait time", which is confusing.
!image-2021-11-26-12-12-46-896.png!
!image-2021-11-26-12-15-28-204.png!
[jira] [Commented] (SPARK-37381) Unify v1 and v2 SHOW CREATE TABLE tests
[ https://issues.apache.org/jira/browse/SPARK-37381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449384#comment-17449384 ]

Apache Spark commented on SPARK-37381:
--------------------------------------

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/34719

> Unify v1 and v2 SHOW CREATE TABLE tests
> ----------------------------------------
>
>                 Key: SPARK-37381
>                 URL: https://issues.apache.org/jira/browse/SPARK-37381
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: PengLei
>            Priority: Major
>             Fix For: 3.3.0
[jira] [Assigned] (SPARK-37381) Unify v1 and v2 SHOW CREATE TABLE tests
[ https://issues.apache.org/jira/browse/SPARK-37381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37381:
------------------------------------
    Assignee: Apache Spark
[jira] [Assigned] (SPARK-37381) Unify v1 and v2 SHOW CREATE TABLE tests
[ https://issues.apache.org/jira/browse/SPARK-37381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37381:
------------------------------------
    Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-37381) Unify v1 and v2 SHOW CREATE TABLE tests
[ https://issues.apache.org/jira/browse/SPARK-37381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449369#comment-17449369 ]

Apache Spark commented on SPARK-37381:
--------------------------------------

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/34719
[jira] [Commented] (SPARK-37460) ALTER (DATABASE|SCHEMA|NAMESPACE) ... SET LOCATION command not documented
[ https://issues.apache.org/jira/browse/SPARK-37460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449367#comment-17449367 ]

Apache Spark commented on SPARK-37460:
--------------------------------------

User 'yutoacts' has created a pull request for this issue:
https://github.com/apache/spark/pull/34718

> ALTER (DATABASE|SCHEMA|NAMESPACE) ... SET LOCATION command not documented
> --------------------------------------------------------------------------
>
>                 Key: SPARK-37460
>                 URL: https://issues.apache.org/jira/browse/SPARK-37460
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuto Akutsu
>            Priority: Minor
>
> The ALTER DATABASE ... SET LOCATION command should be documented in the sql-ref-syntax-ddl-create-table page.
[jira] [Assigned] (SPARK-37460) ALTER (DATABASE|SCHEMA|NAMESPACE) ... SET LOCATION command not documented
[ https://issues.apache.org/jira/browse/SPARK-37460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37460:
------------------------------------
    Assignee: Apache Spark
[jira] [Assigned] (SPARK-37460) ALTER (DATABASE|SCHEMA|NAMESPACE) ... SET LOCATION command not documented
[ https://issues.apache.org/jira/browse/SPARK-37460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37460:
------------------------------------
    Assignee: (was: Apache Spark)
[jira] [Updated] (SPARK-37099) Impl a rank-based filter to optimize top-k computation
[ https://issues.apache.org/jira/browse/SPARK-37099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengruifeng updated SPARK-37099:
---------------------------------
Description:
In JD, we found that more than 90% of window function usage follows this pattern:
{code:java}
select (... [row_number|rank|dense_rank]() over([partition by ...] order by ...) as rn)
where rn ==[\<=] k and other conditions{code}
However, the existing physical plan is not optimal:

1. We should select local top-k records within each partition and then compute the global top-k; this helps reduce the shuffle amount. For these three rank functions (row_number|rank|dense_rank), the rank of a key computed on a partial dataset is always <= its final rank computed on the whole dataset, so we can safely discard rows with partial rank > k anywhere.

2. Skewed windows: some partitions are skewed and take a long time to finish computation.

A real-world skewed-window case in our system is attached.

was:
In JD, we found that more than 80% of window function usage follows this pattern:
{code:java}
select (... row_number() over(partition by ... order by ...) as rn)
where rn ==[\<=] k{code}
However, the existing physical plan is not optimal:

1. We should select local top-k records within each partition and then compute the global top-k; this helps reduce the shuffle amount.

2. Skewed windows: some partitions are skewed and take a long time to finish computation.

A real-world skewed-window case in our system is attached.
[jira] [Assigned] (SPARK-37468) Support ANSI intervals and TimestampNTZ for UnionEstimation
[ https://issues.apache.org/jira/browse/SPARK-37468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37468:
------------------------------------
    Assignee: Kousuke Saruta (was: Apache Spark)
[jira] [Commented] (SPARK-37468) Support ANSI intervals and TimestampNTZ for UnionEstimation
[ https://issues.apache.org/jira/browse/SPARK-37468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449364#comment-17449364 ]

Apache Spark commented on SPARK-37468:
--------------------------------------

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/34716
[jira] [Assigned] (SPARK-37468) Support ANSI intervals and TimestampNTZ for UnionEstimation
[ https://issues.apache.org/jira/browse/SPARK-37468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37468:
------------------------------------
    Assignee: Apache Spark (was: Kousuke Saruta)
[jira] [Updated] (SPARK-37468) Support ANSI intervals and TimestampNTZ for UnionEstimation
[ https://issues.apache.org/jira/browse/SPARK-37468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kousuke Saruta updated SPARK-37468:
-----------------------------------
Description:
Currently, UnionEstimation doesn't support ANSI intervals and TimestampNTZ. But I think it can support those types because their underlying types are integer or long, which UnionEstimation can compute stats for.

was:
Currently, UnionEstimation doesn't support ANSI intervals and TimestampNTZ. But I think it can support those types because their underlying types are integer or long, which it UnionEstimation can compute stats for.
[jira] [Created] (SPARK-37468) Support ANSI intervals and TimestampNTZ for UnionEstimation
Kousuke Saruta created SPARK-37468:
-----------------------------------
             Summary: Support ANSI intervals and TimestampNTZ for UnionEstimation
                 Key: SPARK-37468
                 URL: https://issues.apache.org/jira/browse/SPARK-37468
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Kousuke Saruta
            Assignee: Kousuke Saruta

Currently, UnionEstimation doesn't support ANSI intervals and TimestampNTZ. But I think it can support those types because their underlying types are integer or long, which it UnionEstimation can compute stats for.
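A hedged sketch of the observation in the description, not the actual UnionEstimation code: ANSI interval and TimestampNTZ columns are physically backed by Int or Long, so their min/max statistics can be ordered numerically like those of other integral types.

{code:scala}
import org.apache.spark.sql.types._

// Map each newly supported type to the integral type that backs it.
def underlyingIntegralType(dt: DataType): Option[DataType] = dt match {
  case _: YearMonthIntervalType => Some(IntegerType) // months stored as Int
  case _: DayTimeIntervalType   => Some(LongType)    // microseconds stored as Long
  case TimestampNTZType         => Some(LongType)    // microseconds stored as Long
  case _                        => None
}
{code}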
[jira] [Commented] (SPARK-37445) Update hadoop-profile
[ https://issues.apache.org/jira/browse/SPARK-37445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449362#comment-17449362 ]

Apache Spark commented on SPARK-37445:
--------------------------------------

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/34715

> Update hadoop-profile
> ---------------------
>
>                 Key: SPARK-37445
>                 URL: https://issues.apache.org/jira/browse/SPARK-37445
>             Project: Spark
>          Issue Type: Task
>          Components: Build
>    Affects Versions: 3.2.0
>            Reporter: angerszhu
>            Priority: Major
>
> The current Hadoop profile is hadoop-3.2; update it to hadoop-3.3.
[jira] [Commented] (SPARK-37465) PySpark tests failing on Pandas 0.23
[ https://issues.apache.org/jira/browse/SPARK-37465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449358#comment-17449358 ]

Hyukjin Kwon commented on SPARK-37465:
--------------------------------------

cc [~yikunkero] and [~itholic] FYI

> PySpark tests failing on Pandas 0.23
> -------------------------------------
>
>                 Key: SPARK-37465
>                 URL: https://issues.apache.org/jira/browse/SPARK-37465
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Willi Raschkowski
>            Priority: Major
>
> I was running Spark tests with Pandas {{0.23.4}} and got the error below. The minimum Pandas version is currently {{0.23.2}} [(Github)|https://github.com/apache/spark/blob/v3.2.0/python/setup.py#L114]. Upgrading to {{0.24.0}} fixes the error. I think Spark needs [this fix (Github)|https://github.com/pandas-dev/pandas/pull/21160/files#diff-1b7183f5b3970e2a1d39a82d71686e39c765d18a34231b54c857b0c4c9bb8222] in Pandas.
> {code:java}
> $ python/run-tests --testnames 'pyspark.pandas.tests.data_type_ops.test_boolean_ops BooleanOpsTest.test_floordiv'
> ...
> ======================================================================
> ERROR [5.785s]: test_floordiv (pyspark.pandas.tests.data_type_ops.test_boolean_ops.BooleanOpsTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/circleci/project/python/pyspark/pandas/tests/data_type_ops/test_boolean_ops.py", line 128, in test_floordiv
>     self.assert_eq(b_pser // b_pser.astype(int), b_psser // b_psser.astype(int))
>   File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1069, in wrapper
>     result = safe_na_op(lvalues, rvalues)
>   File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1033, in safe_na_op
>     return na_op(lvalues, rvalues)
>   File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1027, in na_op
>     result = missing.fill_zeros(result, x, y, op_name, fill_zeros)
>   File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/missing.py", line 641, in fill_zeros
>     signs = np.sign(y if name.startswith(('r', '__r')) else x)
> TypeError: ufunc 'sign' did not contain a loop with signature matching types dtype('bool') dtype('bool')
> {code}
> These are my relevant package versions:
> {code:java}
> $ conda list | grep -e numpy -e pyarrow -e pandas -e python
> # packages in environment at /home/circleci/miniconda/envs/python3:
> numpy             1.16.6    py36h0a8e133_3
> numpy-base        1.16.6    py36h41b4c56_3
> pandas            0.23.4    py36h04863e7_0
> pyarrow           1.0.1     py36h6200943_36_cpu    conda-forge
> python            3.6.12    hcff3b4d_2             anaconda
> python-dateutil   2.8.1     py_0                   anaconda
> python_abi        3.6       1_cp36m                conda-forge
> {code}
[jira] [Commented] (SPARK-37465) PySpark tests failing on Pandas 0.23
[ https://issues.apache.org/jira/browse/SPARK-37465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449357#comment-17449357 ]

Hyukjin Kwon commented on SPARK-37465:
--------------------------------------

Maybe we should rather bump up the minimum pandas version to 1.0.0. Would you be interested in submitting a PR? cc [~XinrongM] [~ueshin] FYI
[jira] [Assigned] (SPARK-37457) Update cloudpickle to v2.0.0
[ https://issues.apache.org/jira/browse/SPARK-37457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-37457:
------------------------------------
    Assignee: Hyukjin Kwon

> Update cloudpickle to v2.0.0
> -----------------------------
>
>                 Key: SPARK-37457
>                 URL: https://issues.apache.org/jira/browse/SPARK-37457
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>
> Cloudpickle 2.0.0 has been released. We should update to match the latest version.
[jira] [Resolved] (SPARK-37457) Update cloudpickle to v2.0.0
[ https://issues.apache.org/jira/browse/SPARK-37457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-37457.
----------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 34705
https://github.com/apache/spark/pull/34705
[jira] [Commented] (SPARK-37381) Unify v1 and v2 SHOW CREATE TABLE tests
[ https://issues.apache.org/jira/browse/SPARK-37381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449352#comment-17449352 ]

dch nguyen commented on SPARK-37381:
------------------------------------

Can I go for it? I want to try to fix it. [~xiaopenglei] [~maxgekk]
[jira] [Resolved] (SPARK-37436) Uses Python's standard string formatter for SQL API in pandas API on Spark
[ https://issues.apache.org/jira/browse/SPARK-37436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-37436.
----------------------------------
    Fix Version/s: 3.3.0
         Assignee: Hyukjin Kwon
       Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/34677

> Uses Python's standard string formatter for SQL API in pandas API on Spark
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-37436
>                 URL: https://issues.apache.org/jira/browse/SPARK-37436
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 3.3.0
>
> Currently {{pyspark.pandas.sql}} uses its own hacky parser:
> https://github.com/apache/spark/blob/master/python/pyspark/pandas/sql_processor.py
> We should ideally switch it to the standard Python formatter:
> https://docs.python.org/3/library/string.html#custom-string-formatting
[jira] [Created] (SPARK-37467) Consolidate whole stage and non-whole stage subexpression elimination
Adam Binford created SPARK-37467:
---------------------------------
             Summary: Consolidate whole stage and non-whole stage subexpression elimination
                 Key: SPARK-37467
                 URL: https://issues.apache.org/jira/browse/SPARK-37467
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Adam Binford

Currently there are separate subexpression elimination paths for whole-stage and non-whole-stage codegen. Consolidating these into a single code path would make it simpler to add further enhancements, such as supporting lambda functions (https://issues.apache.org/jira/browse/SPARK-37466).
[jira] [Created] (SPARK-37466) Support subexpression elimination in lambda functions
Adam Binford created SPARK-37466:
---------------------------------
             Summary: Support subexpression elimination in lambda functions
                 Key: SPARK-37466
                 URL: https://issues.apache.org/jira/browse/SPARK-37466
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Adam Binford

https://issues.apache.org/jira/browse/SPARK-37019 will add codegen support for higher order functions. However, we can't support subexpression elimination inside of lambda functions because subexpressions are evaluated once per row at the beginning of the codegen. Common expressions inside lambda functions can easily result in performance degradation due to multiple evaluations of the same expression. Subexpression elimination inside of lambda functions needs to be handled specially to be evaluated once per function invocation.
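A small query illustrating the problem (a hypothetical example, not from the report): {{x * x}} is a common subexpression inside the lambda, and without elimination scoped to the lambda it is evaluated twice for every array element.

{code:scala}
// transform() is a built-in higher-order function; the lambda body contains
// the duplicated subexpression (x * x).
spark.sql("SELECT transform(array(1, 2, 3), x -> (x * x) + (x * x)) AS v").show()
{code}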
[jira] [Assigned] (SPARK-37445) Update hadoop-profile
[ https://issues.apache.org/jira/browse/SPARK-37445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37445:
------------------------------------
    Assignee: Apache Spark
[jira] [Assigned] (SPARK-37445) Update hadoop-profile
[ https://issues.apache.org/jira/browse/SPARK-37445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37445:
------------------------------------
    Assignee: (was: Apache Spark)
[jira] [Reopened] (SPARK-37445) Update hadoop-profile
[ https://issues.apache.org/jira/browse/SPARK-37445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reopened SPARK-37445:
----------------------------------
    Assignee: (was: angerszhu)

Reverted at https://github.com/apache/spark/commit/444cfe66a65fbdbda53366154cf547de90309608
[jira] [Updated] (SPARK-37445) Update hadoop-profile
[ https://issues.apache.org/jira/browse/SPARK-37445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-37445:
---------------------------------
    Fix Version/s: (was: 3.3.0)
[jira] [Assigned] (SPARK-32079) PySpark <> Beam pickling issues for collections.namedtuple
[ https://issues.apache.org/jira/browse/SPARK-32079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-32079: Assignee: Hyukjin Kwon > PySpark <> Beam pickling issues for collections.namedtuple > -- > > Key: SPARK-32079 > URL: https://issues.apache.org/jira/browse/SPARK-32079 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Gerard Casas Saez >Assignee: Hyukjin Kwon >Priority: Major > > PySpark monkeypatching namedtuple makes it difficult/impossible to depickle > collections.namedtuple instances from outside of a pyspark environment. > > When PySpark has been loaded into the environment, any time that you try to > pickle a namedtuple, you are only able to unpickle it from an environment > where the > [hijack|https://github.com/apache/spark/blob/master/python/pyspark/serializers.py#L385] > has been applied. > This conflicts directly when trying to use Beam from a non-Spark environment > (namingly Flink or Dataflow) making it impossible to use the pipeline if it > has a namedtuple loaded somewhere. > > {code:python} > import collections > import dill > ColumnInfo = collections.namedtuple( > "ColumnInfo", > [ > "name", # type: ColumnName # pytype: disable=ignored-type-comment > "type", # type: Optional[ColumnType] # pytype: > disable=ignored-type-comment > ]) > dill.dumps(ColumnInfo('test', int)) > {code} > {{b'\x80\x03cdill._dill\n_create_namedtuple\nq\x00X\n\x00\x00\x00ColumnInfoq\x01X\x04\x00\x00\x00nameq\x02X\x04\x00\x00\x00typeq\x03\x86q\x04X\x08\x00\x00\x00__main__q\x05\x87q\x06Rq\x07X\x04\x00\x00\x00testq\x08cdill._dill\n_load_type\nq\tX\x03\x00\x00\x00intq\n\x85q\x0bRq\x0c\x86q\r\x81q\x0e.'}} > {code:python} > import pyspark > import collections > import dill > ColumnInfo = collections.namedtuple( > "ColumnInfo", > [ > "name", # type: ColumnName # pytype: disable=ignored-type-comment > "type", # type: Optional[ColumnType] # pytype: > disable=ignored-type-comment > ]) > dill.dumps(ColumnInfo('test', int)) > {code} > {{b'\x80\x03cpyspark.serializers\n_restore\nq\x00X\n\x00\x00\x00ColumnInfoq\x01X\x04\x00\x00\x00nameq\x02X\x04\x00\x00\x00typeq\x03\x86q\x04X\x04\x00\x00\x00testq\x05cdill._dill\n_load_type\nq\x06X\x03\x00\x00\x00intq\x07\x85q\x08Rq\t\x86q\n\x87q\x0bRq\x0c.'}} > Second pickled object can only be used from an environment with PySpark. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-32079) PySpark <> Beam pickling issues for collections.namedtuple
[ https://issues.apache.org/jira/browse/SPARK-32079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-32079. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34688 [https://github.com/apache/spark/pull/34688] > PySpark <> Beam pickling issues for collections.namedtuple > -- > > Key: SPARK-32079 > URL: https://issues.apache.org/jira/browse/SPARK-32079 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Gerard Casas Saez >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.3.0 > > > PySpark monkeypatching namedtuple makes it difficult/impossible to depickle > collections.namedtuple instances from outside of a pyspark environment. > > When PySpark has been loaded into the environment, any time that you try to > pickle a namedtuple, you are only able to unpickle it from an environment > where the > [hijack|https://github.com/apache/spark/blob/master/python/pyspark/serializers.py#L385] > has been applied. > This conflicts directly when trying to use Beam from a non-Spark environment > (namingly Flink or Dataflow) making it impossible to use the pipeline if it > has a namedtuple loaded somewhere. > > {code:python} > import collections > import dill > ColumnInfo = collections.namedtuple( > "ColumnInfo", > [ > "name", # type: ColumnName # pytype: disable=ignored-type-comment > "type", # type: Optional[ColumnType] # pytype: > disable=ignored-type-comment > ]) > dill.dumps(ColumnInfo('test', int)) > {code} > {{b'\x80\x03cdill._dill\n_create_namedtuple\nq\x00X\n\x00\x00\x00ColumnInfoq\x01X\x04\x00\x00\x00nameq\x02X\x04\x00\x00\x00typeq\x03\x86q\x04X\x08\x00\x00\x00__main__q\x05\x87q\x06Rq\x07X\x04\x00\x00\x00testq\x08cdill._dill\n_load_type\nq\tX\x03\x00\x00\x00intq\n\x85q\x0bRq\x0c\x86q\r\x81q\x0e.'}} > {code:python} > import pyspark > import collections > import dill > ColumnInfo = collections.namedtuple( > "ColumnInfo", > [ > "name", # type: ColumnName # pytype: disable=ignored-type-comment > "type", # type: Optional[ColumnType] # pytype: > disable=ignored-type-comment > ]) > dill.dumps(ColumnInfo('test', int)) > {code} > {{b'\x80\x03cpyspark.serializers\n_restore\nq\x00X\n\x00\x00\x00ColumnInfoq\x01X\x04\x00\x00\x00nameq\x02X\x04\x00\x00\x00typeq\x03\x86q\x04X\x04\x00\x00\x00testq\x05cdill._dill\n_load_type\nq\x06X\x03\x00\x00\x00intq\x07\x85q\x08Rq\t\x86q\n\x87q\x0bRq\x0c.'}} > Second pickled object can only be used from an environment with PySpark. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37465) PySpark tests failing on Pandas 0.23
[ https://issues.apache.org/jira/browse/SPARK-37465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Willi Raschkowski updated SPARK-37465: -- Description: I was running Spark tests with Pandas {{0.23.4}} and got the error below. The minimum Pandas version is currently {{0.23.2}} [(Github)|https://github.com/apache/spark/blob/v3.2.0/python/setup.py#L114]. Upgrading to {{0.24.0}} fixes the error. I think Spark needs [this fix (Github)|https://github.com/pandas-dev/pandas/pull/21160/files#diff-1b7183f5b3970e2a1d39a82d71686e39c765d18a34231b54c857b0c4c9bb8222] in Pandas. {code:java} $ python/run-tests --testnames 'pyspark.pandas.tests.data_type_ops.test_boolean_ops BooleanOpsTest.test_floordiv' ... == ERROR [5.785s]: test_floordiv (pyspark.pandas.tests.data_type_ops.test_boolean_ops.BooleanOpsTest) -- Traceback (most recent call last): File "/home/circleci/project/python/pyspark/pandas/tests/data_type_ops/test_boolean_ops.py", line 128, in test_floordiv self.assert_eq(b_pser // b_pser.astype(int), b_psser // b_psser.astype(int)) File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1069, in wrapper result = safe_na_op(lvalues, rvalues) File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1033, in safe_na_op return na_op(lvalues, rvalues) File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1027, in na_op result = missing.fill_zeros(result, x, y, op_name, fill_zeros) File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/missing.py", line 641, in fill_zeros signs = np.sign(y if name.startswith(('r', '__r')) else x) TypeError: ufunc 'sign' did not contain a loop with signature matching types dtype('bool') dtype('bool') {code} These are my relevant package versions: {code:java} $ conda list | grep -e numpy -e pyarrow -e pandas -e python # packages in environment at /home/circleci/miniconda/envs/python3: numpy 1.16.6 py36h0a8e133_3 numpy-base1.16.6 py36h41b4c56_3 pandas0.23.4 py36h04863e7_0 pyarrow 1.0.1 py36h6200943_36_cpuconda-forge python3.6.12 hcff3b4d_2anaconda python-dateutil 2.8.1 py_0anaconda python_abi3.6 1_cp36mconda-forg {code} was: I was running Spark tests with Pandas {{0.23.4}} and got the error below. The minimum Pandas version is currently {{0.23.2}} [(Github)|https://github.com/apache/spark/blob/v3.2.0/python/setup.py#L114]. Upgrading to {{0.24.0}} fixes the error. I think Spark needs [this fix (Github)|https://github.com/pandas-dev/pandas/pull/21160] in Pandas. {code:java} $ python/run-tests --testnames 'pyspark.pandas.tests.data_type_ops.test_boolean_ops BooleanOpsTest.test_floordiv' ... 
== ERROR [5.785s]: test_floordiv (pyspark.pandas.tests.data_type_ops.test_boolean_ops.BooleanOpsTest) -- Traceback (most recent call last): File "/home/circleci/project/python/pyspark/pandas/tests/data_type_ops/test_boolean_ops.py", line 128, in test_floordiv self.assert_eq(b_pser // b_pser.astype(int), b_psser // b_psser.astype(int)) File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1069, in wrapper result = safe_na_op(lvalues, rvalues) File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1033, in safe_na_op return na_op(lvalues, rvalues) File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1027, in na_op result = missing.fill_zeros(result, x, y, op_name, fill_zeros) File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/missing.py", line 641, in fill_zeros signs = np.sign(y if name.startswith(('r', '__r')) else x) TypeError: ufunc 'sign' did not contain a loop with signature matching types dtype('bool') dtype('bool') {code} These are my relevant package versions: {code:java} $ conda list | grep -e numpy -e pyarrow -e pandas -e python # packages in environment at /home/circleci/miniconda/envs/python3: numpy 1.16.6 py36h0a8e133_3 numpy-base1.16.6 py36h41b4c56_3 pandas0.23.4 py36h04863e7_0 pyarrow 1.0.1 py36h6200943_36_cpuconda-forge python3.6.12 hcff3b4d_2
[jira] [Created] (SPARK-37465) PySpark tests failing on Pandas 0.23
Willi Raschkowski created SPARK-37465: - Summary: PySpark tests failing on Pandas 0.23 Key: SPARK-37465 URL: https://issues.apache.org/jira/browse/SPARK-37465 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.2.0 Reporter: Willi Raschkowski I was running Spark tests with Pandas {{0.23.4}} and got the error below. The minimum Pandas version is currently {{0.23.2}} [(Github)|https://github.com/apache/spark/blob/v3.2.0/python/setup.py#L114]. Upgrading to {{0.24.0}} fixes the error. I think Spark needs [this fix (Github)|https://github.com/pandas-dev/pandas/pull/21160] in Pandas. {code:java} $ python/run-tests --testnames 'pyspark.pandas.tests.data_type_ops.test_boolean_ops BooleanOpsTest.test_floordiv' ... == ERROR [5.785s]: test_floordiv (pyspark.pandas.tests.data_type_ops.test_boolean_ops.BooleanOpsTest) -- Traceback (most recent call last): File "/home/circleci/project/python/pyspark/pandas/tests/data_type_ops/test_boolean_ops.py", line 128, in test_floordiv self.assert_eq(b_pser // b_pser.astype(int), b_psser // b_psser.astype(int)) File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1069, in wrapper result = safe_na_op(lvalues, rvalues) File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1033, in safe_na_op return na_op(lvalues, rvalues) File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1027, in na_op result = missing.fill_zeros(result, x, y, op_name, fill_zeros) File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/missing.py", line 641, in fill_zeros signs = np.sign(y if name.startswith(('r', '__r')) else x) TypeError: ufunc 'sign' did not contain a loop with signature matching types dtype('bool') dtype('bool') {code} These are my relevant package versions: {code:java} $ conda list | grep -e numpy -e pyarrow -e pandas -e python # packages in environment at /home/circleci/miniconda/envs/python3: numpy 1.16.6 py36h0a8e133_3 numpy-base1.16.6 py36h41b4c56_3 pandas0.23.4 py36h04863e7_0 pyarrow 1.0.1 py36h6200943_36_cpuconda-forge python3.6.12 hcff3b4d_2anaconda python-dateutil 2.8.1 py_0anaconda python_abi3.6 1_cp36mconda-forg {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37464) SCHEMA and DATABASE should simply be aliases of NAMESPACE
[ https://issues.apache.org/jira/browse/SPARK-37464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37464: Assignee: Apache Spark > SCHEMA and DATABASE should simply be aliases of NAMESPACE > - > > Key: SPARK-37464 > URL: https://issues.apache.org/jira/browse/SPARK-37464 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37464) SCHEMA and DATABASE should simply be aliases of NAMESPACE
[ https://issues.apache.org/jira/browse/SPARK-37464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37464: Assignee: (was: Apache Spark) > SCHEMA and DATABASE should simply be aliases of NAMESPACE > - > > Key: SPARK-37464 > URL: https://issues.apache.org/jira/browse/SPARK-37464 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37464) SCHEMA and DATABASE should simply be aliases of NAMESPACE
[ https://issues.apache.org/jira/browse/SPARK-37464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449227#comment-17449227 ] Apache Spark commented on SPARK-37464: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/34713 > SCHEMA and DATABASE should simply be aliases of NAMESPACE > - > > Key: SPARK-37464 > URL: https://issues.apache.org/jira/browse/SPARK-37464 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37464) SCHEMA and DATABASE should simply be aliases of NAMESPACE
Wenchen Fan created SPARK-37464: --- Summary: SCHEMA and DATABASE should simply be aliases of NAMESPACE Key: SPARK-37464 URL: https://issues.apache.org/jira/browse/SPARK-37464 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
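For illustration (not from the ticket): a spark-shell sketch of the equivalence the proposal implies. Namespace names are arbitrary.
{code:scala}
// All three spellings should map to the same command after this change.
spark.sql("CREATE NAMESPACE IF NOT EXISTS ns_a")
spark.sql("CREATE SCHEMA IF NOT EXISTS ns_b")    // alias of CREATE NAMESPACE
spark.sql("CREATE DATABASE IF NOT EXISTS ns_c")  // alias of CREATE NAMESPACE

spark.sql("SHOW NAMESPACES").show()
{code}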
[jira] [Updated] (SPARK-37463) Read/Write Timestamp ntz or ltz to Orc uses UTC timestamp
[ https://issues.apache.org/jira/browse/SPARK-37463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37463: --- Summary: Read/Write Timestamp ntz or ltz to Orc uses UTC timestamp (was: read/write Timestamp ntz or ltz to Orc uses UTC timestamp) > Read/Write Timestamp ntz or ltz to Orc uses UTC timestamp > - > > Key: SPARK-37463 > URL: https://issues.apache.org/jira/browse/SPARK-37463 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > There are some example code: > import java.util.TimeZone > TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles")) > sql("set spark.sql.session.timeZone=America/Los_Angeles") > val df = sql("select timestamp_ntz '2021-06-01 00:00:00' ts_ntz, timestamp > '2021-06-01 00:00:00' ts") > df.write.mode("overwrite").orc("ts_ntz_orc") > df.write.mode("overwrite").parquet("ts_ntz_parquet") > df.write.mode("overwrite").format("avro").save("ts_ntz_avro") > val query = """ > select 'orc', * > from `orc`.`ts_ntz_orc` > union all > select 'parquet', * > from `parquet`.`ts_ntz_parquet` > union all > select 'avro', * > from `avro`.`ts_ntz_avro` > """ > val tzs = Seq("America/Los_Angeles", "UTC", "Europe/Amsterdam") > for (tz <- tzs) { > TimeZone.setDefault(TimeZone.getTimeZone(tz)) > sql(s"set spark.sql.session.timeZone=$tz") > println(s"Time zone is ${TimeZone.getDefault.getID}") > sql(query).show(false) > } > The output show below looks so strange. > Time zone is America/Los_Angeles > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-06-01 00:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 00:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 00:00:00| > +---+---+---+ > Time zone is UTC > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 17:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 07:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 07:00:00| > +---+---+---+ > Time zone is Europe/Amsterdam > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 15:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 09:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 09:00:00| > +---+---+---+ -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37463) read/write Timestamp ntz or ltz to Orc uses UTC timestamp
[ https://issues.apache.org/jira/browse/SPARK-37463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37463: Assignee: Apache Spark > read/write Timestamp ntz or ltz to Orc uses UTC timestamp > - > > Key: SPARK-37463 > URL: https://issues.apache.org/jira/browse/SPARK-37463 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > > There are some example code: > import java.util.TimeZone > TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles")) > sql("set spark.sql.session.timeZone=America/Los_Angeles") > val df = sql("select timestamp_ntz '2021-06-01 00:00:00' ts_ntz, timestamp > '2021-06-01 00:00:00' ts") > df.write.mode("overwrite").orc("ts_ntz_orc") > df.write.mode("overwrite").parquet("ts_ntz_parquet") > df.write.mode("overwrite").format("avro").save("ts_ntz_avro") > val query = """ > select 'orc', * > from `orc`.`ts_ntz_orc` > union all > select 'parquet', * > from `parquet`.`ts_ntz_parquet` > union all > select 'avro', * > from `avro`.`ts_ntz_avro` > """ > val tzs = Seq("America/Los_Angeles", "UTC", "Europe/Amsterdam") > for (tz <- tzs) { > TimeZone.setDefault(TimeZone.getTimeZone(tz)) > sql(s"set spark.sql.session.timeZone=$tz") > println(s"Time zone is ${TimeZone.getDefault.getID}") > sql(query).show(false) > } > The output show below looks so strange. > Time zone is America/Los_Angeles > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-06-01 00:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 00:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 00:00:00| > +---+---+---+ > Time zone is UTC > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 17:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 07:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 07:00:00| > +---+---+---+ > Time zone is Europe/Amsterdam > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 15:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 09:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 09:00:00| > +---+---+---+ -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37463) read/write Timestamp ntz or ltz to Orc uses UTC timestamp
[ https://issues.apache.org/jira/browse/SPARK-37463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449179#comment-17449179 ] Apache Spark commented on SPARK-37463: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/34712 > read/write Timestamp ntz or ltz to Orc uses UTC timestamp > - > > Key: SPARK-37463 > URL: https://issues.apache.org/jira/browse/SPARK-37463 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > There are some example code: > import java.util.TimeZone > TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles")) > sql("set spark.sql.session.timeZone=America/Los_Angeles") > val df = sql("select timestamp_ntz '2021-06-01 00:00:00' ts_ntz, timestamp > '2021-06-01 00:00:00' ts") > df.write.mode("overwrite").orc("ts_ntz_orc") > df.write.mode("overwrite").parquet("ts_ntz_parquet") > df.write.mode("overwrite").format("avro").save("ts_ntz_avro") > val query = """ > select 'orc', * > from `orc`.`ts_ntz_orc` > union all > select 'parquet', * > from `parquet`.`ts_ntz_parquet` > union all > select 'avro', * > from `avro`.`ts_ntz_avro` > """ > val tzs = Seq("America/Los_Angeles", "UTC", "Europe/Amsterdam") > for (tz <- tzs) { > TimeZone.setDefault(TimeZone.getTimeZone(tz)) > sql(s"set spark.sql.session.timeZone=$tz") > println(s"Time zone is ${TimeZone.getDefault.getID}") > sql(query).show(false) > } > The output show below looks so strange. > Time zone is America/Los_Angeles > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-06-01 00:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 00:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 00:00:00| > +---+---+---+ > Time zone is UTC > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 17:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 07:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 07:00:00| > +---+---+---+ > Time zone is Europe/Amsterdam > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 15:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 09:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 09:00:00| > +---+---+---+ -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37463) read/write Timestamp ntz or ltz to Orc uses UTC timestamp
[ https://issues.apache.org/jira/browse/SPARK-37463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37463: Assignee: (was: Apache Spark) > read/write Timestamp ntz or ltz to Orc uses UTC timestamp > - > > Key: SPARK-37463 > URL: https://issues.apache.org/jira/browse/SPARK-37463 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > There are some example code: > import java.util.TimeZone > TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles")) > sql("set spark.sql.session.timeZone=America/Los_Angeles") > val df = sql("select timestamp_ntz '2021-06-01 00:00:00' ts_ntz, timestamp > '2021-06-01 00:00:00' ts") > df.write.mode("overwrite").orc("ts_ntz_orc") > df.write.mode("overwrite").parquet("ts_ntz_parquet") > df.write.mode("overwrite").format("avro").save("ts_ntz_avro") > val query = """ > select 'orc', * > from `orc`.`ts_ntz_orc` > union all > select 'parquet', * > from `parquet`.`ts_ntz_parquet` > union all > select 'avro', * > from `avro`.`ts_ntz_avro` > """ > val tzs = Seq("America/Los_Angeles", "UTC", "Europe/Amsterdam") > for (tz <- tzs) { > TimeZone.setDefault(TimeZone.getTimeZone(tz)) > sql(s"set spark.sql.session.timeZone=$tz") > println(s"Time zone is ${TimeZone.getDefault.getID}") > sql(query).show(false) > } > The output show below looks so strange. > Time zone is America/Los_Angeles > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-06-01 00:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 00:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 00:00:00| > +---+---+---+ > Time zone is UTC > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 17:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 07:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 07:00:00| > +---+---+---+ > Time zone is Europe/Amsterdam > +---+---+---+ > |orc|ts_ntz |ts | > +---+---+---+ > |orc|2021-05-31 15:00:00|2021-06-01 00:00:00| > |parquet|2021-06-01 00:00:00|2021-06-01 09:00:00| > |avro |2021-06-01 00:00:00|2021-06-01 09:00:00| > +---+---+---+ -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37463) read/write Timestamp ntz or ltz to Orc uses UTC timestamp
jiaan.geng created SPARK-37463: -- Summary: read/write Timestamp ntz or ltz to Orc uses UTC timestamp Key: SPARK-37463 URL: https://issues.apache.org/jira/browse/SPARK-37463 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: jiaan.geng Here is some example code: import java.util.TimeZone TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles")) sql("set spark.sql.session.timeZone=America/Los_Angeles") val df = sql("select timestamp_ntz '2021-06-01 00:00:00' ts_ntz, timestamp '2021-06-01 00:00:00' ts") df.write.mode("overwrite").orc("ts_ntz_orc") df.write.mode("overwrite").parquet("ts_ntz_parquet") df.write.mode("overwrite").format("avro").save("ts_ntz_avro") val query = """ select 'orc', * from `orc`.`ts_ntz_orc` union all select 'parquet', * from `parquet`.`ts_ntz_parquet` union all select 'avro', * from `avro`.`ts_ntz_avro` """ val tzs = Seq("America/Los_Angeles", "UTC", "Europe/Amsterdam") for (tz <- tzs) { TimeZone.setDefault(TimeZone.getTimeZone(tz)) sql(s"set spark.sql.session.timeZone=$tz") println(s"Time zone is ${TimeZone.getDefault.getID}") sql(query).show(false) } The output shown below looks strange. Time zone is America/Los_Angeles +---+---+---+ |orc|ts_ntz |ts | +---+---+---+ |orc|2021-06-01 00:00:00|2021-06-01 00:00:00| |parquet|2021-06-01 00:00:00|2021-06-01 00:00:00| |avro |2021-06-01 00:00:00|2021-06-01 00:00:00| +---+---+---+ Time zone is UTC +---+---+---+ |orc|ts_ntz |ts | +---+---+---+ |orc|2021-05-31 17:00:00|2021-06-01 00:00:00| |parquet|2021-06-01 00:00:00|2021-06-01 07:00:00| |avro |2021-06-01 00:00:00|2021-06-01 07:00:00| +---+---+---+ Time zone is Europe/Amsterdam +---+---+---+ |orc|ts_ntz |ts | +---+---+---+ |orc|2021-05-31 15:00:00|2021-06-01 00:00:00| |parquet|2021-06-01 00:00:00|2021-06-01 09:00:00| |avro |2021-06-01 00:00:00|2021-06-01 09:00:00| +---+---+---+ -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37462) Avoid unnecessary calculating the number of outstanding fetch requests and RPCS
[ https://issues.apache.org/jira/browse/SPARK-37462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449170#comment-17449170 ] Apache Spark commented on SPARK-37462: -- User 'weixiuli' has created a pull request for this issue: https://github.com/apache/spark/pull/34711 > Avoid unnecessary calculating the number of outstanding fetch requests and > RPCS > > > Key: SPARK-37462 > URL: https://issues.apache.org/jira/browse/SPARK-37462 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.1.0, 3.2.0 >Reporter: weixiuli >Priority: Major > > It is unnecessary to calculate the number of outstanding fetch requests and > RPCS when the IdleStateEvent is not IDLE or the last request is not timeout. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37462) Avoid unnecessary calculating the number of outstanding fetch requests and RPCS
[ https://issues.apache.org/jira/browse/SPARK-37462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37462: Assignee: Apache Spark > Avoid unnecessary calculating the number of outstanding fetch requests and > RPCS > > > Key: SPARK-37462 > URL: https://issues.apache.org/jira/browse/SPARK-37462 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.1.0, 3.2.0 >Reporter: weixiuli >Assignee: Apache Spark >Priority: Major > > It is unnecessary to calculate the number of outstanding fetch requests and > RPCS when the IdleStateEvent is not IDLE or the last request is not timeout. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37462) Avoid unnecessary calculating the number of outstanding fetch requests and RPCS
[ https://issues.apache.org/jira/browse/SPARK-37462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37462: Assignee: (was: Apache Spark) > Avoid unnecessary calculating the number of outstanding fetch requests and > RPCS > > > Key: SPARK-37462 > URL: https://issues.apache.org/jira/browse/SPARK-37462 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.1.0, 3.2.0 >Reporter: weixiuli >Priority: Major > > It is unnecessary to calculate the number of outstanding fetch requests and > RPCS when the IdleStateEvent is not IDLE or the last request is not timeout. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37445) Update hadoop-profile
[ https://issues.apache.org/jira/browse/SPARK-37445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37445: --- Assignee: angerszhu > Update hadoop-profile > - > > Key: SPARK-37445 > URL: https://issues.apache.org/jira/browse/SPARK-37445 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > > Current hadoop profile is hadoop-3.2, update to hadoop-3.3, -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37445) Update hadoop-profile
[ https://issues.apache.org/jira/browse/SPARK-37445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37445. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34689 [https://github.com/apache/spark/pull/34689] > Update hadoop-profile > - > > Key: SPARK-37445 > URL: https://issues.apache.org/jira/browse/SPARK-37445 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.3.0 > > > Current hadoop profile is hadoop-3.2, update to hadoop-3.3, -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
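For illustration (not from the ticket, and the exact profile set may differ by branch): how the profile rename would surface in a Maven build, assuming the standard Spark build profiles.
{code}
# Spark 3.2 line: the Hadoop 3 build is selected with the hadoop-3.2 profile
./build/mvn -DskipTests -Pyarn -Phadoop-3.2 package

# After the proposed update, the profile would track the bundled Hadoop 3.3
./build/mvn -DskipTests -Pyarn -Phadoop-3.3 package
{code}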
[jira] [Updated] (SPARK-37462) Avoid unnecessary calculating the number of outstanding fetch requests and RPCS
[ https://issues.apache.org/jira/browse/SPARK-37462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37462: - Description: It is unnecessary to calculate the number of outstanding fetch requests and RPCS when the IdleStateEvent is not IDLE or the last request has not timed out. (was: To avoid unnecessary calculation of outstanding fetch requests and RPCS) Summary: Avoid unnecessary calculating the number of outstanding fetch requests and RPCS (was: To avoid unnecessary calculation of outstanding fetch requests and RPCS) > Avoid unnecessary calculating the number of outstanding fetch requests and > RPCS > > > Key: SPARK-37462 > URL: https://issues.apache.org/jira/browse/SPARK-37462 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.1.0, 3.2.0 >Reporter: weixiuli >Priority: Major > > It is unnecessary to calculate the number of outstanding fetch requests and > RPCS when the IdleStateEvent is not IDLE or the last request has not timed out. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
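For illustration (not Spark's actual code): a self-contained sketch of the short-circuit the description implies; all names are hypothetical. The point is that the potentially costly counting of outstanding fetches and RPCs runs only when the event is an idle event and the last request has actually timed out.
{code:scala}
object IdleGuardSketch {
  sealed trait Event
  case object AllIdle extends Event // stands in for IdleStateEvent(ALL_IDLE)
  case object Other extends Event

  def countOutstanding(): Int = 3   // stand-in for the potentially costly scan

  def onEvent(evt: Event, nowNs: Long, lastRequestNs: Long, timeoutNs: Long): Unit =
    evt match {
      case AllIdle if nowNs - lastRequestNs > timeoutNs =>
        // Only this path pays for the outstanding-request accounting.
        if (countOutstanding() > 0) println("closing idle channel")
      case _ => () // non-idle event or no timeout: skip the calculation entirely
    }
}
{code}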
[jira] [Updated] (SPARK-37462) To avoid unnecessary calculation of outstanding fetch requests and RPCS
[ https://issues.apache.org/jira/browse/SPARK-37462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37462: - Description: To avoid unnecessary calculation of outstanding fetch requests and RPCS (was: To avoid unnecessary flight request calculations) Summary: To avoid unnecessary calculation of outstanding fetch requests and RPCS (was: To avoid unnecessary flight request calculations) > To avoid unnecessary calculation of outstanding fetch requests and RPCS > --- > > Key: SPARK-37462 > URL: https://issues.apache.org/jira/browse/SPARK-37462 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 3.1.0, 3.2.0 >Reporter: weixiuli >Priority: Major > > To avoid unnecessary calculation of outstanding fetch requests and RPCS -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37311) Migrate ALTER NAMESPACE ... SET LOCATION to use v2 command by default
[ https://issues.apache.org/jira/browse/SPARK-37311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37311. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34708 [https://github.com/apache/spark/pull/34708] > Migrate ALTER NAMESPACE ... SET LOCATION to use v2 command by default > - > > Key: SPARK-37311 > URL: https://issues.apache.org/jira/browse/SPARK-37311 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.3.0 > > > Migrate ALTER NAMESPACE ... SET LOCATION to use v2 command by default -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37311) Migrate ALTER NAMESPACE ... SET LOCATION to use v2 command by default
[ https://issues.apache.org/jira/browse/SPARK-37311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37311: --- Assignee: Terry Kim > Migrate ALTER NAMESPACE ... SET LOCATION to use v2 command by default > - > > Key: SPARK-37311 > URL: https://issues.apache.org/jira/browse/SPARK-37311 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > > Migrate ALTER NAMESPACE ... SET LOCATION to use v2 command by default -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37462) To avoid unnecessary flight request calculations
weixiuli created SPARK-37462: Summary: To avoid unnecessary flight request calculations Key: SPARK-37462 URL: https://issues.apache.org/jira/browse/SPARK-37462 Project: Spark Issue Type: Improvement Components: Shuffle, Spark Core Affects Versions: 3.2.0, 3.1.0 Reporter: weixiuli To avoid unnecessary flight request calculations -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37461) yarn-client mode client's appid value is null
[ https://issues.apache.org/jira/browse/SPARK-37461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449135#comment-17449135 ] Apache Spark commented on SPARK-37461: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/34710 > yarn-client mode client's appid value is null > - > > Key: SPARK-37461 > URL: https://issues.apache.org/jira/browse/SPARK-37461 > Project: Spark > Issue Type: Task > Components: YARN >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37461) yarn-client mode client's appid value is null
[ https://issues.apache.org/jira/browse/SPARK-37461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37461: Assignee: (was: Apache Spark) > yarn-client mode client's appid value is null > - > > Key: SPARK-37461 > URL: https://issues.apache.org/jira/browse/SPARK-37461 > Project: Spark > Issue Type: Task > Components: YARN >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37461) yarn-client mode client's appid value is null
[ https://issues.apache.org/jira/browse/SPARK-37461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37461: Assignee: Apache Spark > yarn-client mode client's appid value is null > - > > Key: SPARK-37461 > URL: https://issues.apache.org/jira/browse/SPARK-37461 > Project: Spark > Issue Type: Task > Components: YARN >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37461) yarn-client mode client's appid value is null
[ https://issues.apache.org/jira/browse/SPARK-37461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449133#comment-17449133 ] Apache Spark commented on SPARK-37461: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/34710 > yarn-client mode client's appid value is null > - > > Key: SPARK-37461 > URL: https://issues.apache.org/jira/browse/SPARK-37461 > Project: Spark > Issue Type: Task > Components: YARN >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37461) yarn-client mode client's appid value is null
angerszhu created SPARK-37461: - Summary: yarn-client mode client's appid value is null Key: SPARK-37461 URL: https://issues.apache.org/jira/browse/SPARK-37461 Project: Spark Issue Type: Task Components: YARN Affects Versions: 3.2.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37259) JDBC read is always going to wrap the query in a select statement
[ https://issues.apache.org/jira/browse/SPARK-37259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449072#comment-17449072 ] Apache Spark commented on SPARK-37259: -- User 'akhalymon-cv' has created a pull request for this issue: https://github.com/apache/spark/pull/34709 > JDBC read is always going to wrap the query in a select statement > - > > Key: SPARK-37259 > URL: https://issues.apache.org/jira/browse/SPARK-37259 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Kevin Appel >Priority: Major > > The read jdbc is wrapping the query it sends to the database server inside a > select statement and there is no way to override this currently. > Initially I ran into this issue when trying to run a CTE query against SQL > server and it fails, the details of the failure is in these cases: > [https://github.com/microsoft/mssql-jdbc/issues/1340] > [https://github.com/microsoft/mssql-jdbc/issues/1657] > [https://github.com/microsoft/sql-spark-connector/issues/147] > https://issues.apache.org/jira/browse/SPARK-32825 > https://issues.apache.org/jira/browse/SPARK-34928 > I started to patch the code to get the query to run and ran into a few > different items, if there is a way to add these features to allow this code > path to run, this would be extremely helpful to running these type of edge > case queries. These are basic examples here the actual queries are much more > complex and would require significant time to rewrite. > Inside JDBCOptions.scala the query is being set to either, using the dbtable > this allows the query to be passed without modification > > {code:java} > name.trim > or > s"(${subquery}) SPARK_GEN_SUBQ_${curId.getAndIncrement()}" > {code} > > Inside JDBCRelation.scala this is going to try to get the schema for this > query, and this ends up running dialect.getSchemaQuery which is doing: > {code:java} > s"SELECT * FROM $table WHERE 1=0"{code} > Overriding the dialect here and initially just passing back the $table gets > passed here and to the next issue which is in the compute function in > JDBCRDD.scala > > {code:java} > val sqlText = s"SELECT $columnList FROM ${options.tableOrQuery} > $myTableSampleClause" + s" $myWhereClause $getGroupByClause $myLimitClause" > > {code} > > For these two queries, about a CTE query and using temp tables, finding out > the schema is difficult without actually running the query and for the temp > table if you run it in the schema check that will have the table now exist > and fail when it runs the actual query. 
> > The way I patched these is by doing these two items: > JDBCRDD.scala (compute) > > {code:java} > val runQueryAsIs = options.parameters.getOrElse("runQueryAsIs", > "false").toBoolean > val sqlText = if (runQueryAsIs) { > s"${options.tableOrQuery}" > } else { > s"SELECT $columnList FROM ${options.tableOrQuery} $myWhereClause" > } > {code} > JDBCRelation.scala (getSchema) > {code:java} > val useCustomSchema = jdbcOptions.parameters.getOrElse("useCustomSchema", > "false").toBoolean > if (useCustomSchema) { > val myCustomSchema = jdbcOptions.parameters.getOrElse("customSchema", > "").toString > val newSchema = CatalystSqlParser.parseTableSchema(myCustomSchema) > logInfo(s"Going to return the new $newSchema because useCustomSchema is > $useCustomSchema and passed in $myCustomSchema") > newSchema > } else { > val tableSchema = JDBCRDD.resolveTable(jdbcOptions) > jdbcOptions.customSchema match { > case Some(customSchema) => JdbcUtils.getCustomSchema( > tableSchema, customSchema, resolver) > case None => tableSchema > } > }{code} > > This is allowing the query to run as is, by using the dbtable option and then > provide a custom schema that will bypass the dialect schema check > > Test queries > > {code:java} > query1 = """ > SELECT 1 as DummyCOL > """ > query2 = """ > WITH DummyCTE AS > ( > SELECT 1 as DummyCOL > ) > SELECT * > FROM DummyCTE > """ > query3 = """ > (SELECT * > INTO #Temp1a > FROM > (SELECT @@VERSION as version) data > ) > (SELECT * > FROM > #Temp1a) > """ > {code} > > Test schema > > {code:java} > schema1 = """ > DummyXCOL INT > """ > schema2 = """ > DummyXCOL STRING > """ > {code} > > Test code > > {code:java} > jdbcDFWorking = ( > spark.read.format("jdbc") > .option("url", > f"jdbc:sqlserver://{server}:{port};databaseName={database};") > .option("user", user) > .option("password", password) > .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") > .option("dbtable", queryx) > .option("customSchema", schemax) > .option("useCustomSchema", "true") > .option("runQueryAsIs", "true") >
[jira] [Commented] (SPARK-37259) JDBC read is always going to wrap the query in a select statement
[ https://issues.apache.org/jira/browse/SPARK-37259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449067#comment-17449067 ] Andrew commented on SPARK-37259: I've created another PR following [~KevinAppelBofa] idea to unwrap the query and pass schema manually when the user chooses to do so. If let's say an option 'useRawQuery' passed, the query will run as-is. The downside of this user has to provide schema manually. However, there are also advantages, as we are not running the query twice, and the user doesn't have to modify the query and split the 'with' clause. PR link is https://github.com/apache/spark/pull/34709 [~KevinAppelBofa] [~petertoth] what are your thoughts on this? > JDBC read is always going to wrap the query in a select statement > - > > Key: SPARK-37259 > URL: https://issues.apache.org/jira/browse/SPARK-37259 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Kevin Appel >Priority: Major > > The read jdbc is wrapping the query it sends to the database server inside a > select statement and there is no way to override this currently. > Initially I ran into this issue when trying to run a CTE query against SQL > server and it fails, the details of the failure is in these cases: > [https://github.com/microsoft/mssql-jdbc/issues/1340] > [https://github.com/microsoft/mssql-jdbc/issues/1657] > [https://github.com/microsoft/sql-spark-connector/issues/147] > https://issues.apache.org/jira/browse/SPARK-32825 > https://issues.apache.org/jira/browse/SPARK-34928 > I started to patch the code to get the query to run and ran into a few > different items, if there is a way to add these features to allow this code > path to run, this would be extremely helpful to running these type of edge > case queries. These are basic examples here the actual queries are much more > complex and would require significant time to rewrite. > Inside JDBCOptions.scala the query is being set to either, using the dbtable > this allows the query to be passed without modification > > {code:java} > name.trim > or > s"(${subquery}) SPARK_GEN_SUBQ_${curId.getAndIncrement()}" > {code} > > Inside JDBCRelation.scala this is going to try to get the schema for this > query, and this ends up running dialect.getSchemaQuery which is doing: > {code:java} > s"SELECT * FROM $table WHERE 1=0"{code} > Overriding the dialect here and initially just passing back the $table gets > passed here and to the next issue which is in the compute function in > JDBCRDD.scala > > {code:java} > val sqlText = s"SELECT $columnList FROM ${options.tableOrQuery} > $myTableSampleClause" + s" $myWhereClause $getGroupByClause $myLimitClause" > > {code} > > For these two queries, about a CTE query and using temp tables, finding out > the schema is difficult without actually running the query and for the temp > table if you run it in the schema check that will have the table now exist > and fail when it runs the actual query. 
> > The way I patched these is by doing these two items: > JDBCRDD.scala (compute) > > {code:java} > val runQueryAsIs = options.parameters.getOrElse("runQueryAsIs", > "false").toBoolean > val sqlText = if (runQueryAsIs) { > s"${options.tableOrQuery}" > } else { > s"SELECT $columnList FROM ${options.tableOrQuery} $myWhereClause" > } > {code} > JDBCRelation.scala (getSchema) > {code:java} > val useCustomSchema = jdbcOptions.parameters.getOrElse("useCustomSchema", > "false").toBoolean > if (useCustomSchema) { > val myCustomSchema = jdbcOptions.parameters.getOrElse("customSchema", > "").toString > val newSchema = CatalystSqlParser.parseTableSchema(myCustomSchema) > logInfo(s"Going to return the new $newSchema because useCustomSchema is > $useCustomSchema and passed in $myCustomSchema") > newSchema > } else { > val tableSchema = JDBCRDD.resolveTable(jdbcOptions) > jdbcOptions.customSchema match { > case Some(customSchema) => JdbcUtils.getCustomSchema( > tableSchema, customSchema, resolver) > case None => tableSchema > } > }{code} > > This is allowing the query to run as is, by using the dbtable option and then > provide a custom schema that will bypass the dialect schema check > > Test queries > > {code:java} > query1 = """ > SELECT 1 as DummyCOL > """ > query2 = """ > WITH DummyCTE AS > ( > SELECT 1 as DummyCOL > ) > SELECT * > FROM DummyCTE > """ > query3 = """ > (SELECT * > INTO #Temp1a > FROM > (SELECT @@VERSION as version) data > ) > (SELECT * > FROM > #Temp1a) > """ > {code} > > Test schema > > {code:java} > schema1 = """ > DummyXCOL INT > """ > schema2 = """ > DummyXCOL STRING > """ > {code} > > Test code > > {code:java} >
[jira] [Commented] (SPARK-37460) ALTER (DATABASE|SCHEMA|NAMESPACE) ... SET LOCATION command not documented
[ https://issues.apache.org/jira/browse/SPARK-37460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449049#comment-17449049 ] Yuto Akutsu commented on SPARK-37460: - I'm working on this. > ALTER (DATABASE|SCHEMA|NAMESPACE) ... SET LOCATION command not documented > - > > Key: SPARK-37460 > URL: https://issues.apache.org/jira/browse/SPARK-37460 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 3.2.0 >Reporter: Yuto Akutsu >Priority: Minor > > The instruction of {color:#ff}ALTER DATABASE ... SET LOCATION ... > {color:#172b4d}command{color}{color} should be documented in a > sql-ref-syntax-ddl-create-table page. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37357) Add small partition factor for rebalance partitions
[ https://issues.apache.org/jira/browse/SPARK-37357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37357. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34634 [https://github.com/apache/spark/pull/34634] > Add small partition factor for rebalance partitions > --- > > Key: SPARK-37357 > URL: https://issues.apache.org/jira/browse/SPARK-37357 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Priority: Major > Fix For: 3.3.0 > > > For example `Rebalance` provide a functionality that split the large reduce > partition into smalls. However we have seen many SQL produce small files due > to the last partition. > Let's say we have one reduce partition and six map partitions and the blocks > are: > [10, 10, 10, 10, 10, 10] > If the target size is 50, we will get two files with 50 and 10. And it will > get worse if there are thousands of reduce partitions. > It should be helpful if we can control the min partition size. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
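For illustration (not Spark's actual implementation): a minimal sketch of the packing rule the description implies. A trailing partition smaller than smallFactor * targetSize is merged into its predecessor instead of becoming a tiny file; the function and parameter names are made up for the sketch.
{code:scala}
def pack(blocks: Seq[Long], targetSize: Long, smallFactor: Double): Seq[Long] = {
  val packed = scala.collection.mutable.ArrayBuffer.empty[Long]
  var current = 0L
  for (b <- blocks) {
    if (current >= targetSize) { packed += current; current = 0L }
    current += b
  }
  if (current > 0) {
    if (packed.nonEmpty && current < smallFactor * targetSize) {
      packed(packed.length - 1) += current // merge the small tail upstream
    } else {
      packed += current
    }
  }
  packed.toSeq
}

// The example from the description: six 10-byte blocks, target size 50.
pack(Seq(10, 10, 10, 10, 10, 10), 50, 0.0) // List(50, 10) -- a 10-byte tail file
pack(Seq(10, 10, 10, 10, 10, 10), 50, 0.5) // List(60)     -- the tail is merged
{code}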
[jira] [Assigned] (SPARK-37357) Add small partition factor for rebalance partitions
[ https://issues.apache.org/jira/browse/SPARK-37357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37357: --- Assignee: XiDuo You > Add small partition factor for rebalance partitions > --- > > Key: SPARK-37357 > URL: https://issues.apache.org/jira/browse/SPARK-37357 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Fix For: 3.3.0 > > > For example `Rebalance` provide a functionality that split the large reduce > partition into smalls. However we have seen many SQL produce small files due > to the last partition. > Let's say we have one reduce partition and six map partitions and the blocks > are: > [10, 10, 10, 10, 10, 10] > If the target size is 50, we will get two files with 50 and 10. And it will > get worse if there are thousands of reduce partitions. > It should be helpful if we can control the min partition size. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37311) Migrate ALTER NAMESPACE ... SET LOCATION to use v2 command by default
[ https://issues.apache.org/jira/browse/SPARK-37311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449025#comment-17449025 ] Apache Spark commented on SPARK-37311: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/34708 > Migrate ALTER NAMESPACE ... SET LOCATION to use v2 command by default > - > > Key: SPARK-37311 > URL: https://issues.apache.org/jira/browse/SPARK-37311 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Priority: Major > > Migrate ALTER NAMESPACE ... SET LOCATION to use v2 command by default -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37311) Migrate ALTER NAMESPACE ... SET LOCATION to use v2 command by default
[ https://issues.apache.org/jira/browse/SPARK-37311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37311: Assignee: Apache Spark > Migrate ALTER NAMESPACE ... SET LOCATION to use v2 command by default > - > > Key: SPARK-37311 > URL: https://issues.apache.org/jira/browse/SPARK-37311 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Assignee: Apache Spark >Priority: Major > > Migrate ALTER NAMESPACE ... SET LOCATION to use v2 command by default -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37311) Migrate ALTER NAMESPACE ... SET LOCATION to use v2 command by default
[ https://issues.apache.org/jira/browse/SPARK-37311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37311: Assignee: (was: Apache Spark) > Migrate ALTER NAMESPACE ... SET LOCATION to use v2 command by default > - > > Key: SPARK-37311 > URL: https://issues.apache.org/jira/browse/SPARK-37311 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Priority: Major > > Migrate ALTER NAMESPACE ... SET LOCATION to use v2 command by default -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37460) ALTER (DATABASE|SCHEMA|NAMESPACE) ... SET LOCATION command not documented
[ https://issues.apache.org/jira/browse/SPARK-37460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuto Akutsu updated SPARK-37460: Description: The instruction of {color:#ff}ALTER DATABASE ... SET LOCATION ... {color:#172b4d}command{color}{color} should be documented in a sql-ref-syntax-ddl-create-table page. (was: The instruction of {color:#ff}ALTER DATABASE ... SET LOCATION ...{color:#172b4d} command{color}{color}{color:#172b4d} {color}should be documented in a sql-ref-syntax-ddl-create-table page.) > ALTER (DATABASE|SCHEMA|NAMESPACE) ... SET LOCATION command not documented > - > > Key: SPARK-37460 > URL: https://issues.apache.org/jira/browse/SPARK-37460 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 3.2.0 >Reporter: Yuto Akutsu >Priority: Minor > > The instruction of {color:#ff}ALTER DATABASE ... SET LOCATION ... > {color:#172b4d}command{color}{color} should be documented in a > sql-ref-syntax-ddl-create-table page. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37460) ALTER (DATABASE|SCHEMA|NAMESPACE) ... SET LOCATION command not documented
[ https://issues.apache.org/jira/browse/SPARK-37460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuto Akutsu updated SPARK-37460: Description: The instruction of {color:#FF}ALTER DATABASE ... SET LOCATION ... {color:#172b4d}command{color}{color} should be documented in a sql-ref-syntax-ddl-create-table page. (was: The instruction of `ALTER DATABASE ... SET LOCATION ...` should be documented in a `sql-ref-syntax-ddl-create-table` page.) > ALTER (DATABASE|SCHEMA|NAMESPACE) ... SET LOCATION command not documented > - > > Key: SPARK-37460 > URL: https://issues.apache.org/jira/browse/SPARK-37460 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 3.2.0 >Reporter: Yuto Akutsu >Priority: Minor > > The instruction of {color:#FF}ALTER DATABASE ... SET LOCATION ... > {color:#172b4d}command{color}{color} should be documented in a > sql-ref-syntax-ddl-create-table page. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37460) ALTER (DATABASE|SCHEMA|NAMESPACE) ... SET LOCATION command not documented
[ https://issues.apache.org/jira/browse/SPARK-37460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuto Akutsu updated SPARK-37460:
--------------------------------

    Description: The instruction of {color:#ff}ALTER DATABASE ... SET LOCATION ...{color:#172b4d} command{color}{color}{color:#172b4d} {color}should be documented in a sql-ref-syntax-ddl-create-table page.
          (was: The instruction of {color:#FF}ALTER DATABASE ... SET LOCATION ... {color:#172b4d}command{color}{color} should be documented in a sql-ref-syntax-ddl-create-table page.)

> ALTER (DATABASE|SCHEMA|NAMESPACE) ... SET LOCATION command not documented
> --------------------------------------------------------------------------
>
>                 Key: SPARK-37460
>                 URL: https://issues.apache.org/jira/browse/SPARK-37460
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, SQL
>    Affects Versions: 3.2.0
>            Reporter: Yuto Akutsu
>            Priority: Minor
>
> The instruction of {color:#ff}ALTER DATABASE ... SET LOCATION ...{color:#172b4d} command{color}{color}{color:#172b4d} {color}should be documented in a sql-ref-syntax-ddl-create-table page.
[jira] [Created] (SPARK-37460) ALTER (DATABASE|SCHEMA|NAMESPACE) ... SET LOCATION command not documented
Yuto Akutsu created SPARK-37460:
-----------------------------------

             Summary: ALTER (DATABASE|SCHEMA|NAMESPACE) ... SET LOCATION command not documented
                 Key: SPARK-37460
                 URL: https://issues.apache.org/jira/browse/SPARK-37460
             Project: Spark
          Issue Type: Improvement
          Components: Documentation, SQL
    Affects Versions: 3.2.0
            Reporter: Yuto Akutsu

The instruction of `ALTER DATABASE ... SET LOCATION ...` should be documented in a `sql-ref-syntax-ddl-create-table` page.
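Per the ticket title, the three keywords are synonyms in Spark SQL. A minimal sketch of the undocumented command follows; the database name `inventory` and the paths are made-up examples, not from the ticket.

{code}
-- DATABASE, SCHEMA and NAMESPACE are interchangeable here; names and
-- paths are illustrative only.
ALTER DATABASE inventory SET LOCATION 'hdfs://namenode:8020/warehouse/inventory.db';
ALTER SCHEMA inventory SET LOCATION '/tmp/warehouse/inventory.db';
ALTER NAMESPACE inventory SET LOCATION '/tmp/warehouse/inventory.db';
{code}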
[jira] [Resolved] (SPARK-37454) support expressions in time travel timestamp
[ https://issues.apache.org/jira/browse/SPARK-37454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-37454.
---------------------------------

    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 34699
[https://github.com/apache/spark/pull/34699]

> support expressions in time travel timestamp
> ---------------------------------------------
>
>                 Key: SPARK-37454
>                 URL: https://issues.apache.org/jira/browse/SPARK-37454
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>             Fix For: 3.3.0
>
[jira] [Assigned] (SPARK-37454) support expressions in time travel timestamp
[ https://issues.apache.org/jira/browse/SPARK-37454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-37454:
-----------------------------------

    Assignee: Wenchen Fan

> support expressions in time travel timestamp
> ---------------------------------------------
>
>                 Key: SPARK-37454
>                 URL: https://issues.apache.org/jira/browse/SPARK-37454
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>
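A minimal sketch of what this change enables, assuming a table `t` backed by a data source that implements time travel; the table name and timestamp values are placeholders.

{code}
-- Literal timestamp: the form already accepted before this change.
SELECT * FROM t TIMESTAMP AS OF '2021-11-25 10:00:00';
-- An expression that evaluates to a timestamp: what SPARK-37454 adds.
SELECT * FROM t TIMESTAMP AS OF current_timestamp() - INTERVAL 12 HOURS;
{code}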