[jira] [Comment Edited] (SPARK-35161) Safe version SQL functions

2021-04-20 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326256#comment-17326256
 ] 

Gengliang Wang edited comment on SPARK-35161 at 4/21/21, 5:51 AM:
--

cc [~beliefer] [~angerszhuuu] Are you interested in these new features?


was (Author: gengliang.wang):
cc [~beliefer][~angerszhuuu] Are you interested in these new features?

> Safe version SQL functions
> --
>
> Key: SPARK-35161
> URL: https://issues.apache.org/jira/browse/SPARK-35161
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Priority: Major
>
> Create new safe version SQL functions for existing SQL functions/operators, 
> which return NULL if an overflow or error occurs, so that:
> 1. Users can finish queries without interruption in ANSI mode.
> 2. Users get NULLs instead of unreasonable results if overflow occurs when 
> ANSI mode is off.
> For example, the behavior of the following SQL operations is unreasonable:
> {code:java}
> 2147483647 + 2 => -2147483647
> CAST(2147483648L AS INT) => -2147483648
> {code}
> With the new safe version SQL functions:
> {code:java}
> TRY_ADD(2147483647, 2) => null
> TRY_CAST(2147483648L AS INT) => null
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35161) Safe version SQL functions

2021-04-20 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326272#comment-17326272
 ] 

angerszhu commented on SPARK-35161:
---

Got it.

> Safe version SQL functions
> --
>
> Key: SPARK-35161
> URL: https://issues.apache.org/jira/browse/SPARK-35161
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Priority: Major
>
> Create new safe version SQL functions for existing SQL functions/operators, 
> which return NULL if an overflow or error occurs, so that:
> 1. Users can finish queries without interruption in ANSI mode.
> 2. Users get NULLs instead of unreasonable results if overflow occurs when 
> ANSI mode is off.
> For example, the behavior of the following SQL operations is unreasonable:
> {code:java}
> 2147483647 + 2 => -2147483647
> CAST(2147483648L AS INT) => -2147483648
> {code}
> With the new safe version SQL functions:
> {code:java}
> TRY_ADD(2147483647, 2) => null
> TRY_CAST(2147483648L AS INT) => null
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35161) Safe version SQL functions

2021-04-20 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35161:
---
Description: 
Create new safe version SQL functions for existing SQL functions/operators, 
which return NULL if an overflow or error occurs, so that:
1. Users can finish queries without interruption in ANSI mode.
2. Users get NULLs instead of unreasonable results if overflow occurs when 
ANSI mode is off.
For example, the behavior of the following SQL operations is unreasonable:
{code:java}
2147483647 + 2 => -2147483647
CAST(2147483648L AS INT) => -2147483648
{code}
With the new safe version SQL functions:
{code:java}
TRY_ADD(2147483647, 2) => null
TRY_CAST(2147483648L AS INT) => null
{code}

  was:
Create new safe version SQL functions for existing SQL functions/operators, 
which return NULL if an overflow or error occurs, so that:
1. Users can finish queries without interruption in ANSI mode.
2. Even when ANSI mode is off, the results can be more reasonable. For example, 
the results of the following operations are terrible:
{code:java}
2147483647 + 2 => -2147483647
CAST(2147483648L AS INT) => -2147483648
{code}
Having the safe version SQL functions provides an alternative solution for 
handling such cases
{code:java}
TRY_ADD(2147483647, 2) => null
TRY_CAST(2147483648L AS INT) => null
{code}


> Safe version SQL functions
> --
>
> Key: SPARK-35161
> URL: https://issues.apache.org/jira/browse/SPARK-35161
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Priority: Major
>
> Create new safe version SQL functions for existing SQL functions/operators, 
> which return NULL if an overflow or error occurs, so that:
> 1. Users can finish queries without interruption in ANSI mode.
> 2. Users get NULLs instead of unreasonable results if overflow occurs when 
> ANSI mode is off.
> For example, the behavior of the following SQL operations is unreasonable:
> {code:java}
> 2147483647 + 2 => -2147483647
> CAST(2147483648L AS INT) => -2147483648
> {code}
> With the new safe version SQL functions:
> {code:java}
> TRY_ADD(2147483647, 2) => null
> TRY_CAST(2147483648L AS INT) => null
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33966) Two-tier encryption key management

2021-04-20 Thread Gidon Gershinsky (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gidon Gershinsky updated SPARK-33966:
-
Target Version/s:   (was: 3.2.0)

> Two-tier encryption key management
> --
>
> Key: SPARK-33966
> URL: https://issues.apache.org/jira/browse/SPARK-33966
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gidon Gershinsky
>Priority: Major
>
> Columnar data formats (Parquet and ORC) have recently added a column 
> encryption capability. The data protection follows the practice of envelope 
> encryption, where the Data Encryption Key (DEK) is freshly generated for each 
> file/column, and is encrypted with a master key (or an intermediate key, that 
> is in turn encrypted with a master key). The master keys are kept in a 
> centralized Key Management Service (KMS) - meaning that each Spark worker 
> needs to interact with a (typically slow) KMS server. 
> This Jira (and its sub-tasks) introduces an alternative approach that, on one 
> hand, preserves the best practice of generating fresh encryption keys for each 
> data file/column, and on the other hand allows Spark clusters to have a 
> scalable interaction with a KMS server by delegating it to the application 
> driver. This is done via two-tier management of the keys, where a random Key 
> Encryption Key (KEK) is generated by the driver, encrypted by the master key 
> in the KMS, and distributed by the driver to the workers, so they can use it 
> to encrypt the DEKs generated there by the Parquet or ORC libraries. In the 
> workers, the KEKs are distributed to the executors/threads in the write path. 
> In the read path, the encrypted KEKs are fetched by workers from file 
> metadata, decrypted via interaction with the driver, and shared among the 
> executors/threads.
> The KEK layer further improves the scalability of key management, because 
> neither the driver nor the workers need to interact with the KMS for each 
> file/column.
> Stand-alone Parquet/ORC libraries (without Spark) and/or other frameworks 
> (e.g., Presto, pandas) must be able to read/decrypt files written/encrypted by 
> this Spark-driven key management mechanism, and vice versa (of course, only if 
> both sides have proper authorisation for using the master keys in the KMS).
> A link to a discussion/design doc is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35161) Safe version SQL functions

2021-04-20 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35161:
---
Description: 
Create new safe version SQL functions for existing SQL functions/operators, 
which return NULL if an overflow or error occurs, so that:
1. Users can finish queries without interruption in ANSI mode.
2. Even when ANSI mode is off, the results can be more reasonable. For example, 
the results of the following operations are terrible:
{code:java}
2147483647 + 2 => -2147483647
CAST(2147483648L AS INT) => -2147483648
{code}
Having the safe version SQL functions provides an alternative solution for 
handling such cases
{code:java}
TRY_ADD(2147483647, 2) => null
TRY_CAST(2147483648L AS INT) => null
{code}

  was:
Create new safe version SQL functions for existing SQL functions/operators, 
which return NULL if an overflow or error occurs, so that:
1. Users can finish queries without interruption.
2. The results can be more reasonable. For example, the results of the following 
operations are terrible:
{code:java}
2147483647 + 2 => -2147483647
CAST(2147483648L AS INT) => -2147483648
{code}
Having the safe version SQL functions provides an alternative solution for 
handling such cases
{code:java}
TRY_ADD(2147483647, 2) => null
TRY_CAST(2147483648L AS INT) => null
{code}


> Safe version SQL functions
> --
>
> Key: SPARK-35161
> URL: https://issues.apache.org/jira/browse/SPARK-35161
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Priority: Major
>
> Create new safe version SQL functions for existing SQL functions/operators, 
> which return NULL if an overflow or error occurs, so that:
> 1. Users can finish queries without interruption in ANSI mode.
> 2. Even when ANSI mode is off, the results can be more reasonable. For 
> example, the results of the following operations are terrible:
> {code:java}
> 2147483647 + 2 => -2147483647
> CAST(2147483648L AS INT) => -2147483648
> {code}
> Having the safe version SQL functions provides an alternative solution for 
> handling such cases
> {code:java}
> TRY_ADD(2147483647, 2) => null
> TRY_CAST(2147483648L AS INT) => null
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35161) Safe version SQL functions

2021-04-20 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326256#comment-17326256
 ] 

Gengliang Wang commented on SPARK-35161:


cc [~beliefer][~angerszhuuu] Are you interested in these new features?

> Safe version SQL functions
> --
>
> Key: SPARK-35161
> URL: https://issues.apache.org/jira/browse/SPARK-35161
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Priority: Major
>
> Create new safe version SQL functions for existing SQL functions/operators, 
> which return NULL if an overflow or error occurs, so that:
> 1. Users can finish queries without interruption.
> 2. The results can be more reasonable. For example, the results of the 
> following operations are terrible:
> {code:java}
> 2147483647 + 2 => -2147483647
> CAST(2147483648L AS INT) => -2147483648
> {code}
> Having the safe version SQL functions provides an alternative solution for 
> handling such cases
> {code:java}
> TRY_ADD(2147483647, 2) => null
> TRY_CAST(2147483648L AS INT) => null
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35167) New SQL function: TRY_NEGATIVE

2021-04-20 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35167:
--

 Summary: New SQL function: TRY_NEGATIVE
 Key: SPARK-35167
 URL: https://issues.apache.org/jira/browse/SPARK-35167
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35166) New SQL function: TRY_DIV

2021-04-20 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35166:
--

 Summary: New SQL function: TRY_DIV
 Key: SPARK-35166
 URL: https://issues.apache.org/jira/browse/SPARK-35166
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang


This is for integral divide
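
A rough sketch of the intended behaviour, assuming TRY_DIV mirrors the existing 
integral divide operator (div) but returns NULL instead of failing where ANSI 
mode would raise an error (the name and semantics are what this sub-task 
proposes; nothing here is implemented yet):
{code:sql}
SELECT 7 div 2;        -- 3, existing integral divide
SELECT TRY_DIV(7, 2);  -- 3
SELECT TRY_DIV(1, 0);  -- NULL instead of a divide-by-zero error under ANSI mode
{code}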



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35164) New SQL function: TRY_MULTIPLY

2021-04-20 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35164:
--

 Summary: New SQL function: TRY_MULTIPLY
 Key: SPARK-35164
 URL: https://issues.apache.org/jira/browse/SPARK-35164
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35165) New SQL function: TRY_DIVIDE

2021-04-20 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35165:
--

 Summary: New SQL function: TRY_DIVIDE
 Key: SPARK-35165
 URL: https://issues.apache.org/jira/browse/SPARK-35165
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35163) New SQL function: TRY_SUBTRACT

2021-04-20 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35163:
--

 Summary: New SQL function: TRY_SUBTRACT
 Key: SPARK-35163
 URL: https://issues.apache.org/jira/browse/SPARK-35163
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35162) New SQL function: TRY_ADD

2021-04-20 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35162:
--

 Summary: New SQL function: TRY_ADD
 Key: SPARK-35162
 URL: https://issues.apache.org/jira/browse/SPARK-35162
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34881) New SQL Function: TRY_CAST

2021-04-20 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-34881:
---
Parent: SPARK-35161
Issue Type: Sub-task  (was: New Feature)

> New SQL Function: TRY_CAST
> --
>
> Key: SPARK-34881
> URL: https://issues.apache.org/jira/browse/SPARK-34881
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Add a new SQL function try_cast. try_cast is identical to CAST with 
> `spark.sql.ansi.enabled` as true, except it returns NULL instead of raising 
> an error. This expression has one major difference from `cast` with 
> `spark.sql.ansi.enabled` as true: when the source value can't be stored in 
> the target integral(Byte/Short/Int/Long) type, `try_cast` returns null 
> instead of returning the low order bytes of the source value.
> This is learned from Google BigQuery and Snowflake:
> https://docs.snowflake.com/en/sql-reference/functions/try_cast.html
> https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#safe_casting
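> A minimal illustration of the difference, reusing the overflow example from the 
> umbrella ticket SPARK-35161 (expected behaviour as described above, not output 
> captured from an actual build):
> {code:sql}
> -- with spark.sql.ansi.enabled = true
> SELECT CAST(2147483648L AS INT);     -- raises an overflow error
> SELECT TRY_CAST(2147483648L AS INT); -- returns NULL
> {code}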



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35161) Safe version SQL functions

2021-04-20 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-35161:
---
Issue Type: Umbrella  (was: New Feature)

> Safe version SQL functions
> --
>
> Key: SPARK-35161
> URL: https://issues.apache.org/jira/browse/SPARK-35161
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Priority: Major
>
> Create new safe version SQL functions for existing SQL functions/operators, 
> which return NULL if an overflow or error occurs, so that:
> 1. Users can finish queries without interruption.
> 2. The results can be more reasonable. For example, the results of the 
> following operations are terrible:
> {code:java}
> 2147483647 + 2 => -2147483647
> CAST(2147483648L AS INT) => -2147483648
> {code}
> Having the safe version SQL functions provides an alternative solution for 
> handling such cases
> {code:java}
> TRY_ADD(2147483647, 2) => null
> TRY_CAST(2147483648L AS INT) => null
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35112) Cast string to day-time interval

2021-04-20 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326252#comment-17326252
 ] 

angerszhu commented on SPARK-35112:
---

Working on this

> Cast string to day-time interval
> 
>
> Key: SPARK-35112
> URL: https://issues.apache.org/jira/browse/SPARK-35112
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
>
> Support casting of strings to DayTimeIntervalType. The cast should support both 
> the full form INTERVAL '1 10:11:12' DAY TO SECOND and the interval payload 
> alone, '1 10:11:12'.
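> A sketch of the casts this would enable (illustrative only; the accepted forms 
> are exactly what this ticket proposes and are not implemented yet):
> {code:sql}
> SELECT CAST("INTERVAL '1 10:11:12' DAY TO SECOND" AS INTERVAL DAY TO SECOND);
> SELECT CAST('1 10:11:12' AS INTERVAL DAY TO SECOND);
> {code}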



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35111) Cast string to year-month interval

2021-04-20 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326251#comment-17326251
 ] 

angerszhu commented on SPARK-35111:
---

Working on this.

> Cast string to year-month interval
> --
>
> Key: SPARK-35111
> URL: https://issues.apache.org/jira/browse/SPARK-35111
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
>
> Support casting of strings to YearMonthIntervalType. The cast should support 
> both the full form INTERVAL '1-1' YEAR TO MONTH and the interval payload alone, 
> '1-1'.
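> Analogously to SPARK-35112, this would allow casts such as the following 
> (illustrative sketch only):
> {code:sql}
> SELECT CAST('1-1' AS INTERVAL YEAR TO MONTH);  -- 1 year 1 month
> {code}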



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35161) Safe version SQL functions

2021-04-20 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35161:
--

 Summary: Safe version SQL functions
 Key: SPARK-35161
 URL: https://issues.apache.org/jira/browse/SPARK-35161
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang


Create new safe version SQL functions for existing SQL functions/operators, 
which return NULL if an overflow or error occurs, so that:
1. Users can finish queries without interruption.
2. The results can be more reasonable. For example, the results of the following 
operations are terrible:
{code:java}
2147483647 + 2 => -2147483647
CAST(2147483648L AS INT) => -2147483648
{code}
Having the safe version SQL functions provides an alternative solution for 
handling such cases
{code:java}
TRY_ADD(2147483647, 2) => null
TRY_CAST(2147483648L AS INT) => null
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35113) Support ANSI intervals in the Hash expression

2021-04-20 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-35113.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32259
[https://github.com/apache/spark/pull/32259]

> Support ANSI intervals in the Hash expression
> -
>
> Key: SPARK-35113
> URL: https://issues.apache.org/jira/browse/SPARK-35113
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> Handle YearMonthIntervalType and DayTimeIntervalType in HashExpression, and 
> write tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35113) Support ANSI intervals in the Hash expression

2021-04-20 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-35113:


Assignee: angerszhu

> Support ANSI intervals in the Hash expression
> -
>
> Key: SPARK-35113
> URL: https://issues.apache.org/jira/browse/SPARK-35113
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: angerszhu
>Priority: Major
>
> Handle YearMonthIntervalType and DayTimeIntervalType in HashExpression, and 
> write tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35159) extract doc of hive format

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326225#comment-17326225
 ] 

Apache Spark commented on SPARK-35159:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/32264

> extract doc of hive format
> --
>
> Key: SPARK-35159
> URL: https://issues.apache.org/jira/browse/SPARK-35159
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> extract doc of hive format



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35159) extract doc of hive format

2021-04-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35159:


Assignee: (was: Apache Spark)

> extract doc of hive format
> --
>
> Key: SPARK-35159
> URL: https://issues.apache.org/jira/browse/SPARK-35159
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> extract doc of hive format



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35159) extract doc of hive format

2021-04-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35159:


Assignee: Apache Spark

> extract doc of hive format
> --
>
> Key: SPARK-35159
> URL: https://issues.apache.org/jira/browse/SPARK-35159
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> extract doc of hive format



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35159) extract doc of hive format

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326224#comment-17326224
 ] 

Apache Spark commented on SPARK-35159:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/32264

> extract doc of hive format
> --
>
> Key: SPARK-35159
> URL: https://issues.apache.org/jira/browse/SPARK-35159
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> extract doc of hive format



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34928) CTE Execution fails for Sql Server

2021-04-20 Thread Supun De Silva (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Supun De Silva updated SPARK-34928:
---
Description: 
h2. Issue

We have a simple SQL statement that we intend to execute on SQL Server. This 
has a CTE component.

Execution of this yields an error like the following:
{code:java}
java.sql.SQLException: Incorrect syntax near the keyword 'WITH'.{code}
We are using the JDBC driver *net.sourceforge.jtds.jdbc.Driver* (version 1.3.1).

This is a particularly annoying issue, and because of it we are having to write 
inner queries that are a fair bit less efficient (an illustrative rewrite is 
sketched after the SQL statement below).
h2. SQL statement

(not the actual one but a simplified version with renamed parameters)

 
{code:sql}
WITH OldChanges as (
   SELECT distinct
      SomeDate,
      Name
   FROM [dbo].[DateNameFoo] (nolock)
   WHERE SomeDate != '2021-03-30'
   AND convert(date, UpdateDateTime) = '2021-03-31'
)
SELECT * from OldChanges {code}
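The kind of CTE-free inner-query rewrite mentioned above might look roughly like 
this (an illustrative sketch using the same renamed parameters, not the actual 
workload):
{code:sql}
SELECT *
FROM (
   SELECT distinct
      SomeDate,
      Name
   FROM [dbo].[DateNameFoo] (nolock)
   WHERE SomeDate != '2021-03-30'
   AND convert(date, UpdateDateTime) = '2021-03-31'
) AS OldChanges
{code}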
h3. Update on 2021-04-21

We tried the *com.microsoft.sqlserver.jdbc.SQLServerDriver* driver as well; it 
also yields the same error.

  was:
h2. Issue

We have a simple SQL statement that we intend to execute on SQL Server. This 
has a CTE component.

Execution of this yields an error like the following:
{code:java}
java.sql.SQLException: Incorrect syntax near the keyword 'WITH'.{code}
We are using the JDBC driver *net.sourceforge.jtds.jdbc.Driver* (version 1.3.1).

This is a particularly annoying issue, and because of it we are having to write 
inner queries that are a fair bit less efficient.
h2. SQL statement

(not the actual one but a simplified version with renamed parameters)

 
{code:sql}
WITH OldChanges as (
   SELECT distinct
      SomeDate,
      Name
   FROM [dbo].[DateNameFoo] (nolock)
   WHERE SomeDate != '2021-03-30'
   AND convert(date, UpdateDateTime) = '2021-03-31'
)
SELECT * from OldChanges {code}


> CTE Execution fails for Sql Server
> --
>
> Key: SPARK-34928
> URL: https://issues.apache.org/jira/browse/SPARK-34928
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Supun De Silva
>Priority: Minor
>
> h2. Issue
> We have a simple SQL statement that we intend to execute on SQL Server. This 
> has a CTE component.
> Execution of this yields an error like the following:
> {code:java}
> java.sql.SQLException: Incorrect syntax near the keyword 'WITH'.{code}
> We are using the JDBC driver *net.sourceforge.jtds.jdbc.Driver* (version 
> 1.3.1).
> This is a particularly annoying issue, and because of it we are having to write 
> inner queries that are a fair bit less efficient.
> h2. SQL statement
> (not the actual one but a simplified version with renamed parameters)
> {code:sql}
> WITH OldChanges as (
>    SELECT distinct
>       SomeDate,
>       Name
>    FROM [dbo].[DateNameFoo] (nolock)
>    WHERE SomeDate != '2021-03-30'
>    AND convert(date, UpdateDateTime) = '2021-03-31'
> )
> SELECT * from OldChanges {code}
> h3. Update on 2021-04-21
> We tried the *com.microsoft.sqlserver.jdbc.SQLServerDriver* driver as well; it 
> also yields the same error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34928) CTE Execution fails for Sql Server

2021-04-20 Thread Supun De Silva (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326220#comment-17326220
 ] 

Supun De Silva commented on SPARK-34928:


[~hyukjin.kwon]

We tried the *com.microsoft.sqlserver.jdbc.SQLServerDriver* driver as well; it 
also yields the same error.

> CTE Execution fails for Sql Server
> --
>
> Key: SPARK-34928
> URL: https://issues.apache.org/jira/browse/SPARK-34928
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Supun De Silva
>Priority: Minor
>
> h2. Issue
> We have a simple SQL statement that we intend to execute on SQL Server. This 
> has a CTE component.
> Execution of this yields an error like the following:
> {code:java}
> java.sql.SQLException: Incorrect syntax near the keyword 'WITH'.{code}
> We are using the JDBC driver *net.sourceforge.jtds.jdbc.Driver* (version 
> 1.3.1).
> This is a particularly annoying issue, and because of it we are having to write 
> inner queries that are a fair bit less efficient.
> h2. SQL statement
> (not the actual one but a simplified version with renamed parameters)
> {code:sql}
> WITH OldChanges as (
>    SELECT distinct
>       SomeDate,
>       Name
>    FROM [dbo].[DateNameFoo] (nolock)
>    WHERE SomeDate != '2021-03-30'
>    AND convert(date, UpdateDateTime) = '2021-03-31'
> )
> SELECT * from OldChanges {code}
> h3. Update on 2021-04-21
> We tried the *com.microsoft.sqlserver.jdbc.SQLServerDriver* driver as well; it 
> also yields the same error.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35160) Spark application submitted despite failing to get Hive delegation token

2021-04-20 Thread Manu Zhang (Jira)
Manu Zhang created SPARK-35160:
--

 Summary: Spark application submitted despite failing to get Hive 
delegation token
 Key: SPARK-35160
 URL: https://issues.apache.org/jira/browse/SPARK-35160
 Project: Spark
  Issue Type: Improvement
  Components: Security
Affects Versions: 3.1.1
Reporter: Manu Zhang


Currently, when running on YARN and failing to get a Hive delegation token, a 
Spark SQL application will still be submitted. Eventually, the application will 
fail when connecting to the Hive metastore without a valid delegation token.

Is there any reason for this design?

cc [~jerryshao] who originally implemented this in 
https://issues.apache.org/jira/browse/SPARK-14743

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35159) extract doc of hive format

2021-04-20 Thread angerszhu (Jira)
angerszhu created SPARK-35159:
-

 Summary: extract doc of hive format
 Key: SPARK-35159
 URL: https://issues.apache.org/jira/browse/SPARK-35159
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: angerszhu


extract doc of hive format



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35084) [k8s] On Spark 3, jars listed in spark.jars and spark.jars.packages are not added to sparkContext

2021-04-20 Thread Keunhyun Oh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keunhyun Oh updated SPARK-35084:

Issue Type: Bug  (was: Question)

> [k8s] On Spark 3, jars listed in spark.jars and spark.jars.packages are not 
> added to sparkContext
> -
>
> Key: SPARK-35084
> URL: https://issues.apache.org/jira/browse/SPARK-35084
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0, 3.0.2, 3.1.1
>Reporter: Keunhyun Oh
>Priority: Major
>
> I'm trying to migrate from Spark 2 to Spark 3 on k8s.
>  
> In my environment, on Spark 3.x, jars listed in spark.jars and 
> spark.jars.packages are not added to sparkContext.
> After the driver's process is launched, jars are not propagated to the 
> executors, so NoClassDefException is raised in the executors.
>  
> In spark.properties, only the main application jar is contained in 
> spark.jars. This is different from Spark 2.
>  
> How can this be solved? Were any Spark options changed in Spark 3 compared to 
> Spark 2?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-35084) [k8s] On Spark 3, jars listed in spark.jars and spark.jars.packages are not added to sparkContext

2021-04-20 Thread Keunhyun Oh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326200#comment-17326200
 ] 

Keunhyun Oh edited comment on SPARK-35084 at 4/21/21, 2:07 AM:
---

*Spark 2.4.5*

[https://github.com/apache/spark/blob/v2.4.5/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]
{code:java}
 if (!isMesosCluster && !isStandAloneCluster) {
  // Resolve maven dependencies if there are any and add classpath to jars. 
Add them to py-files
  // too for packages that include Python code
  val resolvedMavenCoordinates = DependencyUtils.resolveMavenDependencies(
args.packagesExclusions, args.packages, args.repositories, 
args.ivyRepoPath,
args.ivySettingsPath)
  
  if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
if (args.isPython || isInternal(args.primaryResource)) {
  args.pyFiles = mergeFileLists(args.pyFiles, resolvedMavenCoordinates)
}
  } 
  
  // install any R packages that may have been passed through --jars or 
--packages.
  // Spark Packages may contain R source code inside the jar.
  if (args.isR && !StringUtils.isBlank(args.jars)) {
RPackageUtils.checkAndBuildRPackage(args.jars, printStream, 
args.verbose)
  }
} {code}
 

*Spark 3.0.2*

[https://github.com/apache/spark/blob/v3.0.2/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]
{code:java}
   if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
// In K8s client mode, when in the driver, add resolved jars early as 
we might need
// them at the submit time for artifact downloading.
// For example we might use the dependencies for downloading
// files from a Hadoop Compatible fs eg. S3. In this case the user 
might pass:
// --packages 
com.amazonaws:aws-java-sdk:1.7.4:org.apache.hadoop:hadoop-aws:2.7.6
if (isKubernetesClusterModeDriver) {
  val loader = getSubmitClassLoader(sparkConf)
  for (jar <- resolvedMavenCoordinates.split(",")) {
addJarToClasspath(jar, loader)
  }
} else if (isKubernetesCluster) {
  // We need this in K8s cluster mode so that we can upload local deps
  // via the k8s application, like in cluster mode driver
  childClasspath ++= resolvedMavenCoordinates.split(",")
} else {
  args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
  if (args.isPython || isInternal(args.primaryResource)) {
args.pyFiles = mergeFileLists(args.pyFiles, 
resolvedMavenCoordinates)
  }
}
  }{code}
 

When using the k8s master, in Spark 2, jars derived from Maven are added to 
args.jars.

However, in Spark 3, Maven dependencies are not merged into args.jars.

I assume that, because of this, spark-submit in k8s cluster mode does not 
support spark.jars.packages as I expected.

So, jars from packages are not added to the Spark context.

How can Maven packages be used in k8s cluster mode?


was (Author: ocworld):
*Spark 2.4.5*

[https://github.com/apache/spark/blob/v2.4.5/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]
{code:java}
 if (!isMesosCluster && !isStandAloneCluster) {
  // Resolve maven dependencies if there are any and add classpath to jars. 
Add them to py-files
  // too for packages that include Python code
  val resolvedMavenCoordinates = DependencyUtils.resolveMavenDependencies(
args.packagesExclusions, args.packages, args.repositories, 
args.ivyRepoPath,
args.ivySettingsPath)
  
  if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
if (args.isPython || isInternal(args.primaryResource)) {
  args.pyFiles = mergeFileLists(args.pyFiles, resolvedMavenCoordinates)
}
  } 
  
  // install any R packages that may have been passed through --jars or 
--packages.
  // Spark Packages may contain R source code inside the jar.
  if (args.isR && !StringUtils.isBlank(args.jars)) {
RPackageUtils.checkAndBuildRPackage(args.jars, printStream, 
args.verbose)
  }
} {code}
 

*Spark 3.0.2*

[https://github.com/apache/spark/blob/v3.0.2/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]
{code:java}
   if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
// In K8s client mode, when in the driver, add resolved jars early as 
we might need
// them at the submit time for artifact downloading.
// For example we might use the dependencies for downloading
// files from a Hadoop Compatible fs eg. S3. In this case the user 
might pass:
// --packages 
com.amazonaws:aws-java-sdk:1.7.4:org.apache.hadoop:hadoop-aws:2.7.6
if 

[jira] [Commented] (SPARK-35084) [k8s] On Spark 3, jars listed in spark.jars and spark.jars.packages are not added to sparkContext

2021-04-20 Thread Keunhyun Oh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17326200#comment-17326200
 ] 

Keunhyun Oh commented on SPARK-35084:
-

*Spark 2.4.5*

[https://github.com/apache/spark/blob/v2.4.5/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]
{code:java}
 if (!isMesosCluster && !isStandAloneCluster) {
  // Resolve maven dependencies if there are any and add classpath to jars. 
Add them to py-files
  // too for packages that include Python code
  val resolvedMavenCoordinates = DependencyUtils.resolveMavenDependencies(
args.packagesExclusions, args.packages, args.repositories, 
args.ivyRepoPath,
args.ivySettingsPath)
  
  if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
if (args.isPython || isInternal(args.primaryResource)) {
  args.pyFiles = mergeFileLists(args.pyFiles, resolvedMavenCoordinates)
}
  } 
  
  // install any R packages that may have been passed through --jars or 
--packages.
  // Spark Packages may contain R source code inside the jar.
  if (args.isR && !StringUtils.isBlank(args.jars)) {
RPackageUtils.checkAndBuildRPackage(args.jars, printStream, 
args.verbose)
  }
} {code}
 

*Spark 3.0.2*

[https://github.com/apache/spark/blob/v3.0.2/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]
{code:java}
   if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
// In K8s client mode, when in the driver, add resolved jars early as 
we might need
// them at the submit time for artifact downloading.
// For example we might use the dependencies for downloading
// files from a Hadoop Compatible fs eg. S3. In this case the user 
might pass:
// --packages 
com.amazonaws:aws-java-sdk:1.7.4:org.apache.hadoop:hadoop-aws:2.7.6
if (isKubernetesClusterModeDriver) {
  val loader = getSubmitClassLoader(sparkConf)
  for (jar <- resolvedMavenCoordinates.split(",")) {
addJarToClasspath(jar, loader)
  }
} else if (isKubernetesCluster) {
  // We need this in K8s cluster mode so that we can upload local deps
  // via the k8s application, like in cluster mode driver
  childClasspath ++= resolvedMavenCoordinates.split(",")
} else {
  args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
  if (args.isPython || isInternal(args.primaryResource)) {
args.pyFiles = mergeFileLists(args.pyFiles, 
resolvedMavenCoordinates)
  }
}
  }{code}
 

When using the k8s master, in Spark 2, jars derived from Maven are added to 
args.jars.

However, in Spark 3, Maven dependencies are not merged into args.jars.

I assume that, because of this, spark-submit in k8s cluster mode does not 
support spark.jars.packages as I expected.

So, jars from packages are not added to the Spark context.

 

> [k8s] On Spark 3, jars listed in spark.jars and spark.jars.packages are not 
> added to sparkContext
> -
>
> Key: SPARK-35084
> URL: https://issues.apache.org/jira/browse/SPARK-35084
> Project: Spark
>  Issue Type: Question
>  Components: Kubernetes
>Affects Versions: 3.0.0, 3.0.2, 3.1.1
>Reporter: Keunhyun Oh
>Priority: Major
>
> I'm trying to migrate from Spark 2 to Spark 3 on k8s.
>  
> In my environment, on Spark 3.x, jars listed in spark.jars and 
> spark.jars.packages are not added to sparkContext.
> After the driver's process is launched, jars are not propagated to the 
> executors, so NoClassDefException is raised in the executors.
>  
> In spark.properties, only the main application jar is contained in 
> spark.jars. This is different from Spark 2.
>  
> How can this be solved? Were any Spark options changed in Spark 3 compared to 
> Spark 2?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35084) [k8s] On Spark 3, jars listed in spark.jars and spark.jars.packages are not added to sparkContext

2021-04-20 Thread Keunhyun Oh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keunhyun Oh updated SPARK-35084:

Affects Version/s: 3.1.1

> [k8s] On Spark 3, jars listed in spark.jars and spark.jars.packages are not 
> added to sparkContext
> -
>
> Key: SPARK-35084
> URL: https://issues.apache.org/jira/browse/SPARK-35084
> Project: Spark
>  Issue Type: Question
>  Components: Kubernetes
>Affects Versions: 3.0.0, 3.0.2, 3.1.1
>Reporter: Keunhyun Oh
>Priority: Major
>
> I'm trying to migrate from Spark 2 to Spark 3 on k8s.
>  
> In my environment, on Spark 3.x, jars listed in spark.jars and 
> spark.jars.packages are not added to sparkContext.
> After the driver's process is launched, jars are not propagated to the 
> executors, so NoClassDefException is raised in the executors.
>  
> In spark.properties, only the main application jar is contained in 
> spark.jars. This is different from Spark 2.
>  
> How can this be solved? Were any Spark options changed in Spark 3 compared to 
> Spark 2?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35158) Add some guides for authors to retrigger the workflow run

2021-04-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-35158:
-
Description: 
Currently only authors can retrigger the GitHub Actions build in their PRs. We 
should explicitly guide them, at 
[https://github.com/apache/spark/blob/master/.github/workflows/notify_test_workflow.yml#L110],
 something like:
 If the tests fail for reasons unrelated to the change, please retrigger the 
workflow run in your forked repository.
 If related, please investigate, fix and push new changes to fix the test 
failure.

This guidance can be removed once SPARK-35157 is done.

  was:
Currently only authors can retrigger the GitHub Actions build in their PRs. We 
should explicitly guide them, at 
[https://github.com/apache/spark/blob/master/.github/workflows/notify_test_workflow.yml#L110],
 something like:
If the tests fail for reasons unrelated to the change, please retrigger the 
workflow run in your forked repository.
If related, please investigate, fix and push new changes to fix the test 
failure.


> Add some guides for authors to retrigger the workflow run
> -
>
> Key: SPARK-35158
> URL: https://issues.apache.org/jira/browse/SPARK-35158
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Currently only authors can retrigger the GitHub Actions build in their PRs. 
> We should explicitly guide them, at 
> [https://github.com/apache/spark/blob/master/.github/workflows/notify_test_workflow.yml#L110],
>  something like:
>  If the tests fail for reasons unrelated to the change, please retrigger the 
> workflow run in your forked repository.
>  If related, please investigate, fix and push new changes to fix the test 
> failure.
> This guidance can be removed once SPARK-35157 is done.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35158) Add some guides for authors to retrigger the workflow run

2021-04-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-35158:
-
Description: 
Currently only authors can retrigger the GitHub Actions build in their PRs. We 
should explicitly guide them, at 
[https://github.com/apache/spark/blob/master/.github/workflows/notify_test_workflow.yml#L110],
 something like:
If the tests fail for reasons unrelated to the change, please retrigger the 
workflow run in your forked repository.
If related, please investigate, fix and push new changes to fix the test 
failure.

  was:Currently only authors can retrigger the GitHub Actions build in their 
PRs. We should explicitly guide them, at 
[https://github.com/apache/spark/blob/master/.github/workflows/notify_test_workflow.yml#L110],
 to fix the changes and/or retrigger the tests if it fails.


> Add some guides for authors to retrigger the workflow run
> -
>
> Key: SPARK-35158
> URL: https://issues.apache.org/jira/browse/SPARK-35158
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Currently only authors can retrigger the GitHub Actions build in their PRs. 
> We should explicitly guide them, at 
> [https://github.com/apache/spark/blob/master/.github/workflows/notify_test_workflow.yml#L110],
>  something like:
> If the tests fail for reasons unrelated to the change, please retrigger the 
> workflow run in your forked repository.
> If related, please investigate, fix and push new changes to fix the test 
> failure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35158) Add some guides for authors to retrigger the workflow run

2021-04-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-35158:


 Summary: Add some guides for authors to retrigger the workflow run
 Key: SPARK-35158
 URL: https://issues.apache.org/jira/browse/SPARK-35158
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 3.2.0
Reporter: Hyukjin Kwon


Currently only authors can retrigger the GitHub Actions build in their PRs. We 
should explicitly guide them, at 
[https://github.com/apache/spark/blob/master/.github/workflows/notify_test_workflow.yml#L110],
 to fix the changes and/or retrigger the tests if it fails.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35157) Have a way for other people to retrigger the build in GitHub Actions

2021-04-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-35157:
-
Description: 
We should ask contributors to retrigger the tests because the builds run in 
their fork, and currently committers or other people cannot retrigger out of 
the box.
 Note that the retriggering has to happen in forked repository. This cannot be 
done from main repository.

One possible way is to create a workflow *that only runs in forked repository*:
 1. Regularly (15 mins?) get a list of PRs opened from the forked repository or 
possibly author? (see 
[https://github.com/apache/spark/blob/master/.github/workflows/update_build_status.yml#L36-L47])
 2. Iterate the PRs:
 ㅤ2.1. Get the latest workflow run 
([https://github.com/apache/spark/blob/master/.github/workflows/notify_test_workflow.yml#L41-L59])
 ㅤ2.2. Iterates the comments in the PR (see 
[https://docs.github.com/en/rest/guides/working-with-comments#pull-request-comments]
 for Javascript and 
[https://docs.github.com/en/rest/reference/issues#list-issue-comments] for REST 
API). Issue number is PR number.
 ㅤㅤ2.2.1. check if there is a comment such as "GitHub Actions: retrigger 
please" _after the latest workflow run_. Last update time is available when you 
get workflow run, see also 
[https://docs.github.com/en/rest/reference/actions#get-a-workflow-run]
 ㅤㅤㅤ2.2.1.1. If there is, retrigger the workflow run, see also 
[https://docs.github.com/en/rest/reference/actions#create-a-workflow-dispatch-event]
 ㅤㅤㅤ2.2.1.2. If not, skip.

  was:We should ask contributors to retrigger the tests because the builds run 
in their fork, and currently committers cannot retrigger out of the box.


> Have a way for other people to retrigger the build in GitHub Actions
> 
>
> Key: SPARK-35157
> URL: https://issues.apache.org/jira/browse/SPARK-35157
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We should ask contributors to retrigger the tests because the builds run in 
> their fork, and currently committers or other people cannot retrigger out of 
> the box.
>  Note that the retriggering has to happen in forked repository. This cannot 
> be done from main repository.
> One possible way is to create a workflow *that only runs in forked 
> repository*:
>  1. Regularly (15 mins?) get a list of PRs opened from the forked repository 
> or possibly author? (see 
> [https://github.com/apache/spark/blob/master/.github/workflows/update_build_status.yml#L36-L47])
>  2. Iterate the PRs:
>  ㅤ2.1. Get the latest workflow run 
> ([https://github.com/apache/spark/blob/master/.github/workflows/notify_test_workflow.yml#L41-L59])
>  ㅤ2.2. Iterates the comments in the PR (see 
> [https://docs.github.com/en/rest/guides/working-with-comments#pull-request-comments]
>  for Javascript and 
> [https://docs.github.com/en/rest/reference/issues#list-issue-comments] for 
> REST API). Issue number is PR number.
>  ㅤㅤ2.2.1. check if there is a comment such as "GitHub Actions: retrigger 
> please" _after the latest workflow run_. Last update time is available when 
> you get workflow run, see also 
> [https://docs.github.com/en/rest/reference/actions#get-a-workflow-run]
>  ㅤㅤㅤ2.2.1.1. If there is, retrigger the workflow run, see also 
> [https://docs.github.com/en/rest/reference/actions#create-a-workflow-dispatch-event]
>  ㅤㅤㅤ2.2.1.2. If not, skip.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35157) Have a way for other people to retrigger the build in GitHub Actions

2021-04-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-35157:
-
Summary: Have a way for other people to retrigger the build in GitHub 
Actions  (was: Guide users to retrigger if tests fails in GitHub Actions build)

> Have a way for other people to retrigger the build in GitHub Actions
> 
>
> Key: SPARK-35157
> URL: https://issues.apache.org/jira/browse/SPARK-35157
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We should ask contributors to retrigger the tests because the builds run in 
> their fork, and currently committers cannot retrigger out of the box.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34639) always remove unnecessary Alias in Analyzer.resolveExpression

2021-04-20 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-34639:
-
Fix Version/s: 3.1.2

> always remove unnecessary Alias in Analyzer.resolveExpression
> -
>
> Key: SPARK-34639
> URL: https://issues.apache.org/jira/browse/SPARK-34639
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.1.2, 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35157) Guide users to retrigger if tests fails in GitHub Actions build

2021-04-20 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-35157:


 Summary: Guide users to retrigger if tests fails in GitHub Actions 
build
 Key: SPARK-35157
 URL: https://issues.apache.org/jira/browse/SPARK-35157
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 3.2.0
Reporter: Hyukjin Kwon


We should ask contributors to retrigger the tests because the builds run in 
their fork, and currently committers cannot retrigger out of the box.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35156) Thrown java.lang.NoClassDefFoundError when using spark-submit

2021-04-20 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated SPARK-35156:

Description: 
Got NoClassDefFoundError when running spark-submit to submit a Spark app to a 
K8S cluster.

Master, branch-3.1 are okay. Branch-3.1 is affected.

How to reproduce:

1. Using sbt to build Spark with Kubernetes (-Pkubernetes)
2. Run spark-submit to submit to K8S cluster
3. Get the following exception 

{code:java}
21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
Exception in thread "main" java.lang.NoClassDefFoundError: com/fasterxml/jackson/dataformat/yaml/YAMLFactory
        at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46)
        at io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564)
        at io.fabric8.kubernetes.client.Config.tryKubeConfig(Config.java:530)
        at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:264)
        at io.fabric8.kubernetes.client.Config.<init>(Config.java:230)
        at io.fabric8.kubernetes.client.Config.<init>(Config.java:224)
        at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:259)
        at org.apache.spark.deploy.k8s.SparkKubernetesClientFactory$.createKubernetesClient(SparkKubernetesClientFactory.scala:80)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$2(KubernetesClientApplication.scala:207)
        at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2621)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.dataformat.yaml.YAMLFactory
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 19 more {code}

  was:
Got NoClassDefFoundError when run spark-submit to submit Spark app to K8S 
cluster.

Master branch is okay. Branch-3.1 is affected.

How to reproduce:

1. Using sbt to build Spark with Kubernetes (-Pkubernetes)
2. Run spark-submit to submit to K8S cluster
3. Get the following exception 

{code:java}
21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
Exception in thread "main" java.lang.NoClassDefFoundError: com/fasterxml/jackson/dataformat/yaml/YAMLFactory
        at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46)
        at io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564)

[jira] [Updated] (SPARK-35156) Thrown java.lang.NoClassDefFoundError when using spark-submit

2021-04-20 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated SPARK-35156:

Description: 
Got a NoClassDefFoundError when running spark-submit to submit a Spark app to a 
K8S cluster.

Master branch is okay. Branch-3.1 is affected.

How to reproduce:

1. Build Spark with Kubernetes support using sbt (-Pkubernetes)
2. Run spark-submit to submit to a K8S cluster
3. Get the following exception:

{code:java}
21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
Exception in thread "main" java.lang.NoClassDefFoundError: com/fasterxml/jackson/dataformat/yaml/YAMLFactory
        at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46)
        at io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564)
        at io.fabric8.kubernetes.client.Config.tryKubeConfig(Config.java:530)
        at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:264)
        at io.fabric8.kubernetes.client.Config.<init>(Config.java:230)
        at io.fabric8.kubernetes.client.Config.<init>(Config.java:224)
        at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:259)
        at org.apache.spark.deploy.k8s.SparkKubernetesClientFactory$.createKubernetesClient(SparkKubernetesClientFactory.scala:80)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$2(KubernetesClientApplication.scala:207)
        at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2621)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.dataformat.yaml.YAMLFactory
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 19 more {code}

  was:
How to reproduce:

1. Using sbt to build Spark with Kubernetes (-Pkubernetes)
2. Run spark-submit to submit to K8S cluster
3. Get the following exception 

{code:java}
21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
Exception in thread "main" java.lang.NoClassDefFoundError: com/fasterxml/jackson/dataformat/yaml/YAMLFactory
        at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46)
        at io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564)
        at io.fabric8.kubernetes.client.Config.tryKubeConfig(Config.java:530)

[jira] [Created] (SPARK-35156) Thrown java.lang.NoClassDefFoundError when using spark-submit

2021-04-20 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-35156:
---

 Summary: Thrown java.lang.NoClassDefFoundError when using 
spark-submit
 Key: SPARK-35156
 URL: https://issues.apache.org/jira/browse/SPARK-35156
 Project: Spark
  Issue Type: Bug
  Components: Build, Kubernetes
Affects Versions: 3.1.1
Reporter: L. C. Hsieh


How to reproduce:

1. Build Spark with Kubernetes support using sbt (-Pkubernetes)
2. Run spark-submit to submit to a K8S cluster
3. Get the following exception:

{code:java}
21/04/20 16:33:37 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
Exception in thread "main" java.lang.NoClassDefFoundError: com/fasterxml/jackson/dataformat/yaml/YAMLFactory
        at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:46)
        at io.fabric8.kubernetes.client.Config.loadFromKubeconfig(Config.java:564)
        at io.fabric8.kubernetes.client.Config.tryKubeConfig(Config.java:530)
        at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:264)
        at io.fabric8.kubernetes.client.Config.<init>(Config.java:230)
        at io.fabric8.kubernetes.client.Config.<init>(Config.java:224)
        at io.fabric8.kubernetes.client.Config.autoConfigure(Config.java:259)
        at org.apache.spark.deploy.k8s.SparkKubernetesClientFactory$.createKubernetesClient(SparkKubernetesClientFactory.scala:80)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$2(KubernetesClientApplication.scala:207)
        at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2621)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.dataformat.yaml.YAMLFactory
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 19 more {code}
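
As an aside for anyone hitting the same error: a quick way to confirm whether the
class from the stack trace above is actually on the driver classpath is a
reflective lookup. This is only a hedged diagnostic sketch; the class name comes
from the trace above, everything else (object and variable names) is illustrative.

{code:java}
object CheckYamlFactory {
  def main(args: Array[String]): Unit = {
    // The class reported missing in the NoClassDefFoundError above.
    val className = "com.fasterxml.jackson.dataformat.yaml.YAMLFactory"
    try {
      val cls = Class.forName(className)
      // Print which jar the class was loaded from, when the code source is resolvable.
      val location = Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation)
      println(s"$className found at ${location.getOrElse("<unknown location>")}")
    } catch {
      case _: ClassNotFoundException =>
        println(s"$className is NOT on the classpath; check that jackson-dataformat-yaml is bundled")
    }
  }
}
{code}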



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35132) Upgrade netty-all to 4.1.63.Final

2021-04-20 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-35132.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32227
[https://github.com/apache/spark/pull/32227]

> Upgrade netty-all to 4.1.63.Final
> -
>
> Key: SPARK-35132
> URL: https://issues.apache.org/jira/browse/SPARK-35132
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.2.0
>
>
> Three CVEs have been reported in Netty since 4.1.51.Final:
>  
> ||Name||Description||
> |[CVE-2021-21409|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-21409]|Netty
>  is an open-source, asynchronous event-driven network application framework 
> for rapid development of maintainable high performance protocol servers & 
> clients. In Netty (io.netty:netty-codec-http2) before version 4.1.61.Final 
> there is a vulnerability that enables request smuggling. The content-length 
> header is not correctly validated if the request only uses a single 
> Http2HeaderFrame with the endStream set to to true. This could lead to 
> request smuggling if the request is proxied to a remote peer and translated 
> to HTTP/1.1. This is a followup of GHSA-wm47-8v5p-wjpj/CVE-2021-21295 which 
> did miss to fix this one case. This was fixed as part of 4.1.61.Final.|
> |[CVE-2021-21295|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-21295]|Netty
>  is an open-source, asynchronous event-driven network application framework 
> for rapid development of maintainable high performance protocol servers & 
> clients. In Netty (io.netty:netty-codec-http2) before version 4.1.60.Final 
> there is a vulnerability that enables request smuggling. If a Content-Length 
> header is present in the original HTTP/2 request, the field is not validated 
> by `Http2MultiplexHandler` as it is propagated up. This is fine as long as 
> the request is not proxied through as HTTP/1.1. If the request comes in as an 
> HTTP/2 stream, gets converted into the HTTP/1.1 domain objects 
> (`HttpRequest`, `HttpContent`, etc.) via `Http2StreamFrameToHttpObjectCodec 
> `and then sent up to the child channel's pipeline and proxied through a 
> remote peer as HTTP/1.1 this may result in request smuggling. In a proxy 
> case, users may assume the content-length is validated somehow, which is not 
> the case. If the request is forwarded to a backend channel that is a HTTP/1.1 
> connection, the Content-Length now has meaning and needs to be checked. An 
> attacker can smuggle requests inside the body as it gets downgraded from 
> HTTP/2 to HTTP/1.1. For an example attack refer to the linked GitHub 
> Advisory. Users are only affected if all of this is true: 
> `HTTP2MultiplexCodec` or `Http2FrameCodec` is used, 
> `Http2StreamFrameToHttpObjectCodec` is used to convert to HTTP/1.1 objects, 
> and these HTTP/1.1 objects are forwarded to another remote peer. This has 
> been patched in 4.1.60.Final As a workaround, the user can do the validation 
> by themselves by implementing a custom `ChannelInboundHandler` that is put in 
> the `ChannelPipeline` behind `Http2StreamFrameToHttpObjectCodec`.|
> |[CVE-2021-21290|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-21290]|Netty
>  is an open-source, asynchronous event-driven network application framework 
> for rapid development of maintainable high performance protocol servers & 
> clients. In Netty before version 4.1.59.Final there is a vulnerability on 
> Unix-like systems involving an insecure temp file. When netty's multipart 
> decoders are used local information disclosure can occur via the local system 
> temporary directory if temporary storing uploads on the disk is enabled. On 
> unix-like systems, the temporary directory is shared between all user. As 
> such, writing to this directory using APIs that do not explicitly set the 
> file/directory permissions can lead to information disclosure. Of note, this 
> does not impact modern MacOS Operating Systems. The method 
> "File.createTempFile" on unix-like systems creates a random file, but, by 
> default will create this file with the permissions "-rw-r--r--". Thus, if 
> sensitive information is written to this file, other local users can read 
> this information. This is the case in netty's "AbstractDiskHttpData" is 
> vulnerable. This has been fixed in version 4.1.59.Final. As a workaround, one 
> may specify your own "java.io.tmpdir" when you start the JVM or use 
> "DefaultHttpDataFactory.setBaseDir(...)" to set the directory to something 
> that is only readable by the current user.|
>  
> Upgrade the Netty version to avoid these potential risks.



--
This message was sent by Atlassian Jira

[jira] [Assigned] (SPARK-35132) Upgrade netty-all to 4.1.63.Final

2021-04-20 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-35132:


Assignee: Yang Jie

> Upgrade netty-all to 4.1.63.Final
> -
>
> Key: SPARK-35132
> URL: https://issues.apache.org/jira/browse/SPARK-35132
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>
> Three CVEs have been reported in Netty since 4.1.51.Final:
>  
> ||Name||Description||
> |[CVE-2021-21409|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-21409]|Netty
>  is an open-source, asynchronous event-driven network application framework 
> for rapid development of maintainable high performance protocol servers & 
> clients. In Netty (io.netty:netty-codec-http2) before version 4.1.61.Final 
> there is a vulnerability that enables request smuggling. The content-length 
> header is not correctly validated if the request only uses a single 
> Http2HeaderFrame with the endStream set to to true. This could lead to 
> request smuggling if the request is proxied to a remote peer and translated 
> to HTTP/1.1. This is a followup of GHSA-wm47-8v5p-wjpj/CVE-2021-21295 which 
> did miss to fix this one case. This was fixed as part of 4.1.61.Final.|
> |[CVE-2021-21295|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-21295]|Netty
>  is an open-source, asynchronous event-driven network application framework 
> for rapid development of maintainable high performance protocol servers & 
> clients. In Netty (io.netty:netty-codec-http2) before version 4.1.60.Final 
> there is a vulnerability that enables request smuggling. If a Content-Length 
> header is present in the original HTTP/2 request, the field is not validated 
> by `Http2MultiplexHandler` as it is propagated up. This is fine as long as 
> the request is not proxied through as HTTP/1.1. If the request comes in as an 
> HTTP/2 stream, gets converted into the HTTP/1.1 domain objects 
> (`HttpRequest`, `HttpContent`, etc.) via `Http2StreamFrameToHttpObjectCodec 
> `and then sent up to the child channel's pipeline and proxied through a 
> remote peer as HTTP/1.1 this may result in request smuggling. In a proxy 
> case, users may assume the content-length is validated somehow, which is not 
> the case. If the request is forwarded to a backend channel that is a HTTP/1.1 
> connection, the Content-Length now has meaning and needs to be checked. An 
> attacker can smuggle requests inside the body as it gets downgraded from 
> HTTP/2 to HTTP/1.1. For an example attack refer to the linked GitHub 
> Advisory. Users are only affected if all of this is true: 
> `HTTP2MultiplexCodec` or `Http2FrameCodec` is used, 
> `Http2StreamFrameToHttpObjectCodec` is used to convert to HTTP/1.1 objects, 
> and these HTTP/1.1 objects are forwarded to another remote peer. This has 
> been patched in 4.1.60.Final As a workaround, the user can do the validation 
> by themselves by implementing a custom `ChannelInboundHandler` that is put in 
> the `ChannelPipeline` behind `Http2StreamFrameToHttpObjectCodec`.|
> |[CVE-2021-21290|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-21290]|Netty
>  is an open-source, asynchronous event-driven network application framework 
> for rapid development of maintainable high performance protocol servers & 
> clients. In Netty before version 4.1.59.Final there is a vulnerability on 
> Unix-like systems involving an insecure temp file. When netty's multipart 
> decoders are used local information disclosure can occur via the local system 
> temporary directory if temporary storing uploads on the disk is enabled. On 
> unix-like systems, the temporary directory is shared between all user. As 
> such, writing to this directory using APIs that do not explicitly set the 
> file/directory permissions can lead to information disclosure. Of note, this 
> does not impact modern MacOS Operating Systems. The method 
> "File.createTempFile" on unix-like systems creates a random file, but, by 
> default will create this file with the permissions "-rw-r--r--". Thus, if 
> sensitive information is written to this file, other local users can read 
> this information. This is the case in netty's "AbstractDiskHttpData" is 
> vulnerable. This has been fixed in version 4.1.59.Final. As a workaround, one 
> may specify your own "java.io.tmpdir" when you start the JVM or use 
> "DefaultHttpDataFactory.setBaseDir(...)" to set the directory to something 
> that is only readable by the current user.|
>  
> Upgrade the Netty version to avoid these potential risks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: 

[jira] [Resolved] (SPARK-35153) Override `sql()` of ANSI interval operators

2021-04-20 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-35153.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32262
[https://github.com/apache/spark/pull/32262]

> Override `sql()` of ANSI interval operators
> ---
>
> Key: SPARK-35153
> URL: https://issues.apache.org/jira/browse/SPARK-35153
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.2.0
>
>
> Override the sql() method of the expression that implements operators over 
> ANSI interval, and make SQL representation more readable and potentially 
> parsable.
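
For readers unfamiliar with the idea, the sketch below shows the general pattern of
overriding a textual sql() representation on an operator node. It is a standalone
illustration only: it deliberately does not use Spark's internal Expression API, and
all type and object names in it are made up for the example.

{code:java}
// Minimal, self-contained sketch of the "override sql()" idea.
sealed trait Expr {
  // Default textual form; concrete operators may override it for readability.
  def sql: String
}

final case class IntervalLiteral(months: Int) extends Expr {
  override def sql: String = s"INTERVAL '$months' MONTH"
}

final case class Add(left: Expr, right: Expr) extends Expr {
  // A readable, potentially parsable rendering instead of a generic function-call form.
  override def sql: String = s"(${left.sql} + ${right.sql})"
}

object SqlDemo extends App {
  println(Add(IntervalLiteral(3), IntervalLiteral(4)).sql) // (INTERVAL '3' MONTH + INTERVAL '4' MONTH)
}
{code}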



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35155) Add rule id to all ResolveXxx rules

2021-04-20 Thread Yingyi Bu (Jira)
Yingyi Bu created SPARK-35155:
-

 Summary: Add rule id to all ResolveXxx rules
 Key: SPARK-35155
 URL: https://issues.apache.org/jira/browse/SPARK-35155
 Project: Spark
  Issue Type: Sub-task
  Components: Optimizer
Affects Versions: 3.1.0
Reporter: Yingyi Bu


All ResolveXxx rules run in a fixed-point batch and can benefit from 
rule-id-based pruning, regardless of whether there is a stop-condition lambda.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34472) SparkContext.addJar with an ivy path fails in cluster mode with a custom ivySettings file

2021-04-20 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-34472.
---
Fix Version/s: 3.2.0
 Assignee: Shardul Mahadik
   Resolution: Fixed

> SparkContext.addJar with an ivy path fails in cluster mode with a custom 
> ivySettings file
> -
>
> Key: SPARK-34472
> URL: https://issues.apache.org/jira/browse/SPARK-34472
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Shardul Mahadik
>Assignee: Shardul Mahadik
>Priority: Major
> Fix For: 3.2.0
>
>
> SPARK-33084 introduced support for Ivy paths in {{sc.addJar}} or Spark SQL 
> {{ADD JAR}}. If we use a custom ivySettings file using 
> {{spark.jars.ivySettings}}, it is loaded at 
> [https://github.com/apache/spark/blob/b26e7b510bbaee63c4095ab47e75ff2a70e377d7/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L1280.]
>  However, this file is only accessible on the client machine. In cluster 
> mode, this file is not available on the driver and so {{addJar}} fails.
> {code:sh}
> spark-submit --master yarn --deploy-mode cluster --class IvyAddJarExample 
> --conf spark.jars.ivySettings=/path/to/ivySettings.xml example.jar
> {code}
> {code}
> java.lang.IllegalArgumentException: requirement failed: Ivy settings file 
> /path/to/ivySettings.xml does not exist
>   at scala.Predef$.require(Predef.scala:281)
>   at 
> org.apache.spark.deploy.SparkSubmitUtils$.loadIvySettings(SparkSubmit.scala:1331)
>   at 
> org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:176)
>   at 
> org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:156)
>   at 
> org.apache.spark.sql.internal.SessionResourceLoader.resolveJars(SessionState.scala:166)
>   at 
> org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:133)
>   at 
> org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40)
>  {code}
> We should ship the ivySettings file to the driver so that {{addJar}} is able 
> to find it.
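
As context for the feature being discussed, here is a minimal usage sketch of
Ivy-path jars with addJar. The coordinates below are placeholders and the whole
snippet is only illustrative; the actual custom-ivySettings behaviour is what the
issue above is about.

{code:java}
import org.apache.spark.sql.SparkSession

object IvyAddJarExample {
  def main(args: Array[String]): Unit = {
    // spark.jars.ivySettings would normally be passed on the spark-submit command
    // line, as in the reproduction above; the session here is only for illustration.
    val spark = SparkSession.builder()
      .appName("IvyAddJarExample")
      .getOrCreate()

    // Ivy-style path introduced by SPARK-33084; the coordinates are placeholders.
    spark.sparkContext.addJar("ivy://com.example:example-lib:1.0.0")

    // The equivalent SQL form:
    spark.sql("ADD JAR ivy://com.example:example-lib:1.0.0")

    spark.stop()
  }
}
{code}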



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35044) Support retrieve hadoop configurations via SET syntax

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325994#comment-17325994
 ] 

Apache Spark commented on SPARK-35044:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/32263

> Support retrieve hadoop configurations via SET syntax
> -
>
> Key: SPARK-35044
> URL: https://issues.apache.org/jira/browse/SPARK-35044
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently, pure SQL users have no easy way to see the Hadoop configurations, 
> which may affect their jobs significantly.
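
For context, a hedged sketch of the kind of usage this feature enables. The exact
key handling and output of SET for Hadoop keys are assumptions about the behaviour
introduced by the linked pull request, not documented semantics.

{code:java}
import org.apache.spark.sql.SparkSession

object ShowHadoopConf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ShowHadoopConf")
      .master("local[*]")
      .getOrCreate()

    // Today this value is reachable from Scala via the Hadoop configuration object...
    val fsDefault = spark.sparkContext.hadoopConfiguration.get("fs.defaultFS")
    println(s"fs.defaultFS = $fsDefault")

    // ...while the proposed feature would let a pure SQL user retrieve it with SET
    // (assumed syntax; see the pull request for the actual behaviour).
    spark.sql("SET fs.defaultFS").show(truncate = false)

    spark.stop()
  }
}
{code}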



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35044) Support retrieve hadoop configurations via SET syntax

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325992#comment-17325992
 ] 

Apache Spark commented on SPARK-35044:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/32263

> Support retrieve hadoop configurations via SET syntax
> -
>
> Key: SPARK-35044
> URL: https://issues.apache.org/jira/browse/SPARK-35044
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.2.0
>
>
> Currently, pure SQL users are short of ways to see the Hadoop configurations 
> which may affect their jobs a lot



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35154) Rpc env not shutdown when shutdown method call by endpoint onStop

2021-04-20 Thread LIU (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LIU updated SPARK-35154:

Description: 
When I run this code, the RPC thread hangs and does not shut down gracefully. I 
think that when an RPC thread calls shutdown from the onStop method, it puts 
MessageLoop.PoisonPill into the queue so that the threads in the RPC pool return 
and stop. In Spark 3.x this makes the other threads return and stop, but the 
thread that called onStop is left awaiting termination of the very pool it runs 
in, so that thread never stops and the program hangs.

I'm not sure whether this needs to be improved.

 
{code:java}
test("Rpc env not shutdown when shutdown method call by endpoint onStop") {
  val rpcEndpoint = new RpcEndpoint {
    override val rpcEnv: RpcEnv = env

    override def onStop(): Unit = {
      env.shutdown()
      env.awaitTermination()
    }

    override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
      case m => context.reply(m)
    }
  }
  env.setupEndpoint("test", rpcEndpoint)
  rpcEndpoint.stop()
  env.awaitTermination()
}
{code}

 

  was:
when i use this code to work,  Rpc thread hangs up and not close gracefully. i 
think when rpc thread called shutdown on OnStop method, it will try to put 

MessageLoop.PoisonPill to return and stop thread in rpc pool. In spark 3.x, it 
will make others thread return & stop but current thread which call OnStop 
method to await current pool to stop. it makes current thread not stop, and 
pending program.

I'm not sure that needs to be improved or not?

 
{code:java}
//代码占位符{code}
test("Rpc env not shutdown when shutdown method call by endpoint onStop") {
     val rpcEndpoint = new RpcEndpoint {
         override val rpcEnv: RpcEnv = env
         override def onStop(): Unit = {

            env.shutdown()

            env.awaitTermination()

        }

        override def receiveAndReply(context: RpcCallContext): 
PartialFunction[Any, Unit] = {

            case m => context.reply(m)

         }

    }
    env.setupEndpoint("test", rpcEndpoint)
    rpcEndpoint.stop()
    env.awaitTermination()
 }

 


> Rpc env not shutdown when shutdown method call by endpoint onStop
> -
>
> Key: SPARK-35154
> URL: https://issues.apache.org/jira/browse/SPARK-35154
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: spark-3.x
>Reporter: LIU
>Priority: Major
>
> when i use this code to work,  Rpc thread hangs up and not close gracefully. 
> i think when rpc thread called shutdown on OnStop method, it will try to put 
> MessageLoop.PoisonPill to return and stop thread in rpc pool. In spark 3.x, 
> it will make others thread return & stop but current thread which call OnStop 
> method to await current pool to stop. it makes current thread not stop, and 
> pending program.
> I'm not sure that needs to be improved or not?
>  
> {code:java}
> //代码占位符{code}
> test("Rpc env not shutdown when shutdown method call by endpoint onStop") {
>      val rpcEndpoint = new RpcEndpoint {
>          override val rpcEnv: RpcEnv = env
>          override def onStop(): Unit =
> {             env.shutdown()             env.awaitTermination()         }
>         override def receiveAndReply(context: RpcCallContext): 
> PartialFunction[Any, Unit] =
> {             case m => context.reply(m)          }
>     }
>     env.setupEndpoint("test", rpcEndpoint)
>     rpcEndpoint.stop()
>     env.awaitTermination()
>  }
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35154) Rpc env not shutdown when shutdown method call by endpoint onStop

2021-04-20 Thread LIU (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LIU updated SPARK-35154:

Description: 
When I run this code, the RPC thread hangs and does not shut down gracefully. I 
think that when an RPC thread calls shutdown from the onStop method, it puts 
MessageLoop.PoisonPill into the queue so that the threads in the RPC pool return 
and stop. In Spark 3.x this makes the other threads return and stop, but the 
thread that called onStop is left awaiting termination of the very pool it runs 
in, so that thread never stops and the program hangs.

I'm not sure whether this needs to be improved.

 
{code:java}
test("Rpc env not shutdown when shutdown method call by endpoint onStop") {
  val rpcEndpoint = new RpcEndpoint {
    override val rpcEnv: RpcEnv = env

    override def onStop(): Unit = {
      env.shutdown()
      env.awaitTermination()
    }

    override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
      case m => context.reply(m)
    }
  }
  env.setupEndpoint("test", rpcEndpoint)
  rpcEndpoint.stop()
  env.awaitTermination()
}
{code}

 

  was:
when i use this code to work,  Rpc thread hangs up and not close gracefully. i 
think when rpc thread called shutdown on OnStop method, it will try to put 

MessageLoop.PoisonPill to return and stop thread in rpc pool. In spark 3.x, it 
will make others thread return & stop but current thread which call OnStop 
method to await current pool to stop. it makes current thread not stop, and 
pending program.

I'm not sure that needs to be improved or not?

 
{code:java}
//代码占位符{code}
test("Rpc env not shutdown when shutdown method call by endpoint onStop") {
     val rpcEndpoint = new RpcEndpoint {
         override val rpcEnv: RpcEnv = env
         override def onStop(): Unit = {            

            env.shutdown()            

            env.awaitTermination()        

        }

        override def receiveAndReply(context: RpcCallContext): 
PartialFunction[Any, Unit] ={

            case m => context.reply(m)

        }

    }
    env.setupEndpoint("test", rpcEndpoint)
    rpcEndpoint.stop()
    env.awaitTermination()
 }

 


> Rpc env not shutdown when shutdown method call by endpoint onStop
> -
>
> Key: SPARK-35154
> URL: https://issues.apache.org/jira/browse/SPARK-35154
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: spark-3.x
>Reporter: LIU
>Priority: Major
>
> when i use this code to work,  Rpc thread hangs up and not close gracefully. 
> i think when rpc thread called shutdown on OnStop method, it will try to put 
> MessageLoop.PoisonPill to return and stop thread in rpc pool. In spark 3.x, 
> it will make others thread return & stop but current thread which call OnStop 
> method to await current pool to stop. it makes current thread not stop, and 
> pending program.
> I'm not sure that needs to be improved or not?
>  
> {code:java}
> //代码占位符{code}
> test("Rpc env not shutdown when shutdown method call by endpoint onStop") {
>      val rpcEndpoint = new RpcEndpoint {
>          override val rpcEnv: RpcEnv = env
>          override def onStop(): Unit = {
>             env.shutdown()
>             env.awaitTermination()
>         }
>         override def receiveAndReply(context: RpcCallContext): 
> PartialFunction[Any, Unit] = {
>             case m => context.reply(m)
>          }
>     }
>     env.setupEndpoint("test", rpcEndpoint)
>     rpcEndpoint.stop()
>     env.awaitTermination()
>  }
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35154) Rpc env not shutdown when shutdown method call by endpoint onStop

2021-04-20 Thread LIU (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LIU updated SPARK-35154:

Description: 
When I run this code, the RPC thread hangs and does not shut down gracefully. I 
think that when an RPC thread calls shutdown from the onStop method, it puts 
MessageLoop.PoisonPill into the queue so that the threads in the RPC pool return 
and stop. In Spark 3.x this makes the other threads return and stop, but the 
thread that called onStop is left awaiting termination of the very pool it runs 
in, so that thread never stops and the program hangs.

I'm not sure whether this needs to be improved.

 
{code:java}
test("Rpc env not shutdown when shutdown method call by endpoint onStop") {
  val rpcEndpoint = new RpcEndpoint {
    override val rpcEnv: RpcEnv = env

    override def onStop(): Unit = {
      env.shutdown()
      env.awaitTermination()
    }

    override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
      case m => context.reply(m)
    }
  }
  env.setupEndpoint("test", rpcEndpoint)
  rpcEndpoint.stop()
  env.awaitTermination()
}
{code}

 

  was:
when i use this code to work,  Rpc thread hangs up and not close gracefully. i 
think when rpc thread called shutdown on OnStop method, it will try to put 

MessageLoop.PoisonPill to return and stop thread in rpc pool. In spark 3.x, it 
will make others thread return & stop but current thread which call OnStop 
method to await current pool to stop. it makes current thread not stop, and 
pending program.

I'm not sure that needs to be improved or not?

 
{code:java}
//代码占位符{code}
test("Rpc env not shutdown when shutdown method call by endpoint onStop") {
    val rpcEndpoint = new RpcEndpoint {
        override val rpcEnv: RpcEnv = env
        override def onStop(): Unit = {
            env.shutdown()
            env.awaitTermination()
        }

        override def receiveAndReply(context: RpcCallContext): 
PartialFunction[Any, Unit] ={
            case m => context.reply(m)
        }

    }
   env.setupEndpoint("test", rpcEndpoint)
   rpcEndpoint.stop()
   env.awaitTermination()
}

 


> Rpc env not shutdown when shutdown method call by endpoint onStop
> -
>
> Key: SPARK-35154
> URL: https://issues.apache.org/jira/browse/SPARK-35154
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: spark-3.x
>Reporter: LIU
>Priority: Major
>
> when i use this code to work,  Rpc thread hangs up and not close gracefully. 
> i think when rpc thread called shutdown on OnStop method, it will try to put 
> MessageLoop.PoisonPill to return and stop thread in rpc pool. In spark 3.x, 
> it will make others thread return & stop but current thread which call OnStop 
> method to await current pool to stop. it makes current thread not stop, and 
> pending program.
> I'm not sure that needs to be improved or not?
>  
> {code:java}
> //代码占位符{code}
> test("Rpc env not shutdown when shutdown method call by endpoint onStop") {
>      val rpcEndpoint = new RpcEndpoint {
>          override val rpcEnv: RpcEnv = env
>          override def onStop(): Unit = {            
>             env.shutdown()            
>             env.awaitTermination()        
>         }
>         override def receiveAndReply(context: RpcCallContext): 
> PartialFunction[Any, Unit] ={
>             case m => context.reply(m)
>         }
>     }
>     env.setupEndpoint("test", rpcEndpoint)
>     rpcEndpoint.stop()
>     env.awaitTermination()
>  }
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35154) Rpc env not shutdown when shutdown method call by endpoint onStop

2021-04-20 Thread LIU (Jira)
LIU created SPARK-35154:
---

 Summary: Rpc env not shutdown when shutdown method call by 
endpoint onStop
 Key: SPARK-35154
 URL: https://issues.apache.org/jira/browse/SPARK-35154
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0
 Environment: spark-3.x
Reporter: LIU


When I run this code, the RPC thread hangs and does not shut down gracefully. I 
think that when an RPC thread calls shutdown from the onStop method, it puts 
MessageLoop.PoisonPill into the queue so that the threads in the RPC pool return 
and stop. In Spark 3.x this makes the other threads return and stop, but the 
thread that called onStop is left awaiting termination of the very pool it runs 
in, so that thread never stops and the program hangs.

I'm not sure whether this needs to be improved. A minimal sketch of the same 
deadlock pattern outside Spark is shown after the test code below.

 
{code:java}
test("Rpc env not shutdown when shutdown method call by endpoint onStop") {
  val rpcEndpoint = new RpcEndpoint {
    override val rpcEnv: RpcEnv = env

    override def onStop(): Unit = {
      env.shutdown()
      env.awaitTermination()
    }

    override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
      case m => context.reply(m)
    }
  }
  env.setupEndpoint("test", rpcEndpoint)
  rpcEndpoint.stop()
  env.awaitTermination()
}
{code}
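
To make the hang easier to see, here is a minimal, self-contained sketch of the
same shutdown-from-within pattern using a plain Java executor. It only illustrates
the deadlock described above and is not Spark's actual RpcEnv code; the timeout is
added so the demo finishes instead of hanging forever.

{code:java}
import java.util.concurrent.{Executors, TimeUnit}

object SelfAwaitDeadlockDemo extends App {
  val pool = Executors.newFixedThreadPool(2)

  pool.submit(new Runnable {
    override def run(): Unit = {
      // Analogous to calling env.shutdown() + env.awaitTermination() inside onStop():
      // the task asks its own pool to stop and then waits for that pool to terminate.
      pool.shutdown()
      // This await can never succeed from inside the pool, because termination
      // requires this very task to finish first; with no timeout it would hang.
      val terminated = pool.awaitTermination(2, TimeUnit.SECONDS)
      println(s"terminated inside worker: $terminated") // prints false after the timeout
    }
  })

  pool.awaitTermination(5, TimeUnit.SECONDS)
  println("main thread done")
}
{code}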

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32288) [UI] Add failure summary table in stage page

2021-04-20 Thread Zhongwei Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhongwei Zhu updated SPARK-32288:
-
Summary: [UI] Add failure summary table in stage page  (was: [UI] Add 
exception summary table in stage page)

> [UI] Add failure summary table in stage page
> 
>
> Key: SPARK-32288
> URL: https://issues.apache.org/jira/browse/SPARK-32288
> Project: Spark
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Zhongwei Zhu
>Priority: Major
>
> When there are many task failures during one stage, it is hard to find failure 
> patterns, such as aggregating task failures by exception type and message. With 
> such information, we can easily tell which type of failure is the root cause of 
> the stage failure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34777) [UI] StagePage input size/records not show when records greater than zero

2021-04-20 Thread Zhongwei Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhongwei Zhu updated SPARK-34777:
-
Summary: [UI] StagePage input size/records not show when records greater 
than zero  (was: [UI] StagePage input size records not show when records 
greater than zero)

> [UI] StagePage input size/records not show when records greater than zero
> -
>
> Key: SPARK-34777
> URL: https://issues.apache.org/jira/browse/SPARK-34777
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.1.1
>Reporter: Zhongwei Zhu
>Priority: Minor
> Attachments: No input size records.png
>
>
> !No input size records.png|width=547,height=212!
> `Input Size / Records` should be shown in the summary metrics table and the task 
> columns when the input record count is greater than zero, even if the byte count 
> is zero. One example is a Spark Streaming job reading from Kafka.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35153) Override `sql()` of ANSI interval operators

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325925#comment-17325925
 ] 

Apache Spark commented on SPARK-35153:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/32262

> Override `sql()` of ANSI interval operators
> ---
>
> Key: SPARK-35153
> URL: https://issues.apache.org/jira/browse/SPARK-35153
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Override the sql() method of the expression that implements operators over 
> ANSI interval, and make SQL representation more readable and potentially 
> parsable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35153) Override `sql()` of ANSI interval operators

2021-04-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35153:


Assignee: Max Gekk  (was: Apache Spark)

> Override `sql()` of ANSI interval operators
> ---
>
> Key: SPARK-35153
> URL: https://issues.apache.org/jira/browse/SPARK-35153
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Override the sql() method of the expression that implements operators over 
> ANSI interval, and make SQL representation more readable and potentially 
> parsable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35153) Override `sql()` of ANSI interval operators

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325924#comment-17325924
 ] 

Apache Spark commented on SPARK-35153:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/32262

> Override `sql()` of ANSI interval operators
> ---
>
> Key: SPARK-35153
> URL: https://issues.apache.org/jira/browse/SPARK-35153
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Override the sql() method of the expression that implements operators over 
> ANSI interval, and make SQL representation more readable and potentially 
> parsable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35153) Override `sql()` of ANSI interval operators

2021-04-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35153:


Assignee: Apache Spark  (was: Max Gekk)

> Override `sql()` of ANSI interval operators
> ---
>
> Key: SPARK-35153
> URL: https://issues.apache.org/jira/browse/SPARK-35153
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Override the sql() method of the expression that implements operators over 
> ANSI interval, and make SQL representation more readable and potentially 
> parsable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35153) Override `sql()` of ANSI interval operators

2021-04-20 Thread Max Gekk (Jira)
Max Gekk created SPARK-35153:


 Summary: Override `sql()` of ANSI interval operators
 Key: SPARK-35153
 URL: https://issues.apache.org/jira/browse/SPARK-35153
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Max Gekk
Assignee: Max Gekk


Override the sql() method of the expression that implements operators over ANSI 
interval, and make SQL representation more readable and potentially parsable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35151) Suppress `symbol literal is deprecated` compilation warnings in Scala 2.13

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325911#comment-17325911
 ] 

Apache Spark commented on SPARK-35151:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/32261

> Suppress `symbol literal is deprecated` compilation warnings in Scala 2.13
> --
>
> Key: SPARK-35151
> URL: https://issues.apache.org/jira/browse/SPARK-35151
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Minor
>
> Add compile args to suppress compilation warnings as follows:
>  
> {code:java}
> [warn] 
> /home/kou/work/oss/spark-scala-2.13/examples/src/main/scala/org/apache/spark/examples/sql/SimpleTypedAggregator.scala:34:38:
>  [deprecation @  | origin= | version=2.13.0] symbol literal is deprecated; 
> use Symbol("id") instead
> [warn] val ds = spark.range(20).select(('id % 3).as("key"), 
> 'id).as[(Long, Long)]
> [warn]  ^
> [warn] 
> /home/kou/work/oss/spark-scala-2.13/examples/src/main/scala/org/apache/spark/examples/sql/SimpleTypedAggregator.scala:34:58:
>  [deprecation @  | origin= | version=2.13.0] symbol literal is deprecated; 
> use Symbol("id") instead
> [warn] val ds = spark.range(20).select(('id % 3).as("key"), 
> 'id).as[(Long, Long)]
> {code}
>  
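
For readers hitting the same warning, a small sketch of the replacement the
deprecation message suggests; Symbol("id") and the $"id" interpolator are the
usual non-deprecated spellings. This is a hedged example of the general fix, not
code taken from the pull request.

{code:java}
import org.apache.spark.sql.SparkSession

object SymbolLiteralExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SymbolLiteralExample").master("local[*]").getOrCreate()
    import spark.implicits._

    // Deprecated in Scala 2.13: symbol literals such as 'id, e.g.
    // val ds = spark.range(20).select(('id % 3).as("key"), 'id).as[(Long, Long)]

    // Equivalent forms that avoid the deprecated syntax:
    val ds1 = spark.range(20).select((Symbol("id") % 3).as("key"), Symbol("id")).as[(Long, Long)]
    val ds2 = spark.range(20).select(($"id" % 3).as("key"), $"id").as[(Long, Long)]

    ds1.show()
    ds2.show()
    spark.stop()
  }
}
{code}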



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35151) Suppress `symbol literal is deprecated` compilation warnings in Scala 2.13

2021-04-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35151:


Assignee: Apache Spark

> Suppress `symbol literal is deprecated` compilation warnings in Scala 2.13
> --
>
> Key: SPARK-35151
> URL: https://issues.apache.org/jira/browse/SPARK-35151
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> Add compile args to suppress compilation warnings as follows:
>  
> {code:java}
> [warn] 
> /home/kou/work/oss/spark-scala-2.13/examples/src/main/scala/org/apache/spark/examples/sql/SimpleTypedAggregator.scala:34:38:
>  [deprecation @  | origin= | version=2.13.0] symbol literal is deprecated; 
> use Symbol("id") instead
> [warn] val ds = spark.range(20).select(('id % 3).as("key"), 
> 'id).as[(Long, Long)]
> [warn]  ^
> [warn] 
> /home/kou/work/oss/spark-scala-2.13/examples/src/main/scala/org/apache/spark/examples/sql/SimpleTypedAggregator.scala:34:58:
>  [deprecation @  | origin= | version=2.13.0] symbol literal is deprecated; 
> use Symbol("id") instead
> [warn] val ds = spark.range(20).select(('id % 3).as("key"), 
> 'id).as[(Long, Long)]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35151) Suppress `symbol literal is deprecated` compilation warnings in Scala 2.13

2021-04-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35151:


Assignee: (was: Apache Spark)

> Suppress `symbol literal is deprecated` compilation warnings in Scala 2.13
> --
>
> Key: SPARK-35151
> URL: https://issues.apache.org/jira/browse/SPARK-35151
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Minor
>
> Add compile args to suppress compilation warnings as follows:
>  
> {code:java}
> [warn] 
> /home/kou/work/oss/spark-scala-2.13/examples/src/main/scala/org/apache/spark/examples/sql/SimpleTypedAggregator.scala:34:38:
>  [deprecation @  | origin= | version=2.13.0] symbol literal is deprecated; 
> use Symbol("id") instead
> [warn] val ds = spark.range(20).select(('id % 3).as("key"), 
> 'id).as[(Long, Long)]
> [warn]  ^
> [warn] 
> /home/kou/work/oss/spark-scala-2.13/examples/src/main/scala/org/apache/spark/examples/sql/SimpleTypedAggregator.scala:34:58:
>  [deprecation @  | origin= | version=2.13.0] symbol literal is deprecated; 
> use Symbol("id") instead
> [warn] val ds = spark.range(20).select(('id % 3).as("key"), 
> 'id).as[(Long, Long)]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35151) Suppress `symbol literal is deprecated` compilation warnings in Scala 2.13

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325910#comment-17325910
 ] 

Apache Spark commented on SPARK-35151:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/32261

> Suppress `symbol literal is deprecated` compilation warnings in Scala 2.13
> --
>
> Key: SPARK-35151
> URL: https://issues.apache.org/jira/browse/SPARK-35151
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Minor
>
> Add compile args to suppress compilation warnings as follows:
>  
> {code:java}
> [warn] 
> /home/kou/work/oss/spark-scala-2.13/examples/src/main/scala/org/apache/spark/examples/sql/SimpleTypedAggregator.scala:34:38:
>  [deprecation @  | origin= | version=2.13.0] symbol literal is deprecated; 
> use Symbol("id") instead
> [warn] val ds = spark.range(20).select(('id % 3).as("key"), 
> 'id).as[(Long, Long)]
> [warn]  ^
> [warn] 
> /home/kou/work/oss/spark-scala-2.13/examples/src/main/scala/org/apache/spark/examples/sql/SimpleTypedAggregator.scala:34:58:
>  [deprecation @  | origin= | version=2.13.0] symbol literal is deprecated; 
> use Symbol("id") instead
> [warn] val ds = spark.range(20).select(('id % 3).as("key"), 
> 'id).as[(Long, Long)]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35145) CurrentOrigin should support nested invoking

2021-04-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35145.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32249
[https://github.com/apache/spark/pull/32249]

> CurrentOrigin should support nested invoking
> 
>
> Key: SPARK-35145
> URL: https://issues.apache.org/jira/browse/SPARK-35145
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35145) CurrentOrigin should support nested invoking

2021-04-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-35145:
---

Assignee: Wenchen Fan

> CurrentOrigin should support nested invoking
> 
>
> Key: SPARK-35145
> URL: https://issues.apache.org/jira/browse/SPARK-35145
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35152) ANSI mode: IntegralDivide throws exception on overflow

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325894#comment-17325894
 ] 

Apache Spark commented on SPARK-35152:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/32260

> ANSI mode: IntegralDivide throws exception on overflow
> --
>
> Key: SPARK-35152
> URL: https://issues.apache.org/jira/browse/SPARK-35152
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> IntegralDivide throws an exception on overflow. 
> There is only one case that can cause that:
> ```
> Long.MinValue div -1
> ```
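For context, the wrap-around is plain two's-complement Long behaviour; a minimal spark-shell-style sketch of the single overflowing case and of what ANSI mode is expected to do with it:

{code:java}
// The true result of Long.MinValue / -1 is 2^63, which does not fit in a Long,
// so JVM arithmetic silently wraps back to Long.MinValue:
val wrapped: Long = Long.MinValue / -1L
assert(wrapped == Long.MinValue)   // -9223372036854775808, not +9223372036854775808

// Under spark.sql.ansi.enabled=true, the SQL `div` operator on these operands
// should raise an overflow error instead of returning the wrapped value.
{code}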



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35152) ANSI mode: IntegralDivide throws exception on overflow

2021-04-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35152:


Assignee: Apache Spark  (was: Gengliang Wang)

> ANSI mode: IntegralDivide throws exception on overflow
> --
>
> Key: SPARK-35152
> URL: https://issues.apache.org/jira/browse/SPARK-35152
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> IntegralDivide throws an exception on overflow. 
> There is only one case that can cause that:
> ```
> Long.MinValue div -1
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35152) ANSI mode: IntegralDivide throws exception on overflow

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325892#comment-17325892
 ] 

Apache Spark commented on SPARK-35152:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/32260

> ANSI mode: IntegralDivide throws exception on overflow
> --
>
> Key: SPARK-35152
> URL: https://issues.apache.org/jira/browse/SPARK-35152
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> IntegralDivide throws an exception on overflow. 
> There is only one case that can cause that:
> ```
> Long.MinValue div -1
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35152) ANSI mode: IntegralDivide throws exception on overflow

2021-04-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35152:


Assignee: Gengliang Wang  (was: Apache Spark)

> ANSI mode: IntegralDivide throws exception on overflow
> --
>
> Key: SPARK-35152
> URL: https://issues.apache.org/jira/browse/SPARK-35152
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> IntegralDivide throws an exception on overflow. 
> There is only one case that can cause that:
> ```
> Long.MinValue div -1
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34338) Report metrics from Datasource v2 scan

2021-04-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-34338.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31451
[https://github.com/apache/spark/pull/31451]

> Report metrics from Datasource v2 scan
> --
>
> Key: SPARK-34338
> URL: https://issues.apache.org/jira/browse/SPARK-34338
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.2.0
>
>
> This is related to SPARK-34297.
> In SPARK-34297, we want to add a couple of useful metrics when reading from 
> Kafka in SS. We need some public API change in DS v2 to make it possible.
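For context, a rough sketch of the kind of public hook this implies; the names below are placeholders for illustration only, not the actual interface added by the linked PR:

{code:java}
// Hypothetical names, for illustration only.
// A driver-side description of a custom metric exposed by a DS v2 scan.
trait ScanMetric {
  def name: String
  def description: String
  // Fold the per-task values collected on executors into one reported value.
  def aggregateTaskMetrics(taskValues: Array[Long]): String
}

// The per-task value a partition reader would report
// (e.g. how far a Kafka partition lags behind the latest offset).
trait TaskScanMetric {
  def name: String
  def value: Long
}
{code}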



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35152) ANSI mode: IntegralDivide throws exception on overflow

2021-04-20 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-35152:
--

 Summary: ANSI mode: IntegralDivide throws exception on overflow
 Key: SPARK-35152
 URL: https://issues.apache.org/jira/browse/SPARK-35152
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


IntegralDivide throws an exception on overflow. 
There is only one case that can cause that:
```
Long.MinValue div -1
```




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34035) Refactor ScriptTransformation to remove input parameter and replace it by child.output

2021-04-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-34035.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32228
[https://github.com/apache/spark/pull/32228]

> Refactor ScriptTransformation to remove input parameter and replace it by  
> child.output
> ---
>
> Key: SPARK-34035
> URL: https://issues.apache.org/jira/browse/SPARK-34035
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> According to discussion here 
> https://github.com/apache/spark/pull/29087#discussion_r552625920



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-34035) Refactor ScriptTransformation to remove input parameter and replace it by child.output

2021-04-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-34035:
---

Assignee: angerszhu

> Refactor ScriptTransformation to remove input parameter and replace it by  
> child.output
> ---
>
> Key: SPARK-34035
> URL: https://issues.apache.org/jira/browse/SPARK-34035
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> According to discussion here 
> https://github.com/apache/spark/pull/29087#discussion_r552625920



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35151) Suppress `symbol literal is deprecated` compilation warnings in Scala 2.13

2021-04-20 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-35151:
-
Description: 
Add compile args to suppress  compilation warnings  as follows:

 
{code:java}
[warn] 
/home/kou/work/oss/spark-scala-2.13/examples/src/main/scala/org/apache/spark/examples/sql/SimpleTypedAggregator.scala:34:38:
 [deprecation @  | origin= | version=2.13.0] symbol literal is deprecated; use 
Symbol("id") instead
[warn] val ds = spark.range(20).select(('id % 3).as("key"), 'id).as[(Long, 
Long)]
[warn]  ^
[warn] 
/home/kou/work/oss/spark-scala-2.13/examples/src/main/scala/org/apache/spark/examples/sql/SimpleTypedAggregator.scala:34:58:
 [deprecation @  | origin= | version=2.13.0] symbol literal is deprecated; use 
Symbol("id") instead
[warn] val ds = spark.range(20).select(('id % 3).as("key"), 'id).as[(Long, 
Long)]
{code}
 

  was:Add compile args to suppress 


> Suppress `symbol literal is deprecated` compilation warnings in Scala 2.13
> --
>
> Key: SPARK-35151
> URL: https://issues.apache.org/jira/browse/SPARK-35151
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Minor
>
> Add compile args to suppress  compilation warnings  as follows:
>  
> {code:java}
> [warn] 
> /home/kou/work/oss/spark-scala-2.13/examples/src/main/scala/org/apache/spark/examples/sql/SimpleTypedAggregator.scala:34:38:
>  [deprecation @  | origin= | version=2.13.0] symbol literal is deprecated; 
> use Symbol("id") instead
> [warn] val ds = spark.range(20).select(('id % 3).as("key"), 
> 'id).as[(Long, Long)]
> [warn]  ^
> [warn] 
> /home/kou/work/oss/spark-scala-2.13/examples/src/main/scala/org/apache/spark/examples/sql/SimpleTypedAggregator.scala:34:58:
>  [deprecation @  | origin= | version=2.13.0] symbol literal is deprecated; 
> use Symbol("id") instead
> [warn] val ds = spark.range(20).select(('id % 3).as("key"), 
> 'id).as[(Long, Long)]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35151) Suppress `symbol literal is deprecated` compilation warnings in Scala 2.13

2021-04-20 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-35151:
-
Description: Add compile args to suppress 

> Suppress `symbol literal is deprecated` compilation warnings in Scala 2.13
> --
>
> Key: SPARK-35151
> URL: https://issues.apache.org/jira/browse/SPARK-35151
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Minor
>
> Add compile args to suppress 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35151) Suppress `symbol literal is deprecated` compilation warnings in Scala 2.13

2021-04-20 Thread Yang Jie (Jira)
Yang Jie created SPARK-35151:


 Summary: Suppress `symbol literal is deprecated` compilation 
warnings in Scala 2.13
 Key: SPARK-35151
 URL: https://issues.apache.org/jira/browse/SPARK-35151
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.2.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35150) Accelerate fallback BLAS with dev.ludovic.netlib

2021-04-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35150:


Assignee: (was: Apache Spark)

> Accelerate fallback BLAS with dev.ludovic.netlib
> 
>
> Key: SPARK-35150
> URL: https://issues.apache.org/jira/browse/SPARK-35150
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX, ML, MLlib
>Affects Versions: 3.2.0
>Reporter: Ludovic Henry
>Priority: Major
>
> Following https://github.com/apache/spark/pull/30810, I've continued looking 
> for ways to accelerate the usage of BLAS in Spark. With this PR, I integrate 
> work done in the [{{dev.ludovic.netlib}}|https://github.com/luhenry/netlib/] 
> Maven package.
> The {{dev.ludovic.netlib}} library wraps the original 
> {{com.github.fommil.netlib}} library and focuses on accelerating the linear 
> algebra routines in use in Spark. When running the 
> {{org.apache.spark.ml.linalg.BLASBenchmark}} benchmarking suite, I get the 
> results at [1] on an Intel machine. Moreover, this library is thoroughly 
> tested to return the exact same results as the reference implementation.
> Under the hood, it reimplements the necessary algorithms in pure 
> autovectorization-friendly Java 8, as well as takes advantage of the Vector 
> API and Foreign Linker API introduced in JDK 16 when available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35150) Accelerate fallback BLAS with dev.ludovic.netlib

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325842#comment-17325842
 ] 

Apache Spark commented on SPARK-35150:
--

User 'luhenry' has created a pull request for this issue:
https://github.com/apache/spark/pull/32253

> Accelerate fallback BLAS with dev.ludovic.netlib
> 
>
> Key: SPARK-35150
> URL: https://issues.apache.org/jira/browse/SPARK-35150
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX, ML, MLlib
>Affects Versions: 3.2.0
>Reporter: Ludovic Henry
>Priority: Major
>
> Following https://github.com/apache/spark/pull/30810, I've continued looking 
> for ways to accelerate the usage of BLAS in Spark. With this PR, I integrate 
> work done in the [{{dev.ludovic.netlib}}|https://github.com/luhenry/netlib/] 
> Maven package.
> The {{dev.ludovic.netlib}} library wraps the original 
> {{com.github.fommil.netlib}} library and focuses on accelerating the linear 
> algebra routines in use in Spark. When running the 
> {{org.apache.spark.ml.linalg.BLASBenchmark}} benchmarking suite, I get the 
> results at [1] on an Intel machine. Moreover, this library is thoroughly 
> tested to return the exact same results as the reference implementation.
> Under the hood, it reimplements the necessary algorithms in pure 
> autovectorization-friendly Java 8, as well as takes advantage of the Vector 
> API and Foreign Linker API introduced in JDK 16 when available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35150) Accelerate fallback BLAS with dev.ludovic.netlib

2021-04-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35150:


Assignee: Apache Spark

> Accelerate fallback BLAS with dev.ludovic.netlib
> 
>
> Key: SPARK-35150
> URL: https://issues.apache.org/jira/browse/SPARK-35150
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX, ML, MLlib
>Affects Versions: 3.2.0
>Reporter: Ludovic Henry
>Assignee: Apache Spark
>Priority: Major
>
> Following https://github.com/apache/spark/pull/30810, I've continued looking 
> for ways to accelerate the usage of BLAS in Spark. With this PR, I integrate 
> work done in the [{{dev.ludovic.netlib}}|https://github.com/luhenry/netlib/] 
> Maven package.
> The {{dev.ludovic.netlib}} library wraps the original 
> {{com.github.fommil.netlib}} library and focuses on accelerating the linear 
> algebra routines in use in Spark. When running the 
> {{org.apache.spark.ml.linalg.BLASBenchmark}} benchmarking suite, I get the 
> results at [1] on an Intel machine. Moreover, this library is thoroughly 
> tested to return the exact same results as the reference implementation.
> Under the hood, it reimplements the necessary algorithms in pure 
> autovectorization-friendly Java 8, as well as takes advantage of the Vector 
> API and Foreign Linker API introduced in JDK 16 when available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35150) Accelerate fallback BLAS with dev.ludovic.netlib

2021-04-20 Thread Ludovic Henry (Jira)
Ludovic Henry created SPARK-35150:
-

 Summary: Accelerate fallback BLAS with dev.ludovic.netlib
 Key: SPARK-35150
 URL: https://issues.apache.org/jira/browse/SPARK-35150
 Project: Spark
  Issue Type: Improvement
  Components: GraphX, ML, MLlib
Affects Versions: 3.2.0
Reporter: Ludovic Henry


Following https://github.com/apache/spark/pull/30810, I've continued looking 
for ways to accelerate the usage of BLAS in Spark. With this PR, I integrate 
work done in the [{{dev.ludovic.netlib}}|https://github.com/luhenry/netlib/] 
Maven package.

The {{dev.ludovic.netlib}} library wraps the original 
{{com.github.fommil.netlib}} library and focuses on accelerating the linear 
algebra routines in use in Spark. When running the 
{{org.apache.spark.ml.linalg.BLASBenchmark}} benchmarking suite, I get the 
results at [1] on an Intel machine. Moreover, this library is thoroughly tested 
to return the exact same results as the reference implementation.

Under the hood, it reimplements the necessary algorithms in pure 
autovectorization-friendly Java 8, as well as takes advantage of the Vector API 
and Foreign Linker API introduced in JDK 16 when available.
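For context, the call pattern being accelerated is the plain netlib-java one; a small self-contained sketch, assuming {{com.github.fommil.netlib}} is on the classpath (the {{dev.ludovic.netlib}} wrapper is meant to sit behind the same routines):

{code:java}
import com.github.fommil.netlib.BLAS

object DotExample {
  def main(args: Array[String]): Unit = {
    val x = Array(1.0, 2.0, 3.0)
    val y = Array(4.0, 5.0, 6.0)
    // ddot computes the dot product of x and y; stride 1 walks both arrays densely.
    val dot = BLAS.getInstance().ddot(x.length, x, 1, y, 1)
    println(dot) // 32.0
  }
}
{code}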



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35113) Support ANSI intervals in the Hash expression

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325827#comment-17325827
 ] 

Apache Spark commented on SPARK-35113:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/32259

> Support ANSI intervals in the Hash expression
> -
>
> Key: SPARK-35113
> URL: https://issues.apache.org/jira/browse/SPARK-35113
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
>
> Handle YearMonthIntervalType and DayTimeIntervalType in HashExpression. And 
> write tests.
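For context, a sketch of the shape of the change (illustrative only, not the actual HashExpression code): both ANSI interval types have fixed-width physical values, an Int of months and a Long of microseconds, so hashing can delegate to the existing integer/long hashers.

{code:java}
object IntervalHashSketch {
  // Illustrative stand-ins for Spark's internal types and hash primitives.
  sealed trait SqlType
  case object YearMonthIntervalType extends SqlType // physically an Int: months
  case object DayTimeIntervalType   extends SqlType // physically a Long: microseconds

  def hashInt(i: Int, seed: Long): Long   = 31 * seed + i
  def hashLong(l: Long, seed: Long): Long = 31 * seed + l

  def hashValue(value: Any, dt: SqlType, seed: Long): Long = dt match {
    case YearMonthIntervalType => hashInt(value.asInstanceOf[Int], seed)
    case DayTimeIntervalType   => hashLong(value.asInstanceOf[Long], seed)
  }
}
{code}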



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35113) Support ANSI intervals in the Hash expression

2021-04-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35113:


Assignee: Apache Spark

> Support ANSI intervals in the Hash expression
> -
>
> Key: SPARK-35113
> URL: https://issues.apache.org/jira/browse/SPARK-35113
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Handle YearMonthIntervalType and DayTimeIntervalType in HashExpression. And 
> write tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35113) Support ANSI intervals in the Hash expression

2021-04-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-35113:


Assignee: (was: Apache Spark)

> Support ANSI intervals in the Hash expression
> -
>
> Key: SPARK-35113
> URL: https://issues.apache.org/jira/browse/SPARK-35113
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
>
> Handle YearMonthIntervalType and DayTimeIntervalType in HashExpression. And 
> write tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35113) Support ANSI intervals in the Hash expression

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325825#comment-17325825
 ] 

Apache Spark commented on SPARK-35113:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/32259

> Support ANSI intervals in the Hash expression
> -
>
> Key: SPARK-35113
> URL: https://issues.apache.org/jira/browse/SPARK-35113
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
>
> Handle YearMonthIntervalType and DayTimeIntervalType in HashExpression. And 
> write tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-34877) Add Spark AM Log link in case of master as yarn and deploy mode as client

2021-04-20 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-34877.
---
Fix Version/s: 3.2.0
 Assignee: Saurabh Chawla
   Resolution: Fixed

> Add Spark AM Log link in case of master as yarn and deploy mode as client
> -
>
> Key: SPARK-34877
> URL: https://issues.apache.org/jira/browse/SPARK-34877
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.1.1
>Reporter: Saurabh Chawla
>Assignee: Saurabh Chawla
>Priority: Minor
> Fix For: 3.2.0
>
>
> When running a Spark job with YARN and deploy mode as client, the Spark driver 
> and the Spark application master launch in two separate containers. In various 
> scenarios there is a need to see the Spark application master logs to check the 
> resource allocation, decommissioning status and other information shared 
> between the YARN RM and the Spark application master.
> Till now the only way to check this is to find the container id of the AM and 
> check the logs either using the YARN utility or the YARN RM Application History 
> server. 
> This Jira adds the Spark AM log link for Spark jobs running in client mode on 
> YARN, so that instead of searching for the container id and then finding the 
> logs, we can check them directly in the Spark UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35120) Guide users to sync branch and enable GitHub Actions in their forked repository

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325810#comment-17325810
 ] 

Apache Spark commented on SPARK-35120:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/32258

> Guide users to sync branch and enable GitHub Actions in their forked 
> repository
> ---
>
> Key: SPARK-35120
> URL: https://issues.apache.org/jira/browse/SPARK-35120
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.2.0
>
>
> If developers don't enable GitHub Actions in their fork, the PR builds cannot 
> run. We should guide them to enable it.
> Also, the branch should be synced to the latest master branch.
> We could leverage Action Required status in GitHub check: 
> https://docs.github.com/en/rest/guides/getting-started-with-the-checks-api



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325800#comment-17325800
 ] 

Apache Spark commented on SPARK-33976:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/32257

> Add a dedicated SQL document page for the TRANSFORM-related functionality,
> --
>
> Key: SPARK-33976
> URL: https://issues.apache.org/jira/browse/SPARK-33976
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> Add doc about transform 
> https://github.com/apache/spark/pull/30973#issuecomment-753715318
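For context, a tiny spark-shell-style example of the functionality the new page documents: TRANSFORM pipes the selected columns through an external command, one tab-separated row per line (assumes a SparkSession {{spark}}; depending on the Spark version a Hive-enabled session may be required):

{code:java}
spark.range(3).createOrReplaceTempView("t")
spark.sql(
  """SELECT TRANSFORM(id)
    |USING 'cat' AS (id_out STRING)
    |FROM t""".stripMargin).show()
// `cat` echoes its input, so id_out is just id rendered as a string.
{code}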



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325801#comment-17325801
 ] 

Apache Spark commented on SPARK-33976:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/32257

> Add a dedicated SQL document page for the TRANSFORM-related functionality,
> --
>
> Key: SPARK-33976
> URL: https://issues.apache.org/jira/browse/SPARK-33976
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> Add doc about transform 
> https://github.com/apache/spark/pull/30973#issuecomment-753715318



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31225) Override `sql` method for OuterReference

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325792#comment-17325792
 ] 

Apache Spark commented on SPARK-31225:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/32256

> Override `sql` method for  OuterReference
> -
>
> Key: SPARK-31225
> URL: https://issues.apache.org/jira/browse/SPARK-31225
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.0.0
>
>
> OuterReference is a LeafExpression, so its children are Nil, which makes its 
> SQL representation always be outer(). This makes our explain command output and 
> error messages unclear when an OuterReference exists.
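For context, a simplified sketch of the kind of override the ticket asks for (standalone toy classes, not the actual Catalyst ones): render the wrapped expression inside outer(...) instead of a bare outer().

{code:java}
trait Expr { def sql: String }
case class NamedExpr(name: String) extends Expr { def sql: String = name }

case class OuterRef(e: Expr) extends Expr {
  // A leaf node has no children, so a generic rendering collapses to "outer()";
  // delegating to the wrapped expression yields e.g. "outer(t1.col)".
  override def sql: String = s"outer(${e.sql})"
}
{code}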



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31225) Override `sql` method for OuterReference

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325791#comment-17325791
 ] 

Apache Spark commented on SPARK-31225:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/32256

> Override `sql` method for  OuterReference
> -
>
> Key: SPARK-31225
> URL: https://issues.apache.org/jira/browse/SPARK-31225
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.0.0
>
>
> OuterReference is a LeafExpression, so its children are Nil, which makes its 
> SQL representation always be outer(). This makes our explain command output and 
> error messages unclear when an OuterReference exists.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35108) Pickle produces incorrect key labels for GenericRowWithSchema (data corruption)

2021-04-20 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325764#comment-17325764
 ] 

Hyukjin Kwon commented on SPARK-35108:
--

Thanks for cc'ing me [~tgraves]. I will take a look early next week if no one 
takes this one.

> Pickle produces incorrect key labels for GenericRowWithSchema (data 
> corruption)
> ---
>
> Key: SPARK-35108
> URL: https://issues.apache.org/jira/browse/SPARK-35108
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.0.2
>Reporter: Robert Joseph Evans
>Priority: Blocker
>  Labels: correctness
> Attachments: test.py, test.sh
>
>
> I think this also shows up for all versions of Spark that pickle the data 
> when doing a collect from Python.
> When you do a collect in Python, Java will do a collect and convert the 
> UnsafeRows into GenericRowWithSchema instances before it sends them to the 
> Pickler. The Pickler, by default, will try to dedupe objects using hashCode 
> and .equals for the object. But .equals and .hashCode for 
> GenericRowWithSchema only look at the data, not the schema. But when we 
> pickle the row the keys from the schema are written out.
> This can result in data corruption, sort of, in a few cases where a row has 
> the same number of elements as a struct within the row does, or a sub-struct 
> within another struct. 
> If the data happens to be the same, the keys for the resulting row or struct 
> can be wrong.
> My repro case is a bit convoluted, but it does happen.
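A small spark-shell-style reproduction of the root cause described above (assumes a Spark 3.x classpath): equality and hashCode for these rows ignore the schema, which is exactly what lets the Pickler's dedup substitute a row carrying the wrong field names.

{code:java}
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.types.{LongType, StructField, StructType}

val schemaA = StructType(Seq(StructField("a", LongType), StructField("b", LongType)))
val schemaB = StructType(Seq(StructField("x", LongType), StructField("y", LongType)))

val rowA = new GenericRowWithSchema(Array[Any](1L, 2L), schemaA)
val rowB = new GenericRowWithSchema(Array[Any](1L, 2L), schemaB)

// Same values, different field names: indistinguishable to the Pickler's memo table,
// even though pickling each row writes out its own schema's keys.
println(rowA == rowB)                    // true
println(rowA.hashCode == rowB.hashCode)  // true
{code}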



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35143) Add default log config for spark-sql

2021-04-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325696#comment-17325696
 ] 

Apache Spark commented on SPARK-35143:
--

User 'ChenDou2021' has created a pull request for this issue:
https://github.com/apache/spark/pull/32254

> Add default log config for spark-sql
> 
>
> Key: SPARK-35143
> URL: https://issues.apache.org/jira/browse/SPARK-35143
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell, SQL
>Affects Versions: 3.1.1
>Reporter: hong dongdong
>Priority: Minor
>
> The default log level for spark-sql is WARN. How to change the log level is 
> confusing; we need a default config.
>  
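For context, the runtime workaround is a one-liner (assumes an active SparkSession {{spark}}); the ticket itself is about shipping a sensible default log4j configuration for the spark-sql CLI:

{code:java}
// Valid levels include ALL, DEBUG, INFO, WARN, ERROR, OFF.
spark.sparkContext.setLogLevel("WARN")
{code}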



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,

2021-04-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33976:
---

Assignee: angerszhu

> Add a dedicated SQL document page for the TRANSFORM-related functionality,
> --
>
> Key: SPARK-33976
> URL: https://issues.apache.org/jira/browse/SPARK-33976
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> Add doc about transform 
> https://github.com/apache/spark/pull/30973#issuecomment-753715318



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33976) Add a dedicated SQL document page for the TRANSFORM-related functionality,

2021-04-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33976.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 31010
[https://github.com/apache/spark/pull/31010]

> Add a dedicated SQL document page for the TRANSFORM-related functionality,
> --
>
> Key: SPARK-33976
> URL: https://issues.apache.org/jira/browse/SPARK-33976
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> Add doc about transform 
> https://github.com/apache/spark/pull/30973#issuecomment-753715318



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35149) I am facing this issue regularly, how to fix this issue.

2021-04-20 Thread Eppa Rakesh (Jira)
Eppa Rakesh created SPARK-35149:
---

 Summary: I am facing this issue regularly, how to fix this issue.
 Key: SPARK-35149
 URL: https://issues.apache.org/jira/browse/SPARK-35149
 Project: Spark
  Issue Type: Question
  Components: Spark Submit
Affects Versions: 2.2.2
Reporter: Eppa Rakesh


21/04/19 21:02:11 WARN hdfs.DataStreamer: Exception for 
BP-823308525-10.56.47.77-1544458538172:blk_1170699623_96969312
 java.io.EOFException: Unexpected EOF while trying to read response from server
 at 
org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:448)
 at 
org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
 at 
org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1086)
 21/04/19 21:04:01 WARN hdfs.DataStreamer: Error Recovery for 
BP-823308525-10.56.47.77-1544458538172:blk_1170699623_96969312 in pipeline 
[DatanodeInfoWithStorage[10.34.39.42:9866,DS-0ad94d03-fa3f-486b-b204-3e8d2df91f17,DISK],
 
DatanodeInfoWithStorage[10.56.47.67:9866,DS-c28dab54-8fa0-4a49-80ec-345cc0cc52bd,DISK],
 
DatanodeInfoWithStorage[10.56.47.55:9866,DS-79f5dd22-d0bc-4fe0-8e50-8a570779de17,DISK]]:
 datanode 
0(DatanodeInfoWithStorage[10.56.47.36:9866,DS-0ad94d03-fa3f-486b-b204-3e8d2df91f17,DISK])
 is bad.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35068) Add tests for ANSI intervals to HiveThriftBinaryServerSuite

2021-04-20 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-35068:


Assignee: angerszhu

> Add tests for ANSI intervals to HiveThriftBinaryServerSuite
> ---
>
> Key: SPARK-35068
> URL: https://issues.apache.org/jira/browse/SPARK-35068
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> Add tests for year-month and day-time intervals to 
> HiveThriftBinaryServerSuite similar to:
> # Query Intervals in VIEWs through thrift server
> # Support interval type



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35068) Add tests for ANSI intervals to HiveThriftBinaryServerSuite

2021-04-20 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-35068.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 32250
[https://github.com/apache/spark/pull/32250]

> Add tests for ANSI intervals to HiveThriftBinaryServerSuite
> ---
>
> Key: SPARK-35068
> URL: https://issues.apache.org/jira/browse/SPARK-35068
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> Add tests for year-month and day-time intervals to 
> HiveThriftBinaryServerSuite similar to:
> # Query Intervals in VIEWs through thrift server
> # Support interval type



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34526) Skip checking glob path in FileStreamSink.hasMetadata

2021-04-20 Thread Yuanjian Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanjian Li updated SPARK-34526:

Description: When checking the path in {{FileStreamSink.hasMetadata}}, we 
should ignore the error and assume the user wants to read a batch output. This 
is to keep the original behavior of ignoring the error.  (was: Some users may 
use a very long glob path to read and `isDirectory` may fail when the path is 
too long. We should ignore the error when the path is a glob path since the 
file streaming sink doesn’t support glob paths.)

> Skip checking glob path in FileStreamSink.hasMetadata
> -
>
> Key: SPARK-34526
> URL: https://issues.apache.org/jira/browse/SPARK-34526
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Yuanjian Li
>Priority: Major
>
> When checking the path in {{FileStreamSink.hasMetadata}}, we should ignore 
> the error and assume the user wants to read a batch output. This is to keep 
> the original behavior of ignoring the error.
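For context, a simplified sketch of the intended check (not the actual {{FileStreamSink.hasMetadata}} code): if probing the path fails, for example on a glob pattern the filesystem rejects, fall back to treating it as batch output instead of failing the read.

{code:java}
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.util.control.NonFatal

object GlobPathProbeSketch {
  // True only when the path can actually be inspected and is a directory;
  // any error is ignored and the caller assumes a plain batch output.
  def isInspectableDir(fs: FileSystem, path: Path): Boolean =
    try fs.getFileStatus(path).isDirectory
    catch { case NonFatal(_) => false }
}
{code}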



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35113) Support ANSI intervals in the Hash expression

2021-04-20 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325670#comment-17325670
 ] 

Max Gekk commented on SPARK-35113:
--

[~angerszhuuu] Feel free to take this.

> Support ANSI intervals in the Hash expression
> -
>
> Key: SPARK-35113
> URL: https://issues.apache.org/jira/browse/SPARK-35113
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
>
> Handle YearMonthIntervalType and DayTimeIntervalType in HashExpression. And 
> write tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35113) Support ANSI intervals in the Hash expression

2021-04-20 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325663#comment-17325663
 ] 

angerszhu commented on SPARK-35113:
---

[~maxgekk] Have you worked on this? If not, can I take this one?

> Support ANSI intervals in the Hash expression
> -
>
> Key: SPARK-35113
> URL: https://issues.apache.org/jira/browse/SPARK-35113
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Max Gekk
>Priority: Major
>
> Handle YearMonthIntervalType and DayTimeIntervalType in HashExpression. And 
> write tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


