[jira] [Resolved] (SPARK-42157) `spark.scheduler.mode=FAIR` should provide FAIR scheduler

2023-01-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42157.
---
Fix Version/s: 3.2.4
   3.3.2
   3.4.0
   Resolution: Fixed

Issue resolved by pull request 39703
[https://github.com/apache/spark/pull/39703]

> `spark.scheduler.mode=FAIR` should provide FAIR scheduler
> -
>
> Key: SPARK-42157
> URL: https://issues.apache.org/jira/browse/SPARK-42157
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.3, 3.3.1, 3.2.3, 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.2.4, 3.3.2, 3.4.0
>
> Attachments: Screenshot 2023-01-22 at 2.39.34 PM.png
>
>
>  !Screenshot 2023-01-22 at 2.39.34 PM.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42157) `spark.scheduler.mode=FAIR` should provide FAIR scheduler

2023-01-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42157:
-

Assignee: Dongjoon Hyun

> `spark.scheduler.mode=FAIR` should provide FAIR scheduler
> -
>
> Key: SPARK-42157
> URL: https://issues.apache.org/jira/browse/SPARK-42157
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.3, 3.3.1, 3.2.3, 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Attachments: Screenshot 2023-01-22 at 2.39.34 PM.png
>
>
>  !Screenshot 2023-01-22 at 2.39.34 PM.png! 






[jira] [Resolved] (SPARK-42164) Register partitioned-table-related classes to KryoSerializer

2023-01-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42164.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39713
[https://github.com/apache/spark/pull/39713]

> Register partitioned-table-related classes to KryoSerializer
> 
>
> Key: SPARK-42164
> URL: https://issues.apache.org/jira/browse/SPARK-42164
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-42164) Register partitioned-table-related classes to KryoSerializer

2023-01-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42164:
-

Assignee: Dongjoon Hyun

> Register partitioned-table-related classes to KryoSerializer
> 
>
> Key: SPARK-42164
> URL: https://issues.apache.org/jira/browse/SPARK-42164
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>







[jira] [Assigned] (SPARK-42164) Register partitioned-table-related classes to KryoSerializer

2023-01-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42164:


Assignee: (was: Apache Spark)

> Register partitioned-table-related classes to KryoSerializer
> 
>
> Key: SPARK-42164
> URL: https://issues.apache.org/jira/browse/SPARK-42164
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Assigned] (SPARK-42164) Register partitioned-table-related classes to KryoSerializer

2023-01-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42164:


Assignee: Apache Spark

> Register partitioned-table-related classes to KryoSerializer
> 
>
> Key: SPARK-42164
> URL: https://issues.apache.org/jira/browse/SPARK-42164
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-42164) Register partitioned-table-related classes to KryoSerializer

2023-01-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680078#comment-17680078
 ] 

Apache Spark commented on SPARK-42164:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/39713

> Register partitioned-table-related classes to KryoSerializer
> 
>
> Key: SPARK-42164
> URL: https://issues.apache.org/jira/browse/SPARK-42164
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>







[jira] [Created] (SPARK-42164) Register partitioned-table-related classes to KryoSerializer

2023-01-23 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42164:
-

 Summary: Register partitioned-table-related classes to 
KryoSerializer
 Key: SPARK-42164
 URL: https://issues.apache.org/jira/browse/SPARK-42164
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Dongjoon Hyun









[jira] [Assigned] (SPARK-42133) Add basic Dataset API methods to Spark Connect Scala Client

2023-01-23 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell reassigned SPARK-42133:
-

Assignee: Venkata Sai Akhil Gudesa

> Add basic Dataset API methods to Spark Connect Scala Client
> ---
>
> Key: SPARK-42133
> URL: https://issues.apache.org/jira/browse/SPARK-42133
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Assignee: Venkata Sai Akhil Gudesa
>Priority: Major
>
> Add basic DataFrame API methods (such as project, filter, and limit), as well 
> as range() support in SparkSession.
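The fluent, chainable shape of the methods named above can be sketched with a toy model (this is not the real Spark Connect Scala client; the class and method names below just mirror the ticket text for illustration):

```python
# Toy model of a Dataset surface with project/filter/limit and
# SparkSession.range, mirroring the methods named in the ticket.
# NOT the real Spark Connect client; purely illustrative.

class Dataset:
    def __init__(self, rows):
        self._rows = list(rows)

    def project(self, f):
        # 'project' maps each row to a new shape (like Dataset.select)
        return Dataset(f(r) for r in self._rows)

    def filter(self, pred):
        return Dataset(r for r in self._rows if pred(r))

    def limit(self, n):
        return Dataset(self._rows[:n])

    def collect(self):
        return list(self._rows)

class SparkSession:
    @staticmethod
    def range(end):
        # mirrors SparkSession.range(n): rows 0 .. n-1
        return Dataset(range(end))

rows = (SparkSession.range(10)
        .filter(lambda x: x % 2 == 0)
        .project(lambda x: x * x)
        .limit(3)
        .collect())
# rows == [0, 4, 16]
```

Each method returns a new Dataset, so calls compose left to right, which is the usage pattern the real client exposes.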






[jira] [Resolved] (SPARK-42133) Add basic Dataset API methods to Spark Connect Scala Client

2023-01-23 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-42133.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> Add basic Dataset API methods to Spark Connect Scala Client
> ---
>
> Key: SPARK-42133
> URL: https://issues.apache.org/jira/browse/SPARK-42133
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Assignee: Venkata Sai Akhil Gudesa
>Priority: Major
> Fix For: 3.4.0
>
>
> Add basic DataFrame API methods (such as project, filter, and limit), as well 
> as range() support in SparkSession.






[jira] [Created] (SPARK-42163) Schema pruning fails on non-foldable array index or map key

2023-01-23 Thread David Cashman (Jira)
David Cashman created SPARK-42163:
-

 Summary: Schema pruning fails on non-foldable array index or map 
key
 Key: SPARK-42163
 URL: https://issues.apache.org/jira/browse/SPARK-42163
 Project: Spark
  Issue Type: Bug
  Components: Optimizer
Affects Versions: 3.2.3
Reporter: David Cashman


Schema pruning tries to extract selected fields from struct extractors. It 
looks through GetArrayItem/GetMapItem, but when doing so, it ignores the 
index/key, which may itself be a struct field. If it is a struct field that is 
not otherwise selected, and some other field of the same attribute is selected, 
then pruning will drop the field, resulting in an optimizer error.
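The failure mode described above can be sketched with a toy model (this is not Spark's optimizer; the selection encoding below is invented purely to illustrate why ignoring the index/key expression drops a needed field):

```python
# Toy model of the pruning bug: a query selects struct fields of an
# attribute, and a map/array lookup carries a key expression that may
# itself reference another field of the same attribute.

def referenced_fields(selections, include_keys):
    """Collect the struct fields the query actually needs."""
    fields = set()
    for sel in selections:
        fields.add(sel["field"])       # directly selected field
        key = sel.get("key_field")     # field used only as an index/key
        if include_keys and key is not None:
            fields.add(key)
    return fields

# Query shape: SELECT a.x, a.vals[a.k] -- field 'k' appears only as a key.
selections = [{"field": "x"}, {"field": "vals", "key_field": "k"}]

buggy = referenced_fields(selections, include_keys=False)  # drops 'k'
fixed = referenced_fields(selections, include_keys=True)   # keeps 'k'
```

In the buggy variant `k` is pruned away even though the lookup still references it, which is the shape of the optimizer error the ticket reports; the fix is to also walk the index/key expression when collecting fields.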






[jira] [Updated] (SPARK-42162) Memory usage on executors increased drastically for a complex query with large number of addition operations

2023-01-23 Thread Supun Nakandala (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Supun Nakandala updated SPARK-42162:

Description: 
With the [recent changes|https://github.com/apache/spark/pull/37851]  in the 
expression canonicalization, a complex query with a large number of Add 
operations ends up consuming 10x more memory on the executors.

The reason for this issue is that with the new changes the canonicalization 
process ends up generating a lot of intermediate objects, especially for complex 
queries with a large number of commutative operators. In this specific case, a 
heap histogram analysis shows that a large number of Add objects use the extra 
memory.
This issue does not happen before PR 
[#37851.|https://github.com/apache/spark/pull/37851]

The high memory usage causes the executors to lose heartbeat signals and 
results in task failures.

  was:
With the [recent changes|https://github.com/apache/spark/pull/37851]  in the 
expression canonicalization, a complex query with a large number of Add 
operations ends up consuming 10x more memory on the executors.

A heap histogram analysis shows that a large number of Add objects use the 
extra memory.
Before the PR [#37851|https://github.com/apache/spark/pull/37851], this issue 
does not happen.

The high memory usage causes the executors to lose heartbeat signals, resulting 
in task failures.


> Memory usage on executors increased drastically for a complex query with 
> large number of addition operations
> 
>
> Key: SPARK-42162
> URL: https://issues.apache.org/jira/browse/SPARK-42162
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Supun Nakandala
>Priority: Major
>
> With the [recent changes|https://github.com/apache/spark/pull/37851]  in the 
> expression canonicalization, a complex query with a large number of Add 
> operations ends up consuming 10x more memory on the executors.
> The reason for this issue is that with the new changes the canonicalization 
> process ends up generating a lot of intermediate objects, especially for 
> complex queries with a large number of commutative operators. In this 
> specific case, a heap histogram analysis shows that a large number of Add 
> objects use the extra memory.
> This issue does not happen before PR 
> [#37851.|https://github.com/apache/spark/pull/37851]
> The high memory usage causes the executors to lose heartbeat signals and 
> results in task failures.
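The allocation blow-up described above can be illustrated with a toy model (this is not Spark's actual canonicalization code; it only contrasts a single-pass rewrite with one that rebuilds every subtree of a commutative Add chain):

```python
# Toy illustration: canonicalizing a deep chain of commutative Add nodes.
# Gathering all operands once allocates O(n) nodes; re-canonicalizing at
# every level rebuilds each subtree and allocates O(n^2) nodes.

allocs = {"n": 0}  # global allocation counter

class Add:
    def __init__(self, left, right):
        allocs["n"] += 1
        self.left, self.right = left, right

def chain(n):
    expr = 0
    for i in range(1, n + 1):
        expr = Add(expr, i)
    return expr

def operands(e):
    # flatten a commutative Add tree into its leaf operands
    return operands(e.left) + operands(e.right) if isinstance(e, Add) else [e]

def rebuild(ops):
    out = ops[0]
    for o in ops[1:]:
        out = Add(out, o)
    return out

def canonicalize_once(e):
    # single pass over the whole tree: O(n) new nodes
    return rebuild(sorted(operands(e)))

def canonicalize_per_level(e):
    # rebuild and re-sort at every node: O(n^2) new nodes in total
    if not isinstance(e, Add):
        return e
    sub = Add(canonicalize_per_level(e.left), canonicalize_per_level(e.right))
    return rebuild(sorted(operands(sub)))

expr = chain(50)
allocs["n"] = 0
canonicalize_once(expr)
a_once = allocs["n"]           # 50 new Add nodes

expr = chain(50)
allocs["n"] = 0
canonicalize_per_level(expr)
a_per_level = allocs["n"]      # an order of magnitude more for the same tree
```

The heap-histogram symptom in the ticket (many extra Add objects) corresponds to the per-level strategy's intermediate nodes, which are allocated and discarded during canonicalization.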






[jira] [Created] (SPARK-42162) Memory usage on executors increased drastically for a complex query with large number of addition operations

2023-01-23 Thread Supun Nakandala (Jira)
Supun Nakandala created SPARK-42162:
---

 Summary: Memory usage on executors increased drastically for a 
complex query with large number of addition operations
 Key: SPARK-42162
 URL: https://issues.apache.org/jira/browse/SPARK-42162
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Supun Nakandala


With the [recent changes|https://github.com/apache/spark/pull/37851]  in the 
expression canonicalization, a complex query with a large number of Add 
operations ends up consuming 10x more memory on the executors.

A heap histogram analysis shows that a large number of Add objects use the 
extra memory.
Before the PR [#37851|https://github.com/apache/spark/pull/37851], this issue 
does not happen.

The high memory usage causes the executors to lose heartbeat signals, resulting 
in task failures.






[jira] [Assigned] (SPARK-41931) Improve UNSUPPORTED_DATA_TYPE message for complex types

2023-01-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41931:


Assignee: (was: Apache Spark)

> Improve UNSUPPORTED_DATA_TYPE message for complex types
> ---
>
> Key: SPARK-41931
> URL: https://issues.apache.org/jira/browse/SPARK-41931
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> spark-sql> SELECT CAST(array(1, 2, 3) AS ARRAY);
> [UNSUPPORTED_DATATYPE] Unsupported data type "ARRAY"(line 1, pos 30)
> == SQL ==
> SELECT CAST(array(1, 2, 3) AS ARRAY)
> --^^^
> This error message is confusing. We support ARRAY. We just require it to be 
> typed.
> We should have an error like:
> [INCOMPLETE_TYPE_DEFINITION.ARRAY] The definition of type `ARRAY` is 
> incomplete. You must provide an element type. For example: `ARRAY<INT>`.
> Similarly for STRUCT and MAP.  
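The proposed improvement can be sketched as a small helper (illustrative only: the error-class name follows the ticket text, and the example types are assumptions, not Spark's real implementation):

```python
# Sketch: produce the improved INCOMPLETE_TYPE_DEFINITION message when a
# complex type is written without its element type(s). Example strings for
# each type are invented for illustration.

EXAMPLES = {
    "ARRAY": "ARRAY<INT>",
    "MAP": "MAP<STRING, INT>",
    "STRUCT": "STRUCT<name: STRING>",
}

def incomplete_type_message(type_text):
    """Return the improved error for a bare complex type, else None."""
    base = type_text.strip().upper()
    if base in EXAMPLES:  # bare ARRAY / MAP / STRUCT with no parameters
        return (f"[INCOMPLETE_TYPE_DEFINITION.{base}] The definition of type "
                f"`{base}` is incomplete. You must provide an element type. "
                f"For example: `{EXAMPLES[base]}`.")
    return None  # fully-typed or non-complex types are handled elsewhere

msg = incomplete_type_message("ARRAY")
```

A typed form such as `ARRAY<INT>` falls through (returns None), so only the genuinely incomplete definitions get the targeted message.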






[jira] [Assigned] (SPARK-41931) Improve UNSUPPORTED_DATA_TYPE message for complex types

2023-01-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41931:


Assignee: Apache Spark

> Improve UNSUPPORTED_DATA_TYPE message for complex types
> ---
>
> Key: SPARK-41931
> URL: https://issues.apache.org/jira/browse/SPARK-41931
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Assignee: Apache Spark
>Priority: Major
>
> spark-sql> SELECT CAST(array(1, 2, 3) AS ARRAY);
> [UNSUPPORTED_DATATYPE] Unsupported data type "ARRAY"(line 1, pos 30)
> == SQL ==
> SELECT CAST(array(1, 2, 3) AS ARRAY)
> --^^^
> This error message is confusing. We support ARRAY. We just require it to be 
> typed.
> We should have an error like:
> [INCOMPLETE_TYPE_DEFINITION.ARRAY] The definition of type `ARRAY` is 
> incomplete. You must provide an element type. For example: `ARRAY<INT>`.
> Similarly for STRUCT and MAP.  






[jira] [Commented] (SPARK-41931) Improve UNSUPPORTED_DATA_TYPE message for complex types

2023-01-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17680001#comment-17680001
 ] 

Apache Spark commented on SPARK-41931:
--

User 'RunyaoChen' has created a pull request for this issue:
https://github.com/apache/spark/pull/39711

> Improve UNSUPPORTED_DATA_TYPE message for complex types
> ---
>
> Key: SPARK-41931
> URL: https://issues.apache.org/jira/browse/SPARK-41931
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> spark-sql> SELECT CAST(array(1, 2, 3) AS ARRAY);
> [UNSUPPORTED_DATATYPE] Unsupported data type "ARRAY"(line 1, pos 30)
> == SQL ==
> SELECT CAST(array(1, 2, 3) AS ARRAY)
> --^^^
> This error message is confusing. We support ARRAY. We just require it to be 
> typed.
> We should have an error like:
> [INCOMPLETE_TYPE_DEFINITION.ARRAY] The definition of type `ARRAY` is 
> incomplete. You must provide an element type. For example: `ARRAY<INT>`.
> Similarly for STRUCT and MAP.  






[jira] [Commented] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor

2023-01-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679986#comment-17679986
 ] 

Apache Spark commented on SPARK-42090:
--

User 'akpatnam25' has created a pull request for this issue:
https://github.com/apache/spark/pull/39710

> Introduce sasl retry count in RetryingBlockTransferor
> -
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.4.0
>
>
> Previously a boolean variable, saslTimeoutSeen, was used in 
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though IOException at #2 is retried (resulting in increment of 
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to 
> retrying SaslTimeoutException, we should keep a counter for 
> SaslTimeoutException retries and subtract the value of this counter from 
> retryCount.
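The accounting described above can be modeled as a toy (this is not the real RetryingBlockTransferor; it only contrasts the boolean-flag scheme with the proposed counter over the four-step scenario from the ticket):

```python
# Toy model: every retried error bumps retryCount, but SASL-timeout
# retries must not count toward the IO retry limit.

def boolean_scheme(errors):
    # old approach: a single saslTimeoutSeen flag; clearing the counter
    # to undo SASL increments also wipes legitimate IO retries
    retry_count, sasl_seen = 0, False
    for err in errors:
        if err == "io" and sasl_seen:
            retry_count, sasl_seen = 0, False  # the problematic clear
        retry_count += 1
        if err == "sasl":
            sasl_seen = True
    return retry_count

def counter_scheme(errors):
    # proposed approach: count SASL retries separately and subtract them
    retry_count = sasl_retry_count = 0
    for err in errors:
        retry_count += 1
        if err == "sasl":
            sasl_retry_count += 1
    return retry_count - sasl_retry_count

# The scenario from the ticket: SASL, IO, SASL, IO
scenario = ["sasl", "io", "sasl", "io"]
io_retries_counter = counter_scheme(scenario)   # both IO retries counted
io_retries_boolean = boolean_scheme(scenario)   # the IO retry at step 2 is lost
```

With the counter, the effective IO retry count is simply `retryCount - saslRetryCount`, so the step-2 IO retry survives the step-3/4 SASL handling.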






[jira] [Commented] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor

2023-01-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679984#comment-17679984
 ] 

Apache Spark commented on SPARK-42090:
--

User 'akpatnam25' has created a pull request for this issue:
https://github.com/apache/spark/pull/39709

> Introduce sasl retry count in RetryingBlockTransferor
> -
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.4.0
>
>
> Previously a boolean variable, saslTimeoutSeen, was used in 
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though IOException at #2 is retried (resulting in increment of 
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to 
> retrying SaslTimeoutException, we should keep a counter for 
> SaslTimeoutException retries and subtract the value of this counter from 
> retryCount.






[jira] [Commented] (SPARK-42090) Introduce sasl retry count in RetryingBlockTransferor

2023-01-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679982#comment-17679982
 ] 

Apache Spark commented on SPARK-42090:
--

User 'akpatnam25' has created a pull request for this issue:
https://github.com/apache/spark/pull/39709

> Introduce sasl retry count in RetryingBlockTransferor
> -
>
> Key: SPARK-42090
> URL: https://issues.apache.org/jira/browse/SPARK-42090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.4.0
>
>
> Previously a boolean variable, saslTimeoutSeen, was used in 
> RetryingBlockTransferor. However, the boolean variable wouldn't cover the 
> following scenario:
> 1. SaslTimeoutException
> 2. IOException
> 3. SaslTimeoutException
> 4. IOException
> Even though IOException at #2 is retried (resulting in increment of 
> retryCount), the retryCount would be cleared at step #4.
> Since the intention of saslTimeoutSeen is to undo the increment due to 
> retrying SaslTimeoutException, we should keep a counter for 
> SaslTimeoutException retries and subtract the value of this counter from 
> retryCount.






[jira] [Commented] (SPARK-41413) SPJ: Avoid shuffle when partition keys mismatch, but join expressions are compatible

2023-01-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679972#comment-17679972
 ] 

Apache Spark commented on SPARK-41413:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/39708

> SPJ: Avoid shuffle when partition keys mismatch, but join expressions are 
> compatible
> 
>
> Key: SPARK-41413
> URL: https://issues.apache.org/jira/browse/SPARK-41413
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, when checking whether the two sides of a Storage Partitioned Join 
> are compatible, we require both the partition expressions and the partition 
> keys to be compatible. However, this condition could be relaxed so that we 
> only require the former. When the latter is not compatible, we can calculate 
> a common superset of the keys, push that information down to both sides of 
> the join, and use empty partitions for the missing keys.
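The key-alignment idea above can be sketched as a toy (this is not Spark's SPJ code; it only shows the "common superset plus empty partitions" step on plain dicts):

```python
# Toy sketch: the two sides partition on the same expression but have
# different key sets. Take the union of keys and give each side an empty
# partition for any key it lacks, so a co-partitioned join needs no shuffle.

def align_partitions(left, right):
    """left/right: dict of partition_key -> list of rows."""
    common = sorted(set(left) | set(right))      # common superset of keys
    return ([left.get(k, []) for k in common],   # empty partition if missing
            [right.get(k, []) for k in common],
            common)

left = {1: ["l1"], 2: ["l2"]}
right = {2: ["r2"], 3: ["r3"]}
l_parts, r_parts, keys = align_partitions(left, right)
# keys == [1, 2, 3]
# l_parts == [["l1"], ["l2"], []]
# r_parts == [[], ["r2"], ["r3"]]
```

After alignment, partition i on each side holds the same join key, so the join can proceed partition-by-partition; keys missing on one side simply join against an empty partition.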






[jira] [Comment Edited] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code

2023-01-23 Thread Mayank Asthana (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679807#comment-17679807
 ] 

Mayank Asthana edited comment on SPARK-26365 at 1/23/23 2:06 PM:
-

{quote}Spark submit command exit code ($?) as 0 is okay as there is no error in 
job submission.
{quote}
spark-submit in cluster mode with master yarn exits with status code `1` on a 
job failure, even though that is equally just a job submission, only to YARN 
instead of Kubernetes.

So this should also be considered a bug.


was (Author: masthana):
{quote}Spark submit command exit code ($?) as 0 is okay as there is no error in 
job submission.


{quote}

> spark-submit for k8s cluster doesn't propagate exit code
> 
>
> Key: SPARK-26365
> URL: https://issues.apache.org/jira/browse/SPARK-26365
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Spark Submit
>Affects Versions: 2.3.2, 2.4.0, 3.0.0, 3.1.0
>Reporter: Oscar Bonilla
>Priority: Major
> Attachments: spark-2.4.5-raise-exception-k8s-failure.patch, 
> spark-3.0.0-raise-exception-k8s-failure.patch
>
>
> When launching apps using spark-submit in a Kubernetes cluster, if the Spark 
> application fails (returns exit code = 1, for example), spark-submit will 
> still exit gracefully and return exit code = 0.
> This is problematic, since there is no way to know that the Spark 
> application failed.
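The behavior users expect can be sketched as a small wrapper (a stand-in, not spark-submit itself): run the child process and propagate its exit code instead of always returning 0.

```python
# Sketch: run a command and report its real exit code. The child command
# here is a stand-in that deliberately exits with status 3.
import subprocess
import sys

def run_and_propagate(cmd):
    rc = subprocess.run(cmd).returncode
    return rc  # a real wrapper would call sys.exit(rc)

rc = run_and_propagate([sys.executable, "-c", "import sys; sys.exit(3)"])
# rc == 3
```

This is what the attached patches aim for on Kubernetes: surface the driver's failure as a non-zero spark-submit exit status so schedulers and scripts can react to it.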






[jira] [Commented] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code

2023-01-23 Thread Mayank Asthana (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679807#comment-17679807
 ] 

Mayank Asthana commented on SPARK-26365:


{quote}Spark submit command exit code ($?) as 0 is okay as there is no error in 
job submission.


{quote}

> spark-submit for k8s cluster doesn't propagate exit code
> 
>
> Key: SPARK-26365
> URL: https://issues.apache.org/jira/browse/SPARK-26365
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Spark Submit
>Affects Versions: 2.3.2, 2.4.0, 3.0.0, 3.1.0
>Reporter: Oscar Bonilla
>Priority: Major
> Attachments: spark-2.4.5-raise-exception-k8s-failure.patch, 
> spark-3.0.0-raise-exception-k8s-failure.patch
>
>
> When launching apps using spark-submit in a Kubernetes cluster, if the Spark 
> application fails (returns exit code = 1, for example), spark-submit will 
> still exit gracefully and return exit code = 0.
> This is problematic, since there is no way to know that the Spark 
> application failed.






[jira] [Created] (SPARK-42161) Upgrade Arrow to 11.0.0

2023-01-23 Thread Yang Jie (Jira)
Yang Jie created SPARK-42161:


 Summary: Upgrade Arrow to 11.0.0
 Key: SPARK-42161
 URL: https://issues.apache.org/jira/browse/SPARK-42161
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie









[jira] [Assigned] (SPARK-42161) Upgrade Arrow to 11.0.0

2023-01-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42161:


Assignee: Apache Spark

> Upgrade Arrow to 11.0.0
> ---
>
> Key: SPARK-42161
> URL: https://issues.apache.org/jira/browse/SPARK-42161
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> https://github.com/apache/arrow/releases/tag/apache-arrow-11.0.0






[jira] [Assigned] (SPARK-42161) Upgrade Arrow to 11.0.0

2023-01-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42161:


Assignee: (was: Apache Spark)

> Upgrade Arrow to 11.0.0
> ---
>
> Key: SPARK-42161
> URL: https://issues.apache.org/jira/browse/SPARK-42161
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> https://github.com/apache/arrow/releases/tag/apache-arrow-11.0.0






[jira] [Commented] (SPARK-42161) Upgrade Arrow to 11.0.0

2023-01-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679781#comment-17679781
 ] 

Apache Spark commented on SPARK-42161:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/39707

> Upgrade Arrow to 11.0.0
> ---
>
> Key: SPARK-42161
> URL: https://issues.apache.org/jira/browse/SPARK-42161
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> https://github.com/apache/arrow/releases/tag/apache-arrow-11.0.0






[jira] [Resolved] (SPARK-41948) Fix NPE for error classes: CANNOT_PARSE_JSON_FIELD

2023-01-23 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-41948.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39466
[https://github.com/apache/spark/pull/39466]

> Fix NPE for error classes: CANNOT_PARSE_JSON_FIELD
> --
>
> Key: SPARK-41948
> URL: https://issues.apache.org/jira/browse/SPARK-41948
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Updated] (SPARK-42161) Upgrade Arrow to 11.0.0

2023-01-23 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42161:
-
Description: 
https://github.com/apache/arrow/releases/tag/apache-arrow-11.0.0

> Upgrade Arrow to 11.0.0
> ---
>
> Key: SPARK-42161
> URL: https://issues.apache.org/jira/browse/SPARK-42161
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> https://github.com/apache/arrow/releases/tag/apache-arrow-11.0.0






[jira] [Assigned] (SPARK-41948) Fix NPE for error classes: CANNOT_PARSE_JSON_FIELD

2023-01-23 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-41948:


Assignee: BingKun Pan

> Fix NPE for error classes: CANNOT_PARSE_JSON_FIELD
> --
>
> Key: SPARK-41948
> URL: https://issues.apache.org/jira/browse/SPARK-41948
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>







[jira] [Resolved] (SPARK-42152) Use `_` instead of `-` in `shadedPattern` for relocation package name

2023-01-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42152.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39694
[https://github.com/apache/spark/pull/39694]

> Use `_` instead of `-` in `shadedPattern` for relocation package name
> -
>
> Key: SPARK-42152
> URL: https://issues.apache.org/jira/browse/SPARK-42152
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-41775) Implement training functions as input

2023-01-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41775.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39369
[https://github.com/apache/spark/pull/39369]

> Implement training functions as input
> -
>
> Key: SPARK-41775
> URL: https://issues.apache.org/jira/browse/SPARK-41775
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Assignee: Rithwik Ediga Lakhamsani
>Priority: Major
> Fix For: 3.4.0
>
>
> Sidenote: make formatting updates described in 
> https://github.com/apache/spark/pull/39188
>  
> Currently, `Distributor().run(...)` takes only files as input. Now we will 
> add support for functions as well. This requires the following process on 
> each task on the executor nodes:
> 1. Take the input function and args and pickle them.
> 2. Create a temp train.py file that looks like:
> {code:java}
> import cloudpickle
> import os
>
> if __name__ == "__main__":
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # RANK 0 corresponds to partitionId == 0
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> {code}
> 3. Run that train.py file with `torchrun`.
> 4. Check whether `train_output.pkl` has been created by the process with 
> partitionId == 0; if it has, deserialize it and return that output through 
> `.collect()`.





[jira] [Resolved] (SPARK-41712) Migrate the Spark Connect errors into PySpark error framework.

2023-01-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-41712.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 39693
[https://github.com/apache/spark/pull/39693]

> Migrate the Spark Connect errors into PySpark error framework.
> --
>
> Key: SPARK-41712
> URL: https://issues.apache.org/jira/browse/SPARK-41712
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> We need to migrate the Spark Connect errors into the centralized error 
> framework to leverage the error-class logic.






[jira] [Assigned] (SPARK-41712) Migrate the Spark Connect errors into PySpark error framework.

2023-01-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41712:


Assignee: Haejoon Lee

> Migrate the Spark Connect errors into PySpark error framework.
> --
>
> Key: SPARK-41712
> URL: https://issues.apache.org/jira/browse/SPARK-41712
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> We need to migrate the Spark Connect errors into the centralized error 
> framework to leverage the error-class logic.






[jira] [Assigned] (SPARK-42152) Use `_` instead of `-` in `shadedPattern` for relocation package name

2023-01-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42152:


Assignee: Yang Jie

> Use `_` instead of `-` in `shadedPattern` for relocation package name
> -
>
> Key: SPARK-42152
> URL: https://issues.apache.org/jira/browse/SPARK-42152
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>







[jira] [Assigned] (SPARK-41775) Implement training functions as input

2023-01-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-41775:


Assignee: Rithwik Ediga Lakhamsani

> Implement training functions as input
> -
>
> Key: SPARK-41775
> URL: https://issues.apache.org/jira/browse/SPARK-41775
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Assignee: Rithwik Ediga Lakhamsani
>Priority: Major
>
> Sidenote: make formatting updates described in 
> https://github.com/apache/spark/pull/39188
>  
> Currently, `Distributor().run(...)` takes only files as input. Now we will 
> add support for functions as well. This requires the following process on 
> each task on the executor nodes:
> 1. Take the input function and args and pickle them.
> 2. Create a temp train.py file that looks like:
> {code:java}
> import cloudpickle
> import os
>
> if __name__ == "__main__":
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # RANK 0 corresponds to partitionId == 0
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f)
> {code}
> 3. Run that train.py file with `torchrun`.
> 4. Check whether `train_output.pkl` has been created by the process with 
> partitionId == 0; if it has, deserialize it and return that output through 
> `.collect()`.


