[jira] [Commented] (SPARK-42768) Enable cached plan apply AQE by default

2023-03-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699445#comment-17699445
 ] 

Apache Spark commented on SPARK-42768:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/40390

> Enable cached plan apply AQE by default
> ---
>
> Key: SPARK-42768
> URL: https://issues.apache.org/jira/browse/SPARK-42768
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42768) Enable cached plan apply AQE by default

2023-03-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42768:


Assignee: (was: Apache Spark)

> Enable cached plan apply AQE by default
> ---
>
> Key: SPARK-42768
> URL: https://issues.apache.org/jira/browse/SPARK-42768
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Priority: Major
>







[jira] [Assigned] (SPARK-42768) Enable cached plan apply AQE by default

2023-03-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42768:


Assignee: Apache Spark

> Enable cached plan apply AQE by default
> ---
>
> Key: SPARK-42768
> URL: https://issues.apache.org/jira/browse/SPARK-42768
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Updated] (SPARK-42768) Enable cached plan apply AQE by default

2023-03-12 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-42768:
--
Summary: Enable cached plan apply AQE by default  (was: Enable cache apply 
AQE by default)

> Enable cached plan apply AQE by default
> ---
>
> Key: SPARK-42768
> URL: https://issues.apache.org/jira/browse/SPARK-42768
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Priority: Major
>







[jira] [Created] (SPARK-42768) Enable cache apply AQE by default

2023-03-12 Thread XiDuo You (Jira)
XiDuo You created SPARK-42768:
-

 Summary: Enable cache apply AQE by default
 Key: SPARK-42768
 URL: https://issues.apache.org/jira/browse/SPARK-42768
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: XiDuo You









[jira] [Assigned] (SPARK-42767) Add check condition to start connect server fallback with `in-memory` and auto ignored some tests strongly depend on hive

2023-03-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42767:


Assignee: Apache Spark

> Add check condition to start connect server fallback with `in-memory` and 
> auto ignored some tests strongly depend on hive
> -
>
> Key: SPARK-42767
> URL: https://issues.apache.org/jira/browse/SPARK-42767
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Tests
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42767) Add check condition to start connect server fallback with `in-memory` and auto ignored some tests strongly depend on hive

2023-03-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42767:


Assignee: (was: Apache Spark)

> Add check condition to start connect server fallback with `in-memory` and 
> auto ignored some tests strongly depend on hive
> -
>
> Key: SPARK-42767
> URL: https://issues.apache.org/jira/browse/SPARK-42767
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Tests
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Commented] (SPARK-42767) Add check condition to start connect server fallback with `in-memory` and auto ignored some tests strongly depend on hive

2023-03-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699444#comment-17699444
 ] 

Apache Spark commented on SPARK-42767:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40389

> Add check condition to start connect server fallback with `in-memory` and 
> auto ignored some tests strongly depend on hive
> -
>
> Key: SPARK-42767
> URL: https://issues.apache.org/jira/browse/SPARK-42767
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Tests
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Commented] (SPARK-42767) Add check condition to start connect server fallback with `in-memory` and auto ignored some tests strongly depend on hive

2023-03-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699443#comment-17699443
 ] 

Apache Spark commented on SPARK-42767:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40389

> Add check condition to start connect server fallback with `in-memory` and 
> auto ignored some tests strongly depend on hive
> -
>
> Key: SPARK-42767
> URL: https://issues.apache.org/jira/browse/SPARK-42767
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Tests
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Created] (SPARK-42767) Add check condition to start connect server fallback with `in-memory` and auto ignored some tests strongly depend on hive

2023-03-12 Thread Yang Jie (Jira)
Yang Jie created SPARK-42767:


 Summary: Add check condition to start connect server fallback with 
`in-memory` and auto ignored some tests strongly depend on hive
 Key: SPARK-42767
 URL: https://issues.apache.org/jira/browse/SPARK-42767
 Project: Spark
  Issue Type: Improvement
  Components: Connect, Tests
Affects Versions: 3.4.0, 3.5.0
Reporter: Yang Jie









[jira] [Created] (SPARK-42766) YarnAllocator should filter excluded nodes when launching allocated containers

2023-03-12 Thread wangshengjie (Jira)
wangshengjie created SPARK-42766:


 Summary: YarnAllocator should filter excluded nodes when launching 
allocated containers
 Key: SPARK-42766
 URL: https://issues.apache.org/jira/browse/SPARK-42766
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 3.3.2
Reporter: wangshengjie


In a production environment, we hit an issue like this:

If we request 10 containers from nodeA and nodeB, the first response from YARN 
returns 5 containers from nodeA and nodeB; then nodeA is excluded 
(blacklisted). The second response from YARN may still return containers on 
nodeA and launch them, but when those containers (executors) start up and send 
a register request to the driver, they are rejected, and each failure is 
counted toward
{code:java}
spark.yarn.max.executor.failures {code}
which can cause the application to fail with:
{code:java}
Max number of executor failures ($maxNumExecutorFailures) reached{code}
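The fix the issue title proposes can be sketched as a filtering step between allocation and launch. This is a minimal illustrative sketch, not the actual `YarnAllocator` code; the function name and the dict-based container representation are assumptions for illustration only:

```python
# Hypothetical sketch: before launching containers returned by YARN, drop any
# whose host is currently excluded, so a doomed executor never registers,
# never gets rejected, and never counts toward spark.yarn.max.executor.failures.
def filter_allocated_containers(containers, excluded_nodes):
    """Split allocated containers into launchable ones and ones to skip."""
    launchable = [c for c in containers if c["host"] not in excluded_nodes]
    skipped = [c for c in containers if c["host"] in excluded_nodes]
    return launchable, skipped

# Example: nodeA was excluded after the first allocation round, but the
# second YARN response still contains a container on it.
allocated = [{"id": 1, "host": "nodeA"}, {"id": 2, "host": "nodeB"}]
launchable, skipped = filter_allocated_containers(allocated, {"nodeA"})
```

In a real allocator, the skipped containers would also be released back to YARN rather than silently dropped.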
 






[jira] [Commented] (SPARK-42766) YarnAllocator should filter excluded nodes when launching allocated containers

2023-03-12 Thread wangshengjie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699438#comment-17699438
 ] 

wangshengjie commented on SPARK-42766:
--

Working on this

> YarnAllocator should filter excluded nodes when launching allocated containers
> --
>
> Key: SPARK-42766
> URL: https://issues.apache.org/jira/browse/SPARK-42766
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.3.2
>Reporter: wangshengjie
>Priority: Major
>
> In a production environment, we hit an issue like this:
> If we request 10 containers from nodeA and nodeB, the first response from YARN 
> returns 5 containers from nodeA and nodeB; then nodeA is excluded 
> (blacklisted). The second response from YARN may still return containers on 
> nodeA and launch them, but when those containers (executors) start up and send 
> a register request to the driver, they are rejected, and each failure is 
> counted toward
> {code:java}
> spark.yarn.max.executor.failures {code}
> which can cause the application to fail with:
> {code:java}
> Max number of executor failures ($maxNumExecutorFailures) reached{code}
>  






[jira] [Resolved] (SPARK-42679) createDataFrame doesn't work with non-nullable schema.

2023-03-12 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-42679.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40382
[https://github.com/apache/spark/pull/40382]

> createDataFrame doesn't work with non-nullable schema.
> --
>
> Key: SPARK-42679
> URL: https://issues.apache.org/jira/browse/SPARK-42679
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> spark.createDataFrame doesn't work with a non-nullable schema, as shown below:
> {code:java}
> from pyspark.sql.types import *
> schema_false = StructType([StructField("id", IntegerType(), False)])
> spark.createDataFrame([[1]], schema=schema_false)
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.connect.AnalysisException: 
> [NULLABLE_COLUMN_OR_FIELD] Column or field `id` is nullable while it's 
> required to be non-nullable.{code}
> whereas it works fine with a nullable schema:
> {code:java}
> schema_true = StructType([StructField("id", IntegerType(), True)])
> spark.createDataFrame([[1]], schema=schema_true)
> DataFrame[id: int]{code}
>  
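The semantics behind the error can be sketched without Spark. This is an illustrative model only, not the actual Spark Connect implementation; the function name and parameters are assumptions made for the example:

```python
# Illustrative sketch of why the call fails: data created locally is inferred
# as nullable, and a user schema declaring the field non-nullable conflicts
# with that inference, producing NULLABLE_COLUMN_OR_FIELD.
def check_nullability(declared_nullable, inferred_nullable, field):
    """Raise if a non-nullable declared field would receive nullable data."""
    if inferred_nullable and not declared_nullable:
        raise ValueError(
            f"[NULLABLE_COLUMN_OR_FIELD] Column or field `{field}` is "
            "nullable while it's required to be non-nullable."
        )

# Mirrors schema_true in the report: a nullable declaration is accepted.
check_nullability(declared_nullable=True, inferred_nullable=True, field="id")
```

Passing `declared_nullable=False` with `inferred_nullable=True` reproduces the reported failure mode.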






[jira] [Assigned] (SPARK-42101) Wrap InMemoryTableScanExec with QueryStage

2023-03-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-42101:
---

Assignee: XiDuo You

> Wrap InMemoryTableScanExec with QueryStage
> --
>
> Key: SPARK-42101
> URL: https://issues.apache.org/jira/browse/SPARK-42101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.5.0
>
>
> The first access to a cached plan with AQE enabled is tricky: currently, we 
> cannot preserve its output partitioning and ordering.
> The whole query plan also misses many optimizations in the AQE framework. 
> Wrapping InMemoryTableScanExec in a query stage resolves all of these issues.






[jira] [Resolved] (SPARK-42101) Wrap InMemoryTableScanExec with QueryStage

2023-03-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42101.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 39624
[https://github.com/apache/spark/pull/39624]

> Wrap InMemoryTableScanExec with QueryStage
> --
>
> Key: SPARK-42101
> URL: https://issues.apache.org/jira/browse/SPARK-42101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Priority: Major
> Fix For: 3.5.0
>
>
> The first access to a cached plan with AQE enabled is tricky: currently, we 
> cannot preserve its output partitioning and ordering.
> The whole query plan also misses many optimizations in the AQE framework. 
> Wrapping InMemoryTableScanExec in a query stage resolves all of these issues.






[jira] [Commented] (SPARK-42765) Regulate the import path of `pandas_udf`

2023-03-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699425#comment-17699425
 ] 

Apache Spark commented on SPARK-42765:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40388

> Regulate the import path of `pandas_udf`
> 
>
> Key: SPARK-42765
> URL: https://issues.apache.org/jira/browse/SPARK-42765
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Remove the outdated import path of `pandas_udf`






[jira] [Assigned] (SPARK-42765) Regulate the import path of `pandas_udf`

2023-03-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42765:


Assignee: Apache Spark

> Regulate the import path of `pandas_udf`
> 
>
> Key: SPARK-42765
> URL: https://issues.apache.org/jira/browse/SPARK-42765
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Remove the outdated import path of `pandas_udf`






[jira] [Commented] (SPARK-42765) Regulate the import path of `pandas_udf`

2023-03-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699426#comment-17699426
 ] 

Apache Spark commented on SPARK-42765:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40388

> Regulate the import path of `pandas_udf`
> 
>
> Key: SPARK-42765
> URL: https://issues.apache.org/jira/browse/SPARK-42765
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Remove the outdated import path of `pandas_udf`






[jira] [Assigned] (SPARK-42765) Regulate the import path of `pandas_udf`

2023-03-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42765:


Assignee: (was: Apache Spark)

> Regulate the import path of `pandas_udf`
> 
>
> Key: SPARK-42765
> URL: https://issues.apache.org/jira/browse/SPARK-42765
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Remove the outdated import path of `pandas_udf`






[jira] [Updated] (SPARK-42765) Regulate the import path of `pandas_udf`

2023-03-12 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42765:
-
Summary: Regulate the import path of `pandas_udf`  (was: Standardize the 
import path of `pandas_udf`)

> Regulate the import path of `pandas_udf`
> 
>
> Key: SPARK-42765
> URL: https://issues.apache.org/jira/browse/SPARK-42765
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Remove the outdated import path of `pandas_udf`






[jira] [Updated] (SPARK-42765) Standardize the import path of `pandas_udf`

2023-03-12 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-42765:
-
Summary: Standardize the import path of `pandas_udf`  (was: Remove the 
outdated import path of `pandas_udf`)

> Standardize the import path of `pandas_udf`
> ---
>
> Key: SPARK-42765
> URL: https://issues.apache.org/jira/browse/SPARK-42765
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Remove the outdated import path of `pandas_udf`






[jira] [Created] (SPARK-42765) Remove the outdated import path of `pandas_udf`

2023-03-12 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-42765:


 Summary: Remove the outdated import path of `pandas_udf`
 Key: SPARK-42765
 URL: https://issues.apache.org/jira/browse/SPARK-42765
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Xinrong Meng


Remove the outdated import path of `pandas_udf`






[jira] [Commented] (SPARK-42764) Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699417#comment-17699417
 ] 

Apache Spark commented on SPARK-42764:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40387

> Parameterize the max number of attempts for driver props fetcher in 
> KubernetesExecutorBackend
> -
>
> Key: SPARK-42764
> URL: https://issues.apache.org/jira/browse/SPARK-42764
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-42764) Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42764:


Assignee: (was: Apache Spark)

> Parameterize the max number of attempts for driver props fetcher in 
> KubernetesExecutorBackend
> -
>
> Key: SPARK-42764
> URL: https://issues.apache.org/jira/browse/SPARK-42764
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-42764) Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699416#comment-17699416
 ] 

Apache Spark commented on SPARK-42764:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40387

> Parameterize the max number of attempts for driver props fetcher in 
> KubernetesExecutorBackend
> -
>
> Key: SPARK-42764
> URL: https://issues.apache.org/jira/browse/SPARK-42764
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-42764) Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42764:


Assignee: Apache Spark

> Parameterize the max number of attempts for driver props fetcher in 
> KubernetesExecutorBackend
> -
>
> Key: SPARK-42764
> URL: https://issues.apache.org/jira/browse/SPARK-42764
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Created] (SPARK-42764) Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-12 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42764:
-

 Summary: Parameterize the max number of attempts for driver props 
fetcher in KubernetesExecutorBackend
 Key: SPARK-42764
 URL: https://issues.apache.org/jira/browse/SPARK-42764
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.5.0
Reporter: Dongjoon Hyun









[jira] [Assigned] (SPARK-42761) kubernetes-client from 6.4.1 to 6.5.0

2023-03-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42761:
-

Assignee: Bjørn Jørgensen

> kubernetes-client from 6.4.1 to 6.5.0
> -
>
> Key: SPARK-42761
> URL: https://issues.apache.org/jira/browse/SPARK-42761
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Major
>
> Upgrade fabric8:kubernetes-client from 6.4.1 to 6.5.0






[jira] [Resolved] (SPARK-42761) kubernetes-client from 6.4.1 to 6.5.0

2023-03-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42761.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40381
[https://github.com/apache/spark/pull/40381]

> kubernetes-client from 6.4.1 to 6.5.0
> -
>
> Key: SPARK-42761
> URL: https://issues.apache.org/jira/browse/SPARK-42761
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Major
> Fix For: 3.5.0
>
>
> Upgrade fabric8:kubernetes-client from 6.4.1 to 6.5.0






[jira] [Updated] (SPARK-42761) kubernetes-client from 6.4.1 to 6.5.0

2023-03-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bjørn Jørgensen updated SPARK-42761:

Description: 
Upgrade fabric8:kubernetes-client from 6.4.1 to 6.5.0


  was:
Upgrade fabric8:kubernetes-client from 6.4.1 to 6.5.0

[CVE-2022-1471|https://www.cve.org/CVERecord?id=CVE-2022-1471]


> kubernetes-client from 6.4.1 to 6.5.0
> -
>
> Key: SPARK-42761
> URL: https://issues.apache.org/jira/browse/SPARK-42761
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Bjørn Jørgensen
>Priority: Major
>
> Upgrade fabric8:kubernetes-client from 6.4.1 to 6.5.0






[jira] [Commented] (SPARK-42753) ReusedExchange refers to non-existent node

2023-03-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699388#comment-17699388
 ] 

Apache Spark commented on SPARK-42753:
--

User 'StevenChenDatabricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/40385

> ReusedExchange refers to non-existent node
> --
>
> Key: SPARK-42753
> URL: https://issues.apache.org/jira/browse/SPARK-42753
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.4.0
>Reporter: Steven Chen
>Priority: Major
>
> There is an AQE issue where, during AQE planning, the Exchange that is being 
> reused can be replaced in the plan tree. So, when we print the query 
> plan, the ReusedExchange will refer to an "unknown" Exchange. An example 
> below:
>  
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]
>  Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]{code}
>  
>  
> Below is an example to demonstrate the root cause:
>  
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A
>           |-- SomeNode Y
>               |-- Exchange B
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx 
> dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C
>           |-- SomeNode N
>               |-- Exchange D
> {code}
>  
>  
> Step 1: Exchange B is materialized and the QueryStage is added to stage cache
> Step 2: Exchange D reuses Exchange B
> Step 3: Exchange C is materialized and the QueryStage is added to stage cache
> Step 4: Exchange A reuses Exchange C
>  
> Then the final plan looks like:
>  
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A -> ReusedExchange (reuses Exchange C)
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx 
> dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C -> PhotonShuffleMapStage 
>           |-- SomeNode N
>               |-- Exchange D -> ReusedExchange (reuses Exchange B)
> {code}
>  
>  
> As a result, the ReusedExchange (reuses Exchange B) will refer to a 
> non-existent node. This *DOES NOT* affect query execution, but it causes the 
> query visualization to malfunction in the following ways:
>  # The ReusedExchange child subtree will still appear in the Spark UI graph 
> but will contain no node IDs.
>  # The ReusedExchange node details in the Explain plan will refer to a 
> UNKNOWN node. Example below.
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]{code}
>  # The child exchange and its subtree may be missing from the Explain text 
> completely. No node details or tree string shown.






[jira] [Assigned] (SPARK-42753) ReusedExchange refers to non-existent node

2023-03-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42753:


Assignee: (was: Apache Spark)

> ReusedExchange refers to non-existent node
> --
>
> Key: SPARK-42753
> URL: https://issues.apache.org/jira/browse/SPARK-42753
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.4.0
>Reporter: Steven Chen
>Priority: Major
>
> There is an AQE issue where, during AQE planning, the Exchange that is being 
> reused can be replaced in the plan tree. So, when we print the query 
> plan, the ReusedExchange will refer to an "unknown" Exchange. An example 
> below:
>  
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]
>  Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]{code}
>  
>  
> Below is an example to demonstrate the root cause:
>  
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A
>           |-- SomeNode Y
>               |-- Exchange B
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx 
> dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C
>           |-- SomeNode N
>               |-- Exchange D
> {code}
>  
>  
> Step 1: Exchange B is materialized and its QueryStage is added to the stage cache.
> Step 2: Exchange D reuses Exchange B.
> Step 3: Exchange C is materialized and its QueryStage is added to the stage cache.
> Step 4: Exchange A reuses Exchange C.
>  
> Then the final plan looks like:
>  
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A -> ReusedExchange (reuses Exchange C)
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx 
> dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C -> PhotonShuffleMapStage 
>           |-- SomeNode N
>               |-- Exchange D -> ReusedExchange (reuses Exchange B)
> {code}
>  
>  
> As a result, the ReusedExchange (reuses Exchange B) will refer to a 
> non-existent node. This *DOES NOT* affect query execution, but it breaks the 
> query visualization in the following ways:
>  # The ReusedExchange child subtree will still appear in the Spark UI graph 
> but will contain no node IDs.
>  # The ReusedExchange node details in the Explain plan will refer to an 
> UNKNOWN node. Example below.
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]{code}
>  # The child exchange and its subtree may be missing from the Explain text 
> entirely, with no node details or tree string shown.
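
The reuse bookkeeping in Steps 1-4 above can be sketched as a toy model (hypothetical names and structure for illustration only, not Spark's real classes): a stage cache keyed by the canonicalized exchange, where a later exchange with a matching key is rewritten into a reuse reference.

```python
# Toy model of AQE's exchange-reuse bookkeeping, mirroring Steps 1-4 above.
# Names ("materialize", "try_reuse", "k1", "k2") are illustrative, not Spark API.

stage_cache = {}   # canonical plan key -> name of the materialized exchange
reused_refs = {}   # reusing exchange -> exchange it points back to

def materialize(name, key):
    """An exchange finishes; its query stage enters the stage cache."""
    stage_cache[key] = name

def try_reuse(name, key):
    """A later exchange with the same canonical key reuses the cached stage."""
    if key in stage_cache:
        reused_refs[name] = stage_cache[key]

# A/C share one canonical key and B/D share another (same shuffle, so reusable).
materialize("B", "k2")   # Step 1: B materialized and cached
try_reuse("D", "k2")     # Step 2: D -> ReusedExchange (reuses B)
materialize("C", "k1")   # Step 3: C materialized and cached
try_reuse("A", "k1")     # Step 4: A -> ReusedExchange (reuses C)

# In the final plan, A is printed as a ReusedExchange, which has no children,
# so SomeNode Y and Exchange B beneath it never receive node IDs.
printed_nodes = {"A", "M", "C", "N", "D"}

# Any reuse reference whose target is not a printed node is dangling.
dangling = {src: tgt for src, tgt in reused_refs.items()
            if tgt not in printed_nodes}
print(dangling)  # {'D': 'B'}: D's ReusedExchange refers to a missing node
```

The toy model reproduces the symptom exactly: A's reference to C stays resolvable because C survives as a shuffle stage node, while D's reference to B dangles because B's subtree was pruned when A became a ReusedExchange.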






[jira] [Assigned] (SPARK-42753) ReusedExchange refers to non-existent node

2023-03-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42753:


Assignee: Apache Spark

> ReusedExchange refers to non-existent node
> --
>
> Key: SPARK-42753
> URL: https://issues.apache.org/jira/browse/SPARK-42753
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.4.0
>Reporter: Steven Chen
>Assignee: Apache Spark
>Priority: Major
>
> There is an AQE issue where, during AQE planning, the Exchange that is being 
> reused can itself be replaced in the plan tree. So, when we print the query 
> plan, the ReusedExchange will refer to an "unknown" Exchange. An example 
> below:
>  
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]
>  Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]{code}
>  
>  
> Below is an example to demonstrate the root cause:
>  
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A
>           |-- SomeNode Y
>               |-- Exchange B
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx 
> dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C
>           |-- SomeNode N
>               |-- Exchange D
> {code}
>  
>  
> Step 1: Exchange B is materialized and its QueryStage is added to the stage cache.
> Step 2: Exchange D reuses Exchange B.
> Step 3: Exchange C is materialized and its QueryStage is added to the stage cache.
> Step 4: Exchange A reuses Exchange C.
>  
> Then the final plan looks like:
>  
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A -> ReusedExchange (reuses Exchange C)
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx 
> dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C -> PhotonShuffleMapStage 
>           |-- SomeNode N
>               |-- Exchange D -> ReusedExchange (reuses Exchange B)
> {code}
>  
>  
> As a result, the ReusedExchange (reuses Exchange B) will refer to a 
> non-existent node. This *DOES NOT* affect query execution, but it breaks the 
> query visualization in the following ways:
>  # The ReusedExchange child subtree will still appear in the Spark UI graph 
> but will contain no node IDs.
>  # The ReusedExchange node details in the Explain plan will refer to an 
> UNKNOWN node. Example below.
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]{code}
>  # The child exchange and its subtree may be missing from the Explain text 
> entirely, with no node details or tree string shown.






[jira] [Commented] (SPARK-42753) ReusedExchange refers to non-existent node

2023-03-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699389#comment-17699389
 ] 

Apache Spark commented on SPARK-42753:
--

User 'StevenChenDatabricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/40385

> ReusedExchange refers to non-existent node
> --
>
> Key: SPARK-42753
> URL: https://issues.apache.org/jira/browse/SPARK-42753
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.4.0
>Reporter: Steven Chen
>Priority: Major
>
> There is an AQE issue where, during AQE planning, the Exchange that is being 
> reused can itself be replaced in the plan tree. So, when we print the query 
> plan, the ReusedExchange will refer to an "unknown" Exchange. An example 
> below:
>  
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]
>  Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]{code}
>  
>  
> Below is an example to demonstrate the root cause:
>  
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A
>           |-- SomeNode Y
>               |-- Exchange B
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx 
> dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C
>           |-- SomeNode N
>               |-- Exchange D
> {code}
>  
>  
> Step 1: Exchange B is materialized and its QueryStage is added to the stage cache.
> Step 2: Exchange D reuses Exchange B.
> Step 3: Exchange C is materialized and its QueryStage is added to the stage cache.
> Step 4: Exchange A reuses Exchange C.
>  
> Then the final plan looks like:
>  
> {code:java}
> AdaptiveSparkPlan
>   |-- SomeNode X (subquery xxx)
>       |-- Exchange A -> ReusedExchange (reuses Exchange C)
> Subquery:Hosting operator = SomeNode Hosting Expression = xxx 
> dynamicpruning#388
> AdaptiveSparkPlan
>   |-- SomeNode M
>       |-- Exchange C -> PhotonShuffleMapStage 
>           |-- SomeNode N
>               |-- Exchange D -> ReusedExchange (reuses Exchange B)
> {code}
>  
>  
> As a result, the ReusedExchange (reuses Exchange B) will refer to a 
> non-existent node. This *DOES NOT* affect query execution, but it breaks the 
> query visualization in the following ways:
>  # The ReusedExchange child subtree will still appear in the Spark UI graph 
> but will contain no node IDs.
>  # The ReusedExchange node details in the Explain plan will refer to an 
> UNKNOWN node. Example below.
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]{code}
>  # The child exchange and its subtree may be missing from the Explain text 
> entirely, with no node details or tree string shown.


