[jira] [Commented] (SPARK-42768) Enable cached plan apply AQE by default
[ https://issues.apache.org/jira/browse/SPARK-42768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699445#comment-17699445 ] Apache Spark commented on SPARK-42768: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/40390 > Enable cached plan apply AQE by default > --- > > Key: SPARK-42768 > URL: https://issues.apache.org/jira/browse/SPARK-42768 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42768) Enable cached plan apply AQE by default
[ https://issues.apache.org/jira/browse/SPARK-42768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42768: Assignee: (was: Apache Spark) > Enable cached plan apply AQE by default > --- > > Key: SPARK-42768 > URL: https://issues.apache.org/jira/browse/SPARK-42768 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42768) Enable cached plan apply AQE by default
[ https://issues.apache.org/jira/browse/SPARK-42768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42768: Assignee: Apache Spark > Enable cached plan apply AQE by default > --- > > Key: SPARK-42768 > URL: https://issues.apache.org/jira/browse/SPARK-42768 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42768) Enable cached plan apply AQE by default
[ https://issues.apache.org/jira/browse/SPARK-42768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-42768: -- Summary: Enable cached plan apply AQE by default (was: Enable cache apply AQE by default) > Enable cached plan apply AQE by default > --- > > Key: SPARK-42768 > URL: https://issues.apache.org/jira/browse/SPARK-42768 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42768) Enable cache apply AQE by default
XiDuo You created SPARK-42768: - Summary: Enable cache apply AQE by default Key: SPARK-42768 URL: https://issues.apache.org/jira/browse/SPARK-42768 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: XiDuo You -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42767) Add check condition to start connect server fallback with `in-memory` and auto ignored some tests strongly depend on hive
[ https://issues.apache.org/jira/browse/SPARK-42767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42767: Assignee: Apache Spark > Add check condition to start connect server fallback with `in-memory` and > auto ignored some tests strongly depend on hive > - > > Key: SPARK-42767 > URL: https://issues.apache.org/jira/browse/SPARK-42767 > Project: Spark > Issue Type: Improvement > Components: Connect, Tests >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42767) Add check condition to start connect server fallback with `in-memory` and auto ignored some tests strongly depend on hive
[ https://issues.apache.org/jira/browse/SPARK-42767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42767: Assignee: (was: Apache Spark) > Add check condition to start connect server fallback with `in-memory` and > auto ignored some tests strongly depend on hive > - > > Key: SPARK-42767 > URL: https://issues.apache.org/jira/browse/SPARK-42767 > Project: Spark > Issue Type: Improvement > Components: Connect, Tests >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42767) Add check condition to start connect server fallback with `in-memory` and auto ignored some tests strongly depend on hive
[ https://issues.apache.org/jira/browse/SPARK-42767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699444#comment-17699444 ] Apache Spark commented on SPARK-42767: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40389 > Add check condition to start connect server fallback with `in-memory` and > auto ignored some tests strongly depend on hive > - > > Key: SPARK-42767 > URL: https://issues.apache.org/jira/browse/SPARK-42767 > Project: Spark > Issue Type: Improvement > Components: Connect, Tests >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42767) Add check condition to start connect server fallback with `in-memory` and auto ignored some tests strongly depend on hive
[ https://issues.apache.org/jira/browse/SPARK-42767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699443#comment-17699443 ] Apache Spark commented on SPARK-42767: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40389 > Add check condition to start connect server fallback with `in-memory` and > auto ignored some tests strongly depend on hive > - > > Key: SPARK-42767 > URL: https://issues.apache.org/jira/browse/SPARK-42767 > Project: Spark > Issue Type: Improvement > Components: Connect, Tests >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42767) Add check condition to start connect server fallback with `in-memory` and auto ignored some tests strongly depend on hive
Yang Jie created SPARK-42767: Summary: Add check condition to start connect server fallback with `in-memory` and auto ignored some tests strongly depend on hive Key: SPARK-42767 URL: https://issues.apache.org/jira/browse/SPARK-42767 Project: Spark Issue Type: Improvement Components: Connect, Tests Affects Versions: 3.4.0, 3.5.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42766) YarnAllocator should filter excluded nodes when launching allocated containers
wangshengjie created SPARK-42766: Summary: YarnAllocator should filter excluded nodes when launching allocated containers Key: SPARK-42766 URL: https://issues.apache.org/jira/browse/SPARK-42766 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 3.3.2 Reporter: wangshengjie In a production environment, we hit an issue like this: if we request 10 containers from nodeA and nodeB, the first response from Yarn returns 5 containers from nodeA and nodeB, then nodeA is blacklisted, and the second response from Yarn may still return some containers from nodeA and launch them. But when those containers (executors) start up and send register requests to the Driver, they are rejected, and each failure is counted toward {code:java} spark.yarn.max.executor.failures {code} , which will cause the app to fail with: {code:java} Max number of executor failures ($maxNumExecutorFailures) reached{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42766) YarnAllocator should filter excluded nodes when launching allocated containers
[ https://issues.apache.org/jira/browse/SPARK-42766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699438#comment-17699438 ] wangshengjie commented on SPARK-42766: -- Working on this > YarnAllocator should filter excluded nodes when launching allocated containers > -- > > Key: SPARK-42766 > URL: https://issues.apache.org/jira/browse/SPARK-42766 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.3.2 >Reporter: wangshengjie >Priority: Major > > In a production environment, we hit an issue like this: > if we request 10 containers from nodeA and nodeB, the first response from Yarn > returns 5 containers from nodeA and nodeB, then nodeA is blacklisted, and the > second response from Yarn may still return some containers from nodeA and > launch them. But when those containers (executors) start up and send register > requests to the Driver, they are rejected, and each failure is counted toward > {code:java} > spark.yarn.max.executor.failures {code} > , which will cause the app to fail with: > {code:java} > Max number of executor failures ($maxNumExecutorFailures) reached{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
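The scenario above can be sketched in a few lines. This is an illustration only, not the actual Scala `YarnAllocator` code: the function name, container shape, and node names are all invented. The idea is simply that containers Yarn has allocated on nodes that were excluded in the meantime should be released rather than launched, so their executors never register and fail.

```python
# Hypothetical sketch of the proposed filtering; names and data shapes are
# invented for illustration and do not match Spark's YarnAllocator.
def filter_allocated_containers(containers, excluded_nodes):
    """Split allocated containers into launchable ones and ones to release
    because they landed on a node that has since been excluded."""
    launchable, to_release = [], []
    for container in containers:
        if container["node"] in excluded_nodes:
            to_release.append(container)  # release instead of launching
        else:
            launchable.append(container)
    return launchable, to_release

# nodeA was excluded after the first allocation round, but Yarn still
# returned a container on it in the second round.
allocated = [{"id": 1, "node": "nodeA"}, {"id": 2, "node": "nodeB"}]
launchable, to_release = filter_allocated_containers(allocated, {"nodeA"})
```

With this split, only the nodeB container is launched, so the nodeA container never produces a rejected executor registration that counts toward `spark.yarn.max.executor.failures`.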
[jira] [Resolved] (SPARK-42679) createDataFrame doesn't work with non-nullable schema.
[ https://issues.apache.org/jira/browse/SPARK-42679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42679. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40382 [https://github.com/apache/spark/pull/40382] > createDataFrame doesn't work with non-nullable schema. > -- > > Key: SPARK-42679 > URL: https://issues.apache.org/jira/browse/SPARK-42679 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > Fix For: 3.4.0 > > > spark.createDataFrame won't work with non-nullable schema as below: > {code:java} > from pyspark.sql.types import * > schema_false = StructType([StructField("id", IntegerType(), False)]) > spark.createDataFrame([[1]], schema=schema_false) > Traceback (most recent call last): > ... > pyspark.errors.exceptions.connect.AnalysisException: > [NULLABLE_COLUMN_OR_FIELD] Column or field `id` is nullable while it's > required to be non-nullable.{code} > whereas it works fine with nullable schema: > {code:java} > schema_true = StructType([StructField("id", IntegerType(), True)]) > spark.createDataFrame([[1]], schema=schema_true) > DataFrame[id: int]{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42101) Wrap InMemoryTableScanExec with QueryStage
[ https://issues.apache.org/jira/browse/SPARK-42101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-42101: --- Assignee: XiDuo You > Wrap InMemoryTableScanExec with QueryStage > -- > > Key: SPARK-42101 > URL: https://issues.apache.org/jira/browse/SPARK-42101 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Fix For: 3.5.0 > > > The first access to a cached plan with AQE enabled is tricky. Currently, > we cannot preserve its output partitioning and ordering. > The whole query plan also misses many optimizations in the AQE framework. Wrapping > InMemoryTableScanExec in a query stage resolves all of these issues. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42101) Wrap InMemoryTableScanExec with QueryStage
[ https://issues.apache.org/jira/browse/SPARK-42101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42101. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 39624 [https://github.com/apache/spark/pull/39624] > Wrap InMemoryTableScanExec with QueryStage > -- > > Key: SPARK-42101 > URL: https://issues.apache.org/jira/browse/SPARK-42101 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Priority: Major > Fix For: 3.5.0 > > > The first access to a cached plan with AQE enabled is tricky. Currently, > we cannot preserve its output partitioning and ordering. > The whole query plan also misses many optimizations in the AQE framework. Wrapping > InMemoryTableScanExec in a query stage resolves all of these issues. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42765) Regulate the import path of `pandas_udf`
[ https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699425#comment-17699425 ] Apache Spark commented on SPARK-42765: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/40388 > Regulate the import path of `pandas_udf` > > > Key: SPARK-42765 > URL: https://issues.apache.org/jira/browse/SPARK-42765 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Remove the outdated import path of `pandas_udf` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42765) Regulate the import path of `pandas_udf`
[ https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42765: Assignee: Apache Spark > Regulate the import path of `pandas_udf` > > > Key: SPARK-42765 > URL: https://issues.apache.org/jira/browse/SPARK-42765 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Remove the outdated import path of `pandas_udf` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42765) Regulate the import path of `pandas_udf`
[ https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699426#comment-17699426 ] Apache Spark commented on SPARK-42765: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/40388 > Regulate the import path of `pandas_udf` > > > Key: SPARK-42765 > URL: https://issues.apache.org/jira/browse/SPARK-42765 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Remove the outdated import path of `pandas_udf` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42765) Regulate the import path of `pandas_udf`
[ https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42765: Assignee: (was: Apache Spark) > Regulate the import path of `pandas_udf` > > > Key: SPARK-42765 > URL: https://issues.apache.org/jira/browse/SPARK-42765 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Remove the outdated import path of `pandas_udf` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42765) Regulate the import path of `pandas_udf`
[ https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42765: - Summary: Regulate the import path of `pandas_udf` (was: Standardize the import path of `pandas_udf`) > Regulate the import path of `pandas_udf` > > > Key: SPARK-42765 > URL: https://issues.apache.org/jira/browse/SPARK-42765 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Remove the outdated import path of `pandas_udf` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42765) Standardize the import path of `pandas_udf`
[ https://issues.apache.org/jira/browse/SPARK-42765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-42765: - Summary: Standardize the import path of `pandas_udf` (was: Remove the outdated import path of `pandas_udf`) > Standardize the import path of `pandas_udf` > --- > > Key: SPARK-42765 > URL: https://issues.apache.org/jira/browse/SPARK-42765 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Remove the outdated import path of `pandas_udf` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42765) Remove the outdated import path of `pandas_udf`
Xinrong Meng created SPARK-42765: Summary: Remove the outdated import path of `pandas_udf` Key: SPARK-42765 URL: https://issues.apache.org/jira/browse/SPARK-42765 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.4.0 Reporter: Xinrong Meng Remove the outdated import path of `pandas_udf` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42764) Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend
[ https://issues.apache.org/jira/browse/SPARK-42764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699417#comment-17699417 ] Apache Spark commented on SPARK-42764: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40387 > Parameterize the max number of attempts for driver props fetcher in > KubernetesExecutorBackend > - > > Key: SPARK-42764 > URL: https://issues.apache.org/jira/browse/SPARK-42764 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42764) Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend
[ https://issues.apache.org/jira/browse/SPARK-42764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42764: Assignee: (was: Apache Spark) > Parameterize the max number of attempts for driver props fetcher in > KubernetesExecutorBackend > - > > Key: SPARK-42764 > URL: https://issues.apache.org/jira/browse/SPARK-42764 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42764) Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend
[ https://issues.apache.org/jira/browse/SPARK-42764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699416#comment-17699416 ] Apache Spark commented on SPARK-42764: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40387 > Parameterize the max number of attempts for driver props fetcher in > KubernetesExecutorBackend > - > > Key: SPARK-42764 > URL: https://issues.apache.org/jira/browse/SPARK-42764 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42764) Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend
[ https://issues.apache.org/jira/browse/SPARK-42764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42764: Assignee: Apache Spark > Parameterize the max number of attempts for driver props fetcher in > KubernetesExecutorBackend > - > > Key: SPARK-42764 > URL: https://issues.apache.org/jira/browse/SPARK-42764 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42764) Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend
Dongjoon Hyun created SPARK-42764: - Summary: Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend Key: SPARK-42764 URL: https://issues.apache.org/jira/browse/SPARK-42764 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.5.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
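The ticket title describes making a hard-coded retry count configurable. As a generic illustration of the pattern (a bounded-retry fetch loop with a parameterized attempt limit), here is a sketch; the function and parameter names are invented and are not the actual KubernetesExecutorBackend code.

```python
import time

# Illustrative bounded-retry loop; names are hypothetical, not Spark's code.
def fetch_with_retries(fetch, max_attempts=3, backoff_s=0.0):
    """Call fetch() up to max_attempts times, re-raising after the last try."""
    last_err = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except OSError as err:
            last_err = err
            if attempt < max_attempts and backoff_s:
                time.sleep(backoff_s)  # optional pause between attempts
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_err

# Simulate a driver that becomes reachable only on the third attempt.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("driver not reachable yet")
    return {"spark.app.id": "app-123"}  # illustrative payload

props = fetch_with_retries(flaky_fetch, max_attempts=5)
```

Exposing `max_attempts` as a setting, rather than a constant, is what "parameterize" means here: deployments with slow driver startup can raise it without a code change.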
[jira] [Assigned] (SPARK-42761) kubernetes-client from 6.4.1 to 6.5.0
[ https://issues.apache.org/jira/browse/SPARK-42761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42761: - Assignee: Bjørn Jørgensen > kubernetes-client from 6.4.1 to 6.5.0 > - > > Key: SPARK-42761 > URL: https://issues.apache.org/jira/browse/SPARK-42761 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.5.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > > Upgrade fabric8:kubernetes-client from 6.4.1 to 6.5.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42761) kubernetes-client from 6.4.1 to 6.5.0
[ https://issues.apache.org/jira/browse/SPARK-42761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42761. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40381 [https://github.com/apache/spark/pull/40381] > kubernetes-client from 6.4.1 to 6.5.0 > - > > Key: SPARK-42761 > URL: https://issues.apache.org/jira/browse/SPARK-42761 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.5.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.5.0 > > > Upgrade fabric8:kubernetes-client from 6.4.1 to 6.5.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42761) kubernetes-client from 6.4.1 to 6.5.0
[ https://issues.apache.org/jira/browse/SPARK-42761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bjørn Jørgensen updated SPARK-42761: Description: Upgrade fabric8:kubernetes-client from 6.4.1 to 6.5.0 was: Upgrade fabric8:kubernetes-client from 6.4.1 to 6.5.0 [CVE-2022-1471|https://www.cve.org/CVERecord?id=CVE-2022-1471] > kubernetes-client from 6.4.1 to 6.5.0 > - > > Key: SPARK-42761 > URL: https://issues.apache.org/jira/browse/SPARK-42761 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 3.5.0 >Reporter: Bjørn Jørgensen >Priority: Major > > Upgrade fabric8:kubernetes-client from 6.4.1 to 6.5.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42753) ReusedExchange refers to non-existent node
[ https://issues.apache.org/jira/browse/SPARK-42753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699388#comment-17699388 ] Apache Spark commented on SPARK-42753: -- User 'StevenChenDatabricks' has created a pull request for this issue: https://github.com/apache/spark/pull/40385 > ReusedExchange refers to non-existent node > -- > > Key: SPARK-42753 > URL: https://issues.apache.org/jira/browse/SPARK-42753 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Steven Chen >Priority: Major > > There is an AQE issue where, during AQE planning, the Exchange that is being > reused can be replaced in the plan tree. So, when we print the query > plan, the ReusedExchange will refer to an "unknown" Exchange. An example > below: > > {code:java} > (2775) ReusedExchange [Reuses operator id: unknown] > Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]{code} > > > Below is an example to demonstrate the root cause: > > {code:java} > AdaptiveSparkPlan > |-- SomeNode X (subquery xxx) > |-- Exchange A > |-- SomeNode Y > |-- Exchange B > Subquery:Hosting operator = SomeNode Hosting Expression = xxx > dynamicpruning#388 > AdaptiveSparkPlan > |-- SomeNode M > |-- Exchange C > |-- SomeNode N > |-- Exchange D > {code} > > > Step 1: Exchange B is materialized and the QueryStage is added to the stage cache > Step 2: Exchange D reuses Exchange B > Step 3: Exchange C is materialized and the QueryStage is added to the stage cache > Step 4: Exchange A reuses Exchange C > > Then the final plan looks like: > > {code:java} > AdaptiveSparkPlan > |-- SomeNode X (subquery xxx) > |-- Exchange A -> ReusedExchange (reuses Exchange C) > Subquery:Hosting operator = SomeNode Hosting Expression = xxx > dynamicpruning#388 > AdaptiveSparkPlan > |-- SomeNode M > |-- Exchange C -> PhotonShuffleMapStage > |-- SomeNode N > |-- Exchange D -> ReusedExchange (reuses Exchange B) > {code} > > > As a result, the ReusedExchange (reuses Exchange B) will refer to a non-existent > node. This *DOES NOT* affect query execution but will cause the query > visualization to malfunction in the following ways: > # The ReusedExchange child subtree will still appear in the Spark UI graph > but will contain no node IDs. > # The ReusedExchange node details in the Explain plan will refer to an > UNKNOWN node. Example below. > {code:java} > (2775) ReusedExchange [Reuses operator id: unknown]{code} > # The child exchange and its subtree may be missing from the Explain text > completely. No node details or tree string shown. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
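The four steps in the report above can be modeled with a toy stage cache. Everything here is invented for illustration (plain sets and dicts standing in for plan trees); the real AQE reuse logic is far more involved, but the dangling-reference mechanism is the same.

```python
# Toy model of the reuse sequence from the ticket description.
final_plan_nodes = {"Exchange A", "Exchange B", "Exchange C", "Exchange D"}
reuse_refs = {}

# Steps 1-2: Exchange B materializes; Exchange D becomes ReusedExchange(B).
reuse_refs["Exchange D"] = "Exchange B"
final_plan_nodes.discard("Exchange D")

# Steps 3-4: Exchange C materializes; Exchange A becomes ReusedExchange(C),
# which also drops A's whole subtree (including Exchange B) from the
# printed plan.
reuse_refs["Exchange A"] = "Exchange C"
final_plan_nodes.discard("Exchange A")
final_plan_nodes.discard("Exchange B")

# Exchange D now points at a node absent from the final plan; this is why
# the UI prints "Reuses operator id: unknown".
dangling = sorted(ref for ref, target in reuse_refs.items()
                  if target not in final_plan_nodes)
```

Exchange A's reference is fine (Exchange C survives as a materialized stage), but Exchange D's target was removed along with A's subtree, leaving a reuse edge with no destination for the Explain output and the UI graph.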
[jira] [Assigned] (SPARK-42753) ReusedExchange refers to non-existent node
[ https://issues.apache.org/jira/browse/SPARK-42753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42753: Assignee: (was: Apache Spark) > ReusedExchange refers to non-existent node > -- > > Key: SPARK-42753 > URL: https://issues.apache.org/jira/browse/SPARK-42753 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Steven Chen >Priority: Major > > There is an AQE issue where, during AQE planning, the Exchange that is being > reused can be replaced in the plan tree. So, when we print the query > plan, the ReusedExchange will refer to an "unknown" Exchange. An example > below: > > {code:java} > (2775) ReusedExchange [Reuses operator id: unknown] > Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]{code} > > > Below is an example to demonstrate the root cause: > > {code:java} > AdaptiveSparkPlan > |-- SomeNode X (subquery xxx) > |-- Exchange A > |-- SomeNode Y > |-- Exchange B > Subquery:Hosting operator = SomeNode Hosting Expression = xxx > dynamicpruning#388 > AdaptiveSparkPlan > |-- SomeNode M > |-- Exchange C > |-- SomeNode N > |-- Exchange D > {code} > > > Step 1: Exchange B is materialized and the QueryStage is added to the stage cache > Step 2: Exchange D reuses Exchange B > Step 3: Exchange C is materialized and the QueryStage is added to the stage cache > Step 4: Exchange A reuses Exchange C > > Then the final plan looks like: > > {code:java} > AdaptiveSparkPlan > |-- SomeNode X (subquery xxx) > |-- Exchange A -> ReusedExchange (reuses Exchange C) > Subquery:Hosting operator = SomeNode Hosting Expression = xxx > dynamicpruning#388 > AdaptiveSparkPlan > |-- SomeNode M > |-- Exchange C -> PhotonShuffleMapStage > |-- SomeNode N > |-- Exchange D -> ReusedExchange (reuses Exchange B) > {code} > > > As a result, the ReusedExchange (reuses Exchange B) will refer to a non-existent > node. This *DOES NOT* affect query execution but will cause the query > visualization to malfunction in the following ways: > # The ReusedExchange child subtree will still appear in the Spark UI graph > but will contain no node IDs. > # The ReusedExchange node details in the Explain plan will refer to an > UNKNOWN node. Example below. > {code:java} > (2775) ReusedExchange [Reuses operator id: unknown]{code} > # The child exchange and its subtree may be missing from the Explain text > completely. No node details or tree string shown. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42753) ReusedExchange refers to non-existent node
[ https://issues.apache.org/jira/browse/SPARK-42753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42753: Assignee: Apache Spark > ReusedExchange refers to non-existent node > -- > > Key: SPARK-42753 > URL: https://issues.apache.org/jira/browse/SPARK-42753 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Steven Chen >Assignee: Apache Spark >Priority: Major > > There is an AQE “issue“ where during AQE planning, the Exchange "that's > being" reused could be replaced in the plan tree. So, when we print the query > plan, the ReusedExchange will refer to an “unknown“ Exchange. An example > below: > > {code:java} > (2775) ReusedExchange [Reuses operator id: unknown] > Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]{code} > > > Below is an example to demonstrate the root cause: > > {code:java} > AdaptiveSparkPlan > |-- SomeNode X (subquery xxx) > |-- Exchange A > |-- SomeNode Y > |-- Exchange B > Subquery:Hosting operator = SomeNode Hosting Expression = xxx > dynamicpruning#388 > AdaptiveSparkPlan > |-- SomeNode M > |-- Exchange C > |-- SomeNode N > |-- Exchange D > {code} > > > Step 1: Exchange B is materialized and the QueryStage is added to stage cache > Step 2: Exchange D reuses Exchange B > Step 3: Exchange C is materialized and the QueryStage is added to stage cache > Step 4: Exchange A reuses Exchange C > > Then the final plan looks like: > > {code:java} > AdaptiveSparkPlan > |-- SomeNode X (subquery xxx) > |-- Exchange A -> ReusedExchange (reuses Exchange C) > Subquery:Hosting operator = SomeNode Hosting Expression = xxx > dynamicpruning#388 > AdaptiveSparkPlan > |-- SomeNode M > |-- Exchange C -> PhotonShuffleMapStage > |-- SomeNode N > |-- Exchange D -> ReusedExchange (reuses Exchange B) > {code} > > > As a result, the ReusedExchange (reuses Exchange B) will refer to a non-exist > node. 
> This *DOES NOT* affect query execution, but it breaks the query visualization in the following ways:
> # The ReusedExchange child subtree still appears in the Spark UI graph but contains no node IDs.
> # The ReusedExchange node details in the Explain plan refer to an unknown node, for example:
> {code:java}
> (2775) ReusedExchange [Reuses operator id: unknown]{code}
> # The child exchange and its subtree may be missing from the Explain text entirely, with no node details or tree string shown.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
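The four steps in the description can be sketched with a toy stage cache. This is a minimal illustration of the reuse mechanism only — the class and function names below are hypothetical, not Spark's actual AQE internals:

```python
# Sketch of how a stage cache keyed by a canonicalized plan can leave a
# ReusedExchange pointing at a node that is later replaced in the plan tree.
# All names here are illustrative, not Spark's real classes.
from dataclasses import dataclass


@dataclass
class Exchange:
    op_id: str      # operator id printed in the explain output
    canonical: str  # canonicalized plan, used as the cache key


@dataclass
class ReusedExchange:
    op_id: str
    target: Exchange  # the previously materialized node this reuse refers to


stage_cache = {}  # canonicalized plan -> first materialized Exchange


def materialize(e: Exchange):
    """Return a ReusedExchange if an equivalent stage is already cached,
    otherwise cache this exchange and return it unchanged."""
    cached = stage_cache.get(e.canonical)
    if cached is not None:
        return ReusedExchange(e.op_id + "-reused", cached)
    stage_cache[e.canonical] = e
    return e


# Step 1: Exchange B is materialized and added to the stage cache
b = materialize(Exchange("B", "plan-x"))
# Step 2: Exchange D reuses Exchange B
d = materialize(Exchange("D", "plan-x"))
# Step 3: Exchange C is materialized and added to the stage cache
c = materialize(Exchange("C", "plan-y"))
# Step 4: Exchange A reuses Exchange C
a = materialize(Exchange("A", "plan-y"))

# D still holds a reference to the original Exchange B node. If B's subtree
# is subsequently replaced during planning, no node with B's operator id
# exists in the printed plan, so the reuse renders as "unknown".
assert isinstance(d, ReusedExchange) and d.target is b
assert isinstance(a, ReusedExchange) and a.target is c
```

The key point the sketch makes is that the cache hands out a reference to the *original* node, so any later rewrite of that node in the plan tree silently orphans every ReusedExchange that points at it.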
[jira] [Commented] (SPARK-42753) ReusedExchange refers to non-existent node
[ https://issues.apache.org/jira/browse/SPARK-42753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699389#comment-17699389 ]

Apache Spark commented on SPARK-42753:
--------------------------------------

User 'StevenChenDatabricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/40385

> ReusedExchange refers to non-existent node
> ------------------------------------------
>
>                 Key: SPARK-42753
>                 URL: https://issues.apache.org/jira/browse/SPARK-42753
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Web UI
>    Affects Versions: 3.4.0
>            Reporter: Steven Chen
>            Priority: Major