[jira] [Commented] (SPARK-42101) Wrap InMemoryTableScanExec with QueryStage
[ https://issues.apache.org/jira/browse/SPARK-42101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699937#comment-17699937 ] Apache Spark commented on SPARK-42101: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/40406 > Wrap InMemoryTableScanExec with QueryStage > -- > > Key: SPARK-42101 > URL: https://issues.apache.org/jira/browse/SPARK-42101 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Fix For: 3.5.0 > > > The first access to a cached plan that has AQE enabled is tricky. Currently, > we cannot preserve its output partitioning and ordering. > The whole query plan also misses many optimizations in the AQE framework. Wrapping > InMemoryTableScanExec in a query stage resolves all these issues. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42777) Support converting TimestampNTZ catalog stats to plan stats
[ https://issues.apache.org/jira/browse/SPARK-42777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-42777. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40404 [https://github.com/apache/spark/pull/40404] > Support converting TimestampNTZ catalog stats to plan stats > --- > > Key: SPARK-42777 > URL: https://issues.apache.org/jira/browse/SPARK-42777 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42340) Implement GroupedData.applyInPandas
[ https://issues.apache.org/jira/browse/SPARK-42340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699924#comment-17699924 ] Apache Spark commented on SPARK-42340: -- User 'xinrong-meng' has created a pull request for this issue: https://github.com/apache/spark/pull/40405 > Implement GroupedData.applyInPandas > --- > > Key: SPARK-42340 > URL: https://issues.apache.org/jira/browse/SPARK-42340 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42340) Implement GroupedData.applyInPandas
[ https://issues.apache.org/jira/browse/SPARK-42340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42340: Assignee: Apache Spark > Implement GroupedData.applyInPandas > --- > > Key: SPARK-42340 > URL: https://issues.apache.org/jira/browse/SPARK-42340 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42340) Implement GroupedData.applyInPandas
[ https://issues.apache.org/jira/browse/SPARK-42340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42340: Assignee: (was: Apache Spark) > Implement GroupedData.applyInPandas > --- > > Key: SPARK-42340 > URL: https://issues.apache.org/jira/browse/SPARK-42340 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42773) Minor grammatical change to "Supports Spark Connect" message
[ https://issues.apache.org/jira/browse/SPARK-42773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42773. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40401 [https://github.com/apache/spark/pull/40401] > Minor grammatical change to "Supports Spark Connect" message > > > Key: SPARK-42773 > URL: https://issues.apache.org/jira/browse/SPARK-42773 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Allan Folting >Assignee: Allan Folting >Priority: Major > Fix For: 3.4.0 > > > Changing "Support Spark Connect" to "Supports Spark Connect" in the 3.4.0 > version change message which is also used in the documentation: > > .. versionchanged:: 3.4.0 > Supports Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42773) Minor grammatical change to "Supports Spark Connect" message
[ https://issues.apache.org/jira/browse/SPARK-42773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42773: - Assignee: Allan Folting > Minor grammatical change to "Supports Spark Connect" message > > > Key: SPARK-42773 > URL: https://issues.apache.org/jira/browse/SPARK-42773 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Allan Folting >Assignee: Allan Folting >Priority: Major > > Changing "Support Spark Connect" to "Supports Spark Connect" in the 3.4.0 > version change message which is also used in the documentation: > > .. versionchanged:: 3.4.0 > Supports Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42702) Support parameterized CTE
[ https://issues.apache.org/jira/browse/SPARK-42702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42702. - Assignee: Wenchen Fan (was: Max Gekk) Resolution: Fixed > Support parameterized CTE > - > > Key: SPARK-42702 > URL: https://issues.apache.org/jira/browse/SPARK-42702 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Wenchen Fan >Priority: Major > > Support named parameters in named common table expressions (CTE). At the > moment, such queries fail: > {code:java} > CREATE TABLE tbl(namespace STRING) USING parquet > INSERT INTO tbl SELECT 'abc' > WITH transitions AS ( > SELECT * FROM tbl WHERE namespace = :namespace > ) SELECT * FROM transitions {code} > with the following error: > {code:java} > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix > `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos > 38; > 'WithCTE > :- 'CTERelationDef 0, false > : +- 'SubqueryAlias transitions > : +- 'Project [*] > : +- 'Filter (namespace#3 = parameter(namespace)) > : +- SubqueryAlias spark_catalog.default.tbl > : +- Relation spark_catalog.default.tbl[namespace#3] parquet > +- 'Project [*] > +- 'SubqueryAlias transitions > +- 'CTERelationRef 0, falseorg.apache.spark.sql.AnalysisException: > [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. 
Please, fix > `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos > 38; > 'WithCTE > :- 'CTERelationDef 0, false > : +- 'SubqueryAlias transitions > : +- 'Project [*] > : +- 'Filter (namespace#3 = parameter(namespace)) > : +- SubqueryAlias spark_catalog.default.tbl > : +- Relation spark_catalog.default.tbl[namespace#3] parquet > +- 'Project [*] > +- 'SubqueryAlias transitions > +- 'CTERelationRef 0, false at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5(CheckAnalysis.scala:339) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5$adapted(CheckAnalysis.scala:244) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
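The named-parameter style in the failing query above (`:namespace` inside a CTE body) is the same placeholder syntax SQLite uses, so the intended behavior of SPARK-42702 can be sketched with Python's standard-library sqlite3 module. This is an analogy, not Spark itself: table and parameter names are taken from the reproduction above.

```python
# Sketch of a parameterized CTE using sqlite3, which shares Spark's
# ":name" placeholder syntax. The parameter binds inside the CTE body,
# which is exactly the case that used to fail in Spark with
# UNBOUND_SQL_PARAMETER.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (namespace TEXT)")
conn.execute("INSERT INTO tbl VALUES ('abc')")

rows = conn.execute(
    """
    WITH transitions AS (
        SELECT * FROM tbl WHERE namespace = :namespace
    )
    SELECT * FROM transitions
    """,
    {"namespace": "abc"},  # mapping of parameter name to literal
).fetchall()
print(rows)  # [('abc',)]
```

In Spark 3.4 the analogous binding is supplied through the `args` argument of `spark.sql`; the point here is only that the placeholder must resolve even when it appears inside a named common table expression.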
[jira] [Updated] (SPARK-42597) Support unwrap date type to timestamp type
[ https://issues.apache.org/jira/browse/SPARK-42597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-42597: Summary: Support unwrap date type to timestamp type (was: UnwrapCastInBinaryComparison support unwrap timestamp type) > Support unwrap date type to timestamp type > -- > > Key: SPARK-42597 > URL: https://issues.apache.org/jira/browse/SPARK-42597 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42597) UnwrapCastInBinaryComparison support unwrap timestamp type
[ https://issues.apache.org/jira/browse/SPARK-42597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-42597. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40190 [https://github.com/apache/spark/pull/40190] > UnwrapCastInBinaryComparison support unwrap timestamp type > -- > > Key: SPARK-42597 > URL: https://issues.apache.org/jira/browse/SPARK-42597 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42597) UnwrapCastInBinaryComparison support unwrap timestamp type
[ https://issues.apache.org/jira/browse/SPARK-42597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-42597: --- Assignee: Yuming Wang > UnwrapCastInBinaryComparison support unwrap timestamp type > -- > > Key: SPARK-42597 > URL: https://issues.apache.org/jira/browse/SPARK-42597 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42711) build/sbt usage error messages and shellcheck warn/error
[ https://issues.apache.org/jira/browse/SPARK-42711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Yan resolved SPARK-42711. --- Resolution: Not A Problem The original code is just a copy of upstream, and the changes do not fix an actual problem. > build/sbt usage error messages and shellcheck warn/error > > > Key: SPARK-42711 > URL: https://issues.apache.org/jira/browse/SPARK-42711 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.2 >Reporter: Liang Yan >Priority: Minor > > The build/sbt tool's usage information has some missing content: > > {code:java} > (base) spark% ./build/sbt -help > Usage: [options] > -h | -help print this message > -v | -verbose this runner is chattier > {code} > And also some shellcheck warn/error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-42711) build/sbt usage error messages and shellcheck warn/error
[ https://issues.apache.org/jira/browse/SPARK-42711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Yan closed SPARK-42711. - > build/sbt usage error messages and shellcheck warn/error > > > Key: SPARK-42711 > URL: https://issues.apache.org/jira/browse/SPARK-42711 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.2 >Reporter: Liang Yan >Priority: Minor > > The build/sbt tool's usage information has some missing content: > > {code:java} > (base) spark% ./build/sbt -help > Usage: [options] > -h | -help print this message > -v | -verbose this runner is chattier > {code} > And also some shellcheck warn/error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21782) Repartition creates skews when numPartitions is a power of 2
[ https://issues.apache.org/jira/browse/SPARK-21782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699874#comment-17699874 ] Apache Spark commented on SPARK-21782: -- User 'megaserg' has created a pull request for this issue: https://github.com/apache/spark/pull/18990 > Repartition creates skews when numPartitions is a power of 2 > > > Key: SPARK-21782 > URL: https://issues.apache.org/jira/browse/SPARK-21782 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sergey Serebryakov >Assignee: Sergey Serebryakov >Priority: Major > Labels: repartition > Fix For: 2.3.0 > > Attachments: Screen Shot 2017-08-16 at 3.40.01 PM.png > > > *Problem:* > When an RDD (particularly with a low item-per-partition ratio) is > repartitioned to {{numPartitions}} = power of 2, the resulting partitions are > very uneven-sized. This affects both {{repartition()}} and > {{coalesce(shuffle=true)}}. > *Steps to reproduce:* > {code} > $ spark-shell > scala> sc.parallelize(0 until 1000, > 250).repartition(64).glom().map(_.length).collect() > res0: Array[Int] = Array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, > 0, 0, 0, 0, 144, 250, 250, 250, 106, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) > {code} > *Explanation:* > Currently, the [algorithm for > repartition|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L450] > (shuffle-enabled coalesce) is as follows: > - for each initial partition {{index}}, generate {{position}} as {{(new > Random(index)).nextInt(numPartitions)}} > - then, for element number {{k}} in initial partition {{index}}, put it in > the new partition {{position + k}} (modulo {{numPartitions}}). > So, essentially elements are smeared roughly equally over {{numPartitions}} > buckets - starting from the one with number {{position+1}}. 
> Note that a new instance of {{Random}} is created for every initial partition > {{index}}, with a fixed seed {{index}}, and then discarded. So the > {{position}} is deterministic for every {{index}} for any RDD in the world. > Also, [{{nextInt(bound)}} > implementation|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/Random.java/#393] > has a special case when {{bound}} is a power of 2, which is basically taking > several highest bits from the initial seed, with only a minimal scrambling. > Due to the deterministic seed, the single use of the generator, and the lack of > scrambling, the {{position}} values for power-of-two {{numPartitions}} always > end up being almost the same regardless of the {{index}}, causing some > buckets to be much more popular than others. So, {{repartition}} will in fact > intentionally produce skewed partitions even when the partitions were > roughly equal in size before. > The behavior seems to have been introduced in SPARK-1770 by > https://github.com/apache/spark/pull/727/ > {quote} > The load balancing is not perfect: a given output partition > can have up to N more elements than the average if there are N input > partitions. However, some randomization is used to minimize the > probabiliy that this happens. > {quote} > Another related ticket: SPARK-17817 - > https://github.com/apache/spark/pull/15445 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
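The clustering of {{position}} values described above can be reproduced without a JVM by reimplementing {{java.util.Random}}'s linear congruential generator in Python. This is a sketch: the constants (multiplier {{0x5DEECE66D}}, addend {{0xB}}, 48-bit state) and the power-of-two fast path are the documented {{java.util.Random}} behavior; the 250-partitions-into-64 setup mirrors the repro in the description.

```python
# Minimal reimplementation of java.util.Random's LCG, enough to compute
# what `new Random(index).nextInt(numPartitions)` returns for a
# power-of-two bound (the fast path takes the HIGH bits of the state).
MASK = (1 << 48) - 1
MULT = 0x5DEECE66D

def java_next_int(seed, bound):
    """Mimic new java.util.Random(seed).nextInt(bound), bound a power of 2."""
    state = (seed ^ MULT) & MASK          # Java's initial seed scrambling
    state = (state * MULT + 0xB) & MASK   # one LCG step, as in next(31)
    top31 = state >> 17                   # next(31): top 31 bits of state
    return (bound * top31) >> 31          # power-of-two special case

# One "position" per initial partition, as in RDD.repartition:
# 250 input partitions smeared into 64 output partitions.
positions = [java_next_int(index, 64) for index in range(250)]
print(sorted(set(positions)))  # only a few distinct buckets out of 64
```

Because consecutive seeds differ only in their low bits and the generator is stepped just once, the high bits of the state barely move, so the 250 positions collapse into a handful of adjacent buckets, matching the skewed `glom()` output in the reproduction.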
[jira] [Commented] (SPARK-42777) Support converting TimestampNTZ catalog stats to plan stats
[ https://issues.apache.org/jira/browse/SPARK-42777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699873#comment-17699873 ] Apache Spark commented on SPARK-42777: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/40404 > Support converting TimestampNTZ catalog stats to plan stats > --- > > Key: SPARK-42777 > URL: https://issues.apache.org/jira/browse/SPARK-42777 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42777) Support converting TimestampNTZ catalog stats to plan stats
[ https://issues.apache.org/jira/browse/SPARK-42777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42777: Assignee: Gengliang Wang (was: Apache Spark) > Support converting TimestampNTZ catalog stats to plan stats > --- > > Key: SPARK-42777 > URL: https://issues.apache.org/jira/browse/SPARK-42777 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42777) Support converting TimestampNTZ catalog stats to plan stats
[ https://issues.apache.org/jira/browse/SPARK-42777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42777: Assignee: Apache Spark (was: Gengliang Wang) > Support converting TimestampNTZ catalog stats to plan stats > --- > > Key: SPARK-42777 > URL: https://issues.apache.org/jira/browse/SPARK-42777 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42777) Support converting TimestampNTZ catalog stats to plan stats
Gengliang Wang created SPARK-42777: -- Summary: Support converting TimestampNTZ catalog stats to plan stats Key: SPARK-42777 URL: https://issues.apache.org/jira/browse/SPARK-42777 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Gengliang Wang Assignee: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42754) Spark 3.4 history server's SQL tab incorrectly groups SQL executions when replaying event logs from Spark 3.3 and earlier
[ https://issues.apache.org/jira/browse/SPARK-42754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42754: Assignee: Apache Spark > Spark 3.4 history server's SQL tab incorrectly groups SQL executions when > replaying event logs from Spark 3.3 and earlier > - > > Key: SPARK-42754 > URL: https://issues.apache.org/jira/browse/SPARK-42754 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Josh Rosen >Assignee: Apache Spark >Priority: Blocker > Attachments: example.png > > > In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL > executions when replaying event logs generated by older Spark versions. > > {*}Reproduction{*}: > {{In ./bin/spark-shell --conf spark.eventLog.enabled=true --conf > spark.eventLog.dir=eventlogs, run three non-nested SQL queries:}} > {code:java} > sql("select * from range(10)").collect() > sql("select * from range(20)").collect() > sql("select * from range(30)").collect(){code} > Exit the shell and use the Spark History Server to replay this application's > UI. > In the SQL tab I expect to see three separate queries, but Spark 3.4's > history server incorrectly groups the second and third queries as nested > queries of the first (see attached screenshot). > > {*}Root cause{*}: > [https://github.com/apache/spark/pull/39268] / SPARK-41752 added a new > *non-optional* {{rootExecutionId: Long}} field to the > SparkListenerSQLExecutionStart case class. > When JsonProtocol deserializes this event it uses the "ignore missing > properties" Jackson deserialization option, causing the > {{rootExecutionField}} to be initialized with a default value of {{{}0{}}}. > The value {{0}} is a legitimate execution ID, so in the deserialized event we > have no ability to distinguish between the absence of a value and a case > where all queries have the first query as the root. 
> *Proposed* {*}fix{*}: > I think we should change this field to be of type {{Option[Long]}} . I > believe this is a release blocker for Spark 3.4.0 because we cannot change > the type of this new field in a future release without breaking binary > compatibility. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
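The root-cause ambiguity above (a missing JSON field defaulting to {{0}}, which is also a legitimate execution ID) can be sketched with plain Python dicts. The field names are assumptions taken from the description, not Spark's exact JsonProtocol wire format.

```python
# Sketch of why a non-optional Long is ambiguous when replaying old logs.
# "rootExecutionId" is the field name from the description; the JSON shape
# here is illustrative, not Spark's actual event-log schema.
import json

old_event = json.loads('{"executionId": 5}')  # Spark 3.3 log: field absent
new_event = json.loads('{"executionId": 5, "rootExecutionId": 0}')  # real 0

# Non-optional decoding fills a default of 0, so the two events become
# indistinguishable -- this is the history-server grouping bug.
assert old_event.get("rootExecutionId", 0) == new_event["rootExecutionId"]

# Optional decoding (the proposed Option[Long]) keeps absence observable.
assert old_event.get("rootExecutionId") is None
assert new_event.get("rootExecutionId") == 0
```

This is why the proposed fix makes the field {{Option[Long]}}: `None` ("no root, treat as top-level query") and `Some(0)` ("nested under execution 0") must decode to different values.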
[jira] [Commented] (SPARK-42754) Spark 3.4 history server's SQL tab incorrectly groups SQL executions when replaying event logs from Spark 3.3 and earlier
[ https://issues.apache.org/jira/browse/SPARK-42754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699852#comment-17699852 ] Apache Spark commented on SPARK-42754: -- User 'linhongliu-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40403 > Spark 3.4 history server's SQL tab incorrectly groups SQL executions when > replaying event logs from Spark 3.3 and earlier > - > > Key: SPARK-42754 > URL: https://issues.apache.org/jira/browse/SPARK-42754 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Josh Rosen >Priority: Blocker > Attachments: example.png > > > In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL > executions when replaying event logs generated by older Spark versions. > > {*}Reproduction{*}: > {{In ./bin/spark-shell --conf spark.eventLog.enabled=true --conf > spark.eventLog.dir=eventlogs, run three non-nested SQL queries:}} > {code:java} > sql("select * from range(10)").collect() > sql("select * from range(20)").collect() > sql("select * from range(30)").collect(){code} > Exit the shell and use the Spark History Server to replay this application's > UI. > In the SQL tab I expect to see three separate queries, but Spark 3.4's > history server incorrectly groups the second and third queries as nested > queries of the first (see attached screenshot). > > {*}Root cause{*}: > [https://github.com/apache/spark/pull/39268] / SPARK-41752 added a new > *non-optional* {{rootExecutionId: Long}} field to the > SparkListenerSQLExecutionStart case class. > When JsonProtocol deserializes this event it uses the "ignore missing > properties" Jackson deserialization option, causing the > {{rootExecutionField}} to be initialized with a default value of {{{}0{}}}. 
> The value {{0}} is a legitimate execution ID, so in the deserialized event we > have no ability to distinguish between the absence of a value and a case > where all queries have the first query as the root. > *Proposed* {*}fix{*}: > I think we should change this field to be of type {{Option[Long]}} . I > believe this is a release blocker for Spark 3.4.0 because we cannot change > the type of this new field in a future release without breaking binary > compatibility. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42754) Spark 3.4 history server's SQL tab incorrectly groups SQL executions when replaying event logs from Spark 3.3 and earlier
[ https://issues.apache.org/jira/browse/SPARK-42754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42754: Assignee: (was: Apache Spark) > Spark 3.4 history server's SQL tab incorrectly groups SQL executions when > replaying event logs from Spark 3.3 and earlier > - > > Key: SPARK-42754 > URL: https://issues.apache.org/jira/browse/SPARK-42754 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Josh Rosen >Priority: Blocker > Attachments: example.png > > > In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL > executions when replaying event logs generated by older Spark versions. > > {*}Reproduction{*}: > {{In ./bin/spark-shell --conf spark.eventLog.enabled=true --conf > spark.eventLog.dir=eventlogs, run three non-nested SQL queries:}} > {code:java} > sql("select * from range(10)").collect() > sql("select * from range(20)").collect() > sql("select * from range(30)").collect(){code} > Exit the shell and use the Spark History Server to replay this application's > UI. > In the SQL tab I expect to see three separate queries, but Spark 3.4's > history server incorrectly groups the second and third queries as nested > queries of the first (see attached screenshot). > > {*}Root cause{*}: > [https://github.com/apache/spark/pull/39268] / SPARK-41752 added a new > *non-optional* {{rootExecutionId: Long}} field to the > SparkListenerSQLExecutionStart case class. > When JsonProtocol deserializes this event it uses the "ignore missing > properties" Jackson deserialization option, causing the > {{rootExecutionField}} to be initialized with a default value of {{{}0{}}}. > The value {{0}} is a legitimate execution ID, so in the deserialized event we > have no ability to distinguish between the absence of a value and a case > where all queries have the first query as the root. 
> *Proposed fix*: > I think we should change this field to be of type {{Option[Long]}}. I > believe this is a release blocker for Spark 3.4.0 because we cannot change > the type of this new field in a future release without breaking binary > compatibility. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
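The root-cause ambiguity above is easy to demonstrate outside Spark. Below is a minimal plain-Python sketch (not Spark's JsonProtocol; the parser functions are hypothetical stand-ins for Jackson's missing-property behavior) showing that a primitive default of 0 makes an old event with no rootExecutionId indistinguishable from a new event whose root really is execution 0, whereas an Option-style field preserves the difference:

```python
import json

def parse_with_default(event_json: str) -> int:
    # Jackson-style "ignore missing properties": a missing field
    # silently becomes the primitive default, here 0.
    return json.loads(event_json).get("rootExecutionId", 0)

def parse_with_option(event_json: str):
    # Option[Long]-style: absence is preserved as None.
    return json.loads(event_json).get("rootExecutionId")

old_event = '{"executionId": 1}'                        # log from Spark <= 3.3
new_event = '{"executionId": 1, "rootExecutionId": 0}'  # log from Spark 3.4

# With a default of 0, the two events are indistinguishable:
print(parse_with_default(old_event) == parse_with_default(new_event))  # True

# With an optional field, absence survives deserialization:
print(parse_with_option(old_event))  # None
print(parse_with_option(new_event))  # 0
```

This is why changing the field to an Option[Long] (deserialized as absent rather than 0) restores the distinction without guessing.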
[jira] [Created] (SPARK-42776) BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules
Timothy Miller created SPARK-42776: -- Summary: BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules Key: SPARK-42776 URL: https://issues.apache.org/jira/browse/SPARK-42776 Project: Spark Issue Type: Bug Components: Optimizer Affects Versions: 3.3.1 Environment: I'm prototyping on a Mac, but that's not really relevant. Reporter: Timothy Miller I am trying to replace BroadcastHashJoinExec with a columnar equivalent. However, I noticed that BroadcastHashJoinExec.requiredChildDistribution gets called BEFORE the columnar replacement rules. As a result, the object that gets broadcast is the plain old hashmap created from row data. By the time the columnar replacement rules are applied, it's too late to get Spark to broadcast any other kind of object. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42775) approx_percentile produces wrong results for large decimals.
[ https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chenhao Li updated SPARK-42775: --- Description: In the {{approx_percentile}} expression, Spark casts decimal to double to update the aggregation state ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181]) and casts the result double back to decimal ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]). The precision loss in the casts can push the result decimal out of its precision range. This can lead to the following counter-intuitive results: {code:sql} spark-sql> select approx_percentile(col, 0.5) from values (999) as tab(col); NULL spark-sql> select approx_percentile(col, 0.5) is null from values (999) as tab(col); false spark-sql> select cast(approx_percentile(col, 0.5) as string) from values (999) as tab(col); 1000 spark-sql> desc select approx_percentile(col, 0.5) from values (999) as tab(col); approx_percentile(col, 0.5, 1) decimal(19,0) {code} The result is actually not null, so the second query returns false. The first query returns null because the result cannot fit into {{decimal(19, 0)}}. A suggested fix is to use {{Decimal.changePrecision}} here to ensure the result fits, and to genuinely return null or throw an exception when the result doesn't fit. 
was: In the {{approx_percentile}} expression, Spark casts decimal to double to update the aggregation state ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181]) and casts the result double back to decimal ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]). The precision loss in the casts can make the result decimal out of its precision range. This can lead to the following counter-intuitive results: {code:sql} spark-sql> select approx_percentile(col, 0.5) from values (999) as tab(col); NULL spark-sql> select approx_percentile(col, 0.5) is null from values (999) as tab(col); false spark-sql> select cast(approx_percentile(col, 0.5) as string) from values (999) as tab(col); 1000 spark-sql> desc select approx_percentile(col, 0.5) from values (999) as tab(col); approx_percentile(col, 0.5, 1) decimal(19,0) {code} The result is actually not null, so the second query returns false. The first query returns null because the result cannot fit into {{{}decimal(19, 0){}}}. A suggested fix is to use `Decimal.changePrecision` here to ensure the result fits, and really returns a null or throws an exception when the result doesn't fit. > approx_percentile produces wrong results for large decimals. 
> > > Key: SPARK-42775 > URL: https://issues.apache.org/jira/browse/SPARK-42775 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, > 3.4.0 >Reporter: Chenhao Li >Priority: Major > > In the {{approx_percentile}} expression, Spark casts decimal to double to > update the aggregation state > ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181]) > and casts the result double back to decimal > ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]). > The precision loss in the casts can make the result decimal out of its > precision range. This can lead to the following counter-intuitive results: > {code:sql} > spark-sql> select approx_percentile(col, 0.5) from values > (999) as tab(col); > NULL > spark-sql> select approx_percentile(col, 0.5) is null from values > (999) as tab(col); > false > spark-sql> select cast(approx_percentile(col, 0.5) as string) from values > (999) as tab(col); >
[jira] [Updated] (SPARK-42775) approx_percentile produces wrong results for large decimals.
[ https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chenhao Li updated SPARK-42775: --- Description: In the {{approx_percentile}} expression, Spark casts decimal to double to update the aggregation state ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181]) and casts the result double back to decimal ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]). The precision loss in the casts can push the result decimal out of its precision range. This can lead to the following counter-intuitive results: {code:sql} spark-sql> select approx_percentile(col, 0.5) from values (999) as tab(col); NULL spark-sql> select approx_percentile(col, 0.5) is null from values (999) as tab(col); false spark-sql> select cast(approx_percentile(col, 0.5) as string) from values (999) as tab(col); 1000 spark-sql> desc select approx_percentile(col, 0.5) from values (999) as tab(col); approx_percentile(col, 0.5, 1) decimal(19,0) {code} The result is actually not null, so the second query returns false. The first query returns null because the result cannot fit into {{decimal(19, 0)}}. A suggested fix is to use `Decimal.changePrecision` here to ensure the result fits, and to genuinely return null or throw an exception when the result doesn't fit. 
was: In the `approx_percentile` expression, Spark casts decimal to double to update the aggregation state ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181]) and casts the result double back to decimal ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]). The precision loss in the casts can make the result decimal out of its precision range. This can lead to the following counter-intuitive results: {code:sql} spark-sql> select approx_percentile(col, 0.5) from values (999) as tab(col); NULL spark-sql> select approx_percentile(col, 0.5) is null from values (999) as tab(col); false spark-sql> select cast(approx_percentile(col, 0.5) as string) from values (999) as tab(col); 1000 spark-sql> desc select approx_percentile(col, 0.5) from values (999) as tab(col); approx_percentile(col, 0.5, 1) decimal(19,0) {code} The result is actually not null, so the second query returns false. The first query returns null because the result cannot fit into {{decimal(19, 0)}}. A suggested fix is to use `Decimal.changePrecision` here to ensure the result fits, and really returns a null or throws an exception when the result doesn't fit. > approx_percentile produces wrong results for large decimals. 
> > > Key: SPARK-42775 > URL: https://issues.apache.org/jira/browse/SPARK-42775 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, > 3.4.0 >Reporter: Chenhao Li >Priority: Major > > In the {{approx_percentile}} expression, Spark casts decimal to double to > update the aggregation state > ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181]) > and casts the result double back to decimal > ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]). > The precision loss in the casts can make the result decimal out of its > precision range. This can lead to the following counter-intuitive results: > {code:sql} > spark-sql> select approx_percentile(col, 0.5) from values > (999) as tab(col); > NULL > spark-sql> select approx_percentile(col, 0.5) is null from values > (999) as tab(col); > false > spark-sql> select cast(approx_percentile(col, 0.5) as string) from values > (999) as tab(col); >
[jira] [Created] (SPARK-42775) approx_percentile produces wrong results for large decimals.
Chenhao Li created SPARK-42775: -- Summary: approx_percentile produces wrong results for large decimals. Key: SPARK-42775 URL: https://issues.apache.org/jira/browse/SPARK-42775 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0, 3.2.0, 3.1.0, 3.0.0, 2.4.0, 2.3.0, 2.2.0, 2.1.0, 3.4.0 Reporter: Chenhao Li In the `approx_percentile` expression, Spark casts decimal to double to update the aggregation state ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181]) and casts the result double back to decimal ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]). The precision loss in the casts can push the result decimal out of its precision range. This can lead to the following counter-intuitive results: {code:sql} spark-sql> select approx_percentile(col, 0.5) from values (999) as tab(col); NULL spark-sql> select approx_percentile(col, 0.5) is null from values (999) as tab(col); false spark-sql> select cast(approx_percentile(col, 0.5) as string) from values (999) as tab(col); 1000 spark-sql> desc select approx_percentile(col, 0.5) from values (999) as tab(col); approx_percentile(col, 0.5, 1) decimal(19,0) {code} The result is actually not null, so the second query returns false. The first query returns null because the result cannot fit into {{decimal(19, 0)}}. A suggested fix is to use `Decimal.changePrecision` here to ensure the result fits, and to genuinely return null or throw an exception when the result doesn't fit. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
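The precision loss described above can be reproduced without Spark. The following plain-Python sketch (standing in for Spark's decimal-to-double-to-decimal round trip, not actual Spark code) shows a 19-digit value rounding up to a 20-digit value after passing through a double, so it no longer fits decimal(19,0):

```python
from decimal import Decimal

# A 19-digit value, at the maximum precision of decimal(19,0).
original = 10**19 - 1  # 9999999999999999999

# Casting to double loses precision: a double carries only ~15-16
# significant decimal digits, so the value rounds up to 1e19.
as_double = float(original)
print(as_double)  # 1e+19

# Casting back to decimal yields a 20-digit number, which is out of
# range for decimal(19,0) -- Spark then produces NULL for the result.
round_tripped = Decimal(as_double)
print(round_tripped)              # 10000000000000000000
print(round_tripped == original)  # False
```

A Decimal.changePrecision-style check after the cast back would detect exactly this overflow and return null (or raise) deliberately rather than accidentally.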
[jira] [Assigned] (SPARK-42020) createDataFrame with UDT
[ https://issues.apache.org/jira/browse/SPARK-42020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42020: Assignee: (was: Apache Spark) > createDataFrame with UDT > > > Key: SPARK-42020 > URL: https://issues.apache.org/jira/browse/SPARK-42020 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > pyspark/sql/tests/test_types.py:596 > (TypesParityTests.test_apply_schema_with_udt) > self = testMethod=test_apply_schema_with_udt> > def test_apply_schema_with_udt(self): > row = (1.0, ExamplePoint(1.0, 2.0)) > schema = StructType( > [ > StructField("label", DoubleType(), False), > StructField("point", ExamplePointUDT(), False), > ] > ) > > df = self.spark.createDataFrame([row], schema) > ../test_types.py:605: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > ../../connect/session.py:282: in createDataFrame > _table = pa.Table.from_pylist([dict(zip(_cols, list(item))) for item in > _data]) > pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist > ??? > pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist > ??? > pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays > ??? > pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays > ??? > pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays > ??? > pyarrow/array.pxi:320: in pyarrow.lib.array > ??? > pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array > ??? > pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status > ??? > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > > ??? 
> E pyarrow.lib.ArrowInvalid: Could not convert ExamplePoint(1.0,2.0) with > type ExamplePoint: did not recognize Python value type when inferring an > Arrow data type > pyarrow/error.pxi:100: ArrowInvalid > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42020) createDataFrame with UDT
[ https://issues.apache.org/jira/browse/SPARK-42020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42020: Assignee: Apache Spark > createDataFrame with UDT > > > Key: SPARK-42020 > URL: https://issues.apache.org/jira/browse/SPARK-42020 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > {code} > pyspark/sql/tests/test_types.py:596 > (TypesParityTests.test_apply_schema_with_udt) > self = testMethod=test_apply_schema_with_udt> > def test_apply_schema_with_udt(self): > row = (1.0, ExamplePoint(1.0, 2.0)) > schema = StructType( > [ > StructField("label", DoubleType(), False), > StructField("point", ExamplePointUDT(), False), > ] > ) > > df = self.spark.createDataFrame([row], schema) > ../test_types.py:605: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > ../../connect/session.py:282: in createDataFrame > _table = pa.Table.from_pylist([dict(zip(_cols, list(item))) for item in > _data]) > pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist > ??? > pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist > ??? > pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays > ??? > pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays > ??? > pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays > ??? > pyarrow/array.pxi:320: in pyarrow.lib.array > ??? > pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array > ??? > pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status > ??? > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > > ??? 
> E pyarrow.lib.ArrowInvalid: Could not convert ExamplePoint(1.0,2.0) with > type ExamplePoint: did not recognize Python value type when inferring an > Arrow data type > pyarrow/error.pxi:100: ArrowInvalid > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42020) createDataFrame with UDT
[ https://issues.apache.org/jira/browse/SPARK-42020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699825#comment-17699825 ] Apache Spark commented on SPARK-42020: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40402 > createDataFrame with UDT > > > Key: SPARK-42020 > URL: https://issues.apache.org/jira/browse/SPARK-42020 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > pyspark/sql/tests/test_types.py:596 > (TypesParityTests.test_apply_schema_with_udt) > self = testMethod=test_apply_schema_with_udt> > def test_apply_schema_with_udt(self): > row = (1.0, ExamplePoint(1.0, 2.0)) > schema = StructType( > [ > StructField("label", DoubleType(), False), > StructField("point", ExamplePointUDT(), False), > ] > ) > > df = self.spark.createDataFrame([row], schema) > ../test_types.py:605: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > ../../connect/session.py:282: in createDataFrame > _table = pa.Table.from_pylist([dict(zip(_cols, list(item))) for item in > _data]) > pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist > ??? > pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist > ??? > pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays > ??? > pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays > ??? > pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays > ??? > pyarrow/array.pxi:320: in pyarrow.lib.array > ??? > pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array > ??? > pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status > ??? > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > > ??? 
> E pyarrow.lib.ArrowInvalid: Could not convert ExamplePoint(1.0,2.0) with > type ExamplePoint: did not recognize Python value type when inferring an > Arrow data type > pyarrow/error.pxi:100: ArrowInvalid > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42774) Expose VectorTypes API for DataSourceV2 Batch Scans
Micah Kornfield created SPARK-42774: --- Summary: Expose VectorTypes API for DataSourceV2 Batch Scans Key: SPARK-42774 URL: https://issues.apache.org/jira/browse/SPARK-42774 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.2 Reporter: Micah Kornfield SparkPlan's {{vectorTypes}} attribute can be used to [specialize codegen|https://github.com/apache/spark/blob/5556cfc59aa97a3ad4ea0baacebe19859ec0bcb7/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L151]; however, [BatchScanExecBase|https://github.com/apache/spark/blob/6b6bb6fa20f40aeedea2fb87008e9cce76c54e28/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExecBase.scala] does not override it, so DSv2 sources do not get any benefit from concrete-class dispatch. This proposes adding an override to BatchScanExecBase which delegates to a new default method on [PartitionReaderFactory|https://github.com/apache/spark/blob/f1d42bb68d6d69d9a32f91a390270f9ec33c3207/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReaderFactory.java] to expose vectorTypes: {{ default Optional<List<String>> getVectorTypes() { return Optional.empty(); } }} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42773) Minor grammatical change to "Supports Spark Connect" message
[ https://issues.apache.org/jira/browse/SPARK-42773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42773: Assignee: Apache Spark > Minor grammatical change to "Supports Spark Connect" message > > > Key: SPARK-42773 > URL: https://issues.apache.org/jira/browse/SPARK-42773 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Allan Folting >Assignee: Apache Spark >Priority: Major > > Changing "Support Spark Connect" to "Supports Spark Connect" in the 3.4.0 > version change message which is also used in the documentation: > > .. versionchanged:: 3.4.0 > Supports Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42773) Minor grammatical change to "Supports Spark Connect" message
[ https://issues.apache.org/jira/browse/SPARK-42773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42773: Assignee: (was: Apache Spark) > Minor grammatical change to "Supports Spark Connect" message > > > Key: SPARK-42773 > URL: https://issues.apache.org/jira/browse/SPARK-42773 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Allan Folting >Priority: Major > > Changing "Support Spark Connect" to "Supports Spark Connect" in the 3.4.0 > version change message which is also used in the documentation: > > .. versionchanged:: 3.4.0 > Supports Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42773) Minor grammatical change to "Supports Spark Connect" message
[ https://issues.apache.org/jira/browse/SPARK-42773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699781#comment-17699781 ] Apache Spark commented on SPARK-42773: -- User 'allanf-db' has created a pull request for this issue: https://github.com/apache/spark/pull/40401 > Minor grammatical change to "Supports Spark Connect" message > > > Key: SPARK-42773 > URL: https://issues.apache.org/jira/browse/SPARK-42773 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Allan Folting >Priority: Major > > Changing "Support Spark Connect" to "Supports Spark Connect" in the 3.4.0 > version change message which is also used in the documentation: > > .. versionchanged:: 3.4.0 > Supports Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42769) Add SPARK_DRIVER_POD_IP env variable to executor pods
[ https://issues.apache.org/jira/browse/SPARK-42769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42769. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40392 [https://github.com/apache/spark/pull/40392] > Add SPARK_DRIVER_POD_IP env variable to executor pods > - > > Key: SPARK-42769 > URL: https://issues.apache.org/jira/browse/SPARK-42769 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42769) Add SPARK_DRIVER_POD_IP env variable to executor pods
[ https://issues.apache.org/jira/browse/SPARK-42769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42769: - Assignee: Dongjoon Hyun > Add SPARK_DRIVER_POD_IP env variable to executor pods > - > > Key: SPARK-42769 > URL: https://issues.apache.org/jira/browse/SPARK-42769 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34637) Support DPP in AQE when the broadcast exchange can be reused
[ https://issues.apache.org/jira/browse/SPARK-34637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated SPARK-34637: --- Summary: Support DPP in AQE when the broadcast exchange can be reused (was: Support DPP in AQE when the boradcast exchange can be reused) > Support DPP in AQE when the broadcast exchange can be reused > > > Key: SPARK-34637 > URL: https://issues.apache.org/jira/browse/SPARK-34637 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Ke Jia >Assignee: Ke Jia >Priority: Major > Fix For: 3.2.0 > > > In SPARK-34168 we added support for DPP in AQE when the join is a broadcast > hash join before applying the AQE rules, which has some limitations. It only > applies DPP when the small table side is executed first, so that the big table > side can reuse the broadcast exchange built on the small table side. This Jira > addresses the above limitations and applies DPP whenever the broadcast > exchange can be reused. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41359) Use `PhysicalDataType` instead of DataType in UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-41359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41359: Assignee: Apache Spark > Use `PhysicalDataType` instead of DataType in UnsafeRow > --- > > Key: SPARK-41359 > URL: https://issues.apache.org/jira/browse/SPARK-41359 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41359) Use `PhysicalDataType` instead of DataType in UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-41359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699720#comment-17699720 ] Apache Spark commented on SPARK-41359: -- User 'ClownXC' has created a pull request for this issue: https://github.com/apache/spark/pull/40400 > Use `PhysicalDataType` instead of DataType in UnsafeRow > --- > > Key: SPARK-41359 > URL: https://issues.apache.org/jira/browse/SPARK-41359 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41359) Use `PhysicalDataType` instead of DataType in UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-41359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41359: Assignee: (was: Apache Spark) > Use `PhysicalDataType` instead of DataType in UnsafeRow > --- > > Key: SPARK-41359 > URL: https://issues.apache.org/jira/browse/SPARK-41359 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42773) Minor grammatical change to "Supports Spark Connect" message
Allan Folting created SPARK-42773: - Summary: Minor grammatical change to "Supports Spark Connect" message Key: SPARK-42773 URL: https://issues.apache.org/jira/browse/SPARK-42773 Project: Spark Issue Type: Documentation Components: PySpark Affects Versions: 3.4.0 Reporter: Allan Folting Changing "Support Spark Connect" to "Supports Spark Connect" in the 3.4.0 version change message which is also used in the documentation: .. versionchanged:: 3.4.0 Supports Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38992) Avoid using bash -c in ShellBasedGroupsMappingProvider
[ https://issues.apache.org/jira/browse/SPARK-38992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-38992: - Fix Version/s: (was: 3.1.3) > Avoid using bash -c in ShellBasedGroupsMappingProvider > -- > > Key: SPARK-38992 > URL: https://issues.apache.org/jira/browse/SPARK-38992 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.4, 3.3.0, 3.2.2 > > > Using bash -c can allow arbitrary shall execution from the end user. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42101) Wrap InMemoryTableScanExec with QueryStage
[ https://issues.apache.org/jira/browse/SPARK-42101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699610#comment-17699610 ] Apache Spark commented on SPARK-42101: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/40399 > Wrap InMemoryTableScanExec with QueryStage > -- > > Key: SPARK-42101 > URL: https://issues.apache.org/jira/browse/SPARK-42101 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Fix For: 3.5.0 > > > The first access to a cached plan that has AQE enabled is tricky: currently, > we cannot preserve its output partitioning and ordering. > The whole query plan also misses many optimizations in the AQE framework. Wrapping > InMemoryTableScanExec in a query stage resolves all these issues. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42101) Wrap InMemoryTableScanExec with QueryStage
[ https://issues.apache.org/jira/browse/SPARK-42101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699611#comment-17699611 ] Apache Spark commented on SPARK-42101: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/40399 > Wrap InMemoryTableScanExec with QueryStage > -- > > Key: SPARK-42101 > URL: https://issues.apache.org/jira/browse/SPARK-42101 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Fix For: 3.5.0 > > > The first access to a cached plan that has AQE enabled is tricky: currently, > we cannot preserve its output partitioning and ordering. > The whole query plan also misses many optimizations in the AQE framework. Wrapping > InMemoryTableScanExec in a query stage resolves all these issues. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42052) Codegen Support for HiveSimpleUDF
[ https://issues.apache.org/jira/browse/SPARK-42052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699595#comment-17699595 ] Apache Spark commented on SPARK-42052: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/40397 > Codegen Support for HiveSimpleUDF > - > > Key: SPARK-42052 > URL: https://issues.apache.org/jira/browse/SPARK-42052 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42772) Change the default value of JDBC options about push down to true
[ https://issues.apache.org/jira/browse/SPARK-42772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42772: Assignee: (was: Apache Spark) > Change the default value of JDBC options about push down to true > > > Key: SPARK-42772 > URL: https://issues.apache.org/jira/browse/SPARK-42772 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42772) Change the default value of JDBC options about push down to true
[ https://issues.apache.org/jira/browse/SPARK-42772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699571#comment-17699571 ] Apache Spark commented on SPARK-42772: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40396 > Change the default value of JDBC options about push down to true > > > Key: SPARK-42772 > URL: https://issues.apache.org/jira/browse/SPARK-42772 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42772) Change the default value of JDBC options about push down to true
[ https://issues.apache.org/jira/browse/SPARK-42772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42772: Assignee: Apache Spark > Change the default value of JDBC options about push down to true > > > Key: SPARK-42772 > URL: https://issues.apache.org/jira/browse/SPARK-42772 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42772) Adjust the default value of JDBC options about push down to true
jiaan.geng created SPARK-42772: -- Summary: Adjust the default value of JDBC options about push down to true Key: SPARK-42772 URL: https://issues.apache.org/jira/browse/SPARK-42772 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: jiaan.geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42772) Change the default value of JDBC options about push down to true
[ https://issues.apache.org/jira/browse/SPARK-42772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-42772: --- Summary: Change the default value of JDBC options about push down to true (was: Adjust the default value of JDBC options about push down to true) > Change the default value of JDBC options about push down to true > > > Key: SPARK-42772 > URL: https://issues.apache.org/jira/browse/SPARK-42772 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
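For readers following along, the switches in question are DataSource v2 JDBC reader options that can already be set explicitly per read. A minimal sketch of doing so (the URL and table name are placeholders, and the exact list of flags covered by this ticket is an assumption):

```python
# Sketch of the JDBC reader options whose defaults this ticket proposes
# flipping to true. url/dbtable are placeholders; the commented-out read
# call assumes a live SparkSession named `spark`.
jdbc_options = {
    "url": "jdbc:postgresql://host:5432/db",  # placeholder connection URL
    "dbtable": "sales",                       # placeholder table name
    "pushDownAggregate": "true",     # push COUNT/SUM/MIN/MAX/AVG to the database
    "pushDownLimit": "true",         # push LIMIT (and top-N) to the database
    "pushDownOffset": "true",        # push OFFSET to the database
    "pushDownTableSample": "true",   # push TABLESAMPLE to the database
}
# df = spark.read.format("jdbc").options(**jdbc_options).load()
```

Once the defaults change, setting these explicitly would only be needed to opt *out* (with `"false"`).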
[jira] [Assigned] (SPARK-42770) SQLImplicitsTestSuite test failed with Java 17
[ https://issues.apache.org/jira/browse/SPARK-42770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42770: Assignee: (was: Apache Spark) > SQLImplicitsTestSuite test failed with Java 17 > -- > > Key: SPARK-42770 > URL: https://issues.apache.org/jira/browse/SPARK-42770 > Project: Spark > Issue Type: Bug > Components: Connect, Tests >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Major > > [https://github.com/apache/spark/actions/runs/4318647315/jobs/7537203682] > {code:java} > [info] - test implicit encoder resolution *** FAILED *** (1 second, 329 > milliseconds) > 4429[info] 2023-03-02T23:00:20.404434 did not equal > 2023-03-02T23:00:20.404434875 (SQLImplicitsTestSuite.scala:63) > 4430[info] org.scalatest.exceptions.TestFailedException: > 4431[info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > 4432[info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > 4433[info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > 4434[info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > 4435[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.testImplicit$1(SQLImplicitsTestSuite.scala:63) > 4436[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.$anonfun$new$2(SQLImplicitsTestSuite.scala:133) > 4437[info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > 4438[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > 4439[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > 4440[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > 4441[info] at org.scalatest.Transformer.apply(Transformer.scala:22) > 4442[info] at org.scalatest.Transformer.apply(Transformer.scala:20) > 4443[info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at 
org.scalatest.TestSuite.withFixture(TestSuite.scala:196) > 4445[info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) > 4446[info] at > org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) > 4447[info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > 4448[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > 4449[info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > 4450[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > 4451[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > 4452[info] at > org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564) > 4453[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > 4454[info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > 4455[info] at scala.collection.immutable.List.foreach(List.scala:431) > 4456[info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > 4457[info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > 4458[info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > 4459[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > 4460[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > 4461[info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > 4462[info] at org.scalatest.Suite.run(Suite.scala:1114) > 4463[info] at org.scalatest.Suite.run$(Suite.scala:1096) > 4464[info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > 4465[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > 4466[info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > 4467[info] at > 
org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > 4468[info] at > org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > 4469[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.org$scalatest$BeforeAndAfterAll$$super$run(SQLImplicitsTestSuite.scala:34) > 4470[info] at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > 4471[info] at > org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > 4472[info] at > org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > 4473[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.run(SQLImplicitsTestSuite.scala:34) > 4474[info] at >
[jira] [Assigned] (SPARK-42770) SQLImplicitsTestSuite test failed with Java 17
[ https://issues.apache.org/jira/browse/SPARK-42770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42770: Assignee: Apache Spark > SQLImplicitsTestSuite test failed with Java 17 > -- > > Key: SPARK-42770 > URL: https://issues.apache.org/jira/browse/SPARK-42770 > Project: Spark > Issue Type: Bug > Components: Connect, Tests >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > > [https://github.com/apache/spark/actions/runs/4318647315/jobs/7537203682] > {code:java} > [info] - test implicit encoder resolution *** FAILED *** (1 second, 329 > milliseconds) > 4429[info] 2023-03-02T23:00:20.404434 did not equal > 2023-03-02T23:00:20.404434875 (SQLImplicitsTestSuite.scala:63) > 4430[info] org.scalatest.exceptions.TestFailedException: > 4431[info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > 4432[info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > 4433[info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > 4434[info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > 4435[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.testImplicit$1(SQLImplicitsTestSuite.scala:63) > 4436[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.$anonfun$new$2(SQLImplicitsTestSuite.scala:133) > 4437[info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > 4438[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > 4439[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > 4440[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > 4441[info] at org.scalatest.Transformer.apply(Transformer.scala:22) > 4442[info] at org.scalatest.Transformer.apply(Transformer.scala:20) > 4443[info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at 
org.scalatest.TestSuite.withFixture(TestSuite.scala:196) > 4445[info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) > 4446[info] at > org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) > 4447[info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > 4448[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > 4449[info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > 4450[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > 4451[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > 4452[info] at > org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564) > 4453[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > 4454[info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > 4455[info] at scala.collection.immutable.List.foreach(List.scala:431) > 4456[info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > 4457[info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > 4458[info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > 4459[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > 4460[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > 4461[info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > 4462[info] at org.scalatest.Suite.run(Suite.scala:1114) > 4463[info] at org.scalatest.Suite.run$(Suite.scala:1096) > 4464[info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > 4465[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > 4466[info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > 4467[info] at > 
org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > 4468[info] at > org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > 4469[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.org$scalatest$BeforeAndAfterAll$$super$run(SQLImplicitsTestSuite.scala:34) > 4470[info] at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > 4471[info] at > org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > 4472[info] at > org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > 4473[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.run(SQLImplicitsTestSuite.scala:34) > 4474[info] at >
[jira] [Commented] (SPARK-42770) SQLImplicitsTestSuite test failed with Java 17
[ https://issues.apache.org/jira/browse/SPARK-42770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699542#comment-17699542 ] Apache Spark commented on SPARK-42770: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40395 > SQLImplicitsTestSuite test failed with Java 17 > -- > > Key: SPARK-42770 > URL: https://issues.apache.org/jira/browse/SPARK-42770 > Project: Spark > Issue Type: Bug > Components: Connect, Tests >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Major > > [https://github.com/apache/spark/actions/runs/4318647315/jobs/7537203682] > {code:java} > [info] - test implicit encoder resolution *** FAILED *** (1 second, 329 > milliseconds) > 4429[info] 2023-03-02T23:00:20.404434 did not equal > 2023-03-02T23:00:20.404434875 (SQLImplicitsTestSuite.scala:63) > 4430[info] org.scalatest.exceptions.TestFailedException: > 4431[info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > 4432[info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > 4433[info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > 4434[info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > 4435[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.testImplicit$1(SQLImplicitsTestSuite.scala:63) > 4436[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.$anonfun$new$2(SQLImplicitsTestSuite.scala:133) > 4437[info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > 4438[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > 4439[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > 4440[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > 4441[info] at org.scalatest.Transformer.apply(Transformer.scala:22) > 4442[info] at org.scalatest.Transformer.apply(Transformer.scala:20) > 4443[info] at > 
org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at org.scalatest.TestSuite.withFixture(TestSuite.scala:196) > 4445[info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) > 4446[info] at > org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) > 4447[info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > 4448[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > 4449[info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > 4450[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > 4451[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > 4452[info] at > org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564) > 4453[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > 4454[info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > 4455[info] at scala.collection.immutable.List.foreach(List.scala:431) > 4456[info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > 4457[info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > 4458[info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > 4459[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > 4460[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > 4461[info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > 4462[info] at org.scalatest.Suite.run(Suite.scala:1114) > 4463[info] at org.scalatest.Suite.run$(Suite.scala:1096) > 4464[info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > 4465[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > 4466[info] at 
org.scalatest.SuperEngine.runImpl(Engine.scala:535) > 4467[info] at > org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > 4468[info] at > org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > 4469[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.org$scalatest$BeforeAndAfterAll$$super$run(SQLImplicitsTestSuite.scala:34) > 4470[info] at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > 4471[info] at > org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > 4472[info] at > org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > 4473[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.run(SQLImplicitsTestSuite.scala:34) >
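The failing assertion above (2023-03-02T23:00:20.404434 vs 2023-03-02T23:00:20.404434875) suggests a precision mismatch: Spark's TimestampType keeps microseconds, while java.time clocks on newer JDKs can return sub-microsecond digits. A minimal sketch of the truncation involved, with a hypothetical helper name:

```python
def nanos_to_spark_micros(nanos: int) -> int:
    # Spark's TimestampType stores microseconds, so any sub-microsecond
    # digits coming from a higher-resolution clock must be dropped.
    return nanos // 1_000

# Fractional second from the failing assertion, in nanoseconds:
actual_nanos = 404_434_875
# After truncation both sides agree at microsecond precision:
assert nanos_to_spark_micros(actual_nanos) == 404_434
```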
[jira] [Commented] (SPARK-42771) Refactor HiveGenericUDF
[ https://issues.apache.org/jira/browse/SPARK-42771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699527#comment-17699527 ] Apache Spark commented on SPARK-42771: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/40394 > Refactor HiveGenericUDF > --- > > Key: SPARK-42771 > URL: https://issues.apache.org/jira/browse/SPARK-42771 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42771) Refactor HiveGenericUDF
[ https://issues.apache.org/jira/browse/SPARK-42771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42771: Assignee: (was: Apache Spark) > Refactor HiveGenericUDF > --- > > Key: SPARK-42771 > URL: https://issues.apache.org/jira/browse/SPARK-42771 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42771) Refactor HiveGenericUDF
[ https://issues.apache.org/jira/browse/SPARK-42771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42771: Assignee: Apache Spark > Refactor HiveGenericUDF > --- > > Key: SPARK-42771 > URL: https://issues.apache.org/jira/browse/SPARK-42771 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42769) Add SPARK_DRIVER_POD_IP env variable to executor pods
[ https://issues.apache.org/jira/browse/SPARK-42769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42769: -- Summary: Add SPARK_DRIVER_POD_IP env variable to executor pods (was: Add ENV_DRIVER_POD_IP env variable to executor pods) > Add SPARK_DRIVER_POD_IP env variable to executor pods > - > > Key: SPARK-42769 > URL: https://issues.apache.org/jira/browse/SPARK-42769 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42771) Refactor HiveGenericUDF
BingKun Pan created SPARK-42771: --- Summary: Refactor HiveGenericUDF Key: SPARK-42771 URL: https://issues.apache.org/jira/browse/SPARK-42771 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40082) DAGScheduler may not schedule new stages when push-based shuffle is enabled
[ https://issues.apache.org/jira/browse/SPARK-40082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699516#comment-17699516 ] Apache Spark commented on SPARK-40082: -- User 'Stove-hust' has created a pull request for this issue: https://github.com/apache/spark/pull/40393 > DAGScheduler may not schedule new stages when push-based shuffle is enabled > -- > > Key: SPARK-40082 > URL: https://issues.apache.org/jira/browse/SPARK-40082 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 3.1.1 >Reporter: Penglei Shi >Priority: Major > Attachments: missParentStages.png, shuffleMergeFinalized.png, > submitMissingTasks.png > > > When push-based shuffle is enabled and speculative tasks exist, a > shuffleMapStage is resubmitted once a fetchFailed occurs; its parent stages > are then resubmitted first, which takes some time to compute. Before the > shuffleMapStage is resubmitted, all of its speculative tasks succeed and > register their map output, but the speculative-task success events cannot > trigger shuffleMergeFinalized because this stage has been removed from > runningStages. > When this stage is resubmitted, the speculative tasks have already registered > their map output and there are no missing tasks to compute, so resubmitting > the stage will also not trigger shuffleMergeFinalized. Eventually this > stage's _shuffleMergedFinalized stays false. > AQE will then submit the next stages, which depend on the shuffleMapStage > that hit the fetchFailed. In getMissingParentStages, this stage will be > marked as missing and resubmitted, but the next stages are only added to > waitingStages after this stage finishes, so the next stages will not be > submitted even though this stage's resubmission has finished. > I have only hit this a few times in my production environment and it is > difficult to reproduce. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40082) DAGScheduler may not schedule new stages when push-based shuffle is enabled
[ https://issues.apache.org/jira/browse/SPARK-40082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40082: Assignee: (was: Apache Spark) > DAGScheduler may not schedule new stages when push-based shuffle is enabled > -- > > Key: SPARK-40082 > URL: https://issues.apache.org/jira/browse/SPARK-40082 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 3.1.1 >Reporter: Penglei Shi >Priority: Major > Attachments: missParentStages.png, shuffleMergeFinalized.png, > submitMissingTasks.png > > > When push-based shuffle is enabled and speculative tasks exist, a > shuffleMapStage is resubmitted once a fetchFailed occurs; its parent stages > are then resubmitted first, which takes some time to compute. Before the > shuffleMapStage is resubmitted, all of its speculative tasks succeed and > register their map output, but the speculative-task success events cannot > trigger shuffleMergeFinalized because this stage has been removed from > runningStages. > When this stage is resubmitted, the speculative tasks have already registered > their map output and there are no missing tasks to compute, so resubmitting > the stage will also not trigger shuffleMergeFinalized. Eventually this > stage's _shuffleMergedFinalized stays false. > AQE will then submit the next stages, which depend on the shuffleMapStage > that hit the fetchFailed. In getMissingParentStages, this stage will be > marked as missing and resubmitted, but the next stages are only added to > waitingStages after this stage finishes, so the next stages will not be > submitted even though this stage's resubmission has finished. > I have only hit this a few times in my production environment and it is > difficult to reproduce. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40082) DAGScheduler may not schedule new stages when push-based shuffle is enabled
[ https://issues.apache.org/jira/browse/SPARK-40082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40082: Assignee: Apache Spark > DAGScheduler may not schedule new stages when push-based shuffle is enabled > -- > > Key: SPARK-40082 > URL: https://issues.apache.org/jira/browse/SPARK-40082 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 3.1.1 >Reporter: Penglei Shi >Assignee: Apache Spark >Priority: Major > Attachments: missParentStages.png, shuffleMergeFinalized.png, > submitMissingTasks.png > > > When push-based shuffle is enabled and speculative tasks exist, a > shuffleMapStage is resubmitted once a fetchFailed occurs; its parent stages > are then resubmitted first, which takes some time to compute. Before the > shuffleMapStage is resubmitted, all of its speculative tasks succeed and > register their map output, but the speculative-task success events cannot > trigger shuffleMergeFinalized because this stage has been removed from > runningStages. > When this stage is resubmitted, the speculative tasks have already registered > their map output and there are no missing tasks to compute, so resubmitting > the stage will also not trigger shuffleMergeFinalized. Eventually this > stage's _shuffleMergedFinalized stays false. > AQE will then submit the next stages, which depend on the shuffleMapStage > that hit the fetchFailed. In getMissingParentStages, this stage will be > marked as missing and resubmitted, but the next stages are only added to > waitingStages after this stage finishes, so the next stages will not be > submitted even though this stage's resubmission has finished. > I have only hit this a few times in my production environment and it is > difficult to reproduce. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40082) DAGScheduler may not schedule new stages when push-based shuffle is enabled
[ https://issues.apache.org/jira/browse/SPARK-40082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699515#comment-17699515 ] Apache Spark commented on SPARK-40082: -- User 'Stove-hust' has created a pull request for this issue: https://github.com/apache/spark/pull/40393 > DAGScheduler may not schedule new stages when push-based shuffle is enabled > -- > > Key: SPARK-40082 > URL: https://issues.apache.org/jira/browse/SPARK-40082 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 3.1.1 >Reporter: Penglei Shi >Priority: Major > Attachments: missParentStages.png, shuffleMergeFinalized.png, > submitMissingTasks.png > > > When push-based shuffle is enabled and speculative tasks exist, a > shuffleMapStage is resubmitted once a fetchFailed occurs; its parent stages > are then resubmitted first, which takes some time to compute. Before the > shuffleMapStage is resubmitted, all of its speculative tasks succeed and > register their map output, but the speculative-task success events cannot > trigger shuffleMergeFinalized because this stage has been removed from > runningStages. > When this stage is resubmitted, the speculative tasks have already registered > their map output and there are no missing tasks to compute, so resubmitting > the stage will also not trigger shuffleMergeFinalized. Eventually this > stage's _shuffleMergedFinalized stays false. > AQE will then submit the next stages, which depend on the shuffleMapStage > that hit the fetchFailed. In getMissingParentStages, this stage will be > marked as missing and resubmitted, but the next stages are only added to > waitingStages after this stage finishes, so the next stages will not be > submitted even though this stage's resubmission has finished. > I have only hit this a few times in my production environment and it is > difficult to reproduce. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
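The scheduling hazard described in this ticket can be modelled in a few lines; this is a toy sketch with hypothetical names, not Spark's actual DAGScheduler code. A child stage registered in the waiting set only after its parent has finished is never submitted, because submission is re-checked only on finish events:

```python
# Toy reproduction of the described race; names are hypothetical and this
# is not Spark's actual DAGScheduler logic.
finished = set()     # stages that have completed
waiting = []         # (child, parent_stages) pairs awaiting submission
submitted = []       # children actually submitted

def on_stage_finished(stage):
    finished.add(stage)
    # Submission is only re-checked here, on a finish event.
    for child, parents in list(waiting):
        if all(p in finished for p in parents):
            waiting.remove((child, parents))
            submitted.append(child)

on_stage_finished("shuffleMapStage")                   # parent finishes first...
waiting.append(("resultStage", ("shuffleMapStage",)))  # ...child registered too late
# No further finish event will arrive, so the child is stuck forever:
assert submitted == [] and waiting
```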
[jira] [Assigned] (SPARK-42508) Extract the common .ml classes to `mllib-common`
[ https://issues.apache.org/jira/browse/SPARK-42508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42508: - Assignee: Ruifeng Zheng > Extract the common .ml classes to `mllib-common` > > > Key: SPARK-42508 > URL: https://issues.apache.org/jira/browse/SPARK-42508 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42749) CAST(x as int) does not generate error with overflow
[ https://issues.apache.org/jira/browse/SPARK-42749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tjomme Vergauwen resolved SPARK-42749. -- Resolution: Fixed Additional settings are required to get the intended behaviour. Documentation is up to date. > CAST(x as int) does not generate error with overflow > > > Key: SPARK-42749 > URL: https://issues.apache.org/jira/browse/SPARK-42749 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1, 3.3.0, 3.3.1, 3.3.2 > Environment: It was tested on a Databricks environment with DBR 10.4 > and above, running Spark v3.2.1 and above. >Reporter: Tjomme Vergauwen >Priority: Major > Attachments: Spark-42749.PNG > > > Hi, > When performing the following code: > {{select cast(7.415246799222789E19 as int)}} > according to the documentation, an error is expected as > {{7.415246799222789E19}} is an overflow value for datatype INT. > However, the value 2147483647 is returned. > The behaviour of the following is correct as it returns NULL: > {{select try_cast(7.415246799222789E19 as int)}} > This results in unexpected behaviour and data corruption. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42749) CAST(x as int) does not generate error with overflow
[ https://issues.apache.org/jira/browse/SPARK-42749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699503#comment-17699503 ] Tjomme Vergauwen commented on SPARK-42749: -- Just checked the documentation again: the warning apparently was added recently. > CAST(x as int) does not generate error with overflow > > > Key: SPARK-42749 > URL: https://issues.apache.org/jira/browse/SPARK-42749 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1, 3.3.0, 3.3.1, 3.3.2 > Environment: It was tested on a Databricks environment with DBR 10.4 > and above, running Spark v3.2.1 and above. >Reporter: Tjomme Vergauwen >Priority: Major > Attachments: Spark-42749.PNG > > > Hi, > When running the following code: > {{select cast(7.415246799222789E19 as int)}} > according to the documentation, an error is expected, as > {{7.415246799222789E19}} is an overflow value for datatype INT. > However, the value 2147483647 is returned. > The behaviour of the following is correct, as it returns NULL: > {{select try_cast(7.415246799222789E19 as int)}} > This results in unexpected behaviour and data corruption. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42749) CAST(x as int) does not generate error with overflow
[ https://issues.apache.org/jira/browse/SPARK-42749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699497#comment-17699497 ] Tjomme Vergauwen commented on SPARK-42749: -- Hi, This does indeed solve the problem; setting the parameter makes it behave as intended. Could the documentation note that this is a requirement? Thanks, Tjomme > CAST(x as int) does not generate error with overflow > > > Key: SPARK-42749 > URL: https://issues.apache.org/jira/browse/SPARK-42749 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1, 3.3.0, 3.3.1, 3.3.2 > Environment: It was tested on a Databricks environment with DBR 10.4 > and above, running Spark v3.2.1 and above. >Reporter: Tjomme Vergauwen >Priority: Major > Attachments: Spark-42749.PNG > > > Hi, > When running the following code: > {{select cast(7.415246799222789E19 as int)}} > according to the documentation, an error is expected, as > {{7.415246799222789E19}} is an overflow value for datatype INT. > However, the value 2147483647 is returned. > The behaviour of the following is correct, as it returns NULL: > {{select try_cast(7.415246799222789E19 as int)}} > This results in unexpected behaviour and data corruption. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
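The overflow semantics discussed in this SPARK-42749 thread can be modelled in plain Python. This is a hedged sketch of the behaviour, not Spark code: in Spark's default (non-ANSI) mode an out-of-range CAST saturates to the INT bounds, in ANSI mode it raises, and try_cast returns NULL (None here). The function names are illustrative only.

```python
# Model of Spark's CAST(x AS INT) semantics for out-of-range values
# (illustrative only; the real behaviour is governed by Spark's ANSI mode).
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def cast_to_int(value, ansi=False):
    """Non-ANSI mode clamps to the INT range; ANSI mode raises on overflow."""
    if value > INT_MAX or value < INT_MIN:
        if ansi:
            raise ArithmeticError(f"CAST overflow: {value!r} does not fit in INT")
        return INT_MAX if value > INT_MAX else INT_MIN
    return int(value)

def try_cast_to_int(value):
    """try_cast returns NULL (None) instead of failing on overflow."""
    if value > INT_MAX or value < INT_MIN:
        return None
    return int(value)

print(cast_to_int(7.415246799222789e19))      # clamps to 2147483647
print(try_cast_to_int(7.415246799222789e19))  # None
```

This matches the ticket's observation: the reported 2147483647 is exactly the saturated INT maximum, and the setting the commenters refer to switches between the clamping and the error-raising branch.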
[jira] [Resolved] (SPARK-39235) Make Catalog API be compatible with 3-layer-namespace
[ https://issues.apache.org/jira/browse/SPARK-39235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-39235. - Fix Version/s: 3.4.0 Resolution: Fixed > Make Catalog API be compatible with 3-layer-namespace > - > > Key: SPARK-39235 > URL: https://issues.apache.org/jira/browse/SPARK-39235 > Project: Spark > Issue Type: Improvement > Components: PySpark, R, SQL >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > Fix For: 3.4.0 > > > We can make the Catalog API support a 3-layer namespace: > catalog_name.database_name.table_name -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
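The 3-layer namespace from SPARK-39235 can be illustrated with a small parser. This is only a sketch of the naming scheme, not the actual Catalog API change: the real implementation resolves identifiers through Spark's SQL parser and session catalog, and this ignores backtick quoting entirely.

```python
def parse_table_name(name):
    """Split a 1-, 2-, or 3-layer table name into (catalog, database, table).

    Illustrative model of catalog_name.database_name.table_name resolution;
    real Spark handles quoted identifiers, which this sketch does not.
    """
    parts = name.split(".")
    if len(parts) == 3:
        catalog, database, table = parts
    elif len(parts) == 2:
        catalog = None
        database, table = parts
    elif len(parts) == 1:
        catalog, database, table = None, None, parts[0]
    else:
        raise ValueError(f"unsupported identifier: {name!r}")
    return catalog, database, table
```

For example, `parse_table_name("my_catalog.my_db.my_table")` yields all three layers, while a bare table name leaves catalog and database unset for the session defaults to fill in.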
[jira] [Resolved] (SPARK-42577) A large stage could run indefinitely due to executor lost
[ https://issues.apache.org/jira/browse/SPARK-42577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-42577. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40286 [https://github.com/apache/spark/pull/40286] > A large stage could run indefinitely due to executor lost > - > > Key: SPARK-42577 > URL: https://issues.apache.org/jira/browse/SPARK-42577 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2 >Reporter: wuyi >Assignee: Tengfei Huang >Priority: Major > Fix For: 3.5.0 > > > When a stage is extremely large and Spark runs on spot instances or > problematic clusters with frequent worker/executor loss, the stage could run > indefinitely due to task reruns caused by the executor loss. This happens when > the external shuffle service is on and the large stage takes hours to > complete: when Spark tries to submit a child stage, it will find that the parent > stage - the large one - has missing partitions, so the large stage has to > rerun. When it completes again, it finds new missing partitions for the > same reason. > We should add an attempt limit for this kind of scenario. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42577) A large stage could run indefinitely due to executor lost
[ https://issues.apache.org/jira/browse/SPARK-42577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-42577: --- Assignee: Tengfei Huang > A large stage could run indefinitely due to executor lost > - > > Key: SPARK-42577 > URL: https://issues.apache.org/jira/browse/SPARK-42577 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2 >Reporter: wuyi >Assignee: Tengfei Huang >Priority: Major > > When a stage is extremely large and Spark runs on spot instances or > problematic clusters with frequent worker/executor loss, the stage could run > indefinitely due to task reruns caused by the executor loss. This happens when > the external shuffle service is on and the large stage takes hours to > complete: when Spark tries to submit a child stage, it will find that the parent > stage - the large one - has missing partitions, so the large stage has to > rerun. When it completes again, it finds new missing partitions for the > same reason. > We should add an attempt limit for this kind of scenario. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
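The attempt-limit idea from SPARK-42577 can be sketched as a simple guard around stage resubmission. This is a hypothetical model, not Spark's scheduler: the function and parameter names are illustrative, and the real fix lives in the Scala DAGScheduler, not in anything resembling this loop.

```python
def run_stage(compute_missing_partitions, max_stage_attempts=4):
    """Rerun a stage while partitions are missing, but abort after a bounded
    number of attempts instead of looping forever.

    compute_missing_partitions() models the scheduler discovering which of
    the stage's partitions were lost (e.g. to a dead executor). Names and the
    default limit are illustrative, not Spark's actual configuration.
    """
    attempts = 0
    while True:
        attempts += 1
        missing = compute_missing_partitions()
        if not missing:
            return attempts  # stage complete
        if attempts >= max_stage_attempts:
            raise RuntimeError(
                f"stage aborted after {attempts} attempts with "
                f"{len(missing)} partitions still missing")
```

Without the `max_stage_attempts` check, a cluster that keeps losing executors mid-stage keeps the loop running indefinitely, which is exactly the scenario the ticket describes.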
[jira] [Commented] (SPARK-42770) SQLImplicitsTestSuite test failed with Java 17
[ https://issues.apache.org/jira/browse/SPARK-42770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699490#comment-17699490 ] Yang Jie commented on SPARK-42770: -- Maybe it can only be reproduced on Linux > SQLImplicitsTestSuite test failed with Java 17 > -- > > Key: SPARK-42770 > URL: https://issues.apache.org/jira/browse/SPARK-42770 > Project: Spark > Issue Type: Bug > Components: Connect, Tests >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Major > > [https://github.com/apache/spark/actions/runs/4318647315/jobs/7537203682] > {code:java} > [info] - test implicit encoder resolution *** FAILED *** (1 second, 329 > milliseconds) > 4429[info] 2023-03-02T23:00:20.404434 did not equal > 2023-03-02T23:00:20.404434875 (SQLImplicitsTestSuite.scala:63) > 4430[info] org.scalatest.exceptions.TestFailedException: > 4431[info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > 4432[info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > 4433[info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > 4434[info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > 4435[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.testImplicit$1(SQLImplicitsTestSuite.scala:63) > 4436[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.$anonfun$new$2(SQLImplicitsTestSuite.scala:133) > 4437[info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > 4438[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > 4439[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > 4440[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > 4441[info] at org.scalatest.Transformer.apply(Transformer.scala:22) > 4442[info] at org.scalatest.Transformer.apply(Transformer.scala:20) > 4443[info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at 
org.scalatest.TestSuite.withFixture(TestSuite.scala:196) > 4445[info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) > 4446[info] at > org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) > 4447[info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > 4448[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > 4449[info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > 4450[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > 4451[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > 4452[info] at > org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564) > 4453[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > 4454[info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > 4455[info] at scala.collection.immutable.List.foreach(List.scala:431) > 4456[info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > 4457[info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > 4458[info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > 4459[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > 4460[info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > 4461[info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > 4462[info] at org.scalatest.Suite.run(Suite.scala:1114) > 4463[info] at org.scalatest.Suite.run$(Suite.scala:1096) > 4464[info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > 4465[info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > 4466[info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > 4467[info] at > 
org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > 4468[info] at > org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > 4469[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.org$scalatest$BeforeAndAfterAll$$super$run(SQLImplicitsTestSuite.scala:34) > 4470[info] at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > 4471[info] at > org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > 4472[info] at > org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > 4473[info] at > org.apache.spark.sql.SQLImplicitsTestSuite.run(SQLImplicitsTestSuite.scala:34) > 4474[info] at >
[jira] [Created] (SPARK-42770) SQLImplicitsTestSuite test failed with Java 17
Yang Jie created SPARK-42770: Summary: SQLImplicitsTestSuite test failed with Java 17 Key: SPARK-42770 URL: https://issues.apache.org/jira/browse/SPARK-42770 Project: Spark Issue Type: Bug Components: Connect, Tests Affects Versions: 3.4.0, 3.5.0 Reporter: Yang Jie [https://github.com/apache/spark/actions/runs/4318647315/jobs/7537203682] {code:java} [info] - test implicit encoder resolution *** FAILED *** (1 second, 329 milliseconds) 4429[info] 2023-03-02T23:00:20.404434 did not equal 2023-03-02T23:00:20.404434875 (SQLImplicitsTestSuite.scala:63) 4430[info] org.scalatest.exceptions.TestFailedException: 4431[info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) 4432[info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) 4433[info] at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) 4434[info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) 4435[info] at org.apache.spark.sql.SQLImplicitsTestSuite.testImplicit$1(SQLImplicitsTestSuite.scala:63) 4436[info] at org.apache.spark.sql.SQLImplicitsTestSuite.$anonfun$new$2(SQLImplicitsTestSuite.scala:133) 4437[info] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) 4438[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) 4439[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) 4440[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) 4441[info] at org.scalatest.Transformer.apply(Transformer.scala:22) 4442[info] at org.scalatest.Transformer.apply(Transformer.scala:20) 4443[info] at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) [info] at org.scalatest.TestSuite.withFixture(TestSuite.scala:196) 4445[info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) 4446[info] at org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) 4447[info] at 
org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) 4448[info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) 4449[info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) 4450[info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) 4451[info] at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) 4452[info] at org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564) 4453[info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) 4454[info] at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) 4455[info] at scala.collection.immutable.List.foreach(List.scala:431) 4456[info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) 4457[info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) 4458[info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) 4459[info] at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) 4460[info] at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) 4461[info] at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) 4462[info] at org.scalatest.Suite.run(Suite.scala:1114) 4463[info] at org.scalatest.Suite.run$(Suite.scala:1096) 4464[info] at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) 4465[info] at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) 4466[info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) 4467[info] at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) 4468[info] at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) 4469[info] at org.apache.spark.sql.SQLImplicitsTestSuite.org$scalatest$BeforeAndAfterAll$$super$run(SQLImplicitsTestSuite.scala:34) 4470[info] at 
org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) 4471[info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) 4472[info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) 4473[info] at org.apache.spark.sql.SQLImplicitsTestSuite.run(SQLImplicitsTestSuite.scala:34) 4474[info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321) 4475[info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517) 4476[info] at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413) 4477[info] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) 4478[info] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) 4479[info] at
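The failing assertion in SPARK-42770 compares 2023-03-02T23:00:20.404434 with 2023-03-02T23:00:20.404434875: one value carries nanosecond precision while the other was truncated to microseconds. The snippet below illustrates that truncation in Python, whose datetime (like Spark's microsecond-based timestamps) cannot represent the trailing nanosecond digits; it demonstrates the precision mismatch, not the suite's actual Scala code.

```python
from datetime import datetime

# The nanosecond-precision value from the failing assertion.
nanos_str = "2023-03-02T23:00:20.404434875"

# Microsecond-based timestamp types drop the sub-microsecond digits,
# so 404434875 ns becomes 404434 us and the two values no longer compare equal.
seconds_part, frac = nanos_str.split(".")
micros = int(frac) // 1000  # 404434875 ns -> 404434 us
truncated = datetime.fromisoformat(f"{seconds_part}.{micros:06d}")

print(truncated.isoformat())  # 2023-03-02T23:00:20.404434
```

This also suggests why the failure may be platform-dependent, as the comment above speculates: the test only trips on systems whose clock actually supplies sub-microsecond digits to be lost.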
[jira] [Updated] (SPARK-42711) build/sbt usage error messages and shellcheck warn/error
[ https://issues.apache.org/jira/browse/SPARK-42711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Yan updated SPARK-42711: -- Description: The build/sbt tool's usage information has some missing content: {code:java} (base) spark% ./build/sbt -help Usage: [options] -h | -help print this message -v | -verbose this runner is chattier {code} There are also some shellcheck warnings and errors. was: The build/sbt tool's usage information about java-home is wrong: # java version (default: java from PATH, currently $(java -version 2>&1 | grep version)) -java-home alternate JAVA_HOME > build/sbt usage error messages and shellcheck warn/error > > > Key: SPARK-42711 > URL: https://issues.apache.org/jira/browse/SPARK-42711 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.2 >Reporter: Liang Yan >Priority: Minor > > The build/sbt tool's usage information has some missing content: > > {code:java} > (base) spark% ./build/sbt -help > Usage: [options] > -h | -help print this message > -v | -verbose this runner is chattier > {code} > There are also some shellcheck warnings and errors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42711) build/sbt usage error messages and shellcheck warn/error
[ https://issues.apache.org/jira/browse/SPARK-42711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Yan updated SPARK-42711: -- Summary: build/sbt usage error messages and shellcheck warn/error (was: build/sbt usage error messages about java-home) > build/sbt usage error messages and shellcheck warn/error > > > Key: SPARK-42711 > URL: https://issues.apache.org/jira/browse/SPARK-42711 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.2 >Reporter: Liang Yan >Priority: Minor > > The build/sbt tool's usage information about java-home is wrong: > # java version (default: java from PATH, currently $(java -version 2>&1 | > grep version)) > -java-home alternate JAVA_HOME -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-42766) YarnAllocator should filter excluded nodes when launching allocated containers
[ https://issues.apache.org/jira/browse/SPARK-42766 ] wangshengjie deleted comment on SPARK-42766: -- was (Author: wangshengjie): Working on this > YarnAllocator should filter excluded nodes when launching allocated containers > -- > > Key: SPARK-42766 > URL: https://issues.apache.org/jira/browse/SPARK-42766 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.3.2 >Reporter: wangshengjie >Priority: Major > > In a production environment, we hit an issue like this: > If we request 10 containers from nodeA and nodeB, the first response from Yarn > returns 5 containers from nodeA and nodeB, then nodeA is blacklisted, and a second > response from Yarn may still return some containers from nodeA, which get > launched. But when those containers (executors) start up and send register requests to the > Driver, they will be rejected, and each failure will be counted toward > {code:java} > spark.yarn.max.executor.failures {code} > , which will cause the app to fail: > {code:java} > Max number of executor failures ($maxNumExecutorFailures) reached{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42756) Helper function to convert proto literal to value in Python Client
[ https://issues.apache.org/jira/browse/SPARK-42756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42756. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40376 [https://github.com/apache/spark/pull/40376] > Helper function to convert proto literal to value in Python Client > -- > > Key: SPARK-42756 > URL: https://issues.apache.org/jira/browse/SPARK-42756 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42756) Helper function to convert proto literal to value in Python Client
[ https://issues.apache.org/jira/browse/SPARK-42756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42756: - Assignee: Ruifeng Zheng > Helper function to convert proto literal to value in Python Client > -- > > Key: SPARK-42756 > URL: https://issues.apache.org/jira/browse/SPARK-42756 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42755) Factor literal value conversion out to connect-common
[ https://issues.apache.org/jira/browse/SPARK-42755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42755. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40375 [https://github.com/apache/spark/pull/40375] > Factor literal value conversion out to connect-common > - > > Key: SPARK-42755 > URL: https://issues.apache.org/jira/browse/SPARK-42755 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42755) Factor literal value conversion out to connect-common
[ https://issues.apache.org/jira/browse/SPARK-42755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42755: - Assignee: Ruifeng Zheng > Factor literal value conversion out to connect-common > - > > Key: SPARK-42755 > URL: https://issues.apache.org/jira/browse/SPARK-42755 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
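The literal-conversion helpers in SPARK-42755/SPARK-42756 map a Connect proto `Literal` onto native values. The sketch below uses a plain dict as a stand-in for the proto message (its single key naming which oneof field is set); the field names are illustrative assumptions, and the real helpers operate on generated protobuf classes.

```python
def literal_to_value(lit):
    """Convert a dict stand-in for a Connect proto Literal to a Python value.

    Hypothetical model only: real proto Literals are protobuf messages with a
    oneof, and the field names here are assumptions for illustration.
    """
    kind, value = next(iter(lit.items()))
    if kind == "null":
        return None
    if kind in ("boolean", "integer", "long", "double", "string"):
        return value
    if kind == "array":
        return [literal_to_value(element) for element in value]
    raise ValueError(f"unsupported literal kind: {kind}")
```

Factoring this kind of mapping out (as the `connect-common` ticket does on the Scala side) lets both directions of the client/server boundary share one definition of how literals round-trip.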
[jira] [Commented] (SPARK-42769) Add ENV_DRIVER_POD_IP env variable to executor pods
[ https://issues.apache.org/jira/browse/SPARK-42769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699470#comment-17699470 ] Apache Spark commented on SPARK-42769: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40392 > Add ENV_DRIVER_POD_IP env variable to executor pods > --- > > Key: SPARK-42769 > URL: https://issues.apache.org/jira/browse/SPARK-42769 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42769) Add ENV_DRIVER_POD_IP env variable to executor pods
[ https://issues.apache.org/jira/browse/SPARK-42769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42769: Assignee: Apache Spark > Add ENV_DRIVER_POD_IP env variable to executor pods > --- > > Key: SPARK-42769 > URL: https://issues.apache.org/jira/browse/SPARK-42769 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42769) Add ENV_DRIVER_POD_IP env variable to executor pods
[ https://issues.apache.org/jira/browse/SPARK-42769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42769: Assignee: (was: Apache Spark) > Add ENV_DRIVER_POD_IP env variable to executor pods > --- > > Key: SPARK-42769 > URL: https://issues.apache.org/jira/browse/SPARK-42769 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42769) Add ENV_DRIVER_POD_IP env variable to executor pods
Dongjoon Hyun created SPARK-42769: - Summary: Add ENV_DRIVER_POD_IP env variable to executor pods Key: SPARK-42769 URL: https://issues.apache.org/jira/browse/SPARK-42769 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.5.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
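The SPARK-42769 change injects the driver pod's IP into executor pods as an environment variable. The sketch below builds the env entry a pod spec might carry; the variable name is an assumption based on the ticket's `ENV_DRIVER_POD_IP` constant, and the builder is illustrative rather than Spark's actual Kubernetes feature-step code.

```python
def executor_env(driver_pod_ip, extra_env=None):
    """Build the env var list for an executor pod spec, injecting the
    driver pod IP first. The SPARK_DRIVER_POD_IP name is an assumption
    derived from the ticket's constant, not a confirmed Spark variable.
    """
    env = [{"name": "SPARK_DRIVER_POD_IP", "value": driver_pod_ip}]
    for key, value in (extra_env or {}).items():
        env.append({"name": key, "value": value})
    return env
```

An executor can then read the driver's address from its environment instead of relying solely on configuration passed on the command line.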
[jira] [Commented] (SPARK-42766) YarnAllocator should filter excluded nodes when launching allocated containers
[ https://issues.apache.org/jira/browse/SPARK-42766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699467#comment-17699467 ] Apache Spark commented on SPARK-42766: -- User 'wangshengjie123' has created a pull request for this issue: https://github.com/apache/spark/pull/40391 > YarnAllocator should filter excluded nodes when launching allocated containers > -- > > Key: SPARK-42766 > URL: https://issues.apache.org/jira/browse/SPARK-42766 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.3.2 >Reporter: wangshengjie >Priority: Major > > In a production environment, we hit an issue like this: > If we request 10 containers from nodeA and nodeB, the first response from Yarn > returns 5 containers from nodeA and nodeB, then nodeA is blacklisted, and a second > response from Yarn may still return some containers from nodeA, which get > launched. But when those containers (executors) start up and send register requests to the > Driver, they will be rejected, and each failure will be counted toward > {code:java} > spark.yarn.max.executor.failures {code} > , which will cause the app to fail: > {code:java} > Max number of executor failures ($maxNumExecutorFailures) reached{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42766) YarnAllocator should filter excluded nodes when launching allocated containers
[ https://issues.apache.org/jira/browse/SPARK-42766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42766: Assignee: (was: Apache Spark) > YarnAllocator should filter excluded nodes when launching allocated containers > -- > > Key: SPARK-42766 > URL: https://issues.apache.org/jira/browse/SPARK-42766 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.3.2 >Reporter: wangshengjie >Priority: Major > > In a production environment, we hit an issue like this: > If we request 10 containers from nodeA and nodeB, the first response from Yarn > returns 5 containers from nodeA and nodeB, then nodeA is blacklisted, and a second > response from Yarn may still return some containers from nodeA, which get > launched. But when those containers (executors) start up and send register requests to the > Driver, they will be rejected, and each failure will be counted toward > {code:java} > spark.yarn.max.executor.failures {code} > , which will cause the app to fail: > {code:java} > Max number of executor failures ($maxNumExecutorFailures) reached{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42766) YarnAllocator should filter excluded nodes when launching allocated containers
[ https://issues.apache.org/jira/browse/SPARK-42766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42766: Assignee: Apache Spark > YarnAllocator should filter excluded nodes when launching allocated containers > -- > > Key: SPARK-42766 > URL: https://issues.apache.org/jira/browse/SPARK-42766 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.3.2 >Reporter: wangshengjie >Assignee: Apache Spark >Priority: Major > > In a production environment, we hit an issue like this: > If we request 10 containers from nodeA and nodeB, the first response from Yarn > returns 5 containers from nodeA and nodeB, then nodeA is blacklisted, and a second > response from Yarn may still return some containers from nodeA, which get > launched. But when those containers (executors) start up and send register requests to the > Driver, they will be rejected, and each failure will be counted toward > {code:java} > spark.yarn.max.executor.failures {code} > , which will cause the app to fail: > {code:java} > Max number of executor failures ($maxNumExecutorFailures) reached{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
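The filtering SPARK-42766 proposes for YarnAllocator can be modelled as partitioning newly allocated containers by node. This is an illustrative Python model only; the actual change is in Spark's Scala YarnAllocator, and the container/node shapes here are simplified stand-ins.

```python
def split_by_excluded_nodes(allocated, excluded_nodes):
    """Partition allocated containers into ones safe to launch and ones to
    release because they landed on an excluded (blacklisted) node.

    Containers that would be released here are the ones whose executors,
    per the ticket, would otherwise register, be rejected by the driver,
    and count against spark.yarn.max.executor.failures.
    """
    to_launch, to_release = [], []
    for container in allocated:
        if container["node"] in excluded_nodes:
            to_release.append(container)
        else:
            to_launch.append(container)
    return to_launch, to_release
```

Releasing the excluded-node containers before launch keeps doomed executors from inflating the failure count and killing the application.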
[jira] [Assigned] (SPARK-42764) Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend
[ https://issues.apache.org/jira/browse/SPARK-42764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42764: - Assignee: Dongjoon Hyun > Parameterize the max number of attempts for driver props fetcher in > KubernetesExecutorBackend > - > > Key: SPARK-42764 > URL: https://issues.apache.org/jira/browse/SPARK-42764 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42764) Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend
[ https://issues.apache.org/jira/browse/SPARK-42764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42764. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40387 [https://github.com/apache/spark/pull/40387] > Parameterize the max number of attempts for driver props fetcher in > KubernetesExecutorBackend > - > > Key: SPARK-42764 > URL: https://issues.apache.org/jira/browse/SPARK-42764 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
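Parameterizing the driver-props fetcher's retry count (SPARK-42764) amounts to a bounded retry loop whose limit comes from configuration instead of a hard-coded constant. The sketch below is generic and hedged: the function name, parameter names, and default are illustrative, not Spark's actual configuration keys or KubernetesExecutorBackend code.

```python
import time

def fetch_driver_props(fetch, max_attempts=3, backoff_seconds=0.0):
    """Call fetch() up to max_attempts times, sleeping between failures,
    and re-raise the last error if every attempt fails.

    Illustrative sketch of the parameterized behaviour; names and the
    default limit are assumptions, not Spark's actual settings.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                time.sleep(backoff_seconds)
    raise last_error
```

Exposing `max_attempts` as a setting lets slow-to-start drivers (common on busy clusters) be tolerated without patching the executor backend.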