[jira] [Commented] (SPARK-42101) Wrap InMemoryTableScanExec with QueryStage

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699937#comment-17699937
 ] 

Apache Spark commented on SPARK-42101:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/40406

> Wrap InMemoryTableScanExec with QueryStage
> --
>
> Key: SPARK-42101
> URL: https://issues.apache.org/jira/browse/SPARK-42101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.5.0
>
>
> The first access to a cached plan with AQE enabled is tricky: currently we 
> cannot preserve its output partitioning and ordering.
> The whole query plan also misses many optimizations in the AQE framework. 
> Wrapping InMemoryTableScanExec in a query stage resolves all of these issues.
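A minimal repro sketch (not part of the original report; the table, values, and join are illustrative) of the situation described above, assuming AQE is enabled:

{code:scala}
// Cache a DataFrame that was repartitioned by the later join key.
spark.conf.set("spark.sql.adaptive.enabled", "true")

import org.apache.spark.sql.functions.col
val cached = spark.range(0, 1000).repartition(col("id")).cache()
cached.count()  // materialize the cache

// On the first access under AQE the InMemoryTableScan's output partitioning is not
// preserved, so the plan may add an extra Exchange instead of reusing the cached
// hash partitioning on "id".
cached.join(spark.range(0, 100), "id").explain()
{code}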






[jira] [Resolved] (SPARK-42777) Support converting TimestampNTZ catalog stats to plan stats

2023-03-13 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-42777.

Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40404
[https://github.com/apache/spark/pull/40404]

> Support converting TimestampNTZ catalog stats to plan stats
> ---
>
> Key: SPARK-42777
> URL: https://issues.apache.org/jira/browse/SPARK-42777
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-42340) Implement GroupedData.applyInPandas

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699924#comment-17699924
 ] 

Apache Spark commented on SPARK-42340:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40405

> Implement GroupedData.applyInPandas
> ---
>
> Key: SPARK-42340
> URL: https://issues.apache.org/jira/browse/SPARK-42340
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>







[jira] [Commented] (SPARK-42340) Implement GroupedData.applyInPandas

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699922#comment-17699922
 ] 

Apache Spark commented on SPARK-42340:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40405

> Implement GroupedData.applyInPandas
> ---
>
> Key: SPARK-42340
> URL: https://issues.apache.org/jira/browse/SPARK-42340
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>







[jira] [Assigned] (SPARK-42340) Implement GroupedData.applyInPandas

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42340:


Assignee: Apache Spark

> Implement GroupedData.applyInPandas
> ---
>
> Key: SPARK-42340
> URL: https://issues.apache.org/jira/browse/SPARK-42340
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42340) Implement GroupedData.applyInPandas

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42340:


Assignee: (was: Apache Spark)

> Implement GroupedData.applyInPandas
> ---
>
> Key: SPARK-42340
> URL: https://issues.apache.org/jira/browse/SPARK-42340
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>







[jira] [Resolved] (SPARK-42773) Minor grammatical change to "Supports Spark Connect" message

2023-03-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-42773.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40401
[https://github.com/apache/spark/pull/40401]

> Minor grammatical change to "Supports Spark Connect" message
> 
>
> Key: SPARK-42773
> URL: https://issues.apache.org/jira/browse/SPARK-42773
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Allan Folting
>Assignee: Allan Folting
>Priority: Major
> Fix For: 3.4.0
>
>
> Changing "Support Spark Connect" to "Supports Spark Connect" in the 3.4.0 
> version change message which is also used in the documentation:
>  
> .. versionchanged:: 3.4.0
>      Supports Spark Connect.






[jira] [Assigned] (SPARK-42773) Minor grammatical change to "Supports Spark Connect" message

2023-03-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-42773:
-

Assignee: Allan Folting

> Minor grammatical change to "Supports Spark Connect" message
> 
>
> Key: SPARK-42773
> URL: https://issues.apache.org/jira/browse/SPARK-42773
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Allan Folting
>Assignee: Allan Folting
>Priority: Major
>
> Changing "Support Spark Connect" to "Supports Spark Connect" in the 3.4.0 
> version change message which is also used in the documentation:
>  
> .. versionchanged:: 3.4.0
>      Supports Spark Connect.






[jira] [Resolved] (SPARK-42702) Support parameterized CTE

2023-03-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42702.
-
  Assignee: Wenchen Fan  (was: Max Gekk)
Resolution: Fixed

> Support parameterized CTE
> -
>
> Key: SPARK-42702
> URL: https://issues.apache.org/jira/browse/SPARK-42702
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Wenchen Fan
>Priority: Major
>
> Support named parameters in named common table expressions (CTE). At the 
> moment, such queries fail:
> {code:java}
> CREATE TABLE tbl(namespace STRING) USING parquet
> INSERT INTO tbl SELECT 'abc'
> WITH transitions AS (
>   SELECT * FROM tbl WHERE namespace = :namespace
> ) SELECT * FROM transitions {code}
> w/ the following error:
> {code:java}
> [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix 
> `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos 
> 38;
> 'WithCTE
> :- 'CTERelationDef 0, false
> :  +- 'SubqueryAlias transitions
> :     +- 'Project [*]
> :        +- 'Filter (namespace#3 = parameter(namespace))
> :           +- SubqueryAlias spark_catalog.default.tbl
> :              +- Relation spark_catalog.default.tbl[namespace#3] parquet
> +- 'Project [*]
>    +- 'SubqueryAlias transitions
>       +- 'CTERelationRef 0, falseorg.apache.spark.sql.AnalysisException: 
> [UNBOUND_SQL_PARAMETER] Found the unbound parameter: `namespace`. Please, fix 
> `args` and provide a mapping of the parameter to a SQL literal.; line 3 pos 
> 38;
> 'WithCTE
> :- 'CTERelationDef 0, false
> :  +- 'SubqueryAlias transitions
> :     +- 'Project [*]
> :        +- 'Filter (namespace#3 = parameter(namespace))
> :           +- SubqueryAlias spark_catalog.default.tbl
> :              +- Relation spark_catalog.default.tbl[namespace#3] parquet
> +- 'Project [*]
>    +- 'SubqueryAlias transitions
>       +- 'CTERelationRef 0, false    at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:52)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5(CheckAnalysis.scala:339)
>     at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5$adapted(CheckAnalysis.scala:244)
>  {code}
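For context, a hedged sketch (not from the ticket) of how such a query would be issued, assuming the Spark 3.4 parameterized SQL API, i.e. {{spark.sql(sqlText, args)}} where the map values are SQL literal text:

{code:scala}
// Set up the table from the ticket, then run the named-parameter CTE query.
spark.sql("CREATE TABLE tbl(namespace STRING) USING parquet")
spark.sql("INSERT INTO tbl SELECT 'abc'")

spark.sql(
  """WITH transitions AS (
    |  SELECT * FROM tbl WHERE namespace = :namespace
    |) SELECT * FROM transitions""".stripMargin,
  args = Map("namespace" -> "'abc'")  // parsed as a SQL literal
).show()
// Before the fix this fails with UNBOUND_SQL_PARAMETER; with it, :namespace should
// bind inside the CTE body as well.
{code}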






[jira] [Updated] (SPARK-42597) Support unwrap date type to timestamp type

2023-03-13 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-42597:

Summary: Support unwrap date type to timestamp type  (was: 
UnwrapCastInBinaryComparison support unwrap timestamp type)

> Support unwrap date type to timestamp type
> --
>
> Key: SPARK-42597
> URL: https://issues.apache.org/jira/browse/SPARK-42597
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Resolved] (SPARK-42597) UnwrapCastInBinaryComparison support unwrap timestamp type

2023-03-13 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-42597.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40190
[https://github.com/apache/spark/pull/40190]

> UnwrapCastInBinaryComparison support unwrap timestamp type
> --
>
> Key: SPARK-42597
> URL: https://issues.apache.org/jira/browse/SPARK-42597
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-42597) UnwrapCastInBinaryComparison support unwrap timestamp type

2023-03-13 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-42597:
---

Assignee: Yuming Wang

> UnwrapCastInBinaryComparison support unwrap timestamp type
> --
>
> Key: SPARK-42597
> URL: https://issues.apache.org/jira/browse/SPARK-42597
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>







[jira] [Resolved] (SPARK-42711) build/sbt usage error messages and shellcheck warn/error

2023-03-13 Thread Liang Yan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Yan resolved SPARK-42711.
---
Resolution: Not A Problem

The original code is just a copy of the upstream script, and the proposed changes 
do not fix an actual problem.

> build/sbt usage error messages and shellcheck warn/error
> 
>
> Key: SPARK-42711
> URL: https://issues.apache.org/jira/browse/SPARK-42711
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.2
>Reporter: Liang Yan
>Priority: Minor
>
> The build/sbt tool's usage information has some missing content:
>  
> {code:java}
> (base) spark% ./build/sbt -help
> Usage:  [options]
>   -h | -help print this message
>   -v | -verbose  this runner is chattier
> {code}
> There are also some shellcheck warnings and errors.






[jira] [Closed] (SPARK-42711) build/sbt usage error messages and shellcheck warn/error

2023-03-13 Thread Liang Yan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Yan closed SPARK-42711.
-

> build/sbt usage error messages and shellcheck warn/error
> 
>
> Key: SPARK-42711
> URL: https://issues.apache.org/jira/browse/SPARK-42711
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.2
>Reporter: Liang Yan
>Priority: Minor
>
> The build/sbt tool's usage information has some missing content:
>  
> {code:java}
> (base) spark% ./build/sbt -help
> Usage:  [options]
>   -h | -help print this message
>   -v | -verbose  this runner is chattier
> {code}
> There are also some shellcheck warnings and errors.






[jira] [Commented] (SPARK-21782) Repartition creates skews when numPartitions is a power of 2

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699874#comment-17699874
 ] 

Apache Spark commented on SPARK-21782:
--

User 'megaserg' has created a pull request for this issue:
https://github.com/apache/spark/pull/18990

> Repartition creates skews when numPartitions is a power of 2
> 
>
> Key: SPARK-21782
> URL: https://issues.apache.org/jira/browse/SPARK-21782
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Sergey Serebryakov
>Assignee: Sergey Serebryakov
>Priority: Major
>  Labels: repartition
> Fix For: 2.3.0
>
> Attachments: Screen Shot 2017-08-16 at 3.40.01 PM.png
>
>
> *Problem:*
> When an RDD (particularly with a low item-per-partition ratio) is 
> repartitioned to {{numPartitions}} = power of 2, the resulting partitions are 
> very uneven-sized. This affects both {{repartition()}} and 
> {{coalesce(shuffle=true)}}.
> *Steps to reproduce:*
> {code}
> $ spark-shell
> scala> sc.parallelize(0 until 1000, 
> 250).repartition(64).glom().map(_.length).collect()
> res0: Array[Int] = Array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 144, 250, 250, 250, 106, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
> {code}
> *Explanation:*
> Currently, the [algorithm for 
> repartition|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L450]
>  (shuffle-enabled coalesce) is as follows:
> - for each initial partition {{index}}, generate {{position}} as {{(new 
> Random(index)).nextInt(numPartitions)}}
> - then, for element number {{k}} in initial partition {{index}}, put it in 
> the new partition {{position + k}} (modulo {{numPartitions}}).
> So, essentially elements are smeared roughly equally over {{numPartitions}} 
> buckets - starting from the one with number {{position+1}}.
> Note that a new instance of {{Random}} is created for every initial partition 
> {{index}}, with a fixed seed {{index}}, and then discarded. So the 
> {{position}} is deterministic for every {{index}} for any RDD in the world. 
> Also, [{{nextInt(bound)}} 
> implementation|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/Random.java/#393]
>  has a special case when {{bound}} is a power of 2, which is basically taking 
> several highest bits from the initial seed, with only a minimal scrambling.
> Due to deterministic seed, using the generator only once, and lack of 
> scrambling, the {{position}} values for power-of-two {{numPartitions}} always 
> end up being almost the same regardless of the {{index}}, causing some 
> buckets to be much more popular than others. So, {{repartition}} will in fact 
> deterministically produce skewed partitions even when the partitions were 
> roughly equal in size before.
> The behavior seems to have been introduced in SPARK-1770 by 
> https://github.com/apache/spark/pull/727/
> {quote}
> The load balancing is not perfect: a given output partition
> can have up to N more elements than the average if there are N input
> partitions. However, some randomization is used to minimize the
> probability that this happens.
> {quote}
> Another related ticket: SPARK-17817 - 
> https://github.com/apache/spark/pull/15445
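A standalone sketch (not part of the ticket) of the mechanism described above; it needs only {{java.util.Random}} and shows how the per-partition start positions collapse into a few buckets when the bound is a power of two:

{code:scala}
import java.util.Random

// One Random per input partition, seeded with the partition index and used once.
val numPartitions = 64  // power of 2
val startPositions = (0 until 250).map(index => new Random(index).nextInt(numPartitions))

// Prints only a handful of distinct start positions out of 64 possible buckets,
// which is why the glom() output above lands almost all rows in a few partitions.
println(startPositions.distinct.sorted.mkString(", "))
{code}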






[jira] [Commented] (SPARK-21782) Repartition creates skews when numPartitions is a power of 2

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699875#comment-17699875
 ] 

Apache Spark commented on SPARK-21782:
--

User 'megaserg' has created a pull request for this issue:
https://github.com/apache/spark/pull/18990

> Repartition creates skews when numPartitions is a power of 2
> 
>
> Key: SPARK-21782
> URL: https://issues.apache.org/jira/browse/SPARK-21782
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Sergey Serebryakov
>Assignee: Sergey Serebryakov
>Priority: Major
>  Labels: repartition
> Fix For: 2.3.0
>
> Attachments: Screen Shot 2017-08-16 at 3.40.01 PM.png
>
>
> *Problem:*
> When an RDD (particularly with a low item-per-partition ratio) is 
> repartitioned to {{numPartitions}} = power of 2, the resulting partitions are 
> very uneven-sized. This affects both {{repartition()}} and 
> {{coalesce(shuffle=true)}}.
> *Steps to reproduce:*
> {code}
> $ spark-shell
> scala> sc.parallelize(0 until 1000, 
> 250).repartition(64).glom().map(_.length).collect()
> res0: Array[Int] = Array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0, 144, 250, 250, 250, 106, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
> {code}
> *Explanation:*
> Currently, the [algorithm for 
> repartition|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L450]
>  (shuffle-enabled coalesce) is as follows:
> - for each initial partition {{index}}, generate {{position}} as {{(new 
> Random(index)).nextInt(numPartitions)}}
> - then, for element number {{k}} in initial partition {{index}}, put it in 
> the new partition {{position + k}} (modulo {{numPartitions}}).
> So, essentially elements are smeared roughly equally over {{numPartitions}} 
> buckets - starting from the one with number {{position+1}}.
> Note that a new instance of {{Random}} is created for every initial partition 
> {{index}}, with a fixed seed {{index}}, and then discarded. So the 
> {{position}} is deterministic for every {{index}} for any RDD in the world. 
> Also, [{{nextInt(bound)}} 
> implementation|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/Random.java/#393]
>  has a special case when {{bound}} is a power of 2, which is basically taking 
> several highest bits from the initial seed, with only a minimal scrambling.
> Due to deterministic seed, using the generator only once, and lack of 
> scrambling, the {{position}} values for power-of-two {{numPartitions}} always 
> end up being almost the same regardless of the {{index}}, causing some 
> buckets to be much more popular than others. So, {{repartition}} will in fact 
> deterministically produce skewed partitions even when the partitions were 
> roughly equal in size before.
> The behavior seems to have been introduced in SPARK-1770 by 
> https://github.com/apache/spark/pull/727/
> {quote}
> The load balancing is not perfect: a given output partition
> can have up to N more elements than the average if there are N input
> partitions. However, some randomization is used to minimize the
> probability that this happens.
> {quote}
> Another related ticket: SPARK-17817 - 
> https://github.com/apache/spark/pull/15445






[jira] [Commented] (SPARK-42777) Support converting TimestampNTZ catalog stats to plan stats

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699873#comment-17699873
 ] 

Apache Spark commented on SPARK-42777:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40404

> Support converting TimestampNTZ catalog stats to plan stats
> ---
>
> Key: SPARK-42777
> URL: https://issues.apache.org/jira/browse/SPARK-42777
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42777) Support converting TimestampNTZ catalog stats to plan stats

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42777:


Assignee: Gengliang Wang  (was: Apache Spark)

> Support converting TimestampNTZ catalog stats to plan stats
> ---
>
> Key: SPARK-42777
> URL: https://issues.apache.org/jira/browse/SPARK-42777
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Commented] (SPARK-42777) Support converting TimestampNTZ catalog stats to plan stats

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699872#comment-17699872
 ] 

Apache Spark commented on SPARK-42777:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40404

> Support converting TimestampNTZ catalog stats to plan stats
> ---
>
> Key: SPARK-42777
> URL: https://issues.apache.org/jira/browse/SPARK-42777
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42777) Support converting TimestampNTZ catalog stats to plan stats

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42777:


Assignee: Apache Spark  (was: Gengliang Wang)

> Support converting TimestampNTZ catalog stats to plan stats
> ---
>
> Key: SPARK-42777
> URL: https://issues.apache.org/jira/browse/SPARK-42777
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Created] (SPARK-42777) Support converting TimestampNTZ catalog stats to plan stats

2023-03-13 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-42777:
--

 Summary: Support converting TimestampNTZ catalog stats to plan 
stats
 Key: SPARK-42777
 URL: https://issues.apache.org/jira/browse/SPARK-42777
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang









[jira] [Assigned] (SPARK-42754) Spark 3.4 history server's SQL tab incorrectly groups SQL executions when replaying event logs from Spark 3.3 and earlier

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42754:


Assignee: Apache Spark

> Spark 3.4 history server's SQL tab incorrectly groups SQL executions when 
> replaying event logs from Spark 3.3 and earlier
> -
>
> Key: SPARK-42754
> URL: https://issues.apache.org/jira/browse/SPARK-42754
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Josh Rosen
>Assignee: Apache Spark
>Priority: Blocker
> Attachments: example.png
>
>
> In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL 
> executions when replaying event logs generated by older Spark versions.
>  
> {*}Reproduction{*}:
> In {{./bin/spark-shell --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=eventlogs}}, run three non-nested SQL queries:
> {code:java}
> sql("select * from range(10)").collect()
> sql("select * from range(20)").collect()
> sql("select * from range(30)").collect(){code}
> Exit the shell and use the Spark History Server to replay this application's 
> UI.
> In the SQL tab I expect to see three separate queries, but Spark 3.4's 
> history server incorrectly groups the second and third queries as nested 
> queries of the first (see attached screenshot).
>  
> {*}Root cause{*}: 
> [https://github.com/apache/spark/pull/39268] / SPARK-41752 added a new 
> *non-optional* {{rootExecutionId: Long}} field to the 
> SparkListenerSQLExecutionStart case class.
> When JsonProtocol deserializes this event it uses the "ignore missing 
> properties" Jackson deserialization option, causing the {{rootExecutionId}} 
> field to be initialized with a default value of {{0}}.
> The value {{0}} is a legitimate execution ID, so in the deserialized event we 
> have no ability to distinguish between the absence of a value and a case 
> where all queries have the first query as the root.
> *Proposed* {*}fix{*}:
> I think we should change this field to be of type {{Option[Long]}} . I 
> believe this is a release blocker for Spark 3.4.0 because we cannot change 
> the type of this new field in a future release without breaking binary 
> compatibility.
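A toy illustration (not Spark's actual JsonProtocol code) of why a plain {{Long}} is ambiguous here while {{Option[Long]}} is not:

{code:scala}
case class StartEvent(executionId: Long, rootExecutionId: Long)
case class StartEventFixed(executionId: Long, rootExecutionId: Option[Long])

// Replaying a Spark 3.3 log: rootExecutionId is absent, so it defaults to 0 ...
val replayedOldLog  = StartEvent(executionId = 7, rootExecutionId = 0L)
// ... which is indistinguishable from a 3.4 log whose query genuinely has root id 0.
val genuinelyNested = StartEvent(executionId = 7, rootExecutionId = 0L)
assert(replayedOldLog == genuinelyNested)

// With Option[Long], "absent" (None) and "rooted at execution 0" (Some(0L)) stay distinct.
assert(StartEventFixed(7, None) != StartEventFixed(7, Some(0L)))
{code}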






[jira] [Commented] (SPARK-42754) Spark 3.4 history server's SQL tab incorrectly groups SQL executions when replaying event logs from Spark 3.3 and earlier

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699852#comment-17699852
 ] 

Apache Spark commented on SPARK-42754:
--

User 'linhongliu-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40403

> Spark 3.4 history server's SQL tab incorrectly groups SQL executions when 
> replaying event logs from Spark 3.3 and earlier
> -
>
> Key: SPARK-42754
> URL: https://issues.apache.org/jira/browse/SPARK-42754
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Josh Rosen
>Priority: Blocker
> Attachments: example.png
>
>
> In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL 
> executions when replaying event logs generated by older Spark versions.
>  
> {*}Reproduction{*}:
> In {{./bin/spark-shell --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=eventlogs}}, run three non-nested SQL queries:
> {code:java}
> sql("select * from range(10)").collect()
> sql("select * from range(20)").collect()
> sql("select * from range(30)").collect(){code}
> Exit the shell and use the Spark History Server to replay this application's 
> UI.
> In the SQL tab I expect to see three separate queries, but Spark 3.4's 
> history server incorrectly groups the second and third queries as nested 
> queries of the first (see attached screenshot).
>  
> {*}Root cause{*}: 
> [https://github.com/apache/spark/pull/39268] / SPARK-41752 added a new 
> *non-optional* {{rootExecutionId: Long}} field to the 
> SparkListenerSQLExecutionStart case class.
> When JsonProtocol deserializes this event it uses the "ignore missing 
> properties" Jackson deserialization option, causing the {{rootExecutionId}} 
> field to be initialized with a default value of {{0}}.
> The value {{0}} is a legitimate execution ID, so in the deserialized event we 
> have no ability to distinguish between the absence of a value and a case 
> where all queries have the first query as the root.
> *Proposed* {*}fix{*}:
> I think we should change this field to be of type {{Option[Long]}} . I 
> believe this is a release blocker for Spark 3.4.0 because we cannot change 
> the type of this new field in a future release without breaking binary 
> compatibility.






[jira] [Assigned] (SPARK-42754) Spark 3.4 history server's SQL tab incorrectly groups SQL executions when replaying event logs from Spark 3.3 and earlier

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42754:


Assignee: (was: Apache Spark)

> Spark 3.4 history server's SQL tab incorrectly groups SQL executions when 
> replaying event logs from Spark 3.3 and earlier
> -
>
> Key: SPARK-42754
> URL: https://issues.apache.org/jira/browse/SPARK-42754
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Josh Rosen
>Priority: Blocker
> Attachments: example.png
>
>
> In Spark 3.4.0 RC4, the Spark History Server's SQL tab incorrectly groups SQL 
> executions when replaying event logs generated by older Spark versions.
>  
> {*}Reproduction{*}:
> In {{./bin/spark-shell --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=eventlogs}}, run three non-nested SQL queries:
> {code:java}
> sql("select * from range(10)").collect()
> sql("select * from range(20)").collect()
> sql("select * from range(30)").collect(){code}
> Exit the shell and use the Spark History Server to replay this application's 
> UI.
> In the SQL tab I expect to see three separate queries, but Spark 3.4's 
> history server incorrectly groups the second and third queries as nested 
> queries of the first (see attached screenshot).
>  
> {*}Root cause{*}: 
> [https://github.com/apache/spark/pull/39268] / SPARK-41752 added a new 
> *non-optional* {{rootExecutionId: Long}} field to the 
> SparkListenerSQLExecutionStart case class.
> When JsonProtocol deserializes this event it uses the "ignore missing 
> properties" Jackson deserialization option, causing the {{rootExecutionId}} 
> field to be initialized with a default value of {{0}}.
> The value {{0}} is a legitimate execution ID, so in the deserialized event we 
> have no ability to distinguish between the absence of a value and a case 
> where all queries have the first query as the root.
> *Proposed* {*}fix{*}:
> I think we should change this field to be of type {{Option[Long]}} . I 
> believe this is a release blocker for Spark 3.4.0 because we cannot change 
> the type of this new field in a future release without breaking binary 
> compatibility.






[jira] [Created] (SPARK-42776) BroadcastHashJoinExec.requiredChildDistribution called before columnar replacement rules

2023-03-13 Thread Timothy Miller (Jira)
Timothy Miller created SPARK-42776:
--

 Summary: BroadcastHashJoinExec.requiredChildDistribution called 
before columnar replacement rules
 Key: SPARK-42776
 URL: https://issues.apache.org/jira/browse/SPARK-42776
 Project: Spark
  Issue Type: Bug
  Components: Optimizer
Affects Versions: 3.3.1
 Environment: I'm prototyping on a Mac, but that's not really relevant.
Reporter: Timothy Miller


I am trying to replace BroadcastHashJoinExec with a columnar equivalent. 
However, I noticed that BroadcastHashJoinExec.requiredChildDistribution gets 
called BEFORE the columnar replacement rules. As a result, the object that gets 
broadcast is the plain old hashmap created from row data. By the time the 
columnar replacement rules are applied, it's too late to get Spark to broadcast 
any other kind of object.






[jira] [Updated] (SPARK-42775) approx_percentile produces wrong results for large decimals.

2023-03-13 Thread Chenhao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chenhao Li updated SPARK-42775:
---
Description: 
In the {{approx_percentile}} expression, Spark casts decimal to double to 
update the aggregation state 
([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181])
 and casts the result double back to decimal 
([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]).
 The precision loss in the casts can make the result decimal out of its 
precision range. This can lead to the following counter-intuitive results:
{code:sql}
spark-sql> select approx_percentile(col, 0.5) from values (999) 
as tab(col);
NULL
spark-sql> select approx_percentile(col, 0.5) is null from values 
(999) as tab(col);
false
spark-sql> select cast(approx_percentile(col, 0.5) as string) from values 
(999) as tab(col);
1000
spark-sql> desc select approx_percentile(col, 0.5) from values 
(999) as tab(col);
approx_percentile(col, 0.5, 1)  decimal(19,0) 
{code}
The result is actually not null, so the second query returns false. The first 
query returns null because the result cannot fit into {{{}decimal(19, 0){}}}.

A suggested fix is to use {{Decimal.changePrecision}} here to ensure the result 
fits, and really returns a null or throws an exception when the result doesn't 
fit.

  was:
In the {{approx_percentile}} expression, Spark casts decimal to double to 
update the aggregation state 
([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181])
 and casts the result double back to decimal 
([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]).
 The precision loss in the casts can make the result decimal out of its 
precision range. This can lead to the following counter-intuitive results:
{code:sql}
spark-sql> select approx_percentile(col, 0.5) from values (999) 
as tab(col);
NULL
spark-sql> select approx_percentile(col, 0.5) is null from values 
(999) as tab(col);
false
spark-sql> select cast(approx_percentile(col, 0.5) as string) from values 
(999) as tab(col);
1000
spark-sql> desc select approx_percentile(col, 0.5) from values 
(999) as tab(col);
approx_percentile(col, 0.5, 1)  decimal(19,0) 
{code}
The result is actually not null, so the second query returns false. The first 
query returns null because the result cannot fit into {{{}decimal(19, 0){}}}.

A suggested fix is to use `Decimal.changePrecision` here to ensure the result 
fits, and really returns a null or throws an exception when the result doesn't 
fit.
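A standalone illustration of the precision loss (using a hypothetical 19-digit value, since the ticket's literals are truncated in this archive):

{code:scala}
// Round-tripping a decimal(19,0) value through Double can produce 20 digits,
// which no longer fits the declared result type, hence the NULL above.
val original  = new java.math.BigDecimal("9999999999999999999") // 19 digits
val asDouble  = original.doubleValue()                           // nearest double is 1.0E19
val roundTrip = new java.math.BigDecimal(asDouble)

println(roundTrip)            // 10000000000000000000
println(roundTrip.precision)  // 20, exceeds decimal(19,0)
{code}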


> approx_percentile produces wrong results for large decimals.
> 
>
> Key: SPARK-42775
> URL: https://issues.apache.org/jira/browse/SPARK-42775
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 
> 3.4.0
>Reporter: Chenhao Li
>Priority: Major
>
> In the {{approx_percentile}} expression, Spark casts decimal to double to 
> update the aggregation state 
> ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181])
>  and casts the result double back to decimal 
> ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]).
>  The precision loss in the casts can make the result decimal out of its 
> precision range. This can lead to the following counter-intuitive results:
> {code:sql}
> spark-sql> select approx_percentile(col, 0.5) from values 
> (999) as tab(col);
> NULL
> spark-sql> select approx_percentile(col, 0.5) is null from values 
> (999) as tab(col);
> false
> spark-sql> select cast(approx_percentile(col, 0.5) as string) from values 
> (999) as tab(col);
> 

[jira] [Updated] (SPARK-42775) approx_percentile produces wrong results for large decimals.

2023-03-13 Thread Chenhao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chenhao Li updated SPARK-42775:
---
Description: 
In the {{approx_percentile}} expression, Spark casts decimal to double to 
update the aggregation state 
([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181])
 and casts the result double back to decimal 
([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]).
 The precision loss in the casts can make the result decimal out of its 
precision range. This can lead to the following counter-intuitive results:
{code:sql}
spark-sql> select approx_percentile(col, 0.5) from values (999) 
as tab(col);
NULL
spark-sql> select approx_percentile(col, 0.5) is null from values 
(999) as tab(col);
false
spark-sql> select cast(approx_percentile(col, 0.5) as string) from values 
(999) as tab(col);
1000
spark-sql> desc select approx_percentile(col, 0.5) from values 
(999) as tab(col);
approx_percentile(col, 0.5, 1)  decimal(19,0) 
{code}
The result is actually not null, so the second query returns false. The first 
query returns null because the result cannot fit into {{{}decimal(19, 0){}}}.

A suggested fix is to use `Decimal.changePrecision` here to ensure the result 
fits, and really returns a null or throws an exception when the result doesn't 
fit.

  was:
In the `approx_percentile` expression, Spark casts decimal to double to update 
the aggregation state 
([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181])
 and casts the result double back to decimal 
([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]).
 The precision loss in the casts can make the result decimal out of its 
precision range. This can lead to the following counter-intuitive results:

{code:sql}
spark-sql> select approx_percentile(col, 0.5) from values (999) 
as tab(col);
NULL
spark-sql> select approx_percentile(col, 0.5) is null from values 
(999) as tab(col);
false
spark-sql> select cast(approx_percentile(col, 0.5) as string) from values 
(999) as tab(col);
1000
spark-sql> desc select approx_percentile(col, 0.5) from values 
(999) as tab(col);
approx_percentile(col, 0.5, 1)  decimal(19,0) 
{code}

The result is actually not null, so the second query returns false. The first 
query returns null because the result cannot fit into {{decimal(19, 0)}}.

A suggested fix is to use `Decimal.changePrecision` here to ensure the result 
fits, and really returns a null or throws an exception when the result doesn't 
fit.


> approx_percentile produces wrong results for large decimals.
> 
>
> Key: SPARK-42775
> URL: https://issues.apache.org/jira/browse/SPARK-42775
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.0, 2.3.0, 2.4.0, 3.0.0, 3.1.0, 3.2.0, 3.3.0, 
> 3.4.0
>Reporter: Chenhao Li
>Priority: Major
>
> In the {{approx_percentile}} expression, Spark casts decimal to double to 
> update the aggregation state 
> ([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181])
>  and casts the result double back to decimal 
> ([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]).
>  The precision loss in the casts can make the result decimal out of its 
> precision range. This can lead to the following counter-intuitive results:
> {code:sql}
> spark-sql> select approx_percentile(col, 0.5) from values 
> (999) as tab(col);
> NULL
> spark-sql> select approx_percentile(col, 0.5) is null from values 
> (999) as tab(col);
> false
> spark-sql> select cast(approx_percentile(col, 0.5) as string) from values 
> (999) as tab(col);
> 

[jira] [Created] (SPARK-42775) approx_percentile produces wrong results for large decimals.

2023-03-13 Thread Chenhao Li (Jira)
Chenhao Li created SPARK-42775:
--

 Summary: approx_percentile produces wrong results for large 
decimals.
 Key: SPARK-42775
 URL: https://issues.apache.org/jira/browse/SPARK-42775
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0, 3.2.0, 3.1.0, 3.0.0, 2.4.0, 2.3.0, 2.2.0, 2.1.0, 
3.4.0
Reporter: Chenhao Li


In the `approx_percentile` expression, Spark casts decimal to double to update 
the aggregation state 
([ApproximatePercentile.scala#L181|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L181])
 and casts the result double back to decimal 
([ApproximatePercentile.scala#L206|https://github.com/apache/spark/blob/933dc0c42f0caf74aaa077fd4f2c2e7208452b9b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala#L206]).
 The precision loss in the casts can make the result decimal out of its 
precision range. This can lead to the following counter-intuitive results:

{code:sql}
spark-sql> select approx_percentile(col, 0.5) from values (999) 
as tab(col);
NULL
spark-sql> select approx_percentile(col, 0.5) is null from values 
(999) as tab(col);
false
spark-sql> select cast(approx_percentile(col, 0.5) as string) from values 
(999) as tab(col);
1000
spark-sql> desc select approx_percentile(col, 0.5) from values 
(999) as tab(col);
approx_percentile(col, 0.5, 1)  decimal(19,0) 
{code}

The result is actually not null, so the second query returns false. The first 
query returns null because the result cannot fit into {{decimal(19, 0)}}.

A suggested fix is to use `Decimal.changePrecision` here to ensure the result 
fits, and really returns a null or throws an exception when the result doesn't 
fit.






[jira] [Assigned] (SPARK-42020) createDataFrame with UDT

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42020:


Assignee: (was: Apache Spark)

> createDataFrame with UDT
> 
>
> Key: SPARK-42020
> URL: https://issues.apache.org/jira/browse/SPARK-42020
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> pyspark/sql/tests/test_types.py:596 
> (TypesParityTests.test_apply_schema_with_udt)
> self =  testMethod=test_apply_schema_with_udt>
> def test_apply_schema_with_udt(self):
> row = (1.0, ExamplePoint(1.0, 2.0))
> schema = StructType(
> [
> StructField("label", DoubleType(), False),
> StructField("point", ExamplePointUDT(), False),
> ]
> )
> >   df = self.spark.createDataFrame([row], schema)
> ../test_types.py:605: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../../connect/session.py:282: in createDataFrame
> _table = pa.Table.from_pylist([dict(zip(_cols, list(item))) for item in 
> _data])
> pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
> ???
> pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
> ???
> pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
> ???
> pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
> ???
> pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
> ???
> pyarrow/array.pxi:320: in pyarrow.lib.array
> ???
> pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
> ???
> pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
> ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> >   ???
> E   pyarrow.lib.ArrowInvalid: Could not convert ExamplePoint(1.0,2.0) with 
> type ExamplePoint: did not recognize Python value type when inferring an 
> Arrow data type
> pyarrow/error.pxi:100: ArrowInvalid
> {code}






[jira] [Assigned] (SPARK-42020) createDataFrame with UDT

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42020:


Assignee: Apache Spark

> createDataFrame with UDT
> 
>
> Key: SPARK-42020
> URL: https://issues.apache.org/jira/browse/SPARK-42020
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> {code}
> pyspark/sql/tests/test_types.py:596 
> (TypesParityTests.test_apply_schema_with_udt)
> self =  testMethod=test_apply_schema_with_udt>
> def test_apply_schema_with_udt(self):
> row = (1.0, ExamplePoint(1.0, 2.0))
> schema = StructType(
> [
> StructField("label", DoubleType(), False),
> StructField("point", ExamplePointUDT(), False),
> ]
> )
> >   df = self.spark.createDataFrame([row], schema)
> ../test_types.py:605: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../../connect/session.py:282: in createDataFrame
> _table = pa.Table.from_pylist([dict(zip(_cols, list(item))) for item in 
> _data])
> pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
> ???
> pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
> ???
> pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
> ???
> pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
> ???
> pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
> ???
> pyarrow/array.pxi:320: in pyarrow.lib.array
> ???
> pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
> ???
> pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
> ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> >   ???
> E   pyarrow.lib.ArrowInvalid: Could not convert ExamplePoint(1.0,2.0) with 
> type ExamplePoint: did not recognize Python value type when inferring an 
> Arrow data type
> pyarrow/error.pxi:100: ArrowInvalid
> {code}






[jira] [Commented] (SPARK-42020) createDataFrame with UDT

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699825#comment-17699825
 ] 

Apache Spark commented on SPARK-42020:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40402

> createDataFrame with UDT
> 
>
> Key: SPARK-42020
> URL: https://issues.apache.org/jira/browse/SPARK-42020
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> pyspark/sql/tests/test_types.py:596 
> (TypesParityTests.test_apply_schema_with_udt)
> self =  testMethod=test_apply_schema_with_udt>
> def test_apply_schema_with_udt(self):
> row = (1.0, ExamplePoint(1.0, 2.0))
> schema = StructType(
> [
> StructField("label", DoubleType(), False),
> StructField("point", ExamplePointUDT(), False),
> ]
> )
> >   df = self.spark.createDataFrame([row], schema)
> ../test_types.py:605: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../../connect/session.py:282: in createDataFrame
> _table = pa.Table.from_pylist([dict(zip(_cols, list(item))) for item in 
> _data])
> pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
> ???
> pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
> ???
> pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
> ???
> pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
> ???
> pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
> ???
> pyarrow/array.pxi:320: in pyarrow.lib.array
> ???
> pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
> ???
> pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
> ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> >   ???
> E   pyarrow.lib.ArrowInvalid: Could not convert ExamplePoint(1.0,2.0) with 
> type ExamplePoint: did not recognize Python value type when inferring an 
> Arrow data type
> pyarrow/error.pxi:100: ArrowInvalid
> {code}






[jira] [Created] (SPARK-42774) Expose VectorTypes API for DataSourceV2 Batch Scans

2023-03-13 Thread Micah Kornfield (Jira)
Micah Kornfield created SPARK-42774:
---

 Summary: Expose VectorTypes API for DataSourceV2 Batch Scans
 Key: SPARK-42774
 URL: https://issues.apache.org/jira/browse/SPARK-42774
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.2
Reporter: Micah Kornfield


SparkPlan's {{vectorTypes}} attribute can be used to [specialize 
codegen|https://github.com/apache/spark/blob/5556cfc59aa97a3ad4ea0baacebe19859ec0bcb7/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L151]; 
however, 
[BatchScanExecBase|https://github.com/apache/spark/blob/6b6bb6fa20f40aeedea2fb87008e9cce76c54e28/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExecBase.scala] 
does not override it, so DSv2 sources do not get the benefit of concrete-class 
dispatch.

This proposes adding an override to BatchScanExecBase which delegates to a new 
default method on 
[PartitionReaderFactory|https://github.com/apache/spark/blob/f1d42bb68d6d69d9a32f91a390270f9ec33c3207/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReaderFactory.java]
 to expose vectorTypes:

{code:java}
default Optional<List<String>> getVectorTypes() {
  return Optional.empty();
}
{code}






[jira] [Assigned] (SPARK-42773) Minor grammatical change to "Supports Spark Connect" message

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42773:


Assignee: Apache Spark

> Minor grammatical change to "Supports Spark Connect" message
> 
>
> Key: SPARK-42773
> URL: https://issues.apache.org/jira/browse/SPARK-42773
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Allan Folting
>Assignee: Apache Spark
>Priority: Major
>
> Changing "Support Spark Connect" to "Supports Spark Connect" in the 3.4.0 
> version change message which is also used in the documentation:
>  
> .. versionchanged:: 3.4.0
>      Supports Spark Connect.






[jira] [Assigned] (SPARK-42773) Minor grammatical change to "Supports Spark Connect" message

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42773:


Assignee: (was: Apache Spark)

> Minor grammatical change to "Supports Spark Connect" message
> 
>
> Key: SPARK-42773
> URL: https://issues.apache.org/jira/browse/SPARK-42773
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Allan Folting
>Priority: Major
>
> Changing "Support Spark Connect" to "Supports Spark Connect" in the 3.4.0 
> version change message which is also used in the documentation:
>  
> .. versionchanged:: 3.4.0
>      Supports Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42773) Minor grammatical change to "Supports Spark Connect" message

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699781#comment-17699781
 ] 

Apache Spark commented on SPARK-42773:
--

User 'allanf-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40401

> Minor grammatical change to "Supports Spark Connect" message
> 
>
> Key: SPARK-42773
> URL: https://issues.apache.org/jira/browse/SPARK-42773
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Allan Folting
>Priority: Major
>
> Changing "Support Spark Connect" to "Supports Spark Connect" in the 3.4.0 
> version change message which is also used in the documentation:
>  
> .. versionchanged:: 3.4.0
>      Supports Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42769) Add SPARK_DRIVER_POD_IP env variable to executor pods

2023-03-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42769.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40392
[https://github.com/apache/spark/pull/40392]

> Add SPARK_DRIVER_POD_IP env variable to executor pods
> -
>
> Key: SPARK-42769
> URL: https://issues.apache.org/jira/browse/SPARK-42769
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42769) Add SPARK_DRIVER_POD_IP env variable to executor pods

2023-03-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42769:
-

Assignee: Dongjoon Hyun

> Add SPARK_DRIVER_POD_IP env variable to executor pods
> -
>
> Key: SPARK-42769
> URL: https://issues.apache.org/jira/browse/SPARK-42769
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34637) Support DPP in AQE when the broadcast exchange can be reused

2023-03-13 Thread Eugene Koifman (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated SPARK-34637:
---
Summary: Support DPP in AQE when the broadcast exchange can be reused  
(was: Support DPP in AQE when the boradcast exchange can be reused)

> Support DPP in AQE when the broadcast exchange can be reused
> 
>
> Key: SPARK-34637
> URL: https://issues.apache.org/jira/browse/SPARK-34637
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Ke Jia
>Assignee: Ke Jia
>Priority: Major
> Fix For: 3.2.0
>
>
> SPARK-34168 added support for DPP in AQE when the join is a broadcast hash 
> join before applying the AQE rules, but it has a limitation: DPP is only 
> applied when the small table side is executed first, so that the big table 
> side can reuse its broadcast exchange. This Jira addresses that limitation 
> and applies DPP whenever the broadcast exchange can be reused.
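For context, a minimal sketch of the kind of query this targets, assuming the standard AQE and DPP flags; the table and column names are illustrative:

{code:scala}
// Illustrative only: a fact table partitioned by `date` joined to a small
// dimension table. With AQE on, DPP should still be able to reuse the
// broadcast exchange built for the broadcast hash join.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")

import spark.implicits._
val pruned = spark.table("sales")
  .join(spark.table("dim_date").where($"is_holiday"), "date")
pruned.explain()
{code}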



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41359) Use `PhysicalDataType` instead of DataType in UnsafeRow

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41359:


Assignee: Apache Spark

> Use `PhysicalDataType` instead of DataType in UnsafeRow
> ---
>
> Key: SPARK-41359
> URL: https://issues.apache.org/jira/browse/SPARK-41359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41359) Use `PhysicalDataType` instead of DataType in UnsafeRow

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699720#comment-17699720
 ] 

Apache Spark commented on SPARK-41359:
--

User 'ClownXC' has created a pull request for this issue:
https://github.com/apache/spark/pull/40400

> Use `PhysicalDataType` instead of DataType in UnsafeRow
> ---
>
> Key: SPARK-41359
> URL: https://issues.apache.org/jira/browse/SPARK-41359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41359) Use `PhysicalDataType` instead of DataType in UnsafeRow

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41359:


Assignee: (was: Apache Spark)

> Use `PhysicalDataType` instead of DataType in UnsafeRow
> ---
>
> Key: SPARK-41359
> URL: https://issues.apache.org/jira/browse/SPARK-41359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42773) Minor grammatical change to "Supports Spark Connect" message

2023-03-13 Thread Allan Folting (Jira)
Allan Folting created SPARK-42773:
-

 Summary: Minor grammatical change to "Supports Spark Connect" 
message
 Key: SPARK-42773
 URL: https://issues.apache.org/jira/browse/SPARK-42773
 Project: Spark
  Issue Type: Documentation
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Allan Folting


Changing "Support Spark Connect" to "Supports Spark Connect" in the 3.4.0 
version change message which is also used in the documentation:

 
.. versionchanged:: 3.4.0
     Supports Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38992) Avoid using bash -c in ShellBasedGroupsMappingProvider

2023-03-13 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-38992:
-
Fix Version/s: (was: 3.1.3)

> Avoid using bash -c in ShellBasedGroupsMappingProvider
> --
>
> Key: SPARK-38992
> URL: https://issues.apache.org/jira/browse/SPARK-38992
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.2, 3.2.1, 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.0.4, 3.3.0, 3.2.2
>
>
> Using bash -c can allow arbitrary shell execution by the end user.
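For illustration, a sketch of the safer pattern (an argument vector instead of a string handed to a shell); the command and helper below are only examples, not the actual fix:

{code:scala}
import scala.sys.process._

// Interpolating an attacker-controlled user name into a string that `bash -c`
// evaluates lets shell metacharacters run arbitrary commands, e.g.
//   Seq("bash", "-c", s"id -Gn $userName").!!   // unsafe for userName = "x; rm -rf ..."
// Passing the command as an argument vector avoids shell parsing entirely:
def groupsFor(userName: String): Seq[String] =
  Seq("id", "-Gn", userName).!!.trim.split("\\s+").toSeq
{code}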



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42101) Wrap InMemoryTableScanExec with QueryStage

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699610#comment-17699610
 ] 

Apache Spark commented on SPARK-42101:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/40399

> Wrap InMemoryTableScanExec with QueryStage
> --
>
> Key: SPARK-42101
> URL: https://issues.apache.org/jira/browse/SPARK-42101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.5.0
>
>
> The first access to a cached plan that has AQE enabled is tricky. Currently, 
> we cannot preserve its output partitioning and ordering.
> The whole query plan also misses many optimizations in the AQE framework. 
> Wrapping InMemoryTableScanExec in a query stage can resolve all these issues.
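For illustration, a small sketch of the symptom described above, assuming only the standard AQE flag; whether the extra shuffle appears depends on the plan, so treat it as a sketch rather than a guaranteed reproduction:

{code:scala}
spark.conf.set("spark.sql.adaptive.enabled", "true")

import spark.implicits._
val cached = spark.range(0, 1000000).repartition($"id").cache()

// On the first access to the cached plan, AQE may fail to preserve the cached
// data's output partitioning, so this join can plan an extra shuffle on
// `cached` that would be avoided once the partitioning is known.
cached.join(spark.range(0, 1000000), "id").explain()
{code}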



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42101) Wrap InMemoryTableScanExec with QueryStage

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699611#comment-17699611
 ] 

Apache Spark commented on SPARK-42101:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/40399

> Wrap InMemoryTableScanExec with QueryStage
> --
>
> Key: SPARK-42101
> URL: https://issues.apache.org/jira/browse/SPARK-42101
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
> Fix For: 3.5.0
>
>
> The first access to a cached plan that has AQE enabled is tricky. Currently, 
> we cannot preserve its output partitioning and ordering.
> The whole query plan also misses many optimizations in the AQE framework. 
> Wrapping InMemoryTableScanExec in a query stage can resolve all these issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42052) Codegen Support for HiveSimpleUDF

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699595#comment-17699595
 ] 

Apache Spark commented on SPARK-42052:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40397

> Codegen Support for HiveSimpleUDF
> -
>
> Key: SPARK-42052
> URL: https://issues.apache.org/jira/browse/SPARK-42052
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42772) Change the default value of JDBC options about push down to true

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42772:


Assignee: (was: Apache Spark)

> Change the default value of JDBC options about push down to true
> 
>
> Key: SPARK-42772
> URL: https://issues.apache.org/jira/browse/SPARK-42772
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42772) Change the default value of JDBC options about push down to true

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699571#comment-17699571
 ] 

Apache Spark commented on SPARK-42772:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40396

> Change the default value of JDBC options about push down to true
> 
>
> Key: SPARK-42772
> URL: https://issues.apache.org/jira/browse/SPARK-42772
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42772) Change the default value of JDBC options about push down to true

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42772:


Assignee: Apache Spark

> Change the default value of JDBC options about push down to true
> 
>
> Key: SPARK-42772
> URL: https://issues.apache.org/jira/browse/SPARK-42772
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42772) Adjust the default value of JDBC options about push down to true

2023-03-13 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-42772:
--

 Summary: Adjust the default value of JDBC options about push down 
to true
 Key: SPARK-42772
 URL: https://issues.apache.org/jira/browse/SPARK-42772
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42772) Change the default value of JDBC options about push down to true

2023-03-13 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-42772:
---
Summary: Change the default value of JDBC options about push down to true  
(was: Adjust the default value of JDBC options about push down to true)

> Change the default value of JDBC options about push down to true
> 
>
> Key: SPARK-42772
> URL: https://issues.apache.org/jira/browse/SPARK-42772
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42770) SQLImplicitsTestSuite test failed with Java 17

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42770:


Assignee: (was: Apache Spark)

> SQLImplicitsTestSuite test failed with Java 17
> --
>
> Key: SPARK-42770
> URL: https://issues.apache.org/jira/browse/SPARK-42770
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> [https://github.com/apache/spark/actions/runs/4318647315/jobs/7537203682]
> {code:java}
> [info] - test implicit encoder resolution *** FAILED *** (1 second, 329 
> milliseconds)
> 4429[info]   2023-03-02T23:00:20.404434 did not equal 
> 2023-03-02T23:00:20.404434875 (SQLImplicitsTestSuite.scala:63)
> 4430[info]   org.scalatest.exceptions.TestFailedException:
> 4431[info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> 4432[info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> 4433[info]   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> 4434[info]   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> 4435[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.testImplicit$1(SQLImplicitsTestSuite.scala:63)
> 4436[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.$anonfun$new$2(SQLImplicitsTestSuite.scala:133)
> 4437[info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> 4438[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> 4439[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> 4440[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> 4441[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> 4442[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> 4443[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
> 4445[info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
> 4446[info]   at 
> org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
> 4447[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> 4448[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> 4449[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> 4450[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> 4451[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> 4452[info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564)
> 4453[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> 4454[info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> 4455[info]   at scala.collection.immutable.List.foreach(List.scala:431)
> 4456[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> 4457[info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> 4458[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> 4459[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> 4460[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> 4461[info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> 4462[info]   at org.scalatest.Suite.run(Suite.scala:1114)
> 4463[info]   at org.scalatest.Suite.run$(Suite.scala:1096)
> 4464[info]   at 
> org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
> 4465[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
> 4466[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
> 4467[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
> 4468[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
> 4469[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.org$scalatest$BeforeAndAfterAll$$super$run(SQLImplicitsTestSuite.scala:34)
> 4470[info]   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
> 4471[info]   at 
> org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> 4472[info]   at 
> org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> 4473[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.run(SQLImplicitsTestSuite.scala:34)
> 4474[info]   at 
> 

[jira] [Assigned] (SPARK-42770) SQLImplicitsTestSuite test failed with Java 17

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42770:


Assignee: Apache Spark

> SQLImplicitsTestSuite test failed with Java 17
> --
>
> Key: SPARK-42770
> URL: https://issues.apache.org/jira/browse/SPARK-42770
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> [https://github.com/apache/spark/actions/runs/4318647315/jobs/7537203682]
> {code:java}
> [info] - test implicit encoder resolution *** FAILED *** (1 second, 329 
> milliseconds)
> 4429[info]   2023-03-02T23:00:20.404434 did not equal 
> 2023-03-02T23:00:20.404434875 (SQLImplicitsTestSuite.scala:63)
> 4430[info]   org.scalatest.exceptions.TestFailedException:
> 4431[info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> 4432[info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> 4433[info]   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> 4434[info]   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> 4435[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.testImplicit$1(SQLImplicitsTestSuite.scala:63)
> 4436[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.$anonfun$new$2(SQLImplicitsTestSuite.scala:133)
> 4437[info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> 4438[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> 4439[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> 4440[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> 4441[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> 4442[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> 4443[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
> 4445[info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
> 4446[info]   at 
> org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
> 4447[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> 4448[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> 4449[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> 4450[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> 4451[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> 4452[info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564)
> 4453[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> 4454[info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> 4455[info]   at scala.collection.immutable.List.foreach(List.scala:431)
> 4456[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> 4457[info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> 4458[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> 4459[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> 4460[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> 4461[info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> 4462[info]   at org.scalatest.Suite.run(Suite.scala:1114)
> 4463[info]   at org.scalatest.Suite.run$(Suite.scala:1096)
> 4464[info]   at 
> org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
> 4465[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
> 4466[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
> 4467[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
> 4468[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
> 4469[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.org$scalatest$BeforeAndAfterAll$$super$run(SQLImplicitsTestSuite.scala:34)
> 4470[info]   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
> 4471[info]   at 
> org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> 4472[info]   at 
> org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> 4473[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.run(SQLImplicitsTestSuite.scala:34)
> 4474[info]   at 
> 

[jira] [Commented] (SPARK-42770) SQLImplicitsTestSuite test failed with Java 17

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699542#comment-17699542
 ] 

Apache Spark commented on SPARK-42770:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40395

> SQLImplicitsTestSuite test failed with Java 17
> --
>
> Key: SPARK-42770
> URL: https://issues.apache.org/jira/browse/SPARK-42770
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> [https://github.com/apache/spark/actions/runs/4318647315/jobs/7537203682]
> {code:java}
> [info] - test implicit encoder resolution *** FAILED *** (1 second, 329 
> milliseconds)
> 4429[info]   2023-03-02T23:00:20.404434 did not equal 
> 2023-03-02T23:00:20.404434875 (SQLImplicitsTestSuite.scala:63)
> 4430[info]   org.scalatest.exceptions.TestFailedException:
> 4431[info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> 4432[info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> 4433[info]   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> 4434[info]   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> 4435[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.testImplicit$1(SQLImplicitsTestSuite.scala:63)
> 4436[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.$anonfun$new$2(SQLImplicitsTestSuite.scala:133)
> 4437[info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> 4438[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> 4439[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> 4440[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> 4441[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> 4442[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> 4443[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
> 4445[info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
> 4446[info]   at 
> org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
> 4447[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> 4448[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> 4449[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> 4450[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> 4451[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> 4452[info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564)
> 4453[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> 4454[info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> 4455[info]   at scala.collection.immutable.List.foreach(List.scala:431)
> 4456[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> 4457[info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> 4458[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> 4459[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> 4460[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> 4461[info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> 4462[info]   at org.scalatest.Suite.run(Suite.scala:1114)
> 4463[info]   at org.scalatest.Suite.run$(Suite.scala:1096)
> 4464[info]   at 
> org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
> 4465[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
> 4466[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
> 4467[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
> 4468[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
> 4469[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.org$scalatest$BeforeAndAfterAll$$super$run(SQLImplicitsTestSuite.scala:34)
> 4470[info]   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
> 4471[info]   at 
> org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> 4472[info]   at 
> org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> 4473[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.run(SQLImplicitsTestSuite.scala:34)
> 

[jira] [Commented] (SPARK-42771) Refactor HiveGenericUDF

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699527#comment-17699527
 ] 

Apache Spark commented on SPARK-42771:
--

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40394

> Refactor HiveGenericUDF
> ---
>
> Key: SPARK-42771
> URL: https://issues.apache.org/jira/browse/SPARK-42771
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42771) Refactor HiveGenericUDF

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42771:


Assignee: (was: Apache Spark)

> Refactor HiveGenericUDF
> ---
>
> Key: SPARK-42771
> URL: https://issues.apache.org/jira/browse/SPARK-42771
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42771) Refactor HiveGenericUDF

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42771:


Assignee: Apache Spark

> Refactor HiveGenericUDF
> ---
>
> Key: SPARK-42771
> URL: https://issues.apache.org/jira/browse/SPARK-42771
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42769) Add SPARK_DRIVER_POD_IP env variable to executor pods

2023-03-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42769:
--
Summary: Add SPARK_DRIVER_POD_IP env variable to executor pods  (was: Add 
ENV_DRIVER_POD_IP env variable to executor pods)

> Add SPARK_DRIVER_POD_IP env variable to executor pods
> -
>
> Key: SPARK-42769
> URL: https://issues.apache.org/jira/browse/SPARK-42769
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42771) Refactor HiveGenericUDF

2023-03-13 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-42771:
---

 Summary: Refactor HiveGenericUDF
 Key: SPARK-42771
 URL: https://issues.apache.org/jira/browse/SPARK-42771
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: BingKun Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40082) DAGScheduler may not schduler new stage in condition of push-based shuffle enabled

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699516#comment-17699516
 ] 

Apache Spark commented on SPARK-40082:
--

User 'Stove-hust' has created a pull request for this issue:
https://github.com/apache/spark/pull/40393

> DAGScheduler may not schduler new stage in condition of push-based shuffle 
> enabled
> --
>
> Key: SPARK-40082
> URL: https://issues.apache.org/jira/browse/SPARK-40082
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.1.1
>Reporter: Penglei Shi
>Priority: Major
> Attachments: missParentStages.png, shuffleMergeFinalized.png, 
> submitMissingTasks.png
>
>
> When push-based shuffle is enabled and speculative tasks exist, a 
> shuffleMapStage is resubmitted once a fetchFailed occurs; its parent stages 
> are resubmitted first, which takes some time to compute. Before the 
> shuffleMapStage is resubmitted, all of its speculative tasks succeed and 
> register their map output, but the speculative task success events cannot 
> trigger shuffleMergeFinalized because the stage has already been removed 
> from runningStages.
> When the stage is then resubmitted, the speculative tasks have already 
> registered their map output and there are no missing tasks to compute, so 
> the resubmission also does not trigger shuffleMergeFinalized. Eventually the 
> stage's _shuffleMergedFinalized stays false.
> AQE then submits the next stages, which depend on the shuffleMapStage that 
> hit the fetchFailed. In getMissingParentStages this stage is marked as 
> missing and resubmitted, but the next stages are only added to waitingStages 
> after this stage has finished, so they are not submitted even though the 
> resubmission has completed.
> I have only hit this a few times in my production environment and it is 
> difficult to reproduce.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40082) DAGScheduler may not schduler new stage in condition of push-based shuffle enabled

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40082:


Assignee: (was: Apache Spark)

> DAGScheduler may not schduler new stage in condition of push-based shuffle 
> enabled
> --
>
> Key: SPARK-40082
> URL: https://issues.apache.org/jira/browse/SPARK-40082
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.1.1
>Reporter: Penglei Shi
>Priority: Major
> Attachments: missParentStages.png, shuffleMergeFinalized.png, 
> submitMissingTasks.png
>
>
> When push-based shuffle is enabled and speculative tasks exist, a 
> shuffleMapStage is resubmitted once a fetchFailed occurs; its parent stages 
> are resubmitted first, which takes some time to compute. Before the 
> shuffleMapStage is resubmitted, all of its speculative tasks succeed and 
> register their map output, but the speculative task success events cannot 
> trigger shuffleMergeFinalized because the stage has already been removed 
> from runningStages.
> When the stage is then resubmitted, the speculative tasks have already 
> registered their map output and there are no missing tasks to compute, so 
> the resubmission also does not trigger shuffleMergeFinalized. Eventually the 
> stage's _shuffleMergedFinalized stays false.
> AQE then submits the next stages, which depend on the shuffleMapStage that 
> hit the fetchFailed. In getMissingParentStages this stage is marked as 
> missing and resubmitted, but the next stages are only added to waitingStages 
> after this stage has finished, so they are not submitted even though the 
> resubmission has completed.
> I have only hit this a few times in my production environment and it is 
> difficult to reproduce.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40082) DAGScheduler may not schduler new stage in condition of push-based shuffle enabled

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40082:


Assignee: Apache Spark

> DAGScheduler may not schduler new stage in condition of push-based shuffle 
> enabled
> --
>
> Key: SPARK-40082
> URL: https://issues.apache.org/jira/browse/SPARK-40082
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.1.1
>Reporter: Penglei Shi
>Assignee: Apache Spark
>Priority: Major
> Attachments: missParentStages.png, shuffleMergeFinalized.png, 
> submitMissingTasks.png
>
>
> When push-based shuffle is enabled and speculative tasks exist, a 
> shuffleMapStage is resubmitted once a fetchFailed occurs; its parent stages 
> are resubmitted first, which takes some time to compute. Before the 
> shuffleMapStage is resubmitted, all of its speculative tasks succeed and 
> register their map output, but the speculative task success events cannot 
> trigger shuffleMergeFinalized because the stage has already been removed 
> from runningStages.
> When the stage is then resubmitted, the speculative tasks have already 
> registered their map output and there are no missing tasks to compute, so 
> the resubmission also does not trigger shuffleMergeFinalized. Eventually the 
> stage's _shuffleMergedFinalized stays false.
> AQE then submits the next stages, which depend on the shuffleMapStage that 
> hit the fetchFailed. In getMissingParentStages this stage is marked as 
> missing and resubmitted, but the next stages are only added to waitingStages 
> after this stage has finished, so they are not submitted even though the 
> resubmission has completed.
> I have only hit this a few times in my production environment and it is 
> difficult to reproduce.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40082) DAGScheduler may not schduler new stage in condition of push-based shuffle enabled

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699515#comment-17699515
 ] 

Apache Spark commented on SPARK-40082:
--

User 'Stove-hust' has created a pull request for this issue:
https://github.com/apache/spark/pull/40393

> DAGScheduler may not schduler new stage in condition of push-based shuffle 
> enabled
> --
>
> Key: SPARK-40082
> URL: https://issues.apache.org/jira/browse/SPARK-40082
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 3.1.1
>Reporter: Penglei Shi
>Priority: Major
> Attachments: missParentStages.png, shuffleMergeFinalized.png, 
> submitMissingTasks.png
>
>
> When push-based shuffle is enabled and speculative tasks exist, a 
> shuffleMapStage is resubmitted once a fetchFailed occurs; its parent stages 
> are resubmitted first, which takes some time to compute. Before the 
> shuffleMapStage is resubmitted, all of its speculative tasks succeed and 
> register their map output, but the speculative task success events cannot 
> trigger shuffleMergeFinalized because the stage has already been removed 
> from runningStages.
> When the stage is then resubmitted, the speculative tasks have already 
> registered their map output and there are no missing tasks to compute, so 
> the resubmission also does not trigger shuffleMergeFinalized. Eventually the 
> stage's _shuffleMergedFinalized stays false.
> AQE then submits the next stages, which depend on the shuffleMapStage that 
> hit the fetchFailed. In getMissingParentStages this stage is marked as 
> missing and resubmitted, but the next stages are only added to waitingStages 
> after this stage has finished, so they are not submitted even though the 
> resubmission has completed.
> I have only hit this a few times in my production environment and it is 
> difficult to reproduce.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42508) Extract the common .ml classes to `mllib-common`

2023-03-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-42508:
-

Assignee: Ruifeng Zheng

> Extract the common .ml classes to `mllib-common`
> 
>
> Key: SPARK-42508
> URL: https://issues.apache.org/jira/browse/SPARK-42508
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42749) CAST(x as int) does not generate error with overflow

2023-03-13 Thread Tjomme Vergauwen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tjomme Vergauwen resolved SPARK-42749.
--
Resolution: Fixed

Additional settings are required to get the intended behaviour.

The documentation is up to date.

> CAST(x as int) does not generate error with overflow
> 
>
> Key: SPARK-42749
> URL: https://issues.apache.org/jira/browse/SPARK-42749
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.0, 3.3.1, 3.3.2
> Environment: It was tested on a DataBricks environment with DBR 10.4 
> and above, running Spark v3.2.1 and above.
>Reporter: Tjomme Vergauwen
>Priority: Major
> Attachments: Spark-42749.PNG
>
>
> Hi,
> When running the following code:
> {{select cast(7.415246799222789E19 as int)}}
> an error is expected according to the documentation, since 
> {{7.415246799222789E19}} is an overflow value for the INT datatype.
> However, the value 2147483647 is returned. 
> The behaviour of the following is correct, as it returns NULL:
> {{select try_cast(7.415246799222789E19 as int)}}
> This results in unexpected behaviour and data corruption.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42749) CAST(x as int) does not generate error with overflow

2023-03-13 Thread Tjomme Vergauwen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699503#comment-17699503
 ] 

Tjomme Vergauwen commented on SPARK-42749:
--

Just checked the documentation again: the warning apparently was recently added

> CAST(x as int) does not generate error with overflow
> 
>
> Key: SPARK-42749
> URL: https://issues.apache.org/jira/browse/SPARK-42749
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.0, 3.3.1, 3.3.2
> Environment: It was tested on a DataBricks environment with DBR 10.4 
> and above, running Spark v3.2.1 and above.
>Reporter: Tjomme Vergauwen
>Priority: Major
> Attachments: Spark-42749.PNG
>
>
> Hi,
> When running the following code:
> {{select cast(7.415246799222789E19 as int)}}
> an error is expected according to the documentation, since 
> {{7.415246799222789E19}} is an overflow value for the INT datatype.
> However, the value 2147483647 is returned. 
> The behaviour of the following is correct, as it returns NULL:
> {{select try_cast(7.415246799222789E19 as int)}}
> This results in unexpected behaviour and data corruption.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42749) CAST(x as int) does not generate error with overflow

2023-03-13 Thread Tjomme Vergauwen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699497#comment-17699497
 ] 

Tjomme Vergauwen commented on SPARK-42749:
--

Hi,

This does indeed solve the problem. Setting the parameter makes it behave as 
intended.

Could it be noted in the documentation that this setting is required?

Thanks,

Tjomme
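For reference, the setting in question appears to be spark.sql.ansi.enabled; a minimal sketch of the behaviour difference (illustrative only):

{code:scala}
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("select cast(7.415246799222789E19 as int)").show()      // 2147483647, no error

spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("select cast(7.415246799222789E19 as int)").show()      // throws an overflow error
spark.sql("select try_cast(7.415246799222789E19 as int)").show()  // NULL in either mode
{code}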

> CAST(x as int) does not generate error with overflow
> 
>
> Key: SPARK-42749
> URL: https://issues.apache.org/jira/browse/SPARK-42749
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.0, 3.3.1, 3.3.2
> Environment: It was tested on a DataBricks environment with DBR 10.4 
> and above, running Spark v3.2.1 and above.
>Reporter: Tjomme Vergauwen
>Priority: Major
> Attachments: Spark-42749.PNG
>
>
> Hi,
> When running the following code:
> {{select cast(7.415246799222789E19 as int)}}
> an error is expected according to the documentation, since 
> {{7.415246799222789E19}} is an overflow value for the INT datatype.
> However, the value 2147483647 is returned. 
> The behaviour of the following is correct, as it returns NULL:
> {{select try_cast(7.415246799222789E19 as int)}}
> This results in unexpected behaviour and data corruption.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39235) Make Catalog API be compatible with 3-layer-namespace

2023-03-13 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-39235.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Make Catalog API be compatible with 3-layer-namespace
> -
>
> Key: SPARK-39235
> URL: https://issues.apache.org/jira/browse/SPARK-39235
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, R, SQL
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>
> We can make the Catalog API support a 3-layer namespace: 
> catalog_name.database_name.table_name
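For illustration, a small sketch of what the 3-layer form could look like from the Catalog API; the catalog, database, and table names are placeholders:

{code:scala}
// Illustrative only: the same Catalog API calls accepting a fully qualified
// catalog_name.database_name.table_name identifier.
spark.catalog.setCurrentCatalog("my_catalog")
spark.catalog.listTables("my_catalog.my_db").show()
spark.catalog.tableExists("my_catalog.my_db.my_table")
spark.catalog.getTable("my_catalog.my_db.my_table")
{code}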



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42577) A large stage could run indefinitely due to executor lost

2023-03-13 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-42577.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40286
[https://github.com/apache/spark/pull/40286]

> A large stage could run indefinitely due to executor lost
> -
>
> Key: SPARK-42577
> URL: https://issues.apache.org/jira/browse/SPARK-42577
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2
>Reporter: wuyi
>Assignee: Tengfei Huang
>Priority: Major
> Fix For: 3.5.0
>
>
> When a stage is extremely large and Spark runs on spot instances or 
> problematic clusters with frequent worker/executor loss, the stage could run 
> indefinitely because tasks are rerun after each executor loss. This happens 
> when the external shuffle service is on and the large stage takes hours to 
> complete: when Spark tries to submit a child stage, it finds that the parent 
> stage - the large one - is missing some partitions, so the large stage has 
> to rerun. When it completes again, it finds new missing partitions for the 
> same reason.
> We should add an attempt limit for this kind of scenario.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42577) A large stage could run indefinitely due to executor lost

2023-03-13 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-42577:
---

Assignee: Tengfei Huang

> A large stage could run indefinitely due to executor lost
> -
>
> Key: SPARK-42577
> URL: https://issues.apache.org/jira/browse/SPARK-42577
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.3, 3.2.3, 3.3.2
>Reporter: wuyi
>Assignee: Tengfei Huang
>Priority: Major
>
> When a stage is extremely large and Spark runs on spot instances or 
> problematic clusters with frequent worker/executor loss, the stage could run 
> indefinitely because tasks are rerun after each executor loss. This happens 
> when the external shuffle service is on and the large stage takes hours to 
> complete: when Spark tries to submit a child stage, it finds that the parent 
> stage - the large one - is missing some partitions, so the large stage has 
> to rerun. When it completes again, it finds new missing partitions for the 
> same reason.
> We should add an attempt limit for this kind of scenario.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42770) SQLImplicitsTestSuite test failed with Java 17

2023-03-13 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699490#comment-17699490
 ] 

Yang Jie commented on SPARK-42770:
--

Maybe it can only be reproduced on Linux
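If the cause is the finer clock resolution on newer JDKs (nanosecond Instant.now versus the microsecond precision that survives the encode/decode round trip), a sketch of the kind of truncation that would make the comparison stable; this is an assumption, not a confirmed diagnosis:

{code:scala}
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

// Illustrative only: compare timestamps at microsecond precision, since that
// is what survives the round trip, instead of comparing raw nanosecond values.
val expected: LocalDateTime = LocalDateTime.now()
val decoded: LocalDateTime = expected.truncatedTo(ChronoUnit.MICROS) // stand-in for the round-tripped value
assert(expected.truncatedTo(ChronoUnit.MICROS) == decoded)
{code}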

 

> SQLImplicitsTestSuite test failed with Java 17
> --
>
> Key: SPARK-42770
> URL: https://issues.apache.org/jira/browse/SPARK-42770
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, Tests
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Yang Jie
>Priority: Major
>
> [https://github.com/apache/spark/actions/runs/4318647315/jobs/7537203682]
> {code:java}
> [info] - test implicit encoder resolution *** FAILED *** (1 second, 329 
> milliseconds)
> 4429[info]   2023-03-02T23:00:20.404434 did not equal 
> 2023-03-02T23:00:20.404434875 (SQLImplicitsTestSuite.scala:63)
> 4430[info]   org.scalatest.exceptions.TestFailedException:
> 4431[info]   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
> 4432[info]   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
> 4433[info]   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
> 4434[info]   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
> 4435[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.testImplicit$1(SQLImplicitsTestSuite.scala:63)
> 4436[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.$anonfun$new$2(SQLImplicitsTestSuite.scala:133)
> 4437[info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> 4438[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> 4439[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> 4440[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> 4441[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> 4442[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> 4443[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> [info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
> 4445[info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
> 4446[info]   at 
> org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
> 4447[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> 4448[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> 4449[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> 4450[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> 4451[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> 4452[info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564)
> 4453[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> 4454[info]   at 
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> 4455[info]   at scala.collection.immutable.List.foreach(List.scala:431)
> 4456[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> 4457[info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
> 4458[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
> 4459[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
> 4460[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
> 4461[info]   at 
> org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
> 4462[info]   at org.scalatest.Suite.run(Suite.scala:1114)
> 4463[info]   at org.scalatest.Suite.run$(Suite.scala:1096)
> 4464[info]   at 
> org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
> 4465[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
> 4466[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
> 4467[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
> 4468[info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
> 4469[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.org$scalatest$BeforeAndAfterAll$$super$run(SQLImplicitsTestSuite.scala:34)
> 4470[info]   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
> 4471[info]   at 
> org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> 4472[info]   at 
> org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> 4473[info]   at 
> org.apache.spark.sql.SQLImplicitsTestSuite.run(SQLImplicitsTestSuite.scala:34)
> 4474[info]   at 
> 

[jira] [Created] (SPARK-42770) SQLImplicitsTestSuite test failed with Java 17

2023-03-13 Thread Yang Jie (Jira)
Yang Jie created SPARK-42770:


 Summary: SQLImplicitsTestSuite test failed with Java 17
 Key: SPARK-42770
 URL: https://issues.apache.org/jira/browse/SPARK-42770
 Project: Spark
  Issue Type: Bug
  Components: Connect, Tests
Affects Versions: 3.4.0, 3.5.0
Reporter: Yang Jie


[https://github.com/apache/spark/actions/runs/4318647315/jobs/7537203682]
{code:java}
[info] - test implicit encoder resolution *** FAILED *** (1 second, 329 
milliseconds)
4429[info]   2023-03-02T23:00:20.404434 did not equal 
2023-03-02T23:00:20.404434875 (SQLImplicitsTestSuite.scala:63)
4430[info]   org.scalatest.exceptions.TestFailedException:
4431[info]   at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
4432[info]   at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
4433[info]   at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
4434[info]   at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
4435[info]   at 
org.apache.spark.sql.SQLImplicitsTestSuite.testImplicit$1(SQLImplicitsTestSuite.scala:63)
4436[info]   at 
org.apache.spark.sql.SQLImplicitsTestSuite.$anonfun$new$2(SQLImplicitsTestSuite.scala:133)
4437[info]   at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
4438[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
4439[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
4440[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
4441[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
4442[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
4443[info]   at 
org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
[info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
4445[info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
4446[info]   at 
org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
4447[info]   at 
org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
4448[info]   at 
org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
4449[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
4450[info]   at 
org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
4451[info]   at 
org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
4452[info]   at 
org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564)
4453[info]   at 
org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
4454[info]   at 
org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
4455[info]   at scala.collection.immutable.List.foreach(List.scala:431)
4456[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
4457[info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
4458[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
4459[info]   at 
org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
4460[info]   at 
org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
4461[info]   at 
org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
4462[info]   at org.scalatest.Suite.run(Suite.scala:1114)
4463[info]   at org.scalatest.Suite.run$(Suite.scala:1096)
4464[info]   at 
org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
4465[info]   at 
org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
4466[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
4467[info]   at 
org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
4468[info]   at 
org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
4469[info]   at 
org.apache.spark.sql.SQLImplicitsTestSuite.org$scalatest$BeforeAndAfterAll$$super$run(SQLImplicitsTestSuite.scala:34)
4470[info]   at 
org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
4471[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
4472[info]   at 
org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
4473[info]   at 
org.apache.spark.sql.SQLImplicitsTestSuite.run(SQLImplicitsTestSuite.scala:34)
4474[info]   at 
org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
4475[info]   at 
org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
4476[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:413)
4477[info]   at 
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
4478[info]   at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
4479[info]   at 

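The interesting part of the failure is the first log line: the original LocalDateTime carries nanosecond precision (…404434875) while the value that came back through the implicit encoder is truncated to microseconds (…404434), which is what surfaces on Java 17's higher-precision clock. A minimal, self-contained sketch of the mismatch and of a precision-agnostic comparison (illustrative only, not the actual SQLImplicitsTestSuite fix):

{code:scala}
// Values are taken from the log above; the names and the truncation approach
// are a hypothetical sketch, not the change that was actually made.
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

val expected = LocalDateTime.parse("2023-03-02T23:00:20.404434875") // nanosecond precision
val actual   = LocalDateTime.parse("2023-03-02T23:00:20.404434")    // microsecond precision

// This is the comparison that fails on Java 17.
assert(expected != actual)

// Truncating the expected value to microseconds before comparing makes the
// check independent of the clock precision of the running JVM.
assert(expected.truncatedTo(ChronoUnit.MICROS) == actual)
{code}
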
[jira] [Updated] (SPARK-42711) build/sbt usage error messages and shellcheck warn/error

2023-03-13 Thread Liang Yan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Yan updated SPARK-42711:
--
Description: 
The build/sbt tool's usage information is missing some content:
 
{code:java}
(base) spark% ./build/sbt -help
Usage:  [options]

  -h | -help print this message
  -v | -verbose  this runner is chattier
{code}

There are also some shellcheck warnings and errors.

  was:
The build/sbt tool's usage information about java-home is wrong:

  # java version (default: java from PATH, currently $(java -version 2>&1 | 
grep version))

  -java-home          alternate JAVA_HOME


> build/sbt usage error messages and shellcheck warn/error
> 
>
> Key: SPARK-42711
> URL: https://issues.apache.org/jira/browse/SPARK-42711
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.2
>Reporter: Liang Yan
>Priority: Minor
>
> The build/sbt tool's usage information is missing some content:
>  
> {code:java}
> (base) spark% ./build/sbt -help
> Usage:  [options]
>   -h | -help print this message
>   -v | -verbose  this runner is chattier
> {code}
> There are also some shellcheck warnings and errors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42711) build/sbt usage error messages and shellcheck warn/error

2023-03-13 Thread Liang Yan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Yan updated SPARK-42711:
--
Summary: build/sbt usage error messages and shellcheck warn/error  (was: 
build/sbt usage error messages about java-home)

> build/sbt usage error messages and shellcheck warn/error
> 
>
> Key: SPARK-42711
> URL: https://issues.apache.org/jira/browse/SPARK-42711
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.2
>Reporter: Liang Yan
>Priority: Minor
>
> The build/sbt tool's usage information about java-home is wrong:
>   # java version (default: java from PATH, currently $(java -version 2>&1 | 
> grep version))
>   -java-home          alternate JAVA_HOME



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-42766) YarnAllocator should filter excluded nodes when launching allocated containers

2023-03-13 Thread wangshengjie (Jira)


[ https://issues.apache.org/jira/browse/SPARK-42766 ]


wangshengjie deleted comment on SPARK-42766:
--

was (Author: wangshengjie):
Working on this

> YarnAllocator should filter excluded nodes when launching allocated containers
> --
>
> Key: SPARK-42766
> URL: https://issues.apache.org/jira/browse/SPARK-42766
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.3.2
>Reporter: wangshengjie
>Priority: Major
>
> In a production environment, we hit an issue like this:
> If we request 10 containers from nodeA and nodeB, the first response from Yarn 
> returns 5 containers from nodeA and nodeB, and then nodeA is excluded 
> (blacklisted). The second response from Yarn may still return containers on 
> nodeA and launch them, but when those containers (executors) start up and send 
> their register request to the Driver, they are rejected. Each rejection is 
> counted towards 
> {code:java}
> spark.yarn.max.executor.failures {code}
> , which causes the application to fail with:
> {code:java}
> Max number of executor failures ($maxNumExecutorFailures) reached{code}
>  
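
A minimal sketch of the kind of filtering the summary asks for, assuming the allocator tracks excluded nodes as a set of hostnames; the method and variable names below are illustrative and are not the actual YarnAllocator internals:

{code:scala}
import org.apache.hadoop.yarn.api.records.Container
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

// Hypothetical helper: split the containers Yarn just allocated into those on
// usable nodes and those on nodes the application has since excluded, and
// release the latter instead of launching executors that would only be
// rejected at registration time and counted as executor failures.
def filterExcludedNodes(
    allocated: Seq[Container],
    excludedNodes: Set[String],
    amClient: AMRMClient[ContainerRequest]): Seq[Container] = {
  val (onExcluded, usable) =
    allocated.partition(c => excludedNodes.contains(c.getNodeId.getHost))
  // Give excluded-node containers back to Yarn so they never reach the
  // register-then-reject path described above.
  onExcluded.foreach(c => amClient.releaseAssignedContainer(c.getId))
  usable
}
{code}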



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42756) Helper function to convert proto literal to value in Python Client

2023-03-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-42756.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40376
[https://github.com/apache/spark/pull/40376]

> Helper function to convert proto literal to value in Python Client
> --
>
> Key: SPARK-42756
> URL: https://issues.apache.org/jira/browse/SPARK-42756
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42756) Helper function to convert proto literal to value in Python Client

2023-03-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-42756:
-

Assignee: Ruifeng Zheng

> Helper function to convert proto literal to value in Python Client
> --
>
> Key: SPARK-42756
> URL: https://issues.apache.org/jira/browse/SPARK-42756
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42755) Factor literal value conversion out to connect-common

2023-03-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-42755.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40375
[https://github.com/apache/spark/pull/40375]

> Factor literal value conversion out to connect-common
> -
>
> Key: SPARK-42755
> URL: https://issues.apache.org/jira/browse/SPARK-42755
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42755) Factor literal value conversion out to connect-common

2023-03-13 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-42755:
-

Assignee: Ruifeng Zheng

> Factor literal value conversion out to connect-common
> -
>
> Key: SPARK-42755
> URL: https://issues.apache.org/jira/browse/SPARK-42755
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42769) Add ENV_DRIVER_POD_IP env variable to executor pods

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699470#comment-17699470
 ] 

Apache Spark commented on SPARK-42769:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40392

> Add ENV_DRIVER_POD_IP env variable to executor pods
> ---
>
> Key: SPARK-42769
> URL: https://issues.apache.org/jira/browse/SPARK-42769
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42769) Add ENV_DRIVER_POD_IP env variable to executor pods

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42769:


Assignee: Apache Spark

> Add ENV_DRIVER_POD_IP env variable to executor pods
> ---
>
> Key: SPARK-42769
> URL: https://issues.apache.org/jira/browse/SPARK-42769
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42769) Add ENV_DRIVER_POD_IP env variable to executor pods

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42769:


Assignee: (was: Apache Spark)

> Add ENV_DRIVER_POD_IP env variable to executor pods
> ---
>
> Key: SPARK-42769
> URL: https://issues.apache.org/jira/browse/SPARK-42769
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42769) Add ENV_DRIVER_POD_IP env variable to executor pods

2023-03-13 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-42769:
-

 Summary: Add ENV_DRIVER_POD_IP env variable to executor pods
 Key: SPARK-42769
 URL: https://issues.apache.org/jira/browse/SPARK-42769
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.5.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42766) YarnAllocator should filter excluded nodes when launching allocated containers

2023-03-13 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17699467#comment-17699467
 ] 

Apache Spark commented on SPARK-42766:
--

User 'wangshengjie123' has created a pull request for this issue:
https://github.com/apache/spark/pull/40391

> YarnAllocator should filter excluded nodes when launching allocated containers
> --
>
> Key: SPARK-42766
> URL: https://issues.apache.org/jira/browse/SPARK-42766
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.3.2
>Reporter: wangshengjie
>Priority: Major
>
> In a production environment, we hit an issue like this:
> If we request 10 containers from nodeA and nodeB, the first response from Yarn 
> returns 5 containers from nodeA and nodeB, and then nodeA is excluded 
> (blacklisted). The second response from Yarn may still return containers on 
> nodeA and launch them, but when those containers (executors) start up and send 
> their register request to the Driver, they are rejected. Each rejection is 
> counted towards 
> {code:java}
> spark.yarn.max.executor.failures {code}
> , which causes the application to fail with:
> {code:java}
> Max number of executor failures ($maxNumExecutorFailures) reached{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42766) YarnAllocator should filter excluded nodes when launching allocated containers

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42766:


Assignee: (was: Apache Spark)

> YarnAllocator should filter excluded nodes when launching allocated containers
> --
>
> Key: SPARK-42766
> URL: https://issues.apache.org/jira/browse/SPARK-42766
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.3.2
>Reporter: wangshengjie
>Priority: Major
>
> In a production environment, we hit an issue like this:
> If we request 10 containers from nodeA and nodeB, the first response from Yarn 
> returns 5 containers from nodeA and nodeB, and then nodeA is excluded 
> (blacklisted). The second response from Yarn may still return containers on 
> nodeA and launch them, but when those containers (executors) start up and send 
> their register request to the Driver, they are rejected. Each rejection is 
> counted towards 
> {code:java}
> spark.yarn.max.executor.failures {code}
> , which causes the application to fail with:
> {code:java}
> Max number of executor failures ($maxNumExecutorFailures) reached{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42766) YarnAllocator should filter excluded nodes when launching allocated containers

2023-03-13 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42766:


Assignee: Apache Spark

> YarnAllocator should filter excluded nodes when launching allocated containers
> --
>
> Key: SPARK-42766
> URL: https://issues.apache.org/jira/browse/SPARK-42766
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.3.2
>Reporter: wangshengjie
>Assignee: Apache Spark
>Priority: Major
>
> In a production environment, we hit an issue like this:
> If we request 10 containers from nodeA and nodeB, the first response from Yarn 
> returns 5 containers from nodeA and nodeB, and then nodeA is excluded 
> (blacklisted). The second response from Yarn may still return containers on 
> nodeA and launch them, but when those containers (executors) start up and send 
> their register request to the Driver, they are rejected. Each rejection is 
> counted towards 
> {code:java}
> spark.yarn.max.executor.failures {code}
> , which causes the application to fail with:
> {code:java}
> Max number of executor failures ($maxNumExecutorFailures) reached{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42764) Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42764:
-

Assignee: Dongjoon Hyun

> Parameterize the max number of attempts for driver props fetcher in 
> KubernetesExecutorBackend
> -
>
> Key: SPARK-42764
> URL: https://issues.apache.org/jira/browse/SPARK-42764
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42764) Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-13 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42764.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40387
[https://github.com/apache/spark/pull/40387]

> Parameterize the max number of attempts for driver props fetcher in 
> KubernetesExecutorBackend
> -
>
> Key: SPARK-42764
> URL: https://issues.apache.org/jira/browse/SPARK-42764
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org