[jira] [Updated] (SPARK-45502) Upgrade Kafka to 3.6.0

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45502:
---
Labels: pull-request-available  (was: )

> Upgrade Kafka to 3.6.0
> --
>
> Key: SPARK-45502
> URL: https://issues.apache.org/jira/browse/SPARK-45502
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> Apache Kafka 3.6.0 was released on Oct 10, 2023.
> - https://downloads.apache.org/kafka/3.6.0/RELEASE_NOTES.html






[jira] [Assigned] (SPARK-45442) Refine docstring of `DataFrame.show`

2023-10-11 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-45442:
-

Assignee: Allison Wang

> Refine docstring of `DataFrame.show`
> 
>
> Key: SPARK-45442
> URL: https://issues.apache.org/jira/browse/SPARK-45442
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Refine docstring of `DataFrame.show()`






[jira] [Resolved] (SPARK-45442) Refine docstring of `DataFrame.show`

2023-10-11 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-45442.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43252
[https://github.com/apache/spark/pull/43252]

> Refine docstring of `DataFrame.show`
> 
>
> Key: SPARK-45442
> URL: https://issues.apache.org/jira/browse/SPARK-45442
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Refine docstring of `DataFrame.show()`






[jira] [Updated] (SPARK-45510) Replace `scala.collection.generic.Growable` with `scala.collection.mutable.Growable`

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45510:
---
Labels: pull-request-available  (was: )

> Replace `scala.collection.generic.Growable` with 
> `scala.collection.mutable.Growable`
> --
>
> Key: SPARK-45510
> URL: https://issues.apache.org/jira/browse/SPARK-45510
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jia Fan
>Priority: Major
>  Labels: pull-request-available
>
> Replace `scala.collection.generic.Growable` with 
> `scala.collection.mutable.Growable`






[jira] [Created] (SPARK-45510) Replace `scala.collection.generic.Growable` with `scala.collection.mutable.Growable`

2023-10-11 Thread Jia Fan (Jira)
Jia Fan created SPARK-45510:
---

 Summary: Replace `scala.collection.generic.Growable` with 
`scala.collection.mutable.Growable`
 Key: SPARK-45510
 URL: https://issues.apache.org/jira/browse/SPARK-45510
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Jia Fan


Replace `scala.collection.generic.Growable` with 
`scala.collection.mutable.Growable`
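A minimal sketch of the mechanical change, assuming call sites only rely on the += / ++= interface (in Scala 2.13, scala.collection.generic.Growable is a deprecated alias of scala.collection.mutable.Growable; the call site below is illustrative, not from Spark):

{code:java}
// Before:
// import scala.collection.generic.Growable
// After:
import scala.collection.mutable.{ArrayBuffer, Growable}

def appendAll(sink: Growable[Int]): Unit = {
  sink += 1            // Growable is the minimal "can append" interface
  sink ++= Seq(2, 3)
}

val buf = ArrayBuffer.empty[Int]
appendAll(buf)         // buf now contains 1, 2, 3
{code}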






[jira] [Resolved] (SPARK-45402) Add API for 'analyze' method to return a buffer to be consumed on each class creation

2023-10-11 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-45402.
---
Fix Version/s: 4.0.0
 Assignee: Daniel
   Resolution: Fixed

Issue resolved by pull request 43204
https://github.com/apache/spark/pull/43204

> Add API for 'analyze' method to return a buffer to be consumed on each class 
> creation
> -
>
> Key: SPARK-45402
> URL: https://issues.apache.org/jira/browse/SPARK-45402
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-45508) Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 9+

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45508:
---
Labels: pull-request-available  (was: )

> Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can 
> access cleaner on Java 9+
> --
>
> Key: SPARK-45508
> URL: https://issues.apache.org/jira/browse/SPARK-45508
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Josh Rosen
>Priority: Major
>  Labels: pull-request-available
>
> We need to add `--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED` to our 
> JVM options so that the code in `org.apache.spark.unsafe.Platform` can access 
> the JDK internal cleaner classes.
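For context, the flag has to reach the JVM at launch, so it would typically be set via `spark-submit --conf` or in `spark-defaults.conf` using the standard extraJavaOptions keys (a sketch; the flag itself is quoted from this ticket):

```
spark.driver.extraJavaOptions   --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
spark.executor.extraJavaOptions --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
```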






[jira] [Updated] (SPARK-45508) Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 9+

2023-10-11 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-45508:
---
Description: We need to add 
`--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED` to our JVM options so that 
the code in `org.apache.spark.unsafe.Platform` can access the JDK internal 
cleaner classes.  (was: We need to update the 

 
```
val f = 
classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD")
f.setAccessible(true)
f.get(null)
```
returning `null` instead of a method.)

> Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can 
> access cleaner on Java 9+
> --
>
> Key: SPARK-45508
> URL: https://issues.apache.org/jira/browse/SPARK-45508
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Josh Rosen
>Priority: Major
>
> We need to add `--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED` to our 
> JVM options so that the code in `org.apache.spark.unsafe.Platform` can access 
> the JDK internal cleaner classes.






[jira] [Updated] (SPARK-45508) Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 9+

2023-10-11 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-45508:
---
Summary: Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so 
Platform can access cleaner on Java 9+  (was: Add 
"--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access 
cleaner on Java 11+)

> Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can 
> access cleaner on Java 9+
> --
>
> Key: SPARK-45508
> URL: https://issues.apache.org/jira/browse/SPARK-45508
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Josh Rosen
>Priority: Major
>
> We need to update the 
>  
> ```
> val f = 
> classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD")
> f.setAccessible(true)
> f.get(null)
> ```
> returning `null` instead of a method.






[jira] [Updated] (SPARK-45508) Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 11+

2023-10-11 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-45508:
---
Summary: Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so 
Platform can access cleaner on Java 11+  (was: Add 
"--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access 
cleaner on Java 9+)

> Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can 
> access cleaner on Java 11+
> ---
>
> Key: SPARK-45508
> URL: https://issues.apache.org/jira/browse/SPARK-45508
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Josh Rosen
>Priority: Major
>
> In JDK >= 9.b110, the code at 
> [https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213]
>  hits a fallback path because we are using the wrong cleaner class name: 
> `jdk.internal.ref.Cleaner` was removed in 
> [https://bugs.openjdk.org/browse/JDK-8149925] 
> This can be verified via
>  
> ```
> val f = 
> classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD")
> f.setAccessible(true)
> f.get(null)
> ```
> returning `null` instead of a method.






[jira] [Updated] (SPARK-45508) Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 11+

2023-10-11 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-45508:
---
Description: 
We need to update the 

 
```
val f = 
classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD")
f.setAccessible(true)
f.get(null)
```
returning `null` instead of a method.

  was:
In JDK >= 9.b110, the code at 
[https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213]
 hits a fallback path because we are using the wrong cleaner class name: 
`jdk.internal.ref.Cleaner` was removed in 
[https://bugs.openjdk.org/browse/JDK-8149925] 

This can be verified via
 
```
val f = 
classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD")
f.setAccessible(true)
f.get(null)
```
returning `null` instead of a method.


> Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can 
> access cleaner on Java 11+
> ---
>
> Key: SPARK-45508
> URL: https://issues.apache.org/jira/browse/SPARK-45508
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Josh Rosen
>Priority: Major
>
> We need to update the 
>  
> ```
> val f = 
> classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD")
> f.setAccessible(true)
> f.get(null)
> ```
> returning `null` instead of a method.






[jira] [Updated] (SPARK-45508) Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 9+

2023-10-11 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-45508:
---
Summary: Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so 
Platform can access cleaner on Java 9+  (was: org.apache.spark.unsafe.Platform 
uses wrong cleaner class name in JDK 9.b110+)

> Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can 
> access cleaner on Java 9+
> --
>
> Key: SPARK-45508
> URL: https://issues.apache.org/jira/browse/SPARK-45508
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Josh Rosen
>Priority: Major
>
> In JDK >= 9.b110, the code at 
> [https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213]
>  hits a fallback path because we are using the wrong cleaner class name: 
> `jdk.internal.ref.Cleaner` was removed in 
> [https://bugs.openjdk.org/browse/JDK-8149925] 
> This can be verified via
>  
> ```
> val f = 
> classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD")
> f.setAccessible(true)
> f.get(null)
> ```
> returning `null` instead of a method.






[jira] [Updated] (SPARK-45509) Investigate the behavior difference in self-join

2023-10-11 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-45509:
-
Description: 
SPARK-45220 discovers a behavior difference for a self-join scenario between 
classic Spark and Spark Connect.

For instance, here is the query that works without Spark Connect: 
{code:java}
df = spark.createDataFrame([Row(name="Alice", age=2), Row(name="Bob", age=5)])
df2 = spark.createDataFrame([Row(name="Tom", height=80), Row(name="Bob", 
height=85)]){code}
{code:java}
joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) 
joined.show(){code}
But in Spark Connect, it throws this exception:
{code:java}
pyspark.errors.exceptions.connect.AnalysisException: 
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter 
with name `name` cannot be resolved. Did you mean one of the following? 
[`name`, `name`, `age`, `height`].;
'Sort ['name DESC NULLS LAST], true
+- Join FullOuter, (name#64 = name#78)
   :- LocalRelation [name#64, age#65L]
   +- LocalRelation [name#78, height#79L]
 {code}
 

On the other hand, this query fails in classic Spark:
{code:java}
df.join(df, df.name == df.name, "outer").select(df.name).show() {code}
{code:java}
pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are 
ambiguous... {code}
 

but this query works with Spark Connect.

We need to investigate the behavior difference and fix it.

  was:
SPARK-45220 discovers a behavior difference for a self-join scenario between 
classic Spark and Spark Connect.

For instance, here is the query that works without Spark Connect: 
{code:java}
joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) 
joined.show(){code}
But in Spark Connect, it throws this exception:
{code:java}
pyspark.errors.exceptions.connect.AnalysisException: 
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter 
with name `name` cannot be resolved. Did you mean one of the following? 
[`name`, `name`, `age`, `height`].;
'Sort ['name DESC NULLS LAST], true
+- Join FullOuter, (name#64 = name#78)
   :- LocalRelation [name#64, age#65L]
   +- LocalRelation [name#78, height#79L]
 {code}
 

On the other hand, this query fails in classic Spark:
{code:java}
df.join(df, df.name == df.name, "outer").select(df.name).show() {code}
{code:java}
pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are 
ambiguous... {code}
 

but this query works with Spark Connect.

We need to investigate the behavior difference and fix it.

 


> Investigate the behavior difference in self-join
> 
>
> Key: SPARK-45509
> URL: https://issues.apache.org/jira/browse/SPARK-45509
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> SPARK-45220 discovers a behavior difference for a self-join scenario between 
> classic Spark and Spark Connect.
> For instance, here is the query that works without Spark Connect: 
> {code:java}
> df = spark.createDataFrame([Row(name="Alice", age=2), Row(name="Bob", age=5)])
> df2 = spark.createDataFrame([Row(name="Tom", height=80), Row(name="Bob", 
> height=85)]){code}
> {code:java}
> joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) 
> joined.show(){code}
> But in Spark Connect, it throws this exception:
> {code:java}
> pyspark.errors.exceptions.connect.AnalysisException: 
> [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter 
> with name `name` cannot be resolved. Did you mean one of the following? 
> [`name`, `name`, `age`, `height`].;
> 'Sort ['name DESC NULLS LAST], true
> +- Join FullOuter, (name#64 = name#78)
>:- LocalRelation [name#64, age#65L]
>+- LocalRelation [name#78, height#79L]
>  {code}
>  
> On the other hand, this query fails in classic Spark:
> {code:java}
> df.join(df, df.name == df.name, "outer").select(df.name).show() {code}
> {code:java}
> pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are 
> ambiguous... {code}
>  
> but this query works with Spark Connect.
> We need to investigate the behavior difference and fix it.






[jira] [Updated] (SPARK-45509) Investigate the behavior difference in self-join

2023-10-11 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-45509:
-
Description: 
SPARK-45220 discovers a behavior difference for a self-join scenario between 
classic Spark and Spark Connect.

For instance, here is the query that works without Spark Connect: 
{code:java}
joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) 
joined.show(){code}
But in Spark Connect, it throws this exception:
{code:java}
pyspark.errors.exceptions.connect.AnalysisException: 
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter 
with name `name` cannot be resolved. Did you mean one of the following? 
[`name`, `name`, `age`, `height`].;
'Sort ['name DESC NULLS LAST], true
+- Join FullOuter, (name#64 = name#78)
   :- LocalRelation [name#64, age#65L]
   +- LocalRelation [name#78, height#79L]
 {code}
 

On the other hand, this query fails in classic Spark:
{code:java}
df.join(df, df.name == df.name, "outer").select(df.name).show() {code}
{code:java}
pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are 
ambiguous... {code}
 

but this query works with Spark Connect.

We need to investigate the behavior difference and fix it.

 

  was:
SPARK-45220 discovers a behavior difference for a self-join scenario between 
class Spark and Spark Connect.

For instance. here is the query that works without Spark Connect: 

 
{code:java}
joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) 
joined.show(){code}
 

But in Spark Connect, it throws this exception:

 
{code:java}
pyspark.errors.exceptions.connect.AnalysisException: 
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter 
with name `name` cannot be resolved. Did you mean one of the following? 
[`name`, `name`, `age`, `height`].;
'Sort ['name DESC NULLS LAST], true
+- Join FullOuter, (name#64 = name#78)
   :- LocalRelation [name#64, age#65L]
   +- LocalRelation [name#78, height#79L]
 {code}
 

On the other hand, this query fails in classic Spark:

 
{code:java}
df.join(df, df.name == df.name, "outer").select(df.name).show() {code}
{code:java}
pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are 
ambiguous... {code}
 

but this query works with Spark Connect.

We need to investigate the behavior difference and fix it.

 


> Investigate the behavior difference in self-join
> 
>
> Key: SPARK-45509
> URL: https://issues.apache.org/jira/browse/SPARK-45509
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> SPARK-45220 discovers a behavior difference for a self-join scenario between 
> classic Spark and Spark Connect.
> For instance, here is the query that works without Spark Connect: 
> {code:java}
> joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) 
> joined.show(){code}
> But in Spark Connect, it throws this exception:
> {code:java}
> pyspark.errors.exceptions.connect.AnalysisException: 
> [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter 
> with name `name` cannot be resolved. Did you mean one of the following? 
> [`name`, `name`, `age`, `height`].;
> 'Sort ['name DESC NULLS LAST], true
> +- Join FullOuter, (name#64 = name#78)
>:- LocalRelation [name#64, age#65L]
>+- LocalRelation [name#78, height#79L]
>  {code}
>  
> On the other hand, this query fails in classic Spark:
> {code:java}
> df.join(df, df.name == df.name, "outer").select(df.name).show() {code}
> {code:java}
> pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are 
> ambiguous... {code}
>  
> but this query works with Spark Connect.
> We need to investigate the behavior difference and fix it.
>  






[jira] [Updated] (SPARK-45509) Investigate the behavior difference in self-join

2023-10-11 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-45509:
-
Description: 
SPARK-45220 discovers a behavior difference for a self-join scenario between 
class Spark and Spark Connect.

For instance. here is the query that works without Spark Connect: 

 
{code:java}
joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) 
joined.show(){code}
 

But in Spark Connect, it throws this exception:

 
{code:java}
pyspark.errors.exceptions.connect.AnalysisException: 
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter 
with name `name` cannot be resolved. Did you mean one of the following? 
[`name`, `name`, `age`, `height`].;
'Sort ['name DESC NULLS LAST], true
+- Join FullOuter, (name#64 = name#78)
   :- LocalRelation [name#64, age#65L]
   +- LocalRelation [name#78, height#79L]
 {code}
 

On the other hand, this query fails in classic Spark:

 
{code:java}
df.join(df, df.name == df.name, "outer").select(df.name).show() {code}
{code:java}
pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are 
ambiguous... {code}
 

but this query works with Spark Connect.

We need to investigate the behavior difference and fix it.

 

  was:
SAPRK-45220 discovers a behavior difference for a self-join scenario between 
class Spark and Spark Connect.

For instance. here is the query that works without Spark Connect: 

 
{code:java}
joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) 
joined.show(){code}
 

But in Spark Connect, it throws this exception:

 
{code:java}
pyspark.errors.exceptions.connect.AnalysisException: 
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter 
with name `name` cannot be resolved. Did you mean one of the following? 
[`name`, `name`, `age`, `height`].;
'Sort ['name DESC NULLS LAST], true
+- Join FullOuter, (name#64 = name#78)
   :- LocalRelation [name#64, age#65L]
   +- LocalRelation [name#78, height#79L]
 {code}
 

On the other hand, this query fails in classic Spark:

 
{code:java}
df.join(df, df.name == df.name, "outer").select(df.name).show() {code}
{code:java}
pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are 
ambiguous... {code}
 

but this query works with Spark Connect.

We need to investigate the behavior difference and fix it.

 


> Investigate the behavior difference in self-join
> 
>
> Key: SPARK-45509
> URL: https://issues.apache.org/jira/browse/SPARK-45509
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> SPARK-45220 discovers a behavior difference for a self-join scenario between 
> class Spark and Spark Connect.
> For instance. here is the query that works without Spark Connect: 
>  
> {code:java}
> joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) 
> joined.show(){code}
>  
> But in Spark Connect, it throws this exception:
>  
> {code:java}
> pyspark.errors.exceptions.connect.AnalysisException: 
> [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter 
> with name `name` cannot be resolved. Did you mean one of the following? 
> [`name`, `name`, `age`, `height`].;
> 'Sort ['name DESC NULLS LAST], true
> +- Join FullOuter, (name#64 = name#78)
>:- LocalRelation [name#64, age#65L]
>+- LocalRelation [name#78, height#79L]
>  {code}
>  
> On the other hand, this query fails in classic Spark:
>  
> {code:java}
> df.join(df, df.name == df.name, "outer").select(df.name).show() {code}
> {code:java}
> pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are 
> ambiguous... {code}
>  
> but this query works with Spark Connect.
> We need to investigate the behavior difference and fix it.
>  






[jira] [Updated] (SPARK-45508) org.apache.spark.unsafe.Platform uses wrong cleaner class name in JDK 9.b110+

2023-10-11 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-45508:
---
Description: 
In JDK >= 9.b110, the code at 
[https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213]
 hits a fallback path because we are using the wrong cleaner class name: 
`jdk.internal.ref.Cleaner` was removed in 
[https://bugs.openjdk.org/browse/JDK-8149925] 

This can be verified via
 
```
val f = 
classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD")
f.setAccessible(true)
f.get(null)
```
returning `null` instead of a method.

  was:
In JDK >= 9.b110, the code at 
[https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213]
 hits a fallback path because we are using the wrong cleaner class name: 
`jdk.internal.ref.Cleaner` was removed in JDK-8149925 
[https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8149925] 


This can be verified via
 
```
val f = 
classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD")
f.setAccessible(true)
f.get(null)
```
returning `null` instead of a method.


> org.apache.spark.unsafe.Platform uses wrong cleaner class name in JDK 9.b110+
> -
>
> Key: SPARK-45508
> URL: https://issues.apache.org/jira/browse/SPARK-45508
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Josh Rosen
>Priority: Major
>
> In JDK >= 9.b110, the code at 
> [https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213]
>  hits a fallback path because we are using the wrong cleaner class name: 
> `jdk.internal.ref.Cleaner` was removed in 
> [https://bugs.openjdk.org/browse/JDK-8149925] 
> This can be verified via
>  
> ```
> val f = 
> classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD")
> f.setAccessible(true)
> f.get(null)
> ```
> returning `null` instead of a method.






[jira] [Updated] (SPARK-45508) org.apache.spark.unsafe.Platform uses wrong cleaner class name in JDK 9.b110+

2023-10-11 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-45508:
---
Description: 
In JDK >= 9.b110, the code at 
[https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213]
 hits a fallback path because we are using the wrong cleaner class name: 
`jdk.internal.ref.Cleaner` was removed in JDK-8149925 
[https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8149925] 


This can be verified via
 
```
val f = 
classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD")
f.setAccessible(true)
f.get(null)
```
returning `null` instead of a method.

  was:
In JDK 11+, the code at 
[https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213]
 hits a fallback path because we are using the wrong cleaner class name.
 
This can be verified via
 
```
val f = 
classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD")
f.setAccessible(true)
f.get(null)
```
returning `null` instead of a method.

Summary: org.apache.spark.unsafe.Platform uses wrong cleaner class name 
in JDK 9.b110+  (was: org.apache.spark.unsafe.Platform uses wrong cleaner class 
name in JDK 11+)

> org.apache.spark.unsafe.Platform uses wrong cleaner class name in JDK 9.b110+
> -
>
> Key: SPARK-45508
> URL: https://issues.apache.org/jira/browse/SPARK-45508
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Josh Rosen
>Priority: Major
>
> In JDK >= 9.b110, the code at 
> [https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213]
>  hits a fallback path because we are using the wrong cleaner class name: 
> `jdk.internal.ref.Cleaner` was removed in JDK-8149925 
> [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8149925] 
> This can be verified via
>  
> ```
> val f = 
> classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD")
> f.setAccessible(true)
> f.get(null)
> ```
> returning `null` instead of a method.






[jira] [Created] (SPARK-45509) Investigate the behavior difference in self-join

2023-10-11 Thread Allison Wang (Jira)
Allison Wang created SPARK-45509:


 Summary: Investigate the behavior difference in self-join
 Key: SPARK-45509
 URL: https://issues.apache.org/jira/browse/SPARK-45509
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.5.0, 4.0.0
Reporter: Allison Wang


SAPRK-45220 discovers a behavior difference for a self-join scenario between 
class Spark and Spark Connect.

For instance. here is the query that works without Spark Connect: 

 
{code:java}
joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) 
joined.show(){code}
 

But in Spark Connect, it throws this exception:

 
{code:java}
pyspark.errors.exceptions.connect.AnalysisException: 
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter 
with name `name` cannot be resolved. Did you mean one of the following? 
[`name`, `name`, `age`, `height`].;
'Sort ['name DESC NULLS LAST], true
+- Join FullOuter, (name#64 = name#78)
   :- LocalRelation [name#64, age#65L]
   +- LocalRelation [name#78, height#79L]
 {code}
 

On the other hand, this query fails in classic Spark:

 
{code:java}
df.join(df, df.name == df.name, "outer").select(df.name).show() {code}
{code:java}
pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are 
ambiguous... {code}
 

but this query works with Spark Connect.

We need to investigate the behavior difference and fix it.

 






[jira] [Created] (SPARK-45508) org.apache.spark.unsafe.Platform uses wrong cleaner class name in JDK 11+

2023-10-11 Thread Josh Rosen (Jira)
Josh Rosen created SPARK-45508:
--

 Summary: org.apache.spark.unsafe.Platform uses wrong cleaner class 
name in JDK 11+
 Key: SPARK-45508
 URL: https://issues.apache.org/jira/browse/SPARK-45508
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Josh Rosen


In JDK 11+, the code at 
[https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213]
 hits a fallback path because we are using the wrong cleaner class name.
 
This can be verified via
 
```
val f = 
classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD")
f.setAccessible(true)
f.get(null)
```
returning `null` instead of a method.






[jira] [Updated] (SPARK-45487) Replace: _LEGACY_ERROR_TEMP_3007

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45487:
---
Labels: pull-request-available  (was: )

> Replace: _LEGACY_ERROR_TEMP_3007
> 
>
> Key: SPARK-45487
> URL: https://issues.apache.org/jira/browse/SPARK-45487
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
>
> def checkpointRDDBlockIdNotFoundError(rddBlockId: RDDBlockId): Throwable = {
>   new SparkException(
>     errorClass = "_LEGACY_ERROR_TEMP_3007",
>     messageParameters = Map("rddBlockId" -> s"$rddBlockId"),
>     cause = null)
> }
> This error condition appears to be quite common, so we should convert it to a 
> proper error class.
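A sketch of the kind of conversion intended; the error-class name below is an assumption for illustration, not necessarily what gets merged:

{code:java}
def checkpointRDDBlockIdNotFoundError(rddBlockId: RDDBlockId): Throwable = {
  new SparkException(
    // hypothetical named class replacing _LEGACY_ERROR_TEMP_3007
    errorClass = "CHECKPOINT_RDD_BLOCK_ID_NOT_FOUND",
    messageParameters = Map("rddBlockId" -> rddBlockId.toString),
    cause = null)
}
{code}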






[jira] [Updated] (SPARK-45507) Correctness bug in correlated scalar subqueries with COUNT aggregates

2023-10-11 Thread Andy Lam (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Lam updated SPARK-45507:
-
Description: 
{code:java}
create view if not exists t1(a1, a2) as values (0, 1), (1, 2);
create view if not exists t2(b1, b2) as values (0, 2), (0, 3);
create view if not exists t3(c1, c2) as values (0, 2), (0, 3);

-- Example 1
select (
  select SUM(l.cnt + r.cnt)
  from (select count(*) cnt from t2 where t1.a1 = t2.b1 having cnt = 0) l
  join (select count(*) cnt from t3 where t1.a1 = t3.c1 having cnt = 0) r
  on l.cnt = r.cnt
) from t1

-- Correct answer: (null, 0)
+----------------------+
|scalarsubquery(c1, c1)|
+----------------------+
|                  null|
|                  null|
+----------------------+

-- Example 2
select ( select sum(cnt) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1

-- Correct answer: (2, 0)
+------------------+
|scalarsubquery(c1)|
+------------------+
|                 2|
|              null|
+------------------+

-- Example 3
select ( select count(*) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1

-- Correct answer: (1, 1)
+------------------+
|scalarsubquery(c1)|
+------------------+
|                 1|
|                 0|
+------------------+
{code}

DB fiddle for correctness check: [https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/10403#]

  was:
{code:java}
create view if not exists t1(a1, a2) as values (0, 1), (1, 2);
create view if not exists t2(b1, b2) as values (0, 2), (0, 3);
create view if not exists t3(c1, c2) as values (0, 2), (0, 3);

-- Example 1 (has having clause)
select (
  select SUM(l.cnt + r.cnt)
  from (select count(*) cnt from t2 where t1.a1 = t2.b1 having cnt = 0) l
  join (select count(*) cnt from t3 where t1.a1 = t3.c1 having cnt = 0) r
  on l.cnt = r.cnt
) from t1

-- Correct answer: (null, 0)
+----------------------+
|scalarsubquery(c1, c1)|
+----------------------+
|                  null|
|                  null|
+----------------------+

-- Example 2
select ( select sum(cnt) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1

-- Correct answer: (2, 0)
+------------------+
|scalarsubquery(c1)|
+------------------+
|                 2|
|              null|
+------------------+

-- Example 3
select ( select count(*) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1

-- Correct answer: (1, 1)
+------------------+
|scalarsubquery(c1)|
+------------------+
|                 1|
|                 0|
+------------------+
{code}

DB fiddle for correctness check: [https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/10403#]


> Correctness bug in correlated scalar subqueries with COUNT aggregates
> -
>
> Key: SPARK-45507
> URL: https://issues.apache.org/jira/browse/SPARK-45507
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Andy Lam
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> create view if not exists t1(a1, a2) as values (0, 1), (1, 2);
> create view if not exists t2(b1, b2) as values (0, 2), (0, 3);
> create view if not exists t3(c1, c2) as values (0, 2), (0, 3);
>
> -- Example 1
> select (
>   select SUM(l.cnt + r.cnt)
>   from (select count(*) cnt from t2 where t1.a1 = t2.b1 having cnt = 0) l
>   join (select count(*) cnt from t3 where t1.a1 = t3.c1 having cnt = 0) r
>   on l.cnt = r.cnt
> ) from t1
>
> -- Correct answer: (null, 0)
> +----------------------+
> |scalarsubquery(c1, c1)|
> +----------------------+
> |                  null|
> |                  null|
> +----------------------+
>
> -- Example 2
> select ( select sum(cnt) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1
>
> -- Correct answer: (2, 0)
> +------------------+
> |scalarsubquery(c1)|
> +------------------+
> |                 2|
> |              null|
> +------------------+
>
> -- Example 3
> select ( select count(*) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1
>
> -- Correct answer: (1, 1)
> +------------------+
> |scalarsubquery(c1)|
> +------------------+
> |                 1|
> |                 0|
> +------------------+
> {code}
>
> DB fiddle for correctness check: [https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/10403#]






[jira] [Updated] (SPARK-45507) Correctness bug in correlated scalar subqueries with COUNT aggregates

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45507:
---
Labels: pull-request-available  (was: )

> Correctness bug in correlated scalar subqueries with COUNT aggregates
> -
>
> Key: SPARK-45507
> URL: https://issues.apache.org/jira/browse/SPARK-45507
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Andy Lam
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> create view if not exists t1(a1, a2) as values (0, 1), (1, 2);
> create view if not exists t2(b1, b2) as values (0, 2), (0, 3);
> create view if not exists t3(c1, c2) as values (0, 2), (0, 3);
>
> -- Example 1 (has having clause)
> select (
>   select SUM(l.cnt + r.cnt)
>   from (select count(*) cnt from t2 where t1.a1 = t2.b1 having cnt = 0) l
>   join (select count(*) cnt from t3 where t1.a1 = t3.c1 having cnt = 0) r
>   on l.cnt = r.cnt
> ) from t1
>
> -- Correct answer: (null, 0)
> +----------------------+
> |scalarsubquery(c1, c1)|
> +----------------------+
> |                  null|
> |                  null|
> +----------------------+
>
> -- Example 2
> select ( select sum(cnt) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1
>
> -- Correct answer: (2, 0)
> +------------------+
> |scalarsubquery(c1)|
> +------------------+
> |                 2|
> |              null|
> +------------------+
>
> -- Example 3
> select ( select count(*) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1
>
> -- Correct answer: (1, 1)
> +------------------+
> |scalarsubquery(c1)|
> +------------------+
> |                 1|
> |                 0|
> +------------------+
> {code}
>
> DB fiddle for correctness check: [https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/10403#]






[jira] [Created] (SPARK-45507) Correctness bug in correlated scalar subqueries with COUNT aggregates

2023-10-11 Thread Andy Lam (Jira)
Andy Lam created SPARK-45507:


 Summary: Correctness bug in correlated scalar subqueries with 
COUNT aggregates
 Key: SPARK-45507
 URL: https://issues.apache.org/jira/browse/SPARK-45507
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Andy Lam


{code:java}
create view if not exists t1(a1, a2) as values (0, 1), (1, 2);
create view if not exists t2(b1, b2) as values (0, 2), (0, 3);
create view if not exists t3(c1, c2) as values (0, 2), (0, 3);

-- Example 1 (has having clause)
select (
  select SUM(l.cnt + r.cnt)
  from (select count(*) cnt from t2 where t1.a1 = t2.b1 having cnt = 0) l
  join (select count(*) cnt from t3 where t1.a1 = t3.c1 having cnt = 0) r
  on l.cnt = r.cnt
) from t1

-- Correct answer: (null, 0)
+----------------------+
|scalarsubquery(c1, c1)|
+----------------------+
|                  null|
|                  null|
+----------------------+

-- Example 2
select ( select sum(cnt) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1

-- Correct answer: (2, 0)
+------------------+
|scalarsubquery(c1)|
+------------------+
|                 2|
|              null|
+------------------+

-- Example 3
select ( select count(*) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1

-- Correct answer: (1, 1)
+------------------+
|scalarsubquery(c1)|
+------------------+
|                 1|
|                 0|
+------------------+
{code}

DB fiddle for correctness check: [https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/10403#]






[jira] [Created] (SPARK-45506) Support ivy URIs in SparkConnect addArtifact

2023-10-11 Thread Vsevolod Stepanov (Jira)
Vsevolod Stepanov created SPARK-45506:
-

 Summary: Support ivy URIs in SparkConnect addArtifact
 Key: SPARK-45506
 URL: https://issues.apache.org/jira/browse/SPARK-45506
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Vsevolod Stepanov


Right now Spark Connect's addArtifact API supports only adding .jar & .class 
files. It would be useful to extend this API to support adding arbitrary Maven 
artifacts using Ivy.
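A sketch of the desired usage, mirroring the ivy:// URI scheme that `--jars` and `SparkContext.addJar` already accept (whether Connect's addArtifact would keep this exact signature is an assumption):

{code:java}
// Hypothetical once ivy URIs are supported; today addArtifact takes
// local .jar / .class files.
spark.addArtifact("ivy://org.apache.logging.log4j:log4j-core:2.20.0")
{code}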






[jira] [Resolved] (SPARK-45221) Refine docstring of `DataFrameReader.parquet`

2023-10-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45221.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43301
[https://github.com/apache/spark/pull/43301]

> Refine docstring of `DataFrameReader.parquet`
> -
>
> Key: SPARK-45221
> URL: https://issues.apache.org/jira/browse/SPARK-45221
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Refine the docstring of `DataFrameReader.parquet`.






[jira] [Assigned] (SPARK-45221) Refine docstring of `DataFrameReader.parquet`

2023-10-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45221:


Assignee: Allison Wang

> Refine docstring of `DataFrameReader.parquet`
> -
>
> Key: SPARK-45221
> URL: https://issues.apache.org/jira/browse/SPARK-45221
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Refine the docstring of `DataFrameReader.parquet`.






[jira] [Updated] (SPARK-45505) Refactor analyzeInPython function to make it reusable

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45505:
---
Labels: pull-request-available  (was: )

> Refactor analyzeInPython function to make it reusable
> -
>
> Key: SPARK-45505
> URL: https://issues.apache.org/jira/browse/SPARK-45505
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Refactor the analyzeInPython method in the UserDefinedPythonTableFunction 
> object into an abstract class so that it can be reused in the future.






[jira] [Updated] (SPARK-43396) Add config to control max ratio of decommissioning executors

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43396:
---
Labels: pull-request-available  (was: )

> Add config to control max ratio of decommissioning executors
> 
>
> Key: SPARK-43396
> URL: https://issues.apache.org/jira/browse/SPARK-43396
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Zhongwei Zhu
>Priority: Major
>  Labels: pull-request-available
>
> Decommissioning too many executors at the same time with shuffle or RDD 
> migration could severely hurt shuffle fetch performance. The block manager 
> decommissioner tries to migrate shuffle or RDD blocks as soon as possible, 
> which competes for network and disk IO with shuffle fetches on the target 
> executor.






[jira] [Updated] (SPARK-42584) Improve output of Column.explain

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-42584:
---
Labels: pull-request-available  (was: )

> Improve output of Column.explain
> 
>
> Key: SPARK-42584
> URL: https://issues.apache.org/jira/browse/SPARK-42584
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>  Labels: pull-request-available
>
> We currently display the structure of the proto in both the regular and 
> extended versions of explain. We should display a more compact SQL-like 
> string for the regular version.






[jira] [Created] (SPARK-45505) Refactor analyzeInPython function to make it reusable

2023-10-11 Thread Allison Wang (Jira)
Allison Wang created SPARK-45505:


 Summary: Refactor analyzeInPython function to make it reusable
 Key: SPARK-45505
 URL: https://issues.apache.org/jira/browse/SPARK-45505
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Allison Wang


Refactor the analyzeInPython method in the UserDefinedPythonTableFunction 
object into an abstract class so that it can be reused in the future.
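A rough sketch of the intended shape, with invented names (only analyzeInPython and UserDefinedPythonTableFunction come from this ticket; everything else below is an assumption):

{code:java}
import java.io.{DataInputStream, DataOutputStream}

// Hypothetical abstract runner extracted from analyzeInPython: subclasses
// supply the request/response protocol, while Python worker setup and
// teardown live in one reusable place.
abstract class PythonPlanRunner[T] {
  protected def writeToPython(dataOut: DataOutputStream): Unit
  protected def receiveFromPython(dataIn: DataInputStream): T

  final def runInPython(): T = {
    // start a Python worker, exchange request/response, clean up (elided)
    ???
  }
}
{code}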






[jira] [Updated] (SPARK-45504) RocksDB State Store Should Lower RocksDB Background Thread CPU Priority

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45504:
---
Labels: pull-request-available  (was: )

> RocksDB State Store Should Lower RocksDB Background Thread CPU Priority
> ---
>
> Key: SPARK-45504
> URL: https://issues.apache.org/jira/browse/SPARK-45504
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.5.1
>Reporter: Siying Dong
>Priority: Minor
>  Labels: pull-request-available
>
> We can move RocksDB flush and compaction to a lower CPU priority. They are 
> usually background tasks and don't need to compete with task execution. In 
> the case where a task does wait for some RocksDB background work to finish, 
> such as checkpointing or async checkpointing, the task slot is idle while it 
> waits, so there is likely enough CPU for the low-priority threads anyway.






[jira] [Created] (SPARK-45504) RocksDB State Store Should Lower RocksDB Background Thread CPU Priority

2023-10-11 Thread Siying Dong (Jira)
Siying Dong created SPARK-45504:
---

 Summary: RocksDB State Store Should Lower RocksDB Background 
Thread CPU Priority
 Key: SPARK-45504
 URL: https://issues.apache.org/jira/browse/SPARK-45504
 Project: Spark
  Issue Type: Task
  Components: Structured Streaming
Affects Versions: 3.5.1
Reporter: Siying Dong


We can move RocksDB flush and compaction to a lower CPU priority. They are 
usually background tasks and don't need to compete with task execution. In the 
case where a task does wait for some RocksDB background work to finish, such as 
checkpointing or async checkpointing, the task slot is idle while it waits, so 
there is likely enough CPU for the low-priority threads anyway.
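A sketch of what this could look like against the RocksJava Env API; treat the exact method names as assumptions to verify against the RocksJava version Spark ships:

{code:java}
import org.rocksdb.{Env, Priority, RocksDB}

RocksDB.loadLibrary()
val env = Env.getDefault
// Compaction runs in the LOW thread pool and flush in the HIGH pool;
// lower the CPU priority of both so they yield to task execution.
env.lowerThreadPoolCPUPriority(Priority.LOW)
env.lowerThreadPoolCPUPriority(Priority.HIGH)
{code}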






[jira] [Updated] (SPARK-45503) RocksDB State Store to Use LZ4 Compression

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45503:
---
Labels: pull-request-available  (was: )

> RocksDB State Store to Use LZ4 Compression
> --
>
> Key: SPARK-45503
> URL: https://issues.apache.org/jira/browse/SPARK-45503
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.5.1
>Reporter: Siying Dong
>Priority: Minor
>  Labels: pull-request-available
>
> LZ4 is generally faster than Snappy. That's probably why we use LZ4 in 
> changelogs and other places by default. However, we don't change RocksDB's 
> default compression style of Snappy. The RocksDB team recommends LZ4 or ZSTD; 
> the default is kept at Snappy only for backward-compatibility reasons. We 
> should use LZ4 instead.






[jira] [Created] (SPARK-45503) RocksDB State Store to Use LZ4 Compression

2023-10-11 Thread Siying Dong (Jira)
Siying Dong created SPARK-45503:
---

 Summary: RocksDB State Store to Use LZ4 Compression
 Key: SPARK-45503
 URL: https://issues.apache.org/jira/browse/SPARK-45503
 Project: Spark
  Issue Type: Task
  Components: Structured Streaming
Affects Versions: 3.5.1
Reporter: Siying Dong


LZ4 is generally faster than Snappy. That's probably why we use LZ4 in 
changelogs and other places by default. However, we don't change RocksDB's 
default compression style of Snappy. The RocksDB team recommends LZ4 or ZSTD; 
the default is kept at Snappy only for backward-compatibility reasons. We 
should use LZ4 instead.
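For reference, a sketch of the corresponding RocksJava option (where Spark's state store actually sets its RocksDB options is not shown here):

{code:java}
import org.rocksdb.{CompressionType, Options}

val options = new Options()
  // RocksDB's built-in default is Snappy; LZ4 is what the RocksDB team
  // recommends for new deployments.
  .setCompressionType(CompressionType.LZ4_COMPRESSION)
{code}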






[jira] [Resolved] (SPARK-45490) Replace: _LEGACY_ERROR_TEMP_2151 with a proper error class

2023-10-11 Thread Serge Rielau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Rielau resolved SPARK-45490.
--
Resolution: Cannot Reproduce

Seems to have been implemented as: EXPRESSION_DECODING_FAILED

> Replace: _LEGACY_ERROR_TEMP_2151 with a proper error class
> --
>
> Key: SPARK-45490
> URL: https://issues.apache.org/jira/browse/SPARK-45490
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Priority: Major
>
> {code:java}
> def expressionDecodingError(e: Exception, expressions: Seq[Expression]): 
> SparkRuntimeException = {
>   new SparkRuntimeException(
> errorClass = "_LEGACY_ERROR_TEMP_2151",
> messageParameters = Map(
>   "e" -> e.toString(),
>   "expressions" -> expressions.map(
> _.simpleString(SQLConf.get.maxToStringFields)).mkString("\n")),
> cause = e)
> } {code}
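For reference, a sketch of the converted version under that error class (the message parameters are an assumption based on the legacy template above):

{code:java}
def expressionDecodingError(
    e: Exception, expressions: Seq[Expression]): SparkRuntimeException = {
  new SparkRuntimeException(
    errorClass = "EXPRESSION_DECODING_FAILED",
    messageParameters = Map(
      "expressions" -> expressions.map(
        _.simpleString(SQLConf.get.maxToStringFields)).mkString("\n")),
    cause = e)
}
{code}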






[jira] [Resolved] (SPARK-44855) Small tweaks to attaching ExecuteGrpcResponseSender to ExecuteResponseObserver

2023-10-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44855.
---
Fix Version/s: 4.0.0
 Assignee: Juliusz Sompolski
   Resolution: Fixed

> Small tweaks to attaching ExecuteGrpcResponseSender to ExecuteResponseObserver
> --
>
> Key: SPARK-44855
> URL: https://issues.apache.org/jira/browse/SPARK-44855
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Juliusz Sompolski
>Assignee: Juliusz Sompolski
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Small improvements can be made to the way a new ExecuteGrpcResponseSender is 
> attached to the observer.
>  * Since we now have addGrpcResponseSender in ExecuteHolder, it should be 
> ExecuteHolder's responsibility (not ExecuteResponseObserver's) to interrupt 
> the old sender and to ensure there is only one at a time.
>  * executeObserver is used as a lock for synchronization. An explicit lock 
> object could be better.
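A minimal sketch of both points, fully hypothetical (Sender stands in for ExecuteGrpcResponseSender; none of these names are from the actual patch):

{code:java}
// Synchronize on a dedicated lock object instead of on executeObserver
// itself, so the lock's scope is explicit, and keep at most one active
// sender attached at a time.
class ResponseSenderRegistry[Sender] {
  private val senderLock = new Object()
  private var current: Option[Sender] = None

  def attach(sender: Sender)(interrupt: Sender => Unit): Unit =
    senderLock.synchronized {
      current.foreach(interrupt)   // interrupt the previously attached sender
      current = Some(sender)
    }
}
{code}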






[jira] [Commented] (SPARK-45502) Upgrade Kafka to 3.6.0

2023-10-11 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774147#comment-17774147
 ] 

Dongjoon Hyun commented on SPARK-45502:
---

Thank you for volunteering, [~dengziming]. BTW, the Apache Spark community 
respects the first PR. We intentionally don't have any locking or assignee 
system. The first PR will be reviewed first, and the committers will set the 
assignee of this Jira when the PR is merged.

> Upgrade Kafka to 3.6.0
> --
>
> Key: SPARK-45502
> URL: https://issues.apache.org/jira/browse/SPARK-45502
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Apache Kafka 3.6.0 was released on Oct 10, 2023.
> - https://downloads.apache.org/kafka/3.6.0/RELEASE_NOTES.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45433) CSV/JSON schema inference when timestamps do not match specified timestampFormat with only one row on each partition report error

2023-10-11 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-45433.
--
Fix Version/s: 3.5.1
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 43243
[https://github.com/apache/spark/pull/43243]

> CSV/JSON schema inference when timestamps do not match specified 
> timestampFormat with only one row on each partition report error
> -
>
> Key: SPARK-45433
> URL: https://issues.apache.org/jira/browse/SPARK-45433
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0, 3.5.0
>Reporter: Jia Fan
>Assignee: Jia Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.1, 4.0.0
>
>
> CSV/JSON schema inference reports an error when timestamps do not match the 
> specified timestampFormat and there is only one row on each partition.
> {code:java}
> // e.g.
> val csv = spark.read.option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss")
>   .option("inferSchema", true).csv(Seq("2884-06-24T02:45:51.138").toDS())
> csv.show() {code}
> {code:java}
> // error
> Caused by: java.time.format.DateTimeParseException: Text 
> '2884-06-24T02:45:51.138' could not be parsed, unparsed text found at index 
> 19 {code}
> This bug affects 3.3/3.4/3.5. Unlike 
> https://issues.apache.org/jira/browse/SPARK-45424 , this is a different bug 
> with the same error message.
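
As a sanity check, the sample value parses once the pattern also covers the 
fractional seconds; a sketch (assuming `import spark.implicits._` is in scope):
{code:java}
// With a pattern matching the full value there is no unparsed tail at index 19,
// so inference succeeds regardless of how many rows land on each partition.
val ok = spark.read.option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss.SSS")
  .option("inferSchema", true).csv(Seq("2884-06-24T02:45:51.138").toDS())
ok.show()
{code}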



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45433) CSV/JSON schema inference when timestamps do not match specified timestampFormat with only one row on each partition report error

2023-10-11 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-45433:


Assignee: Jia Fan

> CSV/JSON schema inference when timestamps do not match specified 
> timestampFormat with only one row on each partition report error
> -
>
> Key: SPARK-45433
> URL: https://issues.apache.org/jira/browse/SPARK-45433
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0, 3.5.0
>Reporter: Jia Fan
>Assignee: Jia Fan
>Priority: Major
>  Labels: pull-request-available
>
> CSV/JSON schema inference reports an error when timestamps do not match the 
> specified timestampFormat and there is only one row on each partition.
> {code:java}
> // e.g.
> val csv = spark.read.option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss")
>   .option("inferSchema", true).csv(Seq("2884-06-24T02:45:51.138").toDS())
> csv.show() {code}
> {code:java}
> // error
> Caused by: java.time.format.DateTimeParseException: Text 
> '2884-06-24T02:45:51.138' could not be parsed, unparsed text found at index 
> 19 {code}
> This bug affects 3.3/3.4/3.5. Unlike 
> https://issues.apache.org/jira/browse/SPARK-45424 , this is a different bug 
> with the same error message.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45483) Correct the function groups in connect.functions

2023-10-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45483.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43309
[https://github.com/apache/spark/pull/43309]

> Correct the function groups in connect.functions
> 
>
> Key: SPARK-45483
> URL: https://issues.apache.org/jira/browse/SPARK-45483
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45499) Replace `Reference#isEnqueued` with `Reference#refersTo`

2023-10-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45499.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43325
[https://github.com/apache/spark/pull/43325]

> Replace `Reference#isEnqueued` with `Reference#refersTo`
> 
>
> Key: SPARK-45499
> URL: https://issues.apache.org/jira/browse/SPARK-45499
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45499) Replace `Reference#isEnqueued` with `Reference#refersTo`

2023-10-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45499:
-

Assignee: Yang Jie

> Replace `Reference#isEnqueued` with `Reference#refersTo`
> 
>
> Key: SPARK-45499
> URL: https://issues.apache.org/jira/browse/SPARK-45499
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42881) Codegen Support for get_json_object

2023-10-11 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-42881.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 40506
[https://github.com/apache/spark/pull/40506]

> Codegen Support for get_json_object
> ---
>
> Key: SPARK-42881
> URL: https://issues.apache.org/jira/browse/SPARK-42881
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42881) Codegen Support for get_json_object

2023-10-11 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-42881:


Assignee: BingKun Pan

> Codegen Support for get_json_object
> ---
>
> Key: SPARK-42881
> URL: https://issues.apache.org/jira/browse/SPARK-42881
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45467) Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass`

2023-10-11 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-45467.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43291
[https://github.com/apache/spark/pull/43291]

> Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass`
> 
>
> Key: SPARK-45467
> URL: https://issues.apache.org/jira/browse/SPARK-45467
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code:java}
>  * @deprecated Proxy classes generated in a named module are encapsulated
>  *  and not accessible to code outside its module.
>  *  {@link Constructor#newInstance(Object...) Constructor.newInstance}
>  *  will throw {@code IllegalAccessException} when it is called on
>  *  an inaccessible proxy class.
>  *  Use {@link #newProxyInstance(ClassLoader, Class[], InvocationHandler)}
>  *  to create a proxy instance instead.
>  *
>  * @see Package and Module Membership of Proxy Class
>  * @revised 9
>  */
> @Deprecated
> @CallerSensitive
> public static Class<?> getProxyClass(ClassLoader loader,
>  Class<?>... interfaces)
> throws IllegalArgumentException {code}
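
A minimal sketch of the suggested replacement; the interface list and the 
no-op handler are placeholders, not Spark's actual usage:
{code:java}
import java.lang.reflect.{InvocationHandler, Method, Proxy}

// Build a proxy instance and take its class instead of calling the
// deprecated getProxyClass directly.
val handler: InvocationHandler = (_: AnyRef, _: Method, _: Array[AnyRef]) => null
val proxyClass: Class[_] = Proxy.newProxyInstance(
  getClass.getClassLoader, Array[Class[_]](classOf[Runnable]), handler).getClass
{code}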



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45467) Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass`

2023-10-11 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-45467:


Assignee: Yang Jie

> Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass`
> 
>
> Key: SPARK-45467
> URL: https://issues.apache.org/jira/browse/SPARK-45467
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
>  * @deprecated Proxy classes generated in a named module are encapsulated
>  *  and not accessible to code outside its module.
>  *  {@link Constructor#newInstance(Object...) Constructor.newInstance}
>  *  will throw {@code IllegalAccessException} when it is called on
>  *  an inaccessible proxy class.
>  *  Use {@link #newProxyInstance(ClassLoader, Class[], InvocationHandler)}
>  *  to create a proxy instance instead.
>  *
>  * @see Package and Module Membership of Proxy Class
>  * @revised 9
>  */
> @Deprecated
> @CallerSensitive
> public static Class<?> getProxyClass(ClassLoader loader,
>  Class<?>... interfaces)
> throws IllegalArgumentException {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45467) Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass`

2023-10-11 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-45467:
-
Priority: Minor  (was: Major)

> Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass`
> 
>
> Key: SPARK-45467
> URL: https://issues.apache.org/jira/browse/SPARK-45467
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code:java}
>  * @deprecated Proxy classes generated in a named module are encapsulated
>  *  and not accessible to code outside its module.
>  *  {@link Constructor#newInstance(Object...) Constructor.newInstance}
>  *  will throw {@code IllegalAccessException} when it is called on
>  *  an inaccessible proxy class.
>  *  Use {@link #newProxyInstance(ClassLoader, Class[], InvocationHandler)}
>  *  to create a proxy instance instead.
>  *
>  * @see Package and Module Membership of Proxy Class
>  * @revised 9
>  */
> @Deprecated
> @CallerSensitive
> public static Class getProxyClass(ClassLoader loader,
>  Class... interfaces)
> throws IllegalArgumentException {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45496) Fix the compilation warning related to other-pure-statement

2023-10-11 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-45496.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43312
[https://github.com/apache/spark/pull/43312]

> Fix the compilation warning related to other-pure-statement
> ---
>
> Key: SPARK-45496
> URL: https://issues.apache.org/jira/browse/SPARK-45496
> Project: Spark
>  Issue Type: Sub-task
>  Components: DStreams, Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code:java}
> "-Wconf:cat=other-match-analysis=org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction.catalogFunction:wv",
> "-Wconf:cat=other-pure-statement=org.apache.spark.streaming.util.FileBasedWriteAheadLog.readAll.readFile:wv",
> "-Wconf:cat=other-pure-statement=org.apache.spark.scheduler.OutputCommitCoordinatorSuite:wv",
> "-Wconf:cat=other-pure-statement=org.apache.spark.sql.streaming.sources.StreamingDataSourceV2Suite.testPositiveCase.\\$anonfun:wv",
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45496) Fix the compilation warning related to other-pure-statement

2023-10-11 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-45496:
-
Priority: Minor  (was: Major)

> Fix the compilation warning related to other-pure-statement
> ---
>
> Key: SPARK-45496
> URL: https://issues.apache.org/jira/browse/SPARK-45496
> Project: Spark
>  Issue Type: Sub-task
>  Components: DStreams, Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>
> {code:java}
> "-Wconf:cat=other-match-analysis=org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction.catalogFunction:wv",
> "-Wconf:cat=other-pure-statement=org.apache.spark.streaming.util.FileBasedWriteAheadLog.readAll.readFile:wv",
> "-Wconf:cat=other-pure-statement=org.apache.spark.scheduler.OutputCommitCoordinatorSuite:wv",
> "-Wconf:cat=other-pure-statement=org.apache.spark.sql.streaming.sources.StreamingDataSourceV2Suite.testPositiveCase.\\$anonfun:wv",
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43828) Add config to control whether close idle connection

2023-10-11 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-43828.
--
Resolution: Won't Fix

> Add config to control whether close idle connection
> ---
>
> Key: SPARK-43828
> URL: https://issues.apache.org/jira/browse/SPARK-43828
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Zhongwei Zhu
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45496) Fix the compilation warning related to other-pure-statement

2023-10-11 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-45496:


Assignee: Yang Jie

> Fix the compilation warning related to other-pure-statement
> ---
>
> Key: SPARK-45496
> URL: https://issues.apache.org/jira/browse/SPARK-45496
> Project: Spark
>  Issue Type: Sub-task
>  Components: DStreams, Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> "-Wconf:cat=other-match-analysis=org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction.catalogFunction:wv",
> "-Wconf:cat=other-pure-statement=org.apache.spark.streaming.util.FileBasedWriteAheadLog.readAll.readFile:wv",
> "-Wconf:cat=other-pure-statement=org.apache.spark.scheduler.OutputCommitCoordinatorSuite:wv",
> "-Wconf:cat=other-pure-statement=org.apache.spark.sql.streaming.sources.StreamingDataSourceV2Suite.testPositiveCase.\\$anonfun:wv",
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45496) Fix the compilation warning related to other-pure-statement

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45496:
---
Labels: pull-request-available  (was: )

> Fix the compilation warning related to other-pure-statement
> ---
>
> Key: SPARK-45496
> URL: https://issues.apache.org/jira/browse/SPARK-45496
> Project: Spark
>  Issue Type: Sub-task
>  Components: DStreams, Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> "-Wconf:cat=other-match-analysis=org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction.catalogFunction:wv",
> "-Wconf:cat=other-pure-statement=org.apache.spark.streaming.util.FileBasedWriteAheadLog.readAll.readFile:wv",
> "-Wconf:cat=other-pure-statement=org.apache.spark.scheduler.OutputCommitCoordinatorSuite:wv",
> "-Wconf:cat=other-pure-statement=org.apache.spark.sql.streaming.sources.StreamingDataSourceV2Suite.testPositiveCase.\\$anonfun:wv",
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45451) Make the default storage level of dataset cache configurable

2023-10-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-45451.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43259
[https://github.com/apache/spark/pull/43259]

> Make the default storage level of dataset cache configurable
> 
>
> Key: SPARK-45451
> URL: https://issues.apache.org/jira/browse/SPARK-45451
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45451) Make the default storage level of dataset cache configurable

2023-10-11 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-45451:
---

Assignee: XiDuo You

> Make the default storage level of dataset cache configurable
> 
>
> Key: SPARK-45451
> URL: https://issues.apache.org/jira/browse/SPARK-45451
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45116) Add some comment for param of JdbcDialect createTable

2023-10-11 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-45116.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42799
[https://github.com/apache/spark/pull/42799]

> Add some comment for param of JdbcDialect createTable
> -
>
> Key: SPARK-45116
> URL: https://issues.apache.org/jira/browse/SPARK-45116
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Jia Fan
>Assignee: Jia Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> SPARK-41516 added {{createTable}} to {{JdbcDialect}}, but did not add 
> comments for its parameters.
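
A hedged sketch of the kind of Scaladoc the ticket asks for; the parameter 
names are assumed from the SPARK-41516 change, not verified against the source:
{code:java}
/**
 * Creates the table with the given schema (illustrative Scaladoc only).
 *
 * @param statement The JDBC statement used to execute the CREATE TABLE DDL.
 * @param tableName The fully qualified, dialect-quoted name of the target table.
 * @param strSchema The table schema rendered as a DDL column list.
 * @param options   The JDBC write options in effect for this table.
 */
{code}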



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45116) Add some comment for param of JdbcDialect createTable

2023-10-11 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-45116:


Assignee: Jia Fan

> Add some comment for param of JdbcDialect createTable
> -
>
> Key: SPARK-45116
> URL: https://issues.apache.org/jira/browse/SPARK-45116
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Jia Fan
>Assignee: Jia Fan
>Priority: Minor
>  Labels: pull-request-available
>
> SPARK-41516 added {{createTable}} to {{JdbcDialect}}, but did not add 
> comments for its parameters.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45397) Add vector assembler feature transformer

2023-10-11 Thread Weichen Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weichen Xu resolved SPARK-45397.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43199
[https://github.com/apache/spark/pull/43199]

> Add vector assembler feature transformer
> 
>
> Key: SPARK-45397
> URL: https://issues.apache.org/jira/browse/SPARK-45397
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML, PySpark
>Affects Versions: 3.5.1
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add vector assembler feature transformer



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45428) Add Matomo analytics to all released docs pages

2023-10-11 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-45428:
-

Assignee: BingKun Pan

> Add Matomo analytics to all released docs pages
> ---
>
> Key: SPARK-45428
> URL: https://issues.apache.org/jira/browse/SPARK-45428
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Allison Wang
>Assignee: BingKun Pan
>Priority: Major
>
> Matomo analytics has been added to some pages of the Spark website. Here is 
> Sean's initial PR: 
> https://github.com/apache/spark-website/pull/479
> You can find analytics for the Spark website here: https://analytics.apache.org
> We need to add this to all API pages. This is very important for us to 
> prioritize documentation improvements and search engine optimization.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45397) Add vector assembler feature transformer

2023-10-11 Thread Weichen Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weichen Xu reassigned SPARK-45397:
--

Assignee: Weichen Xu

> Add vector assembler feature transformer
> 
>
> Key: SPARK-45397
> URL: https://issues.apache.org/jira/browse/SPARK-45397
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML, PySpark
>Affects Versions: 3.5.1
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>  Labels: pull-request-available
>
> Add vector assembler feature transformer



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45466) VectorAssembler should validate the vector values

2023-10-11 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-45466.
---
Resolution: Not A Problem

> VectorAssembler should validate the vector values
> -
>
> Key: SPARK-45466
> URL: https://issues.apache.org/jira/browse/SPARK-45466
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0

2023-10-11 Thread Sebastian Daberdaku (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767622#comment-17767622
 ] 

Sebastian Daberdaku edited comment on SPARK-45201 at 10/11/23 9:58 AM:
---

After spending hours analyzing the project pom files, I discovered two things.

First, the shade plugin is relocating the guava/failureaccess package twice in 
the connect jars (once by the module shade plugin, once by the base project 
plugin). I created a simple patch to prevent the relocation of failureaccess by 
the base plugin. I am adding the patch file [^spark-3.5.0.patch] to this Jira 
issue. I do not have time to create a pull request; you can apply the patch by 
navigating inside the source folder and running:
{code:java}
patch -p1 < spark-3.5.0.patch {code}

> NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
> 
>
> Key: SPARK-45201
> URL: https://issues.apache.org/jira/browse/SPARK-45201
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Sebastian Daberdaku
>Priority: Major
> Attachments: Dockerfile, spark-3.5.0.patch
>
>
> I am trying to compile Spark 3.5.0 and make a distribution that supports 
> Spark Connect and Kubernetes. The compilation seems to complete correctly, 
> but when I try to run the Spark Connect server on kubernetes I get a 
> "NoClassDefFoundError" as follows:
> {code:java}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3511)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3515)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079)
>     at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4011)
>     at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4034)
>     at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)
>     at 
> 

[jira] [Comment Edited] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0

2023-10-11 Thread Sebastian Daberdaku (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767622#comment-17767622
 ] 

Sebastian Daberdaku edited comment on SPARK-45201 at 10/11/23 9:58 AM:
---

After spending hours analyzing the project pom files, I discovered two things.

First, the shade plugin is relocating the guava/failureaccess package twice in 
the connect jars (once by the module shade plugin, once by the base project 
plugin). I created a simple patch to prevent the relocation of failureaccess by 
the base plugin. I am adding the patch file [^spark-3.5.0.patch] to this Jira 
issue. I do not have time to create a pull request; you can apply the patch by 
navigating inside the source folder and running:
patch -p1 < spark-3.5.0.patch

> NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
> 
>
> Key: SPARK-45201
> URL: https://issues.apache.org/jira/browse/SPARK-45201
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Sebastian Daberdaku
>Priority: Major
> Attachments: Dockerfile, spark-3.5.0.patch
>
>
> I am trying to compile Spark 3.5.0 and make a distribution that supports 
> Spark Connect and Kubernetes. The compilation seems to complete correctly, 
> but when I try to run the Spark Connect server on kubernetes I get a 
> "NoClassDefFoundError" as follows:
> {code:java}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3511)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3515)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079)
>     at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4011)
>     at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4034)
>     at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)
>     at 
> 

[jira] [Comment Edited] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0

2023-10-11 Thread Sebastian Daberdaku (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767622#comment-17767622
 ] 

Sebastian Daberdaku edited comment on SPARK-45201 at 10/11/23 9:58 AM:
---

After spending hours analyzing the project pom files, I discovered two things.

First, the shade plugin is relocating the guava/failureaccess package twice in 
the connect jars (once by the module shade plugin, once by the base project 
plugin). I created a simple patch to prevent the relocation of failureaccess by 
the base plugin. I am adding the patch file [^spark-3.5.0.patch] to this Jira 
issue. I do not have time to create a pull request; you can apply the patch by 
navigating inside the source folder and running:
{code:java}
patch -p1 < spark-3.5.0.patch {code}

> NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
> 
>
> Key: SPARK-45201
> URL: https://issues.apache.org/jira/browse/SPARK-45201
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Sebastian Daberdaku
>Priority: Major
> Attachments: Dockerfile, spark-3.5.0.patch
>
>
> I am trying to compile Spark 3.5.0 and make a distribution that supports 
> Spark Connect and Kubernetes. The compilation seems to complete correctly, 
> but when I try to run the Spark Connect server on kubernetes I get a 
> "NoClassDefFoundError" as follows:
> {code:java}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3511)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3515)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079)
>     at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4011)
>     at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4034)
>     at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)
>     at 
> 

[jira] [Commented] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0

2023-10-11 Thread xie shuiahu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773976#comment-17773976
 ] 

xie shuiahu commented on SPARK-45201:
-

[~sdaberdaku] I also have the same issue. I solved it by passing 
spark-connect.jar via spark-submit --jars instead of placing it in SPARK_HOME/jars.

> NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
> 
>
> Key: SPARK-45201
> URL: https://issues.apache.org/jira/browse/SPARK-45201
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Sebastian Daberdaku
>Priority: Major
> Attachments: Dockerfile, spark-3.5.0.patch
>
>
> I am trying to compile Spark 3.5.0 and make a distribution that supports 
> Spark Connect and Kubernetes. The compilation seems to complete correctly, 
> but when I try to run the Spark Connect server on kubernetes I get a 
> "NoClassDefFoundError" as follows:
> {code:java}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at java.base/java.lang.ClassLoader.defineClass1(Native Method)
>     at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
>     at 
> java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
>     at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
>     at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
>     at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3511)
>     at 
> org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3515)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168)
>     at 
> org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079)
>     at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4011)
>     at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4034)
>     at 
> org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010)
>     at 
> org.apache.spark.storage.BlockManagerId$.getCachedBlockManagerId(BlockManagerId.scala:146)
>     at 
> org.apache.spark.storage.BlockManagerId$.apply(BlockManagerId.scala:127)
>     at 
> org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:536)
>     at org.apache.spark.SparkContext.(SparkContext.scala:625)
>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2888)
>     at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1099)
>     at 

[jira] [Resolved] (SPARK-45469) Replace `toIterator` with `iterator` for `IterableOnce`

2023-10-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45469.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43295
[https://github.com/apache/spark/pull/43295]

> Replace `toIterator` with `iterator` for `IterableOnce`
> ---
>
> Key: SPARK-45469
> URL: https://issues.apache.org/jira/browse/SPARK-45469
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code:java}
> @deprecated("Use .iterator instead", "2.13.0")
> @`inline` def toIterator: Iterator[A] = it.iterator {code}
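
The replacement is mechanical, e.g.:
{code:java}
// Before (deprecated since Scala 2.13): Seq(1, 2, 3).toIterator
// After:
val it: Iterator[Int] = Seq(1, 2, 3).iterator
{code}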



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45502) Upgrade Kafka to 3.6.0

2023-10-11 Thread Deng Ziming (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773964#comment-17773964
 ] 

Deng Ziming commented on SPARK-45502:
-

I'm trying this. 

> Upgrade Kafka to 3.6.0
> --
>
> Key: SPARK-45502
> URL: https://issues.apache.org/jira/browse/SPARK-45502
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Apache Kafka 3.6.0 was released on Oct 10, 2023.
> - https://downloads.apache.org/kafka/3.6.0/RELEASE_NOTES.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45500) Show the number of abnormally completed drivers in MasterPage

2023-10-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45500.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43328
[https://github.com/apache/spark/pull/43328]

> Show the number of abnormally completed drivers in MasterPage
> -
>
> Key: SPARK-45500
> URL: https://issues.apache.org/jira/browse/SPARK-45500
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45500) Show the number of abnormally completed drivers in MasterPage

2023-10-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45500:
-

Assignee: Dongjoon Hyun

> Show the number of abnormally completed drivers in MasterPage
> -
>
> Key: SPARK-45500
> URL: https://issues.apache.org/jira/browse/SPARK-45500
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-44757) Vulnerabilities in Spark3.4

2023-10-11 Thread Laurenceau Julien (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773949#comment-17773949
 ] 

Laurenceau Julien edited comment on SPARK-44757 at 10/11/23 8:14 AM:
-

Hi,

In addition to this I'd like to add the following CVE:
h1. CVE-2022-1471 (High) detected in snakeyaml-1.33.jar

SnakeYaml's Constructor() class does not restrict types which can be 
instantiated during deserialization. Deserializing yaml content provided by an 
attacker can lead to remote code execution. We recommend using SnakeYaml's 
SafeConstructor when parsing untrusted content to restrict deserialization.

Publish Date: 2022-12-01

URL: [CVE-2022-1471|https://www.mend.io/vulnerability-database/CVE-2022-1471]
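
For consumers who cannot upgrade immediately, the recommended mitigation looks 
roughly like this (a sketch against the SnakeYaml 1.x API):
{code:java}
import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.constructor.SafeConstructor

// SafeConstructor restricts the types that can be instantiated during
// deserialization, closing the arbitrary-object-construction path behind
// CVE-2022-1471.
val yaml = new Yaml(new SafeConstructor())
val parsed: AnyRef = yaml.load("key: value")
{code}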


was (Author: julienlau):
Hi,

In addition to this I'd like to add the following high CVE:
h1. CVE-2022-1471 (High) detected in snakeyaml-1.33.jar

> Vulnerabilities in Spark3.4
> ---
>
> Key: SPARK-44757
> URL: https://issues.apache.org/jira/browse/SPARK-44757
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Anand Balasubramaniam
>Priority: Major
>
> We are seeing the below list of TPLs (third-party libraries) with 
> vulnerabilities bundled with the Spark 3.4 package, found via a StackRox scan. 
> Is there any ETA on fixing them? Kindly apprise us of the same.
> h2. Vulnerabilities in Spark3.4
> |*CVE*|*Description*|*Severity*|
> |CVE-2018-21234|Jodd before 5.0.4 performs Deserialization of Untrusted JSON 
> Data when setClassMetadataName is set.|CVSS Score: 9.8 Critical|
> |CVE-2022-42004|In FasterXML jackson-databind before 2.13.4, resource 
> exhaustion can occur because of a lack of a check in 
> BeanDeserializer._deserializeFromArray to prevent use of deeply nested 
> arrays. An application is vulnerable only with certain customized choices for 
> deserialization.|CVSS Score 7.5 Important|
> |CVE-2022-42003|In FasterXML jackson-databind before 2.14.0-rc1, resource 
> exhaustion can occur because of a lack of a check in primitive value 
> deserializers to avoid deep wrapper array nesting, when the 
> UNWRAP_SINGLE_VALUE_ARRAYS feature is enabled. Additional fix version in 
> 2.13.4.1 and 2.12.17.1|CVSS Score 7.5 Important|
> |CVE-2022-40152|Those using Woodstox to parse XML data may be vulnerable to 
> Denial of Service attacks (DOS) if DTD support is enabled. If the parser is 
> running on user supplied input, an attacker may supply content that causes 
> the parser to crash by stackoverflow. This effect may support a denial of 
> service attack.|CVSS Score 7.5 Important|
> |CVE-2022-3171|A parsing issue with binary data in protobuf-java core and 
> lite versions prior to 3.21.7, 3.20.3, 3.19.6 and 3.16.3 can lead to a denial 
> of service attack. Inputs containing multiple instances of non-repeated 
> embedded messages with repeated or unknown fields causes objects to be 
> converted back-n-forth between mutable and immutable forms, resulting in 
> potentially long garbage collection pauses. We recommend updating to the 
> versions mentioned above.|CVSS Score 7.5 Important|
> |CVE-2021-34538|Apache Hive before 3.1.3 "CREATE" and "DROP" function 
> operations does not check for necessary authorization of involved entities in 
> the query. It was found that an unauthorized user can manipulate an existing 
> UDF without having the privileges to do so. This allowed unauthorized or 
> underprivileged users to drop and recreate UDFs pointing them to new jars 
> that could be potentially malicious.|CVSS Score 7.5 Important|
> |CVE-2020-13949|In Apache Thrift 0.9.3 to 0.13.0, malicious RPC clients could 
> send short messages which would result in a large memory allocation, 
> potentially leading to denial of service.|CVSS Score 7.5 Important|
> |CVE-2018-10237|Unbounded memory allocation in Google Guava 11.0 through 24.x 
> before 24.1.1 allows remote attackers to conduct denial of service attacks 
> against servers that depend on this library and deserialize attacker-provided 
> data, because the AtomicDoubleArray class (when serialized with Java 
> serialization) and the CompoundOrdering class (when serialized with GWT 
> serialization) perform eager allocation without appropriate checks on what a 
> client has sent and whether the data size is reasonable.|CVSS 5.9 Moderate|
> |CVE-2021-22569|An issue in protobuf-java allowed the interleaving of 
> com.google.protobuf.UnknownFieldSet fields in such a way that would be 
> processed out of order. A small malicious payload can occupy the parser for 
> several minutes by creating large numbers of short-lived objects that cause 
> frequent, repeated pauses. We recommend upgrading libraries beyond the 
> vulnerable versions.|CVSS 5.9 Moderate|
> |CVE-2020-8908|A temp directory creation vulnerability exists in all versions 
> of Guava, 

[jira] [Commented] (SPARK-44757) Vulnerabilities in Spark3.4

2023-10-11 Thread Laurenceau Julien (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773949#comment-17773949
 ] 

Laurenceau Julien commented on SPARK-44757:
---

Hi,

In addition to this I'd like to add the following high CVE:
h1. CVE-2022-1471 (High) detected in snakeyaml-1.33.jar

> Vulnerabilities in Spark3.4
> ---
>
> Key: SPARK-44757
> URL: https://issues.apache.org/jira/browse/SPARK-44757
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Anand Balasubramaniam
>Priority: Major
>
> We are seeing the below list of TPLs (third-party libraries) with 
> vulnerabilities bundled with the Spark 3.4 package, found via a StackRox scan. 
> Is there any ETA on fixing them? Kindly apprise us of the same.
> h2. Vulnerabilities in Spark3.4
> |*CVE*|*Description*|*Severity*|
> |CVE-2018-21234|Jodd before 5.0.4 performs Deserialization of Untrusted JSON 
> Data when setClassMetadataName is set.|CVSS Score: 9.8 Critical|
> |CVE-2022-42004|In FasterXML jackson-databind before 2.13.4, resource 
> exhaustion can occur because of a lack of a check in 
> BeanDeserializer._deserializeFromArray to prevent use of deeply nested 
> arrays. An application is vulnerable only with certain customized choices for 
> deserialization.|CVSS Score 7.5Important|
> | CVE-2022-42003|In FasterXML jackson-databind before 2.14.0-rc1, resource 
> exhaustion can occur because of a lack of a check in primitive value 
> deserializers to avoid deep wrapper array nesting, when the 
> UNWRAP_SINGLE_VALUE_ARRAYS feature is enabled. Additional fix version in 
> 2.13.4.1 and 2.12.17.1|CVSS Score 7.5Important|
> |CVE-2022-40152|Those using Woodstox to parse XML data may be vulnerable to 
> Denial of Service attacks (DoS) if DTD support is enabled. If the parser is 
> running on user-supplied input, an attacker may supply content that causes 
> the parser to crash via stack overflow. This effect may support a denial of 
> service attack.|CVSS Score 7.5 Important|
> |CVE-2022-3171|A parsing issue with binary data in protobuf-java core and 
> lite versions prior to 3.21.7, 3.20.3, 3.19.6 and 3.16.3 can lead to a denial 
> of service attack. Inputs containing multiple instances of non-repeated 
> embedded messages with repeated or unknown fields cause objects to be 
> converted back and forth between mutable and immutable forms, resulting in 
> potentially long garbage collection pauses. We recommend updating to the 
> versions mentioned above.|CVSS Score 7.5 Important|
> |CVE-2021-34538|Apache Hive before 3.1.3 "CREATE" and "DROP" function 
> operations do not check for necessary authorization of involved entities in 
> the query. It was found that an unauthorized user can manipulate an existing 
> UDF without having the privileges to do so. This allowed unauthorized or 
> underprivileged users to drop and recreate UDFs, pointing them to new jars 
> that could be potentially malicious.|CVSS Score 7.5 Important|
> |CVE-2020-13949|In Apache Thrift 0.9.3 to 0.13.0, malicious RPC clients could 
> send short messages which would result in a large memory allocation, 
> potentially leading to denial of service.|CVSS Score 7.5 Important|
> |CVE-2018-10237|Unbounded memory allocation in Google Guava 11.0 through 24.x 
> before 24.1.1 allows remote attackers to conduct denial of service attacks 
> against servers that depend on this library and deserialize attacker-provided 
> data, because the AtomicDoubleArray class (when serialized with Java 
> serialization) and the CompoundOrdering class (when serialized with GWT 
> serialization) perform eager allocation without appropriate checks on what a 
> client has sent and whether the data size is reasonable.|CVSS Score 5.9 Moderate|
> |CVE-2021-22569|An issue in protobuf-java allowed the interleaving of 
> com.google.protobuf.UnknownFieldSet fields in such a way that would be 
> processed out of order. A small malicious payload can occupy the parser for 
> several minutes by creating large numbers of short-lived objects that cause 
> frequent, repeated pauses. We recommend upgrading libraries beyond the 
> vulnerable versions.|CVSS Score 5.9 Moderate|
> |CVE-2020-8908|A temp directory creation vulnerability exists in all versions 
> of Guava, allowing an attacker with access to the machine to potentially 
> access data in a temporary directory created by the Guava API 
> com.google.common.io.Files.createTempDir(). By default, on unix-like systems, 
> the created directory is world-readable (readable by an attacker with access 
> to the system). The method in question has been marked @Deprecated in 
> versions 30.0 and later and should not be used. For Android developers, we 
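
For reference, the usual JVM-side replacement for Guava's deprecated {{Files.createTempDir()}} is {{java.nio.file.Files.createTempDirectory}}; a minimal sketch (assuming Java 7+):

{code:scala}
import java.nio.file.Files

object TempDirExample {
  def main(args: Array[String]): Unit = {
    // Unlike Guava's deprecated Files.createTempDir(), createTempDirectory
    // creates the directory with owner-only (rwx------) permissions on
    // POSIX file systems, so other local users cannot read it.
    val dir = Files.createTempDirectory("spark-example-")
    println(dir)
  }
}
{code}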

[jira] [Resolved] (SPARK-45480) Selectable SQL Plan

2023-10-11 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-45480.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43307
[https://github.com/apache/spark/pull/43307]

> Selectable SQL Plan
> ---
>
> Key: SPARK-45480
> URL: https://issues.apache.org/jira/browse/SPARK-45480
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45480) Selectable SQL Plan

2023-10-11 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-45480:


Assignee: Kent Yao

> Selectable SQL Plan
> ---
>
> Key: SPARK-45480
> URL: https://issues.apache.org/jira/browse/SPARK-45480
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45497) Add a symbolic link file `spark-examples.jar` in K8s Docker images

2023-10-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45497:
-

Assignee: Dongjoon Hyun

> Add a symbolic link file `spark-examples.jar` in K8s Docker images
> --
>
> Key: SPARK-45497
> URL: https://issues.apache.org/jira/browse/SPARK-45497
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45497) Add a symbolic link file `spark-examples.jar` in K8s Docker images

2023-10-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45497.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43324
[https://github.com/apache/spark/pull/43324]

> Add a symbolic link file `spark-examples.jar` in K8s Docker images
> --
>
> Key: SPARK-45497
> URL: https://issues.apache.org/jira/browse/SPARK-45497
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45502) Upgrade Kafka to 3.6.0

2023-10-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-45502:
--
Summary: Upgrade Kafka to 3.6.0  (was: Upgrade to Kafka 3.6.0)

> Upgrade Kafka to 3.6.0
> --
>
> Key: SPARK-45502
> URL: https://issues.apache.org/jira/browse/SPARK-45502
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Apache Kafka 3.6.0 was released on Oct 10, 2023.
> - https://downloads.apache.org/kafka/3.6.0/RELEASE_NOTES.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44115) Upgrade Apache ORC to 2.0

2023-10-11 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44115:
--
Summary: Upgrade Apache ORC to 2.0  (was: Upgrade to Apache ORC 2.0)

> Upgrade Apache ORC to 2.0
> -
>
> Key: SPARK-44115
> URL: https://issues.apache.org/jira/browse/SPARK-44115
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> The Apache ORC community has the following release cycles, which are 
> synchronized with Apache Spark releases:
>  * ORC v2.0.0 (next year) for Apache Spark 4.0.x
>  * ORC v1.9.0 (this month) for Apache Spark 3.5.x
>  * ORC v1.8.x for Apache Spark 3.4.x
>  * ORC v1.7.x for Apache Spark 3.3.x
>  * ORC v1.6.x for Apache Spark 3.2.x



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45502) Upgrade to Kafka 3.6.0

2023-10-11 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45502:
-

 Summary: Upgrade to Kafka 3.6.0
 Key: SPARK-45502
 URL: https://issues.apache.org/jira/browse/SPARK-45502
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun


Apache Kafka 3.6.0 was released on Oct 10, 2023.

- https://downloads.apache.org/kafka/3.6.0/RELEASE_NOTES.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45501) Use pattern matching for type checking and conversion

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45501:
---
Labels: pull-request-available  (was: )

> Use pattern matching for type checking and conversion
> -
>
> Key: SPARK-45501
> URL: https://issues.apache.org/jira/browse/SPARK-45501
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>
> Refer to [JEP 394|https://openjdk.org/jeps/394]
> Example:
> {code:java}
> if (obj instanceof String) {
>     String str = (String) obj;
>     System.out.println(str);
> } {code}
> Can be replaced with
>  
> {code:java}
> if (obj instanceof String str) {
>     System.out.println(str);
> } {code}
> The new code looks more compact.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45501) Use pattern matching for type checking and conversion

2023-10-11 Thread Yang Jie (Jira)
Yang Jie created SPARK-45501:


 Summary: Use pattern matching for type checking and conversion
 Key: SPARK-45501
 URL: https://issues.apache.org/jira/browse/SPARK-45501
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie


Refer to [JEP 394|https://openjdk.org/jeps/394]

Example:
{code:java}
if (obj instanceof String) {
    String str = (String) obj;
    System.out.println(str);
} {code}

Can be replaced with

 
{code:java}
if (obj instanceof String str) {
    System.out.println(str);
} {code}
The new code looks more compact.
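
For comparison, the Scala side of the codebase already gets this flow-scoped binding from pattern matching; a rough analogue (a sketch, not code from the change):

{code:scala}
// Sketch: Scala's pattern match does the type test and the binding together,
// mirroring what JEP 394 adds to Java's instanceof.
def printIfString(obj: Any): Unit = obj match {
  case str: String => println(str) // str is bound only when the test passes
  case _           => ()           // other types: nothing to do
}
{code}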



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45500) Show the number of abnormally completed drivers in MasterPage

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45500:
---
Labels: pull-request-available  (was: )

> Show the number of abnormally completed drivers in MasterPage
> -
>
> Key: SPARK-45500
> URL: https://issues.apache.org/jira/browse/SPARK-45500
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45500) Show the number of abnormally completed drivers in MasterPage

2023-10-11 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-45500:
-

 Summary: Show the number of abnormally completed drivers in 
MasterPage
 Key: SPARK-45500
 URL: https://issues.apache.org/jira/browse/SPARK-45500
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, Web UI
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice

2023-10-11 Thread Zhizhen Hou (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773921#comment-17773921
 ] 

Zhizhen Hou commented on SPARK-45478:
-

There are three children in If: predicate, trueValue and falseValue.

There are two execution paths: 1) predicate and trueValue; 2) predicate and 
falseValue.

There are three combinations of possible common subexpressions: 1) predicate 
and trueValue; 2) predicate and falseValue; 3) trueValue and falseValue.

So if all possible common subexpressions are eliminated, two of the three 
combinations improve performance. For example, if there is a common 
subexpression in the predicate and falseValue, it is executed only once, which 
improves performance. Only a common subexpression shared by trueValue and 
falseValue brings no improvement, and it does not hurt performance either, 
since exactly one of trueValue and falseValue will be executed.

So it looks good to check all three children of If, as in the sketch below. 
Any suggestions?
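
Concretely, the variant under discussion would look like this (a sketch of the quoted {{childrenToRecurse}} below, assuming the surrounding Catalyst definitions; whether recursing into both branches is always safe is exactly the open question):

{code:scala}
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
  case _: CodegenFallback => Nil
  // Also recurse into both branches, so a subexpression shared by the
  // predicate and either branch (or by both branches) can be eliminated.
  case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil
  case c: CaseWhen => c.children.head :: Nil
  case c: Coalesce => c.children.head :: Nil
  case other => other.children
}
{code}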

> codegen sum(decimal_column / 2) computes div twice
> --
>
> Key: SPARK-45478
> URL: https://issues.apache.org/jira/browse/SPARK-45478
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Zhizhen Hou
>Priority: Minor
>
> *The SQL to reproduce the result*
> {code:java}
> create table t_dec (c1 decimal(6,2));
> insert into t_dec values(1.0),(2.0),(null),(3.0);
> explain codegen select sum(c1/2) from t_dec; {code}
>  
> *Reasons that may cause the result:* 
>  
> Function sum uses an If expression in updateExpressions:
>  `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
>  
> The three children of the If expression look like this:
> {code:java}
> predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true))
> trueValue: input[0, decimal(26,6), true]
> falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
> In subexpression elimination, only the predicate is recursed into, per 
> EquivalentExpressions#childrenToRecurse:
> {code:java}
> private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match 
> {
>   case _: CodegenFallback => Nil
>   case i: If => i.predicate :: Nil
>   case c: CaseWhen => c.children.head :: Nil
>   case c: Coalesce => c.children.head :: Nil
>   case other => other.children
> } {code}
> I tried to replace `case i: If => i.predicate :: Nil` with `case i: If => 
> i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the 
> correct result.
>  
> But the following comment in `childrenToRecurse` makes me unsure whether it 
> will cause any other problems.
> {code:java}
> // 2. If: common subexpressions will always be evaluated at the beginning, 
> but the true and
> // false expressions in `If` may not get accessed, according to the predicate
> // expression. We should only recurse into the predicate expression. {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45498) Followup: Ignore task completion from old stage after retrying indeterminate stages

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45498:
---
Labels: pull-request-available  (was: )

> Followup: Ignore task completion from old stage after retrying indeterminate 
> stages
> ---
>
> Key: SPARK-45498
> URL: https://issues.apache.org/jira/browse/SPARK-45498
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Mayur Bhosale
>Priority: Minor
>  Labels: pull-request-available
>
> With SPARK-45182, we added a fix to prevent laggard tasks from older 
> attempts of an indeterminate stage from marking the partition as completed 
> in the map output tracker.
> When a task completes, the DAG scheduler also notifies all the task sets of 
> the stage that the partition is completed. Task sets will then not schedule 
> such a task if it is not already scheduled. This is not correct for an 
> indeterminate stage, since we want to re-run all the tasks on a re-attempt.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14745) CEP support in Spark Streaming

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-14745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-14745:
---
Labels: pull-request-available  (was: )

> CEP support in Spark Streaming
> --
>
> Key: SPARK-14745
> URL: https://issues.apache.org/jira/browse/SPARK-14745
> Project: Spark
>  Issue Type: New Feature
>  Components: DStreams
>Reporter: Mario Briggs
>Priority: Major
>  Labels: pull-request-available
> Attachments: SparkStreamingCEP.pdf
>
>
> Complex Event Processing (CEP) is an often-used feature in streaming 
> applications. Spark Streaming currently does not have a DSL/API for it. This 
> JIRA is about how and what we can add in Spark Streaming to support CEP out 
> of the box.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45499) Replace `Reference#isEnqueued` with `Reference#refersTo`

2023-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45499:
---
Labels: pull-request-available  (was: )

> Replace `Reference#isEnqueued` with `Reference#refersTo`
> 
>
> Key: SPARK-45499
> URL: https://issues.apache.org/jira/browse/SPARK-45499
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45498) Followup: Ignore task completion from old stage after retrying indeterminate stages

2023-10-11 Thread Mayur Bhosale (Jira)
Mayur Bhosale created SPARK-45498:
-

 Summary: Followup: Ignore task completion from old stage after 
retrying indeterminate stages
 Key: SPARK-45498
 URL: https://issues.apache.org/jira/browse/SPARK-45498
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 4.0.0, 3.5.1
Reporter: Mayur Bhosale


With SPARK-45182, we added a fix to prevent laggard tasks from older attempts 
of an indeterminate stage from marking the partition as completed in the map 
output tracker.

When a task completes, the DAG scheduler also notifies all the task sets of 
the stage that the partition is completed. Task sets will then not schedule 
such a task if it is not already scheduled. This is not correct for an 
indeterminate stage, since we want to re-run all the tasks on a re-attempt.
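
A minimal sketch of the guard this implies (hypothetical names, not Spark's actual scheduler code):

{code:scala}
object IndeterminateStageGuard {
  // Decide whether a finishing task may mark its partition as completed.
  def shouldMarkPartitionCompleted(taskStageAttempt: Int,
                                   latestStageAttempt: Int,
                                   stageIsIndeterminate: Boolean): Boolean = {
    // A laggard task from an old attempt may finish after the stage was
    // retried; its output is only reusable if the stage is deterministic.
    taskStageAttempt == latestStageAttempt || !stageIsIndeterminate
  }

  def main(args: Array[String]): Unit = {
    // Old-attempt task of an indeterminate stage: ignore its completion.
    println(shouldMarkPartitionCompleted(0, 1, stageIsIndeterminate = true))  // false
    // Same laggard on a deterministic stage: its output is still valid.
    println(shouldMarkPartitionCompleted(0, 1, stageIsIndeterminate = false)) // true
  }
}
{code}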



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45499) Replace `Reference#isEnqueued` with `Reference#refersTo`

2023-10-11 Thread Yang Jie (Jira)
Yang Jie created SPARK-45499:


 Summary: Replace `Reference#isEnqueued` with `Reference#refersTo`
 Key: SPARK-45499
 URL: https://issues.apache.org/jira/browse/SPARK-45499
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, Tests
Affects Versions: 4.0.0
Reporter: Yang Jie
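
A minimal sketch of the migration the title refers to (assuming Java 16+, where {{Reference#refersTo}} was introduced and {{Reference#isEnqueued}} was deprecated):

{code:scala}
import java.lang.ref.{ReferenceQueue, WeakReference}

object RefersToExample {
  def main(args: Array[String]): Unit = {
    val queue = new ReferenceQueue[Object]()
    var obj: Object = new Object
    val ref = new WeakReference[Object](obj, queue)
    obj = null // drop the strong reference so the referent can be collected
    System.gc()
    // Old (deprecated since Java 16): ref.isEnqueued
    // New: ask whether the referent has been cleared.
    println(ref.refersTo(null))
  }
}
{code}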






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org