[jira] [Updated] (SPARK-45502) Upgrade Kafka to 3.6.0
[ https://issues.apache.org/jira/browse/SPARK-45502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45502: --- Labels: pull-request-available (was: ) > Upgrade Kafka to 3.6.0 > -- > > Key: SPARK-45502 > URL: https://issues.apache.org/jira/browse/SPARK-45502 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > Apache Kafka 3.6.0 was released on Oct 10, 2023. > - https://downloads.apache.org/kafka/3.6.0/RELEASE_NOTES.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45442) Refine docstring of `DataFrame.show`
[ https://issues.apache.org/jira/browse/SPARK-45442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-45442: - Assignee: Allison Wang > Refine docstring of `DataFrame.show` > > > Key: SPARK-45442 > URL: https://issues.apache.org/jira/browse/SPARK-45442 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > Refine docstring of `DataFrame.show()`
[jira] [Resolved] (SPARK-45442) Refine docstring of `DataFrame.show`
[ https://issues.apache.org/jira/browse/SPARK-45442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-45442. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43252 [https://github.com/apache/spark/pull/43252] > Refine docstring of `DataFrame.show` > > > Key: SPARK-45442 > URL: https://issues.apache.org/jira/browse/SPARK-45442 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Refine docstring of `DataFrame.show()`
[jira] [Updated] (SPARK-45510) Replace `scala.collection.generic.Growable` to `scala.collection.mutable.Growable`
[ https://issues.apache.org/jira/browse/SPARK-45510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45510: --- Labels: pull-request-available (was: ) > Replace `scala.collection.generic.Growable` to > `scala.collection.mutable.Growable` > -- > > Key: SPARK-45510 > URL: https://issues.apache.org/jira/browse/SPARK-45510 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jia Fan >Priority: Major > Labels: pull-request-available > > Replace `scala.collection.generic.Growable` to > `scala.collection.mutable.Growable`
[jira] [Created] (SPARK-45510) Replace `scala.collection.generic.Growable` to `scala.collection.mutable.Growable`
Jia Fan created SPARK-45510: --- Summary: Replace `scala.collection.generic.Growable` to `scala.collection.mutable.Growable` Key: SPARK-45510 URL: https://issues.apache.org/jira/browse/SPARK-45510 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Jia Fan Replace `scala.collection.generic.Growable` to `scala.collection.mutable.Growable`
[jira] [Resolved] (SPARK-45402) Add API for 'analyze' method to return a buffer to be consumed on each class creation
[ https://issues.apache.org/jira/browse/SPARK-45402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-45402. --- Fix Version/s: 4.0.0 Assignee: Daniel Resolution: Fixed Issue resolved by pull request 43204 https://github.com/apache/spark/pull/43204 > Add API for 'analyze' method to return a buffer to be consumed on each class > creation > - > > Key: SPARK-45402 > URL: https://issues.apache.org/jira/browse/SPARK-45402 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-45508) Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 9+
[ https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45508: --- Labels: pull-request-available (was: ) > Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can > access cleaner on Java 9+ > -- > > Key: SPARK-45508 > URL: https://issues.apache.org/jira/browse/SPARK-45508 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Josh Rosen >Priority: Major > Labels: pull-request-available > > We need to add `--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED` to our > JVM options so that the code in `org.apache.spark.unsafe.Platform` can access > the JDK internal cleaner classes.
[jira] [Updated] (SPARK-45508) Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 9+
[ https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-45508: --- Description: We need to add `--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED` to our JVM options so that the code in `org.apache.spark.unsafe.Platform` can access the JDK internal cleaner classes. (was: We need to update the ``` val f = classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD") f.setAccessible(true) f.get(null) ``` returning `null` instead of a method.) > Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can > access cleaner on Java 9+ > -- > > Key: SPARK-45508 > URL: https://issues.apache.org/jira/browse/SPARK-45508 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Josh Rosen >Priority: Major > > We need to add `--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED` to our > JVM options so that the code in `org.apache.spark.unsafe.Platform` can access > the JDK internal cleaner classes.
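Until a release carries this fix, the same flag can be supplied per deployment through Spark's standard extra-JVM-options settings. A minimal sketch (my own illustration, not taken from the ticket) for `spark-defaults.conf`; note that these properties replace, rather than append to, any options already set there:

```
# spark-defaults.conf (sketch): open java.base/jdk.internal.ref to unnamed
# modules so org.apache.spark.unsafe.Platform can reach the internal Cleaner
spark.driver.extraJavaOptions    --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
spark.executor.extraJavaOptions  --add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
```

The equivalent on a one-off run is `spark-submit --conf spark.driver.extraJavaOptions=...` with the same value.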
[jira] [Updated] (SPARK-45508) Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 9+
[ https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-45508: --- Summary: Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 9+ (was: Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 11+) > Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can > access cleaner on Java 9+ > -- > > Key: SPARK-45508 > URL: https://issues.apache.org/jira/browse/SPARK-45508 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Josh Rosen >Priority: Major > > We need to update the > > ``` > val f = > classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD") > f.setAccessible(true) > f.get(null) > ``` > returning `null` instead of a method.
[jira] [Updated] (SPARK-45508) Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 11+
[ https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-45508: --- Summary: Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 11+ (was: Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 9+) > Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can > access cleaner on Java 11+ > --- > > Key: SPARK-45508 > URL: https://issues.apache.org/jira/browse/SPARK-45508 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Josh Rosen >Priority: Major > > In JDK >= 9.b110, the code at > [https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213] > hits a fallback path because we are using the wrong cleaner class name: > `jdk.internal.ref.Cleaner` was removed in > [https://bugs.openjdk.org/browse/JDK-8149925] > This can be verified via > > ``` > val f = > classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD") > f.setAccessible(true) > f.get(null) > ``` > returning `null` instead of a method.
[jira] [Updated] (SPARK-45508) Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 11+
[ https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-45508: --- Description: We need to update the ``` val f = classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD") f.setAccessible(true) f.get(null) ``` returning `null` instead of a method. was: In JDK >= 9.b110, the code at [https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213] hits a fallback path because we are using the wrong cleaner class name: `jdk.internal.ref.Cleaner` was removed in [https://bugs.openjdk.org/browse/JDK-8149925] This can be verified via ``` val f = classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD") f.setAccessible(true) f.get(null) ``` returning `null` instead of a method. > Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can > access cleaner on Java 11+ > --- > > Key: SPARK-45508 > URL: https://issues.apache.org/jira/browse/SPARK-45508 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Josh Rosen >Priority: Major > > We need to update the > > ``` > val f = > classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD") > f.setAccessible(true) > f.get(null) > ``` > returning `null` instead of a method.
[jira] [Updated] (SPARK-45508) Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 9+
[ https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-45508: --- Summary: Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can access cleaner on Java 9+ (was: org.apache.spark.unsafe.Platform uses wrong cleaner class name in JDK 9.b110+) > Add "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" so Platform can > access cleaner on Java 9+ > -- > > Key: SPARK-45508 > URL: https://issues.apache.org/jira/browse/SPARK-45508 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Josh Rosen >Priority: Major > > In JDK >= 9.b110, the code at > [https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213] > hits a fallback path because we are using the wrong cleaner class name: > `jdk.internal.ref.Cleaner` was removed in > [https://bugs.openjdk.org/browse/JDK-8149925] > This can be verified via > > ``` > val f = > classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD") > f.setAccessible(true) > f.get(null) > ``` > returning `null` instead of a method.
[jira] [Updated] (SPARK-45509) Investigate the behavior difference in self-join
[ https://issues.apache.org/jira/browse/SPARK-45509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated SPARK-45509: - Description: SPARK-45220 discovers a behavior difference for a self-join scenario between classic Spark and Spark Connect. For instance, here is the query that works without Spark Connect: {code:java} df = spark.createDataFrame([Row(name="Alice", age=2), Row(name="Bob", age=5)]) df2 = spark.createDataFrame([Row(name="Tom", height=80), Row(name="Bob", height=85)]){code} {code:java} joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) joined.show(){code} But in Spark Connect, it throws this exception: {code:java} pyspark.errors.exceptions.connect.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `name` cannot be resolved. Did you mean one of the following? [`name`, `name`, `age`, `height`].; 'Sort ['name DESC NULLS LAST], true +- Join FullOuter, (name#64 = name#78) :- LocalRelation [name#64, age#65L] +- LocalRelation [name#78, height#79L] {code} On the other hand, this query failed in classic Spark: {code:java} df.join(df, df.name == df.name, "outer").select(df.name).show() {code} {code:java} pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are ambiguous... {code} but this query works with Spark Connect. We need to investigate the behavior difference and fix it. was: SPARK-45220 discovers a behavior difference for a self-join scenario between classic Spark and Spark Connect. For instance, here is the query that works without Spark Connect: {code:java} joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) joined.show(){code} But in Spark Connect, it throws this exception: {code:java} pyspark.errors.exceptions.connect.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `name` cannot be resolved. Did you mean one of the following?
[`name`, `name`, `age`, `height`].; 'Sort ['name DESC NULLS LAST], true +- Join FullOuter, (name#64 = name#78) :- LocalRelation [name#64, age#65L] +- LocalRelation [name#78, height#79L] {code} On the other hand, this query failed in classic Spark: {code:java} df.join(df, df.name == df.name, "outer").select(df.name).show() {code} {code:java} pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are ambiguous... {code} but this query works with Spark Connect. We need to investigate the behavior difference and fix it. > Investigate the behavior difference in self-join > > > Key: SPARK-45509 > URL: https://issues.apache.org/jira/browse/SPARK-45509 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Allison Wang >Priority: Major > > SPARK-45220 discovers a behavior difference for a self-join scenario between > classic Spark and Spark Connect. > For instance, here is the query that works without Spark Connect: > {code:java} > df = spark.createDataFrame([Row(name="Alice", age=2), Row(name="Bob", age=5)]) > df2 = spark.createDataFrame([Row(name="Tom", height=80), Row(name="Bob", > height=85)]){code} > {code:java} > joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) > joined.show(){code} > But in Spark Connect, it throws this exception: > {code:java} > pyspark.errors.exceptions.connect.AnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter > with name `name` cannot be resolved. Did you mean one of the following?
> [`name`, `name`, `age`, `height`].; > 'Sort ['name DESC NULLS LAST], true > +- Join FullOuter, (name#64 = name#78) >:- LocalRelation [name#64, age#65L] >+- LocalRelation [name#78, height#79L] > {code} > > On the other hand, this query failed in classic Spark: > {code:java} > df.join(df, df.name == df.name, "outer").select(df.name).show() {code} > {code:java} > pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are > ambiguous... {code} > > but this query works with Spark Connect. > We need to investigate the behavior difference and fix it.
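The "ambiguous column" half of the report is easy to model outside Spark: a join concatenates the two input schemas, so a self-join produces two output columns named `name`, and a bare by-name lookup has no unique answer. The toy resolver below, in plain Python, illustrates why an analyzer must reject such a reference; it is not Spark's analyzer, and `AmbiguousColumnError` is a made-up name.

```python
class AmbiguousColumnError(Exception):
    """Raised when a bare column name matches more than one join output."""


def resolve(schema, name):
    """Return the position of `name` in a joined schema, failing the way an
    analyzer must when zero or several columns match."""
    hits = [i for i, col in enumerate(schema) if col == name]
    if not hits:
        raise KeyError(f"Column {name} cannot be resolved")
    if len(hits) > 1:
        raise AmbiguousColumnError(
            f"Column {name} is ambiguous: positions {hits}")
    return hits[0]


# df.join(df2, ...) concatenates the schemas, so a self-join duplicates `name`
joined_schema = ["name", "age"] + ["name", "height"]

print(resolve(joined_schema, "age"))   # unique match: resolves
try:
    resolve(joined_schema, "name")     # two matches: must be rejected
except AmbiguousColumnError as err:
    print(err)
```

Classic Spark can sometimes escape this trap because `df.name` carries a reference to the originating plan rather than a bare string; whether Connect should preserve that reference, and which side's behavior is correct in each direction, is exactly what this ticket asks to be investigated.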
[jira] [Updated] (SPARK-45509) Investigate the behavior difference in self-join
[ https://issues.apache.org/jira/browse/SPARK-45509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated SPARK-45509: - Description: SPARK-45220 discovers a behavior difference for a self-join scenario between classic Spark and Spark Connect. For instance, here is the query that works without Spark Connect: {code:java} joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) joined.show(){code} But in Spark Connect, it throws this exception: {code:java} pyspark.errors.exceptions.connect.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `name` cannot be resolved. Did you mean one of the following? [`name`, `name`, `age`, `height`].; 'Sort ['name DESC NULLS LAST], true +- Join FullOuter, (name#64 = name#78) :- LocalRelation [name#64, age#65L] +- LocalRelation [name#78, height#79L] {code} On the other hand, this query failed in classic Spark: {code:java} df.join(df, df.name == df.name, "outer").select(df.name).show() {code} {code:java} pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are ambiguous... {code} but this query works with Spark Connect. We need to investigate the behavior difference and fix it. was: SPARK-45220 discovers a behavior difference for a self-join scenario between classic Spark and Spark Connect. For instance, here is the query that works without Spark Connect: {code:java} joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) joined.show(){code} But in Spark Connect, it throws this exception: {code:java} pyspark.errors.exceptions.connect.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `name` cannot be resolved. Did you mean one of the following?
[`name`, `name`, `age`, `height`].; 'Sort ['name DESC NULLS LAST], true +- Join FullOuter, (name#64 = name#78) :- LocalRelation [name#64, age#65L] +- LocalRelation [name#78, height#79L] {code} On the other hand, this query failed in classic Spark: {code:java} df.join(df, df.name == df.name, "outer").select(df.name).show() {code} {code:java} pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are ambiguous... {code} but this query works with Spark Connect. We need to investigate the behavior difference and fix it. > Investigate the behavior difference in self-join > > > Key: SPARK-45509 > URL: https://issues.apache.org/jira/browse/SPARK-45509 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Allison Wang >Priority: Major > > SPARK-45220 discovers a behavior difference for a self-join scenario between > classic Spark and Spark Connect. > For instance, here is the query that works without Spark Connect: > {code:java} > joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) > joined.show(){code} > But in Spark Connect, it throws this exception: > {code:java} > pyspark.errors.exceptions.connect.AnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter > with name `name` cannot be resolved. Did you mean one of the following? > [`name`, `name`, `age`, `height`].; > 'Sort ['name DESC NULLS LAST], true > +- Join FullOuter, (name#64 = name#78) >:- LocalRelation [name#64, age#65L] >+- LocalRelation [name#78, height#79L] > {code} > > On the other hand, this query failed in classic Spark: > {code:java} > df.join(df, df.name == df.name, "outer").select(df.name).show() {code} > {code:java} > pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are > ambiguous... {code} > > but this query works with Spark Connect. > We need to investigate the behavior difference and fix it.
[jira] [Updated] (SPARK-45509) Investigate the behavior difference in self-join
[ https://issues.apache.org/jira/browse/SPARK-45509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated SPARK-45509: - Description: SPARK-45220 discovers a behavior difference for a self-join scenario between classic Spark and Spark Connect. For instance, here is the query that works without Spark Connect: {code:java} joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) joined.show(){code} But in Spark Connect, it throws this exception: {code:java} pyspark.errors.exceptions.connect.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `name` cannot be resolved. Did you mean one of the following? [`name`, `name`, `age`, `height`].; 'Sort ['name DESC NULLS LAST], true +- Join FullOuter, (name#64 = name#78) :- LocalRelation [name#64, age#65L] +- LocalRelation [name#78, height#79L] {code} On the other hand, this query failed in classic Spark: {code:java} df.join(df, df.name == df.name, "outer").select(df.name).show() {code} {code:java} pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are ambiguous... {code} but this query works with Spark Connect. We need to investigate the behavior difference and fix it. was: SPARK-45220 discovers a behavior difference for a self-join scenario between classic Spark and Spark Connect. For instance, here is the query that works without Spark Connect: {code:java} joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) joined.show(){code} But in Spark Connect, it throws this exception: {code:java} pyspark.errors.exceptions.connect.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `name` cannot be resolved. Did you mean one of the following?
[`name`, `name`, `age`, `height`].; 'Sort ['name DESC NULLS LAST], true +- Join FullOuter, (name#64 = name#78) :- LocalRelation [name#64, age#65L] +- LocalRelation [name#78, height#79L] {code} On the other hand, this query failed in classic Spark: {code:java} df.join(df, df.name == df.name, "outer").select(df.name).show() {code} {code:java} pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are ambiguous... {code} but this query works with Spark Connect. We need to investigate the behavior difference and fix it. > Investigate the behavior difference in self-join > > > Key: SPARK-45509 > URL: https://issues.apache.org/jira/browse/SPARK-45509 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Allison Wang >Priority: Major > > SPARK-45220 discovers a behavior difference for a self-join scenario between > classic Spark and Spark Connect. > For instance, here is the query that works without Spark Connect: > > {code:java} > joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) > joined.show(){code} > > But in Spark Connect, it throws this exception: > > {code:java} > pyspark.errors.exceptions.connect.AnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter > with name `name` cannot be resolved. Did you mean one of the following? > [`name`, `name`, `age`, `height`].; > 'Sort ['name DESC NULLS LAST], true > +- Join FullOuter, (name#64 = name#78) >:- LocalRelation [name#64, age#65L] >+- LocalRelation [name#78, height#79L] > {code} > > On the other hand, this query failed in classic Spark: > > {code:java} > df.join(df, df.name == df.name, "outer").select(df.name).show() {code} > {code:java} > pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are > ambiguous... {code} > > but this query works with Spark Connect. > We need to investigate the behavior difference and fix it.
[jira] [Updated] (SPARK-45508) org.apache.spark.unsafe.Platform uses wrong cleaner class name in JDK 9.b110+
[ https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-45508: --- Description: In JDK >= 9.b110, the code at [https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213] hits a fallback path because we are using the wrong cleaner class name: `jdk.internal.ref.Cleaner` was removed in [https://bugs.openjdk.org/browse/JDK-8149925] This can be verified via ``` val f = classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD") f.setAccessible(true) f.get(null) ``` returning `null` instead of a method. was: In JDK >= 9.b110, the code at [https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213] hits a fallback path because we are using the wrong cleaner class name: `jdk.internal.ref.Cleaner` was removed in JDK-8149925 [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8149925] This can be verified via ``` val f = classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD") f.setAccessible(true) f.get(null) ``` returning `null` instead of a method.
> org.apache.spark.unsafe.Platform uses wrong cleaner class name in JDK 9.b110+ > - > > Key: SPARK-45508 > URL: https://issues.apache.org/jira/browse/SPARK-45508 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Josh Rosen >Priority: Major > > In JDK >= 9.b110, the code at > [https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213] > hits a fallback path because we are using the wrong cleaner class name: > `jdk.internal.ref.Cleaner` was removed in > [https://bugs.openjdk.org/browse/JDK-8149925] > This can be verified via > > ``` > val f = > classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD") > f.setAccessible(true) > f.get(null) > ``` > returning `null` instead of a method.
[jira] [Updated] (SPARK-45508) org.apache.spark.unsafe.Platform uses wrong cleaner class name in JDK 9.b110+
[ https://issues.apache.org/jira/browse/SPARK-45508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-45508: --- Description: In JDK >= 9.b110, the code at [https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213] hits a fallback path because we are using the wrong cleaner class name: `jdk.internal.ref.Cleaner` was removed in JDK-8149925 [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8149925] This can be verified via ``` val f = classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD") f.setAccessible(true) f.get(null) ``` returning `null` instead of a method. was: In JDK 11+, the code at [https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213] hits a fallback path because we are using the wrong cleaner class name. This can be verified via ``` val f = classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD") f.setAccessible(true) f.get(null) ``` returning `null` instead of a method.
Summary: org.apache.spark.unsafe.Platform uses wrong cleaner class name in JDK 9.b110+ (was: org.apache.spark.unsafe.Platform uses wrong cleaner class name in JDK 11+) > org.apache.spark.unsafe.Platform uses wrong cleaner class name in JDK 9.b110+ > - > > Key: SPARK-45508 > URL: https://issues.apache.org/jira/browse/SPARK-45508 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Josh Rosen >Priority: Major > > In JDK >= 9.b110, the code at > [https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213] > hits a fallback path because we are using the wrong cleaner class name: > `jdk.internal.ref.Cleaner` was removed in JDK-8149925 > [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8149925] > This can be verified via > > ``` > val f = > classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD") > f.setAccessible(true) > f.get(null) > ``` > returning `null` instead of a method.
[jira] [Created] (SPARK-45509) Investigate the behavior difference in self-join
Allison Wang created SPARK-45509: Summary: Investigate the behavior difference in self-join Key: SPARK-45509 URL: https://issues.apache.org/jira/browse/SPARK-45509 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.5.0, 4.0.0 Reporter: Allison Wang SPARK-45220 discovers a behavior difference for a self-join scenario between classic Spark and Spark Connect. For instance, here is the query that works without Spark Connect: {code:java} joined = df.join(df2, df.name == df2.name, "outer").sort(sf.desc(df.name)) joined.show(){code} But in Spark Connect, it throws this exception: {code:java} pyspark.errors.exceptions.connect.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `name` cannot be resolved. Did you mean one of the following? [`name`, `name`, `age`, `height`].; 'Sort ['name DESC NULLS LAST], true +- Join FullOuter, (name#64 = name#78) :- LocalRelation [name#64, age#65L] +- LocalRelation [name#78, height#79L] {code} On the other hand, this query failed in classic Spark: {code:java} df.join(df, df.name == df.name, "outer").select(df.name).show() {code} {code:java} pyspark.errors.exceptions.captured.AnalysisException: Column name#0 are ambiguous... {code} but this query works with Spark Connect. We need to investigate the behavior difference and fix it.
[jira] [Created] (SPARK-45508) org.apache.spark.unsafe.Platform uses wrong cleaner class name in JDK 11+
Josh Rosen created SPARK-45508: -- Summary: org.apache.spark.unsafe.Platform uses wrong cleaner class name in JDK 11+ Key: SPARK-45508 URL: https://issues.apache.org/jira/browse/SPARK-45508 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.5.0 Reporter: Josh Rosen In JDK 11+, the code at [https://github.com/apache/spark/blob/v3.5.0/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java#L213] hits a fallback path because we are using the wrong cleaner class name. This can be verified via ``` val f = classOf[org.apache.spark.unsafe.Platform].getDeclaredField("CLEANER_CREATE_METHOD") f.setAccessible(true) f.get(null) ``` returning `null` instead of a method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45487) Replace: _LEGACY_ERROR_TEMP_3007
[ https://issues.apache.org/jira/browse/SPARK-45487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45487: --- Labels: pull-request-available (was: ) > Replace: _LEGACY_ERROR_TEMP_3007 > > > Key: SPARK-45487 > URL: https://issues.apache.org/jira/browse/SPARK-45487 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Priority: Major > Labels: pull-request-available > > def checkpointRDDBlockIdNotFoundError(rddBlockId: RDDBlockId): Throwable = \{ > new SparkException( > errorClass = "_LEGACY_ERROR_TEMP_3007", > messageParameters = Map("rddBlockId" -> s"$rddBlockId"), > cause = null > ) > } > This error condition appears to be quite common, so we should convert it to a > proper error class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45507) Correctness bug in correlated scalar subqueries with COUNT aggregates
[ https://issues.apache.org/jira/browse/SPARK-45507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Lam updated SPARK-45507: - Description: {code:java} create view if not exists t1(a1, a2) as values (0, 1), (1, 2); create view if not exists t2(b1, b2) as values (0, 2), (0, 3); create view if not exists t3(c1, c2) as values (0, 2), (0, 3); -- Example 1 select ( select SUM(l.cnt + r.cnt) from (select count(*) cnt from t2 where t1.a1 = t2.b1 having cnt = 0) l join (select count(*) cnt from t3 where t1.a1 = t3.c1 having cnt = 0) r on l.cnt = r.cnt ) from t1 -- Correct answer: (null, 0) +--+ |scalarsubquery(c1, c1)| +--+ |null | |null | +--+ -- Example 2 select ( select sum(cnt) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1 -- Correct answer: (2, 0) +--+ |scalarsubquery(c1)| +--+ |2 | |null | +--+ -- Example 3 select ( select count(*) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1 -- Correct answer: (1, 1) +--+ |scalarsubquery(c1)| +--+ |1 | |0 | +--+ {code} DB fiddle for correctness check:[https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/10403#] was: {code:java} create view if not exists t1(a1, a2) as values (0, 1), (1, 2); create view if not exists t2(b1, b2) as values (0, 2), (0, 3); create view if not exists t3(c1, c2) as values (0, 2), (0, 3); -- Example 1 (has having clause) select ( select SUM(l.cnt + r.cnt) from (select count(*) cnt from t2 where t1.a1 = t2.b1 having cnt = 0) l join (select count(*) cnt from t3 where t1.a1 = t3.c1 having cnt = 0) r on l.cnt = r.cnt ) from t1 -- Correct answer: (null, 0) +--+ |scalarsubquery(c1, c1)| +--+ |null | |null | +--+ -- Example 2 select ( select sum(cnt) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1 -- Correct answer: (2, 0) +--+ |scalarsubquery(c1)| +--+ |2 | |null | +--+ -- Example 3 select ( select count(*) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1 -- Correct answer: (1, 1) +--+ |scalarsubquery(c1)| +--+ |1 | |0 | 
+--+ {code} DB fiddle for correctness check:[https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/10403#] > Correctness bug in correlated scalar subqueries with COUNT aggregates > - > > Key: SPARK-45507 > URL: https://issues.apache.org/jira/browse/SPARK-45507 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Andy Lam >Priority: Major > Labels: pull-request-available > > {code:java} > > create view if not exists t1(a1, a2) as values (0, 1), (1, 2); > create view if not exists t2(b1, b2) as values (0, 2), (0, 3); > create view if not exists t3(c1, c2) as values (0, 2), (0, 3); > -- Example 1 > select ( > select SUM(l.cnt + r.cnt) > from (select count(*) cnt from t2 where t1.a1 = t2.b1 having cnt = 0) l > join (select count(*) cnt from t3 where t1.a1 = t3.c1 having cnt = 0) r > on l.cnt = r.cnt > ) from t1 > -- Correct answer: (null, 0) > +--+ > |scalarsubquery(c1, c1)| > +--+ > |null | > |null | > +--+ > -- Example 2 > select ( select sum(cnt) from (select count(*) cnt from t2 where t1.c1 = > t2.c1) ) from t1 > -- Correct answer: (2, 0) > +--+ > |scalarsubquery(c1)| > +--+ > |2 | > |null | > +--+ > -- Example 3 > select ( select count(*) from (select count(*) cnt from t2 where t1.c1 = > t2.c1) ) from t1 > -- Correct answer: (1, 1) > +--+ > |scalarsubquery(c1)| > +--+ > |1 | > |0 | > +--+ {code} > > > DB fiddle for correctness > check:[https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/10403#] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45507) Correctness bug in correlated scalar subqueries with COUNT aggregates
[ https://issues.apache.org/jira/browse/SPARK-45507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45507: --- Labels: pull-request-available (was: ) > Correctness bug in correlated scalar subqueries with COUNT aggregates > - > > Key: SPARK-45507 > URL: https://issues.apache.org/jira/browse/SPARK-45507 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Andy Lam >Priority: Major > Labels: pull-request-available > > {code:java} > > create view if not exists t1(a1, a2) as values (0, 1), (1, 2); > create view if not exists t2(b1, b2) as values (0, 2), (0, 3); > create view if not exists t3(c1, c2) as values (0, 2), (0, 3); > -- Example 1 (has having clause) > select ( > select SUM(l.cnt + r.cnt) > from (select count(*) cnt from t2 where t1.a1 = t2.b1 having cnt = 0) l > join (select count(*) cnt from t3 where t1.a1 = t3.c1 having cnt = 0) r > on l.cnt = r.cnt > ) from t1 > -- Correct answer: (null, 0) > +--+ > |scalarsubquery(c1, c1)| > +--+ > |null | > |null | > +--+ > -- Example 2 > select ( select sum(cnt) from (select count(*) cnt from t2 where t1.c1 = > t2.c1) ) from t1 > -- Correct answer: (2, 0) > +--+ > |scalarsubquery(c1)| > +--+ > |2 | > |null | > +--+ > -- Example 3 > select ( select count(*) from (select count(*) cnt from t2 where t1.c1 = > t2.c1) ) from t1 > -- Correct answer: (1, 1) > +--+ > |scalarsubquery(c1)| > +--+ > |1 | > |0 | > +--+ {code} > > > DB fiddle for correctness > check:[https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/10403#] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45507) Correctness bug in correlated scalar subqueries with COUNT aggregates
Andy Lam created SPARK-45507: Summary: Correctness bug in correlated scalar subqueries with COUNT aggregates Key: SPARK-45507 URL: https://issues.apache.org/jira/browse/SPARK-45507 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Andy Lam {code:java} create view if not exists t1(a1, a2) as values (0, 1), (1, 2); create view if not exists t2(b1, b2) as values (0, 2), (0, 3); create view if not exists t3(c1, c2) as values (0, 2), (0, 3); -- Example 1 (has having clause) select ( select SUM(l.cnt + r.cnt) from (select count(*) cnt from t2 where t1.a1 = t2.b1 having cnt = 0) l join (select count(*) cnt from t3 where t1.a1 = t3.c1 having cnt = 0) r on l.cnt = r.cnt ) from t1 -- Correct answer: (null, 0) +--+ |scalarsubquery(c1, c1)| +--+ |null | |null | +--+ -- Example 2 select ( select sum(cnt) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1 -- Correct answer: (2, 0) +--+ |scalarsubquery(c1)| +--+ |2 | |null | +--+ -- Example 3 select ( select count(*) from (select count(*) cnt from t2 where t1.c1 = t2.c1) ) from t1 -- Correct answer: (1, 1) +--+ |scalarsubquery(c1)| +--+ |1 | |0 | +--+ {code} DB fiddle for correctness check:[https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/10403#] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45506) Support ivy URIs in SparkConnect addArtifact
Vsevolod Stepanov created SPARK-45506: - Summary: Support ivy URIs in SparkConnect addArtifact Key: SPARK-45506 URL: https://issues.apache.org/jira/browse/SPARK-45506 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 4.0.0 Reporter: Vsevolod Stepanov Right now Spark Connect's addArtifact API supports only adding .jar and .class files. It would be useful to extend this API to support adding arbitrary Maven artifacts using Ivy. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45221) Refine docstring of `DataFrameReader.parquet`
[ https://issues.apache.org/jira/browse/SPARK-45221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45221. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43301 [https://github.com/apache/spark/pull/43301] > Refine docstring of `DataFrameReader.parquet` > - > > Key: SPARK-45221 > URL: https://issues.apache.org/jira/browse/SPARK-45221 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Refine the docstring of read parquet -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45221) Refine docstring of `DataFrameReader.parquet`
[ https://issues.apache.org/jira/browse/SPARK-45221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45221: Assignee: Allison Wang > Refine docstring of `DataFrameReader.parquet` > - > > Key: SPARK-45221 > URL: https://issues.apache.org/jira/browse/SPARK-45221 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > Refine the docstring of read parquet -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45505) Refactor analyzeInPython function to make it reusable
[ https://issues.apache.org/jira/browse/SPARK-45505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45505: --- Labels: pull-request-available (was: ) > Refactor analyzeInPython function to make it reusable > - > > Key: SPARK-45505 > URL: https://issues.apache.org/jira/browse/SPARK-45505 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Refactor analyzeInPython method in UserDefinedPythonTableFunction object into > an abstract class so that it can be reused in the future. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43396) Add config to control max ratio of decommissioning executors
[ https://issues.apache.org/jira/browse/SPARK-43396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43396: --- Labels: pull-request-available (was: ) > Add config to control max ratio of decommissioning executors > > > Key: SPARK-43396 > URL: https://issues.apache.org/jira/browse/SPARK-43396 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Zhongwei Zhu >Priority: Major > Labels: pull-request-available > > Decommission too many executors at the same time with shuffle or rdd > migration could severely hurt performance of shuffle fetch. Block manager > decommissioner try to migrate shuffle or rdd as soon as possible, this will > compete network and disk IO with shuffle fetch in the target executor. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42584) Improve output of Column.explain
[ https://issues.apache.org/jira/browse/SPARK-42584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42584: --- Labels: pull-request-available (was: ) > Improve output of Column.explain > > > Key: SPARK-42584 > URL: https://issues.apache.org/jira/browse/SPARK-42584 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > Labels: pull-request-available > > We currently display the structure of the proto in both the regular and > extended version of explain. We should display a more compact sql-a-like > string for the regular version. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45505) Refactor analyzeInPython function to make it reusable
Allison Wang created SPARK-45505: Summary: Refactor analyzeInPython function to make it reusable Key: SPARK-45505 URL: https://issues.apache.org/jira/browse/SPARK-45505 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Refactor analyzeInPython method in UserDefinedPythonTableFunction object into an abstract class so that it can be reused in the future. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45504) RocksDB State Store Should Lower RocksDB Background Thread CPU Priority
[ https://issues.apache.org/jira/browse/SPARK-45504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45504: --- Labels: pull-request-available (was: ) > RocksDB State Store Should Lower RocksDB Background Thread CPU Priority > --- > > Key: SPARK-45504 > URL: https://issues.apache.org/jira/browse/SPARK-45504 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.5.1 >Reporter: Siying Dong >Priority: Minor > Labels: pull-request-available > > We can move RocksDB flush and compaction to a lower CPU priority. They are > usually background tasks and don't need to compete with task execution. In the > case where a task waits for some RocksDB background work to finish, such > as checkpointing, or waiting for async checkpointing to complete, the task slot > is idle anyway, so the low-priority threads are still likely to get enough CPU. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45504) RocksDB State Store Should Lower RocksDB Background Thread CPU Priority
Siying Dong created SPARK-45504: --- Summary: RocksDB State Store Should Lower RocksDB Background Thread CPU Priority Key: SPARK-45504 URL: https://issues.apache.org/jira/browse/SPARK-45504 Project: Spark Issue Type: Task Components: Structured Streaming Affects Versions: 3.5.1 Reporter: Siying Dong We can move RocksDB flush and compaction to a lower CPU priority. They are usually background tasks and don't need to compete with task execution. In the case where a task waits for some RocksDB background work to finish, such as checkpointing, or waiting for async checkpointing to complete, the task slot is idle anyway, so the low-priority threads are still likely to get enough CPU. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45503) RocksDB State Store to Use LZ4 Compression
[ https://issues.apache.org/jira/browse/SPARK-45503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45503: --- Labels: pull-request-available (was: ) > RocksDB State Store to Use LZ4 Compression > -- > > Key: SPARK-45503 > URL: https://issues.apache.org/jira/browse/SPARK-45503 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 3.5.1 >Reporter: Siying Dong >Priority: Minor > Labels: pull-request-available > > LZ4 is generally faster than Snappy, which is probably why we use LZ4 in > changelogs and other places by default. However, we don't change RocksDB's > default compression style of Snappy. The RocksDB team recommends LZ4 or ZSTD; > the default is kept as Snappy only for backward compatibility. We should use > LZ4 instead. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45503) RocksDB State Store to Use LZ4 Compression
Siying Dong created SPARK-45503: --- Summary: RocksDB State Store to Use LZ4 Compression Key: SPARK-45503 URL: https://issues.apache.org/jira/browse/SPARK-45503 Project: Spark Issue Type: Task Components: Structured Streaming Affects Versions: 3.5.1 Reporter: Siying Dong LZ4 is generally faster than Snappy, which is probably why we use LZ4 in changelogs and other places by default. However, we don't change RocksDB's default compression style of Snappy. The RocksDB team recommends LZ4 or ZSTD; the default is kept as Snappy only for backward compatibility. We should use LZ4 instead. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45490) Replace: _LEGACY_ERROR_TEMP_2151 with a proper error class
[ https://issues.apache.org/jira/browse/SPARK-45490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Serge Rielau resolved SPARK-45490. -- Resolution: Cannot Reproduce Seems to have been implemented as: EXPRESSION_DECODING_FAILED > Replace: _LEGACY_ERROR_TEMP_2151 with a proper error class > -- > > Key: SPARK-45490 > URL: https://issues.apache.org/jira/browse/SPARK-45490 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Serge Rielau >Priority: Major > > {code:java} > def expressionDecodingError(e: Exception, expressions: Seq[Expression]): > SparkRuntimeException = { > new SparkRuntimeException( > errorClass = "_LEGACY_ERROR_TEMP_2151", > messageParameters = Map( > "e" -> e.toString(), > "expressions" -> expressions.map( > _.simpleString(SQLConf.get.maxToStringFields)).mkString("\n")), > cause = e) > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44855) Small tweaks to attaching ExecuteGrpcResponseSender to ExecuteResponseObserver
[ https://issues.apache.org/jira/browse/SPARK-44855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-44855. --- Fix Version/s: 4.0.0 Assignee: Juliusz Sompolski Resolution: Fixed > Small tweaks to attaching ExecuteGrpcResponseSender to ExecuteResponseObserver > -- > > Key: SPARK-44855 > URL: https://issues.apache.org/jira/browse/SPARK-44855 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Small improvements can be made to the way new ExecuteGrpcResponseSender is > attached to observer. > * Since now we have addGrpcResponseSender in ExecuteHolder, it should be > ExecuteHolder responsibility to interrupt the old sender and that there is > only one at a time, and to ExecuteResponseObserver's responsibility > * executeObserver is used as a lock for synchronization. An explicit lock > object could be better. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45502) Upgrade Kafka to 3.6.0
[ https://issues.apache.org/jira/browse/SPARK-45502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774147#comment-17774147 ] Dongjoon Hyun commented on SPARK-45502: --- Thank you for volunteering, [~dengziming]. BTW, the Apache Spark community respects the first PR. We intentionally don't have any locking or assignee system. The first PR will be reviewed first, and a committer will set the assignee of this Jira when the PR is merged. > Upgrade Kafka to 3.6.0 > -- > > Key: SPARK-45502 > URL: https://issues.apache.org/jira/browse/SPARK-45502 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > Apache Kafka 3.6.0 is released on Oct 10, 2023. > - https://downloads.apache.org/kafka/3.6.0/RELEASE_NOTES.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45433) CSV/JSON schema inference when timestamps do not match specified timestampFormat with only one row on each partition report error
[ https://issues.apache.org/jira/browse/SPARK-45433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-45433. -- Fix Version/s: 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 43243 [https://github.com/apache/spark/pull/43243] > CSV/JSON schema inference when timestamps do not match specified > timestampFormat with only one row on each partition report error > - > > Key: SPARK-45433 > URL: https://issues.apache.org/jira/browse/SPARK-45433 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.4.0, 3.5.0 >Reporter: Jia Fan >Assignee: Jia Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.5.1, 4.0.0 > > > CSV/JSON schema inference reports an error when timestamps do not match the > specified timestampFormat and there is only one row on each partition. > {code:java} > // e.g. > val csv = spark.read.option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss") > .option("inferSchema", true).csv(Seq("2884-06-24T02:45:51.138").toDS()) > csv.show() {code} > {code:java} > // error > Caused by: java.time.format.DateTimeParseException: Text > '2884-06-24T02:45:51.138' could not be parsed, unparsed text found at index > 19 {code} > This bug affects 3.3/3.4/3.5. Unlike > https://issues.apache.org/jira/browse/SPARK-45424 , this is a different bug, > but it has the same error message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
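The underlying parse failure is reproducible with the plain JDK time API, independent of Spark. A minimal sketch, assuming the pattern `yyyy-MM-dd'T'HH:mm:ss` from the report: the pattern consumes the first 19 characters, and the trailing fractional seconds `.138` have no matching pattern letters, so strict parsing fails with "unparsed text found at index 19".

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class UnparsedTextSketch {
    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss");
        try {
            // "2884-06-24T02:45:51" matches the pattern; ".138" is left over.
            LocalDateTime.parse("2884-06-24T02:45:51.138", fmt);
        } catch (DateTimeParseException e) {
            // The exception message reports where the unparsed text begins.
            System.out.println(e.getMessage().contains("index 19"));
        }
    }
}
```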
[jira] [Assigned] (SPARK-45433) CSV/JSON schema inference when timestamps do not match specified timestampFormat with only one row on each partition report error
[ https://issues.apache.org/jira/browse/SPARK-45433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-45433: Assignee: Jia Fan > CSV/JSON schema inference when timestamps do not match specified > timestampFormat with only one row on each partition report error > - > > Key: SPARK-45433 > URL: https://issues.apache.org/jira/browse/SPARK-45433 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.4.0, 3.5.0 >Reporter: Jia Fan >Assignee: Jia Fan >Priority: Major > Labels: pull-request-available > > CSV/JSON schema inference reports an error when timestamps do not match the > specified timestampFormat and there is only one row on each partition. > {code:java} > // e.g. > val csv = spark.read.option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss") > .option("inferSchema", true).csv(Seq("2884-06-24T02:45:51.138").toDS()) > csv.show() {code} > {code:java} > // error > Caused by: java.time.format.DateTimeParseException: Text > '2884-06-24T02:45:51.138' could not be parsed, unparsed text found at index > 19 {code} > This bug affects 3.3/3.4/3.5. Unlike > https://issues.apache.org/jira/browse/SPARK-45424 , this is a different bug, > but it has the same error message. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45483) Correct the function groups in connect.functions
[ https://issues.apache.org/jira/browse/SPARK-45483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45483. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43309 [https://github.com/apache/spark/pull/43309] > Correct the function groups in connect.functions > > > Key: SPARK-45483 > URL: https://issues.apache.org/jira/browse/SPARK-45483 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45499) Replace `Reference#isEnqueued` with `Reference#refersTo`
[ https://issues.apache.org/jira/browse/SPARK-45499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45499. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43325 [https://github.com/apache/spark/pull/43325] > Replace `Reference#isEnqueued` with `Reference#refersTo` > > > Key: SPARK-45499 > URL: https://issues.apache.org/jira/browse/SPARK-45499 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45499) Replace `Reference#isEnqueued` with `Reference#refersTo`
[ https://issues.apache.org/jira/browse/SPARK-45499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45499: - Assignee: Yang Jie > Replace `Reference#isEnqueued` with `Reference#refersTo` > > > Key: SPARK-45499 > URL: https://issues.apache.org/jira/browse/SPARK-45499 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
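For context, a minimal sketch of the JDK 16+ `Reference#refersTo` API that this subtask migrates to (illustrative only, not Spark's actual test code): it tests a reference's referent directly, without the `get()`/`isEnqueued()` dance that can interfere with garbage collection.

```java
import java.lang.ref.WeakReference;

public class RefersToSketch {
    public static void main(String[] args) {
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<>(referent);

        // refersTo (JDK 16+) checks the referent without strengthening it
        // the way get() does, so it is safe to use in GC-sensitive tests.
        System.out.println(ref.refersTo(referent)); // referent still held
        System.out.println(ref.refersTo(null));     // not yet cleared by GC
    }
}
```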
[jira] [Resolved] (SPARK-42881) Codegen Support for get_json_object
[ https://issues.apache.org/jira/browse/SPARK-42881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-42881. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 40506 [https://github.com/apache/spark/pull/40506] > Codegen Support for get_json_object > --- > > Key: SPARK-42881 > URL: https://issues.apache.org/jira/browse/SPARK-42881 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42881) Codegen Support for get_json_object
[ https://issues.apache.org/jira/browse/SPARK-42881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-42881: Assignee: BingKun Pan > Codegen Support for get_json_object > --- > > Key: SPARK-42881 > URL: https://issues.apache.org/jira/browse/SPARK-42881 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45467) Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass`
[ https://issues.apache.org/jira/browse/SPARK-45467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-45467. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43291 [https://github.com/apache/spark/pull/43291] > Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass` > > > Key: SPARK-45467 > URL: https://issues.apache.org/jira/browse/SPARK-45467 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > * @deprecated Proxy classes generated in a named module are encapsulated > * and not accessible to code outside its module. > * {@link Constructor#newInstance(Object...) Constructor.newInstance} > * will throw {@code IllegalAccessException} when it is called on > * an inaccessible proxy class. > * Use {@link #newProxyInstance(ClassLoader, Class[], InvocationHandler)} > * to create a proxy instance instead. > * > * @see Package and Module Membership of Proxy Class > * @revised 9 > */ > @Deprecated > @CallerSensitive > public static Class getProxyClass(ClassLoader loader, > Class... interfaces) > throws IllegalArgumentException {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45467) Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass`
[ https://issues.apache.org/jira/browse/SPARK-45467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-45467: Assignee: Yang Jie > Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass` > > > Key: SPARK-45467 > URL: https://issues.apache.org/jira/browse/SPARK-45467 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > > {code:java} > * @deprecated Proxy classes generated in a named module are encapsulated > * and not accessible to code outside its module. > * {@link Constructor#newInstance(Object...) Constructor.newInstance} > * will throw {@code IllegalAccessException} when it is called on > * an inaccessible proxy class. > * Use {@link #newProxyInstance(ClassLoader, Class[], InvocationHandler)} > * to create a proxy instance instead. > * > * @see Package and Module Membership of Proxy Class > * @revised 9 > */ > @Deprecated > @CallerSensitive > public static Class getProxyClass(ClassLoader loader, > Class... interfaces) > throws IllegalArgumentException {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45467) Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass`
[ https://issues.apache.org/jira/browse/SPARK-45467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-45467: - Priority: Minor (was: Major) > Replace `Proxy.getProxyClass()` with `Proxy.newProxyInstance().getClass` > > > Key: SPARK-45467 > URL: https://issues.apache.org/jira/browse/SPARK-45467 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > * @deprecated Proxy classes generated in a named module are encapsulated > * and not accessible to code outside its module. > * {@link Constructor#newInstance(Object...) Constructor.newInstance} > * will throw {@code IllegalAccessException} when it is called on > * an inaccessible proxy class. > * Use {@link #newProxyInstance(ClassLoader, Class[], InvocationHandler)} > * to create a proxy instance instead. > * > * @see Package and Module Membership of Proxy Class > * @revised 9 > */ > @Deprecated > @CallerSensitive > public static Class getProxyClass(ClassLoader loader, > Class... interfaces) > throws IllegalArgumentException {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
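The replacement named in the ticket title can be sketched in plain Java as follows; the class name `ProxyMigration` is illustrative, not from the Spark patch:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyMigration {
    public static void main(String[] args) {
        ClassLoader loader = ProxyMigration.class.getClassLoader();
        InvocationHandler noop = (proxy, method, methodArgs) -> null;

        // Before (deprecated since JDK 9; per the javadoc above, proxy classes
        // generated in a named module are encapsulated and inaccessible):
        // Class<?> proxyClass = Proxy.getProxyClass(loader, Runnable.class);

        // After: create a proxy instance and take its runtime class.
        Class<?> proxyClass = Proxy
            .newProxyInstance(loader, new Class<?>[] {Runnable.class}, noop)
            .getClass();

        System.out.println(Proxy.isProxyClass(proxyClass));              // true
        System.out.println(Runnable.class.isAssignableFrom(proxyClass)); // true
    }
}
```

Both forms yield a class implementing the given interfaces; the second additionally guarantees the class is one the caller can actually instantiate.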
[jira] [Resolved] (SPARK-45496) Fix the compilation warning related to other-pure-statement
[ https://issues.apache.org/jira/browse/SPARK-45496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-45496. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43312 [https://github.com/apache/spark/pull/43312] > Fix the compilation warning related to other-pure-statement > --- > > Key: SPARK-45496 > URL: https://issues.apache.org/jira/browse/SPARK-45496 > Project: Spark > Issue Type: Sub-task > Components: DStreams, Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > "-Wconf:cat=other-match-analysis=org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction.catalogFunction:wv", > "-Wconf:cat=other-pure-statement=org.apache.spark.streaming.util.FileBasedWriteAheadLog.readAll.readFile:wv", > "-Wconf:cat=other-pure-statement=org.apache.spark.scheduler.OutputCommitCoordinatorSuite:wv", > "-Wconf:cat=other-pure-statement=org.apache.spark.sql.streaming.sources.StreamingDataSourceV2Suite.testPositiveCase.\\$anonfun:wv", > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45496) Fix the compilation warning related to other-pure-statement
[ https://issues.apache.org/jira/browse/SPARK-45496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-45496: - Priority: Minor (was: Major) > Fix the compilation warning related to other-pure-statement > --- > > Key: SPARK-45496 > URL: https://issues.apache.org/jira/browse/SPARK-45496 > Project: Spark > Issue Type: Sub-task > Components: DStreams, Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > > {code:java} > "-Wconf:cat=other-match-analysis=org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction.catalogFunction:wv", > "-Wconf:cat=other-pure-statement=org.apache.spark.streaming.util.FileBasedWriteAheadLog.readAll.readFile:wv", > "-Wconf:cat=other-pure-statement=org.apache.spark.scheduler.OutputCommitCoordinatorSuite:wv", > "-Wconf:cat=other-pure-statement=org.apache.spark.sql.streaming.sources.StreamingDataSourceV2Suite.testPositiveCase.\\$anonfun:wv", > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43828) Add config to control whether close idle connection
[ https://issues.apache.org/jira/browse/SPARK-43828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-43828. -- Resolution: Won't Fix > Add config to control whether close idle connection > --- > > Key: SPARK-43828 > URL: https://issues.apache.org/jira/browse/SPARK-43828 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Zhongwei Zhu >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45496) Fix the compilation warning related to other-pure-statement
[ https://issues.apache.org/jira/browse/SPARK-45496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-45496: Assignee: Yang Jie > Fix the compilation warning related to other-pure-statement > --- > > Key: SPARK-45496 > URL: https://issues.apache.org/jira/browse/SPARK-45496 > Project: Spark > Issue Type: Sub-task > Components: DStreams, Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > > {code:java} > "-Wconf:cat=other-match-analysis=org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction.catalogFunction:wv", > "-Wconf:cat=other-pure-statement=org.apache.spark.streaming.util.FileBasedWriteAheadLog.readAll.readFile:wv", > "-Wconf:cat=other-pure-statement=org.apache.spark.scheduler.OutputCommitCoordinatorSuite:wv", > "-Wconf:cat=other-pure-statement=org.apache.spark.sql.streaming.sources.StreamingDataSourceV2Suite.testPositiveCase.\\$anonfun:wv", > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45496) Fix the compilation warning related to other-pure-statement
[ https://issues.apache.org/jira/browse/SPARK-45496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45496: --- Labels: pull-request-available (was: ) > Fix the compilation warning related to other-pure-statement > --- > > Key: SPARK-45496 > URL: https://issues.apache.org/jira/browse/SPARK-45496 > Project: Spark > Issue Type: Sub-task > Components: DStreams, Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > {code:java} > "-Wconf:cat=other-match-analysis=org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction.catalogFunction:wv", > "-Wconf:cat=other-pure-statement=org.apache.spark.streaming.util.FileBasedWriteAheadLog.readAll.readFile:wv", > "-Wconf:cat=other-pure-statement=org.apache.spark.scheduler.OutputCommitCoordinatorSuite:wv", > "-Wconf:cat=other-pure-statement=org.apache.spark.sql.streaming.sources.StreamingDataSourceV2Suite.testPositiveCase.\\$anonfun:wv", > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45451) Make the default storage level of dataset cache configurable
[ https://issues.apache.org/jira/browse/SPARK-45451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-45451. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43259 [https://github.com/apache/spark/pull/43259] > Make the default storage level of dataset cache configurable > > > Key: SPARK-45451 > URL: https://issues.apache.org/jira/browse/SPARK-45451 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45451) Make the default storage level of dataset cache configurable
[ https://issues.apache.org/jira/browse/SPARK-45451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-45451: --- Assignee: XiDuo You > Make the default storage level of dataset cache configurable > > > Key: SPARK-45451 > URL: https://issues.apache.org/jira/browse/SPARK-45451 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
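A sketch of how such a knob might be set once available; the property name below is inferred from the linked pull request (43259) and should be verified against the merged change:

```
# spark-defaults.conf -- hypothetical usage; verify the exact property name
# (assumed here: spark.sql.defaultCacheStorageLevel) against PR 43259
spark.sql.defaultCacheStorageLevel   MEMORY_AND_DISK
```

Before this change, `Dataset.cache()` always used the hard-coded default storage level.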
[jira] [Resolved] (SPARK-45116) Add some comment for param of JdbcDialect createTable
[ https://issues.apache.org/jira/browse/SPARK-45116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-45116. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42799 [https://github.com/apache/spark/pull/42799] > Add some comment for param of JdbcDialect createTable > - > > Key: SPARK-45116 > URL: https://issues.apache.org/jira/browse/SPARK-45116 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Jia Fan >Assignee: Jia Fan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Since SPARK-41516 , add {{createTable}} to {{{}JdbcDialect{}}}. But doesn't > add comment for param. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45116) Add some comment for param of JdbcDialect createTable
[ https://issues.apache.org/jira/browse/SPARK-45116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-45116: Assignee: Jia Fan > Add some comment for param of JdbcDialect createTable > - > > Key: SPARK-45116 > URL: https://issues.apache.org/jira/browse/SPARK-45116 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Jia Fan >Assignee: Jia Fan >Priority: Minor > Labels: pull-request-available > > Since SPARK-41516 , add {{createTable}} to {{{}JdbcDialect{}}}. But doesn't > add comment for param. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45397) Add vector assembler feature transformer
[ https://issues.apache.org/jira/browse/SPARK-45397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu resolved SPARK-45397. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43199 [https://github.com/apache/spark/pull/43199] > Add vector assembler feature transformer > > > Key: SPARK-45397 > URL: https://issues.apache.org/jira/browse/SPARK-45397 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.1 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add vector assembler feature transformer -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45428) Add Matomo analytics to all released docs pages
[ https://issues.apache.org/jira/browse/SPARK-45428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-45428: - Assignee: BingKun Pan > Add Matomo analytics to all released docs pages > --- > > Key: SPARK-45428 > URL: https://issues.apache.org/jira/browse/SPARK-45428 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Allison Wang >Assignee: BingKun Pan >Priority: Major > > Matomo analytics has been added to some pages of the Spark website. Here is > Sean's initial PR: > [https://github.com/apache/spark-website/pull/479.|https://www.google.com/url?q=https://github.com/apache/spark-website/pull/479=D=docs=1696544881650480=AOvVaw11SNfWcd4UJzlO8EJvzdoe] > You can find analytics for Spark website here: https://analytics.apache.org > We need to add this to all API pages. This is very important for us to > prioritize documentation improvements and search engine optimization. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45397) Add vector assembler feature transformer
[ https://issues.apache.org/jira/browse/SPARK-45397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu reassigned SPARK-45397: -- Assignee: Weichen Xu > Add vector assembler feature transformer > > > Key: SPARK-45397 > URL: https://issues.apache.org/jira/browse/SPARK-45397 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.1 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > Labels: pull-request-available > > Add vector assembler feature transformer -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45466) VectorAssembler should validate the vector values
[ https://issues.apache.org/jira/browse/SPARK-45466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-45466. --- Resolution: Not A Problem > VectorAssembler should validate the vector values > - > > Key: SPARK-45466 > URL: https://issues.apache.org/jira/browse/SPARK-45466 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767622#comment-17767622 ] Sebastian Daberdaku edited comment on SPARK-45201 at 10/11/23 9:58 AM: --- After spending hours analyzing the project pom files, I discovered two things. First, the shade plugin is relocating the guava/failureaccess package twice in the connect jars (once by the module shade plugin, once by the base project plugin). I created a simple patch to prevent the relocation of failureaccess by the base plugin. I am adding the patch file [^spark-3.5.0.patch] to this Jira issue; I do not have time to create a pull request. You can apply the patch by navigating inside the source folder and running: {code:java} patch -p1 NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0 > > > Key: SPARK-45201 > URL: https://issues.apache.org/jira/browse/SPARK-45201 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Sebastian Daberdaku >Priority: Major > Attachments: Dockerfile, spark-3.5.0.patch > > > I am trying to compile Spark 3.5.0 and make a distribution that supports > Spark Connect and Kubernetes. 
The compilation seems to complete correctly, > but when I try to run the Spark Connect server on kubernetes I get a > "NoClassDefFoundError" as follows: > {code:java} > Exception in thread "main" java.lang.NoClassDefFoundError: > org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at 
java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3511) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3515) > at > org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168) > at > org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079) > at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4011) > at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4034) > at > org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010) > at >
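The apply-the-patch command in the comment above is truncated by the mail export; reconstructed from the attachment name it mentions ([^spark-3.5.0.patch]), it would be roughly the following (the directory layout is an assumption):

```shell
# Apply the attached patch from inside the extracted Spark 3.5.0 source tree
# (directory and attachment paths assumed for illustration).
cd spark-3.5.0
patch -p1 < ../spark-3.5.0.patch
```

`-p1` strips the leading path component, matching patches generated with `diff -r` or `git diff` against the project root.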
[jira] [Commented] (SPARK-45201) NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773976#comment-17773976 ] xie shuiahu commented on SPARK-45201: - [~sdaberdaku] I also have the same issue. I solved it by putting spark-connect.jar in spark-submit --jars, instead of SPARK_HOME/jars > NoClassDefFoundError: InternalFutureFailureAccess when compiling Spark 3.5.0 > > > Key: SPARK-45201 > URL: https://issues.apache.org/jira/browse/SPARK-45201 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Sebastian Daberdaku >Priority: Major > Attachments: Dockerfile, spark-3.5.0.patch > > > I am trying to compile Spark 3.5.0 and make a distribution that supports > Spark Connect and Kubernetes. The compilation seems to complete correctly, > but when I try to run the Spark Connect server on kubernetes I get a > "NoClassDefFoundError" as follows: > {code:java} > Exception in thread "main" java.lang.NoClassDefFoundError: > org/sparkproject/guava/util/concurrent/internal/InternalFutureFailureAccess > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at 
java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at java.base/java.lang.ClassLoader.defineClass1(Native Method) > at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017) > at > java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150) > at > java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862) > at > java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681) > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3511) > at > org.sparkproject.guava.cache.LocalCache$LoadingValueReference.(LocalCache.java:3515) > at > org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2168) > at > org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2079) > at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4011) > at 
org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4034) > at > org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:5010) > at > org.apache.spark.storage.BlockManagerId$.getCachedBlockManagerId(BlockManagerId.scala:146) > at > org.apache.spark.storage.BlockManagerId$.apply(BlockManagerId.scala:127) > at > org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:536) > at org.apache.spark.SparkContext.(SparkContext.scala:625) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2888) > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1099) > at
[jira] [Resolved] (SPARK-45469) Replace `toIterator` with `iterator` for `IterableOnce`
[ https://issues.apache.org/jira/browse/SPARK-45469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45469. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43295 [https://github.com/apache/spark/pull/43295] > Replace `toIterator` with `iterator` for `IterableOnce` > --- > > Key: SPARK-45469 > URL: https://issues.apache.org/jira/browse/SPARK-45469 > Project: Spark > Issue Type: Sub-task > Components: Connect, Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > @deprecated("Use .iterator instead", "2.13.0") > @`inline` def toIterator: Iterator[A] = it.iterator {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45502) Upgrade Kafka to 3.6.0
[ https://issues.apache.org/jira/browse/SPARK-45502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773964#comment-17773964 ] Deng Ziming commented on SPARK-45502: - I'm trying this. > Upgrade Kafka to 3.6.0 > -- > > Key: SPARK-45502 > URL: https://issues.apache.org/jira/browse/SPARK-45502 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > Apache Kafka 3.6.0 is released on Oct 10, 2023. > - https://downloads.apache.org/kafka/3.6.0/RELEASE_NOTES.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45500) Show the number of abnormally completed drivers in MasterPage
[ https://issues.apache.org/jira/browse/SPARK-45500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45500. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43328 [https://github.com/apache/spark/pull/43328] > Show the number of abnormally completed drivers in MasterPage > - > > Key: SPARK-45500 > URL: https://issues.apache.org/jira/browse/SPARK-45500 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45500) Show the number of abnormally completed drivers in MasterPage
[ https://issues.apache.org/jira/browse/SPARK-45500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45500: - Assignee: Dongjoon Hyun > Show the number of abnormally completed drivers in MasterPage > - > > Key: SPARK-45500 > URL: https://issues.apache.org/jira/browse/SPARK-45500 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-44757) Vulnerabilities in Spark3.4
[ https://issues.apache.org/jira/browse/SPARK-44757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773949#comment-17773949 ] Laurenceau Julien edited comment on SPARK-44757 at 10/11/23 8:14 AM: - Hi, In addition to this I'd like to add the following CVE: h1. CVE-2022-1471 (High) detected in snakeyaml-1.33.jar SnakeYaml's Constructor() class does not restrict types which can be instantiated during deserialization. Deserializing yaml content provided by an attacker can lead to remote code execution. We recommend using SnakeYaml's SafeConstructor when parsing untrusted content to restrict deserialization. Publish Date: 2022-12-01 URL: [CVE-2022-1471|https://www.mend.io/vulnerability-database/CVE-2022-1471] was (Author: julienlau): Hi, In addition to this I'd like to add the following high CVE: h1. CVE-2022-1471 (High) detected in snakeyaml-1.33.jar > Vulnerabilities in Spark3.4 > --- > > Key: SPARK-44757 > URL: https://issues.apache.org/jira/browse/SPARK-44757 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Anand Balasubramaniam >Priority: Major > > We are seeing the below list of TPLs with vulnerabilities bundled with the > Spark 3.4 package in a StackRox scan; is there any ETA on fixing them? Kindly > apprise us on the same. > h2. Vulnerabilities in Spark3.4 > |*CVE*|*Description*|*Severity*| > |CVE-2018-21234|Jodd before 5.0.4 performs Deserialization of Untrusted JSON > Data when setClassMetadataName is set.|CVSS Score:9.8Critical| > |CVE-2022-42004|In FasterXML jackson-databind before 2.13.4, resource > exhaustion can occur because of a lack of a check in > BeanDeserializer._deserializeFromArray to prevent use of deeply nested > arrays. 
An application is vulnerable only with certain customized choices for > deserialization.|CVSS Score 7.5Important| > | CVE-2022-42003|In FasterXML jackson-databind before 2.14.0-rc1, resource > exhaustion can occur because of a lack of a check in primitive value > deserializers to avoid deep wrapper array nesting, when the > UNWRAP_SINGLE_VALUE_ARRAYS feature is enabled. Additional fix version in > 2.13.4.1 and 2.12.17.1|CVSS Score 7.5Important| > |CVE-2022-40152|Those using Woodstox to parse XML data may be vulnerable to > Denial of Service attacks (DOS) if DTD support is enabled. If the parser is > running on user supplied input, an attacker may supply content that causes > the parser to crash by stackoverflow. This effect may support a denial of > service attack.|CVSS Score 7.5Important| > |CVE-2022-3171|A parsing issue with binary data in protobuf-java core and > lite versions prior to 3.21.7, 3.20.3, 3.19.6 and 3.16.3 can lead to a denial > of service attack. Inputs containing multiple instances of non-repeated > embedded messages with repeated or unknown fields causes objects to be > converted back-n-forth between mutable and immutable forms, resulting in > potentially long garbage collection pauses. We recommend updating to the > versions mentioned above.|CVSS Score 7.5Important| > |CVE-2021-34538|Apache Hive before 3.1.3 "CREATE" and "DROP" function > operations does not check for necessary authorization of involved entities in > the query. It was found that an unauthorized user can manipulate an existing > UDF without having the privileges to do so. 
This allowed unauthorized or > underprivileged users to drop and recreate UDFs pointing them to new jars > that could be potentially malicious.|CVSS Score 7.5Important| > |CVE-2020-13949|In Apache Thrift 0.9.3 to 0.13.0, malicious RPC clients could > send short messages which would result in a large memory allocation, > potentially leading to denial of service.|CVSS Score 7.5Important| > |CVE-2018-10237|Unbounded memory allocation in Google Guava 11.0 through 24.x > before 24.1.1 allows remote attackers to conduct denial of service attacks > against servers that depend on this library and deserialize attacker-provided > data, because the AtomicDoubleArray class (when serialized with Java > serialization) and the CompoundOrdering class (when serialized with GWT > serialization) perform eager allocation without appropriate checks on what a > client has sent and whether the data size is reasonable.|CVSS 5.9Moderate| > |CVE-2021-22569|An issue in protobuf-java allowed the interleaving of > com.google.protobuf.UnknownFieldSet fields in such a way that would be > processed out of order. A small malicious payload can occupy the parser for > several minutes by creating large numbers of short-lived objects that cause > frequent, repeated pauses. We recommend upgrading libraries beyond the > vulnerable versions.|CVSS 5.9Moderate| > |CVE-2020-8908|A temp directory creation vulnerability exists in all versions > of Guava,
[jira] [Commented] (SPARK-44757) Vulnerabilities in Spark3.4
[ https://issues.apache.org/jira/browse/SPARK-44757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773949#comment-17773949 ] Laurenceau Julien commented on SPARK-44757: --- Hi, In addition to this I'd like to add the following high CVE: h1. CVE-2022-1471 (High) detected in snakeyaml-1.33.jar > Vulnerabilities in Spark3.4 > --- > > Key: SPARK-44757 > URL: https://issues.apache.org/jira/browse/SPARK-44757 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Anand Balasubramaniam >Priority: Major > > We are seeing the below list of TPLs with vulnerabilities bundled with the > Spark 3.4 package in a StackRox scan; is there any ETA on fixing them? Kindly > apprise us on the same. > h2. Vulnerabilities in Spark3.4 > |*CVE*|*Description*|*Severity*| > |CVE-2018-21234|Jodd before 5.0.4 performs Deserialization of Untrusted JSON > Data when setClassMetadataName is set.|CVSS Score:9.8Critical| > |CVE-2022-42004|In FasterXML jackson-databind before 2.13.4, resource > exhaustion can occur because of a lack of a check in > BeanDeserializer._deserializeFromArray to prevent use of deeply nested > arrays. An application is vulnerable only with certain customized choices for > deserialization.|CVSS Score 7.5Important| > | CVE-2022-42003|In FasterXML jackson-databind before 2.14.0-rc1, resource > exhaustion can occur because of a lack of a check in primitive value > deserializers to avoid deep wrapper array nesting, when the > UNWRAP_SINGLE_VALUE_ARRAYS feature is enabled. Additional fix version in > 2.13.4.1 and 2.12.17.1|CVSS Score 7.5Important| > |CVE-2022-40152|Those using Woodstox to parse XML data may be vulnerable to > Denial of Service attacks (DOS) if DTD support is enabled. If the parser is > running on user supplied input, an attacker may supply content that causes > the parser to crash by stackoverflow. 
This effect may support a denial of > service attack.|CVSS Score 7.5Important| > |CVE-2022-3171|A parsing issue with binary data in protobuf-java core and > lite versions prior to 3.21.7, 3.20.3, 3.19.6 and 3.16.3 can lead to a denial > of service attack. Inputs containing multiple instances of non-repeated > embedded messages with repeated or unknown fields causes objects to be > converted back-n-forth between mutable and immutable forms, resulting in > potentially long garbage collection pauses. We recommend updating to the > versions mentioned above.|CVSS Score 7.5Important| > |CVE-2021-34538|Apache Hive before 3.1.3 "CREATE" and "DROP" function > operations does not check for necessary authorization of involved entities in > the query. It was found that an unauthorized user can manipulate an existing > UDF without having the privileges to do so. This allowed unauthorized or > underprivileged users to drop and recreate UDFs pointing them to new jars > that could be potentially malicious.|CVSS Score 7.5Important| > |CVE-2020-13949|In Apache Thrift 0.9.3 to 0.13.0, malicious RPC clients could > send short messages which would result in a large memory allocation, > potentially leading to denial of service.|CVSS Score 7.5Important| > |CVE-2018-10237|Unbounded memory allocation in Google Guava 11.0 through 24.x > before 24.1.1 allows remote attackers to conduct denial of service attacks > against servers that depend on this library and deserialize attacker-provided > data, because the AtomicDoubleArray class (when serialized with Java > serialization) and the CompoundOrdering class (when serialized with GWT > serialization) perform eager allocation without appropriate checks on what a > client has sent and whether the data size is reasonable.|CVSS 5.9Moderate| > |CVE-2021-22569|An issue in protobuf-java allowed the interleaving of > com.google.protobuf.UnknownFieldSet fields in such a way that would be > processed out of order. 
A small malicious payload can occupy the parser for > several minutes by creating large numbers of short-lived objects that cause > frequent, repeated pauses. We recommend upgrading libraries beyond the > vulnerable versions.|CVSS 5.9Moderate| > |CVE-2020-8908|A temp directory creation vulnerability exists in all versions > of Guava, allowing an attacker with access to the machine to potentially > access data in a temporary directory created by the Guava API > [com.google.common.io|https://urldefense.com/v3/__http:/com.google.common.io/__;!!KpaPruflFCEp!hUy3fNZoxf_mnbeTP7GUWkbaKtRLDswR2fRnQ9Gm_AoaeVUncE_plq53EqTWyd1ZfAI7tIFOgmmEBPoGRw$].Files.createTempDir(). > By default, on unix-like systems, the created directory is world-readable > (readable by an attacker with access to the system). The method in question > has been marked @Deprecated in versions 30.0 and later and should not be > used. For Android developers, we
[jira] [Resolved] (SPARK-45480) Selectable SQL Plan
[ https://issues.apache.org/jira/browse/SPARK-45480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-45480. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43307 [https://github.com/apache/spark/pull/43307] > Selectable SQL Plan > --- > > Key: SPARK-45480 > URL: https://issues.apache.org/jira/browse/SPARK-45480 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45480) Selectable SQL Plan
[ https://issues.apache.org/jira/browse/SPARK-45480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-45480: Assignee: Kent Yao > Selectable SQL Plan > --- > > Key: SPARK-45480 > URL: https://issues.apache.org/jira/browse/SPARK-45480 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45497) Add a symbolic link file `spark-examples.jar` in K8s Docker images
[ https://issues.apache.org/jira/browse/SPARK-45497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45497: - Assignee: Dongjoon Hyun > Add a symbolic link file `spark-examples.jar` in K8s Docker images > -- > > Key: SPARK-45497 > URL: https://issues.apache.org/jira/browse/SPARK-45497 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45497) Add a symbolic link file `spark-examples.jar` in K8s Docker images
[ https://issues.apache.org/jira/browse/SPARK-45497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45497. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43324 [https://github.com/apache/spark/pull/43324] > Add a symbolic link file `spark-examples.jar` in K8s Docker images > -- > > Key: SPARK-45497 > URL: https://issues.apache.org/jira/browse/SPARK-45497 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45502) Upgrade Kafka to 3.6.0
[ https://issues.apache.org/jira/browse/SPARK-45502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45502: -- Summary: Upgrade Kafka to 3.6.0 (was: Upgrade to Kafka 3.6.0) > Upgrade Kafka to 3.6.0 > -- > > Key: SPARK-45502 > URL: https://issues.apache.org/jira/browse/SPARK-45502 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > Apache Kafka 3.6.0 is released on Oct 10, 2023. > - https://downloads.apache.org/kafka/3.6.0/RELEASE_NOTES.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44115) Upgrade Apache ORC to 2.0
[ https://issues.apache.org/jira/browse/SPARK-44115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44115: -- Summary: Upgrade Apache ORC to 2.0 (was: Upgrade to Apache ORC 2.0) > Upgrade Apache ORC to 2.0 > - > > Key: SPARK-44115 > URL: https://issues.apache.org/jira/browse/SPARK-44115 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > Apache ORC community has the following release cycles which are synchronized > with Apache Spark releases. > * ORC v2.0.0 (next year) for Apache Spark 4.0.x > * ORC v1.9.0 (this month) for Apache Spark 3.5.x > * ORC v1.8.x for Apache Spark 3.4.x > * ORC v1.7.x for Apache Spark 3.3.x > * ORC v1.6.x for Apache Spark 3.2.x -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45502) Upgrade to Kafka 3.6.0
Dongjoon Hyun created SPARK-45502: - Summary: Upgrade to Kafka 3.6.0 Key: SPARK-45502 URL: https://issues.apache.org/jira/browse/SPARK-45502 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 4.0.0 Reporter: Dongjoon Hyun Apache Kafka 3.6.0 is released on Oct 10, 2023. - https://downloads.apache.org/kafka/3.6.0/RELEASE_NOTES.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45501) Use pattern matching for type checking and conversion
[ https://issues.apache.org/jira/browse/SPARK-45501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45501: --- Labels: pull-request-available (was: ) > Use pattern matching for type checking and conversion > - > > Key: SPARK-45501 > URL: https://issues.apache.org/jira/browse/SPARK-45501 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > Labels: pull-request-available > > Refer to [JEP 394|https://openjdk.org/jeps/394] > Example: > {code:java} > if (obj instanceof String) { > String str = (String) obj; > System.out.println(str); > } {code} > Can be replaced with > > {code:java} > if (obj instanceof String str) { > System.out.println(str); > } {code} > The new code looks more compact. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45501) Use pattern matching for type checking and conversion
Yang Jie created SPARK-45501: Summary: Use pattern matching for type checking and conversion Key: SPARK-45501 URL: https://issues.apache.org/jira/browse/SPARK-45501 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Yang Jie Refer to [JEP 394|https://openjdk.org/jeps/394] Example: {code:java} if (obj instanceof String) { String str = (String) obj; System.out.println(str); } {code} Can be replaced with {code:java} if (obj instanceof String str) { System.out.println(str); } {code} The new code looks more compact. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
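The before/after fragments from SPARK-45501 can be rounded out into a runnable comparison. This is an illustrative sketch (the class and method names are hypothetical, not actual Spark code); it requires Java 16+ for the instanceof pattern, and also shows that the pattern variable is flow-scoped to the branch where the test succeeded:

```java
public class PatternMatchDemo {
    // Old style: explicit type test, cast, and assignment (three steps).
    static String describeOld(Object obj) {
        if (obj instanceof String) {
            String str = (String) obj;
            return "String of length " + str.length();
        }
        return "not a String";
    }

    // JEP 394 style (Java 16+): test, cast, and binding in one pattern.
    // `str` is in scope only where the instanceof test is known to be true.
    static String describeNew(Object obj) {
        if (obj instanceof String str) {
            return "String of length " + str.length();
        }
        return "not a String";
    }

    public static void main(String[] args) {
        System.out.println(describeNew("spark"));  // String of length 5
        System.out.println(describeNew(42));       // not a String
    }
}
```

Both variants are behaviorally identical, which is what makes the refactoring safe to apply mechanically across the codebase.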
[jira] [Updated] (SPARK-45500) Show the number of abnormally completed drivers in MasterPage
[ https://issues.apache.org/jira/browse/SPARK-45500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45500: --- Labels: pull-request-available (was: ) > Show the number of abnormally completed drivers in MasterPage > - > > Key: SPARK-45500 > URL: https://issues.apache.org/jira/browse/SPARK-45500 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45500) Show the number of abnormally completed drivers in MasterPage
Dongjoon Hyun created SPARK-45500: - Summary: Show the number of abnormally completed drivers in MasterPage Key: SPARK-45500 URL: https://issues.apache.org/jira/browse/SPARK-45500 Project: Spark Issue Type: Improvement Components: Spark Core, Web UI Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice
[ https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773921#comment-17773921 ] Zhizhen Hou commented on SPARK-45478: - An If expression has three children: predicate, trueValue, and falseValue. There are two execution paths: (1) predicate then trueValue, or (2) predicate then falseValue. A common subexpression can therefore appear in three combinations of children: (1) predicate and trueValue, (2) predicate and falseValue, (3) trueValue and falseValue. So if all possible common subexpressions are eliminated, two of the three combinations improve performance. For example, if a common subexpression appears in both predicate and falseValue, it is executed only once, which improves performance. Only a common subexpression shared between trueValue and falseValue brings no improvement, but it does not hurt performance either, since exactly one of trueValue and falseValue is executed anyway. So it looks reasonable to check all three children of If. Any suggestions? > codegen sum(decimal_column / 2) computes div twice > -- > > Key: SPARK-45478 > URL: https://issues.apache.org/jira/browse/SPARK-45478 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Zhizhen Hou >Priority: Minor > > *The SQL to reproduce the result* > {code:java} > create table t_dec (c1 decimal(6,2)); > insert into t_dec values(1.0),(2.0),(null),(3.0); > explain codegen select sum(c1/2) from t_dec; {code} > > *Possible cause of the result:* > > Function sum uses an If expression in updateExpressions: > `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` > > The three variables in the If expression look like this. 
> {code:java} > predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), > true]falseValue: (input[0, decimal(26,6), true] + > cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} > In subexpression elimination, only the predicate is recursed into, in > EquivalentExpressions#childrenToRecurse: > {code:java} > private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match > { > case _: CodegenFallback => Nil > case i: If => i.predicate :: Nil > case c: CaseWhen => c.children.head :: Nil > case c: Coalesce => c.children.head :: Nil > case other => other.children > } {code} > I tried to replace `case i: If => i.predicate :: Nil` with `case i: If => > i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produced the correct result. > > But the following comment in `childrenToRecurse` makes me unsure whether it > will cause any other problems. > {code:java} > // 2. If: common subexpressions will always be evaluated at the beginning, > but the true and > // false expressions in `If` may not get accessed, according to the predicate > // expression. We should only recurse into the predicate expression. {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
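The performance argument in the comment above, that a subexpression shared only by trueValue and falseValue is computed at most once when it is hoisted and evaluated lazily, can be illustrated outside Spark. The sketch below is hypothetical (it is not Spark's EquivalentExpressions code); it models the hoisted common subexpression with a compute-on-first-use cache and counts evaluations:

```java
public class CseIfDemo {
    static int evalCount = 0;

    // The "common subexpression": something we want computed at most once.
    static int sharedSubexpr(int x) {
        evalCount++;
        return x / 2;
    }

    // Model an If whose true and false branches share a subexpression.
    // Lazy hoisting: compute on first use, cache the value, never recompute.
    static int ifWithCse(int x) {
        int[] cache = new int[1];
        boolean[] computed = new boolean[1];
        java.util.function.IntSupplier shared = () -> {
            if (!computed[0]) {
                cache[0] = sharedSubexpr(x);
                computed[0] = true;
            }
            return cache[0];
        };
        // Exactly one branch runs, so `shared` is evaluated exactly once here.
        return x > 0 ? shared.getAsInt() + 1   // true branch
                     : shared.getAsInt() - 1;  // false branch
    }

    public static void main(String[] args) {
        System.out.println(ifWithCse(10));  // true branch:  10/2 + 1 = 6
        System.out.println(ifWithCse(-4));  // false branch: -4/2 - 1 = -3
        System.out.println(evalCount);      // 2: once per call, never twice
    }
}
```

Because only one branch executes, the lazily hoisted value costs the same one evaluation as inlining it in the taken branch, which is why recursing into trueValue and falseValue cannot regress this case.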
[jira] [Updated] (SPARK-45498) Followup: Ignore task completion from old stage after retrying indeterminate stages
[ https://issues.apache.org/jira/browse/SPARK-45498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45498: --- Labels: pull-request-available (was: ) > Followup: Ignore task completion from old stage after retrying indeterminate > stages > --- > > Key: SPARK-45498 > URL: https://issues.apache.org/jira/browse/SPARK-45498 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0, 3.5.1 >Reporter: Mayur Bhosale >Priority: Minor > Labels: pull-request-available > > With SPARK-45182, we added a fix to prevent laggard tasks from older > attempts of an indeterminate stage from marking the partition as completed in > the map output tracker. > When a task completes, the DAG scheduler also notifies all the task sets of > the stage that the partition is completed. Task sets will then not schedule > that task if it has not already been scheduled. This is not correct for an > indeterminate stage, since we want to re-run all the tasks on a re-attempt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14745) CEP support in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-14745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-14745: --- Labels: pull-request-available (was: ) > CEP support in Spark Streaming > -- > > Key: SPARK-14745 > URL: https://issues.apache.org/jira/browse/SPARK-14745 > Project: Spark > Issue Type: New Feature > Components: DStreams >Reporter: Mario Briggs >Priority: Major > Labels: pull-request-available > Attachments: SparkStreamingCEP.pdf > > > Complex Event Processing is a often used feature in Streaming applications. > Spark Streaming current does not have a DSL/API for it. This JIRA is about > how/what can we add in Spark Streaming to support CEP out of the box -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45499) Replace `Reference#isEnqueued` with `Reference#refersTo`
[ https://issues.apache.org/jira/browse/SPARK-45499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45499: --- Labels: pull-request-available (was: ) > Replace `Reference#isEnqueued` with `Reference#refersTo` > > > Key: SPARK-45499 > URL: https://issues.apache.org/jira/browse/SPARK-45499 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45498) Followup: Ignore task completion from old stage after retrying indeterminate stages
Mayur Bhosale created SPARK-45498: - Summary: Followup: Ignore task completion from old stage after retrying indeterminate stages Key: SPARK-45498 URL: https://issues.apache.org/jira/browse/SPARK-45498 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 4.0.0, 3.5.1 Reporter: Mayur Bhosale With SPARK-45182, we added a fix to prevent laggard tasks from older attempts of an indeterminate stage from marking the partition as completed in the map output tracker. When a task completes, the DAG scheduler also notifies all the task sets of the stage that the partition is completed. Task sets will then not schedule that task if it has not already been scheduled. This is not correct for an indeterminate stage, since we want to re-run all the tasks on a re-attempt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45499) Replace `Reference#isEnqueued` with `Reference#refersTo`
Yang Jie created SPARK-45499: Summary: Replace `Reference#isEnqueued` with `Reference#refersTo` Key: SPARK-45499 URL: https://issues.apache.org/jira/browse/SPARK-45499 Project: Spark Issue Type: Sub-task Components: Spark Core, Tests Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org