[jira] [Updated] (SPARK-45943) DataSourceV2Relation.computeStats throws IllegalStateException in test mode
[ https://issues.apache.org/jira/browse/SPARK-45943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45943: --- Labels: pull-request-available (was: ) > DataSourceV2Relation.computeStats throws IllegalStateException in test mode > --- > > Key: SPARK-45943 > URL: https://issues.apache.org/jira/browse/SPARK-45943 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.1 >Reporter: Asif >Priority: Major > Labels: pull-request-available > > This issue surfaces when the new unit test of PR > [SPARK-45866|https://github.com/apache/spark/pull/43824] is added -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42694) Data duplication and loss occur after executing 'insert overwrite...' in Spark 3.1.1
[ https://issues.apache.org/jira/browse/SPARK-42694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787074#comment-17787074 ] FengZhou commented on SPARK-42694: -- No. Everything is OK, all the tasks are successful. > Data duplication and loss occur after executing 'insert overwrite...' in > Spark 3.1.1 > > > Key: SPARK-42694 > URL: https://issues.apache.org/jira/browse/SPARK-42694 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.1 > Environment: Spark 3.1.1 > Hadoop 3.2.1 > Hive 3.1.2 >Reporter: FengZhou >Priority: Critical > Labels: shuffle, spark > Attachments: image-2023-03-07-15-59-08-818.png, > image-2023-03-07-15-59-27-665.png > > > We are currently using Spark version 3.1.1 in our production environment. We > have noticed that occasionally, after executing 'insert overwrite ... > select', the resulting data is inconsistent, with some data being duplicated > or lost. This issue does not occur all the time and seems to be more > prevalent on large tables with tens of millions of records. > We compared the execution plans for two runs of the same SQL and found that > they were identical. In the case where the SQL was executed successfully, the > amount of data being written and read during the shuffle stage was the same. > However, in the case where the problem occurred, the amount of data being > written and read during the shuffle stage was different. Please see the > attached screenshots for the write/read data during shuffle stage. > > Normal SQL: > !image-2023-03-07-15-59-08-818.png! > SQL with issues: > !image-2023-03-07-15-59-27-665.png! > > Is this problem caused by a bug in version 3.1.1, specifically (SPARK-34534): > 'New protocol FetchShuffleBlocks in OneForOneBlockFetcher lead to data loss > or correctness'? Or is it caused by something else? What could be the root > cause of this problem? 
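If SPARK-34534 is suspected, one low-cost probe (an assumption on my part, not a confirmed diagnosis for this report) is to fall back to the pre-FetchShuffleBlocks fetch path, which is controlled by a config that exists in the 3.1.x line:

```scala
// Hedged sketch: disable the FetchShuffleBlocks protocol to test whether
// SPARK-34534 is involved. This is a static core conf, so it must be set
// before the SparkContext starts (e.g. via spark-submit --conf), not at runtime.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.shuffle.useOldFetchProtocol", "true") // use the old OpenBlocks fetch path
```

If the duplicates disappear under the old protocol, that points at the fetch-protocol bug; if they persist, the root cause is elsewhere (for example, a non-deterministic shuffle feeding the overwrite).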
[jira] [Updated] (SPARK-45971) Correct the package name of `SparkCollectionUtils`
[ https://issues.apache.org/jira/browse/SPARK-45971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45971: --- Labels: pull-request-available (was: ) > Correct the package name of `SparkCollectionUtils` > -- > > Key: SPARK-45971 > URL: https://issues.apache.org/jira/browse/SPARK-45971 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45962) Remove treatEmptyValuesAsNulls and use nullValue option instead in XML
[ https://issues.apache.org/jira/browse/SPARK-45962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45962. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43852 [https://github.com/apache/spark/pull/43852] > Remove treatEmptyValuesAsNulls and use nullValue option instead in XML > -- > > Key: SPARK-45962 > URL: https://issues.apache.org/jira/browse/SPARK-45962 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Shujing Yang >Assignee: Shujing Yang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Today, we offer two available options to handle null values. To enhance user > clarity and simplify usage, we propose consolidating these into a single > option. We recommend retaining the {{nullValue}} option due to its broader > semantic scope. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
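As a sketch of the consolidated behavior (the option name comes from the issue; the read pattern and file path are illustrative assumptions):

```scala
// With treatEmptyValuesAsNulls removed, empty values map to null through the
// broader nullValue option. Assumes a running SparkSession `spark` and the
// built-in XML data source in Spark 4.0; the path is hypothetical.
val df = spark.read
  .format("xml")
  .option("rowTag", "book")   // each <book> element becomes one row
  .option("nullValue", "")    // empty string values are read back as SQL NULL
  .load("/tmp/books.xml")
```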
[jira] [Created] (SPARK-45971) Correct the package name of `SparkCollectionUtils`
Yang Jie created SPARK-45971: Summary: Correct the package name of `SparkCollectionUtils` Key: SPARK-45971 URL: https://issues.apache.org/jira/browse/SPARK-45971 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Yang Jie
[jira] [Commented] (SPARK-45699) Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision"
[ https://issues.apache.org/jira/browse/SPARK-45699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17787053#comment-17787053 ] Yang Jie commented on SPARK-45699: -- [~hannahkamundson] Is there any progress on this ticket? > Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it > loses precision" > -- > > Key: SPARK-45699 > URL: https://issues.apache.org/jira/browse/SPARK-45699 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1199:67: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, > site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks.threshold > [error] val threshold = max(speculationMultiplier * medianDuration, > minTimeToSpeculation) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1207:60: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, > site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks > [error] foundTasks = checkAndSubmitSpeculatableTasks(timeMs, threshold, > customizedThreshold = true) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:137:48: > Widening conversion from Int to Float is deprecated because it loses > precision. Write `.toFloat` instead. 
[quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.IntVectorReader.getFloat > [error] override def getFloat(i: Int): Float = getInt(i) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:146:49: > Widening conversion from Long to Float is deprecated because it loses > precision. Write `.toFloat` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getFloat > [error] override def getFloat(i: Int): Float = getLong(i) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:147:51: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getDouble > [error] override def getDouble(i: Int): Double = getLong(i) > [error] ^ {code} > > > Examples of the compilation warning are shown above; there are probably over > 100 similar cases that need to be fixed.
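The quickfix the compiler suggests is mechanical; a minimal sketch mirroring the ArrowVectorReader cases in the log above (the accessor bodies are placeholders, not the real Arrow code):

```scala
// `getInt`/`getLong` stand in for the Arrow vector accessors.
def getInt(i: Int): Int = i
def getLong(i: Int): Long = i.toLong

// Before (fatal under -Wconf): Int => Float and Long => Double widen implicitly,
// which can silently lose precision for large values:
// def getFloat(i: Int): Float   = getInt(i)
// def getDouble(i: Int): Double = getLong(i)

// After: spell out the conversion, exactly as the warning asks.
def getFloat(i: Int): Float   = getInt(i).toFloat
def getDouble(i: Int): Double = getLong(i).toDouble
```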
[jira] [Resolved] (SPARK-45966) Add missing methods for API reference.
[ https://issues.apache.org/jira/browse/SPARK-45966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45966. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43860 [https://github.com/apache/spark/pull/43860] > Add missing methods for API reference. > -- > > Key: SPARK-45966 > URL: https://issues.apache.org/jira/browse/SPARK-45966 > Project: Spark > Issue Type: Bug > Components: Documentation, Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45966) Add missing methods for API reference.
[ https://issues.apache.org/jira/browse/SPARK-45966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45966: - Assignee: Haejoon Lee > Add missing methods for API reference. > -- > > Key: SPARK-45966 > URL: https://issues.apache.org/jira/browse/SPARK-45966 > Project: Spark > Issue Type: Bug > Components: Documentation, Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45968) Upgrade github docker action to latest version
[ https://issues.apache.org/jira/browse/SPARK-45968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45968. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43862 [https://github.com/apache/spark/pull/43862] > Upgrade github docker action to latest version > -- > > Key: SPARK-45968 > URL: https://issues.apache.org/jira/browse/SPARK-45968 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45970) Provide partitioning expressions in Java the same as in Scala
Hyukjin Kwon created SPARK-45970: Summary: Provide partitioning expressions in Java the same as in Scala Key: SPARK-45970 URL: https://issues.apache.org/jira/browse/SPARK-45970 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon See https://github.com/apache/spark/pull/43858. Once Scala 3 is out, we can support partitioning expressions in Java the same way as in Scala, such as: {code} import static org.apache.spark.sql.functions.partitioning.*; {code}
[jira] [Updated] (SPARK-45969) Document configuration change of executor failure tracker
[ https://issues.apache.org/jira/browse/SPARK-45969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45969: --- Labels: pull-request-available (was: ) > Document configuration change of executor failure tracker > - > > Key: SPARK-45969 > URL: https://issues.apache.org/jira/browse/SPARK-45969 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.5.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45969) Document configuration change of executor failure tracker
Cheng Pan created SPARK-45969: - Summary: Document configuration change of executor failure tracker Key: SPARK-45969 URL: https://issues.apache.org/jira/browse/SPARK-45969 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 3.5.0 Reporter: Cheng Pan
[jira] [Assigned] (SPARK-45762) Shuffle managers defined in user jars are not available for some launch modes
[ https://issues.apache.org/jira/browse/SPARK-45762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan reassigned SPARK-45762: --- Assignee: Alessandro Bellina > Shuffle managers defined in user jars are not available for some launch modes > - > > Key: SPARK-45762 > URL: https://issues.apache.org/jira/browse/SPARK-45762 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Alessandro Bellina >Assignee: Alessandro Bellina >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Starting a spark job in standalone mode with a custom `ShuffleManager` > provided in a jar via `--jars` does not work. This can also be experienced in > local-cluster mode. > The approach that works consistently is to copy the jar containing the custom > `ShuffleManager` to a specific location in each node then add it to > `spark.driver.extraClassPath` and `spark.executor.extraClassPath`, but we > would like to move away from setting extra configurations unnecessarily. > Example: > {code:java} > $SPARK_HOME/bin/spark-shell \ > --master spark://127.0.0.1:7077 \ > --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \ > --jars user-code.jar > {code} > This yields `java.lang.ClassNotFoundException` in the executors. 
> {code:java} > Exception in thread "main" java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1915) > at > org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:436) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:425) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) > Caused by: java.lang.ClassNotFoundException: > org.apache.spark.examples.TestShuffleManager > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520) > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:467) > at > org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:41) > at > org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:36) > at org.apache.spark.util.Utils$.classForName(Utils.scala:95) > at > org.apache.spark.util.Utils$.instantiateSerializerOrShuffleManager(Utils.scala:2574) > at org.apache.spark.SparkEnv$.create(SparkEnv.scala:366) > at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:255) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:487) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61) > at > java.base/java.security.AccessController.doPrivileged(AccessController.java:712) > at java.base/javax.security.auth.Subject.doAs(Subject.java:439) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > ... 4 more > {code} > We can change our command to use `extraClassPath`: > {code:java} > $SPARK_HOME/bin/spark-shell \ > --master spark://127.0.0.1:7077 \ > --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \ > --conf spark.driver.extraClassPath=user-code.jar \ > --conf spark.executor.extraClassPath=user-code.jar > {code} > Success after adding the jar to `extraClassPath`: > {code:java} > 23/10/26 12:58:26 INFO TransportClientFactory: Successfully created > connection to localhost/127.0.0.1:33053 after 7 ms (0 ms spent in bootstraps) > 23/10/26 12:58:26 WARN TestShuffleManager: Instantiated TestShuffleManager!! > 23/10/26 12:58:26 INFO DiskBlockManager: Created local directory at > /tmp/spark-cb101b05-c4b7-4ba9-8b3d-5b23baa7cb46/executor-5d5335dd-c116-4211-9691-87d8566017fd/blockmgr-2fcb1ab2-d886--8c7f-9dca2c880c2c > {code} > We would like to change startup order such that the original command > succeeds, without specifying `extraClassPath`: > {code:java} > $SPARK_HOME/bin/spark-shell \ > --master spark://127.0.0.1:7077 \ > --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \ > --jars user-code.jar > {code} > Proposed changes: > Refactor code so we initialize the `ShuffleManager` later, after jars have > been locali
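For reference, the custom manager in the reproduction needs nothing exotic; a minimal sketch (the class name comes from the example above, but the pass-through delegation to the sort-based manager is my assumption):

```scala
// Hypothetical pass-through ShuffleManager matching the reproduction above.
// It behaves exactly like the default sort-based shuffle; the println merely
// proves whether the class was loadable from the executor's classloader.
package org.apache.spark.examples

import org.apache.spark.SparkConf
import org.apache.spark.shuffle.sort.SortShuffleManager

class TestShuffleManager(conf: SparkConf) extends SortShuffleManager(conf) {
  println("Instantiated TestShuffleManager!!")
}
```

Packaged into `user-code.jar`, this is what `--conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager` points at in the commands above.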
[jira] [Resolved] (SPARK-45762) Shuffle managers defined in user jars are not available for some launch modes
[ https://issues.apache.org/jira/browse/SPARK-45762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-45762. - Resolution: Fixed Issue resolved by pull request 43627 [https://github.com/apache/spark/pull/43627] > Shuffle managers defined in user jars are not available for some launch modes > - > > Key: SPARK-45762 > URL: https://issues.apache.org/jira/browse/SPARK-45762 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0 >Reporter: Alessandro Bellina >Assignee: Alessandro Bellina >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Starting a spark job in standalone mode with a custom `ShuffleManager` > provided in a jar via `--jars` does not work. This can also be experienced in > local-cluster mode. > The approach that works consistently is to copy the jar containing the custom > `ShuffleManager` to a specific location in each node then add it to > `spark.driver.extraClassPath` and `spark.executor.extraClassPath`, but we > would like to move away from setting extra configurations unnecessarily. > Example: > {code:java} > $SPARK_HOME/bin/spark-shell \ > --master spark://127.0.0.1:7077 \ > --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \ > --jars user-code.jar > {code} > This yields `java.lang.ClassNotFoundException` in the executors. 
> {code:java} > Exception in thread "main" java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1915) > at > org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:61) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:436) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:425) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala) > Caused by: java.lang.ClassNotFoundException: > org.apache.spark.examples.TestShuffleManager > at > java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) > at > java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520) > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:467) > at > org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:41) > at > org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:36) > at org.apache.spark.util.Utils$.classForName(Utils.scala:95) > at > org.apache.spark.util.Utils$.instantiateSerializerOrShuffleManager(Utils.scala:2574) > at org.apache.spark.SparkEnv$.create(SparkEnv.scala:366) > at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:255) > at > org.apache.spark.executor.CoarseGrainedExecutorBackend$.$anonfun$run$7(CoarseGrainedExecutorBackend.scala:487) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:62) > at > org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61) > at > java.base/java.security.AccessController.doPrivileged(AccessController.java:712) > at java.base/javax.security.auth.Subject.doAs(Subject.java:439) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > ... 4 more > {code} > We can change our command to use `extraClassPath`: > {code:java} > $SPARK_HOME/bin/spark-shell \ > --master spark://127.0.0.1:7077 \ > --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \ > --conf spark.driver.extraClassPath=user-code.jar \ > --conf spark.executor.extraClassPath=user-code.jar > {code} > Success after adding the jar to `extraClassPath`: > {code:java} > 23/10/26 12:58:26 INFO TransportClientFactory: Successfully created > connection to localhost/127.0.0.1:33053 after 7 ms (0 ms spent in bootstraps) > 23/10/26 12:58:26 WARN TestShuffleManager: Instantiated TestShuffleManager!! > 23/10/26 12:58:26 INFO DiskBlockManager: Created local directory at > /tmp/spark-cb101b05-c4b7-4ba9-8b3d-5b23baa7cb46/executor-5d5335dd-c116-4211-9691-87d8566017fd/blockmgr-2fcb1ab2-d886--8c7f-9dca2c880c2c > {code} > We would like to change startup order such that the original command > succeeds, without specifying `extraClassPath`: > {code:java} > $SPARK_HOME/bin/spark-shell \ > --master spark://127.0.0.1:7077 \ > --conf spark.shuffle.manager=org.apache.spark.examples.TestShuffleManager \ > --jars user-code.jar > {code} > Proposed changes: > Refactor code so we in
[jira] [Updated] (SPARK-44021) Add spark.sql.files.maxPartitionNum
[ https://issues.apache.org/jira/browse/SPARK-44021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44021: --- Labels: pull-request-available (was: ) > Add spark.sql.files.maxPartitionNum > --- > > Key: SPARK-44021 > URL: https://issues.apache.org/jira/browse/SPARK-44021 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
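The ticket carries no description; going by the config name alone, a hypothetical usage sketch (the semantics are my assumption: an upper bound on the number of partitions produced by file splitting, complementing the existing size-based knob):

```scala
// Assumed usage: cap the partition count of file-based scans so very large
// tables don't generate an excessive number of tasks. Assumes a SparkSession.
spark.conf.set("spark.sql.files.maxPartitionBytes", "128m") // existing knob: target size per partition
spark.conf.set("spark.sql.files.maxPartitionNum", "10000")  // proposed knob: hard cap on partition count
```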
[jira] [Updated] (SPARK-45966) Add missing methods for API reference.
[ https://issues.apache.org/jira/browse/SPARK-45966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45966: --- Labels: pull-request-available (was: ) > Add missing methods for API reference. > -- > > Key: SPARK-45966 > URL: https://issues.apache.org/jira/browse/SPARK-45966 > Project: Spark > Issue Type: Bug > Components: Documentation, Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45964) Remove private[sql] in XML and JSON package under catalyst package
[ https://issues.apache.org/jira/browse/SPARK-45964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45964. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43856 [https://github.com/apache/spark/pull/43856] > Remove private[sql] in XML and JSON package under catalyst package > -- > > Key: SPARK-45964 > URL: https://issues.apache.org/jira/browse/SPARK-45964 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > catalyst is internal, so we don't need to annotate them as private[sql]
[jira] [Updated] (SPARK-45967) Upgrade jackson to 2.16.0
[ https://issues.apache.org/jira/browse/SPARK-45967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45967: --- Labels: pull-request-available (was: ) > Upgrade jackson to 2.16.0 > - > > Key: SPARK-45967 > URL: https://issues.apache.org/jira/browse/SPARK-45967 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45964) Remove private[sql] in XML and JSON package under catalyst package
[ https://issues.apache.org/jira/browse/SPARK-45964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45964: - Assignee: Hyukjin Kwon > Remove private[sql] in XML and JSON package under catalyst package > -- > > Key: SPARK-45964 > URL: https://issues.apache.org/jira/browse/SPARK-45964 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > catalyst is internal, so we don't need to annotate them as private[sql]
[jira] [Created] (SPARK-45967) Upgrade jackson to 2.16.0
BingKun Pan created SPARK-45967: --- Summary: Upgrade jackson to 2.16.0 Key: SPARK-45967 URL: https://issues.apache.org/jira/browse/SPARK-45967 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: BingKun Pan
[jira] [Created] (SPARK-45966) Add missing methods for API reference.
Haejoon Lee created SPARK-45966: --- Summary: Add missing methods for API reference. Key: SPARK-45966 URL: https://issues.apache.org/jira/browse/SPARK-45966 Project: Spark Issue Type: Bug Components: Documentation, Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee
[jira] [Updated] (SPARK-45965) Move DSv2 partitioning expressions into functions.partitioning
[ https://issues.apache.org/jira/browse/SPARK-45965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45965: --- Labels: pull-request-available (was: ) > Move DSv2 partitioning expressions into functions.partitioning > -- > > Key: SPARK-45965 > URL: https://issues.apache.org/jira/browse/SPARK-45965 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > We weren't able to move those partitioning expressions into a nested object > because of a Scala 2.12 limitation. Now we're able to do it with Scala 2.13.
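A sketch of the intended call site after the move (the DataFrame `df` and the table name are assumptions; the function names match the existing DSv2 partition transforms):

```scala
// With the expressions under functions.partitioning, DSv2 writes can
// reference partition transforms without cluttering the top-level namespace.
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.functions.partitioning._ // days, hours, bucket, ...

df.writeTo("catalog.db.events")
  .partitionedBy(days(col("ts")), bucket(16, col("id")))
  .create()
```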
[jira] [Assigned] (SPARK-45952) Use built-in math constant in math functions
[ https://issues.apache.org/jira/browse/SPARK-45952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45952: Assignee: Ruifeng Zheng > Use built-in math constant in math functions > - > > Key: SPARK-45952 > URL: https://issues.apache.org/jira/browse/SPARK-45952 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45952) Use built-in math constant in math functions
[ https://issues.apache.org/jira/browse/SPARK-45952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45952. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43837 [https://github.com/apache/spark/pull/43837] > Use built-in math constant in math functions > - > > Key: SPARK-45952 > URL: https://issues.apache.org/jira/browse/SPARK-45952 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45964) Remove private[sql] in XML and JSON package under catalyst package
[ https://issues.apache.org/jira/browse/SPARK-45964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45964: --- Labels: pull-request-available (was: ) > Remove private[sql] in XML and JSON package under catalyst package > -- > > Key: SPARK-45964 > URL: https://issues.apache.org/jira/browse/SPARK-45964 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > catalyst is internal, so we don't need to annotate them as private[sql]
[jira] [Updated] (SPARK-40909) Reuse the broadcast exchange for bloom filter
[ https://issues.apache.org/jira/browse/SPARK-40909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-40909: --- Labels: pull-request-available (was: ) > Reuse the broadcast exchange for bloom filter > - > > Key: SPARK-40909 > URL: https://issues.apache.org/jira/browse/SPARK-40909 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jiaan Geng >Priority: Major > Labels: pull-request-available > > Currently, if the creation side of the bloom filter can be broadcast, Spark > cannot inject a bloom filter or InSubquery filter into the application side. > In fact, we can inject a bloom filter that reuses the broadcast exchange > and improves performance.
[jira] [Updated] (SPARK-44669) Parquet/ORC files written using Hive Serde should have a file extension
[ https://issues.apache.org/jira/browse/SPARK-44669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44669: --- Labels: pull-request-available (was: ) > Parquet/ORC files written using Hive Serde should have a file extension > > > Key: SPARK-44669 > URL: https://issues.apache.org/jira/browse/SPARK-44669 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.2 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > >
[jira] [Assigned] (SPARK-45912) Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility
[ https://issues.apache.org/jira/browse/SPARK-45912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45912: Assignee: Shujing Yang > Enhancement of XSDToSchema API: Change to HDFS API for cloud storage > accessibility > --- > > Key: SPARK-45912 > URL: https://issues.apache.org/jira/browse/SPARK-45912 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Shujing Yang >Assignee: Shujing Yang >Priority: Major > Labels: pull-request-available > > Previously, it utilized `java.nio.path`, which limited file reading to local > file systems only. By changing this to an HDFS-compatible API, we now enable > the XSDToSchema function to access files in cloud storage.
[jira] [Updated] (SPARK-44704) Cleanup shuffle files from host node after migration due to graceful decommissioning
[ https://issues.apache.org/jira/browse/SPARK-44704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44704: --- Labels: pull-request-available (was: ) > Cleanup shuffle files from host node after migration due to graceful > decommissioning > > > Key: SPARK-44704 > URL: https://issues.apache.org/jira/browse/SPARK-44704 > Project: Spark > Issue Type: Improvement > Components: Block Manager >Affects Versions: 3.4.1 >Reporter: Deependra Patel >Priority: Minor > Labels: pull-request-available > > Although these files will be deleted at the end of the application by the > external shuffle service, removing them early frees up resources and can help > long-running applications avoid running out of disk space.
[jira] [Resolved] (SPARK-45912) Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility
[ https://issues.apache.org/jira/browse/SPARK-45912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45912. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43789 [https://github.com/apache/spark/pull/43789] > Enhancement of XSDToSchema API: Change to HDFS API for cloud storage > accessibility > --- > > Key: SPARK-45912 > URL: https://issues.apache.org/jira/browse/SPARK-45912 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Shujing Yang >Assignee: Shujing Yang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Previously, it utilized `java.nio.path`, which limited file reading to local > file systems only. By changing this to an HDFS-compatible API, we now enable > the XSDToSchema function to access files in cloud storage.
[jira] [Created] (SPARK-45964) Remove private[sql] in XML and JSON package under catalyst package
Hyukjin Kwon created SPARK-45964: Summary: Remove private[sql] in XML and JSON package under catalyst package Key: SPARK-45964 URL: https://issues.apache.org/jira/browse/SPARK-45964 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon catalyst is internal, so we don't need to annotate them as private[sql]
[jira] [Updated] (SPARK-45963) Restore documentation for DSv2 API
[ https://issues.apache.org/jira/browse/SPARK-45963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45963: --- Labels: pull-request-available (was: ) > Restore documentation for DSv2 API > -- > > Key: SPARK-45963 > URL: https://issues.apache.org/jira/browse/SPARK-45963 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0, 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > DSv2 documentation was mistakenly removed by > https://github.com/apache/spark/pull/38392. It existed in 3.3.0: > https://spark.apache.org/docs/3.3.0/api/scala/org/apache/spark/sql/connector/catalog/index.html
[jira] [Created] (SPARK-45963) Restore documentation for DSv2 API
Hyukjin Kwon created SPARK-45963: Summary: Restore documentation for DSv2 API Key: SPARK-45963 URL: https://issues.apache.org/jira/browse/SPARK-45963 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0, 3.4.1, 4.0.0 Reporter: Hyukjin Kwon DSv2 documentation was mistakenly removed by https://github.com/apache/spark/pull/38392. It existed in 3.3.0: https://spark.apache.org/docs/3.3.0/api/scala/org/apache/spark/sql/connector/catalog/index.html
[jira] [Updated] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation
[ https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45959: --- Labels: pull-request-available (was: ) > Abusing DataSet.withColumn can cause huge tree with severe perf degradation > --- > > Key: SPARK-45959 > URL: https://issues.apache.org/jira/browse/SPARK-45959 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Asif >Priority: Minor > Labels: pull-request-available > > Although the documentation clearly recommends adding all columns in a single call, in reality it is difficult to expect customers to modify their code: in Spark 2 the analyzer rules did not do deep tree traversal, and in Spark 3 the plans are additionally cloned before being handed to the analyzer, optimizer, etc., which was not the case in Spark 2. > Together these changes have increased query time from 5 minutes to 2-3 hours. > Often columns are added to the plan via for-loop logic that keeps appending new computations based on some rule. > My suggestion is to do an initial check in the withColumn API before creating a new projection: if all existing columns are still being projected, and the new column's expression depends not on the output of the top node but on its child, then instead of adding a new Project the column can be added to the existing node. > To start, we could handle just the Project node.
[jira] [Assigned] (SPARK-45950) Fix the IvyTestUtils#createIvyDescriptor function and enable the common-utils module to run tests on GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-45950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45950: Assignee: Yang Jie > Fix the IvyTestUtils#createIvyDescriptor function and enable the common-utils > module to run tests on GitHub Actions > - > > Key: SPARK-45950 > URL: https://issues.apache.org/jira/browse/SPARK-45950 > Project: Spark > Issue Type: Bug > Components: Project Infra, Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-45950) Fix the IvyTestUtils#createIvyDescriptor function and enable the common-utils module to run tests on GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-45950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45950. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43834 [https://github.com/apache/spark/pull/43834] > Fix the IvyTestUtils#createIvyDescriptor function and enable the common-utils > module to run tests on GitHub Actions > - > > Key: SPARK-45950 > URL: https://issues.apache.org/jira/browse/SPARK-45950 > Project: Spark > Issue Type: Bug > Components: Project Infra, Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-45960) Add Python 3.10 to the Daily Python Github Action job
[ https://issues.apache.org/jira/browse/SPARK-45960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45960. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43847 [https://github.com/apache/spark/pull/43847] > Add Python 3.10 to the Daily Python Github Action job > - > > Key: SPARK-45960 > URL: https://issues.apache.org/jira/browse/SPARK-45960 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-45961) Document `spark.master.*` configurations
[ https://issues.apache.org/jira/browse/SPARK-45961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45961: -- Fix Version/s: 3.4.2 > Document `spark.master.*` configurations > > > Key: SPARK-45961 > URL: https://issues.apache.org/jira/browse/SPARK-45961 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.4.2, 4.0.0, 3.5.1 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 3.4.2, 4.0.0, 3.5.1 > > > Currently, `spark.master.*` configurations are undocumented. > {code:java} > $ git grep 'ConfigBuilder("spark.master' > core/src/main/scala/org/apache/spark/internal/config/UI.scala: val > MASTER_UI_DECOMMISSION_ALLOW_MODE = > ConfigBuilder("spark.master.ui.decommission.allow.mode") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > private[spark] val MASTER_REST_SERVER_ENABLED = > ConfigBuilder("spark.master.rest.enabled") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > private[spark] val MASTER_REST_SERVER_PORT = > ConfigBuilder("spark.master.rest.port") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > private[spark] val MASTER_UI_PORT = ConfigBuilder("spark.master.ui.port") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > ConfigBuilder("spark.master.ui.historyServerUrl") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > ConfigBuilder("spark.master.useAppNameAsAppId.enabled") {code}
[jira] [Updated] (SPARK-45961) Document `spark.master.*` configurations
[ https://issues.apache.org/jira/browse/SPARK-45961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45961: -- Fix Version/s: 3.5.1 > Document `spark.master.*` configurations > > > Key: SPARK-45961 > URL: https://issues.apache.org/jira/browse/SPARK-45961 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.4.2, 4.0.0, 3.5.1 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.1 > > > Currently, `spark.master.*` configurations are undocumented. > {code:java} > $ git grep 'ConfigBuilder("spark.master' > core/src/main/scala/org/apache/spark/internal/config/UI.scala: val > MASTER_UI_DECOMMISSION_ALLOW_MODE = > ConfigBuilder("spark.master.ui.decommission.allow.mode") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > private[spark] val MASTER_REST_SERVER_ENABLED = > ConfigBuilder("spark.master.rest.enabled") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > private[spark] val MASTER_REST_SERVER_PORT = > ConfigBuilder("spark.master.rest.port") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > private[spark] val MASTER_UI_PORT = ConfigBuilder("spark.master.ui.port") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > ConfigBuilder("spark.master.ui.historyServerUrl") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > ConfigBuilder("spark.master.useAppNameAsAppId.enabled") {code}
[jira] [Assigned] (SPARK-45961) Document `spark.master.*` configurations
[ https://issues.apache.org/jira/browse/SPARK-45961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45961: - Assignee: Dongjoon Hyun > Document `spark.master.*` configurations > > > Key: SPARK-45961 > URL: https://issues.apache.org/jira/browse/SPARK-45961 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.4.2, 4.0.0, 3.5.1 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > Currently, `spark.master.*` configurations are undocumented. > {code:java} > $ git grep 'ConfigBuilder("spark.master' > core/src/main/scala/org/apache/spark/internal/config/UI.scala: val > MASTER_UI_DECOMMISSION_ALLOW_MODE = > ConfigBuilder("spark.master.ui.decommission.allow.mode") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > private[spark] val MASTER_REST_SERVER_ENABLED = > ConfigBuilder("spark.master.rest.enabled") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > private[spark] val MASTER_REST_SERVER_PORT = > ConfigBuilder("spark.master.rest.port") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > private[spark] val MASTER_UI_PORT = ConfigBuilder("spark.master.ui.port") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > ConfigBuilder("spark.master.ui.historyServerUrl") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > ConfigBuilder("spark.master.useAppNameAsAppId.enabled") {code}
[jira] [Resolved] (SPARK-45961) Document `spark.master.*` configurations
[ https://issues.apache.org/jira/browse/SPARK-45961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45961. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43848 [https://github.com/apache/spark/pull/43848] > Document `spark.master.*` configurations > > > Key: SPARK-45961 > URL: https://issues.apache.org/jira/browse/SPARK-45961 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.4.2, 4.0.0, 3.5.1 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, `spark.master.*` configurations are undocumented. > {code:java} > $ git grep 'ConfigBuilder("spark.master' > core/src/main/scala/org/apache/spark/internal/config/UI.scala: val > MASTER_UI_DECOMMISSION_ALLOW_MODE = > ConfigBuilder("spark.master.ui.decommission.allow.mode") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > private[spark] val MASTER_REST_SERVER_ENABLED = > ConfigBuilder("spark.master.rest.enabled") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > private[spark] val MASTER_REST_SERVER_PORT = > ConfigBuilder("spark.master.rest.port") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > private[spark] val MASTER_UI_PORT = ConfigBuilder("spark.master.ui.port") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > ConfigBuilder("spark.master.ui.historyServerUrl") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > ConfigBuilder("spark.master.useAppNameAsAppId.enabled") {code}
[jira] [Updated] (SPARK-45962) Remove treatEmptyValuesAsNulls and use nullValue option instead in XML
[ https://issues.apache.org/jira/browse/SPARK-45962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45962: --- Labels: pull-request-available (was: ) > Remove treatEmptyValuesAsNulls and use nullValue option instead in XML > -- > > Key: SPARK-45962 > URL: https://issues.apache.org/jira/browse/SPARK-45962 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Shujing Yang >Priority: Major > Labels: pull-request-available > > Today, we offer two options to handle null values. To enhance user > clarity and simplify usage, we propose consolidating these into a single > option. We recommend retaining the {{nullValue}} option due to its broader > semantic scope.
[jira] [Updated] (SPARK-45961) Document `spark.master.*` configurations
[ https://issues.apache.org/jira/browse/SPARK-45961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45961: --- Labels: pull-request-available (was: ) > Document `spark.master.*` configurations > > > Key: SPARK-45961 > URL: https://issues.apache.org/jira/browse/SPARK-45961 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.4.2, 4.0.0, 3.5.1 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > Currently, `spark.master.*` configurations are undocumented. > {code:java} > $ git grep 'ConfigBuilder("spark.master' > core/src/main/scala/org/apache/spark/internal/config/UI.scala: val > MASTER_UI_DECOMMISSION_ALLOW_MODE = > ConfigBuilder("spark.master.ui.decommission.allow.mode") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > private[spark] val MASTER_REST_SERVER_ENABLED = > ConfigBuilder("spark.master.rest.enabled") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > private[spark] val MASTER_REST_SERVER_PORT = > ConfigBuilder("spark.master.rest.port") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > private[spark] val MASTER_UI_PORT = ConfigBuilder("spark.master.ui.port") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > ConfigBuilder("spark.master.ui.historyServerUrl") > core/src/main/scala/org/apache/spark/internal/config/package.scala: > ConfigBuilder("spark.master.useAppNameAsAppId.enabled") {code}
[jira] [Created] (SPARK-45961) Document `spark.master.*` configurations
Dongjoon Hyun created SPARK-45961: - Summary: Document `spark.master.*` configurations Key: SPARK-45961 URL: https://issues.apache.org/jira/browse/SPARK-45961 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 3.4.2, 4.0.0, 3.5.1 Reporter: Dongjoon Hyun Currently, `spark.master.*` configurations are undocumented. {code:java} $ git grep 'ConfigBuilder("spark.master' core/src/main/scala/org/apache/spark/internal/config/UI.scala: val MASTER_UI_DECOMMISSION_ALLOW_MODE = ConfigBuilder("spark.master.ui.decommission.allow.mode") core/src/main/scala/org/apache/spark/internal/config/package.scala: private[spark] val MASTER_REST_SERVER_ENABLED = ConfigBuilder("spark.master.rest.enabled") core/src/main/scala/org/apache/spark/internal/config/package.scala: private[spark] val MASTER_REST_SERVER_PORT = ConfigBuilder("spark.master.rest.port") core/src/main/scala/org/apache/spark/internal/config/package.scala: private[spark] val MASTER_UI_PORT = ConfigBuilder("spark.master.ui.port") core/src/main/scala/org/apache/spark/internal/config/package.scala: ConfigBuilder("spark.master.ui.historyServerUrl") core/src/main/scala/org/apache/spark/internal/config/package.scala: ConfigBuilder("spark.master.useAppNameAsAppId.enabled") {code}
[jira] [Resolved] (SPARK-45958) Upgrade Arrow to 14.0.1
[ https://issues.apache.org/jira/browse/SPARK-45958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45958. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43846 [https://github.com/apache/spark/pull/43846] > Upgrade Arrow to 14.0.1 > --- > > Key: SPARK-45958 > URL: https://issues.apache.org/jira/browse/SPARK-45958 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-45946) Fix use of deprecated FileUtils write in RocksDBSuite
[ https://issues.apache.org/jira/browse/SPARK-45946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-45946. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43832 [https://github.com/apache/spark/pull/43832] > Fix use of deprecated FileUtils write in RocksDBSuite > - > > Key: SPARK-45946 > URL: https://issues.apache.org/jira/browse/SPARK-45946 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Anish Shrigondekar >Assignee: Anish Shrigondekar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Fix use of deprecated FileUtils write in RocksDBSuite
[jira] [Updated] (SPARK-45960) Add Python 3.10 to the Daily Python Github Action job
[ https://issues.apache.org/jira/browse/SPARK-45960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45960: --- Labels: pull-request-available (was: ) > Add Python 3.10 to the Daily Python Github Action job > - > > Key: SPARK-45960 > URL: https://issues.apache.org/jira/browse/SPARK-45960 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > >
[jira] [Created] (SPARK-45960) Add Python 3.10 to the Daily Python Github Action job
Dongjoon Hyun created SPARK-45960: - Summary: Add Python 3.10 to the Daily Python Github Action job Key: SPARK-45960 URL: https://issues.apache.org/jira/browse/SPARK-45960 Project: Spark Issue Type: Task Components: Project Infra Affects Versions: 4.0.0 Reporter: Dongjoon Hyun
[jira] [Resolved] (SPARK-45953) Add Python 3.10 to Infra docker image
[ https://issues.apache.org/jira/browse/SPARK-45953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45953. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43840 [https://github.com/apache/spark/pull/43840] > Add Python 3.10 to Infra docker image > - > > Key: SPARK-45953 > URL: https://issues.apache.org/jira/browse/SPARK-45953 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-45953) Add Python 3.10 to Infra docker image
[ https://issues.apache.org/jira/browse/SPARK-45953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45953: - Assignee: Dongjoon Hyun > Add Python 3.10 to Infra docker image > - > > Key: SPARK-45953 > URL: https://issues.apache.org/jira/browse/SPARK-45953 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation
[ https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Asif updated SPARK-45959: - Priority: Minor (was: Major) > Abusing DataSet.withColumn can cause huge tree with severe perf degradation > --- > > Key: SPARK-45959 > URL: https://issues.apache.org/jira/browse/SPARK-45959 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Asif >Priority: Minor > > Although the documentation clearly recommends adding all columns in a single call, in reality it is difficult to expect customers to modify their code: in Spark 2 the analyzer rules did not do deep tree traversal, and in Spark 3 the plans are additionally cloned before being handed to the analyzer, optimizer, etc., which was not the case in Spark 2. > Together these changes have increased query time from 5 minutes to 2-3 hours. > Often columns are added to the plan via for-loop logic that keeps appending new computations based on some rule. > My suggestion is to do an initial check in the withColumn API before creating a new projection: if all existing columns are still being projected, and the new column's expression depends not on the output of the top node but on its child, then instead of adding a new Project the column can be added to the existing node. > To start, we could handle just the Project node.
[jira] [Commented] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation
[ https://issues.apache.org/jira/browse/SPARK-45959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786941#comment-17786941 ] Asif commented on SPARK-45959: -- Will create a PR for this. > Abusing DataSet.withColumn can cause huge tree with severe perf degradation > --- > > Key: SPARK-45959 > URL: https://issues.apache.org/jira/browse/SPARK-45959 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Asif >Priority: Major > > Although the documentation clearly recommends adding all columns in a single call, in reality it is difficult to expect customers to modify their code: in Spark 2 the analyzer rules did not do deep tree traversal, and in Spark 3 the plans are additionally cloned before being handed to the analyzer, optimizer, etc., which was not the case in Spark 2. > Together these changes have increased query time from 5 minutes to 2-3 hours. > Often columns are added to the plan via for-loop logic that keeps appending new computations based on some rule. > My suggestion is to do an initial check in the withColumn API before creating a new projection: if all existing columns are still being projected, and the new column's expression depends not on the output of the top node but on its child, then instead of adding a new Project the column can be added to the existing node. > To start, we could handle just the Project node.
[jira] [Created] (SPARK-45959) Abusing DataSet.withColumn can cause huge tree with severe perf degradation
Asif created SPARK-45959: Summary: Abusing DataSet.withColumn can cause huge tree with severe perf degradation Key: SPARK-45959 URL: https://issues.apache.org/jira/browse/SPARK-45959 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.1 Reporter: Asif Though the documentation clearly recommends adding all columns in a single shot, in reality it is difficult to expect customers to modify their code: in Spark 2 the analyzer rules did not do deep tree traversal, and in Spark 3 the plans are additionally cloned before being handed to the analyzer, optimizer, etc., which was not the case in Spark 2. Together these changes have increased query time from 5 minutes to 2-3 hours. Often the columns are added to the plan via for-loop logic that just keeps appending new computation based on some rule. So my suggestion is to do an initial check in the withColumn API, before creating a new projection: if all the existing columns are still being projected, and the new column's expression depends not on the output of the top node but on its child, then the column can be added to the existing node instead of adding a new Project. For a start, we could handle just the Project node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
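The growth the reporter describes can be sketched without Spark at all. Below is a minimal pure-Python model (not Spark source code, and not the proposed fix): each `withColumn` call wraps the previous logical plan in a fresh Project node, so plan depth grows linearly with the number of calls, and analyzer rules that traverse the whole tree per node become quadratic overall. A single `select` adding all columns at once produces one Project.

```python
# Pure-Python model of logical-plan growth under chained withColumn calls.
# Plans are nested tuples: ("Scan",) or ("Project", [cols], child_plan).

def with_column(plan, name):
    """Model of DataFrame.withColumn: wraps the old plan in a new Project."""
    return ("Project", [name], plan)

def select_all(names):
    """Model of a single select(...) that adds every column at once."""
    return ("Project", list(names), ("Scan",))

def depth(plan):
    """Depth of the plan tree; proxy for analyzer traversal cost per rule."""
    if plan == ("Scan",):
        return 1
    _, _, child = plan
    return 1 + depth(child)

chained = ("Scan",)
for i in range(100):
    chained = with_column(chained, f"c{i}")

single = select_all(f"c{i}" for i in range(100))

print(depth(chained))  # 101: one Project node per withColumn call
print(depth(single))   # 2: a single Project over the scan
```

This is why the documented advice is to add all columns in one `select` (or a single `withColumns` call in recent Spark versions), and why the proposed optimization of merging a compatible new column into the existing Project node would help code that cannot be rewritten.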
[jira] [Updated] (SPARK-45958) Upgrade Arrow to 14.0.1
[ https://issues.apache.org/jira/browse/SPARK-45958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45958: --- Labels: pull-request-available (was: ) > Upgrade Arrow to 14.0.1 > --- > > Key: SPARK-45958 > URL: https://issues.apache.org/jira/browse/SPARK-45958 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45958) Upgrade Arrow to 14.0.1
Dongjoon Hyun created SPARK-45958: - Summary: Upgrade Arrow to 14.0.1 Key: SPARK-45958 URL: https://issues.apache.org/jira/browse/SPARK-45958 Project: Spark Issue Type: Task Components: Build Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44924) Add configurations for FileStreamSource cached files
[ https://issues.apache.org/jira/browse/SPARK-44924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44924: --- Labels: pull-request-available (was: ) > Add configurations for FileStreamSource cached files > > > Key: SPARK-44924 > URL: https://issues.apache.org/jira/browse/SPARK-44924 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.0 >Reporter: kevin nacios >Priority: Minor > Labels: pull-request-available > > With https://issues.apache.org/jira/browse/SPARK-30866, caching of listed > files was added for structured streaming to reduce the cost of relisting from > the filesystem each batch. The settings that drive this are currently hardcoded > and there is no way to change them. > > This impacts some of our workloads where we process large datasets where it's > unknown how "heavy" some files are, so a single batch can take a long period > of time. When we set maxFilesPerTrigger to 100k files, a subsequent batch > using the cached max of 10k files causes the job to take longer, since the > cluster is capable of handling the 100k files but is stuck doing 10% of the > workload. The benefit of the caching doesn't outweigh the performance cost on > the rest of the job. > > With config settings available for this, we could either absorb some > increased driver memory usage for caching the next 100k files, or opt to > disable caching entirely and just relist files each batch by setting the > cache amount to 0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45956) Upgrade ZooKeeper to 3.7.2
[ https://issues.apache.org/jira/browse/SPARK-45956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786919#comment-17786919 ] Dongjoon Hyun commented on SPARK-45956: --- I collected this as a subtask of SPARK-44111 to give more visibility. Thank you for working on this. > Upgrade ZooKeeper to 3.7.2 > -- > > Key: SPARK-45956 > URL: https://issues.apache.org/jira/browse/SPARK-45956 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Bjørn Jørgensen >Priority: Major > Labels: pull-request-available > > [CVE-2023-44981|https://nvd.nist.gov/vuln/detail/CVE-2023-44981] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45956) Upgrade ZooKeeper to 3.7.2
[ https://issues.apache.org/jira/browse/SPARK-45956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45956: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Dependency upgrade) > Upgrade ZooKeeper to 3.7.2 > -- > > Key: SPARK-45956 > URL: https://issues.apache.org/jira/browse/SPARK-45956 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Bjørn Jørgensen >Priority: Major > Labels: pull-request-available > > [CVE-2023-44981|https://nvd.nist.gov/vuln/detail/CVE-2023-44981] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44118) Support K8s scheduling gates
[ https://issues.apache.org/jira/browse/SPARK-44118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786916#comment-17786916 ] Dongjoon Hyun commented on SPARK-44118: --- We will revisit this after the feature reaches `GA` and most K8s environment users can access this feature. > Support K8s scheduling gates > > > Key: SPARK-44118 > URL: https://issues.apache.org/jira/browse/SPARK-44118 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > [https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/] > - Kubernetes v1.26 [alpha] > - Kubernetes v1.27 [beta] > - Kubernetes v1.28 [beta] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44118) Support K8s scheduling gates
[ https://issues.apache.org/jira/browse/SPARK-44118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44118: -- Description: [https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/] - Kubernetes v1.26 [alpha] - Kubernetes v1.27 [beta] - Kubernetes v1.28 [beta] was: https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/ - Kubernetes v1.26 [alpha] > Support K8s scheduling gates > > > Key: SPARK-44118 > URL: https://issues.apache.org/jira/browse/SPARK-44118 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > [https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/] > - Kubernetes v1.26 [alpha] > - Kubernetes v1.27 [beta] > - Kubernetes v1.28 [beta] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44118) Support K8s scheduling gates
[ https://issues.apache.org/jira/browse/SPARK-44118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44118: -- Parent: (was: SPARK-44111) Issue Type: Improvement (was: Sub-task) > Support K8s scheduling gates > > > Key: SPARK-44118 > URL: https://issues.apache.org/jira/browse/SPARK-44118 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > [https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/] > - Kubernetes v1.26 [alpha] > - Kubernetes v1.27 [beta] > - Kubernetes v1.28 [beta] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44118) Support K8s scheduling gates
[ https://issues.apache.org/jira/browse/SPARK-44118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786913#comment-17786913 ] Dongjoon Hyun commented on SPARK-44118: --- This is excluded from Apache Spark 4.0.0 scope because it's still `beta` even in K8s 1.28. > Support K8s scheduling gates > > > Key: SPARK-44118 > URL: https://issues.apache.org/jira/browse/SPARK-44118 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > https://kubernetes.io/docs/concepts/scheduling-eviction/pod-scheduling-readiness/ > - Kubernetes v1.26 [alpha] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45957) SQL on streaming Temp view fails
[ https://issues.apache.org/jira/browse/SPARK-45957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45957: --- Labels: pull-request-available (was: ) > SQL on streaming Temp view fails > > > Key: SPARK-45957 > URL: https://issues.apache.org/jira/browse/SPARK-45957 > Project: Spark > Issue Type: Bug > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Raghu Angadi >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The following code fails in the last step with Spark Connect. > The root cause is that Connect server triggers physical plan on a streaming > Dataframe [in > SparkConnectPlanner.scala|https://github.com/apache/spark/blob/334d952f9555cbfad8ef84987d6f978eb6b37b9b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L2591]. > Better to avoid that entirely, but at least for streaming it should be > avoided since it cannot be done with a batch execution engine. > {code:java} > df = spark.readStream.format("rate").option("numPartitions", "1").load() > df.createOrReplaceTempView("temp_view") > view_df = spark.sql("SELECT * FROM temp_view") // FAILS{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45957) SQL on streaming Temp view fails
[ https://issues.apache.org/jira/browse/SPARK-45957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated SPARK-45957: - Description: The following code fails in the last step with Spark Connect. The root cause is that Connect server triggers physical plan on a streaming Dataframe [in SparkConnectPlanner.scala|https://github.com/apache/spark/blob/334d952f9555cbfad8ef84987d6f978eb6b37b9b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L2591]. Better to avoid that entirely, but at least for streaming it should be avoided since it cannot be done with a batch execution engine. {code:java} df = spark.readStream.format("rate").option("numPartitions", "1").load() df.createOrReplaceTempView("temp_view") view_df = spark.sql("SELECT * FROM temp_view") // FAILS{code} was: The following code fails in the last step with Spark Connect. The root cause is that Connect server triggers physical plan on a streaming Dataframe [in SparkConnectPlanner.scala|https://github.com/apache/spark/blob/334d952f9555cbfad8ef84987d6f978eb6b37b9b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L2591]. Better to avoid that entirely, but at least for streaming it should be avoided since it cannot be done with a batch execution engine. {code:java} df = spark.readStream.format("rate").option("numPartitions", "1").load() df.createOrReplaceTempView("temp_view") view_df = spark.sql("SELECT * FROM temp_view") // FAILS{code} > SQL on streaming Temp view fails > > > Key: SPARK-45957 > URL: https://issues.apache.org/jira/browse/SPARK-45957 > Project: Spark > Issue Type: Bug > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Raghu Angadi >Priority: Major > Fix For: 4.0.0 > > > The following code fails in the last step with Spark Connect. 
> The root cause is that Connect server triggers physical plan on a streaming > Dataframe [in > SparkConnectPlanner.scala|https://github.com/apache/spark/blob/334d952f9555cbfad8ef84987d6f978eb6b37b9b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L2591]. > Better to avoid that entirely, but at least for streaming it should be > avoided since it cannot be done with a batch execution engine. > {code:java} > df = spark.readStream.format("rate").option("numPartitions", "1").load() > df.createOrReplaceTempView("temp_view") > view_df = spark.sql("SELECT * FROM temp_view") // FAILS{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45955) Collapse Support for Flamegraph and thread dump details
[ https://issues.apache.org/jira/browse/SPARK-45955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45955. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43842 [https://github.com/apache/spark/pull/43842] > Collapse Support for Flamegraph and thread dump details > --- > > Key: SPARK-45955 > URL: https://issues.apache.org/jira/browse/SPARK-45955 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45957) SQL on streaming Temp view fails
Raghu Angadi created SPARK-45957: Summary: SQL on streaming Temp view fails Key: SPARK-45957 URL: https://issues.apache.org/jira/browse/SPARK-45957 Project: Spark Issue Type: Bug Components: Connect, Structured Streaming Affects Versions: 4.0.0 Reporter: Raghu Angadi Fix For: 4.0.0 The following code fails in the last step with Spark Connect. The root cause is that Connect server triggers physical plan on a streaming Dataframe [in SparkConnectPlanner.scala|https://github.com/apache/spark/blob/334d952f9555cbfad8ef84987d6f978eb6b37b9b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala#L2591]. Better to avoid that entirely, but at least for streaming it should be avoided since it cannot be done with a batch execution engine. {code:java} df = spark.readStream.format("rate").option("numPartitions", "1").load() df.createOrReplaceTempView("temp_view") view_df = spark.sql("SELECT * FROM temp_view") // FAILS{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45955) Collapse Support for Flamegraph and thread dump details
[ https://issues.apache.org/jira/browse/SPARK-45955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45955: - Assignee: Kent Yao > Collapse Support for Flamegraph and thread dump details > --- > > Key: SPARK-45955 > URL: https://issues.apache.org/jira/browse/SPARK-45955 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45956) Upgrade ZooKeeper to 3.7.2
[ https://issues.apache.org/jira/browse/SPARK-45956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45956: --- Labels: pull-request-available (was: ) > Upgrade ZooKeeper to 3.7.2 > -- > > Key: SPARK-45956 > URL: https://issues.apache.org/jira/browse/SPARK-45956 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 4.0.0 >Reporter: Bjørn Jørgensen >Priority: Major > Labels: pull-request-available > > [CVE-2023-44981|https://nvd.nist.gov/vuln/detail/CVE-2023-44981] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45956) Upgrade ZooKeeper to 3.7.2
[ https://issues.apache.org/jira/browse/SPARK-45956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bjørn Jørgensen updated SPARK-45956: Summary: Upgrade ZooKeeper to 3.7.2 (was: Upgrade ZooKeeper to X.X) > Upgrade ZooKeeper to 3.7.2 > -- > > Key: SPARK-45956 > URL: https://issues.apache.org/jira/browse/SPARK-45956 > Project: Spark > Issue Type: Dependency upgrade > Components: Build >Affects Versions: 4.0.0 >Reporter: Bjørn Jørgensen >Priority: Major > > [CVE-2023-44981|https://nvd.nist.gov/vuln/detail/CVE-2023-44981] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45956) Upgrade ZooKeeper to X.X
Bjørn Jørgensen created SPARK-45956: --- Summary: Upgrade ZooKeeper to X.X Key: SPARK-45956 URL: https://issues.apache.org/jira/browse/SPARK-45956 Project: Spark Issue Type: Dependency upgrade Components: Build Affects Versions: 4.0.0 Reporter: Bjørn Jørgensen [CVE-2023-44981|https://nvd.nist.gov/vuln/detail/CVE-2023-44981] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45920) group by ordinal should be idempotent
[ https://issues.apache.org/jira/browse/SPARK-45920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45920: -- Fix Version/s: 3.3.4 > group by ordinal should be idempotent > - > > Key: SPARK-45920 > URL: https://issues.apache.org/jira/browse/SPARK-45920 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.2, 4.0.0, 3.5.1, 3.3.4 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45920) group by ordinal should be idempotent
[ https://issues.apache.org/jira/browse/SPARK-45920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45920: -- Fix Version/s: 3.4.2 > group by ordinal should be idempotent > - > > Key: SPARK-45920 > URL: https://issues.apache.org/jira/browse/SPARK-45920 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.2, 4.0.0, 3.5.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45920) group by ordinal should be idempotent
[ https://issues.apache.org/jira/browse/SPARK-45920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45920: -- Fix Version/s: 3.5.1 > group by ordinal should be idempotent > - > > Key: SPARK-45920 > URL: https://issues.apache.org/jira/browse/SPARK-45920 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.3 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43980) Add support for EXCEPT in select clause, similar to what databricks provides
[ https://issues.apache.org/jira/browse/SPARK-43980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43980: --- Labels: pull-request-available (was: ) > Add support for EXCEPT in select clause, similar to what databricks provides > > > Key: SPARK-43980 > URL: https://issues.apache.org/jira/browse/SPARK-43980 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yash Kothari >Priority: Major > Labels: pull-request-available > > I'm looking for a way to incorporate the {{select * except(col1, ...)}} > clause provided by Databricks into my workflow. I don't use Databricks and > would like to introduce this {{select except}} clause either as a > spark-package or by contributing a change to Spark. > However, I'm unsure about how to begin this process and would appreciate any > guidance from the community. > [https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select.html#examples] > > Thank you for your assistance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
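Until such an `EXCEPT` clause exists in Spark SQL, the usual workaround is to compute the projected column list on the driver and pass it to `select()`. The helper below is a self-contained pure-Python sketch of that logic (the column names are made up for illustration); with a real DataFrame you would write something like `df.select(except_columns(df.columns, {"col1"}))`.

```python
# Workaround sketch for Databricks' SELECT * EXCEPT(col1, ...) in PySpark:
# filter the column list in Python, preserving the original column order.

def except_columns(all_columns, excluded):
    """Return all_columns minus the excluded names, keeping order."""
    excluded = set(excluded)
    return [c for c in all_columns if c not in excluded]

cols = ["id", "name", "internal_hash", "created_at"]  # hypothetical schema
print(except_columns(cols, {"internal_hash"}))  # ['id', 'name', 'created_at']
```

PySpark's `DataFrame.drop("col1", ...)` achieves the same result for simple cases; the list-comprehension form is handy when the exclusion set is computed dynamically.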
[jira] [Updated] (SPARK-45954) Avoid generating redundant ShuffleExchangeExec node
[ https://issues.apache.org/jira/browse/SPARK-45954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-45954: Summary: Avoid generating redundant ShuffleExchangeExec node (was: Remove redundant shuffles) > Avoid generating redundant ShuffleExchangeExec node > --- > > Key: SPARK-45954 > URL: https://issues.apache.org/jira/browse/SPARK-45954 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45951) Upgrade buf to v1.28.1
[ https://issues.apache.org/jira/browse/SPARK-45951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45951. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43835 [https://github.com/apache/spark/pull/43835] > Upgrade buf to v1.28.1 > -- > > Key: SPARK-45951 > URL: https://issues.apache.org/jira/browse/SPARK-45951 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45951) Upgrade buf to v1.28.1
[ https://issues.apache.org/jira/browse/SPARK-45951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-45951: Assignee: Ruifeng Zheng > Upgrade buf to v1.28.1 > -- > > Key: SPARK-45951 > URL: https://issues.apache.org/jira/browse/SPARK-45951 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45955) Collapse Support for Flamegraph and thread dump details
[ https://issues.apache.org/jira/browse/SPARK-45955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45955: --- Labels: pull-request-available (was: ) > Collapse Support for Flamegraph and thread dump details > --- > > Key: SPARK-45955 > URL: https://issues.apache.org/jira/browse/SPARK-45955 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45955) Collapse Support for Flamegraph and thread dump details
Kent Yao created SPARK-45955: Summary: Collapse Support for Flamegraph and thread dump details Key: SPARK-45955 URL: https://issues.apache.org/jira/browse/SPARK-45955 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45954) Remove redundant shuffles
[ https://issues.apache.org/jira/browse/SPARK-45954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45954: --- Labels: pull-request-available (was: ) > Remove redundant shuffles > - > > Key: SPARK-45954 > URL: https://issues.apache.org/jira/browse/SPARK-45954 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Yuming Wang >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45954) Remove redundant shuffles
Yuming Wang created SPARK-45954: --- Summary: Remove redundant shuffles Key: SPARK-45954 URL: https://issues.apache.org/jira/browse/SPARK-45954 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Yuming Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45950) Fix IvyTestUtils#createIvyDescriptor function and make common-utils module can run tests on GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-45950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45950: - Summary: Fix IvyTestUtils#createIvyDescriptor function and make common-utils module can run tests on GitHub Action (was: Make `common-utils` module can run tests on GitHub Action) > Fix IvyTestUtils#createIvyDescriptor function and make common-utils module > can run tests on GitHub Action > - > > Key: SPARK-45950 > URL: https://issues.apache.org/jira/browse/SPARK-45950 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45950) Fix IvyTestUtils#createIvyDescriptor function and make common-utils module can run tests on GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-45950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45950: - Component/s: Spark Core > Fix IvyTestUtils#createIvyDescriptor function and make common-utils module > can run tests on GitHub Action > - > > Key: SPARK-45950 > URL: https://issues.apache.org/jira/browse/SPARK-45950 > Project: Spark > Issue Type: Bug > Components: Project Infra, Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45950) Fix IvyTestUtils#createIvyDescriptor function and make common-utils module can run tests on GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-45950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45950: - Issue Type: Bug (was: Improvement) > Fix IvyTestUtils#createIvyDescriptor function and make common-utils module > can run tests on GitHub Action > - > > Key: SPARK-45950 > URL: https://issues.apache.org/jira/browse/SPARK-45950 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45414) spark-xml misplaces string tag content
[ https://issues.apache.org/jira/browse/SPARK-45414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786700#comment-17786700 ] Giuseppe Ceravolo commented on SPARK-45414: --- [~ritikam] I appreciate your support, but I do not want to have to manually/programmatically move up or down one or more fields... I am looking for an automatic fix of this error by the way, I have already put in place (in production) the workaround you are suggesting by programmatically moving down all string columns, and adding a new "fake" column for each one of them, writing the file like that and then reading it back to remove the "fake" tags and re-writing it... not the best solution I guess :) > spark-xml misplaces string tag content > -- > > Key: SPARK-45414 > URL: https://issues.apache.org/jira/browse/SPARK-45414 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 3.3.0 >Reporter: Giuseppe Ceravolo >Priority: Critical > Attachments: IllegalArgumentException.txt > > > h1. Intro > Hi all! Please expect some degree of incompleteness in this issue as this is > the very first one I post, and feel free to edit it as you like - I welcome > your feedback. > My goal is to provide you with as many details and indications as I can on > this issue that I am currently facing with a Client of mine on its Production > environment (we use Azure Databricks DBR 11.3 LTS). > I was told by Sean Owen [[srowen (Sean Owen) > (github.com)|https://github.com/srowen]], who maintains the spark-xml maven > repository on GitHub [[https://github.com/srowen/spark-xml]] to post an issue > here because "This code has been ported to Apache Spark now anyway so won't > be updated here" (refer to his comment [here|#issuecomment-1744792958]). > h1. 
Issue > When I write a DataFrame into xml format via the spark-xml library, either (1) > I get an error if empty string columns are in between non-string nested ones, > or (2) if I put all string columns at the end, then I get a wrong xml where > the content of string tags is misplaced into the following ones. > h1. Code to reproduce the issue > Please find below the end-to-end code snippet that results in the error. > h2. CASE (1): ERROR > When empty strings are in between non-string nested ones, the write fails > with the following error. > _Caused by: java.lang.IllegalArgumentException: Failed to convert value > MyDescription (class of class java.lang.String) in type > ArrayType(StructType(StructField(_ID,StringType,true),StructField(_Level,StringType,true)),true) > to XML._ > Please find attached the full trace of the error. > {code:python} > fake_file_df = spark \ > .sql( > """SELECT > CAST(STRUCT('ItemId' AS `_Type`, '123' AS `_VALUE`) AS > STRUCT<_Type: STRING, _VALUE: STRING>) AS ItemID, > CAST(STRUCT('UPC' AS `_Type`, '123' AS `_VALUE`) AS STRUCT<_Type: > STRING, _VALUE: STRING>) AS UPC, > CAST('' AS STRING) AS _SerialNumberFlag, > CAST('MyDescription' AS STRING) AS Description, > CAST(ARRAY(STRUCT(NULL AS `_ID`, NULL AS `_Level`)) AS > ARRAY<STRUCT<_ID: STRING, _Level: STRING>>) AS MerchandiseHierarchy, > CAST(ARRAY(STRUCT(NULL AS `_ValueTypeCode`, NULL AS `_VALUE`)) AS > ARRAY<STRUCT<_ValueTypeCode: STRING, _VALUE: STRING>>) AS ItemPrice, > CAST('' AS STRING) AS Color, > CAST('' AS STRING) AS IntendedIndustry, > CAST(STRUCT(NULL AS `Name`) AS STRUCT<Name: STRING>) AS > Manufacturer, > CAST(STRUCT(NULL AS `Season`) AS STRUCT<Season: STRING>) AS > Marketing, > CAST(STRUCT(NULL AS `_Name`) AS STRUCT<_Name: STRING>) AS > BrandOwner, > CAST(ARRAY(STRUCT('Attribute1' AS `_Name`, 'Value1' AS `_VALUE`)) > AS ARRAY<STRUCT<_Name: STRING, _VALUE: STRING>>) AS > ItemAttribute_culinary, > CAST(ARRAY(STRUCT(NULL AS `_Name`, ARRAY(ARRAY(STRUCT(NULL AS > `AttributeCode`, NULL AS `AttributeValue`))) AS `_VALUE`)) AS > ARRAY<STRUCT<_Name: STRING, _VALUE: ARRAY<ARRAY<STRUCT<AttributeCode: STRING, > AttributeValue: STRING>>>>>) AS ItemAttribute_noculinary, > CAST(STRUCT(STRUCT(NULL AS `_UnitOfMeasure`, NULL AS `_VALUE`) AS > `Depth`, STRUCT(NULL AS `_UnitOfMeasure`, NULL AS `_VALUE`) AS `Height`, > STRUCT(NULL AS `_UnitOfMeasure`, NULL AS `_VALUE`) AS `Width`, STRUCT(NULL AS > `_UnitOfMeasure`, NULL AS `_VALUE`) AS `Diameter`) AS STRUCT<Depth: > STRUCT<_UnitOfMeasure: STRING, _VALUE: STRING>, Height: > STRUCT<_UnitOfMeasure: STRING, _VALUE: STRING>, Width: STRUCT<_UnitOfMeasure: > STRING, _VALUE: STRING>, Diameter: STRUCT<_UnitOfMeasure: STRING, _VALUE: > STRING>>) AS ItemMeasurements, > CAST(STRUCT('GroupA' AS `TaxGroupID`, 'CodeA' AS `TaxExemptCode`, > '1' AS `TaxAmount`) AS STRUCT<TaxGroupID: STRING, TaxExemptCode: STRING, > TaxAmount: STRING>) AS TaxInformation, > CAST('' AS STRING) AS ItemImageUrl, > CAST(ARRAY(ARRAY(STRUCT(NULL AS `_action`, NULL AS > `_franchiseeId`, NULL AS `_franchisee
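The production workaround mentioned in the comment above (moving string columns to the end and pairing each with a throwaway "fake" column before writing) can be sketched in pure Python over the schema as a list of (column, type) pairs. This is illustrative only: the `reorder_for_xml_write` function and the `__fake` suffix are made up for the sketch and are not part of spark-xml.

```python
def reorder_for_xml_write(schema):
    """Sketch of the workaround: given a schema as (name, type) pairs,
    move plain string columns to the end and follow each with a throwaway
    "fake" sibling, so that content misplaced by the writer lands in a tag
    that can be stripped after reading the file back."""
    strings = [name for name, typ in schema if typ == "string"]
    others = [name for name, typ in schema if typ != "string"]
    order = list(others)
    for col in strings:
        order += [col, col + "__fake"]  # "__fake" suffix is hypothetical
    return order

schema = [("ItemID", "struct"), ("Description", "string"),
          ("MerchandiseHierarchy", "array"), ("Color", "string")]
print(reorder_for_xml_write(schema))
```

On a real DataFrame the returned order would drive a `select`, with empty-literal columns appended as the fake siblings; as the commenter notes, this is a stopgap, not a fix.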
[jira] [Resolved] (SPARK-45851) (Scala) Support different retry policies for connect client
[ https://issues.apache.org/jira/browse/SPARK-45851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45851. -- Fix Version/s: 4.0.0 Assignee: Alice Sayutina Resolution: Fixed Fixed in https://github.com/apache/spark/pull/43757 > (Scala) Support different retry policies for connect client > --- > > Key: SPARK-45851 > URL: https://issues.apache.org/jira/browse/SPARK-45851 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Alice Sayutina >Assignee: Alice Sayutina >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Support multiple retry policies defined at the same time. Each policy > determines which error types it can retry and how exactly. > For instance, networking errors should generally be retried differently than > errors caused by a remote resource being temporarily unavailable. > Relevant python ticket: SPARK-45733
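The idea in the ticket above, several retry policies registered at once, each deciding which error types it handles and with what backoff, can be sketched as follows. This is a minimal illustration of the concept, not the actual Spark Connect client API; the class and function names are invented for the sketch.

```python
class RetryPolicy:
    """One policy: a predicate over errors plus its own backoff schedule."""
    def __init__(self, name, can_retry, max_retries, base_backoff_ms):
        self.name = name
        self.can_retry = can_retry          # callable: error -> bool
        self.max_retries = max_retries
        self.base_backoff_ms = base_backoff_ms

    def backoff_ms(self, attempt):
        # Simple exponential backoff; a real client would also add jitter.
        return self.base_backoff_ms * (2 ** attempt)

def select_policy(policies, error):
    """First registered policy whose predicate accepts the error wins;
    None means the error is not retryable under any policy."""
    for policy in policies:
        if policy.can_retry(error):
            return policy
    return None

policies = [
    RetryPolicy("network", lambda e: isinstance(e, ConnectionError), 5, 50),
    RetryPolicy("unavailable", lambda e: isinstance(e, TimeoutError), 3, 500),
]
```

This mirrors the description: networking errors get a fast, aggressive schedule, while a temporarily unavailable remote resource gets fewer, slower retries.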
[jira] [Resolved] (SPARK-45945) Add a helper function for `parser`
[ https://issues.apache.org/jira/browse/SPARK-45945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-45945. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43826 [https://github.com/apache/spark/pull/43826] > Add a helper function for `parser` > -- > > Key: SPARK-45945 > URL: https://issues.apache.org/jira/browse/SPARK-45945 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 >
[jira] [Assigned] (SPARK-45953) Add Python 3.10 to Infra docker image
[ https://issues.apache.org/jira/browse/SPARK-45953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45953: -- Assignee: Apache Spark > Add Python 3.10 to Infra docker image > - > > Key: SPARK-45953 > URL: https://issues.apache.org/jira/browse/SPARK-45953 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-45953) Add Python 3.10 to Infra docker image
[ https://issues.apache.org/jira/browse/SPARK-45953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45953: -- Assignee: (was: Apache Spark) > Add Python 3.10 to Infra docker image > - > > Key: SPARK-45953 > URL: https://issues.apache.org/jira/browse/SPARK-45953 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-45953) Add Python 3.10 to Infra docker image
[ https://issues.apache.org/jira/browse/SPARK-45953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45953: --- Labels: pull-request-available (was: ) > Add Python 3.10 to Infra docker image > - > > Key: SPARK-45953 > URL: https://issues.apache.org/jira/browse/SPARK-45953 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-45953) Add Python 3.10 to Infra docker image
Dongjoon Hyun created SPARK-45953: - Summary: Add Python 3.10 to Infra docker image Key: SPARK-45953 URL: https://issues.apache.org/jira/browse/SPARK-45953 Project: Spark Issue Type: Task Components: Project Infra Affects Versions: 4.0.0 Reporter: Dongjoon Hyun
[jira] [Assigned] (SPARK-45929) support grouping set operation in dataframe api
[ https://issues.apache.org/jira/browse/SPARK-45929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45929: -- Assignee: (was: Apache Spark) > support grouping set operation in dataframe api > --- > > Key: SPARK-45929 > URL: https://issues.apache.org/jira/browse/SPARK-45929 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.1 >Reporter: JacobZheng >Priority: Major > Labels: pull-request-available > > I am using the Spark DataFrame API for complex calculations. When I need the > grouping sets function, I can only convert the expressions to SQL via the > analyzed plan and then splice them into one large SQL statement to execute. > In some cases this generates extremely complex SQL, and while executing it > antlr4 consumes a large amount of memory, similar to a memory leak. If > grouping sets, rollup, and cube could be expressed through the DataFrame API, > these operations would be much simpler.
[jira] [Assigned] (SPARK-45929) support grouping set operation in dataframe api
[ https://issues.apache.org/jira/browse/SPARK-45929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45929: -- Assignee: Apache Spark > support grouping set operation in dataframe api > --- > > Key: SPARK-45929 > URL: https://issues.apache.org/jira/browse/SPARK-45929 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.1 >Reporter: JacobZheng >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > I am using the Spark DataFrame API for complex calculations. When I need the > grouping sets function, I can only convert the expressions to SQL via the > analyzed plan and then splice them into one large SQL statement to execute. > In some cases this generates extremely complex SQL, and while executing it > antlr4 consumes a large amount of memory, similar to a memory leak. If > grouping sets, rollup, and cube could be expressed through the DataFrame API, > these operations would be much simpler.
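The grouping-set semantics the reporter wants from the DataFrame API (Spark SQL's GROUPING SETS, with ROLLUP and CUBE as shorthands) can be illustrated with a small pure-Python sketch of how the two shorthands expand into explicit grouping sets. The functions below are illustrative, not Spark's implementation.

```python
from itertools import combinations

def rollup_sets(cols):
    """ROLLUP(a, b, c) groups by (a, b, c), (a, b), (a), and ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols):
    """CUBE(a, b) groups by every subset of the columns."""
    out = []
    for size in range(len(cols), -1, -1):
        out.extend(combinations(cols, size))
    return out

print(rollup_sets(["a", "b", "c"]))
print(cube_sets(["a", "b"]))
```

PySpark already exposes `df.rollup(...)` and `df.cube(...)` on DataFrames; the ticket asks for arbitrary grouping sets to be expressible the same way, avoiding the detour through generated SQL that overloads the antlr4 parser.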
[jira] [Assigned] (SPARK-45952) Use built-in math constant in math functions
[ https://issues.apache.org/jira/browse/SPARK-45952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45952: -- Assignee: (was: Apache Spark) > Use built-in math constant in math functions > - > > Key: SPARK-45952 > URL: https://issues.apache.org/jira/browse/SPARK-45952 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-45952) Use built-in math constant in math functions
[ https://issues.apache.org/jira/browse/SPARK-45952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45952: -- Assignee: Apache Spark > Use built-in math constant in math functions > - > > Key: SPARK-45952 > URL: https://issues.apache.org/jira/browse/SPARK-45952 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-45952) Use built-in math constant in math functions
[ https://issues.apache.org/jira/browse/SPARK-45952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45952: --- Labels: pull-request-available (was: ) > Use built-in math constant in math functions > - > > Key: SPARK-45952 > URL: https://issues.apache.org/jira/browse/SPARK-45952 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
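The general idea behind the SPARK-45952 title, preferring a language's built-in math constants over hand-written literals, can be sketched as follows. This is only an illustration of the principle, not the actual PySpark patch; `circle_area` is an invented example function.

```python
import math

# A hand-written literal is easy to mistype or truncate; the built-in
# constant is the closest double to pi and documents intent.
PI_LITERAL = 3.141592653589793

def circle_area(radius):
    # Using math.pi instead of a literal keeps full double precision.
    return math.pi * radius * radius
```

The same reasoning applies to `math.e` and other constants: the built-in value is guaranteed correct to full precision, so hand-maintained literals in function implementations and docstrings become redundant.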