[jira] [Updated] (SPARK-52088) Redesign ClosureCleaner Implementation Due to JDK-8309635's Removal of Old Core Reflection and Inability to Modify Private Final Fields in Hidden Classes
[ https://issues.apache.org/jira/browse/SPARK-52088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-52088: --- Labels: pull-request-available (was: )

> Redesign ClosureCleaner Implementation Due to JDK-8309635's Removal of Old
> Core Reflection and Inability to Modify Private Final Fields in Hidden Classes
> --
>
> Key: SPARK-52088
> URL: https://issues.apache.org/jira/browse/SPARK-52088
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.1.0
> Reporter: Yang Jie
> Priority: Major
> Labels: pull-request-available
>
> The removal of the old core reflection implementation in
> [JDK-8309635|https://bugs.openjdk.org/browse/JDK-8309635] poses a risk that
> the workaround for SPARK-40729 — which involves setting
> `-Djdk.reflect.useDirectMethodHandle=false` to re-enable the old core reflection
> — may no longer work in the next Java LTS release (if the next Java LTS does
> not revert [JDK-8309635|https://bugs.openjdk.org/browse/JDK-8309635]). We
> might need to consider redesigning the implementation of `ClosureCleaner`.
> Currently, when testing the `repl` module with Java 22, the following error occurs:
> ```
> build/sbt clean "repl/test"
> ```
> ```
> [info] - broadcast vars *** FAILED *** (1 second, 141 milliseconds)
> [info]   isContain was true Interpreter output contained 'Exception':
> [info]   Welcome to
> [info]         ____              __
> [info]        / __/__  ___ _____/ /__
> [info]       _\ \/ _ \/ _ `/ __/  '_/
> [info]      /___/ .__/\_,_/_/ /_/\_\   version 4.1.0-SNAPSHOT
> [info]         /_/
> [info]
> [info]   Using Scala version 2.13.16 (OpenJDK 64-Bit Server VM, Java 22.0.2)
> [info]   Type in expressions to have them evaluated.
> [info]   Type :help for more information.
> [info]
> [info]   scala>
> [info]   scala> var array: Array[Int] = Array(0, 0, 0, 0, 0)
> [info]
> [info]   scala> val broadcastArray: org.apache.spark.broadcast.Broadcast[Array[Int]] = Broadcast(0)
> [info]
> [info]   scala> java.lang.InternalError: java.lang.IllegalAccessException: final field has no write access: $Lambda/0x060001ecedd8.arg$1/putField, from class java.lang.Object (module java.base)
> [info]   at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newFieldAccessor(MethodHandleAccessorFactory.java:207)
> [info]   at java.base/jdk.internal.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:144)
> [info]   at java.base/java.lang.reflect.Field.acquireOverrideFieldAccessor(Field.java:1200)
> [info]   at java.base/java.lang.reflect.Field.getOverrideFieldAccessor(Field.java:1169)
> [info]   at java.base/java.lang.reflect.Field.set(Field.java:836)
> [info]   at org.apache.spark.util.ClosureCleaner$.setFieldAndIgnoreModifiers(ClosureCleaner.scala:564)
> [info]   at org.apache.spark.util.ClosureCleaner$.cleanupScalaReplClosure(ClosureCleaner.scala:432)
> [info]   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:257)
> [info]   at org.apache.spark.util.SparkClosureCleaner$.clean(SparkClosureCleaner.scala:39)
> [info]   at org.apache.spark.SparkContext.clean(SparkContext.scala:2843)
> [info]   at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:425)
> [info]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> [info]   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> [info]   at org.apache.spark.rdd.RDD.withScope(RDD.scala:417)
> [info]   at org.apache.spark.rdd.RDD.map(RDD.scala:424)
> [info]   ... 79 elided
> ...
> [info] Run completed in 35 seconds, 38 milliseconds.
> [info] Total number of tests run: 36
> [info] Suites: completed 3, aborted 0
> [info] Tests: succeeded 27, failed 9, canceled 0, ignored 0, pending 0
> [info] *** 9 TESTS FAILED ***
> [error] Failed tests:
> [error] org.apache.spark.repl.SingletonReplSuite
> [error] org.apache.spark.repl.ReplSuite
> ```
> I tried switching to either `VarHandle` or `Unsafe#putObject`, but neither
> worked, because the test cases involve modifying a `private final` field
> within a hidden class.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
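[Editor's note] To make the failure mode concrete outside of Spark, below is a minimal, hedged sketch (not Spark code; it assumes the usual LambdaMetafactory translation, where a captured value becomes a private final field such as `arg$1` of a hidden class). On JDK 22+ the write fails even after `setAccessible(true)`:

{code}
import java.util.function.IntSupplier

// Minimal sketch of the failure mode, assuming the lambda below is spun as a
// hidden class whose captured value lives in a private final field ("arg$1").
object HiddenClassFinalFieldDemo {
  def main(args: Array[String]): Unit = {
    val captured = 42
    val closure: IntSupplier = () => captured // compiled to a hidden class at runtime

    closure.getClass.getDeclaredFields.foreach { f =>
      f.setAccessible(true) // succeeds: this only suppresses access checks
      // On JDK 22+ (after JDK-8309635) this throws java.lang.InternalError
      // caused by IllegalAccessException "final field has no write access":
      // method-handle-based reflection never grants write access to final
      // fields of hidden classes, and the old core-reflection fallback
      // (-Djdk.reflect.useDirectMethodHandle=false) has been removed.
      f.set(closure, Integer.valueOf(0))
    }
  }
}
{code}

Run with Java 22, this reproduces the `InternalError` shown in the test output above.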
[jira] [Updated] (SPARK-52325) Publish Apache Spark 3.5.6 to docker registry
[ https://issues.apache.org/jira/browse/SPARK-52325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-52325: --- Labels: pull-request-available (was: ) > Publish Apache Spark 3.5.6 to docker registry > -- > > Key: SPARK-52325 > URL: https://issues.apache.org/jira/browse/SPARK-52325 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-52254) Adds a GitHub Actions workflow to convert RC to the official release
[ https://issues.apache.org/jira/browse/SPARK-52254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-52254: Assignee: Hyukjin Kwon > Adds a GitHub Actions workflow to convert RC to the official release > > > Key: SPARK-52254 > URL: https://issues.apache.org/jira/browse/SPARK-52254 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-52254) Adds a GitHub Actions workflow to convert RC to the official release
[ https://issues.apache.org/jira/browse/SPARK-52254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-52254. -- Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 50974 [https://github.com/apache/spark/pull/50974] > Adds a GitHub Actions workflow to convert RC to the official release > > > Key: SPARK-52254 > URL: https://issues.apache.org/jira/browse/SPARK-52254 > Project: Spark > Issue Type: Sub-task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-52312) Caching AppendData plan causes data to be inserted twice
[ https://issues.apache.org/jira/browse/SPARK-52312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-52312: --- Labels: pull-request-available (was: ) > Caching AppendData plan causes data to be inserted twice > > > Key: SPARK-52312 > URL: https://issues.apache.org/jira/browse/SPARK-52312 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Tom van Bussel >Priority: Major > Labels: pull-request-available > > We’ve identified an issue where a {{DataFrame}} created from an {{INSERT}} > SQL statement and then cached will cause the {{INSERT}} to be executed twice. > This happens because the logical plan for the {{INSERT}} ({{{}AppendData{}}}) > doesn’t extend the {{IgnoreCachedData}} trait, so it isn’t ignored during > caching as expected. As a result, the plan is cached and re-executed. We > should fix this by ensuring that plans used by {{INSERT}} all extend the > {{IgnoreCachedData}} trait. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
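[Editor's note] A hedged repro sketch of the behavior described above; the table name and values are illustrative, not from the ticket:

{code}
// Hedged sketch: an INSERT DataFrame that is then cached.
val df = spark.sql("INSERT INTO target_table VALUES (1)") // the INSERT executes eagerly here
df.cache()  // AppendData does not extend IgnoreCachedData, so the plan is cached anyway
df.count()  // materializing the cache re-executes AppendData: the row is inserted twice
{code}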
[jira] [Commented] (SPARK-52286) Publish Apache Spark 4.0.0 to docker registry
[ https://issues.apache.org/jira/browse/SPARK-52286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17954309#comment-17954309 ] Yury Molchan commented on SPARK-52286: --

Hello, I am trying to run 4.0.0 on a Mac M1 and I am facing an error when pulling the image. The 4.0.0-preview2 docker image works well.
```
spark % docker pull spark:4.0.0-scala2.13-java17-python3-ubuntu
Error response from daemon: manifest for spark:4.0.0-scala2.13-java17-python3-ubuntu not found: manifest unknown: manifest unknown
spark % docker pull spark:4.0.0-scala2.13-java21-python3-ubuntu
Error response from daemon: manifest for spark:4.0.0-scala2.13-java21-python3-ubuntu not found: manifest unknown: manifest unknown
spark % docker pull spark:4.0.0-preview2-scala2.13-java21-python3-ubuntu
4.0.0-preview2-scala2.13-java21-python3-ubuntu: Pulling from library/spark
67b06617bd6b: Pulling fs layer
```

> Publish Apache Spark 4.0.0 to docker registry
> -
>
> Key: SPARK-52286
> URL: https://issues.apache.org/jira/browse/SPARK-52286
> Project: Spark
> Issue Type: Sub-task
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-52286) Publish Apache Spark 4.0.0 to docker registry
[ https://issues.apache.org/jira/browse/SPARK-52286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17954309#comment-17954309 ] Yury Molchan edited comment on SPARK-52286 at 5/27/25 11:52 AM:

Hello, I am trying to run 4.0.0 on a Mac M1 and I am facing an error when pulling the image. The 4.0.0-preview2 docker image works well.
{code}
spark % docker pull spark:4.0.0-scala2.13-java17-python3-ubuntu
Error response from daemon: manifest for spark:4.0.0-scala2.13-java17-python3-ubuntu not found: manifest unknown: manifest unknown
spark % docker pull spark:4.0.0-scala2.13-java21-python3-ubuntu
Error response from daemon: manifest for spark:4.0.0-scala2.13-java21-python3-ubuntu not found: manifest unknown: manifest unknown
spark % docker pull spark:4.0.0-preview2-scala2.13-java21-python3-ubuntu
4.0.0-preview2-scala2.13-java21-python3-ubuntu: Pulling from library/spark
67b06617bd6b: Pulling fs layer
{code}

was (Author: yurkom):
Hello, I am trying to run 4.0.0 on a Mac M1 and I am facing an error when pulling the image. The 4.0.0-preview2 docker image works well.
```
spark % docker pull spark:4.0.0-scala2.13-java17-python3-ubuntu
Error response from daemon: manifest for spark:4.0.0-scala2.13-java17-python3-ubuntu not found: manifest unknown: manifest unknown
spark % docker pull spark:4.0.0-scala2.13-java21-python3-ubuntu
Error response from daemon: manifest for spark:4.0.0-scala2.13-java21-python3-ubuntu not found: manifest unknown: manifest unknown
spark % docker pull spark:4.0.0-preview2-scala2.13-java21-python3-ubuntu
4.0.0-preview2-scala2.13-java21-python3-ubuntu: Pulling from library/spark
67b06617bd6b: Pulling fs layer
```

> Publish Apache Spark 4.0.0 to docker registry
> -
>
> Key: SPARK-52286
> URL: https://issues.apache.org/jira/browse/SPARK-52286
> Project: Spark
> Issue Type: Sub-task
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-52286) Publish Apache Spark 4.0.0 to docker registry
[ https://issues.apache.org/jira/browse/SPARK-52286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17954309#comment-17954309 ] Yury Molchan edited comment on SPARK-52286 at 5/27/25 12:01 PM:

Hello, I am trying to run 4.0.0 on a Mac M1 and I am facing an error when pulling the image. The 4.0.0-preview2 docker image works well.
{code:java}
spark % docker pull spark:4.0.0-scala2.13-java17-python3-ubuntu
Error response from daemon: manifest for spark:4.0.0-scala2.13-java17-python3-ubuntu not found: manifest unknown: manifest unknown
spark % docker pull spark:4.0.0-scala2.13-java21-python3-ubuntu
Error response from daemon: manifest for spark:4.0.0-scala2.13-java21-python3-ubuntu not found: manifest unknown: manifest unknown
spark % docker pull spark:4.0.0-preview2-scala2.13-java21-python3-ubuntu
4.0.0-preview2-scala2.13-java21-python3-ubuntu: Pulling from library/spark
67b06617bd6b: Pulling fs layer
{code}
Note that the pull was resolved against the 'library' namespace. The following works:
{code}
spark % docker pull apache/spark:4.0.0-scala2.13-java17-python3-ubuntu
4.0.0-scala2.13-java17-python3-ubuntu: Pulling from apache/spark
67b06617bd6b: Pull complete
{code}

was (Author: yurkom):
Hello, I am trying to run 4.0.0 on a Mac M1 and I am facing an error when pulling the image. The 4.0.0-preview2 docker image works well.
{code}
spark % docker pull spark:4.0.0-scala2.13-java17-python3-ubuntu
Error response from daemon: manifest for spark:4.0.0-scala2.13-java17-python3-ubuntu not found: manifest unknown: manifest unknown
spark % docker pull spark:4.0.0-scala2.13-java21-python3-ubuntu
Error response from daemon: manifest for spark:4.0.0-scala2.13-java21-python3-ubuntu not found: manifest unknown: manifest unknown
spark % docker pull spark:4.0.0-preview2-scala2.13-java21-python3-ubuntu
4.0.0-preview2-scala2.13-java21-python3-ubuntu: Pulling from library/spark
67b06617bd6b: Pulling fs layer
{code}

> Publish Apache Spark 4.0.0 to docker registry
> -
>
> Key: SPARK-52286
> URL: https://issues.apache.org/jira/browse/SPARK-52286
> Project: Spark
> Issue Type: Sub-task
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-52324) move Spark docs to the release directory
[ https://issues.apache.org/jira/browse/SPARK-52324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-52324. -- Fix Version/s: 4.1.0 3.5.6 4.0.1 Resolution: Fixed Issue resolved by pull request 51026 [https://github.com/apache/spark/pull/51026] > move Spark docs to the release directory > > > Key: SPARK-52324 > URL: https://issues.apache.org/jira/browse/SPARK-52324 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0, 3.5.6, 4.0.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-52326) Add partitions related external catalog events
Xiang Li created SPARK-52326: Summary: Add partitions related external catalog events Key: SPARK-52326 URL: https://issues.apache.org/jira/browse/SPARK-52326 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0 Reporter: Xiang Li -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-52326) Add partitions related external catalog events
[ https://issues.apache.org/jira/browse/SPARK-52326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiang Li updated SPARK-52326: - External issue URL: https://github.com/apache/spark/pull/51030

> Add partitions related external catalog events
> --
>
> Key: SPARK-52326
> URL: https://issues.apache.org/jira/browse/SPARK-52326
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Xiang Li
> Priority: Minor
>
> In [ExternalCatalogWithListener|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogWithListener.scala],
> events are posted to all registered listeners for operations against databases,
> tables, and functions. But operations against partitions do not have events posted.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-52326) Add partitions related external catalog events
[ https://issues.apache.org/jira/browse/SPARK-52326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiang Li updated SPARK-52326: - External issue URL: (was: https://github.com/apache/spark/pull/51030)

> Add partitions related external catalog events
> --
>
> Key: SPARK-52326
> URL: https://issues.apache.org/jira/browse/SPARK-52326
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Xiang Li
> Priority: Minor
>
> In [ExternalCatalogWithListener|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogWithListener.scala],
> events are posted to all registered listeners for operations against databases,
> tables, and functions. But operations against partitions do not have events posted.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-52324) move Spark docs to the release directory
[ https://issues.apache.org/jira/browse/SPARK-52324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-52324: Assignee: Wenchen Fan > move Spark docs to the release directory > > > Key: SPARK-52324 > URL: https://issues.apache.org/jira/browse/SPARK-52324 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-52267) Match field id in ParquetToSparkSchemaConverter
[ https://issues.apache.org/jira/browse/SPARK-52267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-52267. - Fix Version/s: 4.1.0 4.0.1 Resolution: Fixed Issue resolved by pull request 50990 [https://github.com/apache/spark/pull/50990] > Match field id in ParquetToSparkSchemaConverter > --- > > Key: SPARK-52267 > URL: https://issues.apache.org/jira/browse/SPARK-52267 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0, 4.0.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-52267) Match field id in ParquetToSparkSchemaConverter
[ https://issues.apache.org/jira/browse/SPARK-52267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-52267: --- Assignee: Chenhao Li > Match field id in ParquetToSparkSchemaConverter > --- > > Key: SPARK-52267 > URL: https://issues.apache.org/jira/browse/SPARK-52267 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44728) Improve PySpark documentations
[ https://issues.apache.org/jira/browse/SPARK-44728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-44728: -- Affects Version/s: 4.1.0 > Improve PySpark documentations > -- > > Key: SPARK-44728 > URL: https://issues.apache.org/jira/browse/SPARK-44728 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.5.0, 4.0.0, 4.1.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > An umbrella Jira ticket to improve the PySpark documentation. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-52305) Refine the docstring for isnotnull, equal_null, nullif, nullifzero, nvl, nvl2, zeroifnull
[ https://issues.apache.org/jira/browse/SPARK-52305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-52305. --- Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 51016 [https://github.com/apache/spark/pull/51016] > Refine the docstring for isnotnull, equal_null, nullif, nullifzero, nvl, > nvl2, zeroifnull > - > > Key: SPARK-52305 > URL: https://issues.apache.org/jira/browse/SPARK-52305 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.1 >Reporter: Evan Wu >Assignee: Evan Wu >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-52305) Refine the docstring for isnotnull, equal_null, nullif, nullifzero, nvl, nvl2, zeroifnull
[ https://issues.apache.org/jira/browse/SPARK-52305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-52305: - Assignee: Evan Wu > Refine the docstring for isnotnull, equal_null, nullif, nullifzero, nvl, > nvl2, zeroifnull > - > > Key: SPARK-52305 > URL: https://issues.apache.org/jira/browse/SPARK-52305 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.1 >Reporter: Evan Wu >Assignee: Evan Wu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-52325) Publish Apache Spark 3.5.6 to docker registry
Hyukjin Kwon created SPARK-52325: Summary: Publish Apache Spark 3.5.6 to docker registry Key: SPARK-52325 URL: https://issues.apache.org/jira/browse/SPARK-52325 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-52327) Glob based history provider
Gaurav Waghmare created SPARK-52327: --- Summary: Glob based history provider Key: SPARK-52327 URL: https://issues.apache.org/jira/browse/SPARK-52327 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 4.0.0 Reporter: Gaurav Waghmare

Currently, the Spark history server runs with one base directory whose immediate subdirectories correspond to the event logs of individual applications. There are use cases, e.g. multi-tenancy, where for logical separation the event logs could be stored in separate directories at the tenant level. To support this, instead of the path of a single base directory, a glob over the tenant directories could be provided and handled by a separate history provider.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
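[Editor's note] To make the proposal concrete, here is a hedged spark-defaults.conf sketch. `spark.history.provider` and `spark.history.fs.logDirectory` are existing configuration keys; the glob-aware provider class named below is hypothetical, not an existing Spark class:

{code}
# Hedged sketch, assuming a new glob-aware provider (class name hypothetical)
spark.history.provider           org.apache.spark.deploy.history.GlobFsHistoryProvider
# A glob over per-tenant event-log directories instead of a single base directory
spark.history.fs.logDirectory    s3a://logs/tenants/*/spark-events
{code}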
[jira] (SPARK-52286) Publish Apache Spark 4.0.0 to docker registry
[ https://issues.apache.org/jira/browse/SPARK-52286 ] Yury Molchan deleted comment on SPARK-52286: --

was (Author: yurkom):
Hello, I am trying to run 4.0.0 on a Mac M1 and I am facing an error when pulling the image. The 4.0.0-preview2 docker image works well.
{code:java}
spark % docker pull spark:4.0.0-scala2.13-java17-python3-ubuntu
Error response from daemon: manifest for spark:4.0.0-scala2.13-java17-python3-ubuntu not found: manifest unknown: manifest unknown
spark % docker pull spark:4.0.0-scala2.13-java21-python3-ubuntu
Error response from daemon: manifest for spark:4.0.0-scala2.13-java21-python3-ubuntu not found: manifest unknown: manifest unknown
spark % docker pull spark:4.0.0-preview2-scala2.13-java21-python3-ubuntu
4.0.0-preview2-scala2.13-java21-python3-ubuntu: Pulling from library/spark
67b06617bd6b: Pulling fs layer
{code}
Note that the pull was resolved against the 'library' namespace. The following works:
{code}
spark % docker pull apache/spark:4.0.0-scala2.13-java17-python3-ubuntu
4.0.0-scala2.13-java17-python3-ubuntu: Pulling from apache/spark
67b06617bd6b: Pull complete
{code}

> Publish Apache Spark 4.0.0 to docker registry
> -
>
> Key: SPARK-52286
> URL: https://issues.apache.org/jira/browse/SPARK-52286
> Project: Spark
> Issue Type: Sub-task
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-52300) Catalog config overrides do not make it into UDTVF resolution
[ https://issues.apache.org/jira/browse/SPARK-52300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-52300. - Fix Version/s: 4.1.0 4.0.1 Resolution: Fixed

> Catalog config overrides do not make it into UDTVF resolution
> -
>
> Key: SPARK-52300
> URL: https://issues.apache.org/jira/browse/SPARK-52300
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Burak Yavuz
> Assignee: Burak Yavuz
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0, 4.0.1
>
> When resolving SQL User-defined Table Valued Functions, the catalog options
> do not get registered correctly if the configurations were overridden after a
> session is created (that is, not available as an override at Spark startup).
> This is not a problem during view resolution.
>
> This divergence is unnecessary: the resolution rules should be consistent with
> regard to which SQL configurations get passed down during UDTVF resolution,
> just as they are during view resolution.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
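[Editor's note] A hedged repro sketch of the description above; the catalog name, implementation class, and function body are illustrative, not from the ticket (SQL table functions here stand in for the UDTVFs referred to):

{code}
// Hedged sketch; names are illustrative.
spark.conf.set("spark.sql.catalog.my_cat", "com.example.MyCatalog") // overridden after session creation

spark.sql("""
  CREATE FUNCTION recent_events(min_id INT)
  RETURNS TABLE (id BIGINT)
  RETURN SELECT id FROM my_cat.db.events WHERE id > min_id
""")

// Per the bug, resolving the UDTVF body may not see the my_cat override,
// while resolving an equivalent view would.
spark.sql("SELECT * FROM recent_events(10)").show()
{code}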
[jira] [Resolved] (SPARK-52223) [SDP] Create spark connect API for SDP
[ https://issues.apache.org/jira/browse/SPARK-52223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-52223. Resolution: Fixed > [SDP] Create spark connect API for SDP > -- > > Key: SPARK-52223 > URL: https://issues.apache.org/jira/browse/SPARK-52223 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.1.0 >Reporter: Aakash Japi >Assignee: Aakash Japi >Priority: Major > Labels: pull-request-available > > SDP is a Spark Connect-only feature. We need to add the following APIs to > cover the pipeline lifecycle: > # {{CreateDataflowGraph}} creates a new graph in the registry. > # {{DefineDataset}} and {{DefineFlow}} register elements to the created > graph. Datasets are the nodes of the dataflow graph, and are either tables or > views, and flows are the edges connecting them. > # {{StartRun}} starts a run, which is a single execution of a graph. > # {{StopRun}} stops an existing run, while {{DropPipeline}} stops any > current runs and drops the pipeline. > # `PipelineCommand`, which contains a oneof that contains one of the above > protos. This is the interface exposed to the SC command itself. > We also need to add the new {{PipelineCommand}} object to the > {{ExecutePlanRequest}} and the {{PipelineCommand.Response}} to the > {{ExecutePlanResponse}} object. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
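[Editor's note] To visualize the command surface listed above, here is a hedged sketch of the oneof shape modeled as a Scala ADT. The real interface is the protobuf `PipelineCommand`; the payload fields below are assumptions for illustration only:

{code}
// Hedged sketch only: the actual API is a protobuf oneof, not these classes.
sealed trait PipelineCommand
case class CreateDataflowGraph(defaultCatalog: String, defaultDatabase: String) extends PipelineCommand
case class DefineDataset(graphId: String, name: String, isTable: Boolean) extends PipelineCommand // node: table or view
case class DefineFlow(graphId: String, name: String, targetDataset: String) extends PipelineCommand // edge in the graph
case class StartRun(graphId: String) extends PipelineCommand // a single execution of a graph
case class StopRun(graphId: String) extends PipelineCommand
case class DropPipeline(graphId: String) extends PipelineCommand // stops runs and drops the pipeline
{code}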
[jira] [Updated] (SPARK-52329) Remove private[sql] tags for new transformWithState API
[ https://issues.apache.org/jira/browse/SPARK-52329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-52329: --- Labels: pull-request-available (was: ) > Remove private[sql] tags for new transformWithState API > --- > > Key: SPARK-52329 > URL: https://issues.apache.org/jira/browse/SPARK-52329 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0, 4.1.0 >Reporter: Anish Shrigondekar >Priority: Major > Labels: pull-request-available > > Remove private[sql] tags for new transformWithState API -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-52315) Upgrade kubernetes-client version to 7.3.1
[ https://issues.apache.org/jira/browse/SPARK-52315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-52315: -- Parent: SPARK-52205 Issue Type: Sub-task (was: Bug) > Upgrade kubernetes-client version to 7.3.1 > -- > > Key: SPARK-52315 > URL: https://issues.apache.org/jira/browse/SPARK-52315 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: kubernetes-operator-0.3.0 >Reporter: William Hyun >Assignee: William Hyun >Priority: Minor > Labels: pull-request-available > Fix For: kubernetes-operator-0.3.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-52328) Use `apache/spark-connect-swift:pi` image
[ https://issues.apache.org/jira/browse/SPARK-52328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-52328: -- Summary: Use `apache/spark-connect-swift:pi` image (was: Use `apache/spark-connect-swift:pi`) > Use `apache/spark-connect-swift:pi` image > - > > Key: SPARK-52328 > URL: https://issues.apache.org/jira/browse/SPARK-52328 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: kubernetes-operator-0.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-52328) Use `apache/spark-connect-swift:pi` image
[ https://issues.apache.org/jira/browse/SPARK-52328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-52328: --- Labels: pull-request-available (was: ) > Use `apache/spark-connect-swift:pi` image > - > > Key: SPARK-52328 > URL: https://issues.apache.org/jira/browse/SPARK-52328 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: kubernetes-operator-0.3.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-52326) Add partitions related external catalog events
[ https://issues.apache.org/jira/browse/SPARK-52326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiang Li updated SPARK-52326: - Description:

In [ExternalCatalogWithListener|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogWithListener.scala], events are posted to all registered listeners for operations against databases, tables, and functions. But operations against partitions do not have events posted.

> Add partitions related external catalog events
> --
>
> Key: SPARK-52326
> URL: https://issues.apache.org/jira/browse/SPARK-52326
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Xiang Li
> Priority: Minor
>
> In [ExternalCatalogWithListener|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogWithListener.scala],
> events are posted to all registered listeners for operations against databases,
> tables, and functions. But operations against partitions do not have events posted.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
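[Editor's note] As a hedged illustration of what could be added, mirroring the existing pre/post event pairs for databases, tables, and functions. The names and fields below are assumptions, not a merged design; `ExternalCatalogEvent` is Spark's existing marker trait in org.apache.spark.sql.catalyst.catalog:

{code}
// Hedged sketch: hypothetical partition events mirroring the existing
// db/table/function pre/post event pairs.
case class CreatePartitionPreEvent(database: String, table: String, specs: Seq[Map[String, String]]) extends ExternalCatalogEvent
case class CreatePartitionEvent(database: String, table: String, specs: Seq[Map[String, String]]) extends ExternalCatalogEvent
case class DropPartitionPreEvent(database: String, table: String, specs: Seq[Map[String, String]]) extends ExternalCatalogEvent
case class DropPartitionEvent(database: String, table: String, specs: Seq[Map[String, String]]) extends ExternalCatalogEvent
{code}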
[jira] [Updated] (SPARK-52334) In Kubernetes mode, update all files, jars, archiveFiles, and pyFiles to reference the working directory after they are downloaded.
[ https://issues.apache.org/jira/browse/SPARK-52334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tongwei updated SPARK-52334: - Description:

When submitting a Spark job with the {{--files}} option and also calling {{SparkContext.addFile()}} for a file with the same name in the application code, Spark throws an exception due to a file registration conflict.

*Reproduction Steps:*
# Submit a Spark application using {{spark-submit}} with the {{--files}} option:
{code:java}
bin/spark-submit \
  --files s3://bucket/a.text \
  --class testDemo \
  app.jar
{code}
# In the {{testDemo}} application code, call:
#

> In Kubernetes mode, update all files, jars, archiveFiles, and pyFiles to
> reference the working directory after they are downloaded.
> ---
>
> Key: SPARK-52334
> URL: https://issues.apache.org/jira/browse/SPARK-52334
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, Spark Core
> Affects Versions: 4.0.0, 3.5.5
> Reporter: Tongwei
> Priority: Major
>
> When submitting a Spark job with the {{--files}} option and also calling
> {{SparkContext.addFile()}} for a file with the same name in the application
> code, Spark throws an exception due to a file registration conflict.
> *Reproduction Steps:*
> # Submit a Spark application using {{spark-submit}} with the {{--files}} option:
> {code:java}
> bin/spark-submit \
>   --files s3://bucket/a.text \
>   --class testDemo \
>   app.jar
> {code}
> # In the {{testDemo}} application code, call:
> #
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-52334) In Kubernetes mode, update all files, jars, archiveFiles, and pyFiles to reference the working directory after they are downloaded.
[ https://issues.apache.org/jira/browse/SPARK-52334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tongwei updated SPARK-52334: - Description:

When submitting a Spark job with the {{--files}} option and also calling {{SparkContext.addFile()}} for a file with the same name in the application code, Spark throws an exception due to a file registration conflict.

*Reproduction Steps:*
# Submit a Spark application using {{spark-submit}} with the {{--files}} option:
{code:java}
bin/spark-submit \
  --files s3://bucket/a.text \
  --class testDemo \
  app.jar
{code}
# In the {{testDemo}} application code, call:
#

was:
When submitting a Spark job with the {{--files}} option and also calling {{SparkContext.addFile()}} for a file with the same name in the application code, Spark throws an exception due to a file registration conflict.

*Reproduction Steps:*
# Submit a Spark application using {{spark-submit}} with the {{--files}} option:
{code:java}
bin/spark-submit \
  --files s3://bucket/a.text \
  --class testDemo \
  app.jar
{code}
# In the {{testDemo}} application code, call:
#

> In Kubernetes mode, update all files, jars, archiveFiles, and pyFiles to
> reference the working directory after they are downloaded.
> ---
>
> Key: SPARK-52334
> URL: https://issues.apache.org/jira/browse/SPARK-52334
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, Spark Core
> Affects Versions: 4.0.0, 3.5.5
> Reporter: Tongwei
> Priority: Major
>
> When submitting a Spark job with the {{--files}} option and also calling
> {{SparkContext.addFile()}} for a file with the same name in the application
> code, Spark throws an exception due to a file registration conflict.
> *Reproduction Steps:*
> # Submit a Spark application using {{spark-submit}} with the {{--files}} option:
> {code:java}
> bin/spark-submit \
>   --files s3://bucket/a.text \
>   --class testDemo \
>   app.jar
> {code}
> # In the {{testDemo}} application code, call:
> #
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-52334) In Kubernetes mode, update all files, jars, archiveFiles, and pyFiles to reference the working directory after they are downloaded.
Tongwei created SPARK-52334: --- Summary: In Kubernetes mode, update all files, jars, archiveFiles, and pyFiles to reference the working directory after they are downloaded. Key: SPARK-52334 URL: https://issues.apache.org/jira/browse/SPARK-52334 Project: Spark Issue Type: Bug Components: Kubernetes, Spark Core Affects Versions: 3.5.5, 4.0.0 Reporter: Tongwei -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-52334) In Kubernetes mode, update all files, jars, archiveFiles, and pyFiles to reference the working directory after they are downloaded.
[ https://issues.apache.org/jira/browse/SPARK-52334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-52334: --- Labels: pull-request-available (was: )

> In Kubernetes mode, update all files, jars, archiveFiles, and pyFiles to
> reference the working directory after they are downloaded.
> ---
>
> Key: SPARK-52334
> URL: https://issues.apache.org/jira/browse/SPARK-52334
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, Spark Core
> Affects Versions: 4.0.0, 3.5.5
> Reporter: Tongwei
> Priority: Major
> Labels: pull-request-available
>
> When submitting a Spark job with the {{--files}} option and also calling
> {{SparkContext.addFile()}} for a file with the same name in the application
> code, Spark throws an exception (the same code does not throw an error in YARN mode).
> *Reproduction Steps:*
> 1. Submit a Spark application using {{spark-submit}} with the {{--files}} option:
> {code:java}
> bin/spark-submit \
>   --files s3://bucket/a.text \
>   --class testDemo \
>   app.jar
> {code}
> 2. In the {{testDemo}} application code, call:
> {code:java}
> sc.addFile("a.text", true)
> {code}
> Error msg:
> {code:java}
> Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: File a.text was already registered with a different path (old path = /tmp/spark-6aa5129d-5bbb-464a-9e50-5b6ffe364ffb/a.text, new path = /opt/spark/work-dir/a.text
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-52334) In Kubernetes mode, update all files, jars, archiveFiles, and pyFiles to reference the working directory after they are downloaded.
[ https://issues.apache.org/jira/browse/SPARK-52334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tongwei updated SPARK-52334: - Description:

When submitting a Spark job with the {{--files}} option and also calling {{SparkContext.addFile()}} for a file with the same name in the application code, Spark throws an exception (the same code does not throw an error in YARN mode).

*Reproduction Steps:*
1. Submit a Spark application using {{spark-submit}} with the {{--files}} option:
{code:java}
bin/spark-submit \
  --files s3://bucket/a.text \
  --class testDemo \
  app.jar
{code}
2. In the {{testDemo}} application code, call:
{code:java}
sc.addFile("a.text", true)
{code}
Error msg:
{code:java}
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: File a.text was already registered with a different path (old path = /tmp/spark-6aa5129d-5bbb-464a-9e50-5b6ffe364ffb/a.text, new path = /opt/spark/work-dir/a.text
{code}

was:
When submitting a Spark job with the {{--files}} option and also calling {{SparkContext.addFile()}} for a file with the same name in the application code, Spark throws an exception due to a file registration conflict.

*Reproduction Steps:*
# Submit a Spark application using {{spark-submit}} with the {{--files}} option:
{code:java}
bin/spark-submit \
  --files s3://bucket/a.text \
  --class testDemo \
  app.jar
{code}
# In the {{testDemo}} application code, call:
#

> In Kubernetes mode, update all files, jars, archiveFiles, and pyFiles to
> reference the working directory after they are downloaded.
> ---
>
> Key: SPARK-52334
> URL: https://issues.apache.org/jira/browse/SPARK-52334
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, Spark Core
> Affects Versions: 4.0.0, 3.5.5
> Reporter: Tongwei
> Priority: Major
>
> When submitting a Spark job with the {{--files}} option and also calling
> {{SparkContext.addFile()}} for a file with the same name in the application
> code, Spark throws an exception (the same code does not throw an error in YARN mode).
> *Reproduction Steps:*
> 1. Submit a Spark application using {{spark-submit}} with the {{--files}} option:
> {code:java}
> bin/spark-submit \
>   --files s3://bucket/a.text \
>   --class testDemo \
>   app.jar
> {code}
> 2. In the {{testDemo}} application code, call:
> {code:java}
> sc.addFile("a.text", true)
> {code}
> Error msg:
> {code:java}
> Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: File a.text was already registered with a different path (old path = /tmp/spark-6aa5129d-5bbb-464a-9e50-5b6ffe364ffb/a.text, new path = /opt/spark/work-dir/a.text
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
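[Editor's note] The issue title states the fix direction: once Kubernetes has fetched the submitted files into the container working directory, rewrite the corresponding conf entries to point at the local copies, so that a later `addFile` of the same name resolves to an identical path. A hedged sketch of that idea follows; the helper and conf handling are illustrative, not the actual patch:

{code}
import org.apache.spark.SparkConf

// Hedged sketch of the direction named in the title; not the actual patch.
def pointAtWorkDir(conf: SparkConf, key: String, workDir: String): Unit = {
  val updated = conf.get(key, "").split(",").filter(_.nonEmpty)
    .map(uri => s"$workDir/${uri.split('/').last}") // downloaded name inside the work dir
  if (updated.nonEmpty) conf.set(key, updated.mkString(","))
}

val sparkConf = new SparkConf()
// e.g. spark.files "s3://bucket/a.text" -> "/opt/spark/work-dir/a.text"
Seq("spark.files", "spark.jars", "spark.archives", "spark.submit.pyFiles")
  .foreach(pointAtWorkDir(sparkConf, _, "/opt/spark/work-dir"))
{code}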
[jira] [Updated] (SPARK-52333) Squeeze protocol for timers (list on specific grouping key, and expiry timers)
[ https://issues.apache.org/jira/browse/SPARK-52333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-52333: --- Labels: pull-request-available (was: )

> Squeeze protocol for timers (list on specific grouping key, and expiry timers)
> --
>
> Key: SPARK-52333
> URL: https://issues.apache.org/jira/browse/SPARK-52333
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, Structured Streaming
> Affects Versions: 4.1.0
> Reporter: Jungtaek Lim
> Priority: Major
> Labels: pull-request-available
>
> As we did for ListState and MapState, we found that inlining timers into the
> proto message gives a large benefit for state interaction (intercommunication).
> This ticket applies the same change to listing timers for a grouping key and
> to expiry timers.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33537) Hive Metastore filter pushdown improvement
[ https://issues.apache.org/jira/browse/SPARK-33537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-33537. - Resolution: Fixed

> Hive Metastore filter pushdown improvement
> --
>
> Key: SPARK-33537
> URL: https://issues.apache.org/jira/browse/SPARK-33537
> Project: Spark
> Issue Type: Umbrella
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
>
> This umbrella ticket tracks Hive Metastore filter pushdown improvements. It includes:
> 1. Date type push down
> 2. Like push down
> 3. InSet pushdown improvement
> and other fixes.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-52328) Use `apache/spark-connect-swift:pi`
Dongjoon Hyun created SPARK-52328: - Summary: Use `apache/spark-connect-swift:pi` Key: SPARK-52328 URL: https://issues.apache.org/jira/browse/SPARK-52328 Project: Spark Issue Type: Sub-task Components: Kubernetes Affects Versions: kubernetes-operator-0.3.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-52328) Use `apache/spark-connect-swift:pi` image
[ https://issues.apache.org/jira/browse/SPARK-52328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-52328. --- Fix Version/s: kubernetes-operator-0.3.0 Resolution: Fixed Issue resolved by pull request 230 [https://github.com/apache/spark-kubernetes-operator/pull/230] > Use `apache/spark-connect-swift:pi` image > - > > Key: SPARK-52328 > URL: https://issues.apache.org/jira/browse/SPARK-52328 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: kubernetes-operator-0.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-0.3.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-52328) Use `apache/spark-connect-swift:pi` image
[ https://issues.apache.org/jira/browse/SPARK-52328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-52328: - Assignee: Dongjoon Hyun > Use `apache/spark-connect-swift:pi` image > - > > Key: SPARK-52328 > URL: https://issues.apache.org/jira/browse/SPARK-52328 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: kubernetes-operator-0.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-52270) User guide for native plotting
[ https://issues.apache.org/jira/browse/SPARK-52270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-52270: Assignee: Xinrong Meng > User guide for native plotting > -- > > Key: SPARK-52270 > URL: https://issues.apache.org/jira/browse/SPARK-52270 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.1.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-52264) Test divide-by-zero behavior with more numeric data types
[ https://issues.apache.org/jira/browse/SPARK-52264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-52264. -- Assignee: Xinrong Meng Resolution: Resolved Resolved by https://github.com/apache/spark/pull/50988 > Test divide-by-zero behavior with more numeric data types > - > > Key: SPARK-52264 > URL: https://issues.apache.org/jira/browse/SPARK-52264 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark, Tests >Affects Versions: 4.1.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-52270) User guide for native plotting
[ https://issues.apache.org/jira/browse/SPARK-52270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-52270. -- Resolution: Resolved Resolved by https://github.com/apache/spark/pull/50992 > User guide for native plotting > -- > > Key: SPARK-52270 > URL: https://issues.apache.org/jira/browse/SPARK-52270 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.1.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-52331) Adjust test for promotion from float32 to float64 during division
Xinrong Meng created SPARK-52331: Summary: Adjust test for promotion from float32 to float64 during division Key: SPARK-52331 URL: https://issues.apache.org/jira/browse/SPARK-52331 Project: Spark Issue Type: Sub-task Components: PS, Tests Affects Versions: 4.1.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-52332) Fix promotion from float32 to float64 during division
Xinrong Meng created SPARK-52332: Summary: Fix promotion from float32 to float64 during division Key: SPARK-52332 URL: https://issues.apache.org/jira/browse/SPARK-52332 Project: Spark Issue Type: Sub-task Components: PS Affects Versions: 4.1.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-52331) Adjust test for promotion from float32 to float64 during division
[ https://issues.apache.org/jira/browse/SPARK-52331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-52331: --- Labels: pull-request-available (was: ) > Adjust test for promotion from float32 to float64 during division > - > > Key: SPARK-52331 > URL: https://issues.apache.org/jira/browse/SPARK-52331 > Project: Spark > Issue Type: Sub-task > Components: PS, Tests >Affects Versions: 4.1.0 >Reporter: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-52332) Fix promotion from float32 to float64 during division
[ https://issues.apache.org/jira/browse/SPARK-52332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-52332: - Description:

{code:python}
>>> ps.set_option("compute.fail_on_ansi_mode", False)
>>> spark.conf.set("spark.sql.ansi.enabled", False)
>>>
>>> import pandas as pd
>>> import numpy as np
>>> pdf = pd.DataFrame(
...     {
...         "a": [1.0, -1.0, 0.0, np.nan],
...         "b": [0.0, 0.0, 0.0, 0.0],
...     },
...     dtype=np.float32,
... )
>>>
>>> psdf = ps.from_pandas(pdf)
>>>
>>> psdf["a"] / psdf["b"]
0    inf
1   -inf
2    NaN
3    NaN
dtype: float64
>>>
>>> pdf["a"] / pdf["b"]
0    inf
1   -inf
2    NaN
3    NaN
dtype: float32
{code}

> Fix promotion from float32 to float64 during division
> -
>
> Key: SPARK-52332
> URL: https://issues.apache.org/jira/browse/SPARK-52332
> Project: Spark
> Issue Type: Sub-task
> Components: PS
> Affects Versions: 4.1.0
> Reporter: Xinrong Meng
> Priority: Major
>
> {code:python}
> >>> ps.set_option("compute.fail_on_ansi_mode", False)
> >>> spark.conf.set("spark.sql.ansi.enabled", False)
> >>>
> >>> import pandas as pd
> >>> import numpy as np
> >>> pdf = pd.DataFrame(
> ...     {
> ...         "a": [1.0, -1.0, 0.0, np.nan],
> ...         "b": [0.0, 0.0, 0.0, 0.0],
> ...     },
> ...     dtype=np.float32,
> ... )
> >>>
> >>> psdf = ps.from_pandas(pdf)
> >>>
> >>> psdf["a"] / psdf["b"]
> 0    inf
> 1   -inf
> 2    NaN
> 3    NaN
> dtype: float64
> >>>
> >>> pdf["a"] / pdf["b"]
> 0    inf
> 1   -inf
> 2    NaN
> 3    NaN
> dtype: float32
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-52331) Adjust test for promotion from float32 to float64 during division
[ https://issues.apache.org/jira/browse/SPARK-52331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-52331. -- Assignee: Xinrong Meng Resolution: Resolved Resolved by https://github.com/apache/spark/pull/51035 > Adjust test for promotion from float32 to float64 during division > - > > Key: SPARK-52331 > URL: https://issues.apache.org/jira/browse/SPARK-52331 > Project: Spark > Issue Type: Sub-task > Components: PS, Tests >Affects Versions: 4.1.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-52330) SPIP: Real-Time Mode in Apache Spark Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-52330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Jerry Peng updated SPARK-52330: -- Description:

The SPIP proposes to add a new execution mode called “Real-time Mode” in Spark Structured Streaming that significantly lowers end-to-end latency for processing streams of data.

Our goal is to make Spark capable of handling streaming jobs that need results *almost immediately (within O(100) milliseconds)*. We want to achieve this *without changing the high-level DataFrame/Dataset API* that users already use – so existing streaming queries can run in this new ultra-low-latency mode by simply turning it on, without rewriting their logic.

In short, we’re trying to enable Spark to power *real-time applications* (like instant anomaly alerts or live personalization) that today cannot meet their latency requirements with Spark’s current streaming engine.

SPIP doc: [https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing]

was:
We propose to add a *real-time mode* in Spark Structured Streaming that significantly lowers end-to-end latency for processing streams of data.

Our goal is to make Spark capable of handling streaming jobs that need results *almost immediately (within O(100) milliseconds)*. We want to achieve this *without changing the high-level DataFrame/Dataset API* that users already use – so existing streaming queries can run in this new ultra-low-latency mode by simply turning it on, without rewriting their logic.

In short, we’re trying to enable Spark to power *real-time applications* (like instant anomaly alerts or live personalization) that today cannot meet their latency requirements with Spark’s current streaming engine.

SPIP doc: [https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing]

> SPIP: Real-Time Mode in Apache Spark Structured Streaming
> -
>
> Key: SPARK-52330
> URL: https://issues.apache.org/jira/browse/SPARK-52330
> Project: Spark
> Issue Type: Umbrella
> Components: Structured Streaming
> Affects Versions: 4.1.0
> Reporter: Boyang Jerry Peng
> Priority: Major
>
> The SPIP proposes to add a new execution mode called “Real-time Mode” in
> Spark Structured Streaming that significantly lowers end-to-end latency
> for processing streams of data.
> Our goal is to make Spark capable of handling streaming jobs that need
> results *almost immediately (within O(100) milliseconds)*. We want to achieve
> this *without changing the high-level DataFrame/Dataset API* that users
> already use – so existing streaming queries can run in this new
> ultra-low-latency mode by simply turning it on, without rewriting their logic.
> In short, we’re trying to enable Spark to power *real-time applications*
> (like instant anomaly alerts or live personalization) that today cannot meet
> their latency requirements with Spark’s current streaming engine.
>
> SPIP doc:
> [https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing]
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-52330) SPIP: Real-Time Mode in Apache Spark Structured Streaming
Boyang Jerry Peng created SPARK-52330: - Summary: SPIP: Real-Time Mode in Apache Spark Structured Streaming Key: SPARK-52330 URL: https://issues.apache.org/jira/browse/SPARK-52330 Project: Spark Issue Type: Umbrella Components: Structured Streaming Affects Versions: 4.1.0 Reporter: Boyang Jerry Peng We propose to add a *real-time mode* in Spark Structured Streaming that significantly lowers end-to-end latency for processing streams of data. Our goal is to make Spark capable of handling streaming jobs that need results *almost immediately* (within *O(100) milliseconds*). We want to achieve this *without changing the high-level DataFrame/Dataset API* that users already use – so existing streaming queries can run in this new ultra-low-latency mode by simply turning it on, without rewriting their logic. In short, we’re trying to enable Spark to power *real-time applications* (like instant anomaly alerts or live personalization) that today cannot meet their latency requirements with Spark’s current streaming engine. SPIP doc: https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing
[jira] [Updated] (SPARK-52330) SPIP: Real-Time Mode in Apache Spark Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-52330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Jerry Peng updated SPARK-52330: -- Description: The SPIP proposes a new execution mode called “*Real-time Mode*” in Spark Structured Streaming that significantly lowers end-to-end latency for processing streams of data. Our goal is to make Spark capable of handling streaming jobs that need results *almost immediately* (within *O(100) milliseconds*). We want to achieve this *without changing the high-level DataFrame/Dataset API* that users already use – so existing streaming queries can run in this new ultra-low-latency mode by simply turning it on, without rewriting their logic. In short, we’re trying to enable Spark to power *real-time applications* (like instant anomaly alerts or live personalization) that today cannot meet their latency requirements with Spark’s current streaming engine. SPIP doc: [https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing] was: The SPIP proposes to add a new execution mode called “*Real-time Mode*” in Spark Structured Streaming that significantly lowers end-to-end latency for processing streams of data. Our goal is to make Spark capable of handling streaming jobs that need results *almost immediately* (within *O(100) milliseconds*). We want to achieve this *without changing the high-level DataFrame/Dataset API* that users already use – so existing streaming queries can run in this new ultra-low-latency mode by simply turning it on, without rewriting their logic. In short, we’re trying to enable Spark to power *real-time applications* (like instant anomaly alerts or live personalization) that today cannot meet their latency requirements with Spark’s current streaming engine. SPIP doc: [https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing] > SPIP: Real-Time Mode in Apache Spark Structured Streaming > - > > Key: SPARK-52330 > URL: https://issues.apache.org/jira/browse/SPARK-52330 > Project: Spark > Issue Type: Umbrella > Components: Structured Streaming >Affects Versions: 4.1.0 >Reporter: Boyang Jerry Peng >Priority: Major > > The SPIP proposes a new execution mode called “*Real-time Mode*” in > Spark Structured Streaming that significantly lowers end-to-end latency for > processing streams of data. > Our goal is to make Spark capable of handling streaming jobs that need > results *almost immediately* (within *O(100) milliseconds*). We want > to achieve this *without changing the high-level DataFrame/Dataset API* that > users already use – so existing streaming queries can run in this new > ultra-low-latency mode by simply turning it on, without rewriting their logic. > In short, we’re trying to enable Spark to power *real-time applications* > (like instant anomaly alerts or live personalization) that today cannot meet > their latency requirements with Spark’s current streaming engine. > > SPIP doc: > [https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing]
[jira] [Updated] (SPARK-52330) SPIP: Real-Time Mode in Apache Spark Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-52330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Jerry Peng updated SPARK-52330: -- Description: We propose to add a *real-time mode* in Spark Structured Streaming that significantly lowers end-to-end latency for processing streams of data. Our goal is to make Spark capable of handling streaming jobs that need results *almost immediately* (within *O(100) milliseconds*). We want to achieve this *without changing the high-level DataFrame/Dataset API* that users already use – so existing streaming queries can run in this new ultra-low-latency mode by simply turning it on, without rewriting their logic. In short, we’re trying to enable Spark to power *real-time applications* (like instant anomaly alerts or live personalization) that today cannot meet their latency requirements with Spark’s current streaming engine. SPIP doc: [https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing] was: We propose to add a *real-time mode* in Spark Structured Streaming that significantly lowers end-to-end latency for processing streams of data. Our goal is to make Spark capable of handling streaming jobs that need results *almost immediately* (within *O(100) milliseconds*). We want to achieve this *without changing the high-level DataFrame/Dataset API* that users already use – so existing streaming queries can run in this new ultra-low-latency mode by simply turning it on, without rewriting their logic. In short, we’re trying to enable Spark to power *real-time applications* (like instant anomaly alerts or live personalization) that today cannot meet their latency requirements with Spark’s current streaming engine. SPIP doc: https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing > SPIP: Real-Time Mode in Apache Spark Structured Streaming > - > > Key: SPARK-52330 > URL: https://issues.apache.org/jira/browse/SPARK-52330 > Project: Spark > Issue Type: Umbrella > Components: Structured Streaming >Affects Versions: 4.1.0 >Reporter: Boyang Jerry Peng >Priority: Major > > We propose to add a *real-time mode* in Spark Structured Streaming that > significantly lowers end-to-end latency for processing streams of data. > Our goal is to make Spark capable of handling streaming jobs that need > results *almost immediately* (within *O(100) milliseconds*). We want > to achieve this *without changing the high-level DataFrame/Dataset API* that > users already use – so existing streaming queries can run in this new > ultra-low-latency mode by simply turning it on, without rewriting their logic. > In short, we’re trying to enable Spark to power *real-time applications* > (like instant anomaly alerts or live personalization) that today cannot meet > their latency requirements with Spark’s current streaming engine. > > SPIP doc: > [https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?usp=sharing]
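To make the "turn it on without rewriting logic" claim concrete, here is a hedged sketch of what opting in might look like. `Trigger.RealTime` below is hypothetical and does not exist in Spark today; the actual switch (trigger, SQL conf, or otherwise) is defined by the SPIP doc, not here. The query logic itself is ordinary Structured Streaming code:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object RealTimeModeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("rtm-sketch").getOrCreate()

    // Ordinary Structured Streaming logic -- nothing here would be rewritten.
    val events = spark.readStream.format("rate").load()
    val alerts = events.filter("value % 100 = 0")

    val query = alerts.writeStream
      .format("console")
      // .trigger(Trigger.RealTime())                 // hypothetical opt-in; not a real API today
      .trigger(Trigger.ProcessingTime("0 seconds"))   // closest existing low-latency knob
      .start()
    query.awaitTermination()
  }
}
```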
[jira] [Updated] (SPARK-52327) Glob based history provider
[ https://issues.apache.org/jira/browse/SPARK-52327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gaurav Waghmare updated SPARK-52327: Description: Currently, the Spark history server runs with a single base directory whose immediate subdirectories correspond to the event logs of each application. There are use cases, e.g. multi-tenancy, where for logical separation the event logs could be stored in separate directories at a tenant level. To achieve this, instead of providing the path of the base directory, a glob matching the tenant directories could be provided and used in a separate history provider similar to `org.apache.spark.deploy.history.FsHistoryProvider`. was: Currently, the Spark history server runs with a single base directory whose immediate subdirectories correspond to the event logs of each application. There are use cases, e.g. multi-tenancy, where for logical separation the event logs could be stored in separate directories at a tenant level. To achieve this, instead of providing the path of the base directory, a glob matching the tenant directories could be provided and used in a separate history provider. > Glob based history provider > --- > > Key: SPARK-52327 > URL: https://issues.apache.org/jira/browse/SPARK-52327 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gaurav Waghmare >Priority: Major > Original Estimate: 168h > Remaining Estimate: 168h > > Currently, the Spark history server runs with a single base directory whose > immediate subdirectories correspond to the event logs of each application. > There are use cases, e.g. multi-tenancy, where for logical separation > the event logs could be stored in separate directories at a tenant level. > To achieve this, instead of providing the path of the base directory, a glob > matching the tenant directories could be provided and used in a separate > history provider similar to `org.apache.spark.deploy.history.FsHistoryProvider`.
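A rough sketch of the glob-expansion step such a provider would need, using Hadoop's standard filesystem API. This is not the proposed implementation; the object name and example paths are invented, and a real provider would plug the expanded directories into FsHistoryProvider-style scanning:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object GlobExpansionSketch {
  def main(args: Array[String]): Unit = {
    // e.g. a tenant-level pattern instead of a single base directory:
    //   s3a://logs/tenants/*/eventlogs
    val glob = new Path(args.headOption.getOrElse("/tmp/history/tenants/*/eventlogs"))
    val fs = glob.getFileSystem(new Configuration())

    // globStatus resolves the wildcard; each match is one tenant's log root,
    // which the provider would then scan exactly like today's base directory.
    Option(fs.globStatus(glob)).getOrElse(Array.empty)
      .filter(_.isDirectory)
      .foreach(status => println(s"tenant log dir: ${status.getPath}"))
  }
}
```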
[jira] [Created] (SPARK-52329) Remove private[sql] tags for new transformWithState API
Anish Shrigondekar created SPARK-52329: -- Summary: Remove private[sql] tags for new transformWithState API Key: SPARK-52329 URL: https://issues.apache.org/jira/browse/SPARK-52329 Project: Spark Issue Type: Task Components: Structured Streaming Affects Versions: 4.0.0, 4.1.0 Reporter: Anish Shrigondekar Remove private[sql] tags for new transformWithState API
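For readers unfamiliar with the Scala visibility qualifier involved: `private[sql]` restricts a member to the org.apache.spark.sql package tree, so removing the tag is what promotes a method to the public API surface. A schematic example with made-up names (not the actual transformWithState signatures):

```scala
package org.apache.spark.sql.example

class GroupStateExample[K] {
  // Before: visible only inside org.apache.spark.sql and its subpackages,
  // so user code outside Spark cannot call it.
  private[sql] def transformWithStateInternal(): Unit = ()

  // After: the qualifier is dropped and the method becomes public API.
  def transformWithState(): Unit = ()
}
```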
[jira] [Assigned] (SPARK-52313) Correctly resolve reference data type for Views with default collation
[ https://issues.apache.org/jira/browse/SPARK-52313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-52313: --- Assignee: Marko Ilic > Correctly resolve reference data type for Views with default collation > -- > > Key: SPARK-52313 > URL: https://issues.apache.org/jira/browse/SPARK-52313 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.1.0 >Reporter: Marko Ilic >Assignee: Marko Ilic >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-52313) Correctly resolve reference data type for Views with default collation
[ https://issues.apache.org/jira/browse/SPARK-52313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-52313. - Fix Version/s: 4.1.0 4.0.1 Resolution: Fixed Issue resolved by pull request 51023 [https://github.com/apache/spark/pull/51023] > Correctly resolve reference data type for Views with default collation > -- > > Key: SPARK-52313 > URL: https://issues.apache.org/jira/browse/SPARK-52313 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.1.0 >Reporter: Marko Ilic >Assignee: Marko Ilic >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0, 4.0.1
[jira] [Assigned] (SPARK-52329) Remove private[sql] tags for new transformWithState API
[ https://issues.apache.org/jira/browse/SPARK-52329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-52329: Assignee: Anish Shrigondekar > Remove private[sql] tags for new transformWithState API > --- > > Key: SPARK-52329 > URL: https://issues.apache.org/jira/browse/SPARK-52329 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0, 4.1.0 >Reporter: Anish Shrigondekar >Assignee: Anish Shrigondekar >Priority: Major > Labels: pull-request-available > > Remove private[sql] tags for new transformWithState API
[jira] [Resolved] (SPARK-52329) Remove private[sql] tags for new transformWithState API
[ https://issues.apache.org/jira/browse/SPARK-52329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-52329. -- Fix Version/s: 4.1.0 4.0.1 Resolution: Fixed Issue resolved by pull request 51033 [https://github.com/apache/spark/pull/51033] > Remove private[sql] tags for new transformWithState API > --- > > Key: SPARK-52329 > URL: https://issues.apache.org/jira/browse/SPARK-52329 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0, 4.1.0 >Reporter: Anish Shrigondekar >Assignee: Anish Shrigondekar >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0, 4.0.1 > > > Remove private[sql] tags for new transformWithState API
[jira] [Commented] (SPARK-52333) Squeeze protocol for timers (list on specific grouping key, and expiry timers)
[ https://issues.apache.org/jira/browse/SPARK-52333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17954468#comment-17954468 ] Jungtaek Lim commented on SPARK-52333: -- Going to submit a PR for this, probably today. > Squeeze protocol for timers (list on specific grouping key, and expiry timers) > -- > > Key: SPARK-52333 > URL: https://issues.apache.org/jira/browse/SPARK-52333 > Project: Spark > Issue Type: Improvement > Components: PySpark, Structured Streaming >Affects Versions: 4.1.0 >Reporter: Jungtaek Lim >Priority: Major > > As we did for ListState and MapState, we found that inlining timers > into the proto message gives a significant benefit for state interaction > (intercommunication). This ticket aims to apply the same change to listing > timers for a grouping key and to expiry timers.
[jira] [Created] (SPARK-52333) Squeeze protocol for timers (list on specific grouping key, and expiry timers)
Jungtaek Lim created SPARK-52333: Summary: Squeeze protocol for timers (list on specific grouping key, and expiry timers) Key: SPARK-52333 URL: https://issues.apache.org/jira/browse/SPARK-52333 Project: Spark Issue Type: Improvement Components: PySpark, Structured Streaming Affects Versions: 4.1.0 Reporter: Jungtaek Lim As we did for ListState and MapState, we found that inlining timers into the proto message gives a significant benefit for state interaction (intercommunication). This ticket aims to apply the same change to listing timers for a grouping key and to expiry timers.
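A schematic Scala sketch of the shape of this optimization (the type and trait names are invented, not the actual Spark state-server proto definitions): listing timers one round trip at a time is replaced by a single response that inlines the whole list, which is what "squeezing" the protocol refers to.

```scala
// One timer entry as exchanged between the JVM and the Python worker.
case class TimerInfo(groupingKey: Array[Byte], expiryTimestampMs: Long)

// Chatty shape: every timer fetched crosses the process boundary once,
// so N registered timers cost N request/response round trips.
trait ChattyTimerApi {
  def nextTimer(): Option[TimerInfo]
}

// "Squeezed" shape: one response carries the timers inline, so listing
// timers for a grouping key (or expired timers) costs a single round trip
// per batch of results.
case class ListTimersResponse(timers: Seq[TimerInfo], hasMore: Boolean)

trait InlinedTimerApi {
  def listTimers(): ListTimersResponse
}
```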