[jira] [Updated] (SPARK-45670) SparkSubmit does not support --total-executor-cores when deploying on K8s
[ https://issues.apache.org/jira/browse/SPARK-45670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-45670: - Fix Version/s: 3.4.2 3.5.1 > SparkSubmit does not support --total-executor-cores when deploying on K8s > - > > Key: SPARK-45670 > URL: https://issues.apache.org/jira/browse/SPARK-45670 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 3.3.3, 3.4.1, 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 3.4.2, 3.5.1, 3.3.4 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45670) SparkSubmit does not support --total-executor-cores when deploying on K8s
[ https://issues.apache.org/jira/browse/SPARK-45670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45670. -- Fix Version/s: 3.3.4 Resolution: Fixed Issue resolved by pull request 43548 [https://github.com/apache/spark/pull/43548] > SparkSubmit does not support --total-executor-cores when deploying on K8s > - > > Key: SPARK-45670 > URL: https://issues.apache.org/jira/browse/SPARK-45670 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 3.3.3, 3.4.1, 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 3.3.4 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45670) SparkSubmit does not support --total-executor-cores when deploying on K8s
[ https://issues.apache.org/jira/browse/SPARK-45670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45670: Assignee: Cheng Pan > SparkSubmit does not support --total-executor-cores when deploying on K8s > - > > Key: SPARK-45670 > URL: https://issues.apache.org/jira/browse/SPARK-45670 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 3.3.3, 3.4.1, 3.5.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45637) Time window aggregation in separate streams followed by stream-stream join not returning results
[ https://issues.apache.org/jira/browse/SPARK-45637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Liu updated SPARK-45637: Description: According to documentation update (SPARK-42591) resulting from SPARK-42376, Spark 3.5.0 should support time-window aggregations in two separate streams followed by stream-stream window join: [https://github.com/apache/spark/blob/261b281e6e57be32eb28bf4e50bea24ed22a9f21/docs/structured-streaming-programming-guide.md?plain=1#L1939-L1995] However, I failed to reproduce this example and the query I built doesn't return any results:
{code:java}
from pyspark.sql.functions import rand
from pyspark.sql.functions import expr, window, window_time

spark.conf.set("spark.sql.shuffle.partitions", "1")

impressions = (
    spark
    .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load()
    .selectExpr("value AS adId", "timestamp AS impressionTime")
)

impressionsWithWatermark = impressions \
    .selectExpr("adId AS impressionAdId", "impressionTime") \
    .withWatermark("impressionTime", "10 seconds")

clicks = (
    spark
    .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load()
    .where((rand() * 100).cast("integer") < 10)  # 10 out of every 100 impressions result in a click
    .selectExpr("(value - 10) AS adId ", "timestamp AS clickTime")  # -10 so that a click with same id as impression is generated later (i.e. delayed data).
    .where("adId > 0")
)

clicksWithWatermark = clicks \
    .selectExpr("adId AS clickAdId", "clickTime") \
    .withWatermark("clickTime", "10 seconds")

clicksWindow = clicksWithWatermark.groupBy(
    window(clicksWithWatermark.clickTime, "1 minute")
).count()

impressionsWindow = impressionsWithWatermark.groupBy(
    window(impressionsWithWatermark.impressionTime, "1 minute")
).count()

clicksAndImpressions = clicksWindow.join(impressionsWindow, "window", "inner")

clicksAndImpressions.writeStream \
    .format("memory") \
    .queryName("clicksAndImpressions") \
    .outputMode("append") \
    .start()
{code}
My intuition is that I'm getting no results because, to output results of the first stateful operator (time window aggregation), a watermark needs to pass the end timestamp of the window. And once the watermark is after the end timestamp of the window, this window is ignored at the second stateful operator (stream-stream join) because it's behind the watermark.
Indeed, a small hack done to the event time column (adding one minute) between the two stateful operators makes it possible to get results:
{code:java}
clicksWindow2 = clicksWithWatermark.groupBy(
    window(clicksWithWatermark.clickTime, "1 minute")
).count().withColumn("window_time", window_time("window") + expr('INTERVAL 1 MINUTE')).drop("window")

impressionsWindow2 = impressionsWithWatermark.groupBy(
    window(impressionsWithWatermark.impressionTime, "1 minute")
).count().withColumn("window_time", window_time("window") + expr('INTERVAL 1 MINUTE')).drop("window")

clicksAndImpressions2 = clicksWindow2.join(impressionsWindow2, "window_time", "inner")

clicksAndImpressions2.writeStream \
    .format("memory") \
    .queryName("clicksAndImpressions2") \
    .outputMode("append") \
    .start()
{code}
[jira] [Updated] (SPARK-45698) Clean up the deprecated API usage related to `Buffer`
[ https://issues.apache.org/jira/browse/SPARK-45698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45698: --- Labels: pull-request-available (was: ) > Clean up the deprecated API usage related to `Buffer` > - > > Key: SPARK-45698 > URL: https://issues.apache.org/jira/browse/SPARK-45698 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > * method append in trait Buffer is deprecated (since 2.13.0) > * method prepend in trait Buffer is deprecated (since 2.13.0) > * method trimEnd in trait Buffer is deprecated (since 2.13.4) > * method trimStart in trait Buffer is deprecated (since 2.13.4) > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/test/scala/org/apache/spark/deploy/IvyTestUtils.scala:319:18: > method append in trait Buffer is deprecated (since 2.13.0): Use appendAll > instead > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.deploy.IvyTestUtils.createLocalRepository, > origin=scala.collection.mutable.Buffer.append, version=2.13.0 > [warn] allFiles.append(rFiles: _*) > [warn] ^ > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala:183:13: > method trimEnd in trait Buffer is deprecated (since 2.13.4): use > dropRightInPlace instead > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.util.SizeEstimator.SearchState.dequeue, > origin=scala.collection.mutable.Buffer.trimEnd, version=2.13.4 > [warn] stack.trimEnd(1) > [warn] ^{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
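For illustration, a minimal sketch of the replacements the warnings themselves suggest for the deprecated varargs `append`/`prepend` and for `trimEnd`/`trimStart`; the buffer and values are made up for the example, not the code the ticket touches.
{code:scala}
import scala.collection.mutable.ArrayBuffer

object BufferMigrationSketch {
  def main(args: Array[String]): Unit = {
    val buf = ArrayBuffer(1, 2, 3)
    val extra = Array(4, 5)

    // Deprecated since 2.13.0: buf.append(extra: _*); use appendAll instead.
    buf.appendAll(extra)

    // Deprecated since 2.13.0: varargs prepend; use prependAll instead.
    buf.prependAll(Seq(-1, 0))

    // Deprecated since 2.13.4: buf.trimEnd(1); use dropRightInPlace instead.
    buf.dropRightInPlace(1)

    // Deprecated since 2.13.4: buf.trimStart(1); use dropInPlace instead.
    buf.dropInPlace(1)

    println(buf) // ArrayBuffer(0, 1, 2, 3, 4)
  }
}
{code}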
[jira] [Updated] (SPARK-45685) Use `LazyList` instead of `Stream`
[ https://issues.apache.org/jira/browse/SPARK-45685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45685: - Description: * class Stream in package immutable is deprecated (since 2.13.0) * object Stream in package immutable is deprecated (since 2.13.0) * type Stream in package scala is deprecated (since 2.13.0) * value Stream in package scala is deprecated (since 2.13.0) * method append in class Stream is deprecated (since 2.13.0) * method toStream in trait IterableOnceOps is deprecated (since 2.13.0) {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala:49:20: class Stream in package immutable is deprecated (since 2.13.0): Use LazyList (which is fully lazy) instead of Stream (which has a lazy tail only) [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.BlockingLineStream.BlockingStreamed.stream, origin=scala.collection.immutable.Stream, version=2.13.0 [warn] val stream: () => Stream[T]) [warn] ^ {code} was: * class Stream in package immutable is deprecated (since 2.13.0)object Stream in * package immutable is deprecated (since 2.13.0) * type Stream in package scala is deprecated (since 2.13.0) * value Stream in package scala is deprecated (since 2.13.0) * method append in class Stream is deprecated (since 2.13.0) * method toStream in trait IterableOnceOps is deprecated (since 2.13.0) {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala:49:20: class Stream in package immutable is deprecated (since 2.13.0): Use LazyList (which is fully lazy) instead of Stream (which has a lazy tail only) [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.BlockingLineStream.BlockingStreamed.stream, origin=scala.collection.immutable.Stream, version=2.13.0 [warn] val stream: () => Stream[T]) [warn] ^ {code} > Use `LazyList` instead of `Stream` > -- > > Key: SPARK-45685 > URL: https://issues.apache.org/jira/browse/SPARK-45685 > Project: Spark > Issue Type: Sub-task > Components: Build, Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > * class Stream in package immutable is deprecated (since 2.13.0) > * object Stream in package immutable is deprecated (since 2.13.0) > * type Stream in package scala is deprecated (since 2.13.0) > * value Stream in package scala is deprecated (since 2.13.0) > * method append in class Stream is deprecated (since 2.13.0) > * method toStream in trait IterableOnceOps is deprecated (since 2.13.0) > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala:49:20: > class Stream in package immutable is deprecated (since 2.13.0): Use LazyList > (which is fully lazy) instead of Stream (which has a lazy tail only) > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.BlockingLineStream.BlockingStreamed.stream, > origin=scala.collection.immutable.Stream, version=2.13.0 > [warn] val stream: () => Stream[T]) > [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
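A hedged sketch of the Stream-to-LazyList change this ticket covers; the recursive definition below is illustrative and is not the `BlockingLineStream` code referenced in the warning.
{code:scala}
object LazyListMigrationSketch {
  // Deprecated since 2.13.0: def naturals(n: Int): Stream[Int] = n #:: naturals(n + 1)
  def naturals(n: Int): LazyList[Int] = n #:: naturals(n + 1)

  def main(args: Array[String]): Unit = {
    // Deprecated since 2.13.0: iterator.toStream. LazyList keeps the lazy tail
    // of Stream while also being lazy in the head.
    val fromIterator = Iterator.from(0).to(LazyList)

    println(naturals(1).take(5).toList)   // List(1, 2, 3, 4, 5)
    println(fromIterator.take(3).toList)  // List(0, 1, 2)
  }
}
{code}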
[jira] [Updated] (SPARK-45699) Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision"
[ https://issues.apache.org/jira/browse/SPARK-45699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45699: - Summary: Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision" (was: Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision. Write `.toTypeB` instead") > Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it > loses precision" > -- > > Key: SPARK-45699 > URL: https://issues.apache.org/jira/browse/SPARK-45699 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1199:67: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks.threshold > [error] val threshold = max(speculationMultiplier * medianDuration, > minTimeToSpeculation) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1207:60: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks > [error] foundTasks = checkAndSubmitSpeculatableTasks(timeMs, threshold, > customizedThreshold = true) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:137:48: > Widening conversion from Int to Float is deprecated because it loses > precision. Write `.toFloat` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.IntVectorReader.getFloat > [error] override def getFloat(i: Int): Float = getInt(i) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:146:49: > Widening conversion from Long to Float is deprecated because it loses > precision. Write `.toFloat` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getFloat > [error] override def getFloat(i: Int): Float = getLong(i) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:147:51: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. 
[quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getDouble > [error] override def getDouble(i: Int): Double = getLong(i) > [error] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
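A hedged sketch of the fix pattern these warnings ask for: make the lossy `Long`/`Int` to `Double`/`Float` widenings explicit with `.toDouble`/`.toFloat`. The variable names echo the warning text, but the snippet is illustrative, not the actual `TaskSetManager` or `ArrowVectorReader` change.
{code:scala}
object WideningConversionSketch {
  def main(args: Array[String]): Unit = {
    val speculationMultiplier = 1.5
    val medianDuration = 120L
    val minTimeToSpeculation = 100L

    // Before (deprecated): max(speculationMultiplier * medianDuration, minTimeToSpeculation)
    // relies on implicit Long-to-Double widening, which can lose precision for large values.
    val threshold = math.max(speculationMultiplier * medianDuration.toDouble, minTimeToSpeculation.toDouble)

    // Before (deprecated): def getFloat(i: Int): Float = getLong(i)
    def getLong(i: Int): Long = i * 1000L
    def getFloat(i: Int): Float = getLong(i).toFloat

    println((threshold, getFloat(3))) // (180.0,3000.0)
  }
}
{code}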
[jira] [Updated] (SPARK-45699) Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision. Write `.toTypeB` instead"
[ https://issues.apache.org/jira/browse/SPARK-45699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45699: - Summary: Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it loses precision. Write `.toTypeB` instead" (was: Fix "Widening conversion from `OType` to `NType` is deprecated because it loses precision. Write `.toXX` instead") > Fix "Widening conversion from `TypeA` to `TypeB` is deprecated because it > loses precision. Write `.toTypeB` instead" > > > Key: SPARK-45699 > URL: https://issues.apache.org/jira/browse/SPARK-45699 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1199:67: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks.threshold > [error] val threshold = max(speculationMultiplier * medianDuration, > minTimeToSpeculation) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1207:60: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks > [error] foundTasks = checkAndSubmitSpeculatableTasks(timeMs, threshold, > customizedThreshold = true) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:137:48: > Widening conversion from Int to Float is deprecated because it loses > precision. Write `.toFloat` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.IntVectorReader.getFloat > [error] override def getFloat(i: Int): Float = getInt(i) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:146:49: > Widening conversion from Long to Float is deprecated because it loses > precision. Write `.toFloat` instead. [quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getFloat > [error] override def getFloat(i: Int): Float = getLong(i) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:147:51: > Widening conversion from Long to Double is deprecated because it loses > precision. Write `.toDouble` instead. 
[quickfixable] > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getDouble > [error] override def getDouble(i: Int): Double = getLong(i) > [error] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45704) Fix `legacy-binding`
Yang Jie created SPARK-45704: Summary: Fix `legacy-binding` Key: SPARK-45704 URL: https://issues.apache.org/jira/browse/SPARK-45704 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Yang Jie {code:java} [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala:93:11: reference to stop is ambiguous; [error] it is both defined in the enclosing class StandaloneAppClient and inherited in the enclosing class ClientEndpoint as method stop (defined in trait RpcEndpoint, inherited through parent trait ThreadSafeRpcEndpoint) [error] In Scala 2, symbols inherited from a superclass shadow symbols defined in an outer scope. [error] Such references are ambiguous in Scala 3. To continue using the inherited symbol, write `this.stop`. [error] Or use `-Wconf:msg=legacy-binding:s` to silence this warning. [quickfixable] [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=other, site=org.apache.spark.deploy.client.StandaloneAppClient.ClientEndpoint.onStart [error] stop() [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala:171:9: reference to stop is ambiguous; [error] it is both defined in the enclosing class StandaloneAppClient and inherited in the enclosing class ClientEndpoint as method stop (defined in trait RpcEndpoint, inherited through parent trait ThreadSafeRpcEndpoint) [error] In Scala 2, symbols inherited from a superclass shadow symbols defined in an outer scope. [error] Such references are ambiguous in Scala 3. To continue using the inherited symbol, write `this.stop`. [error] Or use `-Wconf:msg=legacy-binding:s` to silence this warning. [quickfixable] [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=other, site=org.apache.spark.deploy.client.StandaloneAppClient.ClientEndpoint.receive [error] stop() [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/deploy/client/StandaloneAppClient.scala:206:9: reference to stop is ambiguous; [error] it is both defined in the enclosing class StandaloneAppClient and inherited in the enclosing class ClientEndpoint as method stop (defined in trait RpcEndpoint, inherited through parent trait ThreadSafeRpcEndpoint) [error] In Scala 2, symbols inherited from a superclass shadow symbols defined in an outer scope. [error] Such references are ambiguous in Scala 3. To continue using the inherited symbol, write `this.stop`. [error] Or use `-Wconf:msg=legacy-binding:s` to silence this warning. 
[quickfixable] [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=other, site=org.apache.spark.deploy.client.StandaloneAppClient.ClientEndpoint.receiveAndReply [error] stop() [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala:100:21: the type test for pattern org.apache.spark.RangePartitioner[K,V] cannot be checked at runtime because it has type parameters eliminated by erasure [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=unchecked, site=org.apache.spark.rdd.OrderedRDDFunctions.filterByRange.rddToFilter [error] case Some(rp: RangePartitioner[K, V]) => [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala:322:9: reference to stop is ambiguous; [error] it is both defined in the enclosing class CoarseGrainedSchedulerBackend and inherited in the enclosing class DriverEndpoint as method stop (defined in trait RpcEndpoint, inherited through parent trait IsolatedThreadSafeRpcEndpoint) [error] In Scala 2, symbols inherited from a superclass shadow symbols defined in an outer scope. [error] Such references are ambiguous in Scala 3. To continue using the inherited symbol, write `this.stop`. [error] Or use `-Wconf:msg=legacy-binding:s` to silence this warning. [quickfixable] [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=other, site=org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.DriverEndpoint.receiveAndReply [error] stop() [error] ^ [info] compiling 29 Scala sources and 267 Java sources to /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/target/scala-2.13/classes ... [warn] -target is deprecated: Use -release instead to compile against the correct platform API. [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/sp
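A hedged, self-contained sketch of the ambiguity behind the `legacy-binding` warning and the `this.stop()` fix the compiler suggests; `Client`/`ClientEndpoint` are stand-ins, not the actual `StandaloneAppClient` classes.
{code:scala}
object LegacyBindingSketch {
  trait RpcEndpointLike {
    def stop(): Unit = println("endpoint stop")
  }

  class Client {
    def stop(): Unit = println("outer client stop")

    class ClientEndpoint extends RpcEndpointLike {
      def onStart(): Unit = {
        // A bare stop() is ambiguous: the inherited RpcEndpointLike.stop shadows the
        // outer Client.stop in Scala 2 but would be rejected as ambiguous in Scala 3.
        this.stop() // keep the inherited binding explicitly, as the warning suggests
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val client = new Client
    val endpoint = new client.ClientEndpoint
    endpoint.onStart() // prints "endpoint stop"
  }
}
{code}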
[jira] [Created] (SPARK-45703) Fix `abstract type TypeA in type pattern Some[TypeA] is unchecked since it is eliminated by erasure`
Yang Jie created SPARK-45703: Summary: Fix `abstract type TypeA in type pattern Some[TypeA] is unchecked since it is eliminated by erasure` Key: SPARK-45703 URL: https://issues.apache.org/jira/browse/SPARK-45703 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Yang Jie {code:java} [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala:105:19: abstract type ScalaInputType in type pattern Some[ScalaInputType] is unchecked since it is eliminated by erasure [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=unchecked, site=org.apache.spark.sql.catalyst.CatalystTypeConverters.CatalystTypeConverter.toCatalyst [error] case opt: Some[ScalaInputType] => toCatalystImpl(opt.get) [error] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
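A hedged sketch of one way to avoid the unchecked `Some[ScalaInputType]` type pattern: match on the `Some(...)` extractor, which needs no erased type argument, and cast the payload explicitly. The converter below is a simplified stand-in, not the actual `CatalystTypeConverters` code.
{code:scala}
object ErasedOptionPatternSketch {
  abstract class Converter[ScalaInputType, CatalystType] {
    protected def toCatalystImpl(value: ScalaInputType): CatalystType

    final def toCatalyst(maybeScalaValue: Any): Any = maybeScalaValue match {
      // Before (unchecked, ScalaInputType is erased at runtime):
      //   case opt: Some[ScalaInputType] => toCatalystImpl(opt.get)
      case Some(value) => toCatalystImpl(value.asInstanceOf[ScalaInputType])
      case None        => null
      case other       => toCatalystImpl(other.asInstanceOf[ScalaInputType])
    }
  }

  def main(args: Array[String]): Unit = {
    val intToString = new Converter[Int, String] {
      protected def toCatalystImpl(value: Int): String = s"catalyst:$value"
    }
    println(intToString.toCatalyst(Some(1))) // catalyst:1
    println(intToString.toCatalyst(2))       // catalyst:2
  }
}
{code}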
[jira] [Created] (SPARK-45702) Fix `the type test for pattern TypeA cannot be checked at runtime`
Yang Jie created SPARK-45702: Summary: Fix `the type test for pattern TypeA cannot be checked at runtime` Key: SPARK-45702 URL: https://issues.apache.org/jira/browse/SPARK-45702 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Yang Jie {code:java} [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala:100:21: the type test for pattern org.apache.spark.RangePartitioner[K,V] cannot be checked at runtime because it has type parameters eliminated by erasure [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=unchecked, site=org.apache.spark.rdd.OrderedRDDFunctions.filterByRange.rddToFilter [error] case Some(rp: RangePartitioner[K, V]) => [error] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
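A hedged, self-contained sketch of this class of warning, with `Box` standing in for `RangePartitioner`: test only the class and use wildcards for the erased type parameters instead of naming `K` and `V` in the pattern.
{code:scala}
object ErasedClassPatternSketch {
  class Box[K, V](val label: String)

  def describe(maybePartitioner: Option[AnyRef]): String = maybePartitioner match {
    // Before (unchecked, K and V are erased at runtime):
    //   case Some(b: Box[K, V]) => ...
    case Some(b: Box[_, _]) => s"box labelled ${b.label}"
    case Some(other)        => s"not a box: ${other.getClass.getSimpleName}"
    case None               => "nothing"
  }

  def main(args: Array[String]): Unit = {
    println(describe(Some(new Box[Int, String]("ranges")))) // box labelled ranges
    println(describe(Some("plain string")))                 // not a box: String
  }
}
{code}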
[jira] [Created] (SPARK-45701) Clean up the deprecated API usage related to `SetOps`
Yang Jie created SPARK-45701: Summary: Clean up the deprecated API usage related to `SetOps` Key: SPARK-45701 URL: https://issues.apache.org/jira/browse/SPARK-45701 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Yang Jie * method - in trait SetOps is deprecated (since 2.13.0) * method – in trait SetOps is deprecated (since 2.13.0) * method + in trait SetOps is deprecated (since 2.13.0) {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala:70:32: method + in trait SetOps is deprecated (since 2.13.0): Consider requiring an immutable Set or fall back to Set.union [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.storage.BlockReplicationUtils.getSampleIds.indices.$anonfun, origin=scala.collection.SetOps.+, version=2.13.0 [warn] if (set.contains(t)) set + i else set + t [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
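A hedged sketch of the replacement the deprecation message itself suggests for `+`, `-` and `--` on mutable sets: either require an immutable `Set` or fall back to `union`/`diff`. The set below is illustrative, not the `BlockReplicationPolicy` code from the warning.
{code:scala}
import scala.collection.mutable

object MutableSetOpsSketch {
  def main(args: Array[String]): Unit = {
    val set = mutable.Set(1, 2, 3)

    // Deprecated since 2.13.0 on mutable sets: set + 4, set - 1, set -- Seq(2, 3)
    // (they quietly build a brand-new collection from a mutable one).
    val withExtra = set.union(Set(4))   // non-deprecated spelling of `set + 4`
    val without   = set.diff(Set(2, 3)) // non-deprecated spelling of `set -- Seq(2, 3)`

    println(withExtra.toList.sorted) // List(1, 2, 3, 4)
    println(without.toList.sorted)   // List(1)
  }
}
{code}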
[jira] [Updated] (SPARK-45684) Clean up the deprecated API usage related to `SeqOps`
[ https://issues.apache.org/jira/browse/SPARK-45684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45684: - Description: * method transform in trait SeqOps is deprecated (since 2.13.0) * method reverseMap in trait SeqOps is deprecated (since 2.13.0) * method retain in trait SetOps is deprecated (since 2.13.0) * method union in trait SeqOps is deprecated (since 2.13.0) {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala:675:15: method transform in trait SeqOps is deprecated (since 2.13.0): Use `mapInPlace` on an `IndexedSeq` instead [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.ml.classification.LogisticRegression.train.$anonfun, origin=scala.collection.mutable.SeqOps.transform, version=2.13.0 [warn] centers.transform(_ / numCoefficientSets) [warn] ^ {code} was: * method transform in trait SeqOps is deprecated (since 2.13.0) * method reverseMap in trait SeqOps is deprecated (since 2.13.0) * method retain in trait SetOps is deprecated (since 2.13.0) * method - in trait SetOps is deprecated (since 2.13.0) * method -- in trait SetOps is deprecated (since 2.13.0) * method + in trait SetOps is deprecated (since 2.13.0) {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala:675:15: method transform in trait SeqOps is deprecated (since 2.13.0): Use `mapInPlace` on an `IndexedSeq` instead [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.ml.classification.LogisticRegression.train.$anonfun, origin=scala.collection.mutable.SeqOps.transform, version=2.13.0 [warn] centers.transform(_ / numCoefficientSets) [warn] ^ {code} > Clean up the deprecated API usage related to `SeqOps` > - > > Key: SPARK-45684 > URL: https://issues.apache.org/jira/browse/SPARK-45684 > Project: Spark > Issue Type: Sub-task > Components: Build, Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > * method transform in trait SeqOps is deprecated (since 2.13.0) > * method reverseMap in trait SeqOps is deprecated (since 2.13.0) > * method retain in trait SetOps is deprecated (since 2.13.0) > * method union in trait SeqOps is deprecated (since 2.13.0) > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala:675:15: > method transform in trait SeqOps is deprecated (since 2.13.0): Use > `mapInPlace` on an `IndexedSeq` instead > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.ml.classification.LogisticRegression.train.$anonfun, > origin=scala.collection.mutable.SeqOps.transform, version=2.13.0 > [warn] centers.transform(_ / numCoefficientSets) > [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
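A hedged sketch of the replacements these warnings point at (`mapInPlace` for `transform`, `filterInPlace` for `retain`, explicit `reverse` plus `map` for `reverseMap`, `concat` for `union`); the collections below are illustrative, not the `LogisticRegression` code from the warning.
{code:scala}
import scala.collection.mutable

object SeqOpsMigrationSketch {
  def main(args: Array[String]): Unit = {
    val centers = mutable.ArrayBuffer(2.0, 4.0, 6.0)

    // Deprecated since 2.13.0: centers.transform(_ / 2); use mapInPlace on a mutable IndexedSeq.
    centers.mapInPlace(_ / 2)

    // Deprecated: xs.reverseMap(f); spell out the two steps instead.
    val reversedTimesTen = Seq(1, 2, 3).reverse.map(_ * 10)

    // Deprecated: mutableSet.retain(p); use filterInPlace.
    val evens = mutable.Set(1, 2, 3, 4)
    evens.filterInPlace(_ % 2 == 0)

    // Deprecated: seq.union(other); use concat (or ++).
    val joined = Seq(1, 2).concat(Seq(3))

    println((centers, reversedTimesTen, evens.toList.sorted, joined))
  }
}
{code}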
[jira] [Created] (SPARK-45700) Fix `The outer reference in this type test cannot be checked at run time`
Yang Jie created SPARK-45700: Summary: Fix `The outer reference in this type test cannot be checked at run time` Key: SPARK-45700 URL: https://issues.apache.org/jira/browse/SPARK-45700 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Yang Jie {code:java} [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:324:12: The outer reference in this type test cannot be checked at run time. [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=unchecked, site=org.apache.spark.sql.SQLQueryTestSuite.createScalaTestCase [error] case udfTestCase: UDFTest [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:506:12: The outer reference in this type test cannot be checked at run time. [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=unchecked, site=org.apache.spark.sql.SQLQueryTestSuite.runQueries [error] case udfTestCase: UDFTest => [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:508:12: The outer reference in this type test cannot be checked at run time. [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=unchecked, site=org.apache.spark.sql.SQLQueryTestSuite.runQueries [error] case udtfTestCase: UDTFSetTest => [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:514:13: The outer reference in this type test cannot be checked at run time. [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=unchecked, site=org.apache.spark.sql.SQLQueryTestSuite.runQueries [error] case _: PgSQLTest => [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:522:13: The outer reference in this type test cannot be checked at run time. [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=unchecked, site=org.apache.spark.sql.SQLQueryTestSuite.runQueries [error] case _: AnsiTest => [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:524:13: The outer reference in this type test cannot be checked at run time. [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=unchecked, site=org.apache.spark.sql.SQLQueryTestSuite.runQueries [error] case _: TimestampNTZTest => [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:584:12: The outer reference in this type test cannot be checked at run time. [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=unchecked, site=org.apache.spark.sql.SQLQueryTestSuite.runQueries.clue [error] case udfTestCase: UDFTest [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:596:12: The outer reference in this type test cannot be checked at run time. 
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=unchecked, site=org.apache.spark.sql.SQLQueryTestSuite.runQueries.clue [error] case udtfTestCase: UDTFSetTest [error] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
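A hedged, self-contained sketch of this warning: when a marker trait is defined inside a class, `case t: UDFTest` also implies an outer-instance test that cannot be checked at run time. One way to avoid it, shown with hypothetical names, is to define the marker traits in a companion object so they carry no outer reference.
{code:scala}
object QuerySuiteSketch {
  // Defining the markers here, rather than inside the class, removes the outer reference.
  sealed trait TestCase { def name: String }
  final case class RegularTest(name: String) extends TestCase
  final case class UdfTest(name: String) extends TestCase
}

class QuerySuiteSketch {
  import QuerySuiteSketch._

  def run(testCase: TestCase): String = testCase match {
    case udf: UdfTest => s"running UDF test ${udf.name}" // fully checkable: no outer instance involved
    case other        => s"running ${other.name}"
  }
}

object OuterReferenceDemo {
  def main(args: Array[String]): Unit = {
    val suite = new QuerySuiteSketch
    println(suite.run(QuerySuiteSketch.UdfTest("udf-1")))
    println(suite.run(QuerySuiteSketch.RegularTest("q-1")))
  }
}
{code}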
[jira] [Created] (SPARK-45699) Fix "Widening conversion from `OType` to `NType` is deprecated because it loses precision. Write `.toXX` instead"
Yang Jie created SPARK-45699: Summary: Fix "Widening conversion from `OType` to `NType` is deprecated because it loses precision. Write `.toXX` instead" Key: SPARK-45699 URL: https://issues.apache.org/jira/browse/SPARK-45699 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Yang Jie {code:java} error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1199:67: Widening conversion from Long to Double is deprecated because it loses precision. Write `.toDouble` instead. [quickfixable] [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=deprecation, site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks.threshold [error] val threshold = max(speculationMultiplier * medianDuration, minTimeToSpeculation) [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala:1207:60: Widening conversion from Long to Double is deprecated because it loses precision. Write `.toDouble` instead. [quickfixable] [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=deprecation, site=org.apache.spark.scheduler.TaskSetManager.checkSpeculatableTasks [error] foundTasks = checkAndSubmitSpeculatableTasks(timeMs, threshold, customizedThreshold = true) [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:137:48: Widening conversion from Int to Float is deprecated because it loses precision. Write `.toFloat` instead. [quickfixable] [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=deprecation, site=org.apache.spark.sql.connect.client.arrow.IntVectorReader.getFloat [error] override def getFloat(i: Int): Float = getInt(i) [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:146:49: Widening conversion from Long to Float is deprecated because it loses precision. Write `.toFloat` instead. [quickfixable] [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=deprecation, site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getFloat [error] override def getFloat(i: Int): Float = getLong(i) [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/arrow/ArrowVectorReader.scala:147:51: Widening conversion from Long to Double is deprecated because it loses precision. Write `.toDouble` instead. [quickfixable] [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=deprecation, site=org.apache.spark.sql.connect.client.arrow.BigIntVectorReader.getDouble [error] override def getDouble(i: Int): Double = getLong(i) [error] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45687) Fix `Passing an explicit array value to a Scala varargs method is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780150#comment-17780150 ] Yang Jie commented on SPARK-45687: -- We need to distinguish the situations, some need to be changed to `.toIndexedSeq`, some need to be changed to `ArraySeq.unsafeWrapArray` > Fix `Passing an explicit array value to a Scala varargs method is deprecated` > - > > Key: SPARK-45687 > URL: https://issues.apache.org/jira/browse/SPARK-45687 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala:945:21: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.AggregationQuerySuite, version=2.13.0 > [warn] df.agg(udaf(allColumns: _*)), > [warn] ^ > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:156:48: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, > version=2.13.0 > [warn] df.agg(aggFunctions.head, aggFunctions.tail: _*), > [warn] ^ > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:161:76: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, > version=2.13.0 > [warn] df.groupBy($"id" % 4 as "mod").agg(aggFunctions.head, > aggFunctions.tail: _*), > [warn] > ^ > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:171:50: > Passing an explicit array value to a Scala varargs method is deprecated > (since 2.13.0) and will result in a defensive copy; Use the more efficient > non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, > site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, > version=2.13.0 > [warn] df.agg(aggFunctions.head, aggFunctions.tail: _*), > [warn] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional 
commands, e-mail: issues-h...@spark.apache.org
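A hedged sketch of the two replacements named in the comment above and in the warning text: `ArraySeq.unsafeWrapArray` when a copy is unnecessary, `.toIndexedSeq` when a defensive copy is wanted. The varargs method below is illustrative, not the test code from the warning.
{code:scala}
import scala.collection.immutable.ArraySeq

object VarargsArraySketch {
  def agg(columns: String*): String = columns.mkString(", ")

  def main(args: Array[String]): Unit = {
    val allColumns = Array("a", "b", "c")

    // Deprecated since 2.13.0: agg(allColumns: _*) silently makes a defensive copy.
    val noCopy   = agg(ArraySeq.unsafeWrapArray(allColumns): _*) // no copy; do not mutate allColumns afterwards
    val withCopy = agg(allColumns.toIndexedSeq: _*)              // explicit copy when the array may change

    println(noCopy)   // a, b, c
    println(withCopy) // a, b, c
  }
}
{code}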
[jira] [Commented] (SPARK-45686) Fix `method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780149#comment-17780149 ] Yang Jie commented on SPARK-45686: -- We need to distinguish the situations, some need to be changed to `.toIndexedSeq`, some need to be changed to `ArraySeq.unsafeWrapArray` > Fix `method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is > deprecated` > > > Key: SPARK-45686 > URL: https://issues.apache.org/jira/browse/SPARK-45686 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:57:31: > method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is > deprecated (since 2.13.0): implicit conversions from Array to > immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` > explicitly if you want to copy, or use the more efficient non-copying > ArraySeq.unsafeWrapArray > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.ml.linalg.Vector.equals, > origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, > version=2.13.0 > [error] Vectors.equals(s1.indices, s1.values, s2.indices, > s2.values) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:57:54: > method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is > deprecated (since 2.13.0): implicit conversions from Array to > immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` > explicitly if you want to copy, or use the more efficient non-copying > ArraySeq.unsafeWrapArray > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.ml.linalg.Vector.equals, > origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, > version=2.13.0 > [error] Vectors.equals(s1.indices, s1.values, s2.indices, > s2.values) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:59:31: > method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is > deprecated (since 2.13.0): implicit conversions from Array to > immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` > explicitly if you want to copy, or use the more efficient non-copying > ArraySeq.unsafeWrapArray > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.ml.linalg.Vector.equals, > origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, > version=2.13.0 > [error] Vectors.equals(s1.indices, s1.values, 0 until d1.size, > d1.values) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:61:59: > method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is > deprecated (since 2.13.0): implicit conversions from Array to > immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` > explicitly if you want to copy, or use the more efficient non-copying > ArraySeq.unsafeWrapArray > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > 
site=org.apache.spark.ml.linalg.Vector.equals, > origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, > version=2.13.0 > [error] Vectors.equals(0 until d1.size, d1.values, s1.indices, > s1.values) > [error] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
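A hedged sketch of the same choice for the implicit `Array` to `immutable.IndexedSeq` conversion: copy explicitly with `.toIndexedSeq`, or wrap without copying via `ArraySeq.unsafeWrapArray`. The method below is a hypothetical stand-in, not the real `Vectors.equals`.
{code:scala}
import scala.collection.immutable.ArraySeq

object ArrayToIndexedSeqSketch {
  // Hypothetical signature; the real Vectors.equals takes the indices and values of two vectors.
  def sameIndices(left: IndexedSeq[Int], right: IndexedSeq[Int]): Boolean = left == right

  def main(args: Array[String]): Unit = {
    val indices = Array(0, 2, 4)

    // Deprecated since 2.13.0: sameIndices(indices, ...) relies on the implicit copying conversion.
    val copied  = sameIndices(indices.toIndexedSeq, Vector(0, 2, 4))
    val wrapped = sameIndices(ArraySeq.unsafeWrapArray(indices), Vector(0, 2, 4))

    println((copied, wrapped)) // (true,true)
  }
}
{code}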
[jira] [Comment Edited] (SPARK-45686) Fix `method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780149#comment-17780149 ] Yang Jie edited comment on SPARK-45686 at 10/27/23 3:21 AM: We need to distinguish the situations, some need to be changed to `.toIndexedSeq`, some need to be changed to `ArraySeq.unsafeWrapArray` was (Author: luciferyang): We need to distinguish the situations, some need to be changed to `.toIndexedSeq`, some need to be changed to `ArraySeq.unsafeWrapArray` > Fix `method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is > deprecated` > > > Key: SPARK-45686 > URL: https://issues.apache.org/jira/browse/SPARK-45686 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:57:31: > method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is > deprecated (since 2.13.0): implicit conversions from Array to > immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` > explicitly if you want to copy, or use the more efficient non-copying > ArraySeq.unsafeWrapArray > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.ml.linalg.Vector.equals, > origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, > version=2.13.0 > [error] Vectors.equals(s1.indices, s1.values, s2.indices, > s2.values) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:57:54: > method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is > deprecated (since 2.13.0): implicit conversions from Array to > immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` > explicitly if you want to copy, or use the more efficient non-copying > ArraySeq.unsafeWrapArray > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.ml.linalg.Vector.equals, > origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, > version=2.13.0 > [error] Vectors.equals(s1.indices, s1.values, s2.indices, > s2.values) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:59:31: > method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is > deprecated (since 2.13.0): implicit conversions from Array to > immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` > explicitly if you want to copy, or use the more efficient non-copying > ArraySeq.unsafeWrapArray > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.ml.linalg.Vector.equals, > origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, > version=2.13.0 > [error] Vectors.equals(s1.indices, s1.values, 0 until d1.size, > d1.values) > [error] ^ > [error] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:61:59: > method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is > deprecated (since 2.13.0): implicit conversions from Array to > immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` > explicitly if you want to copy, or use the 
more efficient non-copying > ArraySeq.unsafeWrapArray > [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg= of the message>, cat=deprecation, > site=org.apache.spark.ml.linalg.Vector.equals, > origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, > version=2.13.0 > [error] Vectors.equals(0 until d1.size, d1.values, s1.indices, > s1.values) > [error] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45314) Drop Scala 2.12 and make Scala 2.13 by default
[ https://issues.apache.org/jira/browse/SPARK-45314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780148#comment-17780148 ] Yang Jie commented on SPARK-45314: -- Friendly ping [~ivoson] [~panbingkun] [~zhiyuan] [~laglangyue], I have created some tickets here; feel free to pick them up if you are interested ~ > Drop Scala 2.12 and make Scala 2.13 by default > -- > > Key: SPARK-45314 > URL: https://issues.apache.org/jira/browse/SPARK-45314 > Project: Spark > Issue Type: Umbrella > Components: Build >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Yang Jie >Priority: Critical > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45698) Clean up the deprecated API usage related to `Buffer`
Yang Jie created SPARK-45698: Summary: Clean up the deprecated API usage related to `Buffer` Key: SPARK-45698 URL: https://issues.apache.org/jira/browse/SPARK-45698 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Yang Jie * method append in trait Buffer is deprecated (since 2.13.0) * method prepend in trait Buffer is deprecated (since 2.13.0) * method trimEnd in trait Buffer is deprecated (since 2.13.4) * method trimStart in trait Buffer is deprecated (since 2.13.4) {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/test/scala/org/apache/spark/deploy/IvyTestUtils.scala:319:18: method append in trait Buffer is deprecated (since 2.13.0): Use appendAll instead [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.deploy.IvyTestUtils.createLocalRepository, origin=scala.collection.mutable.Buffer.append, version=2.13.0 [warn] allFiles.append(rFiles: _*) [warn] ^ [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala:183:13: method trimEnd in trait Buffer is deprecated (since 2.13.4): use dropRightInPlace instead [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.util.SizeEstimator.SearchState.dequeue, origin=scala.collection.mutable.Buffer.trimEnd, version=2.13.4 [warn] stack.trimEnd(1) [warn] ^{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45697) Fix `Unicode escapes in triple quoted strings are deprecated`
Yang Jie created SPARK-45697: Summary: Fix `Unicode escapes in triple quoted strings are deprecated` Key: SPARK-45697 URL: https://issues.apache.org/jira/browse/SPARK-45697 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Yang Jie {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala:1686:44: Unicode escapes in triple quoted strings are deprecated; use the literal character instead [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, version=2.13.2 [warn] | COLLECTION ITEMS TERMINATED BY '\u0002' [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
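A hedged sketch of one way to rewrite the deprecated form: keep the escape in a plain char literal, where it is still an ordinary escape, and interpolate it into the triple-quoted string. The clause below mirrors the warning's example but is illustrative only.
{code:scala}
object TripleQuoteEscapeSketch {
  // Deprecated since 2.13.2 inside a triple-quoted string:
  //   """ COLLECTION ITEMS TERMINATED BY '\u0002' """
  private val itemSeparator: Char = '\u0002' // escapes in char literals remain valid

  val clause: String = s"""COLLECTION ITEMS TERMINATED BY '$itemSeparator'"""

  def main(args: Array[String]): Unit =
    println(clause.exists(_ == itemSeparator)) // true: the control character really is in the string
}
{code}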
[jira] [Resolved] (SPARK-45667) Clean up the deprecated API usage related to `IterableOnceExtensionMethods`.
[ https://issues.apache.org/jira/browse/SPARK-45667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45667. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43532 [https://github.com/apache/spark/pull/43532] > Clean up the deprecated API usage related to `IterableOnceExtensionMethods`. > > > Key: SPARK-45667 > URL: https://issues.apache.org/jira/browse/SPARK-45667 > Project: Spark > Issue Type: Sub-task > Components: Connect, Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45667) Clean up the deprecated API usage related to `IterableOnceExtensionMethods`.
[ https://issues.apache.org/jira/browse/SPARK-45667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-45667: Assignee: Yang Jie > Clean up the deprecated API usage related to `IterableOnceExtensionMethods`. > > > Key: SPARK-45667 > URL: https://issues.apache.org/jira/browse/SPARK-45667 > Project: Spark > Issue Type: Sub-task > Components: Connect, Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45481) Introduce a mapper for parquet compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng resolved SPARK-45481. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43308 [https://github.com/apache/spark/pull/43308] > Introduce a mapper for parquet compression codecs > - > > Key: SPARK-45481 > URL: https://issues.apache.org/jira/browse/SPARK-45481 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, Spark supports all the Parquet compression codecs, but the codecs > supported by Parquet and by Spark do not map one-to-one, because Spark > introduces a fake compression codec `none`. There are a lot of magic strings > copied from the Parquet compression codec names, so developers have to keep > them consistent manually, which is error-prone and reduces development > efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
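A hedged sketch of the kind of mapper the description argues for, translating Spark's codec option (including the fake `none`) into Parquet's `CompressionCodecName` in one place instead of via scattered magic strings; the object name and shape are assumptions for illustration, not the API the linked pull request actually added.
{code:scala}
import java.util.Locale

import org.apache.parquet.hadoop.metadata.CompressionCodecName

object ParquetCodecMapperSketch {
  // Spark accepts a fake "none" codec that Parquet spells UNCOMPRESSED.
  def toParquet(sparkCodecOption: String): CompressionCodecName =
    sparkCodecOption.toLowerCase(Locale.ROOT) match {
      case "none" | "uncompressed" => CompressionCodecName.UNCOMPRESSED
      case other                   => CompressionCodecName.valueOf(other.toUpperCase(Locale.ROOT))
    }

  def main(args: Array[String]): Unit = {
    println(toParquet("none"))   // UNCOMPRESSED
    println(toParquet("snappy")) // SNAPPY
  }
}
{code}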
[jira] [Created] (SPARK-45696) Fix `method tryCompleteWith in trait Promise is deprecated`
Yang Jie created SPARK-45696: Summary: Fix `method tryCompleteWith in trait Promise is deprecated` Key: SPARK-45696 URL: https://issues.apache.org/jira/browse/SPARK-45696 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Yang Jie {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/FutureAction.scala:190:32: method tryCompleteWith in trait Promise is deprecated (since 2.13.0): Since this method is semantically equivalent to `completeWith`, use that instead. [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.ComplexFutureAction.p, origin=scala.concurrent.Promise.tryCompleteWith, version=2.13.0 [warn] private val p = Promise[T]().tryCompleteWith(run(jobSubmitter)) [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
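The deprecation message itself names the replacement, so the fix is essentially a rename; a small self-contained sketch (the `run` future is a stand-in, not the FutureAction code):
{code:scala}
import scala.concurrent.{Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global

def run(): Future[Int] = Future(42)   // stand-in for run(jobSubmitter)

// Before (deprecated since 2.13.0):
// private val p = Promise[Int]().tryCompleteWith(run())

// After: completeWith is documented as semantically equivalent
val p = Promise[Int]().completeWith(run())
{code}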
[jira] [Created] (SPARK-45694) Fix `method signum in trait ScalaNumberProxy is deprecated`
Yang Jie created SPARK-45694: Summary: Fix `method signum in trait ScalaNumberProxy is deprecated` Key: SPARK-45694 URL: https://issues.apache.org/jira/browse/SPARK-45694 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Yang Jie {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:194:25: method signum in trait ScalaNumberProxy is deprecated (since 2.13.0): use `sign` method instead [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.catalyst.expressions.EquivalentExpressions.updateExprTree.uc, origin=scala.runtime.ScalaNumberProxy.signum, version=2.13.0 [warn] val uc = useCount.signum [warn] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
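A one-line illustration of the replacement the warning suggests (the `useCount` value is a stand-in for the field in EquivalentExpressions):
{code:scala}
val useCount: Int = -3

// Before (deprecated since 2.13.0): val uc = useCount.signum
// After:
val uc = useCount.sign   // -1, 0, or 1, same result as signum
{code}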
[jira] [Created] (SPARK-45691) Clean up the deprecated API usage related to `RightProjection/LeftProjection/Either`
Yang Jie created SPARK-45691: Summary: Clean up the deprecated API usage related to `RightProjection/LeftProjection/Either` Key: SPARK-45691 URL: https://issues.apache.org/jira/browse/SPARK-45691 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Yang Jie * method get in class RightProjection is deprecated (since 2.13.0) * method get in class LeftProjection is deprecated (since 2.13.0) * method right in class Either is deprecated (since 2.13.0) {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/GroupBasedRowLevelOperationScanPlanning.scala:54:28: method get in class LeftProjection is deprecated (since 2.13.0): use `Either.swap.getOrElse` instead [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.execution.datasources.v2.GroupBasedRowLevelOperationScanPlanning.apply, origin=scala.util.Either.LeftProjection.get, version=2.13.0 [warn] pushedFilters.left.get.mkString(", ") [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
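A hedged sketch of the replacements the deprecation messages point at; the `pushedFilters` value below is a simplified stand-in (the real one in GroupBasedRowLevelOperationScanPlanning has different element types):
{code:scala}
val pushedFilters: Either[Seq[String], Seq[String]] = Left(Seq("a > 1", "b IS NOT NULL"))

// Before (deprecated since 2.13.0): pushedFilters.left.get.mkString(", ")
// After: the replacement named in the message
val rendered = pushedFilters.swap.getOrElse(Seq.empty).mkString(", ")

// .right.get on a right-biased Either can usually become getOrElse / fold / pattern matching
val rightValue = pushedFilters.getOrElse(Seq.empty)
{code}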
[jira] [Created] (SPARK-45690) Clean up type use of `BufferedIterator/CanBuildFrom/Traversable`
Yang Jie created SPARK-45690: Summary: Clean up type use of `BufferedIterator/CanBuildFrom/Traversable` Key: SPARK-45690 URL: https://issues.apache.org/jira/browse/SPARK-45690 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Yang Jie * type BufferedIterator in package scala is deprecated (since 2.13.0) * type CanBuildFrom in package generic is deprecated (since 2.13.0) * type Traversable in package scala is deprecated (since 2.13.0) {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala:67:12: type BufferedIterator in package scala is deprecated (since 2.13.0): Use scala.collection.BufferedIterator instead of scala.BufferedIterator [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.execution.GroupedIterator.input, origin=scala.BufferedIterator, version=2.13.0 [warn] input: BufferedIterator[InternalRow], [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
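These are alias deprecations, so the cleanup is mostly about naming the 2.13 locations; a small sketch with an illustrative function (not the GroupedIterator code):
{code:scala}
// Before (deprecated since 2.13.0): the scala.BufferedIterator alias
// def peekFirst(input: BufferedIterator[Int]): Option[Int] = ...

// After: name the collection type explicitly, as the warning suggests
def peekFirst(input: scala.collection.BufferedIterator[Int]): Option[Int] =
  if (input.hasNext) Some(input.head) else None   // head peeks without consuming

// The other two aliases have direct replacements as well:
//   scala.Traversable                      -> scala.collection.Iterable
//   scala.collection.generic.CanBuildFrom  -> scala.collection.BuildFrom
{code}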
[jira] [Created] (SPARK-45689) Clean up the deprecated API usage related to `StringContext/StringOps`
Yang Jie created SPARK-45689: Summary: Clean up the deprecated API usage related to `StringContext/StringOps` Key: SPARK-45689 URL: https://issues.apache.org/jira/browse/SPARK-45689 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Yang Jie {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala:258:30: method treatEscapes in object StringContext is deprecated (since 2.13.0): use processEscapes [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.catalyst.expressions.codegen.Block.foldLiteralArgs, origin=scala.StringContext.treatEscapes, version=2.13.0 [warn] buf.append(StringContext.treatEscapes(strings.next())) [warn] ^ [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala:270:32: method treatEscapes in object StringContext is deprecated (since 2.13.0): use processEscapes [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.catalyst.expressions.codegen.Block.foldLiteralArgs, origin=scala.StringContext.treatEscapes, version=2.13.0 [warn] buf.append(StringContext.treatEscapes(strings.next())) [warn] {code} * method checkLengths in class StringContext is deprecated (since 2.13.0) * method treatEscapes in object StringContext is deprecated (since 2.13.0) * method replaceAllLiterally in class StringOps is deprecated (since 2.13.2) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
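Each of these deprecations names its replacement, so the cleanup is mechanical; a self-contained sketch of the two most common cases:
{code:scala}
// Before (deprecated): StringContext.treatEscapes(...) and replaceAllLiterally
// val unescaped = StringContext.treatEscapes("a\\tb")
// val replaced  = "a.b.c".replaceAllLiterally(".", "_")

// After: the replacements named in the deprecation messages
val unescaped = StringContext.processEscapes("a\\tb")   // escape sequence expanded
val replaced  = "a.b.c".replace(".", "_")               // "a_b_c", literal replacement
{code}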
[jira] [Created] (SPARK-45688) Clean up the deprecated API usage related to `MapOps`
Yang Jie created SPARK-45688: Summary: Clean up the deprecated API usage related to `MapOps` Key: SPARK-45688 URL: https://issues.apache.org/jira/browse/SPARK-45688 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Yang Jie * method - in trait MapOps is deprecated (since 2.13.0) * method -- in trait MapOps is deprecated (since 2.13.0) * method + in trait MapOps is deprecated (since 2.13.0) {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/deploy/worker/CommandUtils.scala:84:27: method + in trait MapOps is deprecated (since 2.13.0): Consider requiring an immutable Map or fall back to Map.concat. [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.deploy.worker.CommandUtils.buildLocalCommand.newEnvironment, origin=scala.collection.MapOps.+, version=2.13.0 [warn] command.environment + ((libraryPathName, libraryPaths.mkString(File.pathSeparator))) [warn] ^ [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/main/scala/org/apache/spark/deploy/worker/CommandUtils.scala:91:22: method + in trait MapOps is deprecated (since 2.13.0): Consider requiring an immutable Map or fall back to Map.concat. [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.deploy.worker.CommandUtils.buildLocalCommand, origin=scala.collection.MapOps.+, version=2.13.0 [warn] newEnvironment += (SecurityManager.ENV_AUTH_SECRET -> securityMgr.getSecretKey()) [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
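The operators are only deprecated on the generic `scala.collection.Map` (immutable maps keep them), so the fix is either to require an immutable `Map` or to switch to `concat`; a minimal sketch with an illustrative environment map:
{code:scala}
val env: scala.collection.Map[String, String] = Map("PATH" -> "/usr/bin")

// Before (deprecated since 2.13.0 on collection.MapOps):
// val newEnv = env + ("LD_LIBRARY_PATH" -> "/opt/lib")

// After: concat, as the deprecation message suggests
val newEnv = env.concat(Map("LD_LIBRARY_PATH" -> "/opt/lib"))
{code}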
[jira] [Created] (SPARK-45687) Fix `Passing an explicit array value to a Scala varargs method is deprecated`
Yang Jie created SPARK-45687: Summary: Fix `Passing an explicit array value to a Scala varargs method is deprecated` Key: SPARK-45687 URL: https://issues.apache.org/jira/browse/SPARK-45687 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Yang Jie Passing an explicit array value to a Scala varargs method is deprecated (since 2.13.0) and will result in a defensive copy; Use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala:945:21: Passing an explicit array value to a Scala varargs method is deprecated (since 2.13.0) and will result in a defensive copy; Use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.hive.execution.AggregationQuerySuite, version=2.13.0 [warn] df.agg(udaf(allColumns: _*)), [warn] ^ [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:156:48: Passing an explicit array value to a Scala varargs method is deprecated (since 2.13.0) and will result in a defensive copy; Use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, version=2.13.0 [warn] df.agg(aggFunctions.head, aggFunctions.tail: _*), [warn] ^ [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:161:76: Passing an explicit array value to a Scala varargs method is deprecated (since 2.13.0) and will result in a defensive copy; Use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, version=2.13.0 [warn] df.groupBy($"id" % 4 as "mod").agg(aggFunctions.head, aggFunctions.tail: _*), [warn] ^ [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala:171:50: Passing an explicit array value to a Scala varargs method is deprecated (since 2.13.0) and will result in a defensive copy; Use the more efficient non-copying ArraySeq.unsafeWrapArray or an explicit toIndexedSeq call [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.hive.execution.ObjectHashAggregateSuite, version=2.13.0 [warn] df.agg(aggFunctions.head, aggFunctions.tail: _*), [warn] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
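A self-contained illustration of the two replacements the warning offers (the `sum` helper is hypothetical, standing in for `df.agg(...)`):
{code:scala}
import scala.collection.immutable.ArraySeq

def sum(xs: Int*): Int = xs.sum
val values: Array[Int] = Array(1, 2, 3)

// Before (deprecated since 2.13.0): the implicit Array-to-varargs conversion makes a defensive copy
// sum(values: _*)

// After: the non-copying wrapper recommended by the warning
sum(ArraySeq.unsafeWrapArray(values): _*)

// Or, if a copy is actually wanted:
sum(values.toIndexedSeq: _*)
{code}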
[jira] [Created] (SPARK-45686) Fix `method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated`
Yang Jie created SPARK-45686: Summary: Fix `method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated` Key: SPARK-45686 URL: https://issues.apache.org/jira/browse/SPARK-45686 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Yang Jie {code:java} [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:57:31: method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated (since 2.13.0): implicit conversions from Array to immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` explicitly if you want to copy, or use the more efficient non-copying ArraySeq.unsafeWrapArray [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=deprecation, site=org.apache.spark.ml.linalg.Vector.equals, origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, version=2.13.0 [error] Vectors.equals(s1.indices, s1.values, s2.indices, s2.values) [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:57:54: method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated (since 2.13.0): implicit conversions from Array to immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` explicitly if you want to copy, or use the more efficient non-copying ArraySeq.unsafeWrapArray [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=deprecation, site=org.apache.spark.ml.linalg.Vector.equals, origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, version=2.13.0 [error] Vectors.equals(s1.indices, s1.values, s2.indices, s2.values) [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:59:31: method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated (since 2.13.0): implicit conversions from Array to immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` explicitly if you want to copy, or use the more efficient non-copying ArraySeq.unsafeWrapArray [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=deprecation, site=org.apache.spark.ml.linalg.Vector.equals, origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, version=2.13.0 [error] Vectors.equals(s1.indices, s1.values, 0 until d1.size, d1.values) [error] ^ [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala:61:59: method copyArrayToImmutableIndexedSeq in class LowPriorityImplicits2 is deprecated (since 2.13.0): implicit conversions from Array to immutable.IndexedSeq are implemented by copying; use `toIndexedSeq` explicitly if you want to copy, or use the more efficient non-copying ArraySeq.unsafeWrapArray [error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=, cat=deprecation, site=org.apache.spark.ml.linalg.Vector.equals, origin=scala.LowPriorityImplicits2.copyArrayToImmutableIndexedSeq, version=2.13.0 [error] Vectors.equals(0 until d1.size, d1.values, s1.indices, s1.values) [error] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
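Same theme as the varargs warning: the implicit Array-to-`immutable.IndexedSeq` conversion copies, and the fix is to state the intent explicitly. A hedged sketch with plain arrays rather than the actual `Vectors.equals` arguments:
{code:scala}
import scala.collection.immutable.ArraySeq

def equalSeqs(a: Seq[Int], b: Seq[Int]): Boolean = a == b   // stand-in for Vectors.equals

val indices: Array[Int] = Array(0, 2, 5)

// Before (fatal warning, deprecated since 2.13.0): relies on the copying implicit
// equalSeqs(indices, indices)

// After: choose explicitly between the non-copying wrapper and an explicit copy
equalSeqs(ArraySeq.unsafeWrapArray(indices), indices.toIndexedSeq)
{code}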
[jira] [Created] (SPARK-45685) Use `LazyList` instead of `Stream`
Yang Jie created SPARK-45685: Summary: Use `LazyList` instead of `Stream` Key: SPARK-45685 URL: https://issues.apache.org/jira/browse/SPARK-45685 Project: Spark Issue Type: Sub-task Components: Build, Spark Core, SQL Affects Versions: 4.0.0 Reporter: Yang Jie * class Stream in package immutable is deprecated (since 2.13.0) * object Stream in package immutable is deprecated (since 2.13.0) * type Stream in package scala is deprecated (since 2.13.0) * value Stream in package scala is deprecated (since 2.13.0) * method append in class Stream is deprecated (since 2.13.0) * method toStream in trait IterableOnceOps is deprecated (since 2.13.0) {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala:49:20: class Stream in package immutable is deprecated (since 2.13.0): Use LazyList (which is fully lazy) instead of Stream (which has a lazy tail only) [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.BlockingLineStream.BlockingStreamed.stream, origin=scala.collection.immutable.Stream, version=2.13.0 [warn] val stream: () => Stream[T]) [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
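A short sketch of the mechanical part of this migration (illustrative values, not the GenTPCDSData code):
{code:scala}
// Before (deprecated since 2.13.0): Stream is lazy only in its tail
// val numbers: Stream[Int] = Stream.from(1)

// After: LazyList is fully lazy, as the deprecation message recommends
val numbers: LazyList[Int] = LazyList.from(1)
numbers.take(3).toList                    // List(1, 2, 3)

// .toStream on an IterableOnce becomes .to(LazyList)
val fromIterator = Iterator(4, 5, 6).to(LazyList)
{code}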
[jira] [Resolved] (SPARK-45575) support time travel options for df read API
[ https://issues.apache.org/jira/browse/SPARK-45575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-45575. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43403 [https://github.com/apache/spark/pull/43403] > support time travel options for df read API > --- > > Key: SPARK-45575 > URL: https://issues.apache.org/jira/browse/SPARK-45575 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45575) support time travel options for df read API
[ https://issues.apache.org/jira/browse/SPARK-45575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-45575: Assignee: Wenchen Fan > support time travel options for df read API > --- > > Key: SPARK-45575 > URL: https://issues.apache.org/jira/browse/SPARK-45575 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45684) Clean up the deprecated API usage related to `SeqOps`
Yang Jie created SPARK-45684: Summary: Clean up the deprecated API usage related to `SeqOps` Key: SPARK-45684 URL: https://issues.apache.org/jira/browse/SPARK-45684 Project: Spark Issue Type: Sub-task Components: Build, Spark Core, SQL Affects Versions: 4.0.0 Reporter: Yang Jie * method transform in trait SeqOps is deprecated (since 2.13.0) * method reverseMap in trait SeqOps is deprecated (since 2.13.0) * method retain in trait SetOps is deprecated (since 2.13.0) * method - in trait SetOps is deprecated (since 2.13.0) * method -- in trait SetOps is deprecated (since 2.13.0) * method + in trait SetOps is deprecated (since 2.13.0) {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala:675:15: method transform in trait SeqOps is deprecated (since 2.13.0): Use `mapInPlace` on an `IndexedSeq` instead [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.ml.classification.LogisticRegression.train.$anonfun, origin=scala.collection.mutable.SeqOps.transform, version=2.13.0 [warn] centers.transform(_ / numCoefficientSets) [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
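Most of these have one-word in-place replacements; a self-contained sketch (the buffer and set below are illustrative, not the LogisticRegression state):
{code:scala}
import scala.collection.mutable

val centers = mutable.ArrayBuffer(2.0, 4.0, 6.0)
val numCoefficientSets = 2

// Before (deprecated since 2.13.0): centers.transform(_ / numCoefficientSets)
// After: mapInPlace on a mutable IndexedSeq, as the warning suggests
centers.mapInPlace(_ / numCoefficientSets)   // ArrayBuffer(1.0, 2.0, 3.0)

// retain on mutable sets becomes filterInPlace
val ids = mutable.Set(1, -2, 3)
ids.filterInPlace(_ > 0)                     // Set(1, 3)
{code}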
[jira] [Updated] (SPARK-45681) Clone a js version of UIUtils.errorMessageCell to for consistent error parsing on UI
[ https://issues.apache.org/jira/browse/SPARK-45681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45681: --- Labels: pull-request-available (was: ) > Clone a js version of UIUtils.errorMessageCell to for consistent error > parsing on UI > > > Key: SPARK-45681 > URL: https://issues.apache.org/jira/browse/SPARK-45681 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.5.0, 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45682) Fix "method + in class Byte/Short/Char/Long/Double/Int is deprecated"
[ https://issues.apache.org/jira/browse/SPARK-45682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45682: - Description: {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala:127:42: method + in class Int is deprecated (since 2.13.0): Adding a number and a String is deprecated. Use the string interpolation `s"$num$str"` [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.rdd.PipedRDDSuite, origin=scala.Int.+, version=2.13.0 [warn] (i: Int, f: String => Unit) => f(i + "_")) {code} > Fix "method + in class Byte/Short/Char/Long/Double/Int is deprecated" > --- > > Key: SPARK-45682 > URL: https://issues.apache.org/jira/browse/SPARK-45682 > Project: Spark > Issue Type: Sub-task > Components: Build, Spark Core >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > [warn] > /Users/yangjie01/SourceCode/git/spark-mine-sbt/core/src/test/scala/org/apache/spark/rdd/PipedRDDSuite.scala:127:42: > method + in class Int is deprecated (since 2.13.0): Adding a number and a > String is deprecated. Use the string interpolation `s"$num$str"` > [warn] Applicable -Wconf / @nowarn filters for this warning: msg= message>, cat=deprecation, site=org.apache.spark.rdd.PipedRDDSuite, > origin=scala.Int.+, version=2.13.0 > [warn] (i: Int, f: String => Unit) => f(i + "_")) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
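The message already spells out the fix; a one-liner for completeness (the value is a stand-in for the PipedRDDSuite snippet):
{code:scala}
val i = 7

// Before (deprecated since 2.13.0): f(i + "_")
// After: string interpolation, as the message suggests
val tag = s"${i}_"   // "7_"
{code}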
[jira] [Created] (SPARK-45683) Fix `method any2stringadd in object Predef is deprecated`
Yang Jie created SPARK-45683: Summary: Fix `method any2stringadd in object Predef is deprecated` Key: SPARK-45683 URL: https://issues.apache.org/jira/browse/SPARK-45683 Project: Spark Issue Type: Sub-task Components: Build, Spark Core Affects Versions: 4.0.0 Reporter: Yang Jie {code:java} [warn] /Users/yangjie01/SourceCode/git/spark-mine-sbt/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala:720:17: method any2stringadd in object Predef is deprecated (since 2.13.0): Implicit injection of + is deprecated. Convert to String to call + [warn] Applicable -Wconf / @nowarn filters for this warning: msg=, cat=deprecation, site=org.apache.spark.sql.catalyst.expressions.BinaryExpression.nullSafeCodeGen.nullSafeEval, origin=scala.Predef.any2stringadd, version=2.13.0 [warn] leftGen.code + ctx.nullSafeExec(left.nullable, leftGen.isNull) { [warn] ^ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
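A hedged sketch of the pattern behind this warning; `CodePart` is a made-up type standing in for the codegen `Block`, since the implicit `+` injection only fires when the left-hand side is not already a String:
{code:scala}
final case class CodePart(value: String) {
  override def toString: String = value
}

val leftGen = CodePart("boolean leftNull = false;")
val suffix = " /* generated */"

// Before (deprecated since 2.13.0): Predef.any2stringadd injects + on any value
// val combined = leftGen + suffix

// After: make the String conversion explicit, or interpolate
val combined  = leftGen.toString + suffix
val combined2 = s"$leftGen$suffix"
{code}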
[jira] [Created] (SPARK-45682) Fix "method + in class Byte/Short/Char/Long/Double/Int is deprecated"
Yang Jie created SPARK-45682: Summary: Fix "method + in class Byte/Short/Char/Long/Double/Int is deprecated" Key: SPARK-45682 URL: https://issues.apache.org/jira/browse/SPARK-45682 Project: Spark Issue Type: Sub-task Components: Build, Spark Core Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45681) Clone a js version of UIUtils.errorMessageCell to for consistent error parsing on UI
Kent Yao created SPARK-45681: Summary: Clone a js version of UIUtils.errorMessageCell to for consistent error parsing on UI Key: SPARK-45681 URL: https://issues.apache.org/jira/browse/SPARK-45681 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 3.5.0, 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45679) Add clusterBy in DataFrame API
[ https://issues.apache.org/jira/browse/SPARK-45679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45679. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43544 [https://github.com/apache/spark/pull/43544] > Add clusterBy in DataFrame API > -- > > Key: SPARK-45679 > URL: https://issues.apache.org/jira/browse/SPARK-45679 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.1 >Reporter: Zhen Li >Assignee: Zhen Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add clusterBy to Dataframe API e.g. in python > DataframeWriterV1 > ``` > df.write > .format("delta") > .clusterBy("clusteringColumn1", "clusteringColumn2") > .save(...) or saveAsTable(...) > ``` > DataFrameWriterV2 > ``` > df.writeTo(...).using("delta") > .clusterBy("clusteringColumn1", "clusteringColumn2") > .create() or replace() or createOrReplace() > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45679) Add clusterBy in DataFrame API
[ https://issues.apache.org/jira/browse/SPARK-45679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45679: Assignee: Zhen Li > Add clusterBy in DataFrame API > -- > > Key: SPARK-45679 > URL: https://issues.apache.org/jira/browse/SPARK-45679 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.1 >Reporter: Zhen Li >Assignee: Zhen Li >Priority: Major > Labels: pull-request-available > > Add clusterBy to Dataframe API e.g. in python > DataframeWriterV1 > ``` > df.write > .format("delta") > .clusterBy("clusteringColumn1", "clusteringColumn2") > .save(...) or saveAsTable(...) > ``` > DataFrameWriterV2 > ``` > df.writeTo(...).using("delta") > .clusterBy("clusteringColumn1", "clusteringColumn2") > .create() or replace() or createOrReplace() > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-45651) Snapshots of some packages are not published any more
[ https://issues.apache.org/jira/browse/SPARK-45651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-45651: -- Reverted in https://github.com/apache/spark/commit/df0262f29969fe40f53dee070a150f2bfe98484c > Snapshots of some packages are not published any more > - > > Key: SPARK-45651 > URL: https://issues.apache.org/jira/browse/SPARK-45651 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Enrico Minack >Assignee: Enrico Minack >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Snapshots of some packages are not been published anymore, e.g. > spark-sql_2.13-4.0.0 has not been published since Sep, 13th: > https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-sql_2.13/4.0.0-SNAPSHOT/ > There have been some attempts to fix CI: SPARK-45535 SPARK-45536 > Assumption is that memory consumption during build exceeds the available > memory of the Github host. > The following could be attempted: > - enable manual trigger of the {{publish_snapshots.yml}} workflow > - enable some memory use logging to proof that exceeded memory is the root > cause > - attempt to reduce memory footprint and see impact in above logging > - revert memory use logging -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45651) Snapshots of some packages are not published any more
[ https://issues.apache.org/jira/browse/SPARK-45651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45651. -- Fix Version/s: (was: 4.0.0) Assignee: (was: Enrico Minack) Resolution: Invalid > Snapshots of some packages are not published any more > - > > Key: SPARK-45651 > URL: https://issues.apache.org/jira/browse/SPARK-45651 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Enrico Minack >Priority: Major > Labels: pull-request-available > > Snapshots of some packages are not been published anymore, e.g. > spark-sql_2.13-4.0.0 has not been published since Sep, 13th: > https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-sql_2.13/4.0.0-SNAPSHOT/ > There have been some attempts to fix CI: SPARK-45535 SPARK-45536 > Assumption is that memory consumption during build exceeds the available > memory of the Github host. > The following could be attempted: > - enable manual trigger of the {{publish_snapshots.yml}} workflow > - enable some memory use logging to proof that exceeded memory is the root > cause > - attempt to reduce memory footprint and see impact in above logging > - revert memory use logging -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-45651) Snapshots of some packages are not published any more
[ https://issues.apache.org/jira/browse/SPARK-45651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780127#comment-17780127 ] Hyukjin Kwon edited comment on SPARK-45651 at 10/27/23 12:48 AM: - Reverted in https://github.com/apache/spark/commit/df0262f29969fe40f53dee070a150f2bfe98484c and https://github.com/apache/spark/commit/0d665fe8c87b037516f21162d2f5545580776af3 was (Author: gurwls223): Reverted in https://github.com/apache/spark/commit/df0262f29969fe40f53dee070a150f2bfe98484c > Snapshots of some packages are not published any more > - > > Key: SPARK-45651 > URL: https://issues.apache.org/jira/browse/SPARK-45651 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Enrico Minack >Priority: Major > Labels: pull-request-available > > Snapshots of some packages are not been published anymore, e.g. > spark-sql_2.13-4.0.0 has not been published since Sep, 13th: > https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-sql_2.13/4.0.0-SNAPSHOT/ > There have been some attempts to fix CI: SPARK-45535 SPARK-45536 > Assumption is that memory consumption during build exceeds the available > memory of the Github host. > The following could be attempted: > - enable manual trigger of the {{publish_snapshots.yml}} workflow > - enable some memory use logging to proof that exceeded memory is the root > cause > - attempt to reduce memory footprint and see impact in above logging > - revert memory use logging -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-45651) Snapshots of some packages are not published any more
[ https://issues.apache.org/jira/browse/SPARK-45651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-45651: -- > Snapshots of some packages are not published any more > - > > Key: SPARK-45651 > URL: https://issues.apache.org/jira/browse/SPARK-45651 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Enrico Minack >Priority: Major > Labels: pull-request-available > > Snapshots of some packages are not been published anymore, e.g. > spark-sql_2.13-4.0.0 has not been published since Sep, 13th: > https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-sql_2.13/4.0.0-SNAPSHOT/ > There have been some attempts to fix CI: SPARK-45535 SPARK-45536 > Assumption is that memory consumption during build exceeds the available > memory of the Github host. > The following could be attempted: > - enable manual trigger of the {{publish_snapshots.yml}} workflow > - enable some memory use logging to proof that exceeded memory is the root > cause > - attempt to reduce memory footprint and see impact in above logging > - revert memory use logging -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45677) Observe API error logging
[ https://issues.apache.org/jira/browse/SPARK-45677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-45677. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43542 [https://github.com/apache/spark/pull/43542] > Observe API error logging > - > > Key: SPARK-45677 > URL: https://issues.apache.org/jira/browse/SPARK-45677 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > We should tell user why it's not supported and what to do > [https://github.com/apache/spark/blob/536439244593d40bdab88e9d3657f2691d3d33f2/sql/core/src/main/scala/org/apache/spark/sql/Observation.scala#L76] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45677) Observe API error logging
[ https://issues.apache.org/jira/browse/SPARK-45677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-45677: Assignee: Wei Liu > Observe API error logging > - > > Key: SPARK-45677 > URL: https://issues.apache.org/jira/browse/SPARK-45677 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > > We should tell user why it's not supported and what to do > [https://github.com/apache/spark/blob/536439244593d40bdab88e9d3657f2691d3d33f2/sql/core/src/main/scala/org/apache/spark/sql/Observation.scala#L76] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44386) Use PartitionEvaluator API in HashAggregateExec, ObjectHashAggregateExec, SortAggregateExec
[ https://issues.apache.org/jira/browse/SPARK-44386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44386: --- Labels: pull-request-available (was: ) > Use PartitionEvaluator API in HashAggregateExec, ObjectHashAggregateExec, > SortAggregateExec > --- > > Key: SPARK-44386 > URL: https://issues.apache.org/jira/browse/SPARK-44386 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Jia Fan >Priority: Major > Labels: pull-request-available > > Use PartitionEvaluator API in HashAggregateExec, ObjectHashAggregateExec, > SortAggregateExec -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32268) Bloom Filter Join
[ https://issues.apache.org/jira/browse/SPARK-32268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-32268: --- Labels: pull-request-available (was: ) > Bloom Filter Join > - > > Key: SPARK-32268 > URL: https://issues.apache.org/jira/browse/SPARK-32268 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Assignee: Yingyi Bu >Priority: Major > Labels: pull-request-available > Fix For: 3.3.0 > > Attachments: q16-bloom-filter.jpg, q16-default.jpg > > > We can improve the performance of some joins by pre-filtering one side of a > join using a Bloom filter and IN predicate generated from the values from the > other side of the join. > For > example:[tpcds/q16.sql|https://github.com/apache/spark/blob/a78d6ce376edf2a8836e01f47b9dff5371058d4c/sql/core/src/test/resources/tpcds/q16.sql]. > [Before this > optimization|https://issues.apache.org/jira/secure/attachment/13007418/q16-default.jpg]. > [After this > optimization|https://issues.apache.org/jira/secure/attachment/13007416/q16-bloom-filter.jpg]. > *Query Performance Benchmarks: TPC-DS Performance Evaluation* > Our setup for running TPC-DS benchmark was as follows: TPC-DS 5T and > Partitioned Parquet table > > |Query|Default(Seconds)|Enable Bloom Filter Join(Seconds)| > |tpcds q16|84|46| > |tpcds q36|29|21| > |tpcds q57|39|28| > |tpcds q94|42|34| > |tpcds q95|306|288| -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44447) Use PartitionEvaluator API in FlatMapGroupsInPandasExec, FlatMapCoGroupsInPandasExec
[ https://issues.apache.org/jira/browse/SPARK-44447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44447: --- Labels: pull-request-available (was: ) > Use PartitionEvaluator API in FlatMapGroupsInPandasExec, > FlatMapCoGroupsInPandasExec > > > Key: SPARK-44447 > URL: https://issues.apache.org/jira/browse/SPARK-44447 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Vinod KC >Priority: Major > Labels: pull-request-available > > Use PartitionEvaluator API in > `FlatMapGroupsInPandasExec` > `FlatMapCoGroupsInPandasExec` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44385) Use PartitionEvaluator API in MergingSessionsExec & UpdatingSessionsExec
[ https://issues.apache.org/jira/browse/SPARK-44385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44385: --- Labels: pull-request-available (was: ) > Use PartitionEvaluator API in MergingSessionsExec & UpdatingSessionsExec > > > Key: SPARK-44385 > URL: https://issues.apache.org/jira/browse/SPARK-44385 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Jia Fan >Priority: Major > Labels: pull-request-available > > Use PartitionEvaluator API in MergingSessionsExec & UpdatingSessionsExec -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44414) Fixed matching check for CharType/VarcharType
[ https://issues.apache.org/jira/browse/SPARK-44414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44414: --- Labels: pull-request-available (was: ) > Fixed matching check for CharType/VarcharType > - > > Key: SPARK-44414 > URL: https://issues.apache.org/jira/browse/SPARK-44414 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.2, 3.2.0, 3.3.0, 3.4.0 >Reporter: caican >Priority: Major > Labels: pull-request-available > > Running the following code throws an exception > {code:java} > val analyzer = getAnalyzer > // check varchar type > val json1 = "{\"__CHAR_VARCHAR_TYPE_STRING\":\"varchar(80)\"}" > val metadata1 = new > MetadataBuilder().withMetadata(Metadata.fromJson(json1)).build() > val query1 = TestRelation(StructType(Seq( > StructField("x", StringType, metadata = metadata1), > StructField("y", StringType, metadata = metadata1))).toAttributes) > val table1 = TestRelation(StructType(Seq( > StructField("x", StringType, metadata = metadata1), > StructField("y", StringType, metadata = metadata1))).toAttributes) > val parsedPlanByName1 = byName(table1, query1) > analyzer.executeAndCheck(parsedPlanByName1, new QueryPlanningTracker()) {code} > > Exception details are as follows > {code:java} > org.apache.spark.sql.AnalysisException: unresolved operator 'AppendData > TestRelation [x#8, y#9], true; > 'AppendData TestRelation [x#8, y#9], true > +- TestRelation [x#6, y#7] at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:52) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:51) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:156) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$47(CheckAnalysis.scala:704) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$47$adapted(CheckAnalysis.scala:702) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:186) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:702) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:92) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:156) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:177) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:228) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:174) > at > org.apache.spark.sql.catalyst.analysis.DataSourceV2AnalysisBaseSuite.$anonfun$new$36(DataSourceV2AnalysisSuite.scala:691) > {code} > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44362) Use PartitionEvaluator API in AggregateInPandasExec, AttachDistributedSequenceExec
[ https://issues.apache.org/jira/browse/SPARK-44362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44362: --- Labels: pull-request-available (was: ) > Use PartitionEvaluator API in AggregateInPandasExec, > AttachDistributedSequenceExec > --- > > Key: SPARK-44362 > URL: https://issues.apache.org/jira/browse/SPARK-44362 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Vinod KC >Priority: Major > Labels: pull-request-available > > Use PartitionEvaluator API in > `AggregateInPandasExec` > `AttachDistributedSequenceExec` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45680) ReleaseSession to close Spark Connect session
[ https://issues.apache.org/jira/browse/SPARK-45680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45680: --- Labels: pull-request-available (was: ) > ReleaseSession to close Spark Connect session > - > > Key: SPARK-45680 > URL: https://issues.apache.org/jira/browse/SPARK-45680 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Juliusz Sompolski >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43754) Spark Connect Session & Query lifecycle
[ https://issues.apache.org/jira/browse/SPARK-43754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski updated SPARK-43754: -- Affects Version/s: 4.0.0 > Spark Connect Session & Query lifecycle > --- > > Key: SPARK-43754 > URL: https://issues.apache.org/jira/browse/SPARK-43754 > Project: Spark > Issue Type: Epic > Components: Connect >Affects Versions: 3.5.0, 4.0.0 >Reporter: Juliusz Sompolski >Priority: Major > > Currently, queries in Spark Connect are executed within the RPC handler. > We want to detach the RPC interface from actual sessions and execution, so > that we can make the interface more flexible > * maintain long running sessions, independent of unbroken GRPC channel > * be able to cancel queries > * have different interfaces to query results than push from server -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45680) ReleaseSession to close Spark Connect session
Juliusz Sompolski created SPARK-45680: - Summary: ReleaseSession to close Spark Connect session Key: SPARK-45680 URL: https://issues.apache.org/jira/browse/SPARK-45680 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 4.0.0 Reporter: Juliusz Sompolski -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45679) Add clusterBy in DataFrame API
[ https://issues.apache.org/jira/browse/SPARK-45679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45679: --- Labels: pull-request-available (was: ) > Add clusterBy in DataFrame API > -- > > Key: SPARK-45679 > URL: https://issues.apache.org/jira/browse/SPARK-45679 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.1 >Reporter: Zhen Li >Priority: Major > Labels: pull-request-available > > Add clusterBy to Dataframe API e.g. in python > DataframeWriterV1 > ``` > df.write > .format("delta") > .clusterBy("clusteringColumn1", "clusteringColumn2") > .save(...) or saveAsTable(...) > ``` > DataFrameWriterV2 > ``` > df.writeTo(...).using("delta") > .clusterBy("clusteringColumn1", "clusteringColumn2") > .create() or replace() or createOrReplace() > ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45679) Add clusterBy in DataFrame API
Zhen Li created SPARK-45679: --- Summary: Add clusterBy in DataFrame API Key: SPARK-45679 URL: https://issues.apache.org/jira/browse/SPARK-45679 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.1 Reporter: Zhen Li Add clusterBy to Dataframe API e.g. in python DataframeWriterV1 ``` df.write .format("delta") .clusterBy("clusteringColumn1", "clusteringColumn2") .save(...) or saveAsTable(...) ``` DataFrameWriterV2 ``` df.writeTo(...).using("delta") .clusterBy("clusteringColumn1", "clusteringColumn2") .create() or replace() or createOrReplace() ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45678) Cover BufferReleasingInputStream.available under tryOrFetchFailedException
[ https://issues.apache.org/jira/browse/SPARK-45678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45678: --- Labels: pull-request-available (was: ) > Cover BufferReleasingInputStream.available under tryOrFetchFailedException > -- > > Key: SPARK-45678 > URL: https://issues.apache.org/jira/browse/SPARK-45678 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: L. C. Hsieh >Priority: Minor > Labels: pull-request-available > > We have encountered shuffle data corruption issue: > ``` > Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5) > at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:112) > at org.xerial.snappy.SnappyNative.rawUncompress(Native Method) > at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:504) > at org.xerial.snappy.Snappy.uncompress(Snappy.java:543) > at > org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:450) > at > org.xerial.snappy.SnappyInputStream.available(SnappyInputStream.java:497) > at > org.apache.spark.storage.BufferReleasingInputStream.available(ShuffleBlockFetcherIterator.scala:1356) > ``` > Spark shuffle has capacity to detect corruption for a few stream op like > `read` and `skip`, such `IOException` in the stack trace will be rethrown as > `FetchFailedException` that will re-try the failed shuffle task. But in the > stack trace it is `available` that is not covered by the mechanism. So > no-retry has been happened and the Spark application just failed. > As the `available` op will also involve data decompression, we should be able > to check it like `read` and `skip` do. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45678) Cover BufferReleasingInputStream.available under tryOrFetchFailedException
L. C. Hsieh created SPARK-45678: --- Summary: Cover BufferReleasingInputStream.available under tryOrFetchFailedException Key: SPARK-45678 URL: https://issues.apache.org/jira/browse/SPARK-45678 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: L. C. Hsieh We have encountered shuffle data corruption issue: ``` Caused by: java.io.IOException: FAILED_TO_UNCOMPRESS(5) at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:112) at org.xerial.snappy.SnappyNative.rawUncompress(Native Method) at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:504) at org.xerial.snappy.Snappy.uncompress(Snappy.java:543) at org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:450) at org.xerial.snappy.SnappyInputStream.available(SnappyInputStream.java:497) at org.apache.spark.storage.BufferReleasingInputStream.available(ShuffleBlockFetcherIterator.scala:1356) ``` Spark shuffle has capacity to detect corruption for a few stream op like `read` and `skip`, such `IOException` in the stack trace will be rethrown as `FetchFailedException` that will re-try the failed shuffle task. But in the stack trace it is `available` that is not covered by the mechanism. So no-retry has been happened and the Spark application just failed. As the `available` op will also involve data decompression, we should be able to check it like `read` and `skip` do. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
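A minimal sketch of the guard the issue argues for, heavily simplified (this is not the actual BufferReleasingInputStream code and the handler name is made up): route `available()` through the same corruption handler that `read()` and `skip()` already use.
{code:scala}
import java.io.{IOException, InputStream}

class GuardedInputStream(in: InputStream, reportFetchFailure: IOException => Nothing)
  extends InputStream {

  // Shared wrapper: rethrow IOException as a fetch failure so the task can be retried
  private def tryOrFetchFail[T](block: => T): T =
    try block catch { case e: IOException => reportFetchFailure(e) }

  override def read(): Int = tryOrFetchFail(in.read())
  override def skip(n: Long): Long = tryOrFetchFail(in.skip(n))

  // The new part: available() can also trigger decompression, so guard it the same way
  override def available(): Int = tryOrFetchFail(in.available())
}
{code}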
[jira] [Updated] (SPARK-45652) SPJ: Handle empty input partitions after dynamic filtering
[ https://issues.apache.org/jira/browse/SPARK-45652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45652: -- Fix Version/s: 3.4.2 3.5.1 > SPJ: Handle empty input partitions after dynamic filtering > -- > > Key: SPARK-45652 > URL: https://issues.apache.org/jira/browse/SPARK-45652 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.1 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Fix For: 3.4.2, 4.0.0, 3.5.1 > > > When the number of input partitions become 0 after dynamic filtering, in > {{BatchScanExec}}, currently SPJ will fail with error: > {code} > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:529) > at scala.None$.get(Option.scala:527) > at > org.apache.spark.sql.execution.datasources.v2.BatchScanExec.filteredPartitions$lzycompute(BatchScanExec.scala:108) > at > org.apache.spark.sql.execution.datasources.v2.BatchScanExec.filteredPartitions(BatchScanExec.scala:65) > at > org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD$lzycompute(BatchScanExec.scala:136) > at > org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD(BatchScanExec.scala:135) > at > org.apache.spark.sql.boson.BosonBatchScanExec.inputRDD$lzycompute(BosonBatchScanExec.scala:28) > at > org.apache.spark.sql.boson.BosonBatchScanExec.inputRDD(BosonBatchScanExec.scala:28) > at > org.apache.spark.sql.boson.BosonBatchScanExec.doExecuteColumnar(BosonBatchScanExec.scala:33) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:222) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243) > at > org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:218) > at > org.apache.spark.sql.execution.InputAdapter.doExecuteColumnar(WholeStageCodegenExec.scala:521) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:222) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > {code} > This is because {{groupPartitions}} will return {{None}} for this case. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45677) Observe API error logging
[ https://issues.apache.org/jira/browse/SPARK-45677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45677: --- Labels: pull-request-available (was: ) > Observe API error logging > - > > Key: SPARK-45677 > URL: https://issues.apache.org/jira/browse/SPARK-45677 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available > > We should tell user why it's not supported and what to do > [https://github.com/apache/spark/blob/536439244593d40bdab88e9d3657f2691d3d33f2/sql/core/src/main/scala/org/apache/spark/sql/Observation.scala#L76] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45677) Observe API error logging
Wei Liu created SPARK-45677: --- Summary: Observe API error logging Key: SPARK-45677 URL: https://issues.apache.org/jira/browse/SPARK-45677 Project: Spark Issue Type: Task Components: Structured Streaming Affects Versions: 4.0.0 Reporter: Wei Liu We should tell user why it's not supported and what to do [https://github.com/apache/spark/blob/536439244593d40bdab88e9d3657f2691d3d33f2/sql/core/src/main/scala/org/apache/spark/sql/Observation.scala#L76] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45544) [CORE] Integrate SSL support into TransportContext
[ https://issues.apache.org/jira/browse/SPARK-45544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45544: --- Labels: pull-request-available (was: ) > [CORE] Integrate SSL support into TransportContext > -- > > Key: SPARK-45544 > URL: https://issues.apache.org/jira/browse/SPARK-45544 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hasnain Lakhani >Priority: Major > Labels: pull-request-available > > Integrate the SSL support into TransportContext so that Spark can use RPC SSL > support -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45652) SPJ: Handle empty input partitions after dynamic filtering
[ https://issues.apache.org/jira/browse/SPARK-45652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved SPARK-45652. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43531 [https://github.com/apache/spark/pull/43531] > SPJ: Handle empty input partitions after dynamic filtering > -- > > Key: SPARK-45652 > URL: https://issues.apache.org/jira/browse/SPARK-45652 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.1 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When the number of input partitions becomes 0 after dynamic filtering, in > {{BatchScanExec}}, SPJ currently fails with the following error: > {code} > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:529) > at scala.None$.get(Option.scala:527) > at > org.apache.spark.sql.execution.datasources.v2.BatchScanExec.filteredPartitions$lzycompute(BatchScanExec.scala:108) > at > org.apache.spark.sql.execution.datasources.v2.BatchScanExec.filteredPartitions(BatchScanExec.scala:65) > at > org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD$lzycompute(BatchScanExec.scala:136) > at > org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD(BatchScanExec.scala:135) > at > org.apache.spark.sql.boson.BosonBatchScanExec.inputRDD$lzycompute(BosonBatchScanExec.scala:28) > at > org.apache.spark.sql.boson.BosonBatchScanExec.inputRDD(BosonBatchScanExec.scala:28) > at > org.apache.spark.sql.boson.BosonBatchScanExec.doExecuteColumnar(BosonBatchScanExec.scala:33) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:222) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243) > at > org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:218) > at > org.apache.spark.sql.execution.InputAdapter.doExecuteColumnar(WholeStageCodegenExec.scala:521) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:222) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > {code} > This is because {{groupPartitions}} returns {{None}} in this case. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45652) SPJ: Handle empty input partitions after dynamic filtering
[ https://issues.apache.org/jira/browse/SPARK-45652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun reassigned SPARK-45652: Assignee: Chao Sun > SPJ: Handle empty input partitions after dynamic filtering > -- > > Key: SPARK-45652 > URL: https://issues.apache.org/jira/browse/SPARK-45652 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.1 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Labels: pull-request-available > > When the number of input partitions becomes 0 after dynamic filtering, in > {{BatchScanExec}}, SPJ currently fails with the following error: > {code} > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:529) > at scala.None$.get(Option.scala:527) > at > org.apache.spark.sql.execution.datasources.v2.BatchScanExec.filteredPartitions$lzycompute(BatchScanExec.scala:108) > at > org.apache.spark.sql.execution.datasources.v2.BatchScanExec.filteredPartitions(BatchScanExec.scala:65) > at > org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD$lzycompute(BatchScanExec.scala:136) > at > org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD(BatchScanExec.scala:135) > at > org.apache.spark.sql.boson.BosonBatchScanExec.inputRDD$lzycompute(BosonBatchScanExec.scala:28) > at > org.apache.spark.sql.boson.BosonBatchScanExec.inputRDD(BosonBatchScanExec.scala:28) > at > org.apache.spark.sql.boson.BosonBatchScanExec.doExecuteColumnar(BosonBatchScanExec.scala:33) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:222) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243) > at > org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:218) > at > org.apache.spark.sql.execution.InputAdapter.doExecuteColumnar(WholeStageCodegenExec.scala:521) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:222) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > {code} > This is because {{groupPartitions}} returns {{None}} in this case. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45596) Use java.lang.ref.Cleaner instead of org.apache.spark.sql.connect.client.util.Cleaner
[ https://issues.apache.org/jira/browse/SPARK-45596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-45596: Assignee: Min Zhao > Use java.lang.ref.Cleaner instead of > org.apache.spark.sql.connect.client.util.Cleaner > - > > Key: SPARK-45596 > URL: https://issues.apache.org/jira/browse/SPARK-45596 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Min Zhao >Assignee: Min Zhao >Priority: Minor > Labels: pull-request-available > Attachments: image-2023-10-19-02-25-57-966.png > > > Now that we have updated the JDK to 17, we should replace this class with > [[java.lang.ref.Cleaner]]. > > !image-2023-10-19-02-25-57-966.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45596) Use java.lang.ref.Cleaner instead of org.apache.spark.sql.connect.client.util.Cleaner
[ https://issues.apache.org/jira/browse/SPARK-45596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45596. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43439 [https://github.com/apache/spark/pull/43439] > Use java.lang.ref.Cleaner instead of > org.apache.spark.sql.connect.client.util.Cleaner > - > > Key: SPARK-45596 > URL: https://issues.apache.org/jira/browse/SPARK-45596 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Min Zhao >Assignee: Min Zhao >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: image-2023-10-19-02-25-57-966.png > > > Now that we have updated the JDK to 17, we should replace this class with > [[java.lang.ref.Cleaner]]. > > !image-2023-10-19-02-25-57-966.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
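For reference, the JDK 9+ java.lang.ref.Cleaner API that the ticket adopts is used roughly as follows. This is a self-contained sketch of the registration pattern (with made-up resource names), not the actual Spark Connect client code.

{code:scala}
import java.lang.ref.Cleaner

object CleanerExample {
  private val cleaner: Cleaner = Cleaner.create()

  // The state is kept in a separate object so the cleaning action does not capture the
  // resource holder itself, which would otherwise keep it strongly reachable forever.
  private final class State(val resourceId: Int) extends Runnable {
    override def run(): Unit = println(s"releasing resource $resourceId")
  }

  final class NativeHandle(resourceId: Int) extends AutoCloseable {
    private val state = new State(resourceId)
    private val cleanable = cleaner.register(this, state)

    // Deterministic release is still preferred; the Cleaner is only a safety net
    // that runs the action if the handle becomes unreachable without close().
    override def close(): Unit = cleanable.clean()
  }

  def main(args: Array[String]): Unit = {
    val handle = new NativeHandle(42)
    handle.close()
  }
}
{code}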
[jira] [Updated] (SPARK-45637) Time window aggregation in separate streams followed by stream-stream join not returning results
[ https://issues.apache.org/jira/browse/SPARK-45637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Zera updated SPARK-45637: - Description: According to documentation update (SPARK-42591) resulting from SPARK-42376, Spark 3.5.0 should support time-window aggregations in two separate streams followed by stream-stream window join: https://github.com/apache/spark/blob/261b281e6e57be32eb28bf4e50bea24ed22a9f21/docs/structured-streaming-programming-guide.md?plain=1#L1939-L1995 However, I failed to reproduce this example and the query I built doesn't return any results: {code:java} from pyspark.sql.functions import rand from pyspark.sql.functions import expr, window, window_time spark.conf.set("spark.sql.shuffle.partitions", "1") impressions = ( spark .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load() .selectExpr("value AS adId", "timestamp AS impressionTime") ) impressionsWithWatermark = impressions \ .selectExpr("adId AS impressionAdId", "impressionTime") \ .withWatermark("impressionTime", "10 seconds") clicks = ( spark .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load() .where((rand() * 100).cast("integer") < 10) # 10 out of every 100 impressions result in a click .selectExpr("(value - 10) AS adId ", "timestamp AS clickTime") # -10 so that a click with same id as impression is generated later (i.e. delayed data). .where("adId > 0") ) clicksWithWatermark = clicks \ .selectExpr("adId AS clickAdId", "clickTime") \ .withWatermark("clickTime", "10 seconds") clicksWindow = clicksWithWatermark.groupBy( window(clicksWithWatermark.clickTime, "1 minute") ).count() impressionsWindow = impressionsWithWatermark.groupBy( window(impressionsWithWatermark.impressionTime, "1 minute") ).count() clicksAndImpressions = clicksWindow.join(impressionsWindow, "window", "inner") clicksAndImpressions.writeStream \ .format("memory") \ .queryName("clicksAndImpressions") \ .outputMode("append") \ .start() {code} My intuition is that I'm getting no results because to output results of the first stateful operator (time window aggregation), a watermark needs to pass the end timestamp of the window. And once the watermark is after the end timestamp of the window, this window is ignored at the second stateful operator (stream-stream) join because it's behind the watermark. 
Indeed, a small hack done to event time column (adding one minute) between two stateful operators makes it possible to get results: {code:java} clicksWindow2 = clicksWithWatermark.groupBy( window(clicksWithWatermark.clickTime, "1 minute") ).count().withColumn("window_time", window_time("window") + expr('INTERVAL 1 MINUTE')).drop("window") impressionsWindow2 = impressionsWithWatermark.groupBy( window(impressionsWithWatermark.impressionTime, "1 minute") ).count().withColumn("window_time", window_time("window") + expr('INTERVAL 1 MINUTE')).drop("window") clicksAndImpressions2 = clicksWindow2.join(impressionsWindow2, "window_time", "inner") clicksAndImpressions2.writeStream \ .format("memory") \ .queryName("clicksAndImpressions2") \ .outputMode("append") \ .start() {code} was: According to documentation update (SPARK-42591) resulting from SPARK-42376, Spark 3.5.0 should support time-window aggregations in two separate streams followed by stream-stream window join: [https://github.com/HeartSaVioR/spark/blob/eb0b09f0f2b518915421365a61d1f3d7d58b4404/docs/structured-streaming-programming-guide.md?plain=1#L1939-L1995] However, I failed to reproduce this example and the query I built doesn't return any results: {code:java} from pyspark.sql.functions import rand from pyspark.sql.functions import expr, window, window_time spark.conf.set("spark.sql.shuffle.partitions", "1") impressions = ( spark .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load() .selectExpr("value AS adId", "timestamp AS impressionTime") ) impressionsWithWatermark = impressions \ .selectExpr("adId AS impressionAdId", "impressionTime") \ .withWatermark("impressionTime", "10 seconds") clicks = ( spark .readStream.format("rate").option("rowsPerSecond", "5").option("numPartitions", "1").load() .where((rand() * 100).cast("integer") < 10) # 10 out of every 100 impressions result in a click .selectExpr("(value - 10) AS adId ", "timestamp AS clickTime") # -10 so that a click with same id as impression is generated later (i.e. delayed data). .where("adId > 0") ) clicksWithWatermark = clicks \ .selectExpr("adId AS clickAdId", "clickTime") \ .withWatermark("clickTime", "10 seconds") clicksWindow = clicksWithWatermark.groupBy( window(clicksWithWatermark.clickTime, "1 minute") ).count() impressi
[jira] [Resolved] (SPARK-45659) Add `since` field to Java API marked as `@Deprecated`.
[ https://issues.apache.org/jira/browse/SPARK-45659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45659. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43522 [https://github.com/apache/spark/pull/43522] > Add `since` field to Java API marked as `@Deprecated`. > -- > > Key: SPARK-45659 > URL: https://issues.apache.org/jira/browse/SPARK-45659 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL, SS >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Spark 3.0.0: > - SPARK-26861 > - org.apache.spark.sql.expressions.javalang.typed > - SPARK-27606 > - org.apache.spark.sql.catalyst.expressions.ExpressionDescription#extended: > - > org.apache.spark.sql.catalyst.expressions.ExpressionInfo#ExpressionInfo(String, > String, String, String, String) > Spark 3.2.0 > - SPARK-33717 > - > org.apache.spark.launcher.SparkLauncher#DEPRECATED_CHILD_CONNECTION_TIMEOUT > - SPARK-33779 > - org.apache.spark.sql.connector.write.WriteBuilder#buildForBatch > - org.apache.spark.sql.connector.write.WriteBuilder#buildForStreaming > Spark 3.4.0 > - SPARK-39805 > - org.apache.spark.sql.streaming.Trigger > - SPARK-42398 > - > org.apache.spark.sql.connector.catalog.TableCatalog#createTable(Identifier, > StructType, Transform[], Map) > - > org.apache.spark.sql.connector.catalog.StagingTableCatalog#stageCreate(Identifier, > StructType, Transform[], Map) > - org.apache.spark.sql.connector.catalog.Table#schema > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
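For illustration, the `since` field the ticket adds is the `since` element of the JDK 9+ java.lang.Deprecated annotation; Scala's own @deprecated annotation already carries the same information. The member names and version strings below are made up for the example, not the APIs listed in the ticket.

{code:scala}
object DeprecatedSinceExample {
  // Java annotation with its `since` element, as used on Java-facing Spark APIs.
  @Deprecated(since = "3.0.0")
  def legacyJavaFacingApi(): Unit = ()

  // Scala's annotation records both a message and the release it was deprecated in.
  @deprecated("Use the replacement API instead", since = "3.0.0")
  def legacyScalaApi(): Unit = ()

  def main(args: Array[String]): Unit =
    println("deprecated-since example compiled")
}
{code}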
[jira] [Assigned] (SPARK-45659) Add `since` field to Java API marked as `@Deprecated`.
[ https://issues.apache.org/jira/browse/SPARK-45659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-45659: Assignee: Yang Jie > Add `since` field to Java API marked as `@Deprecated`. > -- > > Key: SPARK-45659 > URL: https://issues.apache.org/jira/browse/SPARK-45659 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL, SS >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > > Spark 3.0.0: > - SPARK-26861 > - org.apache.spark.sql.expressions.javalang.typed > - SPARK-27606 > - org.apache.spark.sql.catalyst.expressions.ExpressionDescription#extended: > - > org.apache.spark.sql.catalyst.expressions.ExpressionInfo#ExpressionInfo(String, > String, String, String, String) > Spark 3.2.0 > - SPARK-33717 > - > org.apache.spark.launcher.SparkLauncher#DEPRECATED_CHILD_CONNECTION_TIMEOUT > - SPARK-33779 > - org.apache.spark.sql.connector.write.WriteBuilder#buildForBatch > - org.apache.spark.sql.connector.write.WriteBuilder#buildForStreaming > Spark 3.4.0 > - SPARK-39805 > - org.apache.spark.sql.streaming.Trigger > - SPARK-42398 > - > org.apache.spark.sql.connector.catalog.TableCatalog#createTable(Identifier, > StructType, Transform[], Map) > - > org.apache.spark.sql.connector.catalog.StagingTableCatalog#stageCreate(Identifier, > StructType, Transform[], Map) > - org.apache.spark.sql.connector.catalog.Table#schema > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38723) Test the error class: CONCURRENT_QUERY
[ https://issues.apache.org/jira/browse/SPARK-38723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779871#comment-17779871 ] Jungtaek Lim commented on SPARK-38723: -- The merge script will assign the PR author. We keep the Jira ticket unassigned until the PR gets merged. > Test the error class: CONCURRENT_QUERY > -- > > Key: SPARK-38723 > URL: https://issues.apache.org/jira/browse/SPARK-38723 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Philip Dakin >Priority: Minor > Labels: pull-request-available, starter > Fix For: 4.0.0 > > > Add at least one test for the error class *CONCURRENT_QUERY* to > QueryExecutionErrorsSuite. The test should cover the exception throw in > QueryExecutionErrors: > {code:scala} > def concurrentQueryInstanceError(): Throwable = { > new SparkConcurrentModificationException("CONCURRENT_QUERY", Array.empty) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must have a check of:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38723) Test the error class: CONCURRENT_QUERY
[ https://issues.apache.org/jira/browse/SPARK-38723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-38723: Assignee: Philip Dakin > Test the error class: CONCURRENT_QUERY > -- > > Key: SPARK-38723 > URL: https://issues.apache.org/jira/browse/SPARK-38723 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Philip Dakin >Priority: Minor > Labels: pull-request-available, starter > Fix For: 4.0.0 > > > Add at least one test for the error class *CONCURRENT_QUERY* to > QueryExecutionErrorsSuite. The test should cover the exception throw in > QueryExecutionErrors: > {code:scala} > def concurrentQueryInstanceError(): Throwable = { > new SparkConcurrentModificationException("CONCURRENT_QUERY", Array.empty) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must have a check of:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38723) Test the error class: CONCURRENT_QUERY
[ https://issues.apache.org/jira/browse/SPARK-38723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779869#comment-17779869 ] Philip Dakin commented on SPARK-38723: -- [~kabhwan] it's me. BTW - do you know steps to get the ability to assign things? Would have assigned to myself but I don't see the option. > Test the error class: CONCURRENT_QUERY > -- > > Key: SPARK-38723 > URL: https://issues.apache.org/jira/browse/SPARK-38723 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > Fix For: 4.0.0 > > > Add at least one test for the error class *CONCURRENT_QUERY* to > QueryExecutionErrorsSuite. The test should cover the exception throw in > QueryExecutionErrors: > {code:scala} > def concurrentQueryInstanceError(): Throwable = { > new SparkConcurrentModificationException("CONCURRENT_QUERY", Array.empty) > } > {code} > For example, here is a test for the error class *UNSUPPORTED_FEATURE*: > https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170 > +The test must have a check of:+ > # the entire error message > # sqlState if it is defined in the error-classes.json file > # the error class -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45598) Delta table 3.0.0 not working with Spark Connect 3.5.0
[ https://issues.apache.org/jira/browse/SPARK-45598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779843#comment-17779843 ] Faiz Halde commented on SPARK-45598: Hi, do we have any updates here? Happy to help > Delta table 3.0.0 not working with Spark Connect 3.5.0 > -- > > Key: SPARK-45598 > URL: https://issues.apache.org/jira/browse/SPARK-45598 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Faiz Halde >Priority: Major > > Spark version 3.5.0 > Spark Connect version 3.5.0 > Delta table 3.0-rc2 > Spark connect server was started using > *{{./sbin/start-connect-server.sh --master spark://localhost:7077 --packages > org.apache.spark:spark-connect_2.12:3.5.0,io.delta:delta-spark_2.12:3.0.0rc2 > --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf > "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" > --conf > 'spark.jars.repositories=[https://oss.sonatype.org/content/repositories/iodelta-1120']}}* > {{Connect client depends on}} > *libraryDependencies += "io.delta" %% "delta-spark" % "3.0.0rc2"* > *and the connect libraries* > > When trying to run a simple job that writes to a delta table > {{val spark = SparkSession.builder().remote("sc://localhost").getOrCreate()}} > {{val data = spark.read.json("profiles.json")}} > {{data.write.format("delta").save("/tmp/delta")}} > > {{Error log in connect client}} > {{Exception in thread "main" org.apache.spark.SparkException: > io.grpc.StatusRuntimeException: INTERNAL: Job aborted due to stage failure: > Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in > stage 1.0 (TID 4) (172.23.128.15 executor 0): java.lang.ClassCastException: > cannot assign instance of java.lang.invoke.SerializedLambda to field > org.apache.spark.sql.catalyst.expressions.ScalaUDF.f of type scala.Function1 > in instance of org.apache.spark.sql.catalyst.expressions.ScalaUDF}} > {{ at > java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)}} > {{ at > java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2437)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{ at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)}} > {{ at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)}} > {{ at > 
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)}} > {{ at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)}} > {{ at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)}} > {{...}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$.toThrowable(GrpcExceptionConverter.scala:110)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$.convert(GrpcExceptionConverter.scala:41)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.hasNext(GrpcExceptionConverter.scala:49)}} > {{ at scala.collection.Iterator.foreach(Iterator.scala:943)}} > {{ at scala.collection.Iterator.foreach$(Iterator.scala:943)}} > {{ at > org.apache.spark.sql.connect.client.GrpcExceptionConverter$$anon$1.foreach(GrpcExceptionConverter.scala:46)}} > {{ at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)}} > {{ at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)}} > {{ at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)}} > {{ at > scala.collection.mutable.Ar
[jira] [Assigned] (SPARK-45642) Fix `FileSystem.isFile & FileSystem.isDirectory is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45642: -- Assignee: (was: Apache Spark) > Fix `FileSystem.isFile & FileSystem.isDirectory is deprecated` > -- > > Key: SPARK-45642 > URL: https://issues.apache.org/jira/browse/SPARK-45642 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45642) Fix `FileSystem.isFile & FileSystem.isDirectory is deprecated`
[ https://issues.apache.org/jira/browse/SPARK-45642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45642: -- Assignee: Apache Spark > Fix `FileSystem.isFile & FileSystem.isDirectory is deprecated` > -- > > Key: SPARK-45642 > URL: https://issues.apache.org/jira/browse/SPARK-45642 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
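As a sketch of the replacement pattern the ticket implies, the deprecated FileSystem.isFile and FileSystem.isDirectory calls are usually rewritten in terms of getFileStatus (handling the missing-path case explicitly). The helper and paths below are illustrative, not the actual patch.

{code:scala}
import java.io.FileNotFoundException

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object DeprecatedFsCallsExample {
  // Replacement for the deprecated fs.isFile(path): ask for the FileStatus and
  // treat a missing path as "not a file".
  def isFile(fs: FileSystem, path: Path): Boolean =
    try fs.getFileStatus(path).isFile
    catch { case _: FileNotFoundException => false }

  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    println(isFile(fs, new Path("/tmp/some-file")))
  }
}
{code}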
[jira] [Assigned] (SPARK-45368) Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal
[ https://issues.apache.org/jira/browse/SPARK-45368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45368: -- Assignee: Apache Spark > Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal > --- > > Key: SPARK-45368 > URL: https://issues.apache.org/jira/browse/SPARK-45368 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45481) Introduce a mapper for parquet compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45481: -- Assignee: Jiaan Geng (was: Apache Spark) > Introduce a mapper for parquet compression codecs > - > > Key: SPARK-45481 > URL: https://issues.apache.org/jira/browse/SPARK-45481 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > > Currently, Spark supports all the Parquet compression codecs, but the codecs > supported by Parquet and those supported by Spark do not map one-to-one, > because Spark introduces a fake compression codec named none. > There are a lot of magic strings copied from the Parquet compression codecs, so > developers need to manually keep them consistent. This is easy to get wrong and > reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
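As a sketch of what such a mapper could look like, one central mapping between Spark's codec option strings (including the Spark-only "none") and Parquet's CompressionCodecName might be written as below. The object name and the exact codec set are assumptions for illustration, not the merged implementation.

{code:scala}
import java.util.Locale

import org.apache.parquet.hadoop.metadata.CompressionCodecName

object ParquetCodecMapper {
  // Single source of truth for the option-string -> Parquet codec mapping,
  // replacing magic strings scattered across the code base.
  private val byName: Map[String, CompressionCodecName] = Map(
    "none"         -> CompressionCodecName.UNCOMPRESSED,
    "uncompressed" -> CompressionCodecName.UNCOMPRESSED,
    "snappy"       -> CompressionCodecName.SNAPPY,
    "gzip"         -> CompressionCodecName.GZIP,
    "lzo"          -> CompressionCodecName.LZO,
    "lz4"          -> CompressionCodecName.LZ4,
    "brotli"       -> CompressionCodecName.BROTLI,
    "zstd"         -> CompressionCodecName.ZSTD)

  def fromSparkOption(name: String): CompressionCodecName =
    byName.getOrElse(name.toLowerCase(Locale.ROOT),
      throw new IllegalArgumentException(s"Unknown parquet compression codec: $name"))
}
{code}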
[jira] [Assigned] (SPARK-45368) Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal
[ https://issues.apache.org/jira/browse/SPARK-45368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45368: -- Assignee: (was: Apache Spark) > Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal > --- > > Key: SPARK-45368 > URL: https://issues.apache.org/jira/browse/SPARK-45368 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45481) Introduce a mapper for parquet compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45481: -- Assignee: Apache Spark (was: Jiaan Geng) > Introduce a mapper for parquet compression codecs > - > > Key: SPARK-45481 > URL: https://issues.apache.org/jira/browse/SPARK-45481 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Currently, Spark supports all the Parquet compression codecs, but the codecs > supported by Parquet and those supported by Spark do not map one-to-one, > because Spark introduces a fake compression codec named none. > There are a lot of magic strings copied from the Parquet compression codecs, so > developers need to manually keep them consistent. This is easy to get wrong and > reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45676) Upgrade to PySpark 3.5.0 gives Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
Miles Granger created SPARK-45676: - Summary: Upgrade to PySpark 3.5.0 gives Class org.apache.hadoop.fs.s3a.S3AFileSystem not found Key: SPARK-45676 URL: https://issues.apache.org/jira/browse/SPARK-45676 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.5.0 Reporter: Miles Granger Using PySpark 3.4.1 w/ the following dependencies works fine for reading S3 files: hadoop-client:3.3.4 hadoop-common:3.3.4 hadoop-aws:3.3.4 aws-java-sdk-bundle:1.12.262 Doing a simple upgrade to PySpark 3.5.0 (which is still using hadoop 3.3.4 AFAIK) results in failing to read the same S3 files: ``` Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688) at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466) at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365) at org.apache.parquet.hadoop.util.HadoopInputFile.fromStatus(HadoopInputFile.java:44) at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:76) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readParquetFootersInParallel$1(ParquetFileFormat.scala:450) ... 14 more ``` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
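One common way to make the S3A connector available is to let Spark resolve a matching hadoop-aws artifact at startup via spark.jars.packages. The snippet below (shown in Scala for consistency with the other examples, with a placeholder version and bucket path) is a workaround-style illustration of that setup, not a statement about the root cause of this regression.

{code:scala}
import org.apache.spark.sql.SparkSession

object S3ASmokeTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3a-smoke-test")
      // Pulls hadoop-aws (and its AWS SDK bundle) matching the Hadoop 3.3.4 line.
      .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
      .getOrCreate()

    // Fails with "Class org.apache.hadoop.fs.s3a.S3AFileSystem not found" if the
    // connector jars are not on the classpath.
    spark.read.parquet("s3a://my-bucket/some/path").show(5)
    spark.stop()
  }
}
{code}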
[jira] [Created] (SPARK-45675) Specify number of partitions when creating spark dataframe from pandas dataframe
Jelmer Kuperus created SPARK-45675: -- Summary: Specify number of partitions when creating spark dataframe from pandas dataframe Key: SPARK-45675 URL: https://issues.apache.org/jira/browse/SPARK-45675 Project: Spark Issue Type: Improvement Components: Pandas API on Spark Affects Versions: 3.5.0 Reporter: Jelmer Kuperus When converting a large pandas dataframe to a spark dataframe like so {code:java} import pandas as pd pdf = pd.DataFrame([{"board_id": "3074457346698037360_0", "file_name": "board-content", "value": "A" * 119251} for i in range(0, 2)]) spark.createDataFrame(pdf).write.mode("overwrite").format("delta").saveAsTable("catalog.schema.table"){code} you can encounter the following error: org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 11:1 was 366405365 bytes, which exceeds max allowed: spark.rpc.message.maxSize (268435456 bytes). Consider increasing spark.rpc.message.maxSize or using broadcast variables for large values. As far as I can tell, Spark first converts the pandas dataframe into a Python list and then constructs an RDD out of that list, which means that the parallelism is determined by the value of spark.sparkContext.defaultParallelism. If the pandas dataframe is very large and the number of available cores is low, you end up with very large tasks that exceed the limits imposed on task size. Methods like spark.sparkContext.parallelize allow you to pass in the number of partitions of the resulting dataset, and having a similar capability when creating a dataframe from a pandas dataframe makes a lot of sense. Right now the only workaround I can think of is changing the value of spark.default.parallelism, but this is a system-wide setting. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
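To illustrate the asymmetry the reporter describes: the RDD API already accepts an explicit partition count at creation time, while DataFrames built from local data follow the default parallelism and are usually repartitioned after the fact. The Scala sketch below (arbitrary sizes and partition counts, local master for demonstration) shows both sides; it is an illustration of the existing APIs, not of the proposed new parameter.

{code:scala}
import org.apache.spark.sql.SparkSession

object PartitionControlExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-control")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // RDD API: the number of partitions can be specified up front.
    val rdd = spark.sparkContext.parallelize(1 to 1000000, numSlices = 64)
    println(rdd.getNumPartitions)

    // Local-collection DataFrame: partitioning follows the default parallelism,
    // so the usual workaround is an explicit repartition afterwards.
    val df = (1 to 1000000).toDF("value").repartition(64)
    println(df.rdd.getNumPartitions)

    spark.stop()
  }
}
{code}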