[spark] branch master updated: [SPARK-44295][BUILD] Upgrade `scala-parser-combinators` to 2.3.0
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0e8a95ea99f [SPARK-44295][BUILD] Upgrade `scala-parser-combinators` to 2.3.0
0e8a95ea99f is described below

commit 0e8a95ea99f3be0c31e8c76d5086d46062ef3bef
Author: yangjie01
AuthorDate: Tue Jul 4 20:06:24 2023 +0900

    [SPARK-44295][BUILD] Upgrade `scala-parser-combinators` to 2.3.0

    ### What changes were proposed in this pull request?

    This PR aims to upgrade `scala-parser-combinators` from 2.2.0 to 2.3.0.

    ### Why are the changes needed?

    The new version [dropped support for Scala 2.11](https://github.com/scala/scala-parser-combinators/pull/504) and brings a bug fix:
    - https://github.com/scala/scala-parser-combinators/pull/507

    The full release notes are as follows:
    - https://github.com/scala/scala-parser-combinators/releases/tag/v2.3.0

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass GitHub Actions.

    Closes #41848 from LuciferYang/scala-parser-combinators-23.

    Authored-by: yangjie01
    Signed-off-by: Hyukjin Kwon
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml                               | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 3ac03fa6472..910b1703c70 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -227,7 +227,7 @@ rocksdbjni/8.3.2//rocksdbjni-8.3.2.jar
 scala-collection-compat_2.12/2.7.0//scala-collection-compat_2.12-2.7.0.jar
 scala-compiler/2.12.18//scala-compiler-2.12.18.jar
 scala-library/2.12.18//scala-library-2.12.18.jar
-scala-parser-combinators_2.12/2.2.0//scala-parser-combinators_2.12-2.2.0.jar
+scala-parser-combinators_2.12/2.3.0//scala-parser-combinators_2.12-2.3.0.jar
 scala-reflect/2.12.18//scala-reflect-2.12.18.jar
 scala-xml_2.12/2.1.0//scala-xml_2.12-2.1.0.jar
 shims/0.9.45//shims-0.9.45.jar
diff --git a/pom.xml b/pom.xml
index deccc904dd9..588f91155d6 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1102,7 +1102,7 @@
       <groupId>org.scala-lang.modules</groupId>
       <artifactId>scala-parser-combinators_${scala.binary.version}</artifactId>
-      <version>2.2.0</version>
+      <version>2.3.0</version>
       <artifactId>jline</artifactId>
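For context on what the bumped dependency provides: `scala-parser-combinators` is the combinator-style parsing library shown in the sketch below. The grammar is invented purely for illustration and is not Spark code.

```scala
import scala.util.parsing.combinator.RegexParsers

// Generic illustration of the scala-parser-combinators API; the grammar
// ("1 + 2 + 3" sums) is made up for this example.
object SumParser extends RegexParsers {
  def number: Parser[Int] = """\d+""".r ^^ (_.toInt)       // digits -> Int
  def sum: Parser[Int] = number ~ rep("+" ~> number) ^^ {  // n (+ n)*
    case first ~ rest => first + rest.sum
  }
}

// parseAll succeeds only if the whole input matches the grammar.
println(SumParser.parseAll(SumParser.sum, "1 + 2 + 3").get) // 3
```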
[spark] branch master updated: [SPARK-44297][CORE][TESTS] Make `ClassLoaderIsolationSuite` test pass with Scala 2.13
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 16295a33822 [SPARK-44297][CORE][TESTS] Make `ClassLoaderIsolationSuite` test pass with Scala 2.13
16295a33822 is described below

commit 16295a338220cc66fea5d91bcbd4213df0f2d4bd
Author: yangjie01
AuthorDate: Tue Jul 4 22:37:04 2023 +0800

    [SPARK-44297][CORE][TESTS] Make `ClassLoaderIsolationSuite` test pass with Scala 2.13

    ### What changes were proposed in this pull request?

    The main changes of this PR are as follows:
    1. Rename `TestHelloV2.jar` and `TestHelloV3.jar`, added in https://github.com/apache/spark/pull/41789, to `TestHelloV2_2.12.jar` and `TestHelloV3_2.12.jar`.
    2. Add the corresponding `TestHelloV2_2.13.jar` and `TestHelloV3_2.13.jar`, compiled with Scala 2.13.
    3. Make `ClassLoaderIsolationSuite` use the jar matching the Scala binary version under test (see the sketch after this message).

    ### Why are the changes needed?

    Make `ClassLoaderIsolationSuite` pass with Scala 2.13. The Scala 2.13 daily test has failed since https://github.com/apache/spark/pull/41789:
    - https://github.com/apache/spark/actions/runs/5447771717/jobs/9910185372

    ```
    [info] - Executor classloader isolation with JobArtifactSet *** FAILED *** (83 milliseconds)
    [info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (localhost executor driver): java.lang.NoClassDefFoundError: scala/Serializable
    [info]   at java.lang.ClassLoader.defineClass1(Native Method)
    [info]   at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
    [info]   at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    [info]   at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
    [info]   at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
    [info]   at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
    [info]   at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    [info]   at java.security.AccessController.doPrivileged(Native Method)
    [info]   at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
    [info]   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    [info]   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    [info]   at java.lang.Class.forName0(Native Method)
    [info]   at java.lang.Class.forName(Class.java:348)
    [info]   at org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:35)
    [info]   at org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:30)
    [info]   at org.apache.spark.util.Utils$.classForName(Utils.scala:94)
    [info]   at org.apache.spark.executor.ClassLoaderIsolationSuite.$anonfun$new$3(ClassLoaderIsolationSuite.scala:53)
    [info]   at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.scala:18)
    [info]   at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
    [info]   at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
    [info]   at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    [info]   at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1028)
    [info]   at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1028)
    [info]   at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2406)
    [info]   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    [info]   at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    [info]   at org.apache.spark.scheduler.Task.run(Task.scala:141)
    [info]   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:593)
    [info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1478)
    [info]   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:596)
    [info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    [info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    [info]   at java.lang.Thread.run(Thread.java:750)
    [info] Caused by: java.lang.ClassNotFoundException: scala.Serializable
    [info]   at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    [info]   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    [info]   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    [info]   ... 33 more
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    - Pass GitHub Actions
    - Manual check:

    **Scala 2.12**

    ```
    build/sbt "core/testOnly *ClassLoaderIsolationSuite"
    ```

    ```
    [info] ClassLoaderIsolationSuite:
    [info] - Executor classloader isolation with JobArtifactSet (1 seco
    ```
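The sketch referenced in item 3 above: a hedged illustration of the selection logic the fix implies, resolving the test jar name from the running Scala binary version. The helper name and jar layout are illustrative, not the suite's actual code.

```scala
import scala.util.Properties

// Hypothetical helper: derive the Scala binary version (e.g. "2.13" from
// "2.13.11") and pick the matching test jar, so a Scala 2.13 run never loads
// a 2.12-compiled jar that still references scala.Serializable.
def testJarFor(base: String): String = {
  val binaryVersion = Properties.versionNumberString.split('.').take(2).mkString(".")
  s"${base}_$binaryVersion.jar"
}

println(testJarFor("TestHelloV2")) // "TestHelloV2_2.12.jar" or "TestHelloV2_2.13.jar"
```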
[spark] branch master updated: [SPARK-44296][BUILD] Upgrade dropwizard metrics 4.2.19
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 68862589a0c [SPARK-44296][BUILD] Upgrade dropwizard metrics 4.2.19
68862589a0c is described below

commit 68862589a0cc0c450434d3a1675bb241a5917f1b
Author: yangjie01
AuthorDate: Tue Jul 4 22:38:43 2023 +0800

    [SPARK-44296][BUILD] Upgrade dropwizard metrics 4.2.19

    ### What changes were proposed in this pull request?

    This PR aims to upgrade Dropwizard Metrics to 4.2.19.

    ### Why are the changes needed?

    The new version brings a bug fix related to the metrics-jetty module:
    - https://github.com/dropwizard/metrics/pull/3379

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass GitHub Actions.

    Closes #41849 from LuciferYang/SPARK-44296.

    Authored-by: yangjie01
    Signed-off-by: yangjie01
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 10 +-
 pom.xml                               |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 910b1703c70..1cdf08f321e 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -177,11 +177,11 @@ log4j-slf4j2-impl/2.20.0//log4j-slf4j2-impl-2.20.0.jar
 logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
 lz4-java/1.8.0//lz4-java-1.8.0.jar
 mesos/1.4.3/shaded-protobuf/mesos-1.4.3-shaded-protobuf.jar
-metrics-core/4.2.18//metrics-core-4.2.18.jar
-metrics-graphite/4.2.18//metrics-graphite-4.2.18.jar
-metrics-jmx/4.2.18//metrics-jmx-4.2.18.jar
-metrics-json/4.2.18//metrics-json-4.2.18.jar
-metrics-jvm/4.2.18//metrics-jvm-4.2.18.jar
+metrics-core/4.2.19//metrics-core-4.2.19.jar
+metrics-graphite/4.2.19//metrics-graphite-4.2.19.jar
+metrics-jmx/4.2.19//metrics-jmx-4.2.19.jar
+metrics-json/4.2.19//metrics-json-4.2.19.jar
+metrics-jvm/4.2.19//metrics-jvm-4.2.19.jar
 minlog/1.3.0//minlog-1.3.0.jar
 netty-all/4.1.93.Final//netty-all-4.1.93.Final.jar
 netty-buffer/4.1.93.Final//netty-buffer-4.1.93.Final.jar
diff --git a/pom.xml b/pom.xml
index 588f91155d6..2e29d1de0c9 100644
--- a/pom.xml
+++ b/pom.xml
@@ -152,7 +152,7 @@
          If you changes codahale.metrics.version, you also need to change
          the link to metrics.dropwizard.io in docs/monitoring.md.
     -->
-    <codahale.metrics.version>4.2.18</codahale.metrics.version>
+    <codahale.metrics.version>4.2.19</codahale.metrics.version>
     1.11.1
     1.12.0
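For context, a minimal sketch of the Dropwizard Metrics (codahale) API that this dependency provides and that Spark's MetricsSystem builds on. This is generic library usage, not Spark code; the metric names are illustrative.

```scala
import java.util.concurrent.TimeUnit
import com.codahale.metrics.{ConsoleReporter, MetricRegistry}

// A registry holds named metrics; components register and update them.
val registry = new MetricRegistry()
registry.counter("tasks.active").inc() // simple up/down counter
registry.meter("requests").mark()      // event rate (mean, 1m/5m/15m)

// Reporters periodically ship registry contents to a sink; Spark wires
// Graphite, JMX, JSON servlets, etc. via the sibling metrics-* modules.
val reporter = ConsoleReporter.forRegistry(registry)
  .convertRatesTo(TimeUnit.SECONDS)
  .convertDurationsTo(TimeUnit.MILLISECONDS)
  .build()
reporter.report()
```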
[spark] branch master updated (68862589a0c -> d53585c91b2)
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 68862589a0c [SPARK-44296][BUILD] Upgrade dropwizard metrics 4.2.19
     add d53585c91b2 [SPARK-44292][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2315-2319]

No new revisions were added by this update.

Summary of changes:
 .../src/main/resources/error/error-classes.json    | 57 --
 ...ror-conditions-datatype-mismatch-error-class.md |  4 ++
 ...itions-invalid-observed-metrics-error-class.md} | 24 -
 .../sql/catalyst/analysis/CheckAnalysis.scala      | 24 +
 .../sql/catalyst/analysis/AnalysisSuite.scala      | 40 ++-
 5 files changed, 92 insertions(+), 57 deletions(-)
 copy docs/{sql-error-conditions-invalid-schema-error-class.md => sql-error-conditions-invalid-observed-metrics-error-class.md} (61%)
[spark] branch master updated: [SPARK-44293][CONNECT] Fix invalid URI for custom JARs in Spark Connect
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new c08700f05f9 [SPARK-44293][CONNECT] Fix invalid URI for custom JARs in Spark Connect
c08700f05f9 is described below

commit c08700f05f96e083dd8dec12fb2ca90a49d16a52
Author: vicennial
AuthorDate: Wed Jul 5 08:41:47 2023 +0900

    [SPARK-44293][CONNECT] Fix invalid URI for custom JARs in Spark Connect

    ### What changes were proposed in this pull request?

    This PR fixes a bug where invalid JAR URIs were generated: the URI was stored as `artifactURI + "/" + target.toString` (where `target` is the absolute path of the file) instead of `artifactURI + "/" + remoteRelativePath.toString` (where `remoteRelativePath` is of the form `jars/...`).

    ### Why are the changes needed?

    Without this change, Spark Connect users attempting to use a custom JAR (such as in UDFs) hit task failures because an exception is thrown during the JAR file fetch operation. Example stack trace:
    ```
    23/07/03 17:00:15 INFO Executor: Fetching spark://ip-10-110-22-170.us-west-2.compute.internal:43743/artifacts/d9548b02-ff3b-4278-ab52-aef5d1fc724e//home/venkata.gudesa/spark/artifacts/spark-d6141194-c487-40fd-ba40-444d922808ea/d9548b02-ff3b-4278-ab52-aef5d1fc724e/jars/TestHelloV2.jar with timestamp 0
    23/07/03 17:00:15 ERROR Executor: Exception in task 6.0 in stage 4.0 (TID 55)
    java.lang.RuntimeException: Stream '/artifacts/d9548b02-ff3b-4278-ab52-aef5d1fc724e//home/venkata.gudesa/spark/artifacts/spark-d6141194-c487-40fd-ba40-444d922808ea/d9548b02-ff3b-4278-ab52-aef5d1fc724e/jars/TestHelloV2.jar' was not found.
        at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:260)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142)
        at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
        at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:5
    ```
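A hedged sketch of the bug's shape: the server must append the session-relative path (`jars/...`) to the artifact base URI, not the absolute staging path on the server's filesystem. The helper and values below are illustrative, not the actual Spark Connect code.

```scala
import java.nio.file.{Path, Paths}

// Illustrative only: compose the URI an executor will try to fetch.
def artifactUri(baseUri: String, remoteRelativePath: Path): String =
  s"$baseUri/$remoteRelativePath"

val base = "spark://host:43743/artifacts/d9548b02-ff3b-4278-ab52-aef5d1fc724e"

// Buggy: the absolute staging path yields ".../artifacts/<id>//home/user/..."
// which the file server cannot resolve to a registered stream.
artifactUri(base, Paths.get("/home/user/staging/jars/TestHelloV2.jar"))

// Fixed: the path relative to the session's artifact root.
artifactUri(base, Paths.get("jars/TestHelloV2.jar"))
```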
[spark] branch branch-3.4 updated: [SPARK-44215][SHUFFLE] If num chunks are 0, then server should throw a RuntimeException
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 85d41757e85 [SPARK-44215][SHUFFLE] If num chunks are 0, then server should throw a RuntimeException
85d41757e85 is described below

commit 85d41757e855d97dee0f24f281f82249161c3d29
Author: Chandni Singh
AuthorDate: Tue Jul 4 18:46:18 2023 -0500

    [SPARK-44215][SHUFFLE] If num chunks are 0, then server should throw a RuntimeException

    ### What changes were proposed in this pull request?

    The executor expects `numChunks` to be > 0. If it is zero, the executor fails with:
    ```
    23/06/20 19:07:37 ERROR task 2031.0 in stage 47.0 (TID 25018) Executor: Exception in task 2031.0 in stage 47.0 (TID 25018)
    java.lang.ArithmeticException: / by zero
        at org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:128)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:1047)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:90)
        at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
        at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    ```
    Because this is an `ArithmeticException`, the executor doesn't fall back. It's not a `FetchFailure` either, so the stage is not retried and the application fails.

    ### Why are the changes needed?

    The executor should fall back to fetching the original blocks rather than fail, because this condition indicates a problem with the push-merged block.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Modified the existing UTs to validate that a RuntimeException is thrown when `numChunks` is 0.

    Closes #41762 from otterc/SPARK-44215.

    Authored-by: Chandni Singh
    Signed-off-by: Mridul Muralidharan

(cherry picked from commit 3e72806bb421b103bf6e73518b80200ccdd58ce5)
Signed-off-by: Mridul Muralidharan
---
 .../network/shuffle/RemoteBlockPushResolver.java |  4
 .../shuffle/RemoteBlockPushResolverSuite.java    | 24 ++
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
index df2d1fa12d1..04935cfd932 100644
--- a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
+++ b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
@@ -328,6 +328,10 @@ public class RemoteBlockPushResolver implements MergedShuffleFileManager {
       int size = (int) indexFile.length();
       // First entry is the zero offset
       int numChunks = (size / Long.BYTES) - 1;
+      if (numChunks <= 0) {
+        throw new RuntimeException(String.format(
+          "Merged shuffle index file %s is empty", indexFile.getPath()));
+      }
       File metaFile = appShuffleInfo.getMergedShuffleMetaFile(shuffleId, shuffleMergeId, reduceId);
       if (!metaFile.exists()) {
         throw new RuntimeException(String.format("Merged shuffle meta file %s not found",
diff --git a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java
index 2526a94f429..0847121b0cc 100644
--- a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java
+++ b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java
@@ -281,7 +281,7 @@ public class RemoteBlockPushResolverSuite {
     verifyMetrics(4, 0, 0, 0, 0, 0, 4);
   }

-  @Test
+  @Test(expected = RuntimeException.class)
   public void testFailureAfterData() throws IOException {
     StreamCallbackWithID stream = pushResolver.receiveBlockDataAsStream(
@@ -289,12 +289,16 @@ public class RemoteBlockPushResolverSuite {
     stream.onData(stream.getID(), ByteBuffer.wrap(new byte[4]));
     stream.onFailure(stream.getID(), new RuntimeException("Forced Failure"));
     pushResolver.
[spark] branch master updated: [SPARK-44215][SHUFFLE] If num chunks are 0, then server should throw a RuntimeException
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3e72806bb42 [SPARK-44215][SHUFFLE] If num chunks are 0, then server should throw a RuntimeException
3e72806bb42 is described below

commit 3e72806bb421b103bf6e73518b80200ccdd58ce5
Author: Chandni Singh
AuthorDate: Tue Jul 4 18:46:18 2023 -0500

    [SPARK-44215][SHUFFLE] If num chunks are 0, then server should throw a RuntimeException

    ### What changes were proposed in this pull request?

    The executor expects `numChunks` to be > 0. If it is zero, the executor fails with:
    ```
    23/06/20 19:07:37 ERROR task 2031.0 in stage 47.0 (TID 25018) Executor: Exception in task 2031.0 in stage 47.0 (TID 25018)
    java.lang.ArithmeticException: / by zero
        at org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:128)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:1047)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:90)
        at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
        at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
    ```
    Because this is an `ArithmeticException`, the executor doesn't fall back. It's not a `FetchFailure` either, so the stage is not retried and the application fails.

    ### Why are the changes needed?

    The executor should fall back to fetching the original blocks rather than fail, because this condition indicates a problem with the push-merged block.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Modified the existing UTs to validate that a RuntimeException is thrown when `numChunks` is 0.

    Closes #41762 from otterc/SPARK-44215.

    Authored-by: Chandni Singh
    Signed-off-by: Mridul Muralidharan
---
 .../network/shuffle/RemoteBlockPushResolver.java |  4
 .../shuffle/RemoteBlockPushResolverSuite.java    | 24 ++
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
index 7f0862fcef4..b95a8700109 100644
--- a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
+++ b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
@@ -328,6 +328,10 @@ public class RemoteBlockPushResolver implements MergedShuffleFileManager {
       int size = (int) indexFile.length();
       // First entry is the zero offset
      int numChunks = (size / Long.BYTES) - 1;
+      if (numChunks <= 0) {
+        throw new RuntimeException(String.format(
+          "Merged shuffle index file %s is empty", indexFile.getPath()));
+      }
       File metaFile = appShuffleInfo.getMergedShuffleMetaFile(shuffleId, shuffleMergeId, reduceId);
       if (!metaFile.exists()) {
         throw new RuntimeException(String.format("Merged shuffle meta file %s not found",
diff --git a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java
index 2526a94f429..0847121b0cc 100644
--- a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java
+++ b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java
@@ -281,7 +281,7 @@ public class RemoteBlockPushResolverSuite {
     verifyMetrics(4, 0, 0, 0, 0, 0, 4);
   }

-  @Test
+  @Test(expected = RuntimeException.class)
   public void testFailureAfterData() throws IOException {
     StreamCallbackWithID stream = pushResolver.receiveBlockDataAsStream(
@@ -289,12 +289,16 @@ public class RemoteBlockPushResolverSuite {
     stream.onData(stream.getID(), ByteBuffer.wrap(new byte[4]));
     stream.onFailure(stream.getID(), new RuntimeException("Forced Failure"));
     pushResolver.finalizeShuffleMerge(new FinalizeShuffleMerge(TEST_APP, NO_ATTEMPT_ID, 0, 0));
-    MergedBlockMeta blockMeta = pushReso
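To restate the guard's arithmetic, here is a hedged Scala sketch of the server-side check the patch adds; the function name is illustrative, and the real change lives in `RemoteBlockPushResolver.java` as shown in the diff above.

```scala
// The index file stores (numChunks + 1) long offsets, the first being zero.
// A file length that yields zero chunks means the merged shuffle data is
// unusable, so the server fails the meta request with a RuntimeException
// the executor can treat as "fall back to fetching the original blocks",
// instead of returning numChunks = 0 and triggering a divide-by-zero there.
def chunkCount(indexFileLength: Long, indexFilePath: String): Int = {
  val numChunks = (indexFileLength / java.lang.Long.BYTES).toInt - 1
  if (numChunks <= 0) {
    throw new RuntimeException(s"Merged shuffle index file $indexFilePath is empty")
  }
  numChunks
}

// chunkCount(8L, "shuffleMerged.index") throws: only the zero offset is present.
```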
[spark] branch master updated (3e72806bb42 -> e88f5ec54bf)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 3e72806bb42 [SPARK-44215][SHUFFLE] If num chunks are 0, then server should throw a RuntimeException
     add e88f5ec54bf [SPARK-44245][PYTHON] pyspark.sql.dataframe doctests behave differently

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/dataframe.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
[spark] branch master updated: Revert "[SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0"
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 82e091ef270 Revert "[SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0"
82e091ef270 is described below

commit 82e091ef2707a349b322f011d8cab9bcae6d9638
Author: Hyukjin Kwon
AuthorDate: Wed Jul 5 09:03:28 2023 +0900

    Revert "[SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0"

    This reverts commit 442fdb8be42789d9a3fac8361f339f4e2a304fb8.
---
 python/pyspark/pandas/tests/test_series_string.py | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/pandas/tests/test_series_string.py b/python/pyspark/pandas/tests/test_series_string.py
index 956567bc5a4..3c2bd58da1a 100644
--- a/python/pyspark/pandas/tests/test_series_string.py
+++ b/python/pyspark/pandas/tests/test_series_string.py
@@ -246,6 +246,10 @@ class SeriesStringTestsMixin:
         with self.assertRaises(TypeError):
             self.check_func(lambda x: x.str.repeat(repeats=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))

+    @unittest.skipIf(
+        LooseVersion(pd.__version__) >= LooseVersion("2.0.0"),
+        "TODO(SPARK-43476): Enable SeriesStringTests.test_string_replace for pandas 2.0.0.",
+    )
     def test_string_replace(self):
         self.check_func(lambda x: x.str.replace("a.", "xx", regex=True))
         self.check_func(lambda x: x.str.replace("a.", "xx", regex=False))
@@ -255,11 +259,10 @@ class SeriesStringTestsMixin:
         def repl(m):
             return m.group(0)[::-1]

-        regex_pat = re.compile(r"[a-z]+")
-        self.check_func(lambda x: x.str.replace(regex_pat, repl, regex=True))
+        self.check_func(lambda x: x.str.replace(r"[a-z]+", repl))

         # compiled regex with flags
         regex_pat = re.compile(r"WHITESPACE", flags=re.IGNORECASE)
-        self.check_func(lambda x: x.str.replace(regex_pat, "---", regex=True))
+        self.check_func(lambda x: x.str.replace(regex_pat, "---"))

     def test_string_rfind(self):
         self.check_func(lambda x: x.str.rfind("a"))
[spark] branch master updated (82e091ef270 -> 756b4ea7d6a)
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

    from 82e091ef270 Revert "[SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0"
     add 756b4ea7d6a [SPARK-44298][BUILD] Disable PySpark test on the daily test of Java 21 before the new arrow version release

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_java21.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch master updated: [SPARK-44300][CONNECT][BUG-FIX] Fix artifact cleanup to limit deletion scope to session specific artifacts
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 531ec8bddc8 [SPARK-44300][CONNECT][BUG-FIX] Fix artifact cleanup to limit deletion scope to session specific artifacts
531ec8bddc8 is described below

commit 531ec8bddc8dd22ca39486dbdd31e62e989ddc15
Author: vicennial
AuthorDate: Wed Jul 5 12:06:33 2023 +0900

    [SPARK-44300][CONNECT][BUG-FIX] Fix artifact cleanup to limit deletion scope to session specific artifacts

    ### What changes were proposed in this pull request?

    Modify the directory deletion in `SparkConnectArtifactManager#cleanUpResources` to target the session-specific artifact directory instead of the root artifact directory.

    ### Why are the changes needed?

    Currently, when `SparkConnectArtifactManager#cleanUpResources` is called, it deletes **all** artifacts instead of only the session-specific ones. When triggered, this bug breaks resource isolation among sessions.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    New unit test in `ArtifactManagerSuite` that verifies that the correct directory is deleted as well as the continued existence of the root directory.

    Closes #41854 from vicennial/SPARK-44300.

    Authored-by: vicennial
    Signed-off-by: Hyukjin Kwon
---
 .../connect/artifact/SparkConnectArtifactManager.scala |  2 +-
 .../sql/connect/artifact/ArtifactManagerSuite.scala    | 18 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala
index 9fd8e367e4a..449ba011c21 100644
--- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala
+++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala
@@ -188,7 +188,7 @@ class SparkConnectArtifactManager(sessionHolder: SessionHolder) extends Logging
     blockManager.removeCache(sessionHolder.userId, sessionHolder.sessionId)

     // Clean up artifacts folder
-    FileUtils.deleteDirectory(artifactRootPath.toFile)
+    FileUtils.deleteDirectory(artifactPath.toFile)
   }

   private[connect] def uploadArtifactToFs(
diff --git a/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/artifact/ArtifactManagerSuite.scala b/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/artifact/ArtifactManagerSuite.scala
index 612bf096b22..345e458cd2f 100644
--- a/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/artifact/ArtifactManagerSuite.scala
+++ b/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/artifact/ArtifactManagerSuite.scala
@@ -274,6 +274,24 @@ class ArtifactManagerSuite extends SharedSparkSession with ResourceHelper {
       assert(result.forall(_.getString(0).contains("Ahri")))
     }
   }
+
+  test("SPARK-44300: Cleaning up resources only deletes session-specific resources") {
+    val copyDir = Utils.createTempDir().toPath
+    FileUtils.copyDirectory(artifactPath.toFile, copyDir.toFile)
+    val stagingPath = copyDir.resolve("Hello.class")
+    val remotePath = Paths.get("classes/Hello.class")
+
+    val sessionHolder = SparkConnectService.getOrCreateIsolatedSession("c1", "session")
+    sessionHolder.addArtifact(remotePath, stagingPath, None)
+
+    val sessionDirectory =
+      SparkConnectArtifactManager.getArtifactDirectoryAndUriForSession(sessionHolder)._1.toFile
+    assert(sessionDirectory.exists())
+
+    sessionHolder.artifactManager.cleanUpResources()
+    assert(!sessionDirectory.exists())
+    assert(SparkConnectArtifactManager.artifactRootPath.toFile.exists())
+  }
 }

class ArtifactUriSuite extends SparkFunSuite with LocalSparkContext {
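The shape of the fix, as a hedged sketch: cleanup must resolve the per-session subdirectory under the shared artifact root rather than deleting the root itself. The directory layout and helper name below are illustrative, not the actual `SparkConnectArtifactManager` code.

```scala
import java.nio.file.Path
import org.apache.commons.io.FileUtils

// Illustrative layout: <artifactRoot>/<sessionId>/{jars, classes, ...}.
// Deleting artifactRoot would wipe every session's artifacts; deleting only
// the resolved session directory preserves isolation between sessions.
def cleanUpSessionArtifacts(artifactRoot: Path, sessionId: String): Unit = {
  val sessionDir = artifactRoot.resolve(sessionId)
  FileUtils.deleteDirectory(sessionDir.toFile) // session-scoped cleanup only
}
```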