[spark] branch master updated: [SPARK-44295][BUILD] Upgrade `scala-parser-combinators` to 2.3.0

2023-07-04 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0e8a95ea99f [SPARK-44295][BUILD] Upgrade `scala-parser-combinators` to 2.3.0
0e8a95ea99f is described below

commit 0e8a95ea99f3be0c31e8c76d5086d46062ef3bef
Author: yangjie01 
AuthorDate: Tue Jul 4 20:06:24 2023 +0900

[SPARK-44295][BUILD] Upgrade `scala-parser-combinators` to 2.3.0

### What changes were proposed in this pull request?
This PR aims to upgrade `scala-parser-combinators` from 2.2.0 to 2.3.0.

### Why are the changes needed?
The new version [dropped support for Scala 2.11](https://github.com/scala/scala-parser-combinators/pull/504) and brings a bug fix:
- https://github.com/scala/scala-parser-combinators/pull/507

The full release notes are as follows:
- https://github.com/scala/scala-parser-combinators/releases/tag/v2.3.0

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #41848 from LuciferYang/scala-parser-combinators-23.

Authored-by: yangjie01 
Signed-off-by: Hyukjin Kwon 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 3ac03fa6472..910b1703c70 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -227,7 +227,7 @@ rocksdbjni/8.3.2//rocksdbjni-8.3.2.jar
 scala-collection-compat_2.12/2.7.0//scala-collection-compat_2.12-2.7.0.jar
 scala-compiler/2.12.18//scala-compiler-2.12.18.jar
 scala-library/2.12.18//scala-library-2.12.18.jar
-scala-parser-combinators_2.12/2.2.0//scala-parser-combinators_2.12-2.2.0.jar
+scala-parser-combinators_2.12/2.3.0//scala-parser-combinators_2.12-2.3.0.jar
 scala-reflect/2.12.18//scala-reflect-2.12.18.jar
 scala-xml_2.12/2.1.0//scala-xml_2.12-2.1.0.jar
 shims/0.9.45//shims-0.9.45.jar
diff --git a/pom.xml b/pom.xml
index deccc904dd9..588f91155d6 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1102,7 +1102,7 @@
   
 org.scala-lang.modules
 
scala-parser-combinators_${scala.binary.version}
-2.2.0
+2.3.0
   
   
 jline





[spark] branch master updated: [SPARK-44297][CORE][TESTS] Make `ClassLoaderIsolationSuite` test pass with Scala 2.13

2023-07-04 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 16295a33822 [SPARK-44297][CORE][TESTS] Make `ClassLoaderIsolationSuite` test pass with Scala 2.13
16295a33822 is described below

commit 16295a338220cc66fea5d91bcbd4213df0f2d4bd
Author: yangjie01 
AuthorDate: Tue Jul 4 22:37:04 2023 +0800

[SPARK-44297][CORE][TESTS] Make `ClassLoaderIsolationSuite` test pass with Scala 2.13

### What changes were proposed in this pull request?
The main changes of this PR are as follows:

1. Rename `TestHelloV2.jar` and `TestHelloV3.jar`, added in https://github.com/apache/spark/pull/41789, to `TestHelloV2_2.12.jar` and `TestHelloV3_2.12.jar`
2. Add the corresponding `TestHelloV2_2.13.jar` and `TestHelloV3_2.13.jar`, compiled with Scala 2.13
3. Make `ClassLoaderIsolationSuite` use the correct jar for the running Scala version (a sketch of the idea follows below)
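
A minimal sketch of the selection logic, assuming a helper along these lines (the name and resource layout are illustrative, not the exact suite code):

```scala
import scala.util.Properties

// Derive the Scala binary version ("2.12" or "2.13") of the running JVM and
// point at the test jar that was compiled with that same binary version.
def testJarFor(baseName: String): String = {
  val binaryVersion = Properties.versionNumberString.split('.').take(2).mkString(".")
  s"core/src/test/resources/${baseName}_$binaryVersion.jar"
}

// Under Scala 2.13 this resolves to ".../TestHelloV2_2.13.jar",
// under Scala 2.12 to ".../TestHelloV2_2.12.jar".
val helloV2Jar = testJarFor("TestHelloV2")
```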

### Why are the changes needed?
Make `ClassLoaderIsolationSuite` test pass with Scala 2.13.

The Scala 2.13 daily test failed after https://github.com/apache/spark/pull/41789:
- https://github.com/apache/spark/actions/runs/5447771717/jobs/9910185372

```
[info] - Executor classloader isolation with JobArtifactSet *** FAILED *** 
(83 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
0.0 (TID 0) (localhost executor driver): java.lang.NoClassDefFoundError: 
scala/Serializable
[info]  at java.lang.ClassLoader.defineClass1(Native Method)
[info]  at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
[info]  at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
[info]  at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
[info]  at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
[info]  at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
[info]  at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
[info]  at java.security.AccessController.doPrivileged(Native Method)
[info]  at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
[info]  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
[info]  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
[info]  at java.lang.Class.forName0(Native Method)
[info]  at java.lang.Class.forName(Class.java:348)
[info]  at 
org.apache.spark.util.SparkClassUtils.classForName(SparkClassUtils.scala:35)
[info]  at 
org.apache.spark.util.SparkClassUtils.classForName$(SparkClassUtils.scala:30)
[info]  at org.apache.spark.util.Utils$.classForName(Utils.scala:94)
[info]  at 
org.apache.spark.executor.ClassLoaderIsolationSuite.$anonfun$new$3(ClassLoaderIsolationSuite.scala:53)
[info]  at 
scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.scala:18)
[info]  at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
[info]  at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
[info]  at 
org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
[info]  at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1028)
[info]  at 
org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1028)
[info]  at 
org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2406)
[info]  at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
[info]  at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
[info]  at org.apache.spark.scheduler.Task.run(Task.scala:141)
[info]  at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:593)
[info]  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1478)
[info]  at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:596)
[info]  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]  at java.lang.Thread.run(Thread.java:750)
[info] Caused by: java.lang.ClassNotFoundException: scala.Serializable
[info]  at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
[info]  at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
[info]  at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
[info]  ... 33 more
```
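
For context, a minimal sketch of why a jar built with Scala 2.12 fails to load on a Scala 2.13 classpath (the class below is illustrative, not the actual contents of the test jars):

```scala
// Compiled with Scala 2.12, mixing in Serializable records a reference to the
// scala/Serializable trait in the emitted bytecode:
class TestHello extends Serializable {
  def greet(): String = "hello"
}
// Scala 2.13 turned scala.Serializable into a deprecated alias for
// java.io.Serializable, so scala-library 2.13 ships no scala/Serializable
// class file and loading the 2.12-compiled class fails with
// java.lang.NoClassDefFoundError: scala/Serializable.
```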

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manual check:

**Scala 2.12**
```
build/sbt "core/testOnly *ClassLoaderIsolationSuite"
```

```
[info] ClassLoaderIsolationSuite:
[info] - Executor classloader isolation with JobArtifactSet (1 seco

[spark] branch master updated: [SPARK-44296][BUILD] Upgrade dropwizard metrics 4.2.19

2023-07-04 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 68862589a0c [SPARK-44296][BUILD] Upgrade dropwizard metrics 4.2.19
68862589a0c is described below

commit 68862589a0cc0c450434d3a1675bb241a5917f1b
Author: yangjie01 
AuthorDate: Tue Jul 4 22:38:43 2023 +0800

[SPARK-44296][BUILD] Upgrade dropwizard metrics 4.2.19

### What changes were proposed in this pull request?
This PR aims to upgrade dropwizard metrics to 4.2.19.

### Why are the changes needed?
The new version brings a bug fix related to the metrics-jetty module:
- https://github.com/dropwizard/metrics/pull/3379

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #41849 from LuciferYang/SPARK-44296.

Authored-by: yangjie01 
Signed-off-by: yangjie01 
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 10 +-
 pom.xml   |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 
b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 910b1703c70..1cdf08f321e 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -177,11 +177,11 @@ log4j-slf4j2-impl/2.20.0//log4j-slf4j2-impl-2.20.0.jar
 logging-interceptor/3.12.12//logging-interceptor-3.12.12.jar
 lz4-java/1.8.0//lz4-java-1.8.0.jar
 mesos/1.4.3/shaded-protobuf/mesos-1.4.3-shaded-protobuf.jar
-metrics-core/4.2.18//metrics-core-4.2.18.jar
-metrics-graphite/4.2.18//metrics-graphite-4.2.18.jar
-metrics-jmx/4.2.18//metrics-jmx-4.2.18.jar
-metrics-json/4.2.18//metrics-json-4.2.18.jar
-metrics-jvm/4.2.18//metrics-jvm-4.2.18.jar
+metrics-core/4.2.19//metrics-core-4.2.19.jar
+metrics-graphite/4.2.19//metrics-graphite-4.2.19.jar
+metrics-jmx/4.2.19//metrics-jmx-4.2.19.jar
+metrics-json/4.2.19//metrics-json-4.2.19.jar
+metrics-jvm/4.2.19//metrics-jvm-4.2.19.jar
 minlog/1.3.0//minlog-1.3.0.jar
 netty-all/4.1.93.Final//netty-all-4.1.93.Final.jar
 netty-buffer/4.1.93.Final//netty-buffer-4.1.93.Final.jar
diff --git a/pom.xml b/pom.xml
index 588f91155d6..2e29d1de0c9 100644
--- a/pom.xml
+++ b/pom.xml
@@ -152,7 +152,7 @@
 If you changes codahale.metrics.version, you also need to change
 the link to metrics.dropwizard.io in docs/monitoring.md.
 -->
-4.2.18
+4.2.19
 
 1.11.1
 1.12.0





[spark] branch master updated (68862589a0c -> d53585c91b2)

2023-07-04 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 68862589a0c [SPARK-44296][BUILD] Upgrade dropwizard metrics 4.2.19
 add d53585c91b2 [SPARK-44292][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2315-2319]

No new revisions were added by this update.

Summary of changes:
 .../src/main/resources/error/error-classes.json| 57 --
 ...ror-conditions-datatype-mismatch-error-class.md |  4 ++
 ...itions-invalid-observed-metrics-error-class.md} | 24 -
 .../sql/catalyst/analysis/CheckAnalysis.scala  | 24 +
 .../sql/catalyst/analysis/AnalysisSuite.scala  | 40 ++-
 5 files changed, 92 insertions(+), 57 deletions(-)
 copy docs/{sql-error-conditions-invalid-schema-error-class.md => 
sql-error-conditions-invalid-observed-metrics-error-class.md} (61%)





[spark] branch master updated: [SPARK-44293][CONNECT] Fix invalid URI for custom JARs in Spark Connect

2023-07-04 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c08700f05f9 [SPARK-44293][CONNECT] Fix invalid URI for custom JARs in Spark Connect
c08700f05f9 is described below

commit c08700f05f96e083dd8dec12fb2ca90a49d16a52
Author: vicennial 
AuthorDate: Wed Jul 5 08:41:47 2023 +0900

[SPARK-44293][CONNECT] Fix invalid URI for custom JARs in Spark Connect

### What changes were proposed in this pull request?

This PR fixes a bug where invalid JAR URIs were generated because the URI was built as `artifactURI + "/" + target.toString` (here, `target` is the absolute local path of the file) instead of `artifactURI + "/" + remoteRelativePath.toString` (here, `remoteRelativePath` is of the form `jars/...`).
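
A minimal sketch of the difference, with illustrative placeholder values rather than the exact code:

```scala
import java.nio.file.Paths

val artifactURI        = "spark://host:43743/artifacts/<session-uuid>"
val target             = Paths.get("/home/user/spark/artifacts/<session-uuid>/jars/TestHelloV2.jar")
val remoteRelativePath = Paths.get("jars/TestHelloV2.jar")

// Before the fix: the absolute local path is appended, producing a stream name
// the server never registered (note the double slash and the full local path).
val badUri  = artifactURI + "/" + target.toString

// After the fix: only the session-relative path is appended, matching what the
// server actually serves under /artifacts/<session-uuid>/jars/...
val goodUri = artifactURI + "/" + remoteRelativePath.toString
```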

### Why are the changes needed?

Without this change, Spark Connect users attempting to use a custom JAR (for example in UDFs) hit task failures, because an exception is thrown during the JAR file fetch operation.
Example stacktrace:
```
23/07/03 17:00:15 INFO Executor: Fetching 
spark://ip-10-110-22-170.us-west-2.compute.internal:43743/artifacts/d9548b02-ff3b-4278-ab52-aef5d1fc724e//home/venkata.gudesa/spark/artifacts/spark-d6141194-c487-40fd-ba40-444d922808ea/d9548b02-ff3b-4278-ab52-aef5d1fc724e/jars/TestHelloV2.jar
 with timestamp 0
23/07/03 17:00:15 ERROR Executor: Exception in task 6.0 in stage 4.0 (TID 
55)
java.lang.RuntimeException: Stream 
'/artifacts/d9548b02-ff3b-4278-ab52-aef5d1fc724e//home/venkata.gudesa/spark/artifacts/spark-d6141194-c487-40fd-ba40-444d922808ea/d9548b02-ff3b-4278-ab52-aef5d1fc724e/jars/TestHelloV2.jar'
 was not found.
at 
org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:260)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142)
at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:5

[spark] branch branch-3.4 updated: [SPARK-44215][SHUFFLE] If num chunks are 0, then server should throw a RuntimeException

2023-07-04 Thread mridulm80
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 85d41757e85 [SPARK-44215][SHUFFLE] If num chunks are 0, then server should throw a RuntimeException
85d41757e85 is described below

commit 85d41757e855d97dee0f24f281f82249161c3d29
Author: Chandni Singh 
AuthorDate: Tue Jul 4 18:46:18 2023 -0500

[SPARK-44215][SHUFFLE] If num chunks are 0, then server should throw a RuntimeException

### What changes were proposed in this pull request?
The executor expects `numChunks` to be > 0. If it is zero, the executor fails with:
```
23/06/20 19:07:37 ERROR task 2031.0 in stage 47.0 (TID 25018) Executor: 
Exception in task 2031.0 in stage 47.0 (TID 25018)
java.lang.ArithmeticException: / by zero
at 
org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:128)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:1047)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:90)
at 
org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
```
Because this is an `ArithmeticException`, the executor doesn't fall back. It's not a `FetchFailure` either, so the stage is not retried and the application fails.
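
A minimal sketch of the failing arithmetic on the executor side (the names are illustrative; the real computation lives in `PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse`):

```scala
// When the merged-shuffle meta response reports zero chunks, any per-chunk
// average computed on the executor divides by zero and surfaces as a plain
// ArithmeticException, which is neither retried nor treated as a FetchFailure.
val numChunks: Int  = 0       // what an empty merged index yields
val blockSize: Long = 1024L
val approxChunkSize = blockSize / numChunks  // java.lang.ArithmeticException: / by zero
```

Failing fast on the server side instead (as in the diff below) turns this into a fetch-time error the executor can fall back from.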

### Why are the changes needed?
The executor should fall back to fetching the original blocks rather than fail, because this failure indicates a problem with the push-merged block.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Modified the existing UTs to validate that a `RuntimeException` is thrown when `numChunks` is 0.

Closes #41762 from otterc/SPARK-44215.

Authored-by: Chandni Singh 
Signed-off-by: Mridul Muralidharan 
(cherry picked from commit 3e72806bb421b103bf6e73518b80200ccdd58ce5)
Signed-off-by: Mridul Muralidharan 
---
 .../network/shuffle/RemoteBlockPushResolver.java   |  4 
 .../shuffle/RemoteBlockPushResolverSuite.java  | 24 ++
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git 
a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
 
b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
index df2d1fa12d1..04935cfd932 100644
--- 
a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
+++ 
b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
@@ -328,6 +328,10 @@ public class RemoteBlockPushResolver implements 
MergedShuffleFileManager {
 int size = (int) indexFile.length();
 // First entry is the zero offset
 int numChunks = (size / Long.BYTES) - 1;
+if (numChunks <= 0) {
+  throw new RuntimeException(String.format(
+  "Merged shuffle index file %s is empty", indexFile.getPath()));
+}
 File metaFile = appShuffleInfo.getMergedShuffleMetaFile(shuffleId, 
shuffleMergeId, reduceId);
 if (!metaFile.exists()) {
   throw new RuntimeException(String.format("Merged shuffle meta file %s 
not found",
diff --git 
a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java
 
b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java
index 2526a94f429..0847121b0cc 100644
--- 
a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java
+++ 
b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java
@@ -281,7 +281,7 @@ public class RemoteBlockPushResolverSuite {
 verifyMetrics(4, 0, 0, 0, 0, 0, 4);
   }
 
-  @Test
+  @Test(expected = RuntimeException.class)
   public void testFailureAfterData() throws IOException {
 StreamCallbackWithID stream =
   pushResolver.receiveBlockDataAsStream(
@@ -289,12 +289,16 @@ public class RemoteBlockPushResolverSuite {
 stream.onData(stream.getID(), ByteBuffer.wrap(new byte[4]));
 stream.onFailure(stream.getID(), new RuntimeException("Forced Failure"));
 pushResolver.

[spark] branch master updated: [SPARK-44215][SHUFFLE] If num chunks are 0, then server should throw a RuntimeException

2023-07-04 Thread mridulm80
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3e72806bb42 [SPARK-44215][SHUFFLE] If num chunks are 0, then server should throw a RuntimeException
3e72806bb42 is described below

commit 3e72806bb421b103bf6e73518b80200ccdd58ce5
Author: Chandni Singh 
AuthorDate: Tue Jul 4 18:46:18 2023 -0500

[SPARK-44215][SHUFFLE] If num chunks are 0, then server should throw a RuntimeException

### What changes were proposed in this pull request?
The executor expects `numChunks` to be > 0. If it is zero, the executor fails with:
```
23/06/20 19:07:37 ERROR task 2031.0 in stage 47.0 (TID 25018) Executor: 
Exception in task 2031.0 in stage 47.0 (TID 25018)
java.lang.ArithmeticException: / by zero
at 
org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:128)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:1047)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:90)
at 
org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
```
Because this is an `ArithmeticException`, the executor doesn't fall back. It's not a `FetchFailure` either, so the stage is not retried and the application fails.

### Why are the changes needed?
The executor should fall back to fetching the original blocks rather than fail, because this failure indicates a problem with the push-merged block.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Modified the existing UTs to validate that a `RuntimeException` is thrown when `numChunks` is 0.

Closes #41762 from otterc/SPARK-44215.

Authored-by: Chandni Singh 
Signed-off-by: Mridul Muralidharan 
---
 .../network/shuffle/RemoteBlockPushResolver.java   |  4 
 .../shuffle/RemoteBlockPushResolverSuite.java  | 24 ++
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git 
a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
 
b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
index 7f0862fcef4..b95a8700109 100644
--- 
a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
+++ 
b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
@@ -328,6 +328,10 @@ public class RemoteBlockPushResolver implements 
MergedShuffleFileManager {
 int size = (int) indexFile.length();
 // First entry is the zero offset
 int numChunks = (size / Long.BYTES) - 1;
+if (numChunks <= 0) {
+  throw new RuntimeException(String.format(
+  "Merged shuffle index file %s is empty", indexFile.getPath()));
+}
 File metaFile = appShuffleInfo.getMergedShuffleMetaFile(shuffleId, 
shuffleMergeId, reduceId);
 if (!metaFile.exists()) {
   throw new RuntimeException(String.format("Merged shuffle meta file %s 
not found",
diff --git 
a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java
 
b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java
index 2526a94f429..0847121b0cc 100644
--- 
a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java
+++ 
b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java
@@ -281,7 +281,7 @@ public class RemoteBlockPushResolverSuite {
 verifyMetrics(4, 0, 0, 0, 0, 0, 4);
   }
 
-  @Test
+  @Test(expected = RuntimeException.class)
   public void testFailureAfterData() throws IOException {
 StreamCallbackWithID stream =
   pushResolver.receiveBlockDataAsStream(
@@ -289,12 +289,16 @@ public class RemoteBlockPushResolverSuite {
 stream.onData(stream.getID(), ByteBuffer.wrap(new byte[4]));
 stream.onFailure(stream.getID(), new RuntimeException("Forced Failure"));
 pushResolver.finalizeShuffleMerge(new FinalizeShuffleMerge(TEST_APP, 
NO_ATTEMPT_ID, 0, 0));
-MergedBlockMeta blockMeta = pushReso

[spark] branch master updated (3e72806bb42 -> e88f5ec54bf)

2023-07-04 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 3e72806bb42 [SPARK-44215][SHUFFLE] If num chunks are 0, then server should throw a RuntimeException
 add e88f5ec54bf [SPARK-44245][PYTHON] pyspark.sql.dataframe doctests behave differently

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/dataframe.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)





[spark] branch master updated: Revert "[SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0"

2023-07-04 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 82e091ef270 Revert "[SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0"
82e091ef270 is described below

commit 82e091ef2707a349b322f011d8cab9bcae6d9638
Author: Hyukjin Kwon 
AuthorDate: Wed Jul 5 09:03:28 2023 +0900

Revert "[SPARK-43476][PYTHON][TESTS] Enable 
SeriesStringTests.test_string_replace for pandas 2.0.0"

This reverts commit 442fdb8be42789d9a3fac8361f339f4e2a304fb8.
---
 python/pyspark/pandas/tests/test_series_string.py | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/pandas/tests/test_series_string.py 
b/python/pyspark/pandas/tests/test_series_string.py
index 956567bc5a4..3c2bd58da1a 100644
--- a/python/pyspark/pandas/tests/test_series_string.py
+++ b/python/pyspark/pandas/tests/test_series_string.py
@@ -246,6 +246,10 @@ class SeriesStringTestsMixin:
 with self.assertRaises(TypeError):
 self.check_func(lambda x: x.str.repeat(repeats=[0, 1, 2, 3, 4, 5, 
6, 7, 8, 9]))
 
+@unittest.skipIf(
+LooseVersion(pd.__version__) >= LooseVersion("2.0.0"),
+"TODO(SPARK-43476): Enable SeriesStringTests.test_string_replace for 
pandas 2.0.0.",
+)
 def test_string_replace(self):
 self.check_func(lambda x: x.str.replace("a.", "xx", regex=True))
 self.check_func(lambda x: x.str.replace("a.", "xx", regex=False))
@@ -255,11 +259,10 @@ class SeriesStringTestsMixin:
 def repl(m):
 return m.group(0)[::-1]
 
-regex_pat = re.compile(r"[a-z]+")
-self.check_func(lambda x: x.str.replace(regex_pat, repl, regex=True))
+self.check_func(lambda x: x.str.replace(r"[a-z]+", repl))
 # compiled regex with flags
 regex_pat = re.compile(r"WHITESPACE", flags=re.IGNORECASE)
-self.check_func(lambda x: x.str.replace(regex_pat, "---", regex=True))
+self.check_func(lambda x: x.str.replace(regex_pat, "---"))
 
 def test_string_rfind(self):
 self.check_func(lambda x: x.str.rfind("a"))





[spark] branch master updated (82e091ef270 -> 756b4ea7d6a)

2023-07-04 Thread yangjie01
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 82e091ef270 Revert "[SPARK-43476][PYTHON][TESTS] Enable SeriesStringTests.test_string_replace for pandas 2.0.0"
 add 756b4ea7d6a [SPARK-44298][BUILD] Disable PySpark test on the daily test of Java 21 before the new arrow version release

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_java21.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





[spark] branch master updated: [SPARK-44300][CONNECT][BUG-FIX] Fix artifact cleanup to limit deletion scope to session specific artifacts

2023-07-04 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 531ec8bddc8 [SPARK-44300][CONNECT][BUG-FIX] Fix artifact cleanup to limit deletion scope to session specific artifacts
531ec8bddc8 is described below

commit 531ec8bddc8dd22ca39486dbdd31e62e989ddc15
Author: vicennial 
AuthorDate: Wed Jul 5 12:06:33 2023 +0900

[SPARK-44300][CONNECT][BUG-FIX] Fix artifact cleanup to limit deletion scope to session specific artifacts

### What changes were proposed in this pull request?

Modify the directory deletion in `SparkConnectArtifactManager#cleanUpResources` to target the session-specific artifact directory instead of the root artifact directory.

### Why are the changes needed?

Currently, when `SparkConnectArtifactManager#cleanUpResources` is called, it deletes **all** artifacts instead of only the session-specific ones. This breaks resource isolation among sessions when the bug is triggered.
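
A minimal sketch of the intended scoping (the directory layout and names here are assumptions for illustration, not the exact manager code):

```scala
import java.nio.file.Path
import org.apache.commons.io.FileUtils

// Assumed layout: one shared artifact root with a subtree per Connect session,
//   <artifactRoot>/<userId>/<sessionId>/jars/...
//   <artifactRoot>/<userId>/<sessionId>/classes/...
// Cleaning up a session must delete only that session's subtree, never the root.
def cleanUpSession(artifactRoot: Path, userId: String, sessionId: String): Unit = {
  val sessionDir = artifactRoot.resolve(userId).resolve(sessionId)
  FileUtils.deleteDirectory(sessionDir.toFile)  // session-scoped, not artifactRoot.toFile
}
```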

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

A new unit test in `ArtifactManagerSuite` verifies that the correct directory is deleted and that the root directory still exists.

Closes #41854 from vicennial/SPARK-44300.

Authored-by: vicennial 
Signed-off-by: Hyukjin Kwon 
---
 .../connect/artifact/SparkConnectArtifactManager.scala |  2 +-
 .../sql/connect/artifact/ArtifactManagerSuite.scala| 18 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala
index 9fd8e367e4a..449ba011c21 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala
@@ -188,7 +188,7 @@ class SparkConnectArtifactManager(sessionHolder: 
SessionHolder) extends Logging
 blockManager.removeCache(sessionHolder.userId, sessionHolder.sessionId)
 
 // Clean up artifacts folder
-FileUtils.deleteDirectory(artifactRootPath.toFile)
+FileUtils.deleteDirectory(artifactPath.toFile)
   }
 
   private[connect] def uploadArtifactToFs(
diff --git 
a/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/artifact/ArtifactManagerSuite.scala
 
b/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/artifact/ArtifactManagerSuite.scala
index 612bf096b22..345e458cd2f 100644
--- 
a/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/artifact/ArtifactManagerSuite.scala
+++ 
b/connector/connect/server/src/test/scala/org/apache/spark/sql/connect/artifact/ArtifactManagerSuite.scala
@@ -274,6 +274,24 @@ class ArtifactManagerSuite extends SharedSparkSession with 
ResourceHelper {
   assert(result.forall(_.getString(0).contains("Ahri")))
 }
   }
+
+  test("SPARK-44300: Cleaning up resources only deletes session-specific 
resources") {
+val copyDir = Utils.createTempDir().toPath
+FileUtils.copyDirectory(artifactPath.toFile, copyDir.toFile)
+val stagingPath = copyDir.resolve("Hello.class")
+val remotePath = Paths.get("classes/Hello.class")
+
+val sessionHolder = SparkConnectService.getOrCreateIsolatedSession("c1", 
"session")
+sessionHolder.addArtifact(remotePath, stagingPath, None)
+
+val sessionDirectory =
+  
SparkConnectArtifactManager.getArtifactDirectoryAndUriForSession(sessionHolder)._1.toFile
+assert(sessionDirectory.exists())
+
+sessionHolder.artifactManager.cleanUpResources()
+assert(!sessionDirectory.exists())
+assert(SparkConnectArtifactManager.artifactRootPath.toFile.exists())
+  }
 }
 
 class ArtifactUriSuite extends SparkFunSuite with LocalSparkContext {

