This is an automated email from the ASF dual-hosted git repository.

hvanhovell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new e53abbbceaa [SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client

e53abbbceaa is described below

commit e53abbbceaa2c41babaa23fe4c2f282f559b4c03
Author: Herman van Hovell <her...@databricks.com>
AuthorDate: Mon Oct 2 13:03:06 2023 -0400

[SPARK-45371][CONNECT] Fix shading issues in the Spark Connect Scala Client

### What changes were proposed in this pull request?
This PR fixes shading for the Spark Connect Scala Client Maven build. The following issues are addressed:
- Guava and protobuf are now included in the shaded jar. Both were missing, causing users to see `ClassNotFoundException`s.
- Fixed duplicate shading of Guava. We now reuse the relocation defined in the parent pom.
- Fixed the duplicate Netty dependency (shaded and transitive). One copy was used by gRPC and the other was needed by Arrow. This was fixed by pulling Arrow into the shaded jar.
- Use the same shaded package prefix as the one defined in the parent pom.

### Why are the changes needed?
The Maven artifacts for the Spark Connect Scala Client are currently broken.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual tests.

#### Step 1: Build the new shaded library and install it in the local Maven repository
`build/mvn clean install -pl connector/connect/client/jvm -am -DskipTests`

#### Step 2: Start the Connect server
`connector/connect/bin/spark-connect`

#### Step 3: Launch a REPL using the newly created library
This step requires [coursier](https://get-coursier.io/) to be installed.
`cs launch --jvm zulu:17.0.8 --scala 2.13.9 -r m2Local com.lihaoyi:::ammonite:2.5.11 org.apache.spark::spark-connect-client-jvm:4.0.0-SNAPSHOT --java-opt --add-opens=java.base/java.nio=ALL-UNNAMED -M org.apache.spark.sql.application.ConnectRepl`

#### Step 4: Run a bunch of commands:
```scala
// Check version
spark.version

// Run a simple query
{
  spark.range(1, 10000, 1)
    .select($"id", $"id" % 5 as "group", rand(1).as("v1"), rand(2).as("v2"))
    .groupBy($"group")
    .agg(
      avg($"v1").as("v1_avg"),
      avg($"v2").as("v2_avg"))
    .show()
}

// Run a streaming query
{
  import org.apache.spark.sql.execution.streaming.ProcessingTimeTrigger
  val query_name = "simple_streaming"
  val stream = spark.readStream
    .format("rate")
    .option("numPartitions", "1")
    .option("rowsPerSecond", "10")
    .load()
    .withWatermark("timestamp", "10 milliseconds")
    .groupBy(window(col("timestamp"), "10 milliseconds"))
    .count()
    .selectExpr("window.start as timestamp", "count as num_events")
    .writeStream
    .format("memory")
    .queryName(query_name)
    .trigger(ProcessingTimeTrigger.create("10 milliseconds"))
  // run for 20 seconds
  val query = stream.start()
  val start = System.currentTimeMillis()
  val end = System.currentTimeMillis() + 20 * 1000
  while (System.currentTimeMillis() < end) {
    println(s"time: ${System.currentTimeMillis() - start} ms")
    println(query.status)
    spark.sql(s"select * from ${query_name}").show()
    Thread.sleep(500)
  }
  query.stop()
}
```
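As an additional spot check (not part of the original test steps), the same REPL can be used to confirm that the relocated dependencies resolve under the parent pom's shade prefix. This is a minimal sketch: the class names below assume the default `spark.shade.packageName` of `org.sparkproject`; adjust them if your build overrides that property.

```scala
// Hypothetical spot check: these relocated classes should be present in the
// shaded client jar, assuming spark.shade.packageName = org.sparkproject.
Seq(
  "org.sparkproject.io.grpc.ManagedChannel",      // io.grpc relocation
  "org.sparkproject.guava.cache.CacheBuilder",    // Guava relocation from the parent pom
  "org.sparkproject.com.google.protobuf.Message"  // protobuf relocation
).foreach { name =>
  val present = scala.util.Try(Class.forName(name)).isSuccess
  println(s"$name -> ${if (present) "present" else "MISSING"}")
}
```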
Closes #43195 from hvanhovell/SPARK-45371.

Authored-by: Herman van Hovell <her...@databricks.com>
Signed-off-by: Herman van Hovell <her...@databricks.com>
---
 connector/connect/client/jvm/pom.xml | 39 +++++++++++++++++++++++++++---------
 1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/connector/connect/client/jvm/pom.xml b/connector/connect/client/jvm/pom.xml
index 9ca66b5c29c..a9040107f38 100644
--- a/connector/connect/client/jvm/pom.xml
+++ b/connector/connect/client/jvm/pom.xml
@@ -50,10 +50,20 @@
       <artifactId>spark-sketch_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
     </dependency>
+    <!--
+      We need to define guava and protobuf here because we need to change the scope of both from
+      provided to compile. If we don't do this we can't shade these libraries.
+    -->
     <dependency>
       <groupId>com.google.guava</groupId>
       <artifactId>guava</artifactId>
       <version>${connect.guava.version}</version>
+      <scope>compile</scope>
+    </dependency>
+    <dependency>
+      <groupId>com.google.protobuf</groupId>
+      <artifactId>protobuf-java</artifactId>
+      <scope>compile</scope>
     </dependency>
     <dependency>
       <groupId>com.lihaoyi</groupId>
@@ -85,6 +95,7 @@
         <artifactId>maven-shade-plugin</artifactId>
         <configuration>
           <shadedArtifactAttached>false</shadedArtifactAttached>
+          <promoteTransitiveDependencies>true</promoteTransitiveDependencies>
           <artifactSet>
             <includes>
               <include>com.google.android:*</include>
@@ -92,52 +103,62 @@
               <include>com.google.code.findbugs:*</include>
               <include>com.google.code.gson:*</include>
               <include>com.google.errorprone:*</include>
-              <include>com.google.guava:*</include>
               <include>com.google.j2objc:*</include>
               <include>com.google.protobuf:*</include>
+              <include>com.google.flatbuffers:*</include>
               <include>io.grpc:*</include>
               <include>io.netty:*</include>
               <include>io.perfmark:*</include>
+              <include>org.apache.arrow:*</include>
               <include>org.codehaus.mojo:*</include>
               <include>org.checkerframework:*</include>
               <include>org.apache.spark:spark-connect-common_${scala.binary.version}</include>
+              <include>org.apache.spark:spark-sql-api_${scala.binary.version}</include>
             </includes>
           </artifactSet>
           <relocations>
             <relocation>
               <pattern>io.grpc</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.io.grpc</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.io.grpc</shadedPattern>
               <includes>
                 <include>io.grpc.**</include>
               </includes>
             </relocation>
             <relocation>
               <pattern>com.google</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.com.google</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.com.google</shadedPattern>
+              <excludes>
+                <!-- Guava is relocated to ${spark.shade.packageName}.guava (see the parent pom.xml) -->
+                <exclude>com.google.common.**</exclude>
+              </excludes>
             </relocation>
             <relocation>
               <pattern>io.netty</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.io.netty</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.io.netty</shadedPattern>
             </relocation>
             <relocation>
              <pattern>org.checkerframework</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.org.checkerframework</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.org.checkerframework</shadedPattern>
            </relocation>
            <relocation>
              <pattern>javax.annotation</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.javax.annotation</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.javax.annotation</shadedPattern>
            </relocation>
            <relocation>
              <pattern>io.perfmark</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.io.perfmark</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.io.perfmark</shadedPattern>
            </relocation>
            <relocation>
              <pattern>org.codehaus</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.org.codehaus</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.org.codehaus</shadedPattern>
+            </relocation>
+            <relocation>
+              <pattern>org.apache.arrow</pattern>
+              <shadedPattern>${spark.shade.packageName}.org.apache.arrow</shadedPattern>
            </relocation>
            <relocation>
              <pattern>android.annotation</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.client.android.annotation</shadedPattern>
+              <shadedPattern>${spark.shade.packageName}.android.annotation</shadedPattern>
            </relocation>
          </relocations>
          <!--SPARK-42228: Add `ServicesResourceTransformer` to relocation class names in META-INF/services for grpc-->
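For readers unfamiliar with the maven-shade-plugin, the relocations above amount to a build-time prefix rewrite of class names. The sketch below is illustrative only (not code from this patch) and assumes the parent pom's default `spark.shade.packageName` of `org.sparkproject`.

```scala
// Illustrative sketch of the relocation scheme in the diff above.
// Longer prefixes are listed first so com.google.common wins over com.google,
// mirroring the <exclude>com.google.common.**</exclude> in the diff.
val relocations = Seq(
  "com.google.common" -> "org.sparkproject.guava", // relocation from the parent pom
  "io.grpc"           -> "org.sparkproject.io.grpc",
  "com.google"        -> "org.sparkproject.com.google",
  "io.netty"          -> "org.sparkproject.io.netty",
  "org.apache.arrow"  -> "org.sparkproject.org.apache.arrow"
)

// Map an original class name to its shaded name, if any relocation applies.
def shaded(className: String): String =
  relocations.collectFirst {
    case (from, to) if className.startsWith(from + ".") =>
      to + className.stripPrefix(from)
  }.getOrElse(className)

assert(shaded("io.grpc.ManagedChannel") ==
  "org.sparkproject.io.grpc.ManagedChannel")
assert(shaded("com.google.common.cache.CacheBuilder") ==
  "org.sparkproject.guava.cache.CacheBuilder")
```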
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org