This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 44d2c86e71fc [SPARK-45593][BUILD] Building a runnable distribution from master code running spark-sql raise error

44d2c86e71fc is described below

commit 44d2c86e71fca7044e6d5d9e9222eecff17c360c
Author: yikaifei <yikai...@apache.org>
AuthorDate: Thu Jan 18 11:32:01 2024 +0800

    [SPARK-45593][BUILD] Building a runnable distribution from master code running spark-sql raise error

### What changes were proposed in this pull request?

Fix a build issue: when building a runnable distribution from master code, running spark-sql raises an error:

```
Caused by: java.lang.ClassNotFoundException: org.sparkproject.guava.util.concurrent.internal.InternalFutureFailureAccess
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
	... 58 more
```

The problem is a Guava dependency in the spark-connect-common POM that **conflicts** with the shade plugin configuration of the parent POM:

- spark-connect-common bundles Guava at version `connect.guava.version`, and it is relocated to `${spark.shade.packageName}.guava`, not `${spark.shade.packageName}.connect.guava`;
- spark-network-common also contains Guava-related classes, likewise relocated to `${spark.shade.packageName}.guava`, but at Guava version `${guava.version}`;
- As a result, the classpath ends up with two different versions of the same `org.sparkproject.guava.*` classes.

In addition, after investigation, the spark-connect-common module does not actually use Guava, so we can remove the Guava dependency from spark-connect-common.

### Why are the changes needed?

A runnable distribution built from master code is currently not runnable.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?
I manually built a runnable distribution package with the command below and used it for the tests.

Build command:
```
./dev/make-distribution.sh --name ui --pip --tgz -Phive -Phive-thriftserver -Pyarn -Pconnect
```

Test result:
<img width="1276" alt="image" src="https://github.com/apache/spark/assets/51110188/aefbc433-ea5c-4287-8ebd-367806043ac8">

I also checked for `org.sparkproject.guava.cache.LocalCache` in the jars dir.

Before:
```
➜  jars grep -lr 'org.sparkproject.guava.cache.LocalCache' ./
.//spark-connect_2.13-4.0.0-SNAPSHOT.jar
.//spark-network-common_2.13-4.0.0-SNAPSHOT.jar
.//spark-connect-common_2.13-4.0.0-SNAPSHOT.jar
```

Now:
```
➜  jars grep -lr 'org.sparkproject.guava.cache.LocalCache' ./
.//spark-network-common_2.13-4.0.0-SNAPSHOT.jar
```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43436 from Yikf/SPARK-45593.

Authored-by: yikaifei <yikai...@apache.org>
Signed-off-by: yangjie01 <yangji...@baidu.com>
---
 assembly/pom.xml                     |  6 ++++++
 connector/connect/client/jvm/pom.xml |  8 +-------
 connector/connect/common/pom.xml     | 34 ++++++++++++++++++++++++++++++++++
 connector/connect/server/pom.xml     | 25 -------------------------
 4 files changed, 41 insertions(+), 32 deletions(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 77ff87c17f52..cd8c3fca9d23 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -149,6 +149,12 @@
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-connect_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.spark</groupId>
+          <artifactId>spark-connect-common_${scala.binary.version}</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
diff --git a/connector/connect/client/jvm/pom.xml b/connector/connect/client/jvm/pom.xml
index 8057a33df178..9bedebf523a7 100644
--- a/connector/connect/client/jvm/pom.xml
+++ b/connector/connect/client/jvm/pom.xml
@@ -51,15 +51,9 @@
       <version>${project.version}</version>
     </dependency>
     <!--
-      We need to define guava and protobuf here because we need to change the scope of both from
+      We need to define protobuf here because we need to change the scope of both from
       provided to compile. If we don't do this we can't shade these libraries.
     -->
-    <dependency>
-      <groupId>com.google.guava</groupId>
-      <artifactId>guava</artifactId>
-      <version>${connect.guava.version}</version>
-      <scope>compile</scope>
-    </dependency>
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
diff --git a/connector/connect/common/pom.xml b/connector/connect/common/pom.xml
index a374646f8f29..336d83e04c15 100644
--- a/connector/connect/common/pom.xml
+++ b/connector/connect/common/pom.xml
@@ -47,6 +47,11 @@
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
     </dependency>
+    <!--
+      SPARK-45593: spark connect relies on a specific version of Guava, We perform shading
+      of the Guava library within the connect-common module to ensure both connect-server and
+      connect-client modules maintain consistent and accurate Guava dependencies.
+    -->
     <dependency>
       <groupId>com.google.guava</groupId>
       <artifactId>guava</artifactId>
@@ -145,6 +150,35 @@
         </execution>
       </executions>
     </plugin>
+    <plugin>
+      <groupId>org.apache.maven.plugins</groupId>
+      <artifactId>maven-shade-plugin</artifactId>
+      <configuration>
+        <shadedArtifactAttached>false</shadedArtifactAttached>
+        <artifactSet>
+          <includes>
+            <include>org.spark-project.spark:unused</include>
+            <include>com.google.guava:guava</include>
+            <include>com.google.guava:failureaccess</include>
+            <include>org.apache.tomcat:annotations-api</include>
+          </includes>
+        </artifactSet>
+        <relocations>
+          <relocation>
+            <pattern>com.google.common</pattern>
+            <shadedPattern>${spark.shade.packageName}.connect.guava</shadedPattern>
+          </relocation>
+        </relocations>
+      </configuration>
+      <executions>
+        <execution>
+          <phase>package</phase>
+          <goals>
+            <goal>shade</goal>
+          </goals>
+        </execution>
+      </executions>
+    </plugin>
   </plugins>
 </build>
 <profiles>
diff --git a/connector/connect/server/pom.xml b/connector/connect/server/pom.xml
index e9c7bd86e0f7..82127f736ccb 100644
--- a/connector/connect/server/pom.xml
+++ b/connector/connect/server/pom.xml
@@ -51,12 +51,6 @@
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-connect-common_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
-      <exclusions>
-        <exclusion>
-          <groupId>com.google.guava</groupId>
-          <artifactId>guava</artifactId>
-        </exclusion>
-      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
@@ -156,17 +150,6 @@
       <groupId>org.scala-lang.modules</groupId>
       <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
     </dependency>
-    <dependency>
-      <groupId>com.google.guava</groupId>
-      <artifactId>guava</artifactId>
-      <version>${connect.guava.version}</version>
-      <scope>compile</scope>
-    </dependency>
-    <dependency>
-      <groupId>com.google.guava</groupId>
-      <artifactId>failureaccess</artifactId>
-      <version>${guava.failureaccess.version}</version>
-    </dependency>
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
@@ -287,7 +270,6 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
-              <include>com.google.guava:*</include>
               <include>io.grpc:*:</include>
               <include>com.google.protobuf:*</include>
@@ -307,13 +289,6 @@
             </includes>
           </artifactSet>
           <relocations>
-            <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.guava</shadedPattern>
-              <includes>
-                <include>com.google.common.**</include>
-              </includes>
-            </relocation>
             <relocation>
               <pattern>com.google.thirdparty</pattern>
               <shadedPattern>${spark.shade.packageName}.connect.guava</shadedPattern>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
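[Editor's note] The stack trace in the commit message above can be reproduced in miniature. Guava's `AbstractFuture` extends `InternalFutureFailureAccess`, which ships in the separate `com.google.guava:failureaccess` artifact; when two jars relocate Guava under mismatched prefixes, a relocated subclass can end up without its relocated superclass on the classpath. This is a standalone sketch with hypothetical stand-in class names (`Future`, `InternalAccess`), not Spark or Guava code:

```java
import java.io.InputStream;

// Sketch only (hypothetical names, not Spark or Guava code). A classloader
// defines the subclass but acts as if the artifact holding the superclass is
// missing, mirroring the mismatched-relocation scenario from SPARK-45593.
public class RelocationDemo {
    // Stand-in for the superclass shipped in a separate artifact
    // (think InternalFutureFailureAccess from failureaccess).
    static class InternalAccess {}

    // Stand-in for the subclass (think AbstractFuture).
    static class Future extends InternalAccess {}

    // A loader that can define Future but pretends the superclass jar is absent.
    static class BrokenLoader extends ClassLoader {
        BrokenLoader() {
            super(null); // no parent: only bootstrap classes are visible
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (name.endsWith("InternalAccess")) {
                // Simulate the missing (or differently relocated) superclass.
                throw new ClassNotFoundException(name);
            }
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = RelocationDemo.class.getClassLoader()
                    .getResourceAsStream(resource)) {
                byte[] bytes = in.readAllBytes();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (Exception e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    public static void main(String[] args) {
        try {
            // Defining Future forces resolution of its superclass in the same loader,
            // which fails here just as loading the relocated AbstractFuture fails
            // when InternalFutureFailureAccess is not found.
            Class.forName(RelocationDemo.class.getName() + "$Future",
                    true, new BrokenLoader());
        } catch (Throwable t) {
            System.out.println(t); // a NoClassDefFoundError naming InternalAccess
        }
    }
}
```

The fix in the diff sidesteps this by shading Guava once, in connect-common, under a single consistent prefix.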