This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 44d2c86e71fc [SPARK-45593][BUILD] Building a runnable distribution from master code running spark-sql raise error
44d2c86e71fc is described below

commit 44d2c86e71fca7044e6d5d9e9222eecff17c360c
Author: yikaifei <yikai...@apache.org>
AuthorDate: Thu Jan 18 11:32:01 2024 +0800

    [SPARK-45593][BUILD] Building a runnable distribution from master code running spark-sql raise error
    
    ### What changes were proposed in this pull request?
    
    Fix a build issue: when a runnable distribution is built from master code, running spark-sql raises an error:
    ```
    Caused by: java.lang.ClassNotFoundException: org.sparkproject.guava.util.concurrent.internal.InternalFutureFailureAccess
            at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
            at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
            at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
            ... 58 more
    ```
    The problem is caused by a Guava dependency in the spark-connect-common POM that **conflicts** with the shade plugin configuration of the parent POM.
    
    - spark-connect-common contains the `connect.guava.version` version of Guava, and it is relocated to `${spark.shade.packageName}.guava`, not to `${spark.shade.packageName}.connect.guava`;
    - spark-network-common also contains Guava classes, which are likewise relocated to `${spark.shade.packageName}.guava`, but at version `${guava.version}`;
    - As a result, different versions of the org.sparkproject.guava.xx classes are present on the classpath.
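    
    The collision described above can be sketched roughly as follows. This is a simplified illustration, not a copy of the actual POMs: the comments naming the modules are assumptions based on the description, and `${spark.shade.packageName}` is assumed to resolve to `org.sparkproject`, as the stack trace suggests.
    
    ```xml
    <!-- Sketch (assumed): relocation inherited from the parent POM's shade
         configuration. Any module that bundles Guava without overriding it
         rewrites com.google.common.* into the same target package. -->
    <relocation>
      <pattern>com.google.common</pattern>
      <shadedPattern>${spark.shade.packageName}.guava</shadedPattern>
    </relocation>
    
    <!-- spark-network-common: Guava ${guava.version} ends up under org.sparkproject.guava -->
    <!-- spark-connect-common: Guava ${connect.guava.version} ends up under org.sparkproject.guava too -->
    
    <!-- Sketch (assumed): a module-specific target package keeps the two
         Guava versions from overwriting each other on the classpath. -->
    <relocation>
      <pattern>com.google.common</pattern>
      <shadedPattern>${spark.shade.packageName}.connect.guava</shadedPattern>
    </relocation>
    ```
    
    With the first relocation applied in both modules, whichever jar is loaded first decides which version of a given `org.sparkproject.guava` class is used, which would be consistent with the `ClassNotFoundException` above.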
    
    In addition, investigation suggests that the spark-connect-common module does not itself use Guava, so the Guava dependency can be removed from spark-connect-common.
    
    ### Why are the changes needed?
    
    A runnable distribution built from master code fails when running spark-sql.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    I manually built a runnable distribution package with the build command below and tested it.
    
    Build command:
    ```
    ./dev/make-distribution.sh --name ui --pip --tgz  -Phive -Phive-thriftserver -Pyarn -Pconnect
    ```
    
    Test result:
    <img width="1276" alt="image" src="https://github.com/apache/spark/assets/51110188/aefbc433-ea5c-4287-8ebd-367806043ac8">
    
    I also checked which jars in the jars directory contain `org.sparkproject.guava.cache.LocalCache`.
    Before:
    ```
    ➜  jars grep -lr 'org.sparkproject.guava.cache.LocalCache' ./
    .//spark-connect_2.13-4.0.0-SNAPSHOT.jar
    .//spark-network-common_2.13-4.0.0-SNAPSHOT.jar
    .//spark-connect-common_2.13-4.0.0-SNAPSHOT.jar
    ```
    
    Now:
    ```
    ➜  jars grep -lr 'org.sparkproject.guava.cache.LocalCache' ./
    .//spark-network-common_2.13-4.0.0-SNAPSHOT.jar
    ```
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #43436 from Yikf/SPARK-45593.
    
    Authored-by: yikaifei <yikai...@apache.org>
    Signed-off-by: yangjie01 <yangji...@baidu.com>
---
 assembly/pom.xml                     |  6 ++++++
 connector/connect/client/jvm/pom.xml |  8 +-------
 connector/connect/common/pom.xml     | 34 ++++++++++++++++++++++++++++++++++
 connector/connect/server/pom.xml     | 25 -------------------------
 4 files changed, 41 insertions(+), 32 deletions(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 77ff87c17f52..cd8c3fca9d23 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -149,6 +149,12 @@
           <groupId>org.apache.spark</groupId>
           <artifactId>spark-connect_${scala.binary.version}</artifactId>
           <version>${project.version}</version>
+          <exclusions>
+            <exclusion>
+              <groupId>org.apache.spark</groupId>
+              <artifactId>spark-connect-common_${scala.binary.version}</artifactId>
+            </exclusion>
+          </exclusions>
         </dependency>
         <dependency>
           <groupId>org.apache.spark</groupId>
diff --git a/connector/connect/client/jvm/pom.xml b/connector/connect/client/jvm/pom.xml
index 8057a33df178..9bedebf523a7 100644
--- a/connector/connect/client/jvm/pom.xml
+++ b/connector/connect/client/jvm/pom.xml
@@ -51,15 +51,9 @@
       <version>${project.version}</version>
     </dependency>
     <!--
-      We need to define guava and protobuf here because we need to change the scope of both from
+      We need to define protobuf here because we need to change the scope of both from
       provided to compile. If we don't do this we can't shade these libraries.
     -->
-    <dependency>
-      <groupId>com.google.guava</groupId>
-      <artifactId>guava</artifactId>
-      <version>${connect.guava.version}</version>
-      <scope>compile</scope>
-    </dependency>
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
diff --git a/connector/connect/common/pom.xml b/connector/connect/common/pom.xml
index a374646f8f29..336d83e04c15 100644
--- a/connector/connect/common/pom.xml
+++ b/connector/connect/common/pom.xml
@@ -47,6 +47,11 @@
             <groupId>com.google.protobuf</groupId>
             <artifactId>protobuf-java</artifactId>
         </dependency>
+        <!--
+          SPARK-45593: spark connect relies on a specific version of Guava, We perform shading
+          of the Guava library within the connect-common module to ensure both connect-server and
+          connect-client modules maintain consistent and accurate Guava dependencies.
+        -->
         <dependency>
             <groupId>com.google.guava</groupId>
             <artifactId>guava</artifactId>
@@ -145,6 +150,35 @@
                     </execution>
                 </executions>
             </plugin>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-shade-plugin</artifactId>
+                <configuration>
+                    <shadedArtifactAttached>false</shadedArtifactAttached>
+                    <artifactSet>
+                        <includes>
+                            <include>org.spark-project.spark:unused</include>
+                            <include>com.google.guava:guava</include>
+                            <include>com.google.guava:failureaccess</include>
+                            <include>org.apache.tomcat:annotations-api</include>
+                        </includes>
+                    </artifactSet>
+                    <relocations>
+                        <relocation>
+                            <pattern>com.google.common</pattern>
+                            <shadedPattern>${spark.shade.packageName}.connect.guava</shadedPattern>
+                        </relocation>
+                    </relocations>
+                </configuration>
+                <executions>
+                    <execution>
+                        <phase>package</phase>
+                        <goals>
+                            <goal>shade</goal>
+                        </goals>
+                    </execution>
+                </executions>
+            </plugin>
         </plugins>
     </build>
     <profiles>
diff --git a/connector/connect/server/pom.xml b/connector/connect/server/pom.xml
index e9c7bd86e0f7..82127f736ccb 100644
--- a/connector/connect/server/pom.xml
+++ b/connector/connect/server/pom.xml
@@ -51,12 +51,6 @@
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-connect-common_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
-      <exclusions>
-        <exclusion>
-          <groupId>com.google.guava</groupId>
-          <artifactId>guava</artifactId>
-        </exclusion>
-      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.spark</groupId>
@@ -156,17 +150,6 @@
      <groupId>org.scala-lang.modules</groupId>
      <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
     </dependency>
-    <dependency>
-      <groupId>com.google.guava</groupId>
-      <artifactId>guava</artifactId>
-      <version>${connect.guava.version}</version>
-      <scope>compile</scope>
-    </dependency>
-    <dependency>
-      <groupId>com.google.guava</groupId>
-      <artifactId>failureaccess</artifactId>
-      <version>${guava.failureaccess.version}</version>
-    </dependency>
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
@@ -287,7 +270,6 @@
           <shadedArtifactAttached>false</shadedArtifactAttached>
           <artifactSet>
             <includes>
-              <include>com.google.guava:*</include>
               <include>io.grpc:*:</include>
               <include>com.google.protobuf:*</include>
 
@@ -307,13 +289,6 @@
             </includes>
           </artifactSet>
           <relocations>
-            <relocation>
-              <pattern>com.google.common</pattern>
-              <shadedPattern>${spark.shade.packageName}.connect.guava</shadedPattern>
-              <includes>
-                <include>com.google.common.**</include>
-              </includes>
-            </relocation>
             <relocation>
              <pattern>com.google.thirdparty</pattern>
              <shadedPattern>${spark.shade.packageName}.connect.guava</shadedPattern>


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
