Akihiro Okuno created SPARK-50414:
-------------------------------------

             Summary: Incorrect statement that causes java.lang.NoClassDefFoundError in Spark Connect documents
                 Key: SPARK-50414
                 URL: https://issues.apache.org/jira/browse/SPARK-50414
             Project: Spark
          Issue Type: Bug
          Components: Connect, Documentation
    Affects Versions: 3.5.3, 3.5.2, 3.5.1
            Reporter: Akihiro Okuno
h2. Issue Description

I created a small Scala application to verify Spark Connect functionality by running a simple query, following the [Use Spark Connect in standalone applications|https://spark.apache.org/docs/3.5.3/spark-connect-overview.html#use-spark-connect-in-standalone-applications] section:

{code:java}
// build.sbt
lazy val root = (project in file("."))
  .settings(
    scalaVersion := "2.13.12",
    name := "Sample app",
    libraryDependencies ++=
      "org.apache.spark" %% "spark-sql-api" % "3.5.3" ::
      "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.3" ::
      Nil
  )
{code}

{code:java}
// src/main/scala/example/Hello.scala
package example

import org.apache.spark.sql.SparkSession

object Hello extends App {
  private val spark = SparkSession.builder().remote("sc://localhost").build()
  spark.sql("select 1").show()
  spark.close()
}
{code}

However, when I ran "sbt run", I got the following exception:

{code:java}
Exception in thread "sbt-bg-threads-1" java.lang.NoClassDefFoundError: io/netty/buffer/PooledByteBufAllocator
    at java.base/java.lang.ClassLoader.defineClass1(Native Method)
    at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1022)
    at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
    at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:555)
    at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:458)
    at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:452)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:451)
    at sbt.internal.ManagedClassLoader.findClass(ManagedClassLoader.java:103)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
    at io.netty.buffer.PooledByteBufAllocatorL.<init>(PooledByteBufAllocatorL.java:49)
    at org.apache.arrow.memory.NettyAllocationManager.<clinit>(NettyAllocationManager.java:51)
    at org.apache.arrow.memory.DefaultAllocationManagerFactory.<clinit>(DefaultAllocationManagerFactory.java:26)
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:315)
    at org.apache.arrow.memory.DefaultAllocationManagerOption.getFactory(DefaultAllocationManagerOption.java:108)
    at org.apache.arrow.memory.DefaultAllocationManagerOption.getDefaultAllocationManagerFactory(DefaultAllocationManagerOption.java:98)
    at org.apache.arrow.memory.BaseAllocator$Config.getAllocationManagerFactory(BaseAllocator.java:772)
    at org.apache.arrow.memory.ImmutableConfig.access$801(ImmutableConfig.java:24)
    at org.apache.arrow.memory.ImmutableConfig$InitShim.getAllocationManagerFactory(ImmutableConfig.java:83)
    at org.apache.arrow.memory.ImmutableConfig.<init>(ImmutableConfig.java:47)
    at org.apache.arrow.memory.ImmutableConfig.<init>(ImmutableConfig.java:24)
    at org.apache.arrow.memory.ImmutableConfig$Builder.build(ImmutableConfig.java:485)
    at org.apache.arrow.memory.BaseAllocator.<clinit>(BaseAllocator.java:61)
    at org.apache.spark.sql.util.ArrowUtils$.<clinit>(ArrowUtils.scala:34)
    at org.apache.spark.sql.connect.client.arrow.ArrowVectorReader$.apply(ArrowVectorReader.scala:70)
    ...{code}

h2. Issue Cause

I have investigated this issue, and the following appears to be the cause.

"spark-connect-client-jvm" is a fat jar that bundles its dependency classes, relocating some of them under the "org.sparkproject" package. Starting with version 3.5.1 (see SPARK-45371), it also bundles the classes from "spark-sql-api", and the references to relocated classes inside those bundled classes are rewritten accordingly. However, declaring "spark-sql-api" as an application dependency makes the classloader load the original, non-relocated copies of those classes, whose references then conflict with the relocated ones. Indeed, the stack trace shows an attempt to load io/netty/buffer/PooledByteBufAllocator, which should have been org/sparkproject/io/netty/buffer/PooledByteBufAllocator.

h2. How to resolve

Removing the "spark-sql-api" dependency from the application resolves the classloader conflict. I have verified this with versions 3.5.1, 3.5.2, and 3.5.3, and it works consistently. We should also fix the documentation by removing the "spark-sql-api" dependency.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
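The shading diagnosis above can be checked from the application itself. The sketch below is my own illustration (not part of the reproduction); the two class names are taken from the stack trace. It probes which copies of the Netty allocator class the application classloader can see. With both dependencies declared as in the documented build.sbt, the relocated "org.sparkproject" copy is present while the plain "io.netty" copy is not, which is exactly the combination that produces the NoClassDefFoundError.

{code:java}
// CheckRelocation.scala -- a diagnostic sketch, not part of the repro.
object CheckRelocation extends App {
  // True if the class name can be loaded (without initializing it)
  // by this application's classloader.
  def present(name: String): Boolean =
    try { Class.forName(name, false, getClass.getClassLoader); true }
    catch { case _: ClassNotFoundException | _: LinkageError => false }

  // Relocated copy, bundled inside the spark-connect-client-jvm fat jar:
  println(s"shaded:   ${present("org.sparkproject.io.netty.buffer.PooledByteBufAllocator")}")
  // Original copy, referenced by the separately resolved spark-sql-api classes:
  println(s"unshaded: ${present("io.netty.buffer.PooledByteBufAllocator")}")
}
{code}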
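For reference, the corrected dependency declaration described in the resolution above would look like the following. This is a sketch of the documented example with only the change described here applied; the project name and versions are kept as in the original.

{code:java}
// build.sbt -- corrected: only the fat client jar is declared, so the
// application never sees the non-relocated spark-sql-api classes.
lazy val root = (project in file("."))
  .settings(
    scalaVersion := "2.13.12",
    name := "Sample app",
    libraryDependencies +=
      "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.3"
  )
{code}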