Akihiro Okuno created SPARK-50414:
-------------------------------------

             Summary: Incorrect statement that causes java.lang.NoClassDefFoundError in Spark Connect documentation
                 Key: SPARK-50414
                 URL: https://issues.apache.org/jira/browse/SPARK-50414
             Project: Spark
          Issue Type: Bug
          Components: Connect, Documentation
    Affects Versions: 3.5.3, 3.5.2, 3.5.1
            Reporter: Akihiro Okuno


h2. Issue Description

I created a small Scala application to verify Spark Connect functionality by
running a simple query, following the [Use Spark Connect in standalone
applications|https://spark.apache.org/docs/3.5.3/spark-connect-overview.html#use-spark-connect-in-standalone-applications]
section of the documentation:
{code:scala}
// build.sbt
lazy val root = (project in file("."))
  .settings(
    scalaVersion := "2.13.12",
    name := "Sample app",
    libraryDependencies ++=
      "org.apache.spark" %% "spark-sql-api" % "3.5.3" ::
        "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.3" ::
        Nil
  ) {code}

{code:scala}
// src/main/scala/example/Hello.scala
package example

import org.apache.spark.sql.SparkSession

object Hello extends App {
  private val spark = SparkSession.builder().remote("sc://localhost").build()
  spark.sql("select 1").show()
  spark.close()
}{code}
However, when I ran "sbt run", I got the following exception:
{code:java}
Exception in thread "sbt-bg-threads-1" java.lang.NoClassDefFoundError: io/netty/buffer/PooledByteBufAllocator
        at java.base/java.lang.ClassLoader.defineClass1(Native Method)
        at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1022)
        at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
        at java.base/java.net.URLClassLoader.defineClass(URLClassLoader.java:555)
        at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:458)
        at java.base/java.net.URLClassLoader$1.run(URLClassLoader.java:452)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:451)
        at sbt.internal.ManagedClassLoader.findClass(ManagedClassLoader.java:103)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
        at io.netty.buffer.PooledByteBufAllocatorL.<init>(PooledByteBufAllocatorL.java:49)
        at org.apache.arrow.memory.NettyAllocationManager.<clinit>(NettyAllocationManager.java:51)
        at org.apache.arrow.memory.DefaultAllocationManagerFactory.<clinit>(DefaultAllocationManagerFactory.java:26)
        at java.base/java.lang.Class.forName0(Native Method)
        at java.base/java.lang.Class.forName(Class.java:315)
        at org.apache.arrow.memory.DefaultAllocationManagerOption.getFactory(DefaultAllocationManagerOption.java:108)
        at org.apache.arrow.memory.DefaultAllocationManagerOption.getDefaultAllocationManagerFactory(DefaultAllocationManagerOption.java:98)
        at org.apache.arrow.memory.BaseAllocator$Config.getAllocationManagerFactory(BaseAllocator.java:772)
        at org.apache.arrow.memory.ImmutableConfig.access$801(ImmutableConfig.java:24)
        at org.apache.arrow.memory.ImmutableConfig$InitShim.getAllocationManagerFactory(ImmutableConfig.java:83)
        at org.apache.arrow.memory.ImmutableConfig.<init>(ImmutableConfig.java:47)
        at org.apache.arrow.memory.ImmutableConfig.<init>(ImmutableConfig.java:24)
        at org.apache.arrow.memory.ImmutableConfig$Builder.build(ImmutableConfig.java:485)
        at org.apache.arrow.memory.BaseAllocator.<clinit>(BaseAllocator.java:61)
        at org.apache.spark.sql.util.ArrowUtils$.<clinit>(ArrowUtils.scala:34)
        at org.apache.spark.sql.connect.client.arrow.ArrowVectorReader$.apply(ArrowVectorReader.scala:70)
...{code}
h2. Issue Cause

I investigated this issue, and I believe the cause is the following.

The "spark-connect-client-jvm" is a fat jar containing dependency classes, 
relocating some under the "org.sparkproject" package. Starting from version 
3.5.1 (see SPARK-45371), it also includes classes from "spark-sql-api". As a 
result, references to relocated classes within "spark-sql-api" are updated 
accordingly. However, specifying "spark-sql-api" as an application dependency 
causes the classloader to load the original classes, leading to conflicts with 
relocated references.

In fact, the stack trace shows an attempt to load
io/netty/buffer/PooledByteBufAllocator where the relocated
org/sparkproject/io/netty/buffer/PooledByteBufAllocator should have been
referenced.
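To make the conflict easy to observe, here is a minimal classpath probe (a
sketch added for illustration, not part of the original reproduction; the
relocated class name is the one inferred from the stack trace above):
{code:scala}
// CheckRelocation.scala -- run with the same dependencies as the sample app.
// Reports which variant of the Netty allocator class is loadable. With only
// "spark-connect-client-jvm" on the classpath, the relocated name should
// resolve while the original one should not.
object CheckRelocation extends App {
  private val relocated = "org.sparkproject.io.netty.buffer.PooledByteBufAllocator"
  private val original  = "io.netty.buffer.PooledByteBufAllocator"

  private def loadable(name: String): Boolean =
    try { Class.forName(name); true }
    catch { case _: ClassNotFoundException | _: NoClassDefFoundError => false }

  println(s"$relocated -> ${loadable(relocated)}")
  println(s"$original -> ${loadable(original)}")
}
{code}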
h2. How to resolve

Removing the "spark-sql-api" dependency from the application resolves the
classloader conflict. I have verified this fix with versions 3.5.1, 3.5.2,
and 3.5.3, and it works consistently.
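
For reference, the working build.sbt looks like the following (the same
sample as above, with only the "spark-connect-client-jvm" dependency left in
place):
{code:scala}
// build.sbt -- only the shaded Connect client is declared; the fat jar
// already bundles the (relocated) spark-sql-api classes it needs.
lazy val root = (project in file("."))
  .settings(
    scalaVersion := "2.13.12",
    name := "Sample app",
    libraryDependencies +=
      "org.apache.spark" %% "spark-connect-client-jvm" % "3.5.3"
  )
{code}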

We should also fix the documentation by removing the "spark-sql-api" dependency from the example build.sbt.


