[ https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nirav patel updated SPARK-46762:
--------------------------------
Description:

*Affected version:*
Spark 3.5 and spark-connect_2.12:3.5.0

*Not affected version and variation:*
Spark 3.4 and spark-connect_2.12:3.4.0
Also works with the plain Spark 3.5 spark-submit script (i.e. without using spark-connect 3.5)

When using spark-connect 3.5 with an external Spark SQL catalog jar, iceberg-spark-runtime-3.5_2.12-1.4.3.jar, Spark executors fail with the `java.lang.ClassCastException` shown below. We also set "spark.executor.userClassPathFirst=true".
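For context, a minimal sketch of the kind of server-side configuration this setup implies. The jar path, the catalog name `my_catalog`, and the use of spark-defaults.conf are illustrative assumptions, not details taken from the report:

{code}
# spark-defaults.conf for the spark-connect server (hypothetical paths/names)
spark.jars                          /opt/jars/iceberg-spark-runtime-3.5_2.12-1.4.3.jar
spark.executor.userClassPathFirst   true
spark.sql.catalog.my_catalog        org.apache.iceberg.spark.SparkCatalog
{code}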
The executor failure:

{code:java}
pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (spark35-m.c.strivr-dev-test.internal executor 2): java.lang.ClassCastException: class org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to class org.apache.iceberg.Table (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed module of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; org.apache.iceberg.Table is in unnamed module of loader org.apache.spark.util.ChildFirstURLClassLoader @4b18b943)
	at org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
	at org.apache.iceberg.spark.source.RowDataReader.<init>(RowDataReader.java:50)
	at org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84)
	at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apach...
{code}

We verified that only one copy of `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` is loaded when the spark-connect server is started.

An issue is open with Iceberg as well: [https://github.com/apache/iceberg/issues/8978]
It is also being discussed in the mail archive: [https://lists.apache.org/thread/5q1pdqqrd1h06hgs8vx9ztt60z5yv8n1]

Looking more closely at the error, it seems the classloader itself is instantiated multiple times somewhere. I can see two instances: org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053 and org.apache.spark.util.ChildFirstURLClassLoader @4b18b943.

Again, this issue doesn't happen with spark-connect 3.4, and it doesn't happen when using Spark 3.5 directly without spark-connect 3.5.
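As an aside on why that cast fails: the JVM identifies a class by the pair (class name, defining loader). A minimal standalone sketch of this rule, using plain java.net.URLClassLoader as a stand-in for Spark's ChildFirstURLClassLoader and a made-up local path to the Iceberg jar:

{code:java}
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderIdentityDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical local path to the same Iceberg runtime jar the report uses.
        URL[] jars = { new URL("file:///tmp/iceberg-spark-runtime-3.5_2.12-1.4.3.jar") };

        // Two independent loaders over identical bytes; parent is null so
        // neither delegates to a shared loader (similar in effect to two
        // separate ChildFirstURLClassLoader instances on an executor).
        try (URLClassLoader loaderA = new URLClassLoader(jars, null);
             URLClassLoader loaderB = new URLClassLoader(jars, null)) {
            Class<?> tableA = loaderA.loadClass("org.apache.iceberg.Table");
            Class<?> tableB = loaderB.loadClass("org.apache.iceberg.Table");

            System.out.println(tableA == tableB);                // false
            System.out.println(tableA.isAssignableFrom(tableB)); // false
            // An instance of loaderB's classes therefore cannot be cast to
            // loaderA's org.apache.iceberg.Table, which is exactly the shape
            // of the executor-side ClassCastException above.
        }
    }
}
{code}

This matches the exception message, which names two distinct loader instances (@5e7ae053 and @4b18b943) for the two Iceberg classes involved in the cast.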
> Spark Connect 3.5 Classloading issue
> ------------------------------------
>
>                 Key: SPARK-46762
>                 URL: https://issues.apache.org/jira/browse/SPARK-46762
>             Project: Spark
>          Issue Type: Bug
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: nirav patel
>            Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org