It looks to me like the classloader is the problem. The "child first" classloader is apparently loading `Table`, but Spark is loading `SerializableTableWithSize` from the parent classloader. Because delegation isn't happening properly, you end up with incompatible copies of classes from the same classpath, depending on which loader happened to load each class first.
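To see the mechanism in isolation, here's a minimal sketch (the class name com.example.Table, the jar path, and the no-arg constructor are all hypothetical) of how two loaders that each define a class with the same name produce types that are not interchangeable, which is exactly what the "X cannot be cast to X" message below is reporting:

    import java.net.URL;
    import java.net.URLClassLoader;

    public class TwoLoaderDemo {
      public static void main(String[] args) throws Exception {
        // Hypothetical jar that contains com.example.Table
        URL[] jar = { new URL("file:///path/to/table.jar") };

        // Both loaders use the bootstrap loader as parent, so neither delegates
        // to the other and each one defines its own copy of the class.
        ClassLoader a = new URLClassLoader(jar, null);
        ClassLoader b = new URLClassLoader(jar, null);

        Class<?> fromA = Class.forName("com.example.Table", true, a);
        Class<?> fromB = Class.forName("com.example.Table", true, b);

        // Same fully qualified name, but different runtime types
        System.out.println(fromA == fromB);  // false

        // An instance defined by loader A is not assignable to the type defined
        // by loader B, so this throws ClassCastException with the familiar
        // "class X cannot be cast to class X" message.
        Object table = fromA.getDeclaredConstructor().newInstance();
        fromB.cast(table);
      }
    }

Running the executor JVM with -verbose:class, as you suggested, will print which loader defined each class, so that's a quick way to confirm whether SerializableTableWithSize and Table really came from different loader instances.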
On Fri, Jan 12, 2024 at 5:30 PM Nirav Patel <[email protected]> wrote:

> It seems to be happening on the executor of the SC server, as I see the error in the executor logs. We did verify that there was only one version of iceberg-spark-runtime at the moment.
> We do include a custom catalog impl jar. Although it's a shaded jar, I don't see "org/apache/iceberg/Table" or other Iceberg classes when I do "jar -tvf" on it.
>
> I see both jars in 3 Spark configs: spark.repl.local.jars, spark.yarn.dist.jars, and spark.yarn.secondary.jars.
>
> I suspected a classloading issue as well, since the initial error was pointing to it:
>
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (spark35-m.c.strivr-dev-test.internal executor 2): java.lang.ClassCastException: class org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to class org.apache.iceberg.Table (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed module of loader org.apache.spark.util.MutableURLClassLoader @6819e13c; org.apache.iceberg.Table is in unnamed module of loader org.apache.spark.util.ChildFirstURLClassLoader @15fb0c43)
>
> Although ChildFirstURLClassLoader is a child of MutableURLClassLoader, the error shouldn't be related to that. I still tried adding a Spark flag (--conf "spark.executor.userClassPathFirst=true") when starting the Spark Connect server. It seems both classes then get loaded by the same ClassLoader class, but the error still happens:
>
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (spark35-m.c.strivr-dev-test.internal executor 2): java.lang.ClassCastException: class org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to class org.apache.iceberg.Table (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed module of loader org.apache.spark.util.ChildFirstURLClassLoader @a41c33c; org.apache.iceberg.Table is in unnamed module of loader org.apache.spark.util.ChildFirstURLClassLoader @16f95afb)
>
> I see "ClassLoader @<some_id>" in the logs. Are those object IDs? (It's been a while since I worked with Java.) I'm wondering whether multiple instances of the same ClassLoader are being initialized by SC. Maybe running with -verbose:class or taking a heap dump would help verify?
>
> On Fri, Jan 12, 2024 at 4:38 PM Ryan Blue <[email protected]> wrote:
>
>> I think it looks like a version mismatch, perhaps between the SC client and the server or between where planning occurs and the executors. The error is that the `SerializableTableWithSize` is not a subclass of `Table`, but it definitely should be. That sort of problem is usually caused by class loading issues. Can you double-check that you have only one Iceberg runtime in the Environment tab of your Spark cluster?
>>
>> On Tue, Jan 9, 2024 at 4:57 PM Nirav Patel <[email protected]> wrote:
>>
>>> PS - the issue doesn't happen if we don't use spark-connect and instead just use spark-shell or pyspark, as the OP on GitHub noted as well. However, the stack trace doesn't seem to point to any class from the spark-connect jar (org.apache.spark:spark-connect_2.12:3.5.0).
>>>
>>> On Tue, Jan 9, 2024 at 4:52 PM Nirav Patel <[email protected]> wrote:
>>>
>>>> Hi,
>>>> We are testing spark-connect with Iceberg.
>>>> We tried Spark 3.5 with Iceberg 1.4.x versions (all of iceberg-spark-runtime-3.5_2.12-1.4.x.jar).
>>>>
>>>> With all of the 1.4.x jars we are hitting the following issue when running Iceberg queries from a SparkSession created using spark-connect (--remote "sc://remote-master-node"):
>>>>
>>>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to org.apache.iceberg.Table
>>>> at org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>>>> at org.apache.iceberg.spark.source.BatchDataReader.<init>(BatchDataReader.java:50)
>>>> at org.apache.iceberg.spark.source.SparkColumnarReaderFactory.createColumnarReader(SparkColumnarReaderFactory.java:52)
>>>> at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:79)
>>>> at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>>>> at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>>>> at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>>>> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
>>>> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithKeys_0$(Unknown Source)
>>>> at
>>>>
>>>> Someone else has reported this issue on GitHub as well: https://github.com/apache/iceberg/issues/8978
>>>>
>>>> It's currently working with Spark 3.4 and Iceberg 1.3. However, ideally it'd be nice to get it working with Spark 3.5 as well, since 3.5 has many improvements in spark-connect.
>>>>
>>>> Thanks,
>>>> Nirav
>>>
>>
>> --
>> Ryan Blue
>> Tabular
>

--
Ryan Blue
Tabular
