Classloading does seem to be the issue, though only when using Spark Connect 3.5 with Iceberg >= 1.4.
It's weird because, as I also mentioned in the previous email, after adding the spark property (spark.executor.userClassPathFirst=true) both classes get loaded by the same classloader class - org.apache.spark.util.ChildFirstURLClassLoader. Not sure why the error would still happen:

java.lang.ClassCastException: class org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to class org.apache.iceberg.Table (org.apache.iceberg.spark.source.*SerializableTableWithSize* is in unnamed module of loader org.apache.spark.util.*ChildFirstURLClassLoader* @a41c33c; org.apache.iceberg.*Table* is in unnamed module of loader org.apache.spark.util.*ChildFirstURLClassLoader* @16f95afb)
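To convince myself that two copies of the same class can be incompatible even when both loaders are ChildFirstURLClassLoader, I put together a minimal sketch (the jar path below is only a placeholder for wherever the runtime jar actually sits on the cluster): two sibling loaders over the same jar yield two distinct Class objects, and the loaders' default toString has exactly the "loader @hex" shape from the exception message, so the hex suffix identifies the loader instance, not the class.

import java.net.{URL, URLClassLoader}

object LoaderDemo {
  def main(args: Array[String]): Unit = {
    // Placeholder path - point it at the actual iceberg-spark-runtime jar.
    val jar = new URL("file:///tmp/iceberg-spark-runtime-3.5_2.12-1.4.3.jar")

    // Two sibling loaders over the same jar; parent = null so neither delegates to the other.
    val loaderA = new URLClassLoader(Array(jar), null)
    val loaderB = new URLClassLoader(Array(jar), null)

    val tableA = loaderA.loadClass("org.apache.iceberg.Table")
    val tableB = loaderB.loadClass("org.apache.iceberg.Table")

    // Default toString is "className@identityHashCodeInHex" - the same shape as the
    // "@a41c33c" / "@16f95afb" suffixes in the exception message.
    println(loaderA)
    println(loaderB)

    // Same fully qualified name, same bytes, but two different runtime classes:
    println(tableA eq tableB)                 // false
    println(tableA.isAssignableFrom(tableB))  // false - a cast across them would fail
  }
}

If the real failure follows this pattern, the question becomes why the executor ends up with two ChildFirstURLClassLoader instances that both see the Iceberg classes.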
On Tue, Jan 16, 2024 at 12:53 PM Ryan Blue <b...@tabular.io> wrote:

> It looks to me like the classloader is the problem. The "child first"
> classloader is apparently loading `Table`, but Spark is loading
> `SerializableTableWithSize` from the parent classloader. Because delegation
> isn't happening properly, you're getting two incompatible classes from the
> same classpath, depending on where a class was loaded for the first time.
>
> On Fri, Jan 12, 2024 at 5:30 PM Nirav Patel <nira...@gmail.com> wrote:
>
>> It seems to be happening on the executor of the SC server, as I see the
>> error in the executor logs. We did verify that there is only one version
>> of iceberg-spark-runtime at the moment.
>> We do include a custom catalog implementation jar. Though it's a shaded
>> jar, I don't see "org/apache/iceberg/Table" or other Iceberg classes when
>> I do "jar -tvf" on it.
>>
>> I see both jars in 3 spark configs: spark.repl.local.jars,
>> spark.yarn.dist.jars and spark.yarn.secondary.jars.
>>
>> I suspected a classloading issue as well, as the initial error was
>> pointing to it:
>>
>> pyspark.errors.exceptions.connect.SparkConnectGrpcException:
>> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0
>> in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>> 0.0 (TID 3) (spark35-m.c.strivr-dev-test.internal executor 2):
>> java.lang.ClassCastException: class
>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast
>> to class org.apache.iceberg.Table
>> (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed
>> module of loader org.apache.spark.util.*MutableURLClassLoader* @6819e13c;
>> org.apache.iceberg.Table is in unnamed module of loader
>> org.apache.spark.util.*ChildFirstURLClassLoader* @15fb0c43)
>>
>> Although *ChildFirstURLClassLoader* is a child of MutableURLClassLoader,
>> the error shouldn't be related to that. I still tried adding the spark
>> flag (--conf "spark.executor.userClassPathFirst=true") when starting the
>> spark connect server. It seems both classes then get loaded by the same
>> ClassLoader, but the error still happens:
>>
>> pyspark.errors.exceptions.connect.SparkConnectGrpcException:
>> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0
>> in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>> 0.0 (TID 3) (spark35-m.c.strivr-dev-test.internal executor 2):
>> java.lang.ClassCastException: class
>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast
>> to class org.apache.iceberg.Table
>> (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed
>> module of loader org.apache.spark.util.*ChildFirstURLClassLoader*
>> @a41c33c; org.apache.iceberg.Table is in unnamed module of loader
>> org.apache.spark.util.*ChildFirstURLClassLoader* @16f95afb)
>>
>> I see ClassLoader @ <some_id> in the logs. Are those object IDs? (It's
>> been a while since I worked with Java.) I'm wondering if multiple
>> instances of the same ClassLoader are being initialized by SC. Maybe
>> running with -verbose:class or taking a heap dump would help to verify?
>>
>> On Fri, Jan 12, 2024 at 4:38 PM Ryan Blue <b...@tabular.io> wrote:
>>
>>> I think it looks like a version mismatch, perhaps between the SC client
>>> and the server or between where planning occurs and the executors. The
>>> error is that `SerializableTableWithSize` is not a subclass of `Table`,
>>> but it definitely should be. That sort of problem is usually caused by
>>> class loading issues. Can you double-check that you have only one
>>> Iceberg runtime in the Environment tab of your Spark cluster?
>>>
>>> On Tue, Jan 9, 2024 at 4:57 PM Nirav Patel <nira...@gmail.com> wrote:
>>>
>>>> PS - the issue doesn't happen if we don't use spark-connect and instead
>>>> just use spark-shell or pyspark, as the OP on GitHub said as well.
>>>> However, the stack trace doesn't seem to point to any class from the
>>>> spark-connect jar (org.apache.spark:spark-connect_2.12:3.5.0).
>>>>
>>>> On Tue, Jan 9, 2024 at 4:52 PM Nirav Patel <nira...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> We are testing spark-connect with Iceberg.
>>>>> We tried Spark 3.5 with the Iceberg 1.4.x versions (all of the
>>>>> iceberg-spark-runtime-3.5_2.12-1.4.x.jar versions).
>>>>>
>>>>> With all the 1.4.x jars we are having the following issue when running
>>>>> Iceberg queries from a SparkSession created using spark-connect
>>>>> (--remote "sc://remote-master-node"):
>>>>>
>>>>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to org.apache.iceberg.Table
>>>>> at org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>>>>> at org.apache.iceberg.spark.source.BatchDataReader.<init>(BatchDataReader.java:50)
>>>>> at org.apache.iceberg.spark.source.SparkColumnarReaderFactory.createColumnarReader(SparkColumnarReaderFactory.java:52)
>>>>> at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:79)
>>>>> at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>>>>> at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>>>>> at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>>>>> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
>>>>> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithKeys_0$(Unknown Source)
>>>>> at ...
>>>>>
>>>>> Someone else has reported this issue on GitHub as well:
>>>>> https://github.com/apache/iceberg/issues/8978
>>>>>
>>>>> It's currently working with Spark 3.4 and Iceberg 1.3. However, ideally
>>>>> it'd be nice to get it working with Spark 3.5 as well, since 3.5 has
>>>>> many improvements in spark-connect.
>>>>>
>>>>> Thanks
>>>>> Nirav
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>
> --
> Ryan Blue
> Tabular
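On the -verbose:class / heap-dump question above: one option is setting spark.executor.extraJavaOptions=-verbose:class on the Connect server so every class load is logged, but a quicker sanity check is a throwaway job run from spark-shell on the server host itself (classic API, since the Connect client can't ship an ad-hoc closure like this). This is only a rough sketch - it reports which loader a task's own code resolves the two names to, which may not be exactly what Iceberg's internals see - but two different loader instances in the output would confirm the split the exception describes:

// Run from spark-shell on the Spark Connect server host.
val loaders = sc.parallelize(1 to sc.defaultParallelism, sc.defaultParallelism)
  .mapPartitions { _ =>
    // loaderOf is just a throwaway helper for this check, not part of Spark or Iceberg.
    def loaderOf(name: String): String =
      try s"$name -> ${Class.forName(name).getClassLoader}"
      catch { case _: ClassNotFoundException => s"$name -> <not found>" }

    Iterator(
      loaderOf("org.apache.iceberg.Table"),
      loaderOf("org.apache.iceberg.spark.source.SerializableTableWithSize"))
  }
  .collect()

// Different @hex suffixes for the two names would mean the executors really are
// resolving them through different ClassLoader instances.
loaders.distinct.foreach(println)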