Thanks for sharing those issues. It does seem related to me, based on the similar test-case failures they saw internally. I could try dropping the iceberg-runtime jar into Spark's jars directory to see if that avoids this, since the classloading issue seems to come up when the jar is loaded via the --jars argument with spark-connect.
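The class-identity rule Eduard explains below (a class defined by two different loaders is two distinct types to the JVM, even when built from identical bytes) can be reproduced outside of Spark in a few lines. A minimal sketch; the `ChildFirstLoader` here is a toy stand-in I wrote for this note, not Spark's `ChildFirstURLClassLoader`:

```java
import java.io.InputStream;

public class LoaderDemo {
    public static class Payload {}

    // Toy "child first" loader: it defines Payload from the class-file bytes
    // itself instead of delegating to the parent, the way a child-first
    // classloader can end up doing.
    static class ChildFirstLoader extends ClassLoader {
        ChildFirstLoader(ClassLoader parent) { super(parent); }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (name.equals(Payload.class.getName())) {
                try (InputStream in = getResourceAsStream(name.replace('.', '/') + ".class")) {
                    byte[] bytes = in.readAllBytes();
                    return defineClass(name, bytes, 0, bytes.length);
                } catch (Exception e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
            return super.loadClass(name, resolve);
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> fromApp = Payload.class;
        Class<?> fromChild =
                new ChildFirstLoader(LoaderDemo.class.getClassLoader()).loadClass(fromApp.getName());

        // Same name, same bytes, different defining loaders => distinct runtime types.
        System.out.println(fromApp.getName().equals(fromChild.getName())); // true
        System.out.println(fromApp == fromChild);                          // false

        Object o = fromChild.getDeclaredConstructor().newInstance();
        System.out.println(fromApp.isInstance(o)); // false
    }
}
```

Casting the reflectively created instance to the application-loaded `Payload` would throw exactly the kind of ClassCastException seen in the stack traces below.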
On Fri, Feb 23, 2024 at 1:39 AM Eduard Tudenhoefner <edu...@tabular.io> wrote:

> I wonder if this is somewhat related to
> https://github.com/apache/spark/commit/6d0fed9a18ff87e73fdf1ee46b6b0d2df8dd5a1b
> / SPARK-43744 <https://issues.apache.org/jira/browse/SPARK-43744>, which
> appears to have fixed similar issues to the ones you were experiencing for
> Spark 3.5, but maybe some other place that needed the same fix was missed.
>
> The thing in Java is that if a class is loaded by two different class
> loaders, the two resulting classes are never considered equal. That means a
> ClassCastException like the one you mentioned in your first email can
> happen for exactly that reason.
>
> @Nirav, were you able to test Spark 3.4 vs 3.5 with Iceberg 1.4.x? In your
> previous email you only mentioned Spark 3.5 + Iceberg 1.4.x vs Spark 3.4 +
> Iceberg 1.3.x, but I would only compare different Spark versions while
> keeping the Iceberg version the same.
>
> To answer your question about whether it is an Iceberg or a Spark issue, I
> think it's a spark-connect issue with different classloaders. I've seen
> similar things in the past in other Java environments, and Iceberg itself
> doesn't do anything fancy around classloading.
>
> On Thu, Feb 22, 2024 at 11:15 PM Nirav Patel <nira...@gmail.com> wrote:
>
>> Hi Ryan,
>>
>> I updated the Spark JIRA I opened with more information I found after
>> taking a heap dump:
>>
>> https://issues.apache.org/jira/browse/SPARK-46762
>>
>> The class `org.apache.iceberg.Table` is loaded twice: once by
>> ChildFirstURLClassLoader and once by MutableURLClassLoader.
>>
>> The issue doesn't happen with Spark 3.4 and Iceberg 1.3, as I mentioned
>> in the ticket. Do you think it's still a spark-connect issue? I noticed
>> there is a somewhat larger set of migration changes in the Iceberg repo
>> going from 1.3 to 1.4 in order to support Spark 3.5. Do you think
>> something might have been missed there?
>>
>> Thanks
>> Nirav
>>
>> On Thu, Jan 18, 2024 at 9:46 AM Nirav Patel <nira...@gmail.com> wrote:
>>
>>> Classloading does seem like an issue, though only when using Spark
>>> Connect 3.5 with Iceberg >= 1.4.
>>>
>>> It's odd because, as I also mentioned in a previous email, after adding
>>> the Spark property (spark.executor.userClassPathFirst=true) both classes
>>> get loaded by the same classloader,
>>> org.apache.spark.util.ChildFirstURLClassLoader. Not sure why the error
>>> would still happen.
>>>
>>> java.lang.ClassCastException: class
>>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast
>>> to class org.apache.iceberg.Table (org.apache.iceberg.spark.source.
>>> *SerializableTableWithSize* is in unnamed module of loader
>>> org.apache.spark.util.*ChildFirstURLClassLoader* @a41c33c;
>>> org.apache.iceberg.*Table* is in unnamed module of loader
>>> org.apache.spark.util.*ChildFirstURLClassLoader* @16f95afb)
>>>
>>> On Tue, Jan 16, 2024 at 12:53 PM Ryan Blue <b...@tabular.io> wrote:
>>>
>>>> It looks to me like the classloader is the problem. The "child first"
>>>> classloader is apparently loading `Table`, but Spark is loading
>>>> `SerializableTableWithSize` from the parent classloader. Because
>>>> delegation isn't happening properly, you're getting two incompatible
>>>> classes from the same classpath, depending on where a class was loaded
>>>> for the first time.
>>>>
>>>> On Fri, Jan 12, 2024 at 5:30 PM Nirav Patel <nira...@gmail.com> wrote:
>>>>
>>>>> It seems to be happening on an executor of the SC server, since I see
>>>>> the error in the executor logs. We did verify that only one version of
>>>>> iceberg-spark-runtime was present.
>>>>> We do include a custom catalog implementation jar. Although it's a
>>>>> shaded jar, I don't see "org/apache/iceberg/Table" or other Iceberg
>>>>> classes in it when I run "jar -tvf" on it.
>>>>>
>>>>> I see both jars in three Spark configs:
>>>>> spark.repl.local.jars, spark.yarn.dist.jars, and
>>>>> spark.yarn.secondary.jars.
>>>>>
>>>>> I suspected a classloading issue as well, since the initial error
>>>>> pointed to it:
>>>>>
>>>>> pyspark.errors.exceptions.connect.SparkConnectGrpcException:
>>>>> (org.apache.spark.SparkException) Job aborted due to stage failure:
>>>>> Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
>>>>> in stage 0.0 (TID 3) (spark35-m.c.strivr-dev-test.internal executor 2):
>>>>> java.lang.ClassCastException: class
>>>>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be
>>>>> cast to class org.apache.iceberg.Table
>>>>> (org.apache.iceberg.spark.source.SerializableTableWithSize is in
>>>>> unnamed module of loader org.apache.spark.util.*MutableURLClassLoader*
>>>>> @6819e13c; org.apache.iceberg.Table is in unnamed module of loader
>>>>> org.apache.spark.util.*ChildFirstURLClassLoader* @15fb0c43)
>>>>>
>>>>> Although *ChildFirstURLClassLoader* is a child of MutableURLClassLoader,
>>>>> the error shouldn't be related to that. I still tried adding the Spark
>>>>> flag (--conf "spark.executor.userClassPathFirst=true") when starting
>>>>> the Spark Connect server.
>>>>> With that, both classes seem to get loaded by the same ClassLoader,
>>>>> but the error still happens:
>>>>>
>>>>> pyspark.errors.exceptions.connect.SparkConnectGrpcException:
>>>>> (org.apache.spark.SparkException) Job aborted due to stage failure:
>>>>> Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
>>>>> in stage 0.0 (TID 3) (spark35-m.c.strivr-dev-test.internal executor 2):
>>>>> java.lang.ClassCastException: class
>>>>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be
>>>>> cast to class org.apache.iceberg.Table
>>>>> (org.apache.iceberg.spark.source.SerializableTableWithSize is in
>>>>> unnamed module of loader org.apache.spark.util.*ChildFirstURLClassLoader*
>>>>> @a41c33c; org.apache.iceberg.Table is in unnamed module of loader
>>>>> org.apache.spark.util.*ChildFirstURLClassLoader* @16f95afb)
>>>>>
>>>>> I see "ClassLoader @<some_id>" in the logs. Are those object IDs? (It's
>>>>> been a while since I worked with Java.) I'm wondering whether multiple
>>>>> instances of the same ClassLoader are being initialized by SC. Maybe
>>>>> running with -verbose:class or taking a heap dump would help verify?
>>>>>
>>>>> On Fri, Jan 12, 2024 at 4:38 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>
>>>>>> I think it looks like a version mismatch, perhaps between the SC
>>>>>> client and the server, or between where planning occurs and the
>>>>>> executors. The error is that `SerializableTableWithSize` is not a
>>>>>> subclass of `Table`, but it definitely should be. That sort of
>>>>>> problem is usually caused by classloading issues. Can you
>>>>>> double-check that you have only one Iceberg runtime in the
>>>>>> Environment tab of your Spark cluster?
>>>>>>
>>>>>> On Tue, Jan 9, 2024 at 4:57 PM Nirav Patel <nira...@gmail.com> wrote:
>>>>>>
>>>>>>> PS - the issue doesn't happen if we don't use spark-connect and
>>>>>>> instead just use spark-shell or pyspark, as the OP on GitHub noted
>>>>>>> as well.
>>>>>>> However, the stack trace doesn't seem to point to any class from
>>>>>>> the spark-connect jar (org.apache.spark:spark-connect_2.12:3.5.0).
>>>>>>>
>>>>>>> On Tue, Jan 9, 2024 at 4:52 PM Nirav Patel <nira...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> We are testing spark-connect with Iceberg.
>>>>>>>> We tried Spark 3.5 with the Iceberg 1.4.x versions (all of the
>>>>>>>> iceberg-spark-runtime-3.5_2.12-1.4.x.jar releases).
>>>>>>>>
>>>>>>>> With all of the 1.4.x jars we hit the following issue when running
>>>>>>>> Iceberg queries from a SparkSession created using spark-connect
>>>>>>>> (--remote "sc://remote-master-node"):
>>>>>>>>
>>>>>>>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to org.apache.iceberg.Table
>>>>>>>>   at org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>>>>>>>>   at org.apache.iceberg.spark.source.BatchDataReader.<init>(BatchDataReader.java:50)
>>>>>>>>   at org.apache.iceberg.spark.source.SparkColumnarReaderFactory.createColumnarReader(SparkColumnarReaderFactory.java:52)
>>>>>>>>   at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:79)
>>>>>>>>   at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>>>>>>>>   at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>>>>>>>>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>>>>>>>>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
>>>>>>>>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithKeys_0$(Unknown Source)
>>>>>>>>   at ...
>>>>>>>>
>>>>>>>> Someone else has reported this issue on GitHub as well:
>>>>>>>> https://github.com/apache/iceberg/issues/8978
>>>>>>>>
>>>>>>>> It's currently working with Spark 3.4 and Iceberg 1.3. However, it
>>>>>>>> would ideally be nice to get it working with Spark 3.5 as well,
>>>>>>>> since 3.5 has many improvements in spark-connect.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Nirav
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Tabular
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>
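On Nirav's question earlier in the thread about the `@a41c33c`-style suffixes: in the JVM's detailed ClassCastException message, that hex value is, to my understanding, the identity hash code of the loader instance, so two different suffixes for the same loader class do indicate two distinct ChildFirstURLClassLoader instances. A small hypothetical helper (the names are mine, not from any of the projects discussed) that prints the same loader identity for any class:

```java
public class LoaderDiag {
    // Render a class the way the JVM's ClassCastException message does:
    // "<class> loaded by <loaderClass>@<identity hash in hex>".
    // Classes from the bootstrap loader (e.g. java.lang.String) report "bootstrap".
    public static String describe(Class<?> c) {
        ClassLoader l = c.getClassLoader();
        if (l == null) {
            return c.getName() + " loaded by bootstrap";
        }
        return c.getName() + " loaded by "
                + l.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(l));
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class));
        System.out.println(describe(LoaderDiag.class));
    }
}
```

Logging something like `describe(org.apache.iceberg.Table.class)` from both driver and executor code would show whether the class was defined by one loader instance or several.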
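Ryan's suggestion to confirm there is only one Iceberg runtime on the classpath, and the `jar -tvf` check on the shaded catalog jar, can also be scripted. A sketch (class and method names are illustrative, not from any of the projects discussed) that lists jar entries under a package prefix; an empty result means the jar does not bundle classes from that package:

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

public class JarScan {
    // Return jar entry names starting with the given prefix,
    // e.g. "org/apache/iceberg/".
    public static List<String> entriesUnder(String jarPath, String prefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(e -> e.getName())
                    .filter(n -> n.startsWith(prefix))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarScan my-shaded-catalog.jar org/apache/iceberg/
        entriesUnder(args[0], args[1]).forEach(System.out::println);
    }
}
```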