It looks to me like the classloader is the problem. The "child first" classloader is apparently loading `Table`, but Spark is loading `SerializableTableWithSize` from the parent classloader. Because delegation isn't happening properly, you end up with incompatible copies of classes from the same classpath, depending on which loader happened to load each class first.
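To see the mechanism in isolation, here's a minimal sketch (the class name com.example.Table, the jar path, and the no-arg constructor are all hypothetical) of how two loaders that each define a class with the same name produce types that are not interchangeable, which is exactly what the "X cannot be cast to X" message below is reporting:

    import java.net.URL;
    import java.net.URLClassLoader;

    public class TwoLoaderDemo {
      public static void main(String[] args) throws Exception {
        // Hypothetical jar that contains com.example.Table
        URL[] jar = { new URL("file:///path/to/table.jar") };

        // Both loaders use the bootstrap loader as parent, so neither delegates
        // to the other and each one defines its own copy of the class.
        ClassLoader a = new URLClassLoader(jar, null);
        ClassLoader b = new URLClassLoader(jar, null);

        Class<?> fromA = Class.forName("com.example.Table", true, a);
        Class<?> fromB = Class.forName("com.example.Table", true, b);

        // Same fully qualified name, but different runtime types
        System.out.println(fromA == fromB);  // false

        // An instance defined by loader A is not assignable to the type defined
        // by loader B, so this throws ClassCastException with the familiar
        // "class X cannot be cast to class X" message.
        Object table = fromA.getDeclaredConstructor().newInstance();
        fromB.cast(table);
      }
    }

Running the executor JVM with -verbose:class, as you suggested, will print which loader defined each class, so that's a quick way to confirm whether SerializableTableWithSize and Table really came from different loader instances.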
On Fri, Jan 12, 2024 at 5:30 PM Nirav Patel <[email protected]> wrote:

> It seems to be happening on the executor of the SC server, as I see the error in the executor logs. We did verify that there was only one version of iceberg-spark-runtime at the moment.
> We do include a custom catalog impl jar. Although it's a shaded jar, I don't see "org/apache/iceberg/Table" or other Iceberg classes when I do "jar -tvf" on it.
>
> I see both jars in 3 Spark configs: spark.repl.local.jars, spark.yarn.dist.jars, and spark.yarn.secondary.jars.
>
> I suspected a classloading issue as well, since the initial error was pointing to it:
>
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (spark35-m.c.strivr-dev-test.internal executor 2): java.lang.ClassCastException: class org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to class org.apache.iceberg.Table (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed module of loader org.apache.spark.util.MutableURLClassLoader @6819e13c; org.apache.iceberg.Table is in unnamed module of loader org.apache.spark.util.ChildFirstURLClassLoader @15fb0c43)
>
> Although ChildFirstURLClassLoader is a child of MutableURLClassLoader, the error shouldn't be related to that. I still tried adding a Spark flag (--conf "spark.executor.userClassPathFirst=true") when starting the Spark Connect server. It seems both classes then get loaded by the same ClassLoader class, but the error still happens:
>
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (spark35-m.c.strivr-dev-test.internal executor 2): java.lang.ClassCastException: class org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to class org.apache.iceberg.Table (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed module of loader org.apache.spark.util.ChildFirstURLClassLoader @a41c33c; org.apache.iceberg.Table is in unnamed module of loader org.apache.spark.util.ChildFirstURLClassLoader @16f95afb)
>
> I see "ClassLoader @<some_id>" in the logs. Are those object IDs? (It's been a while since I worked with Java.) I'm wondering whether multiple instances of the same ClassLoader are being initialized by SC. Maybe running with -verbose:class or taking a heap dump would help verify?
>
> On Fri, Jan 12, 2024 at 4:38 PM Ryan Blue <[email protected]> wrote:
>
>> I think it looks like a version mismatch, perhaps between the SC client and the server or between where planning occurs and the executors. The error is that the `SerializableTableWithSize` is not a subclass of `Table`, but it definitely should be. That sort of problem is usually caused by class loading issues. Can you double-check that you have only one Iceberg runtime in the Environment tab of your Spark cluster?
>>
>> On Tue, Jan 9, 2024 at 4:57 PM Nirav Patel <[email protected]> wrote:
>>
>>> PS - the issue doesn't happen if we don't use spark-connect and instead just use spark-shell or pyspark, as the OP on GitHub noted as well. However, the stack trace doesn't seem to point to any class from the spark-connect jar (org.apache.spark:spark-connect_2.12:3.5.0).
>>>
>>> On Tue, Jan 9, 2024 at 4:52 PM Nirav Patel <[email protected]> wrote:
>>>
>>>> Hi,
>>>> We are testing spark-connect with Iceberg.
>>>> We tried Spark 3.5 with Iceberg 1.4.x versions (all of iceberg-spark-runtime-3.5_2.12-1.4.x.jar).
>>>>
>>>> With all of the 1.4.x jars we are hitting the following issue when running Iceberg queries from a SparkSession created using spark-connect (--remote "sc://remote-master-node"):
>>>>
>>>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to org.apache.iceberg.Table
>>>> at org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>>>> at org.apache.iceberg.spark.source.BatchDataReader.<init>(BatchDataReader.java:50)
>>>> at org.apache.iceberg.spark.source.SparkColumnarReaderFactory.createColumnarReader(SparkColumnarReaderFactory.java:52)
>>>> at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:79)
>>>> at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>>>> at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>>>> at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>>>> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
>>>> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithKeys_0$(Unknown Source)
>>>> at
>>>>
>>>> Someone else has reported this issue on GitHub as well: https://github.com/apache/iceberg/issues/8978
>>>>
>>>> It's currently working with Spark 3.4 and Iceberg 1.3. However, ideally it'd be nice to get it working with Spark 3.5 as well, since 3.5 has many improvements in spark-connect.
>>>>
>>>> Thanks,
>>>> Nirav
>>>
>>
>> --
>> Ryan Blue
>> Tabular
>

--
Ryan Blue
Tabular
