Classloading does seem to be the issue, though it only happens with Spark
Connect 3.5 and Iceberg >= 1.4.

It's strange: as I mentioned in my previous email, after adding the Spark
property (spark.executor.userClassPathFirst=true) both classes get loaded by
the same classloader class, org.apache.spark.util.ChildFirstURLClassLoader.
Not sure why the error would still happen.
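
If I understand Java classloading correctly, though, the same loader class by
itself isn't enough: the JVM keys a class's runtime identity on the pair
(binary name, defining classloader instance), not on the loader's class. Two
ChildFirstURLClassLoader instances over the same classpath still define two
cast-incompatible copies of org.apache.iceberg.Table. A minimal sketch that
reproduces this outside Spark (the jar path is just a placeholder for any jar
containing the class):

    import java.net.URL;
    import java.net.URLClassLoader;

    public class ClassIdentityDemo {
      public static void main(String[] args) throws Exception {
        // Placeholder path; point this at any jar containing the class.
        URL[] cp = { new URL("file:///tmp/iceberg-spark-runtime-3.5_2.12-1.4.0.jar") };

        // Two loader instances over the exact same classpath, with no parent
        // delegation (parent = null), mimicking two child-first loaders.
        ClassLoader a = new URLClassLoader(cp, null);
        ClassLoader b = new URLClassLoader(cp, null);

        Class<?> tableA = Class.forName("org.apache.iceberg.Table", false, a);
        Class<?> tableB = Class.forName("org.apache.iceberg.Table", false, b);

        // Same name, same bytes, different defining loaders: the JVM treats
        // these as two distinct classes, so casting between them fails.
        System.out.println(tableA == tableB); // false
      }
    }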

java.lang.ClassCastException: class
org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to
class org.apache.iceberg.Table
(org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed
module of loader org.apache.spark.util.ChildFirstURLClassLoader @a41c33c;
org.apache.iceberg.Table is in unnamed module of loader
org.apache.spark.util.ChildFirstURLClassLoader @16f95afb)
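
That would also answer my earlier question about the @a41c33c / @16f95afb
suffixes: if I remember correctly, that is just the default Object.toString()
format, getClass().getName() + "@" + Integer.toHexString(hashCode()), so two
different hex values mean two distinct loader instances defined the two
classes. A quick sketch of the same check:

    // ClassLoader doesn't override hashCode(), so the hex suffix printed in
    // the error identifies the loader instance, not the loader class.
    ClassLoader loader = org.apache.iceberg.Table.class.getClassLoader();
    System.out.println(loader); // e.g. ...ChildFirstURLClassLoader@16f95afb
    System.out.println(Integer.toHexString(System.identityHashCode(loader)));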


On Tue, Jan 16, 2024 at 12:53 PM Ryan Blue <b...@tabular.io> wrote:

> It looks to me like the classloader is the problem. The "child first"
> classloader is apparently loading `Table`, but Spark is loading
> `SerializableTableWithSize` from the parent classloader. Because delegation
> isn't happening properly, you're getting two incompatible classes from the
> same classpath, depending on where a class was loaded for the first time.
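>
> To illustrate (a simplified sketch, not Spark's actual
> ChildFirstURLClassLoader): a child-first loader searches its own URLs before
> delegating, so it can end up defining a second copy of a class that its
> parent already loaded or resolved.
>
>     import java.net.URL;
>     import java.net.URLClassLoader;
>
>     class ChildFirstLoader extends URLClassLoader {
>       ChildFirstLoader(URL[] urls, ClassLoader parent) {
>         super(urls, parent);
>       }
>
>       @Override
>       protected Class<?> loadClass(String name, boolean resolve)
>           throws ClassNotFoundException {
>         synchronized (getClassLoadingLock(name)) {
>           Class<?> c = findLoadedClass(name);
>           if (c == null) {
>             try {
>               c = findClass(name);                // child first: our own URLs
>             } catch (ClassNotFoundException e) {
>               c = super.loadClass(name, resolve); // then fall back to parent
>             }
>           }
>           if (resolve) {
>             resolveClass(c);
>           }
>           return c;
>         }
>       }
>     }
>
> If both this loader and its parent can see the Iceberg jar, the `Table`
> defined here is a different class than the `Table` the parent used when it
> defined `SerializableTableWithSize`, so the cast fails.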
>
> On Fri, Jan 12, 2024 at 5:30 PM Nirav Patel <nira...@gmail.com> wrote:
>
>> It seems to be happening on an executor of the SC server, as I see the
>> error in the executor logs. We verified that there is only one version of
>> iceberg-spark-runtime on the cluster at the moment.
>> We do include a custom catalog implementation jar. Though it's a shaded
>> jar, I don't see "org/apache/iceberg/Table" or any other Iceberg classes
>> when I run "jar -tvf" on it.
>>
>> I see both jars in three Spark configs: spark.repl.local.jars,
>> spark.yarn.dist.jars, and spark.yarn.secondary.jars.
>>
>> I suspected a classloading issue as well, since the initial error pointed
>> to it:
>>
>> pyspark.errors.exceptions.connect.SparkConnectGrpcException:
>> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0
>> in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>> 0.0 (TID 3) (spark35-m.c.strivr-dev-test.internal executor 2):
>> java.lang.ClassCastException: class
>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to
>> class org.apache.iceberg.Table
>> (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed
>> module of loader org.apache.spark.util.MutableURLClassLoader @6819e13c;
>> org.apache.iceberg.Table is in unnamed module of loader
>> org.apache.spark.util.ChildFirstURLClassLoader @15fb0c43)
>>
>> Although ChildFirstURLClassLoader is a child of MutableURLClassLoader, the
>> error shouldn't be related to that. I still tried adding the Spark flag
>> (--conf "spark.executor.userClassPathFirst=true") when starting the Spark
>> Connect server. With it, both classes seem to get loaded by the same
>> ClassLoader class, but the error still happens:
>>
>> pyspark.errors.exceptions.connect.SparkConnectGrpcException:
>> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0
>> in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>> 0.0 (TID 3) (spark35-m.c.strivr-dev-test.internal executor 2):
>> java.lang.ClassCastException: class
>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to
>> class org.apache.iceberg.Table
>> (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed
>> module of loader org.apache.spark.util.ChildFirstURLClassLoader @a41c33c;
>> org.apache.iceberg.Table is in unnamed module of loader
>> org.apache.spark.util.ChildFirstURLClassLoader @16f95afb)
>>
>> I see ClassLoader @ <some_id> in the logs. Are those object IDs? (It's
>> been a while since I worked with Java.) I'm wondering if multiple
>> instances of the same ClassLoader are being initialized by SC. Maybe
>> running with -verbose:class or taking a heap dump would help verify?
>>
>>
>> On Fri, Jan 12, 2024 at 4:38 PM Ryan Blue <b...@tabular.io> wrote:
>>
>>> I think it looks like a version mismatch, perhaps between the SC client
>>> and the server or between where planning occurs and the executors. The
>>> error is that the `SerializableTableWithSize` is not a subclass of `Table`,
>>> but it definitely should be. That sort of problem is usually caused by
>>> class loading issues. Can you double-check that you have only one Iceberg
>>> runtime in the Environment tab of your Spark cluster?
>>>
>>> On Tue, Jan 9, 2024 at 4:57 PM Nirav Patel <nira...@gmail.com> wrote:
>>>
>>>> PS - The issue doesn't happen if we don't use spark-connect and instead
>>>> just use spark-shell or pyspark, as the OP on GitHub said as well.
>>>> However, the stack trace doesn't seem to point to any class from the
>>>> spark-connect jar (org.apache.spark:spark-connect_2.12:3.5.0).
>>>>
>>>> On Tue, Jan 9, 2024 at 4:52 PM Nirav Patel <nira...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> We are testing spark-connect with Iceberg.
>>>>> We tried Spark 3.5 with the Iceberg 1.4.x versions (all of the
>>>>> iceberg-spark-runtime-3.5_2.12-1.4.x.jar builds).
>>>>>
>>>>> With all of the 1.4.x jars we hit the following issue when running
>>>>> Iceberg queries from a SparkSession created using spark-connect
>>>>> (--remote "sc://remote-master-node"):
>>>>>
>>>>> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to org.apache.iceberg.Table
>>>>>   at org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>>>>>   at org.apache.iceberg.spark.source.BatchDataReader.<init>(BatchDataReader.java:50)
>>>>>   at org.apache.iceberg.spark.source.SparkColumnarReaderFactory.createColumnarReader(SparkColumnarReaderFactory.java:52)
>>>>>   at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:79)
>>>>>   at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>>>>>   at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>>>>>   at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>>>>>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
>>>>>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithKeys_0$(Unknown Source)
>>>>>   at ...
>>>>>
>>>>> Someone else has reported this issue on GitHub as well:
>>>>> https://github.com/apache/iceberg/issues/8978
>>>>>
>>>>> It currently works with Spark 3.4 and Iceberg 1.3. Ideally, though, it
>>>>> would be nice to get it working with Spark 3.5 as well, since 3.5 has
>>>>> many improvements to spark-connect.
>>>>>
>>>>> Thanks
>>>>> Nirav
>>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
>
> --
> Ryan Blue
> Tabular
>
