PS - the issue doesn't happen if we don't use spark-connect and instead just use spark-shell or pyspark, as the OP on github said as well. However, the stack trace doesn't seem to point to any class from the spark-connect jar (org.apache.spark:spark-connect_2.12:3.5.0).
On Tue, Jan 9, 2024 at 4:52 PM Nirav Patel <nira...@gmail.com> wrote:
> Hi,
> We are testing spark-connect with iceberg. We tried spark 3.5 with
> iceberg 1.4.x versions (all of iceberg-spark-runtime-3.5_2.12-1.4.x.jar).
>
> With all the 1.4.x jars we are having the following issue when running
> iceberg queries from a SparkSession created using spark-connect
> (--remote "sc://remote-master-node"):
>
> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to org.apache.iceberg.Table
>     at org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>     at org.apache.iceberg.spark.source.BatchDataReader.<init>(BatchDataReader.java:50)
>     at org.apache.iceberg.spark.source.SparkColumnarReaderFactory.createColumnarReader(SparkColumnarReaderFactory.java:52)
>     at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:79)
>     at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>     at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.hashAgg_doAggregateWithKeys_0$(Unknown Source)
>     at ...
>
> Someone else has reported this issue on github as well:
> https://github.com/apache/iceberg/issues/8978
>
> It's currently working with spark 3.4 and iceberg 1.3. However, ideally
> it'd be nice to get it working with spark 3.5 as well, since 3.5 has many
> improvements in spark-connect.
>
> Thanks
> Nirav
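For anyone trying to reproduce: a rough sketch of the two launch paths, based on the details quoted above. The hostname, catalog name, and exact 1.4.x patch version are placeholders, not the reporter's actual values.

```shell
# Working path: plain pyspark (or spark-shell) with the Iceberg runtime jar
# on the classpath. Iceberg queries run fine here.
pyspark \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.2 \
  --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog

# Failing path: the same query issued through a Spark Connect session
# (the remote server carries the Iceberg runtime jar). Scans then fail
# on the executors with the SerializableTableWithSize -> Table cast error.
pyspark --remote "sc://remote-master-node"
```

In both cases the query itself is an ordinary table scan (e.g. `spark.sql("SELECT count(*) FROM my_catalog.db.t").show()`); only the session creation differs.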