yanxugithub opened a new issue, #14655:
URL: https://github.com/apache/iceberg/issues/14655

   ### Apache Iceberg version
   
   1.10.0 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   I ran into "java.lang.NullPointerException: Cannot invoke "org.apache.iceberg.types.Type.asVariantType()" because "iType" is null" with the following simple table:
   spark.sql("Create or replace Table customer ( customerId long, customerDetail variant) using iceberg tblproperties('format-version' = '3')")
   spark.sql("""
   insert into customer (customerId, customerDetail)
   values (1, parse_json('{"name":"customer 1"}'))
   """)
   
   spark.sql("describe table customer").show()
   +--------------+---------+-------+
   |      col_name|data_type|comment|
   +--------------+---------+-------+
   |    customerId|   bigint|   NULL|
   |customerDetail|  variant|   NULL|
   +--------------+---------+-------+
   
   spark.version
   '4.0.1'
   
   spark.sql("""select schema_of_variant(parse_json('{"name":"customer 1"}'))""").show()
   +----------------------------------------------------+
   |schema_of_variant(parse_json({"name":"customer 1"}))|
   +----------------------------------------------------+
   |                                OBJECT<name: STRING>|
   +----------------------------------------------------+
   
   
   
   When I ran spark.sql("select customerId from customer").show(), it failed with:
   SparkException: Job aborted due to stage failure: Task 0 in stage 32.0 failed 1 times, most recent failure: Lost task 0.0 in stage 32.0 (TID 32) (93bcdaad8379 executor driver): java.lang.NullPointerException: Cannot invoke "org.apache.iceberg.types.Type.asVariantType()" because "iType" is null
        at org.apache.iceberg.parquet.TypeWithSchemaVisitor.visit(TypeWithSchemaVisitor.java:67)
        at org.apache.iceberg.parquet.TypeWithSchemaVisitor.visitField(TypeWithSchemaVisitor.java:193)
        at org.apache.iceberg.parquet.TypeWithSchemaVisitor.visitFields(TypeWithSchemaVisitor.java:208)
        at org.apache.iceberg.parquet.TypeWithSchemaVisitor.visit(TypeWithSchemaVisitor.java:49)
        at org.apache.iceberg.parquet.ParquetSchemaUtil.pruneColumns(ParquetSchemaUtil.java:134)
        at org.apache.iceberg.parquet.ReadConf.<init>(ReadConf.java:82)
        at org.apache.iceberg.parquet.VectorizedParquetReader.init(VectorizedParquetReader.java:90)
        at org.apache.iceberg.parquet.VectorizedParquetReader.iterator(VectorizedParquetReader.java:99)
        at org.apache.iceberg.spark.source.BatchDataReader.open(BatchDataReader.java:126)
        at org.apache.iceberg.spark.source.BatchDataReader.open(BatchDataReader.java:43)
        at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:141)
        at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:148)
        at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:186)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1(DataSourceRDD.scala:72)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.$anonfun$hasNext$1$adapted(DataSourceRDD.scala:72)
        at scala.Option.exists(Option.scala:406)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:72)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:103)
        at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:72)
        at org...
   
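   The problem appears to be in org.apache.iceberg.parquet.ParquetSchemaUtil.pruneColumns: the NPE is thrown from TypeWithSchemaVisitor.visit while pruning columns for a query that selects only customerId and leaves out the variant column.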
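
One possible reading of the trace, not confirmed against the Iceberg source, is that the visitor walks the Parquet file schema next to the projected schema and dereferences the expected type of the variant column even though the projection no longer contains it. The toy model below only illustrates that failure shape in plain Python. Every name in it (`VariantType`, `visit_field`, `visit_field_guarded`) is hypothetical and none of it is Iceberg code or the project's fix.

```python
# Toy model of the suspected failure shape. NOT Iceberg code: all names are
# hypothetical and exist only to illustrate the reasoning in this report.

class VariantType:
    """Stand-in for an expected (table-side) variant type."""

    def as_variant_type(self):
        return self


def visit_field(expected_fields, name, file_kind):
    # Look up the expected type for a field found in the data file.
    # The lookup yields None when the column was projected away.
    expected = expected_fields.get(name)
    if file_kind == "variant":
        # Unguarded dereference: fails when `expected` is None, which mirrors
        # the '"iType" is null' message in the stack trace.
        return expected.as_variant_type()
    return expected


def visit_field_guarded(expected_fields, name, file_kind):
    # Hypothetical guard: skip file fields the projection does not request.
    if expected_fields.get(name) is None:
        return None
    return visit_field(expected_fields, name, file_kind)


# The data file holds both columns, but the query selects only customerId.
file_schema = {"customerId": "long", "customerDetail": "variant"}
projection = {"customerId": "long"}
```

In this model, `visit_field(projection, "customerDetail", "variant")` raises `AttributeError` (the Python analogue of the NPE), while `visit_field_guarded` returns `None` for the same arguments.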
   
   The Iceberg runtime is iceberg-spark-runtime-4.0_2.13-1.10.0.jar.
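
The steps above can be collected into a single script. This is a minimal sketch, assuming an existing PySpark session (Spark 4.0.1) that already has an Iceberg catalog configured and the runtime jar named above on its classpath. The constant names and `run_repro` are hypothetical helpers introduced here for convenience.

```python
# Consolidated reproduction of the steps in this report (a sketch).
# Assumes `spark` is a SparkSession with an Iceberg catalog configured;
# session setup is intentionally left out because it depends on the environment.

CREATE_TABLE = (
    "Create or replace Table customer ( customerId long, customerDetail variant) "
    "using iceberg tblproperties('format-version' = '3')"
)

INSERT_ROW = """
insert into customer (customerId, customerDetail)
values (1, parse_json('{"name":"customer 1"}'))
"""

# Selecting only the non-variant column is the query that failed in this report.
FAILING_QUERY = "select customerId from customer"

REPRO_STATEMENTS = [CREATE_TABLE, INSERT_ROW, FAILING_QUERY]


def run_repro(spark):
    """Run the statements in order; the last one raised the NPE in this report."""
    spark.sql(CREATE_TABLE)
    spark.sql(INSERT_ROW)
    spark.sql(FAILING_QUERY).show()
```

Calling `run_repro(spark)` in a session set up as described should replay the sequence that led to the exception.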
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

