[GitHub] [iceberg] manuzhang opened a new issue, #8655: Spark failed to read imported parquet file

via GitHub Tue, 26 Sep 2023 08:35:21 -0700


manuzhang opened a new issue, #8655:
URL: https://github.com/apache/iceberg/issues/8655


   ### Apache Iceberg version
   
   1.2.1
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   1. Import a Parquet table into an Iceberg table with the `add_files` procedure.
   2. Reading the Iceberg table then fails with the following exception:
   ```
   Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 0.0 failed 2 times, most recent failure: Lost task 7.1 in stage 0.0 (TID 152) (hdc34-lvs05-01-0310-6208-042-tess0172.stratus.lvs.ebay.com executor 3): java.lang.IllegalStateException: Value at index is null
        at org.apache.iceberg.shaded.org.apache.arrow.vector.TimeStampVector.get(TimeStampVector.java:74)
        at org.apache.iceberg.arrow.vectorized.GenericArrowVectorAccessorFactory$TimestampMicroTzAccessor.getLong(GenericArrowVectorAccessorFactory.java:501)
        at org.apache.iceberg.spark.data.vectorized.IcebergArrowColumnVector.getLong(IcebergArrowColumnVector.java:101)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:756)
   ```
   3. I narrowed the issue down to a timestamp column that contains null values.
   4. Inserting the Parquet table's data into an Iceberg table (instead of importing the files) does not trigger the issue.
   5. I compared the table metrics of the two tables. The only difference is that the column's `lower_bound` and `upper_bound` are null for the imported file.
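
For reference, steps 1, 2 and 5 could be sketched roughly as below. This is only an illustration, not the exact commands used in this report: the catalog name `my_catalog`, the table `db.events`, and the path `/data/events` are hypothetical placeholders, and the helper functions are made up for this sketch. The `add_files` procedure and the `files` metadata table (with its `null_value_counts`, `lower_bounds` and `upper_bounds` columns) are the Iceberg features assumed here.

```python
def add_files_sql(catalog: str, target_table: str, parquet_path: str) -> str:
    """Build a CALL statement for Iceberg's add_files Spark procedure.

    The source is addressed as a path-based Parquet table.
    All argument values are placeholders chosen by the caller.
    """
    return (
        f"CALL {catalog}.system.add_files("
        f"table => '{target_table}', "
        f"source_table => '`parquet`.`{parquet_path}`')"
    )


def file_metrics_sql(catalog: str, target_table: str) -> str:
    """Build a query on the `files` metadata table to compare per-file metrics."""
    return (
        "SELECT file_path, null_value_counts, lower_bounds, upper_bounds "
        f"FROM {catalog}.{target_table}.files"
    )


if __name__ == "__main__":
    # Hypothetical names; replace with the real catalog, table, and path.
    print(add_files_sql("my_catalog", "db.events", "/data/events"))
    print(file_metrics_sql("my_catalog", "db.events"))
    # With an active SparkSession named `spark`, the flow would be roughly:
    #   spark.sql(add_files_sql(...))               # step 1: import the files
    #   spark.table("my_catalog.db.events").show()  # step 2: read the table
    #   spark.sql(file_metrics_sql(...)).show(truncate=False)  # step 5
```

Running the metrics query against both the imported table and a table populated via `INSERT` should make the difference in bounds for the timestamp column visible side by side.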


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

