prodeezy opened a new issue #2962:
URL: https://github.com/apache/iceberg/issues/2962

As part of https://github.com/apache/iceberg/issues/1441, the Parquet version was updated from 1.11.0 to 1.11.1. After rebasing our internal version onto the latest changes from master, we found that certain fields written by Iceberg using Parquet v1.11.0 are not readable by Iceberg built against Parquet v1.11.1.

**Error:**
```
java.lang.IllegalArgumentException: [segmentMembership, map, key] required binary key (STRING) = 9 is not in the store: [[identityMap, map, key] required binary key (STRING) = 3, [identityMap, map, value, list, element, id] optional binary id (STRING) = 7, [identityMap, map, value, list, element, authenticatedState] optional binary authenticatedState (STRING) = 6, [identityMap, map, value, list, element, primary] optional boolean primary = 8] 4
    at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ColumnChunkPageReadStore.getPageReader(ColumnChunkPageReadStore.java:231)
    at org.apache.iceberg.parquet.ParquetValueReaders$PrimitiveReader.setPageSource(ParquetValueReaders.java:185)
    at org.apache.iceberg.parquet.ParquetValueReaders$RepeatedKeyValueReader.setPageSource(ParquetValueReaders.java:529)
    at org.apache.iceberg.parquet.ParquetValueReaders$StructReader.setPageSource(ParquetValueReaders.java:685)
    at org.apache.iceberg.parquet.ParquetReader$FileIterator.advance(ParquetReader.java:142)
    at org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:112)
    at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:66)
    at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:50)
    at org.apache.iceberg.spark.source.BaseDataReader.next(BaseDataReader.java:87)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:49)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:640)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:117)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:116)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1560)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:146)
    at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
    at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
    at org.apache.spark.scheduler.Task.run(Task.scala:112)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1526)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```

The field in question has this form:
```
 |-- segmentMembership: map (nullable = true)
 |    |-- key: string
 |    |-- value: map (valueContainsNull = true)
 |    |    |-- key: string
 |    |    |-- value: struct (valueContainsNull = true)
 |    |    |    |-- payload: struct (nullable = true)
 |    |    |    |    |-- boolA: boolean (nullable = true)
 |    |    |    |    |-- doubleValueA: double (nullable = true)
 |    |    |    |    |-- doubleValueB: double (nullable = true)
 |    |    |    |    |-- stringValueA: string (nullable = true)
 |    |    |    |    |-- stringValueB: string (nullable = true)
 |    |    |    |-- status: string (nullable = true)
 |    |    |    |-- lastQualificationTime: timestamp (nullable = true)
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
