szehon-ho opened a new issue, #5543: URL: https://github.com/apache/iceberg/issues/5543
### Apache Iceberg version

main (development)

### Query engine

Spark

### Please describe the bug 🐞

I found this problem while working on https://github.com/apache/iceberg/pull/5376#discussion_r934960703, which now attempts to convert metrics to human-readable ones and hit an exception, so I am reporting the problem here.

See the test `TestIcebergSourceTablesBase::testFilesTableWithSnapshotIdInheritance`:
https://github.com/apache/iceberg/blob/5f5c9235c10ed4a711a64de880491b3ae4f348ec/spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java#L466

**Setup:** The Parquet table is partitioned, so when we insert data into it, the data file contains only one column (`data`). The `id` column is the partition column, so it does not exist in the file.

**Code flow:** The import code (via `TableMigrationUtil::listPartition` -> `TableMigrationUtil::getParquetMetrics` -> `ParquetUtil::footerMetrics`) performs the following steps:

1. Assign field IDs.
2. Calculate metrics.

In the first step, it sees that the Parquet file schema does not have field IDs (expected) and assigns them using `ParquetSchemaUtil::addFallbackIds`, which starts at 1, so the `data` column now has field ID 1.

The second step calculates metrics for the `data` column and puts them in the map keyed by id=1. However, in the destination Iceberg table schema, `id`=1 and `data`=2. So when we try to read the metrics, they are attributed to the wrong columns.
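To make the ID mismatch concrete, here is a minimal Python sketch (not the Iceberg API; all names are illustrative) of how fallback IDs assigned in file-column order diverge from the table schema IDs when a partition column is absent from the data file, and how a name-based remap would restore the intended mapping:

```python
def add_fallback_ids(file_columns):
    """Mimic ParquetSchemaUtil::addFallbackIds: ids 1..n in file order."""
    return {name: i for i, name in enumerate(file_columns, start=1)}

# The partitioned file contains only the 'data' column; 'id' is the
# partition column and is not stored in the file.
file_columns = ["data"]
fallback_ids = add_fallback_ids(file_columns)        # {'data': 1}

# Metrics (as in footerMetrics) end up keyed by the fallback id.
metrics_by_id = {fallback_ids["data"]: {"value_count": 100}}

# The destination Iceberg table schema numbers the columns differently.
table_schema_ids = {"id": 1, "data": 2}

# Reading metrics against the table schema attributes them wrongly:
wrong = {name: metrics_by_id.get(fid) for name, fid in table_schema_ids.items()}
# 'id' picks up the metrics that belong to 'data'; 'data' gets nothing.

# A name-based remap (one possible fix) re-keys metrics to table schema ids:
remapped = {table_schema_ids[name]: metrics_by_id[fid]
            for name, fid in fallback_ids.items()}
```

This is only a model of the behavior described above; the real fix would need to remap the metric keys (or assign file IDs from the table schema) inside the migration path.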
