szehon-ho opened a new issue, #5543:
URL: https://github.com/apache/iceberg/issues/5543

   ### Apache Iceberg version
   
   main (development)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   I found this problem while working on 
https://github.com/apache/iceberg/pull/5376#discussion_r934960703, which now 
attempts to convert metrics to a readable form and hit an exception there, so I 
am reporting the problem here.
   
   See the test:  
TestIcebergSourceTablesBase::testFilesTableWithSnapshotIdInheritance 
https://github.com/apache/iceberg/blob/5f5c9235c10ed4a711a64de880491b3ae4f348ec/spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestIcebergSourceTablesBase.java#L466
   
   Setup:  
   The Parquet table is partitioned, so when we insert data into that 
table, the data file contains only one column (data).  Column "id" is the 
partition column, so it does not exist in the file.
   
   Code Flow:  
   The import code seems to do the following steps (via 
TableMigrationUtil::listPartition -> TableMigrationUtil::getParquetMetrics -> 
ParquetUtil::footerMetrics()):
   1. Assign field IDs
   2. Calculate metrics
   
   In the first step, it sees that the Parquet file schema does not have IDs 
(expected) and assigns them using ParquetSchemaUtil::addFallbackIds, which 
starts at 1, so the data column now has field ID 1.
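
A minimal sketch of the ID assignment described above. It is a simplified model only: it numbers columns by position starting at 1, as the description says, and does not call the real ParquetSchemaUtil::addFallbackIds. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of positional fallback ID assignment; not Iceberg code.
class FallbackIdSketch {
  // Assigns IDs 1, 2, 3, ... to the columns in the order they appear in the file.
  static Map<String, Integer> assignFallbackIds(List<String> fileColumns) {
    Map<String, Integer> ids = new LinkedHashMap<>();
    int nextId = 1;
    for (String column : fileColumns) {
      ids.put(column, nextId++);
    }
    return ids;
  }

  public static void main(String[] args) {
    // The data file of the partitioned table holds only the 'data' column.
    System.out.println(assignFallbackIds(List.of("data"))); // prints {data=1}
  }
}
```

Because the partition column is absent from the file, 'data' is the first column seen and receives ID 1 in this model.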
   
   The second step calculates metrics for the 'data' column and puts them in 
the map under id=1.
   
   However, in the Iceberg destination table schema, id=1 and data=2.  So 
when we try to read the metrics, they are attributed to the wrong column (id 
instead of data) and are not correct.
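
The mismatch can be reproduced with plain maps. The sketch below is a model under stated assumptions, not Iceberg code: the metric value (a value count of 3) and all class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of reading ID-keyed metrics against the table schema.
class MetricsIdMismatchSketch {
  // Returns the table column that a metric keyed by fieldId is attributed to.
  static String columnForMetric(Map<Integer, String> tableSchema, int fieldId) {
    return tableSchema.get(fieldId);
  }

  public static void main(String[] args) {
    // Destination table schema from the report: id=1, data=2.
    Map<Integer, String> tableSchema = new HashMap<>();
    tableSchema.put(1, "id");
    tableSchema.put(2, "data");

    // Metric computed for the file's 'data' column but keyed by fallback ID 1.
    // The count of 3 is an invented value for the example.
    Map<Integer, Long> valueCounts = new HashMap<>();
    valueCounts.put(1, 3L);

    for (Map.Entry<Integer, Long> entry : valueCounts.entrySet()) {
      String column = columnForMetric(tableSchema, entry.getKey());
      System.out.println(column + " -> " + entry.getValue()); // prints id -> 3
    }
  }
}
```

The value was computed from 'data', yet the lookup by field ID resolves it to 'id', which is the inconsistency reported here.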
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

