Shane-Yu commented on issue #5404: URL: https://github.com/apache/iceberg/issues/5404#issuecomment-1212867591
I also hit this problem in the same case. It is not caused by "some delete files associated with the data file".

Add a log statement at the end of https://github.com/apache/iceberg/blob/5a15efc070ab59eeda6343998aa065c0c9892c5c/core/src/main/java/org/apache/iceberg/DeleteFileIndex.java#L151 to print the data file path, the delete file path, and the lower and upper bounds. You will see that the lower and upper bounds of the file path are not complete file paths: they are truncated to 16 characters. This can lead to false positives when determining whether a data file is referenced by a delete file.

From the source code https://github.com/apache/iceberg/blob/5a15efc070ab59eeda6343998aa065c0c9892c5c/core/src/main/java/org/apache/iceberg/MetricsConfig.java#L52 you can see that DEFAULT_WRITE_METRICS_MODE_DEFAULT is truncate(16). The lower and upper bounds of the file path were truncated when the data file was generated, which leads to the misjudgment when committing the data rewrite.

To resolve this problem, set a table property like this: ` alter table iceberg_table set tblproperties ( 'write.metadata.metrics.default'='full' );`
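To illustrate why truncated bounds produce false positives, here is a minimal Python sketch. It is not Iceberg code: the paths are hypothetical, the helper names are made up for this example, and the upper-bound rule (keep 16 characters, then increment the last one) is only an approximation of what truncate(16) does.

```python
def truncate_lower(value: str, width: int = 16) -> str:
    """Lower bound: keep only the first `width` characters."""
    return value[:width]


def truncate_upper(value: str, width: int = 16) -> str:
    """Upper bound: keep the first `width` characters and bump the last one,
    so the result still sorts at or above the original value.
    (Overflow of the last character is ignored in this sketch.)"""
    if len(value) <= width:
        return value
    prefix = value[:width]
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def may_reference(data_path: str, lower: str, upper: str) -> bool:
    """True when data_path falls inside the [lower, upper] file_path range
    recorded for a delete file."""
    return lower <= data_path <= upper


# Hypothetical paths: the delete file only targets file-a.
deleted_from = "hdfs://namenode/warehouse/db/tbl/data/file-a.parquet"
unrelated = "hdfs://namenode/warehouse/db/tbl/data/file-b.parquet"

# With 'full' metrics the bounds are the complete path.
full_match = may_reference(unrelated, deleted_from, deleted_from)

# With truncate(16) the bounds collapse to a shared 16-character prefix.
trunc_lower = truncate_lower(deleted_from)
trunc_upper = truncate_upper(deleted_from)
truncated_match = may_reference(unrelated, trunc_lower, trunc_upper)

print(trunc_lower, trunc_upper, full_match, truncated_match)
```

In this sketch every path that starts with the same 16 characters falls inside the truncated range, so the unrelated data file looks like it might be referenced by the delete file, while the full bounds rule it out.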
