rajan-v opened a new issue, #5673: URL: https://github.com/apache/iceberg/issues/5673
### Feature Request / Improvement

Support for an additional method in `DeleteFiles`:

_DeleteFiles deleteOffsetFromDataFile(Map<String, String> dataFileAndOffsetFileMap)_

https://iceberg.apache.org/javadoc/master/org/apache/iceberg/DeleteFiles.html

**Context**: If we process a data file directly and determine that records at certain offsets in it must be deleted, we need an interface in Iceberg to pass that {datafile, offset} information. Until now, without upsert support in catalogs such as Hive tables, many legacy applications have scanned files to derive business-related details and applied updates the traditional way, by rewriting complete HDFS files. All of that computation could be reused if Iceberg supported this API.

**Option-1 (Eventual Deletes)**: This can be thought of as eventual deletes: the delete flow only updates the delete data files struct and skips updating the manifest and the stats around it. Any new snapshot commit can then compute the correct set of deletions and fix the manifest stats with respect to them.

**Option-2 (Actual Deletes)**: Go through the standard deletion flow and apply the offset deletion.

I ran some experiments: executing deletes on Iceberg tables, then replacing or removing the immutable delete_files generated in the respective directories. I was able to validate that those delete filters were applied while querying.

### Query engine

Spark
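To make the {datafile, offset} input more concrete, here is a minimal sketch in plain Java with no Iceberg dependency. It assumes the offsets are row positions within each data file and flattens them into (file path, position) rows sorted by file path and then position, which is the layout Iceberg position delete files use. The class `OffsetDeleteSketch`, the type `PositionDelete`, and the method `toPositionDeletes` are hypothetical names for illustration only. They are not part of the Iceberg API, and the `Map<String, List<Long>>` input is an assumption, since the request above does not define what the map values contain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these names exist in Iceberg.
class OffsetDeleteSketch {

    // One (file_path, pos) pair, mirroring a row of a position delete file.
    static final class PositionDelete {
        final String filePath;
        final long pos;

        PositionDelete(String filePath, long pos) {
            this.filePath = filePath;
            this.pos = pos;
        }

        @Override
        public String toString() {
            return filePath + "@" + pos;
        }
    }

    // Flattens {data file -> row positions} into rows sorted by file path, then position.
    static List<PositionDelete> toPositionDeletes(Map<String, List<Long>> offsetsByDataFile) {
        List<PositionDelete> rows = new ArrayList<>();
        for (Map.Entry<String, List<Long>> entry : offsetsByDataFile.entrySet()) {
            for (long pos : entry.getValue()) {
                if (pos < 0) {
                    throw new IllegalArgumentException("Negative position for " + entry.getKey());
                }
                rows.add(new PositionDelete(entry.getKey(), pos));
            }
        }
        rows.sort(Comparator.comparing((PositionDelete d) -> d.filePath)
                .thenComparingLong(d -> d.pos));
        return rows;
    }

    public static void main(String[] args) {
        // Assumed input: offsets derived by an external job that scanned the files directly.
        Map<String, List<Long>> offsets = new HashMap<>();
        offsets.put("hdfs://warehouse/db/tbl/data/b.parquet", Arrays.asList(7L, 2L));
        offsets.put("hdfs://warehouse/db/tbl/data/a.parquet", Arrays.asList(5L));
        for (PositionDelete row : toPositionDeletes(offsets)) {
            System.out.println(row);
        }
    }
}
```

With input in this shape, Option-2 would amount to writing the sorted rows out as a delete file and committing it through the standard flow. How that commit is exposed is exactly what this request leaves open.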
