rajan-v opened a new issue, #5673:
URL: https://github.com/apache/iceberg/issues/5673

   ### Feature Request / Improvement
   
   Add support for another method in `DeleteFiles`, which maps each data file to a file containing the offsets to delete from it:
   _DeleteFiles deleteOffsetFromDataFile(Map<String, String> dataFileAndOffsetFileMap)_
   https://iceberg.apache.org/javadoc/master/org/apache/iceberg/DeleteFiles.html
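   For context, Iceberg's v2 format already has position delete files whose rows are `(file_path, pos)` pairs, and the table spec requires those rows to be sorted by `file_path` and then `pos`. The following is a minimal sketch of how the `{datafile, offset}` pairs behind the proposed `deleteOffsetFromDataFile` could be flattened into rows in that order. It assumes the offsets named by `dataFileAndOffsetFileMap` have already been read into memory as zero-based row positions; the `Map<String, List<Long>>` input and the `PositionDeleteSketch` class are hypothetical, not part of the Iceberg API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of the Iceberg API.
final class PositionDeleteSketch {

    // One row of a position delete file: a data file path and a zero-based row position.
    record PositionDelete(String filePath, long pos) {}

    // Flattens {data file -> offsets} into position-delete rows, dropping duplicate
    // offsets and sorting by file path, then by position.
    static List<PositionDelete> toPositionDeletes(Map<String, List<Long>> offsetsByDataFile) {
        List<PositionDelete> rows = new ArrayList<>();
        for (Map.Entry<String, List<Long>> entry : offsetsByDataFile.entrySet()) {
            // TreeSet removes duplicate offsets for the same data file.
            for (long pos : new TreeSet<>(entry.getValue())) {
                if (pos < 0) {
                    throw new IllegalArgumentException(
                            "negative offset " + pos + " for " + entry.getKey());
                }
                rows.add(new PositionDelete(entry.getKey(), pos));
            }
        }
        rows.sort(Comparator.comparing(PositionDelete::filePath)
                .thenComparingLong(PositionDelete::pos));
        return rows;
    }
}
```

   Sorting once at the end keeps the result independent of the map's iteration order.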
   
   **Context**:
   If we process a data file directly and determine that records at certain offsets in it need to be deleted, we need an Iceberg interface through which to pass that {datafile, offset} information. Until now, without upsert support in catalogs such as Hive tables, many legacy applications have been scanning files, deriving business-related details, and performing updates the traditional way, by rewriting complete HDFS files. All of that computation could be reused if Iceberg supported this API.
   
   
   **Option-1 (Eventual Deletes)**
   This can be thought of as eventual deletes: the delete flow only updates the delete file structs and skips updating the manifests and their statistics.
   Any subsequent snapshot commit can then compute the correct set of deletions and fix the manifest statistics accordingly.
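   Under Option-1, a later commit would have to work out how many rows each data file still contributes. A minimal sketch of that bookkeeping for a single data file, assuming its total record count and the positions recorded against it are known (the `LiveRowCount` class and `liveRows` method are hypothetical names): duplicate positions must be counted only once, and positions outside the file must not be counted at all.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the Iceberg API.
final class LiveRowCount {

    // Returns how many rows of a data file remain visible after applying position deletes.
    // Duplicate positions are counted once; positions outside [0, recordCount) are ignored.
    static long liveRows(long recordCount, Collection<Long> deletedPositions) {
        Set<Long> distinctInRange = new HashSet<>();
        for (long pos : deletedPositions) {
            if (pos >= 0 && pos < recordCount) {
                distinctInRange.add(pos);
            }
        }
        return recordCount - distinctInRange.size();
    }
}
```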
   
   **Option-2 (Actual Deletes)**
   Go through the standard delete flow and apply the offset deletes.
   
   
   I experimented with executing deletes on Iceberg tables and then replacing or removing the immutable delete files generated in the respective directories, and was able to validate that those delete filters were applied at query time.
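   The read-time effect validated above amounts to skipping rows whose position appears in a delete file. A minimal sketch of that merge-on-read filter for one data file, assuming rows are read in file order so that a row's index equals its position (`MergeOnReadSketch` and `applyPositionDeletes` are hypothetical names, not Iceberg reader APIs):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the Iceberg API.
final class MergeOnReadSketch {

    // Returns the rows of a single data file that survive position deletes.
    // Assumes rows are in file order, so a row's list index is its position.
    static <T> List<T> applyPositionDeletes(List<T> rows, Set<Long> deletedPositions) {
        List<T> live = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            // Widen to long so the lookup matches the Set<Long> element type.
            if (!deletedPositions.contains((long) i)) {
                live.add(rows.get(i));
            }
        }
        return live;
    }
}
```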
   
   
   ### Query engine
   
   Spark

