laskoviymishka commented on issue #602:
URL: https://github.com/apache/iceberg-go/issues/602#issuecomment-4063879892

   I've been looking into this for a while and I think it's more feasible than 
it might seem — iceberg-go already has most of the plumbing in place. The 
`snapshotProducer` supports data files + position delete files in one snapshot, 
`DataFileBuilder.EqualityFieldIDs()` exists, and the metrics infrastructure 
already handles equality deletes.
   
   My thinking is to approach this incrementally:
   
   1. Start with the RowDelta API surface itself: a builder on `Transaction` 
that commits data files + delete files in one snapshot. This would work 
immediately with position deletes, which are already supported end-to-end.
   2. Then add equality delete file writing (the writer, schema projection, 
wiring into the snapshot producer).
   3. Then equality delete reading in the scanner (the hardest part — 
hash-based anti-join + sequence number filtering).
   
   I have a concrete use case driving this — CDC replication from Postgres to 
Iceberg via [Transferia](https://github.com/transferia/iceberg) 
([transferia/iceberg#4](https://github.com/transferia/iceberg/issues/4)). I'm 
working on a related multi-table commit feature (#784) right now and plan to 
pick this up next. Happy to share a more detailed breakdown or discuss API 
design before starting.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

