singhpk234 opened a new pull request, #5888:
URL: https://github.com/apache/iceberg/pull/5888
### About the change :
Revert a compaction commit when it conflicts with non-compaction writes.
* Tag the snapshot created by the compaction process (for example, by adding
an `is_compaction` flag to the snapshot summary) so that it can be identified
later.
While committing the pending updates of the transaction, check if the
current snapshot conflicts with the transaction updates.
* If the current snapshot conflicts and was created by the compaction process
(that is, the snapshot summary key exists), revert the current snapshot (roll
back to its parent snapshot) and then try re-applying the updates on top of
it.
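
The tagging and detection steps above can be sketched with a small toy model. This is not the Iceberg API: the `Snapshot` class, the helper names, and the exact summary key are assumptions made for illustration, and the PR text only suggests "a flag is_compaction".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical summary key; the PR text only suggests "a flag is_compaction".
COMPACTION_FLAG = "is_compaction"


@dataclass(frozen=True)
class Snapshot:
    """Toy stand-in for a table snapshot (not the Iceberg class)."""
    snapshot_id: int
    parent_id: Optional[int]
    summary: Dict[str, str] = field(default_factory=dict)


def tag_as_compaction(summary: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the summary with the compaction flag set."""
    tagged = dict(summary)
    tagged[COMPACTION_FLAG] = "true"
    return tagged


def is_compaction_snapshot(snapshot: Snapshot) -> bool:
    """A snapshot counts as compaction if the flag key exists in its summary."""
    return COMPACTION_FLAG in snapshot.summary


def rollback_target(current: Snapshot) -> Optional[int]:
    """Return the parent id to roll back to, or None if rollback does not apply."""
    if is_compaction_snapshot(current):
        return current.parent_id
    return None


base = Snapshot(1, None, {"operation": "append"})
compacted = Snapshot(2, 1, tag_as_compaction({"operation": "replace"}))
```

Here `rollback_target(compacted)` yields the parent id `1`, while `rollback_target(base)` yields `None` because the base snapshot carries no flag.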
Introduce a new table property `rollback.compaction.on-conflicts.enabled`,
which controls whether the compaction commit is rolled back when conflicts
are detected.
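
As a minimal sketch of how such a flag might be read from a table's properties map: the helper names below are hypothetical, and the default of `False` is an assumption, since the PR text does not state a default.

```python
from typing import Mapping

ROLLBACK_COMPACTION_ON_CONFLICTS = "rollback.compaction.on-conflicts.enabled"
# Assumed default: the PR text does not say what the default is.
ROLLBACK_COMPACTION_ON_CONFLICTS_DEFAULT = False


def property_as_bool(props: Mapping[str, str], key: str, default: bool) -> bool:
    """Parse a string-valued table property as a boolean."""
    value = props.get(key)
    if value is None:
        return default
    return value.strip().lower() == "true"


def rollback_compaction_enabled(props: Mapping[str, str]) -> bool:
    """Hypothetical helper: should a conflicting compaction commit be rolled back?"""
    return property_as_bool(
        props,
        ROLLBACK_COMPACTION_ON_CONFLICTS,
        ROLLBACK_COMPACTION_ON_CONFLICTS_DEFAULT,
    )
```

With the assumed default, a table that does not set the property keeps the existing behavior and the conflicting commit simply fails.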
* Example: a transaction has updates U1, U2, U3 ... to apply, and B is its
base snapshot. When the transaction is about to commit, it sees that the
current snapshot is now B' (a snapshot created by compaction). It makes B' the
current snapshot and tries to re-apply the updates {U1, U2, U3}. Suppose a
metadata conflict is detected when applying U3 on top of { B' -> U1 -> U2 }.
The transaction then retries with B' rolled back to its parent. Call that
rollback RollbackB'; the transaction now tries applying {RollbackB', U1, U2,
U3} on top of B'. If this still conflicts, the commit fails; otherwise it
commits and updates the table metadata.
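
The retry flow in this example can be simulated with a toy model in which a snapshot is just a set of file names and an update is a function over that set. Everything here is an assumption for illustration (the `commit` signature, the `ConflictError` type, and modelling a metadata conflict as "deleting a file that no longer exists"); it is not how the PR implements the check.

```python
from typing import Callable, FrozenSet, List

Files = FrozenSet[str]
Update = Callable[[Files], Files]


class ConflictError(Exception):
    """Raised when an update cannot be applied to the given table state."""


def append(name: str) -> Update:
    def apply(files: Files) -> Files:
        return files | {name}
    return apply


def delete(name: str) -> Update:
    def apply(files: Files) -> Files:
        # Toy conflict rule: the file to delete must still be present.
        if name not in files:
            raise ConflictError(f"missing file: {name}")
        return files - {name}
    return apply


def apply_all(start: Files, updates: List[Update]) -> Files:
    files = start
    for update in updates:
        files = update(files)
    return files


def commit(current: Files, parent: Files, current_is_compaction: bool,
           updates: List[Update], rollback_enabled: bool) -> Files:
    """Apply updates on the current snapshot; on conflict, retry on its parent."""
    try:
        return apply_all(current, updates)
    except ConflictError:
        if not (rollback_enabled and current_is_compaction):
            raise
    # RollbackB': restore the parent's state, then re-apply U1, U2, U3.
    # A second ConflictError propagates, which corresponds to failing the commit.
    return apply_all(parent, updates)


base = frozenset({"f1", "f2"})   # B
b_prime = frozenset({"f3"})      # B': compaction rewrote f1 and f2 into f3
updates = [append("f4"), append("f5"), delete("f1")]  # U1, U2, U3

result = commit(b_prime, base, True, updates, rollback_enabled=True)
```

In this run U3 conflicts on top of B' because `f1` was rewritten away, so the commit succeeds only after the rollback. Note that in this model the compaction result `f3` is discarded, so the rewrite would have to be run again.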
Based on proposal :
https://docs.google.com/document/d/1pSqxf5A59J062j9VFF5rcCpbW9vdTbBKTmjps80D-B0/edit#
Adding this here to get feedback on the approach. Once agreed upon, engines
(for example, Spark) can be changed to use the transaction API.
-----
### Testing done :
- Added unit tests validating that is_compacted is present in the compacted
snapshot summary
- Added an end-to-end unit test using the transaction API and a concurrent
rewrite.
- TODO: add more exhaustive unit tests
cc @rdblue @danielcweeks @jackye1995 @rajarshisarkar @amogh-jahagirdar
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]