[ https://issues.apache.org/jira/browse/HUDI-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Chen closed HUDI-5647.
----------------------------
    Resolution: Fixed

Fixed via master branch: 6fbf9d4f840b1079877bd6d2e649678a9e01b715

> Automate savepoint and restore tests
> ------------------------------------
>
>                 Key: HUDI-5647
>                 URL: https://issues.apache.org/jira/browse/HUDI-5647
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: writer-core
>            Reporter: sivabalan narayanan
>            Assignee: Danny Chen
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.13.1
>
>
> Automate savepoint and restore tests
> Scenarios to cover:
>
> All tests to be done for
> w/ and w/o metadata
> partitioned and non-partitioned dataset.
>
> COW
> Format:
> scenario being tested
> timeline
> what to expect after restore.
>
> 1. straightforward
> C1, C2, savepoint C2. C3, C4, restore.
> should go back to C2.
> C3, C4 should be cleaned up.
>
> 2. pending inflight.
> C1, C2, savepoint C2. C3, C4 inflight. restore.
> should go back to C2.
> C3, C4 should be cleaned up.
>
> 3. completed rollbacks in timeline.
> C1, C2, savepoint C2, C3, C4 (RB_C3), C5. restore.
> should go back to C2.
> C3, C4 (RB_C3), C5 should be cleaned up.
>
> 4. pending rollbacks after savepoint.
> C1, C2, savepoint C2, C3, C4 (RB_C3) inflight. restore.
> should go back to C2.
> C3, C4 (RB_C3) should be cleaned up.
>
> 5. clean commits after savepoint.
> C1, C2, savepoint C2, C3, C4, C5 (clean C1), C6, restore
> should go back to C2.
> C3, C4, C5 (clean C1), C6 should be cleaned up.
>
> 6. clustering.
> C1, C2, savepoint C2. C3, C4.replacecommit, C5, restore.
> should go back to C2.
> C3, C4.replacecommit, C5 should be cleaned up.
>
> 7. pending clustering after savepoint.
> C1, C2, savepoint C2. C3, C4.replacecommit.inflight, C5, restore.
> should go back to C2.
> C3, C4.replacecommit files and C5 files should be cleaned up.
>
> 8. completed clustering before savepoint.
> C1, C2, C3.replacecommit.complete, C4, savepoint C4, C5, restore.
> should go back to C4.
> C5 should be cleaned up.
>
> 9. pending clustering before savepoint.
> C1, C2, C3.replacecommit.inflight, C3, C4, savepoint C4, C5, restore
> should go back to C4.
> C5 should be cleaned up. if pipeline is restarted, C3.replacecommit should
> be re-attempted.
>
> MOR
>
> 1. simple one
> DC1, DC2, DC3, savepoint DC3. DC4, DC5. restore
> should rollback DC4 and DC5
> No files will be cleaned up. only rollback log appends.
>
> 2. simple one w/ compaction.
> DC1, DC2, DC3, C4, savepoint C4. DC5, DC6. restore
> should rollback DC5 and DC6
> No files will be cleaned up. only rollback log appends.
>
> 3. another one w/ compaction.
> DC1, DC2, DC3, savepoint DC3, DC4, C5, DC6, DC7. restore
> should rollback until DC3.
> latest file slice should be fully cleaned up. and rollback log appends for
> DC4 in first file slice.
>
> 4. compaction and clean commits.
> DC1, DC2, DC3, savepoint DC3, DC4, C5, DC6, DC7, DC8, C9, C10.clean, DC11,
> DC12 restore.
> should take the table back to DC3.
> Cleaner should not have cleaned up file slice 1 since it was part of
> savepoint. Entire file slice 2 and 3 should be cleaned up.
> i.e. C5, DC6, DC7, DC8, C9, C10.clean, DC11, DC12. and a rollback log append
> for DC4.
>
> 5. pending compaction after savepoint.
> DC1, DC2, DC3, savepoint DC3, DC4, C5.pending. DC6, DC7. restore
> should rollback until DC3.
> latest file slice should be fully deleted. for DC4 a rollback log append
> should be made.
>
> 6. pending compaction before savepoint.
> DC1, DC2, DC3, C4.pending, DC5, savepoint DC5, DC6, DC7. restore
> should rollback until DC5.
> rollback log appends for DC6 and DC7.
>
> 7. compaction and clustering. completed clustering before savepoint.
> DC1, DC2, DC3, C4, DC5, C6.replacecommit.completed. DC7, savepoint DC7, DC8,
> DC9. restore
> inspect what C6 does. likely it will create a new file group. and then start
> taking in DC7.
> should take the table back to DC7.
> rollback log appends for DC8 and DC9.
>
> 8. compaction and clustering. completed clustering after savepoint.
> DC1, DC2, DC3, C4, DC5, savepoint DC5, C6.replacecommit.completed, DC7, DC8,
> restore
> inspect what C6 does. likely it will create a new file group. and then start
> taking in DC7.
> should take the table back to DC5.
> latest file slice created by C6 should be cleaned up fully.
>
> 9. pending clustering before savepoint.
> DC1, DC2, DC3, C4, DC5, C6.replacecommit.inflight. DC7, savepoint DC7, DC8,
> DC9. restore
> should take the table back to DC7.
> rollback log appends for DC8 and DC9. when pipeline is restarted, C6 should
> be re-attempted and get to completion.
>
> 10. pending clustering after savepoint.
> DC1, DC2, DC3, C4, DC5, savepoint DC5, C6.replacecommit.inflight, DC7, DC8,
> restore
> should take the table back to DC5.
> latest file slice created by C6 should be cleaned up fully.
>
> 11. completed rollbacks after savepoint.
> DC1, DC2, DC3, C4, savepoint C4. DC5, C6 (RB_DC5), DC7. restore
> should rollback DC5, C6 and DC7.
> No files will be cleaned up. only rollback log appends.
>
> Few more cases to test:
>
> case 1:
> rolling back a commit that's already cleaned up:
> C1, C2, C3, C4, SP_C4, C5, C6, C7, C8, cleaner_C9 (cleaned up C1, C2, C3,
> C5), C10, restore.
>
> case 2:
> inflight clean after savepoint which is supposed to clean up files pertaining
> to a commit that will be rolled back by restore.
> C1, C2, C3, C4, SP_C4, C5, C6, C7, C8, cleaner_C9.inflight (cleaned up C1,
> C2, C3, C5), C10, restore.
> after restore:
> C1, C2, C3, C4, SP_C4, cleaner_C9.inflight
> at some point, cleaner will retry.
> Fix: restore should first finish any pending clean after savepoint and then
> start the restore.
>
> More cases:
>
> 12:
> rolling back a commit that's already cleaned up:
> C1, C2, C3, C4, SP_C4, C5, C6, C7, C8, cleaner_C9 (cleaned up C1, C2, C3,
> C5), C10, restore.
>
> 13:
> inflight clean after savepoint which is supposed to clean up files pertaining
> to a commit that will be rolled back by restore.
> C1, C2, C3, C4, SP_C4, C5, C6, C7, C8, cleaner_C9.inflight (cleaned up C1,
> C2, C3, C5), C10, restore.
> after restore:
> C1, C2, C3, C4, SP_C4, cleaner_C9.inflight
> at some point, cleaner will retry.
> When cleaner retries, it succeeds w/o any issues.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
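
The basic expectation shared by the COW scenarios (restore goes back to the savepointed commit and everything after it is cleaned up) can be sketched as a toy timeline model. This is only an illustration: `Instant` and `restore` are hypothetical names and not part of the Hudi API, the model tracks instants only and not files, and it does not try to cover the MOR rollback log appends or the inflight-clean cases 2 and 13, where `cleaner_C9.inflight` stays on the timeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instant:
    """One timeline entry in the toy model (hypothetical, not a Hudi class)."""
    name: str                  # e.g. "C3" or "DC4"
    action: str = "commit"     # e.g. "commit", "replacecommit", "rollback", "clean"
    state: str = "completed"   # "completed" or "inflight"


def restore(timeline: List[Instant], savepoint: str) -> Tuple[List[Instant], List[Instant]]:
    """Split the timeline into (kept, removed) around the savepointed instant.

    Simplifying assumption: every instant after the savepoint is removed,
    whatever its action or state. Instants up to and including the
    savepoint are kept untouched.
    """
    names = [instant.name for instant in timeline]
    if savepoint not in names:
        raise ValueError(f"no instant named {savepoint!r} in timeline")
    cut = names.index(savepoint) + 1
    return timeline[:cut], timeline[cut:]


# COW scenario 6: C1, C2, savepoint C2. C3, C4.replacecommit, C5, restore.
timeline = [
    Instant("C1"),
    Instant("C2"),
    Instant("C3"),
    Instant("C4", action="replacecommit"),
    Instant("C5"),
]
kept, removed = restore(timeline, "C2")
print([i.name for i in kept])     # ['C1', 'C2']
print([i.name for i in removed])  # ['C3', 'C4', 'C5']
```

An automated test would build each scenario's timeline against a real table and compare the surviving instants and files with the "what to expect after restore" lines; the split above only captures the instant-level part of that check.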