[ https://issues.apache.org/jira/browse/HUDI-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Mahindra updated HUDI-2432:
----------------------------------
    Sprint: Hudi-Sprint-Jan-3, Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18, Hudi-Sprint-Jan-24, Hudi-Sprint-Jan-31
            (was: Hudi-Sprint-Jan-3, Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18, Hudi-Sprint-Jan-24)

> Fix restore by adding a requested instant and restore plan
> ----------------------------------------------------------
>
>                 Key: HUDI-2432
>                 URL: https://issues.apache.org/jira/browse/HUDI-2432
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> Fix restore by adding a requested instant and restore plan
>  
> Trying to see if we really need a plan. Dumping my thoughts here.
> Restore is internally converted into N rollbacks: we fetch the active instants
> from the timeline in reverse order and trigger the rollbacks one by one. We
> already have a patch that fixes rollback by adding the rollback plan to the
> rollback.requested meta file. So, let's walk through the failure scenarios.
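To make the first point concrete, here is a minimal sketch of how a restore decomposes into single-instant rollbacks, newest first. It is a toy model, not Hudi's timeline API: instants are plain equal-width strings and `rollback` is a hypothetical callback.

```python
def instants_to_roll_back(active_instants, restore_to):
    """Completed instants newer than the restore point, newest first.

    Toy model: instants are equal-width timestamp strings, so string order
    matches time order.
    """
    return sorted((t for t in active_instants if t > restore_to), reverse=True)


def restore(active_instants, restore_to, rollback):
    """Restore as a sequence of single-instant rollbacks, applied newest first.

    `rollback` is a hypothetical callback that rolls back one instant.
    """
    rolled_back = []
    for instant in instants_to_roll_back(active_instants, restore_to):
        rollback(instant)
        rolled_back.append(instant)
    return rolled_back
```

For example, `restore(["001", "002", "003", "004"], "002", print)` rolls back 004 and then 003.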
>  
> With restore, individual rollbacks are not published to the timeline. So, if
> restore fails midway, then on the 2nd attempt only a subset of the rollbacks
> (the ones executed during the 2nd attempt) will be applied to the metadata
> table. So, we need a plan for restore as well.
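The failure mode above can be reproduced with a toy model: plain Python sets stand in for the data table's timeline and for the metadata table, and `crash_after` is a hypothetical knob for injecting the failure. Because only the final restore commit reaches the metadata table, a retry leaves behind entries for the instants rolled back during the failed attempt.

```python
class CrashDuringRestore(Exception):
    """Simulated failure partway through a restore."""


def restore_without_plan(timeline, metadata_table, restore_to, crash_after=None):
    """Toy restore in which only the final restore commit is published.

    timeline: set of completed instants in the data table (mutated in place)
    metadata_table: set of instants the metadata table still tracks
    crash_after: hypothetical knob; fail after this many rollbacks
    """
    # The work list is re-derived from the timeline on every attempt.
    pending = sorted((t for t in timeline if t > restore_to), reverse=True)
    rolled_back = []
    for instant in pending:
        if crash_after is not None and len(rolled_back) == crash_after:
            raise CrashDuringRestore(rolled_back)
        timeline.discard(instant)  # rollback done, but nothing is published
        rolled_back.append(instant)
    # Only the restore commit is published, so only this attempt's rollbacks
    # are applied to the metadata table.
    metadata_table.difference_update(rolled_back)
    return rolled_back
```

For example, starting from `timeline = {"001", "002", "003", "004"}` with a matching metadata table, a first attempt with `crash_after=2` rolls back 004 and 003 and then fails; the retry only sees 002, so the metadata table still tracks 003 and 004 afterwards.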
> But with our enhancement that makes rollback publish a plan, rollback.requested
> can't be skipped and has to be published to the timeline. So, here is what will
> happen without a restore plan.
>  
> start restore
>     rollback commit N
>           rollback.requested for commit N // plan
>           execute rollback, but do not publish to timeline. So this will not
>           get applied to the metadata table.
>     rollback commit N-1
>           rollback.requested for commit N-1 // plan
>           execute rollback, but do not publish to timeline. Again, this will
>           not get applied to the metadata table.
>     ...
> commit restore and publish. This will get applied to the metadata table.
> Once we are done committing the restore, we can remove all rollback.requested
> files if needed.
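The flow above can be sketched with a toy timeline. `ToyTimeline` and the `execute_rollback` callback are hypothetical and are not Hudi classes. Each rollback persists its plan, nothing is published until the restore commit, and the rollback.requested entries are cleaned up afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTimeline:
    """Toy timeline; not Hudi's active timeline implementation."""
    requested_rollbacks: list = field(default_factory=list)  # rollback.requested (plans)
    completed: list = field(default_factory=list)            # published instants
    metadata_table_log: list = field(default_factory=list)   # payloads applied to the metadata table

    def publish(self, action, payload):
        # Only published instants are applied to the metadata table.
        self.completed.append(action)
        self.metadata_table_log.append(payload)


def restore_without_restore_plan(timeline, commits_newest_first, execute_rollback):
    rollback_metadata = []
    for commit in commits_newest_first:
        timeline.requested_rollbacks.append(commit)         # rollback.requested (plan)
        rollback_metadata.append(execute_rollback(commit))  # executed, NOT published
    timeline.publish("restore", rollback_metadata)         # the only metadata-table update
    timeline.requested_rollbacks.clear()                    # optional cleanup after commit
    return rollback_metadata
```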
>  
> Failure scenarios:
> Say we fail after 2 rollbacks. On re-attempt, we will process only the
> remaining commits, since the active timeline may no longer report commit N
> and commit N-1 as active. So, with a restore plan, we can do something like
> below.
>  
> 1. Start restore.
> 2. Schedule rollbacks for all of them: serialize all commit instants that need
>    to be rolled back, along with their rollback plans. // by now, we would have
>    created a rollback.requested meta file for every commit that needs to be
>    rolled back.
> 3. Execute the rollbacks one by one. // do not publish to the timeline once
>    done; the changes should also not be applied to the metadata table.
> 4. Collect the rollback commit metadata from all individual rollbacks and
>    create the restore commit metadata. There could be some commits that were
>    already rolled back; for those, we need to manually create the rollback
>    metadata based on the rollback plan (more details in the next paragraph).
>    Commit the restore and publish it. Only this will get applied to the
>    metadata table (which in turn will unwrap the individual rollback metadata
>    and apply it to the metadata table).
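Steps 1 to 4 can be sketched as follows. This assumes a hypothetical JSON layout for the restore plan and hypothetical `list_files` / `execute_rollback` callbacks; it is not the serialization format Hudi actually uses. The already-rolled-back case from step 4 is left out here to keep the main path visible.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RollbackPlan:
    instant: str           # commit to roll back
    files_to_delete: list  # toy stand-in for what rollback.requested records


def schedule_restore(commits_newest_first, list_files):
    """Steps 1-2: schedule every rollback up front and serialize them as the restore plan."""
    plans = [RollbackPlan(c, list_files(c)) for c in commits_newest_first]
    return json.dumps({"rollbacks": [asdict(p) for p in plans]})


def complete_restore(serialized_plan, execute_rollback):
    """Steps 3-4: execute rollbacks one by one (unpublished), then wrap their metadata."""
    plans = [RollbackPlan(**p) for p in json.loads(serialized_plan)["rollbacks"]]
    rollback_metadata = [execute_rollback(p) for p in plans]
    # Only this restore commit is published; the metadata table unwraps the
    # individual rollback metadata from it.
    return {"restore": rollback_metadata}
```

Persisting the whole plan before executing anything is what lets a retry recover the full list of instants even after some of them have disappeared from the active timeline.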
>  
> Failures:
> If we fail after the 2nd rollback:
> On the 2nd attempt, we will look at the restore plan for all commits that need
> to be rolled back. We can't actually roll back the first 2, since they are
> already rolled back, so we will manually create their rollback metadata from
> the rollback.requested meta files. For the rest, we will follow the regular
> flow of executing the actual rollback and collecting the rollback metadata.
> Once complete, we will serialize all this info in the restore metadata, which
> gets applied to the metadata table.
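The retry can be sketched with a toy model: sets stand in for the data table's timeline and for the metadata table, `plan` is fixed when the restore is scheduled, and `crash_after` is a hypothetical failure-injection knob. Here "no longer in the timeline" stands in for whatever check detects that an instant was already rolled back.

```python
class CrashDuringRestore(Exception):
    """Simulated failure partway through a restore."""


def restore_with_plan(timeline, metadata_table, plan, crash_after=None):
    """Toy plan-based restore.

    timeline: set of completed instants in the data table (mutated in place)
    metadata_table: set of instants the metadata table still tracks
    plan: instants to roll back, newest first, fixed when restore was scheduled
    crash_after: hypothetical knob; fail after this many planned instants
    """
    restore_metadata = []
    for done, instant in enumerate(plan):
        if crash_after is not None and done == crash_after:
            raise CrashDuringRestore(instant)
        if instant in timeline:
            timeline.discard(instant)  # regular flow: execute the rollback
            restore_metadata.append((instant, "executed"))
        else:
            # Already rolled back by an earlier attempt: rebuild its rollback
            # metadata from the persisted rollback.requested plan instead.
            restore_metadata.append((instant, "from_plan"))
    # A single restore commit carries every rollback, so the metadata table
    # removes all planned instants, including those from the failed attempt.
    metadata_table.difference_update(name for name, _ in restore_metadata)
    return restore_metadata
```

For example, with `timeline = {"001", "002", "003", "004"}` and `plan = ["004", "003", "002"]`, a first attempt with `crash_after=2` fails after rolling back 004 and 003; the retry rebuilds their metadata from the plan, executes the rollback of 002, and leaves the metadata table tracking only 001.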
>  
> Alternatives: since restore is a destructive operation anyway and users are
> advised to stop all processes, we also have the option to clean up the
> metadata table and re-bootstrap it completely once the restore is complete.
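A minimal sketch of the re-bootstrap alternative, assuming the metadata table is modelled as a plain partition-to-files index rebuilt from a full listing of the table's base path. The real metadata table is itself a Hudi table under `.hoodie/metadata`, and `rebootstrap_file_index` is a hypothetical helper.

```python
import os


def rebootstrap_file_index(base_path):
    """Rebuild a partition -> files index from a full listing of the data table.

    Toy stand-in for re-bootstrapping the metadata table after a restore: the
    old index is simply discarded, so it cannot carry entries for rolled-back
    commits.
    """
    index = {}
    for dirpath, dirnames, filenames in os.walk(base_path):
        # Prune Hudi's meta folder (timeline and the metadata table itself).
        dirnames[:] = [d for d in dirnames if d != ".hoodie"]
        # Skip hidden bookkeeping files such as per-partition metadata.
        files = sorted(f for f in filenames if not f.startswith("."))
        if files:
            index[os.path.relpath(dirpath, base_path)] = files
    return index
```

The trade-off is cost: a full listing is exactly what the metadata table exists to avoid, which is acceptable only because restore is rare and already requires stopping all writers.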
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
