[ 
https://issues.apache.org/jira/browse/HUDI-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yue Zhang updated HUDI-2458:
----------------------------
    Fix Version/s: 0.14.0
                       (was: 0.13.1)

> Relax compaction in metadata being fenced based on inflight requests in data 
> table
> ----------------------------------------------------------------------------------
>
>                 Key: HUDI-2458
>                 URL: https://issues.apache.org/jira/browse/HUDI-2458
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Assignee: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.14.0
>
>
> Relax the fencing of metadata table compaction on inflight requests in the
> data table.
> Compaction in the metadata table is triggered only if there are no inflight
> requests in the data table. This can cause a liveness problem: in very large
> deployments, either compaction or clustering may always be in progress on
> the data table. We should therefore find a way to relax this constraint.
>  
> Proposal to remove this dependency:
> With the recently added spurious deletes config, we can drop this
> dependency.
> As of now, we have 3 interlinked nuances.
>  - Compaction in the metadata table may not kick in if there are any
> inflight operations in the data table.
>  - A rollback applied to the metadata table depends on the last compaction
> instant in the metadata table. We might even throw an exception if the
> instant being rolled back is < the latest metadata compaction instant time.
>  - Archival in the data table is fenced by the latest compaction in the
> metadata table.
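
The three fences listed above can be sketched as plain predicates. This is a toy model and not Hudi's actual code: the function names are hypothetical, instant times are modeled as sortable timestamp strings, and the exact form of the archival fence (archive only instants older than the latest metadata compaction) is an assumption.

```python
# Toy model of the three fences; names and exact comparisons are assumptions.

def can_compact_metadata(inflight_data_instants):
    """Fence 1: metadata compaction runs only with no inflight data-table instants."""
    return len(inflight_data_instants) == 0


def check_rollback(instant_to_roll_back, last_metadata_compaction):
    """Fence 2: reject a rollback older than the latest metadata compaction."""
    if (last_metadata_compaction is not None
            and instant_to_roll_back < last_metadata_compaction):
        raise ValueError(
            f"cannot roll back {instant_to_roll_back}: "
            f"older than metadata compaction {last_metadata_compaction}"
        )


def archivable(completed_data_instants, last_metadata_compaction):
    """Fence 3 (assumed form): archive only instants older than that compaction."""
    if last_metadata_compaction is None:
        return []
    return [t for t in completed_data_instants if t < last_metadata_compaction]


inflight = ["20230101120000"]           # e.g. an abandoned clustering
print(can_compact_metadata(inflight))   # False
```

Note how fence 1 feeds fence 3: if compaction never runs, the latest metadata compaction instant never advances, so the set of archivable instants stops growing.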
>  
> So, if the data timeline has any dangling inflight operation (say someone
> started clustering, killed it midway, and never attempted it again),
> metadata compaction will never kick in at all. I still need to check what
> archival does with such inflight operations in the data table when it
> tries to archive nearby commits.
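
The starvation described above can be illustrated with a small simulation. It is only a sketch: `max_delta_commits` is a hypothetical trigger threshold, and the loop stands in for a stream of delta commits on the metadata table.

```python
def count_compactions(rounds, dangling_inflight, fenced, max_delta_commits=3):
    """Count metadata compactions over `rounds` delta commits.

    fenced=True models the current rule: skip compaction while any
    data-table instant is inflight. fenced=False models the relaxed rule.
    """
    pending = 0
    compactions = 0
    for _ in range(rounds):
        pending += 1  # one more delta commit lands on the metadata table
        blocked = fenced and dangling_inflight
        if pending >= max_delta_commits and not blocked:
            compactions += 1
            pending = 0
    return compactions


print(count_compactions(12, dangling_inflight=True, fenced=True))   # 0
print(count_compactions(12, dangling_inflight=True, fenced=False))  # 4
```

With a single inflight instant that is never completed or rolled back, the fenced variant never compacts, no matter how many delta commits accumulate.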
>  
> With the spurious deletes support we added recently, all of this can be
> much simplified.
> When applying a rollback commit, we no longer need to take different
> actions based on whether the commit being rolled back was already committed
> to the metadata table. Just go ahead and apply the rollback; merging of
> metadata payload records will take care of it. If the commit was already
> synced, the final merged payload may not have spurious deletes. If the
> commit being rolled back was never committed to the metadata table, the
> final merged payload may have some spurious deletes, which we can ignore.
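
To make the merge argument concrete, here is a minimal sketch of both cases. It does not use Hudi's payload classes: a partition's file listing is modeled as a set, records are (kind, file names) tuples, and `ignore_spurious_deletes` is a hypothetical parameter standing in for the config mentioned above.

```python
def apply_records(records, ignore_spurious_deletes=True):
    """Fold metadata records for one partition into a file listing.

    Each record is ("add", [names]) or ("delete", [names]). A delete for a
    file that was never added is spurious: it is either ignored or rejected.
    """
    files = set()
    spurious = []
    for kind, names in records:
        if kind == "add":
            files |= set(names)
        elif kind == "delete":
            for name in names:
                if name in files:
                    files.remove(name)
                elif ignore_spurious_deletes:
                    spurious.append(name)  # tolerated, no effect on listing
                else:
                    raise ValueError(f"delete of unknown file {name}")
    return files, spurious


# Case 1: commit c2 was synced to metadata, then rolled back.
synced = [("add", ["f1"]), ("add", ["f2"]), ("delete", ["f2"])]
# Case 2: commit c2 never reached metadata; only its rollback is applied.
unsynced = [("add", ["f1"]), ("delete", ["f2"])]

print(apply_records(synced))    # ({'f1'}, [])
print(apply_records(unsynced))  # ({'f1'}, ['f2'])
```

Both cases end with the same listing, which is why the rollback path does not need to know whether the original commit was ever synced.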
> With this, compaction in the metadata table does not need any dependency
> on inflight operations in the data table.
> We can also loosen the dependency of archival in the data table on
> metadata table compaction.
> In summary, all 3 dependencies listed above become moot with this
> approach. Archival in the data table does not depend on metadata table
> compaction. A rollback applied to the metadata table does not care about
> the last metadata table compaction. Compaction in the metadata table can
> proceed even if there are inflight operations in the data table.
>  
> In particular, our logic for applying rollback metadata to the metadata
> table will become a lot simpler and easier to reason about.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
