[ 
https://issues.apache.org/jira/browse/HUDI-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378257#comment-17378257
 ] 

Prashant Wason commented on HUDI-2159:
--------------------------------------

Possible solutions:
 # Create a reader mode for metadata table:
 ## hoodie.metadata.enable=true
 ## hoodie.metadata.sync=false

         In this mode, the client won't call syncMetadataTable() at the end of 
its operations.

         Since ingestion runs at a faster cadence, we can set 
hoodie.metadata.sync=true in the ingestion pipeline and 
hoodie.metadata.sync=false in all other pipelines. 

 # Left-over clustering inflights can be cleaned up based on timeout detection 
using heartbeats. 
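
A minimal sketch of the two ideas above, purely illustrative. It assumes 
hoodie.metadata.sync is the proposed (not yet existing) option, and the class 
and function names below are hypothetical, not Hudi APIs:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MetadataConfig:
    """Hypothetical view of the two settings discussed in this comment."""
    enable: bool  # hoodie.metadata.enable
    sync: bool    # hoodie.metadata.sync (proposed option, an assumption)


def should_sync_metadata(cfg: MetadataConfig) -> bool:
    """A client syncs only if the metadata table is enabled and sync is on."""
    return cfg.enable and cfg.sync


def is_heartbeat_expired(last_heartbeat_ms: int, now_ms: int, timeout_ms: int) -> bool:
    """Treat a writer as dead once its last heartbeat is older than the timeout."""
    return now_ms - last_heartbeat_ms > timeout_ms


def inflights_safe_to_roll_back(
    last_heartbeats: Dict[str, int], now_ms: int, timeout_ms: int
) -> List[str]:
    """Return the inflight instants whose owning process stopped heartbeating."""
    return sorted(
        instant
        for instant, heartbeat_ms in last_heartbeats.items()
        if is_heartbeat_expired(heartbeat_ms, now_ms, timeout_ms)
    )
```

With this shape, the ingestion pipeline would run with 
MetadataConfig(enable=True, sync=True) and the clustering pipelines with 
MetadataConfig(enable=True, sync=False), so only one writer ever syncs.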

 

> Supporting Clustering and Metadata Table together
> -------------------------------------------------
>
>                 Key: HUDI-2159
>                 URL: https://issues.apache.org/jira/browse/HUDI-2159
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Prashant Wason
>            Assignee: Prashant Wason
>            Priority: Major
>
> I am testing clustering support for a metadata-enabled table and found a few 
> issues.
> *Setup*
> Pipeline 1: Ingestion pipeline with Metadata Table enabled. Runs every 30 
> mins. 
> Pipeline 2: Clustering pipeline with long running jobs (3-4 hours)
> Pipeline 3: Another clustering pipeline with long running jobs (3-4 hours)
>  
> *Issue #1: Parallel commits on Metadata Table*
> Assume the clustering pipeline is completing T5.replacecommit and the 
> ingestion pipeline is completing T10.commit. The Metadata Table will be 
> synced up to an instant <T5 (say T4), since it only syncs in completion order.
> Now both the pipelines will call syncMetadataTable() which will do the 
> following:
>  # Find all un-synced instants from dataset (T5, T6 ... T10)
>  # Read each instant and perform a deltacommit on the Metadata Table with the 
> same timestamp as instant.
> There is a chance that two processes perform a deltacommit at T5 on the 
> Metadata Table and one will fail (instant file already exists). This raises 
> an exception that is detected as a pipeline failure, leading to 
> false-positive alerts.
>  
> *Issue #2: No archiving/rollback support for failed clustering operations*
> If a clustering operation fails, it leaves behind a 
> T5.replacecommit.inflight. There is no automated way to roll back or archive 
> these. Since clustering is generally a long-running operation and may be run 
> through multiple pipelines at the same time, automated rollback of left-over 
> inflights doesn't work, as we cannot be sure that the process is dead.
> Metadata Table sync only works in completion order. So if 
> T5.replacecommit.inflight is left over, the Metadata Table will not sync 
> beyond T5, causing a large number of LogBlocks to pile up, which will have 
> to be merged in memory, leading to deteriorating performance.
>  
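
The sync behaviour described in the quoted issue can be sketched as a 
simplified in-memory model. This is not Hudi code: instants are plain integers 
(5 stands for T5), and unsynced_instants and MetadataTimeline are hypothetical 
names used only for illustration.

```python
from typing import List, Set, Tuple


def unsynced_instants(timeline: List[Tuple[int, bool]], last_synced: int) -> List[int]:
    """Return completed instants after last_synced, stopping at the first
    instant that is still inflight (sync only proceeds in completion order).

    timeline holds (instant_time, is_completed) pairs sorted by instant_time.
    """
    pending_sync = []
    for instant, completed in timeline:
        if instant <= last_synced:
            continue
        if not completed:
            break  # a left-over inflight blocks everything after it
        pending_sync.append(instant)
    return pending_sync


class MetadataTimeline:
    """Toy stand-in for the Metadata Table timeline: one deltacommit per instant."""

    def __init__(self) -> None:
        self._deltacommits: Set[int] = set()

    def create_deltacommit(self, instant: int) -> None:
        # Mirrors the "instant file already exists" failure from Issue #1.
        if instant in self._deltacommits:
            raise FileExistsError(f"instant file already exists: {instant}")
        self._deltacommits.add(instant)
```

In this model, two pipelines that both compute the same un-synced list and 
both call create_deltacommit(5) reproduce Issue #1: the second call raises. 
A timeline whose instant 5 is still inflight yields an empty list no matter 
how many later instants have completed, which is the stall in Issue #2.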



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
