[ 
https://issues.apache.org/jira/browse/HUDI-5769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-5769:
-----------------------------
    Sprint: Sprint 2023-01-31, Sprint 2023-02-14  (was: Sprint 2023-01-31)

> Partitions created by Async indexer could be deleted by regular writers
> -----------------------------------------------------------------------
>
>                 Key: HUDI-5769
>                 URL: https://issues.apache.org/jira/browse/HUDI-5769
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> The regular writer has a flow that checks each metadata table (MDT) 
> partition: if the partition is not enabled in the writer's configs but 
> exists in storage and is listed among the table config's fully built out 
> partitions, Hudi deletes it, on the assumption that the user wishes to 
> disable it. 
> But this does not work well with the async indexer. 
>  
> Process 1: Deltastreamer runs continuously with no metadata configs set. 
> The default value for metadata enable is true, so the "files" partition is 
> instantiated inline on the first commit. 
> No value is set for col stats enable, so no action is taken for col stats. 
>  
> Process 2: the user starts HoodieIndexer for the col stats partition. 
> Once the indexer completes, tableConfig adds "col stats" to the fully 
> built out metadata partitions. 
>  
> Back in process 1, when Deltastreamer performs its next write, it detects 
> that col stats is not enabled (the default value as per code) but that 
> tableConfig shows col stats as fully built out. It therefore deletes the 
> col stats partition and updates the tableConfig. 
>  
>  
>  
>  
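
The sequence described above can be reproduced with a small in-memory model. This is only a sketch of the reported interaction, not Hudi code: the class AsyncIndexerRaceSketch, the MdtPartition enum, the TableConfig holder, and the methods regularWriterCommit and asyncIndexerBuild are all hypothetical names invented for illustration, and the two processes are run one after the other rather than concurrently.

```java
import java.util.EnumSet;
import java.util.Set;

public class AsyncIndexerRaceSketch {
    // Hypothetical stand-in for the metadata table (MDT) partition types.
    enum MdtPartition { FILES, COLUMN_STATS }

    // Hypothetical stand-in for the table config shared by both processes.
    static final class TableConfig {
        final Set<MdtPartition> fullyBuilt = EnumSet.noneOf(MdtPartition.class);
    }

    // Models one commit of the regular writer (process 1).
    // Returns the partitions the writer decided to delete.
    static Set<MdtPartition> regularWriterCommit(TableConfig cfg, Set<MdtPartition> enabled) {
        // Partitions enabled in this writer's configs are built inline.
        cfg.fullyBuilt.addAll(enabled);
        // Problematic step: anything fully built but not enabled in THIS
        // writer's configs is assumed to have been disabled by the user.
        Set<MdtPartition> deleted = EnumSet.noneOf(MdtPartition.class);
        deleted.addAll(cfg.fullyBuilt);
        deleted.removeAll(enabled);
        cfg.fullyBuilt.removeAll(deleted);
        return deleted;
    }

    // Models the async indexer (process 2) finishing a partition.
    static void asyncIndexerBuild(TableConfig cfg, MdtPartition partition) {
        cfg.fullyBuilt.add(partition);
    }

    public static void main(String[] args) {
        TableConfig cfg = new TableConfig();
        // Process 1 sets no metadata configs, so only "files" is enabled by default.
        Set<MdtPartition> writerEnabled = EnumSet.of(MdtPartition.FILES);

        regularWriterCommit(cfg, writerEnabled);            // first commit builds "files"
        asyncIndexerBuild(cfg, MdtPartition.COLUMN_STATS);  // indexer builds col stats
        Set<MdtPartition> deleted = regularWriterCommit(cfg, writerEnabled); // next write

        System.out.println("deleted by writer: " + deleted);     // [COLUMN_STATS]
        System.out.println("fully built now: " + cfg.fullyBuilt); // [FILES]
    }
}
```

In this model the second commit removes the partition the indexer just built, because the writer cannot tell "never enabled in my configs" apart from "explicitly disabled by the user".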



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
