[ https://issues.apache.org/jira/browse/HUDI-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit updated HUDI-5464:
------------------------------
    Status: Patch Available  (was: In Progress)

> Fix instantiation of a new partition in MDT re-using the same instant time as a regular commit
> ----------------------------------------------------------------------------------------------
>
>                 Key: HUDI-5464
>                 URL: https://issues.apache.org/jira/browse/HUDI-5464
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Assignee: Raymond Xu
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> When instantiating a new partition in the metadata table (MDT), we re-use the
> same instant time as the data table (DT) commit that is being applied to the
> MDT. This needs to be fixed.
>
> For example, let's say we have 10 commits with FILES already enabled.
> For C11, we are enabling col-stats.
> After the data table work is done, when we enter metadata writer
> instantiation, we deduce that col-stats has to be instantiated, and then
> instantiate it using a delta commit DC11.
> In the MDT timeline, we see dc11.req, dc11.inflight and dc11.complete. Then
> we go ahead and apply the actual C11 from the DT to the MDT (dc11.inflight and
> dc11.complete are updated). Here, we overwrite the same DC11 with records
> pertaining to C11, which is buggy. We definitely need to fix this.
> We can add a suffix to C11 (say C11_003 or C11_001), as we do for compaction
> and clean in the MDT, so that any additional operation in the MDT has a
> different commit time format. For everything else, the MDT instant time
> should match the DT one-to-one.
>
>
> Impact:
> We are overriding the same DC for two purposes, which is bad. If there is a
> crash after initializing col-stats and before applying the actual C11 (in the
> above context), we might mistakenly roll back the col-stats initialization,
> while the table config could still say that col-stats is fully ready to be
> served. But while reading the MDT, we may not read DC11, since it's a failed
> commit.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
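
The suffix idea proposed in the issue can be illustrated with a minimal sketch. This is not Hudi code: the function names, the "_010" suffix value, and the dictionary standing in for the MDT timeline are all hypothetical, chosen only to show why a suffixed instant time keeps the partition initialization and the regular commit from overwriting each other.

```python
# Hypothetical suffix for "initialize a new MDT partition"; the value actually
# chosen in Hudi may differ (the issue only suggests e.g. C11_003 or C11_001).
PARTITION_INIT_SUFFIX = "_010"


def mdt_instant_for_partition_init(dt_instant: str,
                                   suffix: str = PARTITION_INIT_SUFFIX) -> str:
    """Instant time used in the MDT when a new partition is initialized
    while the data table commit `dt_instant` is being applied."""
    return dt_instant + suffix


def mdt_instant_for_commit(dt_instant: str) -> str:
    """Regular data table commits keep a one-to-one instant time in the MDT."""
    return dt_instant


def source_dt_instant(mdt_instant: str) -> str:
    """Recover the data table instant that an MDT instant was derived from."""
    return mdt_instant.split("_", 1)[0]


# Toy stand-in for the MDT timeline: instant time -> what was written.
timeline = {}

init_instant = mdt_instant_for_partition_init("C11")
commit_instant = mdt_instant_for_commit("C11")

timeline[init_instant] = "col-stats initialization"
timeline[commit_instant] = "records pertaining to C11"

# Both entries survive because the two operations use distinct instant times.
for instant, payload in timeline.items():
    print(instant, "->", payload, "(from DT instant", source_dt_instant(instant) + ")")
```

With distinct instant times, a rollback of one operation no longer touches the other, which is the failure mode described under "Impact" in the issue.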