[ https://issues.apache.org/jira/browse/HUDI-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-5407:
-----------------------------
    Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3  (was: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2)

> Rollbacks in MDT is not effective
> ---------------------------------
>
>                 Key: HUDI-5407
>                 URL: https://issues.apache.org/jira/browse/HUDI-5407
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> Under rare conditions, rollbacks in MDT are not effective. Apparently, we have
> set the cleaning policy to lazy, hence rollbacks happen only when the cleaner
> kicks in and not when we start a new commit. Given that MDT is a single-writer
> table, rollback blocks are effective only when the commit to roll back is just
> prior to the rollback block.
>
> Scenario where this could fail with inline compaction:
>
> {code:java}
> Data table timeline
> t1.dc t2.comp.req. |Crash t3.dc t2.comp.inflight t2.commit
> MDT timeline
> t1.dc. t2.comp.inflight |Crash t3.dc t4.rb(t2) t2.dc
> {code}
>
> The first attempt of t2 in MDT should be rolled back since it crashed
> midway. In other words, any log blocks written by the first attempt of t2 in
> MDT should be deemed invalid.
>
> But here is how the log blocks are actually laid out:
> log1(t1). log2(t2 first attempt) crash.... log3 (t3) log4(t4.rb rolling back
> t2) ... log5 (t2)
>
> So, when we read the log blocks via AbstractLogRecordReader, ideally we want
> to ignore log2. But when we encounter the rollback block log4, we only
> check the previous log block for a commit matching the one to roll back. Since
> the previous block (log3, written by t3) does not match t2, we assume log4 is
> a duplicate rollback and hence still deem log2 a valid log block.
> Hence MDT could serve data files that are not valid from an FS-based
> listing standpoint.
>
> Impact:
> Without this fix, log blocks that should be ignored are considered valid.
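
The scan behaviour described above can be illustrated with a minimal sketch. This is not Hudi's actual AbstractLogRecordReader code: the class `RollbackScanSketch`, the `Block` type, and both scan methods are hypothetical names made up for this illustration, and the "match any earlier block" variant is only one plausible way to address the problem, not necessarily what the linked pull request does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the log layout in the issue; not Hudi code.
class RollbackScanSketch {

    // A log block is either a data block or a rollback block targeting an instant.
    static final class Block {
        final String name;            // e.g. "log2"
        final String instant;         // instant that wrote the block, e.g. "t2"
        final String rollbackTarget;  // null for data blocks

        Block(String name, String instant, String rollbackTarget) {
            this.name = name;
            this.instant = instant;
            this.rollbackTarget = rollbackTarget;
        }

        boolean isRollback() {
            return rollbackTarget != null;
        }
    }

    // Behaviour described in the issue: a rollback block only inspects the
    // immediately preceding block; on a mismatch it is treated as a duplicate.
    static List<String> scanPreviousOnly(List<Block> blocks) {
        List<Block> valid = new ArrayList<>();
        for (Block b : blocks) {
            if (!b.isRollback()) {
                valid.add(b);
                continue;
            }
            int last = valid.size() - 1;
            if (last >= 0 && valid.get(last).instant.equals(b.rollbackTarget)) {
                valid.remove(last);
            }
        }
        return names(valid);
    }

    // Assumed alternative: a rollback block invalidates every earlier block
    // written by the target instant, wherever it sits in the log.
    static List<String> scanAllEarlier(List<Block> blocks) {
        List<Block> valid = new ArrayList<>();
        for (Block b : blocks) {
            if (b.isRollback()) {
                valid.removeIf(e -> e.instant.equals(b.rollbackTarget));
            } else {
                valid.add(b);
            }
        }
        return names(valid);
    }

    static List<String> names(List<Block> blocks) {
        List<String> out = new ArrayList<>();
        for (Block b : blocks) {
            out.add(b.name);
        }
        return out;
    }

    // log1(t1) log2(t2 first attempt) log3(t3) log4(t4.rb rolling back t2) log5(t2)
    static List<Block> issueLayout() {
        return Arrays.asList(
            new Block("log1", "t1", null),
            new Block("log2", "t2", null),
            new Block("log3", "t3", null),
            new Block("log4", "t4", "t2"),
            new Block("log5", "t2", null));
    }
}
```

On the layout from the issue, `scanPreviousOnly` keeps log1, log2, log3 and log5, so the crashed first attempt of t2 survives. `scanAllEarlier` keeps log1, log3 and log5: log2 is dropped, while log5 (the second attempt of t2, written after the rollback block) stays valid.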

--
This message was sent by Atlassian Jira
(v8.20.10#820010)