[ https://issues.apache.org/jira/browse/HUDI-5436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-5436:
--------------------------------------
    Fix Version/s: 0.13.0

> Auto repair tool for MDT out of sync
> ------------------------------------
>
>                 Key: HUDI-5436
>                 URL: https://issues.apache.org/jira/browse/HUDI-5436
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: metadata
>            Reporter: sivabalan narayanan
>            Priority: Critical
>             Fix For: 0.13.0
>
>
> Can we write a spark-submit job to repair out-of-sync issues with the
> metadata table (MDT)? For example, if MDT validation fails for a given
> table, we don't have a good way to fix the MDT.
> So we should develop a spark-submit job that tries to deduce the commit at
> which the MDT went out of sync and fixes just the delta.
>
> The idea is:
> 1. Run the validation job against the latest files as of each commit,
>    starting from the latest commit and going in reverse chronological order.
>    At some point validation will succeed. Let's call that commit N.
> 2. Add a savepoint to the MDT at commit N and restore the MDT to commit N.
> 3. Take every commit after commit N from the data table and apply them one
>    by one to the MDT.
>
> Once this is complete, we can run the validation tool again to ensure the
> MDT is in good shape.
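
The search for commit N and the set of commits to replay could be sketched roughly as follows. This is a minimal illustration in plain Python, not Hudi code: `is_valid_at` is a hypothetical stand-in for running the MDT validation job as of a given commit, the commit times are made-up strings, and the savepoint, restore, and replay steps are left to the caller.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def plan_mdt_repair(
    commits: Sequence[str],
    is_valid_at: Callable[[str], bool],
) -> Tuple[Optional[str], List[str]]:
    """Find commit N and the commits that must be re-applied to the MDT.

    `commits` is the data table timeline in chronological order.
    `is_valid_at` is a hypothetical callback that reports whether MDT
    validation passes as of the given commit.

    Returns (commit_n, commits_to_replay). If validation never passes,
    commit_n is None and every commit has to be replayed.
    """
    # Walk the timeline from the latest commit backwards.
    for idx in range(len(commits) - 1, -1, -1):
        if is_valid_at(commits[idx]):
            # First success in reverse order is commit N; everything
            # after it is the delta to apply, oldest first.
            return commits[idx], list(commits[idx + 1:])
    return None, list(commits)


if __name__ == "__main__":
    timeline = ["001", "002", "003", "004", "005"]
    # Assumption for the demo: the MDT went out of sync after commit 003.
    commit_n, replay = plan_mdt_repair(timeline, lambda c: c <= "003")
    print("restore MDT to:", commit_n)
    print("replay onto MDT:", replay)
```

The reverse scan issues one validation run per commit until the first success, so it is cheapest when the MDT diverged recently. It returns the replay list oldest first because the proposal applies the commits one by one in order.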