hudi-bot opened a new issue, #15643: URL: https://github.com/apache/hudi/issues/15643
Can we write a spark-submit job to repair out-of-sync issues with the metadata table (MDT)? For example, if MDT validation fails for a given table, we currently don't have a good way to fix the MDT. We should develop a spark-submit job that deduces the commit at which the MDT went out of sync and fixes just the delta.

The idea is:

1. Run the validation job against the latest files as of each commit, starting from the latest commit and going in reverse chronological order. At some point validation will succeed; call that commit N.
2. Add a savepoint to the MDT at commit N and restore the table to commit N.
3. Take every commit after commit N from the data table and apply them one by one to the MDT.
4. Once complete, run the validation tool again to ensure the MDT is in good shape.

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-5436
- Type: Bug
- Epic: https://issues.apache.org/jira/browse/HUDI-1292
- Fix version(s):
  - 1.1.0
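
As a rough illustration of the proposed search for commit N, here is a minimal sketch in plain Python. It is not Hudi code: `find_last_valid_commit`, `plan_repair`, and the `is_valid_at` callback are hypothetical names, and the callback stands in for whatever "run MDT validation as of this commit" ends up being. The sketch assumes commit times are given oldest-first and only computes the restore point and the list of commits to replay; the savepoint, restore, and replay steps themselves are left out.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def find_last_valid_commit(
    commits: Sequence[str],
    is_valid_at: Callable[[str], bool],
) -> Optional[str]:
    """Walk commits newest-first and return the first one that validates.

    `commits` is assumed to be ordered oldest to newest.
    `is_valid_at` is a hypothetical stand-in for running MDT validation
    against the latest files as of the given commit.
    """
    for commit in reversed(commits):
        if is_valid_at(commit):
            return commit
    return None


def plan_repair(
    commits: Sequence[str],
    is_valid_at: Callable[[str], bool],
) -> Tuple[Optional[str], List[str]]:
    """Return (commit N, commits after N that must be re-applied to the MDT).

    If no commit validates, N is None and every commit needs replaying.
    """
    commit_n = find_last_valid_commit(commits, is_valid_at)
    if commit_n is None:
        return None, list(commits)
    idx = commits.index(commit_n)
    return commit_n, list(commits[idx + 1:])
```

For a timeline `["001", "002", "003", "004"]` where validation passes only up to `"002"`, `plan_repair` yields `"002"` as the restore point and `["003", "004"]` as the commits to replay. A linear reverse scan mirrors the description in the issue; a binary search would only be safe if validation is known to pass for every commit before the first failure.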
