hudi-bot opened a new issue, #14891:
URL: https://github.com/apache/hudi/issues/14891
The metadata table is bootstrapped whenever its commits are found to be out of
sync with the data table. Every instantiation of the metadata table performs
this check. However, when the metadata table is enabled at the start, disabled
after a few commits, and then re-enabled after more commits have been made, the
current bootstrap check does not seem to catch the intermediate gap in the
commit sync-up, so the required re-bootstrap is skipped.
{code:java}
protected void bootstrapIfNeeded(HoodieEngineContext engineContext,
                                 HoodieTableMetaClient dataMetaClient) throws IOException {
  HoodieTimer timer = new HoodieTimer().startTimer();
  boolean exists = dataMetaClient.getFs().exists(
      new Path(metadataWriteConfig.getBasePath(), HoodieTableMetaClient.METAFOLDER_NAME));
  boolean rebootstrap = false;
  if (exists) {
    // If the un-synched instants have been archived then the metadata table will need to be bootstrapped again
    HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf.get())
        .setBasePath(metadataWriteConfig.getBasePath()).build();
    Option<HoodieInstant> latestMetadataInstant =
        metadataMetaClient.getActiveTimeline().filterCompletedInstants().lastInstant();
    if (!latestMetadataInstant.isPresent()) {
      LOG.warn("Metadata Table will need to be re-bootstrapped as no instants were found");
      rebootstrap = true;
    } else if (!latestMetadataInstant.get().getTimestamp().equals(SOLO_COMMIT_TIMESTAMP)
        && dataMetaClient.getActiveTimeline().getAllCommitsTimeline()
            .isBeforeTimelineStarts(latestMetadataInstant.get().getTimestamp())) {
      // TODO: Revisit this logic and validate that filtering for all commits timeline is the right thing to do
      LOG.warn("Metadata Table will need to be re-bootstrapped as un-synced instants have been archived."
          + " latestMetadataInstant=" + latestMetadataInstant.get().getTimestamp()
          + ", latestDataInstant=" + dataMetaClient.getActiveTimeline().firstInstant().get().getTimestamp());
      rebootstrap = true;
    }
  }
{code}
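To make the failure mode concrete, here is a minimal, self-contained sketch of the on/off/on scenario. It does not use Hudi's API: timelines are modeled as plain collections of timestamp strings, and the class and method names (`MetadataSyncCheck`, `currentCheckWantsRebootstrap`, `gapCheckWantsRebootstrap`) are hypothetical. The first method only approximates the "latest metadata instant is before the start of the data timeline" condition from the excerpt above; the second is one possible stricter check, not a proposed fix.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class MetadataSyncCheck {

    // Simplified stand-in for the existing condition: re-bootstrap only if the
    // metadata table has no instants, or its latest instant is older than the
    // first instant still present on the data table's active timeline.
    // Assumes dataInstants is sorted in ascending order.
    static boolean currentCheckWantsRebootstrap(List<String> dataInstants, Set<String> metadataInstants) {
        if (metadataInstants.isEmpty()) {
            return true;
        }
        if (dataInstants.isEmpty()) {
            return false;
        }
        String latestMetadata = new TreeSet<>(metadataInstants).last();
        return latestMetadata.compareTo(dataInstants.get(0)) < 0;
    }

    // Hypothetical stricter check: re-bootstrap if any data instant at or before
    // the latest metadata instant was never applied to the metadata table.
    static boolean gapCheckWantsRebootstrap(List<String> dataInstants, Set<String> metadataInstants) {
        if (metadataInstants.isEmpty()) {
            return true;
        }
        String latestMetadata = new TreeSet<>(metadataInstants).last();
        for (String instant : dataInstants) {
            if (instant.compareTo(latestMetadata) <= 0 && !metadataInstants.contains(instant)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Commits 001-002: metadata table on. 003-004: off. 005: on again.
        List<String> data = List.of("001", "002", "003", "004", "005");
        Set<String> metadata = Set.of("001", "002", "005");

        System.out.println(currentCheckWantsRebootstrap(data, metadata)); // false: the gap is missed
        System.out.println(gapCheckWantsRebootstrap(data, metadata));     // true: 003 and 004 are missing
    }
}
```

In this model the latest metadata instant (005) is not before the start of the data timeline (001), so the first check reports nothing to do even though commits 003 and 004 never reached the metadata table.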
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-2603
- Type: Bug