hudi-bot opened a new issue, #14891:
URL: https://github.com/apache/hudi/issues/14891

   The metadata table is bootstrapped whenever it finds that its commits are not in sync with the data table, and every instantiation of the metadata table performs this check. Consider this sequence: the metadata table is enabled at the start, disabled after a few commits, more commits are made to the data table, and then the metadata table is enabled again. The current bootstrap check does not seem to catch the gap in commit sync-up left by the disabled period, so the bootstrap is skipped.
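
To make the failure scenario concrete, here is a minimal sketch that models each timeline as a list of completed instant timestamps and lists the data-table commits the metadata table never received. The class `MetadataToggleSketch`, the method `missingFromMetadata`, and the timestamps are hypothetical illustrations, not Hudi APIs. The sketch also assumes that after re-enabling, the next commit (`005`) is applied to the metadata table without a bootstrap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Hudi's API: each timeline is a list of completed instant timestamps.
public class MetadataToggleSketch {

    // Returns the data-table instants that never reached the metadata table,
    // preserving the data timeline's order.
    public static List<String> missingFromMetadata(List<String> dataInstants, List<String> metadataInstants) {
        Set<String> synced = new HashSet<>(metadataInstants);
        List<String> missing = new ArrayList<>();
        for (String instant : dataInstants) {
            if (!synced.contains(instant)) {
                missing.add(instant);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Metadata enabled for 001-002, disabled for 003-004, re-enabled for 005
        // (assumed to be applied directly, without a bootstrap).
        List<String> data = Arrays.asList("001", "002", "003", "004", "005");
        List<String> metadata = Arrays.asList("001", "002", "005");
        System.out.println("Never synced to metadata: " + missingFromMetadata(data, metadata));
    }
}
```

Running `main` prints `Never synced to metadata: [003, 004]`, i.e. the two commits made while the metadata table was turned off.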
   
    
   {code:java}
   protected void bootstrapIfNeeded(HoodieEngineContext engineContext, HoodieTableMetaClient dataMetaClient) throws IOException {
     HoodieTimer timer = new HoodieTimer().startTimer();
     boolean exists = dataMetaClient.getFs().exists(new Path(metadataWriteConfig.getBasePath(), HoodieTableMetaClient.METAFOLDER_NAME));
     boolean rebootstrap = false;
     if (exists) {
       // If the un-synched instants have been archived then the metadata table will need to be bootstrapped again
       HoodieTableMetaClient metadataMetaClient = HoodieTableMetaClient.builder().setConf(hadoopConf.get())
           .setBasePath(metadataWriteConfig.getBasePath()).build();
       Option<HoodieInstant> latestMetadataInstant = metadataMetaClient.getActiveTimeline().filterCompletedInstants().lastInstant();
       if (!latestMetadataInstant.isPresent()) {
         LOG.warn("Metadata Table will need to be re-bootstrapped as no instants were found");
         rebootstrap = true;
       } else if (!latestMetadataInstant.get().getTimestamp().equals(SOLO_COMMIT_TIMESTAMP)
           && dataMetaClient.getActiveTimeline().getAllCommitsTimeline().isBeforeTimelineStarts(latestMetadataInstant.get().getTimestamp())) {
         // TODO: Revisit this logic and validate that filtering for all commits timeline is the right thing to do
         LOG.warn("Metadata Table will need to be re-bootstrapped as un-synced instants have been archived."
             + " latestMetadataInstant=" + latestMetadataInstant.get().getTimestamp()
             + ", latestDataInstant=" + dataMetaClient.getActiveTimeline().firstInstant().get().getTimestamp());
         rebootstrap = true;
       }
       // Note: if the un-synced data instants are still in the active timeline (not yet archived),
       // neither branch above fires and rebootstrap stays false.
     }
     // ... remainder of the method is not shown in this excerpt
   }
   {code}
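
The quoted condition can be modeled in isolation to see why it misses this case. The sketch below is a hypothetical simplification, not Hudi code: instants are lexicographically ordered timestamp strings, `isBeforeTimelineStarts` is approximated as "older than the first active data instant", and the `SOLO_COMMIT_TIMESTAMP` special case is left out. `lagsDataTable` is a hypothetical stricter comparison included only for contrast, not a proposed fix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model of the bootstrapIfNeeded decision; not Hudi's API.
// Timestamps are strings that sort lexicographically, like Hudi instant times.
public class BootstrapCheckSketch {

    // Approximates the quoted condition (minus the SOLO_COMMIT_TIMESTAMP case): re-bootstrap only
    // if the metadata table has no instants, or its latest instant is older than the first instant
    // still in the data table's active timeline (assumed sorted ascending), i.e. it was archived.
    public static boolean currentCheck(Optional<String> latestMetadataInstant, List<String> dataActiveInstants) {
        if (!latestMetadataInstant.isPresent()) {
            return true;
        }
        if (dataActiveInstants.isEmpty()) {
            return false;
        }
        String firstDataInstant = dataActiveInstants.get(0);
        return latestMetadataInstant.get().compareTo(firstDataInstant) < 0;
    }

    // Hypothetical stricter comparison: the metadata table is behind if any completed
    // data instant is newer than its latest instant.
    public static boolean lagsDataTable(Optional<String> latestMetadataInstant, List<String> dataActiveInstants) {
        if (!latestMetadataInstant.isPresent()) {
            return true;
        }
        String latest = latestMetadataInstant.get();
        return dataActiveInstants.stream().anyMatch(instant -> instant.compareTo(latest) > 0);
    }

    public static void main(String[] args) {
        // Re-enable point: metadata last saw 002; the data table committed 003 and 004 while
        // the metadata table was off, and nothing has been archived yet.
        Optional<String> latestMetadata = Optional.of("002");
        List<String> dataActive = Arrays.asList("001", "002", "003", "004");
        System.out.println("current check re-bootstraps: " + currentCheck(latestMetadata, dataActive));
        System.out.println("metadata lags data table: " + lagsDataTable(latestMetadata, dataActive));
    }
}
```

At the re-enable point this prints `current check re-bootstraps: false` and `metadata lags data table: true`: because `003` and `004` are still in the active timeline, the archival-based condition never fires even though the metadata table is behind.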
    
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-2603
   - Type: Bug


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.