[ https://issues.apache.org/jira/browse/HUDI-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raymond Xu updated HUDI-3068:
-----------------------------
    Component/s: meta-sync
                     (was: hive)

> Add support to sync all partitions in hive sync tool
> ----------------------------------------------------
>
>                 Key: HUDI-3068
>                 URL: https://issues.apache.org/jira/browse/HUDI-3068
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: meta-sync
>            Reporter: sivabalan narayanan
>            Assignee: Harshal Patil
>            Priority: Major
>              Labels: pull-request-available, sev:critical
>             Fix For: 0.12.0
>
>
> If a user runs hive sync only occasionally, archival may kick in between
> runs and trim some commits. If partitions were added in those commits and
> never updated afterwards, hive sync will miss those partitions.
> {code:java}
> LOG.info("Last commit time synced is " + lastCommitTimeSynced.get() + ", Getting commits since then");
> return TimelineUtils.getPartitionsWritten(metaClient.getActiveTimeline().getCommitsTimeline()
>     .findInstantsAfter(lastCommitTimeSynced.get(), Integer.MAX_VALUE));
> } {code}
> This is because, for recurrent syncs, we always fetch the new commits from
> the timeline after the last synced instant, read their commit metadata, and
> collect the partitions written as part of those commits.
>
> We can add a new config to the hive sync tool to override this behavior:
> --sync-all-partitions
> When this config is set to true, we should ignore the last synced instant
> and take the route below, which is what happens when syncing for the first
> time.
>
> {code:java}
> if (!lastCommitTimeSynced.isPresent()) {
>   LOG.info("Last commit time synced is not known, listing all partitions in "
>       + basePath + ",FS :" + fs);
>   HoodieLocalEngineContext engineContext = new HoodieLocalEngineContext(metaClient.getHadoopConf());
>   return FSUtils.getAllPartitionPaths(engineContext, basePath,
>       useFileListingFromMetadata, assumeDatePartitioning);
> } {code}
>
>
> Ref issue:
> https://github.com/apache/hudi/issues/3890



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
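
The branching proposed in the issue above can be illustrated with a small standalone sketch. It does not use the Hudi API: the class SyncAllPartitionsSketch, the method partitionsToSync, and the map standing in for the active timeline are all hypothetical, and instant times are assumed to be fixed-width strings that order lexicographically. It only shows why an incremental sync can miss a partition once its commit has been archived, and how a sync-all-partitions flag would fall back to a full listing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

public class SyncAllPartitionsSketch {

    /**
     * Hypothetical helper, not part of Hudi.
     * activeTimeline maps instant time -> partitions written by that commit;
     * archived commits are simply absent from the map.
     */
    static List<String> partitionsToSync(
            Optional<String> lastCommitTimeSynced,
            boolean syncAllPartitions,
            Map<String, List<String>> activeTimeline,
            List<String> partitionsOnStorage) {
        // Full listing: first sync, or the user asked to ignore the last synced instant.
        if (syncAllPartitions || !lastCommitTimeSynced.isPresent()) {
            return new ArrayList<>(partitionsOnStorage);
        }
        // Incremental: only partitions written by commits after the last synced instant.
        Set<String> written = new LinkedHashSet<>();
        for (Map.Entry<String, List<String>> commit : activeTimeline.entrySet()) {
            if (commit.getKey().compareTo(lastCommitTimeSynced.get()) > 0) {
                written.addAll(commit.getValue());
            }
        }
        return new ArrayList<>(written);
    }

    public static void main(String[] args) {
        List<String> onStorage = Arrays.asList("2021/12/01", "2021/12/02", "2021/12/03");

        // Commit "002" wrote partition 2021/12/02 but has since been archived,
        // so only commit "003" remains on the active timeline.
        Map<String, List<String>> activeTimeline = new TreeMap<>();
        activeTimeline.put("003", Arrays.asList("2021/12/03"));

        Optional<String> lastSynced = Optional.of("001");

        // Incremental sync misses 2021/12/02.
        System.out.println(partitionsToSync(lastSynced, false, activeTimeline, onStorage));
        // With the flag set, every partition on storage is returned.
        System.out.println(partitionsToSync(lastSynced, true, activeTimeline, onStorage));
    }
}
```

In this sketch the flag only widens the set of partitions considered. A real implementation would still need to diff that set against what the metastore already holds.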