nsivabalan commented on a change in pull request #3590:
URL: https://github.com/apache/hudi/pull/3590#discussion_r716251124
##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##########
@@ -401,64 +394,83 @@ private boolean bootstrapFromFilesystem(HoodieEngineContext engineContext, Hoodi
   }
 
   /**
-   * Sync the Metadata Table from the instants created on the dataset.
+   * Initialize file groups for a partition. For file listing, we just have one file group.
    *
-   * @param datasetMetaClient {@code HoodieTableMetaClient} for the dataset
+   * All FileGroups for a given metadata partition has a fixed prefix as per the {@link MetadataPartitionType#getFileIdPrefix()}.
+   * Each file group is suffixed with increments of 1 starting with 1.
+   *
+   * For instance, for FILES, there is only one file group named as "files-1"
+   * Lets say we configure 10 file groups for record level index, and prefix as "record-index-bucket-"
+   * Filegroups will be named as :
+   * record-index-bucket-01
+   * record-index-bucket-02
+   * ...
+   * record-index-bucket-10
    */
-  private void syncFromInstants(HoodieTableMetaClient datasetMetaClient) {
-    ValidationUtils.checkState(enabled, "Metadata table cannot be synced as it is not enabled");
-    // (re) init the metadata for reading.
-    initTableMetadata();
-    try {
-      List<HoodieInstant> instantsToSync = metadata.findInstantsToSyncForWriter();
-      if (instantsToSync.isEmpty()) {
-        return;
-      }
-
-      LOG.info("Syncing " + instantsToSync.size() + " instants to metadata table: " + instantsToSync);
-
-      // Read each instant in order and sync it to metadata table
-      for (HoodieInstant instant : instantsToSync) {
-        LOG.info("Syncing instant " + instant + " to metadata table");
-
-        Option<List<HoodieRecord>> records = HoodieTableMetadataUtil.convertInstantToMetaRecords(datasetMetaClient,
-            metaClient.getActiveTimeline(), instant, metadata.getUpdateTime());
-        if (records.isPresent()) {
-          commit(records.get(), MetadataPartitionType.FILES.partitionPath(), instant.getTimestamp());
-        }
+  private void initializeFileGroups(HoodieTableMetaClient datasetMetaClient, MetadataPartitionType metadataPartition, String instantTime,
+      int fileGroupCount) throws IOException {
+
+    final HashMap<HeaderMetadataType, String> blockHeader = new HashMap<>();
+    blockHeader.put(HeaderMetadataType.INSTANT_TIME, instantTime);
+    // Archival of data table has a dependency on compaction(base files) in metadata table.
+    // It is assumed that as of time Tx of base instant (/compaction time) in metadata table,
+    // all commits in data table is in sync with metadata table. So, we always create start with log file for any fileGroup.
+    final HoodieDeleteBlock block = new HoodieDeleteBlock(new HoodieKey[0], blockHeader);

Review comment:
   I feel it may not be easy to relax this; we can discuss it async as we close out this patch. There are two dependencies here, of which (1) could be relaxed.
   
   1. During rollback, we check whether the commit being rolled back has already been synced. If it is earlier than the last compacted time, we assume it has already been synced. We can get away with this if need be: we can always assume that if the commit being rolled back is not part of the metadata table's active timeline, it has not been synced, and go ahead with the rollback.
   The only difference we might have here is that the delete list could include some additional files that were never synced to the metadata table in the first place.
   
   2. Archival of the dataset is dependent on compaction in the metadata table. This one might need more thought.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
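
The file group naming scheme described in the new javadoc (a fixed per-partition prefix plus a 1-based suffix, zero-padded to the width of the file group count) can be sketched roughly as below. This is an illustrative sketch, not the actual Hudi implementation; the class and method names (`FileGroupIdSketch`, `fileGroupIds`) and the padding rule are assumptions inferred from the examples "files-1" and "record-index-bucket-01".."record-index-bucket-10":

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the file group naming convention from the javadoc.
// Not the real HoodieBackedTableMetadataWriter code: names and padding rule
// are assumptions derived from the examples in the comment above.
public class FileGroupIdSketch {

  // Builds fileGroupCount IDs: prefix + 1-based index, zero-padded to the
  // number of digits in fileGroupCount (so count=1 -> "files-1", and
  // count=10 -> "record-index-bucket-01" .. "record-index-bucket-10").
  static List<String> fileGroupIds(String fileIdPrefix, int fileGroupCount) {
    int width = String.valueOf(fileGroupCount).length();
    List<String> ids = new ArrayList<>(fileGroupCount);
    for (int i = 1; i <= fileGroupCount; i++) {
      ids.add(String.format("%s%0" + width + "d", fileIdPrefix, i));
    }
    return ids;
  }

  public static void main(String[] args) {
    // Single file group for the FILES partition.
    System.out.println(fileGroupIds("files-", 1));
    // Ten file groups for a hypothetical record-level index partition.
    System.out.println(fileGroupIds("record-index-bucket-", 10));
  }
}
```

Under this reading, each file group's first log block (the `HoodieDeleteBlock` written at `instantTime` in the diff above) would be appended once per generated ID, which is why the suffix scheme must be deterministic.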