danny0405 commented on code in PR #13664:
URL: https://github.com/apache/hudi/pull/13664#discussion_r2246827264
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -1485,14 +1485,53 @@ public void update(HoodieRestoreMetadata restoreMetadata, String instantTime) {
// We need to choose a timestamp which would be a validInstantTime for MDT. This is either a commit timestamp completed on the dataset
// or a new timestamp which we use for MDT clean, compaction etc.
String syncCommitTime = createRestoreInstantTime();
+ // For Files partition.
processAndCommit(syncCommitTime, () -> HoodieTableMetadataUtil.convertMissingPartitionRecords(engineContext,
partitionsToDelete, partitionFilesToAdd, partitionFilesToDelete, syncCommitTime));
+ // For Column Stats partition.
+ processAndCommit(syncCommitTime, () -> convertToColumnStatsRecord(
+ partitionFilesToAdd, partitionFilesToDelete, engineContext, dataMetaClient,
+ dataWriteConfig.getMetadataConfig(), Option.of(dataWriteConfig.getRecordMerger().getRecordType()),
+ dataWriteConfig.getMetadataConfig().getColumnStatsIndexParallelism()));
+ // Close.
closeInternal();
} catch (IOException e) {
throw new HoodieMetadataException("IOException during MDT restore sync", e);
}
}
+ static Map<String, HoodieData<HoodieRecord>> convertToColumnStatsRecord(Map<String, Map<String, Long>> partitionFilesToAdd,
+ Map<String, List<String>> partitionFilesToDelete,
+ HoodieEngineContext engineContext,
+ HoodieTableMetaClient dataMetaClient,
+ HoodieMetadataConfig metadataConfig,
+ Option<HoodieRecord.HoodieRecordType> recordTypeOpt,
+ int columnStatsIndexParallelism) {
+ if (partitionFilesToDelete.isEmpty() && partitionFilesToAdd.isEmpty()) {
+ return Collections.emptyMap();
+ }
+ Lazy<Option<Schema>> tableSchema =
+ Lazy.lazily(() -> HoodieTableMetadataUtil.tryResolveSchemaForTable(dataMetaClient));
Review Comment:
This will incur a timeline listing and a metadata file resolution. Not sure
if we can reuse a schema that has already been resolved somewhere else.
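
To illustrate the kind of reuse I mean, here is a minimal, self-contained sketch. It does not use Hudi classes: `MemoizedSupplier`, `SchemaReuseSketch`, `resolveSchemaExpensively`, and `convert` are hypothetical names, and the schema is a plain string standing in for whatever the real resolution returns. The idea is only that one memoized holder is created by the caller and passed to every consumer, so the expensive resolution runs at most once and not at all when there is nothing to process.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical memoizing holder; a stand-in for illustration, not Hudi's Lazy class. */
final class MemoizedSupplier<T> implements Supplier<T> {
  private final Supplier<T> delegate;
  private T value;
  private boolean resolved;

  MemoizedSupplier(Supplier<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public synchronized T get() {
    // Run the delegate on first access only, then serve the cached value.
    if (!resolved) {
      value = delegate.get();
      resolved = true;
    }
    return value;
  }
}

final class SchemaReuseSketch {
  // Counts how often the simulated expensive resolution runs.
  static final AtomicInteger RESOLVE_CALLS = new AtomicInteger();

  // Placeholder for an expensive lookup such as a timeline listing (assumption).
  static String resolveSchemaExpensively() {
    RESOLVE_CALLS.incrementAndGet();
    return "{\"type\":\"record\",\"name\":\"example\"}";
  }

  // Hypothetical consumer: takes the shared holder instead of resolving the schema itself.
  static int convert(Supplier<String> schema, int fileCount) {
    // With no files to process, the schema is never touched.
    return fileCount == 0 ? 0 : schema.get().length() * fileCount;
  }

  public static void main(String[] args) {
    Supplier<String> shared = new MemoizedSupplier<>(SchemaReuseSketch::resolveSchemaExpensively);
    convert(shared, 0); // empty input: no resolution
    convert(shared, 2); // first real use: resolves once
    convert(shared, 3); // reuses the cached schema
    System.out.println("resolve calls: " + RESOLVE_CALLS.get());
  }
}
```

Whether something like this applies here depends on whether a resolved schema is already available to the caller of `convertToColumnStatsRecord`, which I have not checked.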
--
This is an automated message from the Apache Git Service.