[ https://issues.apache.org/jira/browse/HUDI-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yue Zhang resolved HUDI-5919.
-----------------------------

> Fix the validation of partition listing in metadata table validator
> -------------------------------------------------------------------
>
> Key: HUDI-5919
> URL: https://issues.apache.org/jira/browse/HUDI-5919
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.1
>
>
> In HoodieMetadataTableValidator, we compare the partition listing between MDT
> and file system:
> {code:java}
> // ignore partitions created by uncommitted ingestion.
> allPartitionPathsFromFS = allPartitionPathsFromFS.stream().parallel().filter(part -> {
>   HoodiePartitionMetadata hoodiePartitionMetadata =
>       new HoodiePartitionMetadata(metaClient.getFs(), FSUtils.getPartitionPath(basePath, part));
>   Option<String> instantOption = hoodiePartitionMetadata.readPartitionCreatedCommitTime();
>   if (instantOption.isPresent()) {
>     String instantTime = instantOption.get();
>     return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
>   } else {
>     return false;
>   }
> }).collect(Collectors.toList());
>
> List<String> allPartitionPathsMeta =
>     FSUtils.getAllPartitionPaths(engineContext, basePath, true, cfg.assumeDatePartitioning);
>
> Collections.sort(allPartitionPathsFromFS);
> Collections.sort(allPartitionPathsMeta);
>
> if (allPartitionPathsFromFS.size() != allPartitionPathsMeta.size()
>     || !allPartitionPathsFromFS.equals(allPartitionPathsMeta)) {
>   String message = "Compare Partitions Failed! " + "AllPartitionPathsFromFS : "
>       + allPartitionPathsFromFS + " and allPartitionPathsMeta : " + allPartitionPathsMeta;
>   LOG.error(message);
>   throw new HoodieValidationException(message);
> }
> {code}
> When deciding the partitions from the file system to consider for comparison,
> we look at the commit time that creates the partition.
> {code:java}
> if (instantOption.isPresent()) {
>   String instantTime = instantOption.get();
>   return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
> } else {
>   return false;
> }
> {code}
> In the following scenario, the validation job raises a false alarm, complaining
> that the partition lists returned by the file system and the metadata table
> differ, because of this check:
> - Commit C1 creates the partition and writes the partition metadata, but then
> fails while writing data files. After C1 is rolled back, C2 adds new data to
> the same partition. The partition metadata still records C1 as the created
> commit time, since Hudi does not rewrite the partition metadata in C2.
> Because C1 is not on the completed timeline, the check drops the partition
> from the file system listing, while the metadata table still lists it.
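The scenario above can be reproduced in a small, self-contained sketch in plain Java that does not use any Hudi classes. Everything in it is a hypothetical stand-in for illustration: the class `PartitionListingSketch`, the helper `partitionsFromFs`, the instant times `001`/`002`/`003`, and the partition paths. The local `containsOrBeforeTimelineStarts` only borrows the name of the timeline method quoted in the issue and models it in a simplified way (an instant passes if it is completed or sorts before the earliest completed instant). The sketch also assumes an earlier completed commit C0 (`001`) on another partition, so that the rolled-back C1 (`002`) does not count as "before the timeline starts".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.TreeSet;

class PartitionListingSketch {

    // Simplified stand-in for the timeline check (assumption, not Hudi's code):
    // true if the instant is completed, or sorts before the earliest completed instant.
    static boolean containsOrBeforeTimelineStarts(TreeSet<String> completedInstants,
                                                  String instantTime) {
        if (completedInstants.contains(instantTime)) {
            return true;
        }
        return !completedInstants.isEmpty()
            && instantTime.compareTo(completedInstants.first()) < 0;
    }

    // Mirrors the filter quoted in the issue: a partition found on the file system
    // is kept only if its recorded creation commit passes the timeline check.
    static List<String> partitionsFromFs(Map<String, Optional<String>> createdCommitByPartition,
                                         TreeSet<String> completedInstants) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Optional<String>> entry : createdCommitByPartition.entrySet()) {
            Optional<String> created = entry.getValue();
            if (created.isPresent()
                && containsOrBeforeTimelineStarts(completedInstants, created.get())) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // C0 = 001 (completed), C1 = 002 (failed and rolled back), C2 = 003 (completed).
        TreeSet<String> completed = new TreeSet<>(List.of("001", "003"));

        // Partition metadata as found on the file system (sorted by partition path).
        Map<String, Optional<String>> createdCommit = new TreeMap<>();
        createdCommit.put("2023/03/01", Optional.of("001"));
        // Created by C1; C2 later wrote data here but did not rewrite the metadata.
        createdCommit.put("2023/03/02", Optional.of("002"));

        List<String> fromFs = partitionsFromFs(createdCommit, completed);
        // The metadata table lists both partitions, since C0 and C2 committed data to them.
        List<String> fromMdt = List.of("2023/03/01", "2023/03/02");

        System.out.println("From FS:  " + fromFs);
        System.out.println("From MDT: " + fromMdt);
        System.out.println("Listings match: " + fromFs.equals(fromMdt));
    }
}
```

In this sketch the file system listing keeps only `2023/03/01`, because the creation commit recorded for `2023/03/02` is the rolled-back C1, so the two listings differ even though the table is consistent. This is the false alarm described in the issue.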