[ https://issues.apache.org/jira/browse/HUDI-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yue Zhang updated HUDI-5919:
----------------------------
    Status: In Progress  (was: Open)

> Fix the validation of partition listing in metadata table validator
> -------------------------------------------------------------------
>
>                 Key: HUDI-5919
>                 URL: https://issues.apache.org/jira/browse/HUDI-5919
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.1
>
>
> In HoodieMetadataTableValidator, we compare the partition listing from the
> metadata table (MDT) with the listing from the file system:
> {code:java}
> // ignore partitions created by uncommitted ingestion.
> allPartitionPathsFromFS = allPartitionPathsFromFS.stream().parallel().filter(part -> {
>   HoodiePartitionMetadata hoodiePartitionMetadata =
>       new HoodiePartitionMetadata(metaClient.getFs(), FSUtils.getPartitionPath(basePath, part));
>   Option<String> instantOption = hoodiePartitionMetadata.readPartitionCreatedCommitTime();
>   if (instantOption.isPresent()) {
>     String instantTime = instantOption.get();
>     return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
>   } else {
>     return false;
>   }
> }).collect(Collectors.toList());
> List<String> allPartitionPathsMeta =
>     FSUtils.getAllPartitionPaths(engineContext, basePath, true, cfg.assumeDatePartitioning);
> Collections.sort(allPartitionPathsFromFS);
> Collections.sort(allPartitionPathsMeta);
> if (allPartitionPathsFromFS.size() != allPartitionPathsMeta.size()
>     || !allPartitionPathsFromFS.equals(allPartitionPathsMeta)) {
>   String message = "Compare Partitions Failed! " + "AllPartitionPathsFromFS : "
>       + allPartitionPathsFromFS + " and allPartitionPathsMeta : " + allPartitionPathsMeta;
>   LOG.error(message);
>   throw new HoodieValidationException(message);
> }
> {code}
> When deciding which file-system partitions to include in the comparison, the
> validator looks at the commit time that created each partition, as recorded in
> its partition metadata:
> {code:java}
> if (instantOption.isPresent()) {
>   String instantTime = instantOption.get();
>   return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
> } else {
>   return false;
> }
> {code}
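>
> To make the filter's decision explicit, the following is a simplified, self-contained
> model of it in plain Java. {{CompletedTimeline}}, {{keepPartition}}, and the short
> instant strings are hypothetical stand-ins, not Hudi classes. The model only
> approximates {{containsOrBeforeTimelineStarts}} as "the instant is on the completed
> timeline, or it is older than the timeline's first instant"; Hudi's real boundary
> check may differ in detail.

```java
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

public class PartitionFilterSketch {

  // Hypothetical stand-in for the completed timeline (not a Hudi class).
  // Instant times are fixed-width strings, so lexicographic order matches
  // chronological order.
  static final class CompletedTimeline {
    private final TreeSet<String> instants;

    CompletedTimeline(List<String> instants) {
      this.instants = new TreeSet<>(instants);
    }

    // Approximates containsOrBeforeTimelineStarts: the instant is on the
    // completed timeline, or it is older than the first instant (e.g. archived).
    boolean containsOrBeforeTimelineStarts(String instant) {
      if (instants.contains(instant)) {
        return true;
      }
      return !instants.isEmpty() && instant.compareTo(instants.first()) < 0;
    }
  }

  // Same decision as the validator's filter: keep the partition only if the
  // creating commit recorded in its partition metadata counts as completed.
  static boolean keepPartition(Optional<String> createdCommit, CompletedTimeline timeline) {
    return createdCommit.map(timeline::containsOrBeforeTimelineStarts).orElse(false);
  }

  public static void main(String[] args) {
    CompletedTimeline timeline = new CompletedTimeline(List.of("002", "003"));
    System.out.println(keepPartition(Optional.of("003"), timeline)); // true: completed
    System.out.println(keepPartition(Optional.of("001"), timeline)); // true: before timeline starts
    System.out.println(keepPartition(Optional.of("004"), timeline)); // false: never completed
    System.out.println(keepPartition(Optional.empty(), timeline));   // false: no partition metadata
  }
}
```

> The key point is the third case: a creating commit that falls inside the timeline's
> range but never completed causes the partition to be skipped, regardless of what
> later commits wrote into it.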
> In the following scenario, the validation job raises a false alarm, complaining
> that the partition lists returned by the file system and the metadata table
> differ, because of this check:
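> - Commit C1 creates the partition and writes the partition metadata, then fails
> while writing data files. After C1 is rolled back, the next commit, C2, adds new
> data to the same partition. In this case, the partition metadata still records C1
> as the creating commit, since Hudi does not rewrite the partition metadata in C2.
> Unless C1 is older than the start of the completed timeline,
> {{containsOrBeforeTimelineStarts}} returns false for C1 because C1 never completed,
> so the partition is dropped from the file-system listing while the metadata table
> still lists it.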
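>
> The scenario above can be replayed with a hypothetical, simplified model in plain
> Java with no Hudi dependencies. The partition names, the {{filterFsPartitions}}
> helper, and the extra completed commit C0 are illustrative assumptions. C0 is
> assumed to complete before C1 starts, so C1 is not covered by the "before timeline
> starts" case, which the model therefore omits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

public class FalseAlarmReplay {

  // Hypothetical helper: returns the file-system partitions the validator would
  // keep, i.e. those whose recorded creating commit is completed. The "before
  // timeline starts" case is omitted (assumption: an earlier commit C0 completed
  // before C1, so it cannot apply to C1 here).
  static List<String> filterFsPartitions(Map<String, Optional<String>> createdCommitByPartition,
                                         Set<String> completedInstants) {
    return createdCommitByPartition.entrySet().stream()
        .filter(e -> e.getValue().map(completedInstants::contains).orElse(false))
        .map(Map.Entry::getKey)
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // C1 created "2023/03/10" and wrote its partition metadata, then failed and
    // was rolled back, so it never reaches the completed timeline. C2 later wrote
    // data to the same partition and completed, but the partition metadata still
    // records C1 as the creating commit.
    Map<String, Optional<String>> fsPartitions = Map.of(
        "2023/03/09", Optional.of("C0"),
        "2023/03/10", Optional.of("C1"));
    Set<String> completed = Set.of("C0", "C2");

    // The metadata table lists both partitions, since C2 committed data to "2023/03/10".
    List<String> mdtPartitions = new ArrayList<>(List.of("2023/03/10", "2023/03/09"));
    Collections.sort(mdtPartitions);

    List<String> fromFs = filterFsPartitions(fsPartitions, completed);
    System.out.println("FS  : " + fromFs);        // prints FS  : [2023/03/09]
    System.out.println("MDT : " + mdtPartitions); // prints MDT : [2023/03/09, 2023/03/10]
    System.out.println("Mismatch reported: " + !fromFs.equals(mdtPartitions)); // true
  }
}
```

> Running {{main}} yields a file-system list containing only {{2023/03/09}} and a
> metadata-table list containing both partitions, which is the mismatch the validator
> reports even though both listings describe the same committed data.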
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
