[jira] [Updated] (HUDI-5919) Fix the validation of partition listing in metadata table validator

2023-05-30 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5919:

Fix Version/s: 0.14.0
   (was: 0.13.1)

> Fix the validation of partition listing in metadata table validator
> ---
>
> Key: HUDI-5919
> URL: https://issues.apache.org/jira/browse/HUDI-5919
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> In HoodieMetadataTableValidator, we compare the partition listing between the
> MDT and the file system:
> {code:java}
> // ignore partitions created by uncommitted ingestion.
> allPartitionPathsFromFS = allPartitionPathsFromFS.stream().parallel().filter(part -> {
>   HoodiePartitionMetadata hoodiePartitionMetadata =
>       new HoodiePartitionMetadata(metaClient.getFs(), FSUtils.getPartitionPath(basePath, part));
>   Option<String> instantOption = hoodiePartitionMetadata.readPartitionCreatedCommitTime();
>   if (instantOption.isPresent()) {
>     String instantTime = instantOption.get();
>     return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
>   } else {
>     return false;
>   }
> }).collect(Collectors.toList());
> List<String> allPartitionPathsMeta =
>     FSUtils.getAllPartitionPaths(engineContext, basePath, true, cfg.assumeDatePartitioning);
> Collections.sort(allPartitionPathsFromFS);
> Collections.sort(allPartitionPathsMeta);
> if (allPartitionPathsFromFS.size() != allPartitionPathsMeta.size()
>     || !allPartitionPathsFromFS.equals(allPartitionPathsMeta)) {
>   String message = "Compare Partitions Failed! " + "AllPartitionPathsFromFS : "
>       + allPartitionPathsFromFS + " and allPartitionPathsMeta : " + allPartitionPathsMeta;
>   LOG.error(message);
>   throw new HoodieValidationException(message);
> }
> {code}
> When deciding which partitions from the file system to consider for comparison,
> we look at the commit time that created the partition:
> {code:java}
> if (instantOption.isPresent()) {
>   String instantTime = instantOption.get();
>   return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
> } else {
>   return false;
> }
> {code}
> In the following scenario, the validation job fires a false alarm, complaining
> that the partition lists returned by the file system and the metadata table
> do not match, because of this check:
> - Commit C1 creates the partition and writes the partition metadata, but C1
> fails while writing data files. Later, after C1 is rolled back, C2 adds new
> data to the same partition. In this case, the partition metadata still has C1
> as the created commit time, since Hudi does not rewrite the partition
> metadata in C2.
>  
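
The scenario above can be reproduced with a small toy model. This is only a sketch under stated assumptions: `PartitionCheckSketch` and `filterByCreatedCommit` are hypothetical names, not Hudi APIs; the completed timeline is reduced to a plain set of commit names (so the "before timeline starts" part of `containsOrBeforeTimelineStarts` is ignored); and the metadata table is assumed to list the partition because C2 committed data into it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the partition-listing check; none of these names are Hudi APIs. */
final class PartitionCheckSketch {

  /**
   * Mirrors the filter quoted in the issue in simplified form: a partition is kept
   * only if the commit recorded in its partition metadata is a completed commit.
   */
  static List<String> filterByCreatedCommit(Map<String, String> createdCommitByPartition,
                                            Set<String> completedCommits) {
    List<String> kept = new ArrayList<>();
    for (Map.Entry<String, String> entry : createdCommitByPartition.entrySet()) {
      if (completedCommits.contains(entry.getValue())) {
        kept.add(entry.getKey());
      }
    }
    Collections.sort(kept);
    return kept;
  }

  public static void main(String[] args) {
    // C1 wrote the partition metadata, then failed and was rolled back.
    // C2 later committed data into the same partition without rewriting that metadata.
    Map<String, String> createdCommitByPartition = new HashMap<>();
    createdCommitByPartition.put("2023/03/10", "C1");
    Set<String> completedCommits = new HashSet<>(Arrays.asList("C2"));

    List<String> fromFileSystem = filterByCreatedCommit(createdCommitByPartition, completedCommits);
    // Assumption: the metadata table lists the partition because C2 committed files there.
    List<String> fromMetadataTable = Arrays.asList("2023/03/10");

    System.out.println("file system:    " + fromFileSystem);
    System.out.println("metadata table: " + fromMetadataTable);
    System.out.println("lists match:    " + fromFileSystem.equals(fromMetadataTable));
  }
}
```

In this model the file-system side drops the partition because its recorded creation commit (C1) never completed, while the metadata-table side still lists it, so the two lists differ even though the table itself is consistent.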



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-5919) Fix the validation of partition listing in metadata table validator

2023-05-22 Thread Yue Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yue Zhang updated HUDI-5919:

Status: In Progress  (was: Open)

> Fix the validation of partition listing in metadata table validator
> ---
>
> Key: HUDI-5919
> URL: https://issues.apache.org/jira/browse/HUDI-5919
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.1
>
>





[jira] [Updated] (HUDI-5919) Fix the validation of partition listing in metadata table validator

2023-03-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5919:
-
Labels: pull-request-available  (was: )

> Fix the validation of partition listing in metadata table validator
> ---
>
> Key: HUDI-5919
> URL: https://issues.apache.org/jira/browse/HUDI-5919
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.1
>
>





[jira] [Updated] (HUDI-5919) Fix the validation of partition listing in metadata table validator

2023-03-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5919:

Description: 
In HoodieMetadataTableValidator, we compare the partition listing between the
MDT and the file system:
{code:java}
// ignore partitions created by uncommitted ingestion.
allPartitionPathsFromFS = allPartitionPathsFromFS.stream().parallel().filter(part -> {
  HoodiePartitionMetadata hoodiePartitionMetadata =
      new HoodiePartitionMetadata(metaClient.getFs(), FSUtils.getPartitionPath(basePath, part));

  Option<String> instantOption = hoodiePartitionMetadata.readPartitionCreatedCommitTime();
  if (instantOption.isPresent()) {
    String instantTime = instantOption.get();
    return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
  } else {
    return false;
  }
}).collect(Collectors.toList());

List<String> allPartitionPathsMeta =
    FSUtils.getAllPartitionPaths(engineContext, basePath, true, cfg.assumeDatePartitioning);

Collections.sort(allPartitionPathsFromFS);
Collections.sort(allPartitionPathsMeta);

if (allPartitionPathsFromFS.size() != allPartitionPathsMeta.size()
    || !allPartitionPathsFromFS.equals(allPartitionPathsMeta)) {
  String message = "Compare Partitions Failed! " + "AllPartitionPathsFromFS : "
      + allPartitionPathsFromFS + " and allPartitionPathsMeta : " + allPartitionPathsMeta;
  LOG.error(message);
  throw new HoodieValidationException(message);
}
{code}
When deciding which partitions from the file system to consider for comparison,
we look at the commit time that created the partition:
{code:java}
if (instantOption.isPresent()) {
  String instantTime = instantOption.get();
  return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
} else {
  return false;
}
{code}
In the following scenario, the validation job fires a false alarm, complaining
that the partition lists returned by the file system and the metadata table
do not match, because of this check:
- Commit C1 creates the partition and writes the partition metadata, but C1
fails while writing data files. Later, after C1 is rolled back, C2 adds new
data to the same partition. In this case, the partition metadata still has C1
as the created commit time, since Hudi does not rewrite the partition
metadata in C2.

 

  was:
In HoodieMetadataTableValidator, we compare the partition listing between the
MDT and the file system:
{code:java}
// ignore partitions created by uncommitted ingestion.
allPartitionPathsFromFS = allPartitionPathsFromFS.stream().parallel().filter(part -> {
  HoodiePartitionMetadata hoodiePartitionMetadata =
      new HoodiePartitionMetadata(metaClient.getFs(), FSUtils.getPartitionPath(basePath, part));

  Option<String> instantOption = hoodiePartitionMetadata.readPartitionCreatedCommitTime();
  if (instantOption.isPresent()) {
    String instantTime = instantOption.get();
    return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
  } else {
    return false;
  }
}).collect(Collectors.toList());

List<String> allPartitionPathsMeta =
    FSUtils.getAllPartitionPaths(engineContext, basePath, true, cfg.assumeDatePartitioning);

Collections.sort(allPartitionPathsFromFS);
Collections.sort(allPartitionPathsMeta);

if (allPartitionPathsFromFS.size() != allPartitionPathsMeta.size()
    || !allPartitionPathsFromFS.equals(allPartitionPathsMeta)) {
  String message = "Compare Partitions Failed! " + "AllPartitionPathsFromFS : "
      + allPartitionPathsFromFS + " and allPartitionPathsMeta : " + allPartitionPathsMeta;
  LOG.error(message);
  throw new HoodieValidationException(message);
}
{code}
When deciding which partitions from the file system to consider for comparison,
we look at the commit time that created the partition:
{code:java}
if (instantOption.isPresent()) {
  String instantTime = instantOption.get();
  return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
} else {
  return false;
}
{code}
There is one case in which this can fire a false alarm.  Consider the following case.

 


> Fix the validation of partition listing in metadata table validator
> ---
>
> Key: HUDI-5919
> URL: https://issues.apache.org/jira/browse/HUDI-5919
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.13.1
>
>

[jira] [Updated] (HUDI-5919) Fix the validation of partition listing in metadata table validator

2023-03-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5919:

Description: 
In HoodieMetadataTableValidator, we compare the partition listing between the
MDT and the file system:
{code:java}
// ignore partitions created by uncommitted ingestion.
allPartitionPathsFromFS = allPartitionPathsFromFS.stream().parallel().filter(part -> {
  HoodiePartitionMetadata hoodiePartitionMetadata =
      new HoodiePartitionMetadata(metaClient.getFs(), FSUtils.getPartitionPath(basePath, part));

  Option<String> instantOption = hoodiePartitionMetadata.readPartitionCreatedCommitTime();
  if (instantOption.isPresent()) {
    String instantTime = instantOption.get();
    return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
  } else {
    return false;
  }
}).collect(Collectors.toList());

List<String> allPartitionPathsMeta =
    FSUtils.getAllPartitionPaths(engineContext, basePath, true, cfg.assumeDatePartitioning);

Collections.sort(allPartitionPathsFromFS);
Collections.sort(allPartitionPathsMeta);

if (allPartitionPathsFromFS.size() != allPartitionPathsMeta.size()
    || !allPartitionPathsFromFS.equals(allPartitionPathsMeta)) {
  String message = "Compare Partitions Failed! " + "AllPartitionPathsFromFS : "
      + allPartitionPathsFromFS + " and allPartitionPathsMeta : " + allPartitionPathsMeta;
  LOG.error(message);
  throw new HoodieValidationException(message);
}
{code}
When deciding which partitions from the file system to consider for comparison,
we look at the commit time that created the partition:
{code:java}
if (instantOption.isPresent()) {
  String instantTime = instantOption.get();
  return completedTimeline.containsOrBeforeTimelineStarts(instantTime);
} else {
  return false;
}
{code}
There is one case in which this can fire a false alarm.  Consider the following case.

 

> Fix the validation of partition listing in metadata table validator
> ---
>
> Key: HUDI-5919
> URL: https://issues.apache.org/jira/browse/HUDI-5919
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.1
>
>





[jira] [Updated] (HUDI-5919) Fix the validation of partition listing in metadata table validator

2023-03-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5919:

Priority: Blocker  (was: Major)

> Fix the validation of partition listing in metadata table validator
> ---
>
> Key: HUDI-5919
> URL: https://issues.apache.org/jira/browse/HUDI-5919
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.1
>
>






[jira] [Updated] (HUDI-5919) Fix the validation of partition listing in metadata table validator

2023-03-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5919:

Fix Version/s: 0.13.1

> Fix the validation of partition listing in metadata table validator
> ---
>
> Key: HUDI-5919
> URL: https://issues.apache.org/jira/browse/HUDI-5919
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Priority: Major
> Fix For: 0.13.1
>
>






[jira] [Updated] (HUDI-5919) Fix the validation of partition listing in metadata table validator

2023-03-10 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-5919:

Story Points: 0.5

> Fix the validation of partition listing in metadata table validator
> ---
>
> Key: HUDI-5919
> URL: https://issues.apache.org/jira/browse/HUDI-5919
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
> Fix For: 0.13.1
>
>



