[ https://issues.apache.org/jira/browse/GOBBLIN-2159?focusedWorklogId=939760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-939760 ]
ASF GitHub Bot logged work on GOBBLIN-2159:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 23/Oct/24 17:21
Start Date: 23/Oct/24 17:21
Worklog Time Spent: 10m
Work Description: Blazer-007 commented on code in PR #4058:
URL: https://github.com/apache/gobblin/pull/4058#discussion_r1813222492
##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergTable.java:
##########
@@ -237,31 +238,35 @@ protected void registerIcebergTable(TableMetadata srcMetadata, TableMetadata dst
* @throws RuntimeException if error occurred while reading the manifest file
*/
public List<DataFile> getPartitionSpecificDataFiles(Predicate<StructLike> icebergPartitionFilterPredicate)
- throws TableNotFoundException {
+ throws IOException {
TableMetadata tableMetadata = accessTableMetadata();
Snapshot currentSnapshot = tableMetadata.currentSnapshot();
long currentSnapshotId = currentSnapshot.snapshotId();
List<DataFile> knownDataFiles = new ArrayList<>();
- log.info("~{}~ for snapshot '{}' - '{}' total known iceberg datafiles", tableId, currentSnapshotId,
- knownDataFiles.size());
+ GrowthMilestoneTracker growthMilestoneTracker = new GrowthMilestoneTracker();
//TODO: Add support for deleteManifests as well later
// Currently supporting dataManifests only
List<ManifestFile> dataManifestFiles = currentSnapshot.dataManifests(this.tableOps.io());
for (ManifestFile manifestFile : dataManifestFiles) {
+ if (growthMilestoneTracker.isAnotherMilestone(knownDataFiles.size())) {
+ log.info("~{}~ for snapshot '{}' - before manifest-file '{}' '{}' total known iceberg datafiles", tableId,
+ currentSnapshotId,
+ manifestFile.path(),
+ knownDataFiles.size()
+ );
+ }
Review Comment:
Yes, that seems like a valid approach. Let me remove growthMilestoneTracker from that
function.
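
For context, the pattern in the hunk above gates a log statement on a tracker so that a growing count is logged only at selected sizes rather than on every loop iteration. A minimal sketch of that idea follows. `DoublingMilestoneTracker` and `MilestoneLoggingSketch` are hypothetical names, and the doubling rule is an assumption chosen for illustration; this is not Gobblin's `GrowthMilestoneTracker` and its thresholds may differ.

```java
import java.util.ArrayList;
import java.util.List;

class MilestoneLoggingSketch {

  /** Hypothetical stand-in for a milestone tracker; NOT Gobblin's GrowthMilestoneTracker. */
  static final class DoublingMilestoneTracker {
    // Assumption: milestones sit at powers of two (1, 2, 4, 8, ...).
    private long nextMilestone = 1;

    /** Returns true when n has reached the pending milestone, then advances the milestone past n. */
    boolean isAnotherMilestone(long n) {
      if (n < nextMilestone) {
        return false;
      }
      while (nextMilestone <= n) {
        nextMilestone *= 2;
      }
      return true;
    }
  }

  /** Returns the running totals at which a log line would have been emitted. */
  static List<Long> milestonesHit(long[] runningTotals) {
    DoublingMilestoneTracker tracker = new DoublingMilestoneTracker();
    List<Long> hits = new ArrayList<>();
    for (long total : runningTotals) {
      // In the reviewed code this is where log.info(...) would be called.
      if (tracker.isAnotherMilestone(total)) {
        hits.add(total);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    long[] totals = {0, 1, 2, 3, 4, 5, 9, 10, 100};
    System.out.println(milestonesHit(totals)); // prints [1, 2, 4, 9, 100]
  }
}
```

With a rule like this the number of log lines grows roughly logarithmically with the number of data files, which is the usual motivation for gating progress logs this way.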
Issue Time Tracking
-------------------
Worklog Id: (was: 939760)
Time Spent: 12h (was: 11h 50m)
> Support Partition Based Copy in Iceberg Distcp
> ----------------------------------------------
>
> Key: GOBBLIN-2159
> URL: https://issues.apache.org/jira/browse/GOBBLIN-2159
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Vivek Rai
> Priority: Major
> Time Spent: 12h
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)