phet commented on code in PR #3569:
URL: https://github.com/apache/gobblin/pull/3569#discussion_r979115309


##########
gobblin-data-management/src/main/java/org/apache/gobblin/data/management/copy/iceberg/IcebergTable.java:
##########
@@ -46,36 +51,89 @@
 public class IcebergTable {
   private final TableOperations tableOps;
 
+  /** @return metadata info limited to the most recent (current) snapshot */
   public IcebergSnapshotInfo getCurrentSnapshotInfo() throws IOException {
     TableMetadata current = tableOps.current();
-    Snapshot snapshot = current.currentSnapshot();
+    return createSnapshotInfo(current.currentSnapshot(), 
Optional.of(current.metadataFileLocation()));
+  }
+
+  /** @return metadata info for all known snapshots, ordered historically, 
with *most recent last* */
+  public Iterator<IcebergSnapshotInfo> getAllSnapshotInfosIterator() {
+    TableMetadata current = tableOps.current();
+    long currentSnapshotId = current.currentSnapshot().snapshotId();
+    List<Snapshot> snapshots = current.snapshots();
+    return Iterators.transform(snapshots.iterator(), snapshot -> {
+      try {
+        return IcebergTable.this.createSnapshotInfo(
+            snapshot,
+            currentSnapshotId == snapshot.snapshotId() ? 
Optional.of(current.metadataFileLocation()) : Optional.empty()
+        );
+      } catch (IOException e) {
+        throw new RuntimeException(e);
+      }
+    });
+  }
+
+  /**
+   * @return metadata info for all known snapshots, but incrementally, so 
content overlap between snapshots appears
+   * only within the first as they're ordered historically, with *most recent 
last*
+   */
+  public Iterator<IcebergSnapshotInfo> getIncrementalSnapshotInfosIterator() {

Review Comment:
   presently (this PR too), we only support copying the entire iceberg.  I am 
already working on the next iteration which is to calculate the delta between 
the src and dest.  (for integration testing that, we'll bootstrap "destination" 
with a replicated copy by using this code herein.)
   
   I absolutely agree we do not want to eagerly load the iceberg's entire 
metadata.  this interface is an `Iterator` for that reason.  I will 
`.reverse()` the order, to enable walking backwards in history.  the delta 
comparison will look on the destination for each snapshot's manifest list file. 
 when that's found we'll infer that replication up to that point was achieved.
   
   so yes, that is a plan I'm in-progress with carrying out.  this commit is 
just to give us working code against an MVP that does only entire copy 
(non-incremental, delta copy).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to