danielhumanmod commented on code in PR #569:
URL: https://github.com/apache/incubator-xtable/pull/569#discussion_r1839736983
##########
xtable-api/src/main/java/org/apache/xtable/spi/extractor/ExtractFromSource.java:
##########
@@ -47,9 +49,20 @@ public IncrementalTableChanges extractTableChanges(
commitsBacklog.getCommitsToProcess().stream()
.map(conversionSource::getTableChangeForCommit)
Review Comment:
> I think that we'll want the identifier on the commit level, right?
Thanks for the response @the-other-tim-brown.
Yes, ideally every commit in the source table would map directly to one in
the target table. However, based on my understanding of how XTable works, this
isn’t guaranteed. Instead, the mapping (Source -> Target) is more like an N:1
mapping, which means:
- Every commit in the target table maps to a commit in the source table.
- Not every commit in the source table has a corresponding commit in the
target table.
The reason is that multiple changes can land on the source between two sync()
calls, and all of them are synced as a single commit in the target, as in
this example:
```
Iceberg (Source)         Delta (Target)
┌────────────┐           ┌─────────────────────┐
│ Snapshot 0 │ ◀───────▶ │ Version 0 (Synced)  │ (can map to snapshot 0)
│ Snapshot 1 │           │                     │
│ Snapshot 2 │           │                     │
│ Snapshot 3 │           │                     │
│ Snapshot 4 │           │                     │
│ Snapshot 5 │ ◀───────▶ │ Version 1 (Synced)  │ (can map to snapshot 5)
└────────────┘           └─────────────────────┘
```
Given this, I’ve chosen to use the information from the latest commit in the
source table as the source identifier.
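To make the idea concrete, here is a minimal sketch of what I mean. The class and method names (`SourceIdentifierSketch`, `latestSourceIdentifier`) and the `snapshot-N` strings are hypothetical and only for illustration; they are not existing XTable APIs. It assumes the commits handled by one sync() are available oldest-first:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; none of these names exist in XTable.
final class SourceIdentifierSketch {

  // Assumes the list holds the source commits processed by a single sync(),
  // ordered oldest-first. All of them collapse into one target commit, so the
  // newest one is used as the source identifier for that target commit.
  static Optional<String> latestSourceIdentifier(List<String> commitsOldestFirst) {
    if (commitsOldestFirst.isEmpty()) {
      // Nothing was synced, so there is no identifier to record.
      return Optional.empty();
    }
    return Optional.of(commitsOldestFirst.get(commitsOldestFirst.size() - 1));
  }

  public static void main(String[] args) {
    // Snapshots 1..5 arrived between two syncs, as in the diagram above.
    List<String> pending =
        Arrays.asList("snapshot-1", "snapshot-2", "snapshot-3", "snapshot-4", "snapshot-5");
    System.out.println(latestSourceIdentifier(pending).orElse("none")); // prints "snapshot-5"
  }
}
```

With this approach, snapshots 1 to 4 never get their own entry in the target; only snapshot 5 is recorded against Version 1.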
My understanding might be wrong, though, so any feedback or suggestions are
appreciated :)