JonasJ-ap commented on code in PR #6880:
URL: https://github.com/apache/iceberg/pull/6880#discussion_r1117977464
##########
delta-lake/src/main/java/org/apache/iceberg/delta/BaseSnapshotDeltaLakeTableAction.java:
##########
@@ -213,6 +218,52 @@ private PartitionSpec getPartitionSpecFromDeltaSnapshot(Schema schema) {
return builder.build();
}
+ /**
+ * Commit the initial delta snapshot to iceberg transaction. It tries the snapshot starting from
+ * {@code deltaStartVersion} to {@code latestVersion} and commit the first constructable one.
+ *
+ * <p>There are two cases that the delta snapshot is not constructable:
+ *
+ * <ul>
+ * <li>the version is earlier than the earliest checkpoint
+ * <li>the corresponding data files are deleted by {@code VACUUM}
+ * </ul>
+ *
+ * <p>For more information, please refer to delta lake's <a
+ * href="https://docs.delta.io/latest/delta-batch.html#-data-retention">Data Retention</a>
+ *
+ * @param latestVersion the latest version of the delta lake table
+ * @param transaction the iceberg transaction
+ * @return the initial version of the delta lake table that is successfully committed to iceberg
+ */
+ private long commitInitialDeltaSnapshotToIcebergTransaction(
+ long latestVersion, Transaction transaction) {
+ long constructableStartVersion = deltaStartVersion;
+ while (constructableStartVersion <= latestVersion) {
+ try {
+ List<AddFile> initDataFiles =
+ deltaLog.getSnapshotForVersionAsOf(constructableStartVersion).getAllFiles();
+ List<DataFile> filesToAdd = Lists.newArrayList();
+ for (AddFile addFile : initDataFiles) {
+ DataFile dataFile = buildDataFileFromAction(addFile, transaction.table());
+ filesToAdd.add(dataFile);
+ }
+
+ // AppendFiles case
+ AppendFiles appendFiles = transaction.newAppend();
+ filesToAdd.forEach(appendFiles::appendFile);
+ appendFiles.commit();
+
+ return constructableStartVersion;
+ } catch (NotFoundException | IllegalArgumentException | DeltaStandaloneException e) {
Review Comment:
Thank you for pointing this out. `HadoopFileIO` rethrows
`FileNotFoundException` as `NotFoundException` and any other `IOException` as
`RuntimeIOException`:
https://github.com/apache/iceberg/blob/b5a31a14d56c1ee24bad87e1ac7f119d638ee320/core/src/main/java/org/apache/iceberg/hadoop/HadoopInputFile.java#L159-L170
In this case, I think we only need to handle `NotFoundException`, since
`VACUUM` may have deleted the data file. I've added a code block in
`buildDataFileFromAction` that explicitly checks file existence and throws
`NotFoundException` if the file is missing.
Please let me know if you have any other concerns here.
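
For illustration, here is a minimal standalone sketch of this kind of check and
of how the retry loop reacts to it. It is not the code from the PR: the class
`ExistenceCheckSketch`, the helpers `requireExists` and
`firstConstructableVersion`, and the local `NotFoundException` are hypothetical
stand-ins, and the sketch uses `java.nio.file.Files` on local paths where the
real code would presumably go through the table's `FileIO`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.function.LongFunction;

final class ExistenceCheckSketch {

  /** Hypothetical stand-in for org.apache.iceberg.exceptions.NotFoundException. */
  public static class NotFoundException extends RuntimeException {
    public NotFoundException(String message) {
      super(message);
    }
  }

  /** Fails fast if the file is missing, e.g. because VACUUM removed it. */
  public static Path requireExists(String location) {
    Path path = Paths.get(location);
    if (!Files.exists(path)) {
      throw new NotFoundException("File does not exist: " + location);
    }
    return path;
  }

  /**
   * Returns the first version in [startVersion, latestVersion] whose files all
   * exist, or -1 if no such version is found. Only NotFoundException is treated
   * as "try the next version"; any other exception propagates to the caller.
   */
  public static long firstConstructableVersion(
      long startVersion, long latestVersion, LongFunction<List<String>> filesForVersion) {
    for (long version = startVersion; version <= latestVersion; version++) {
      try {
        filesForVersion.apply(version).forEach(ExistenceCheckSketch::requireExists);
        return version;
      } catch (NotFoundException e) {
        // A data file of this version is gone; fall through to the next version.
      }
    }
    return -1;
  }
}
```

Catching only `NotFoundException` keeps unrelated I/O failures visible instead
of silently skipping versions.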
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]