nsivabalan commented on code in PR #10913:
URL: https://github.com/apache/hudi/pull/10913#discussion_r1540365760
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java:
##########

@@ -550,27 +552,34 @@ private Map<String, HoodieMetadataFileInfo> combineFileSystemMetadata(HoodieMeta
       // - First we merge records from all of the delta log-files
       // - Then we merge records from base-files with the delta ones (coming as a result
       //   of the previous step)
-      (oldFileInfo, newFileInfo) ->
-          // NOTE: We can't assume that MT update records will be ordered the same way as actual
-          // FS operations (since they are not atomic), therefore MT record merging should be a
-          // _commutative_ & _associative_ operation (ie one that would work even in case records
-          // get re-ordered), which is
-          //   - Possible for file-sizes (since file-sizes only ever grow, we can simply
-          //     take the max of the old and new records)
-          //   - Not possible for is-deleted flags*
-          //
-          // *However, we're assuming that the case of concurrent write and deletion of the same
-          // file is _impossible_ -- it would only be possible with a concurrent upsert and
-          // rollback operation (affecting the same log-file), which is implausible, b/c either
-          // of the following would have to be true:
-          //   - We're appending to a failed log-file (then the other writer is trying to
-          //     roll it back concurrently, before its own write)
-          //   - Rollback (of a completed instant) is running concurrently with an append (meaning
-          //     that restore is running concurrently with a write, which is also not supported
-          //     currently)
-          newFileInfo.getIsDeleted()
-              ? null
-              : new HoodieMetadataFileInfo(Math.max(newFileInfo.getSize(), oldFileInfo.getSize()), false));
+      (oldFileInfo, newFileInfo) -> {
+        // NOTE: We can't assume that MT update records will be ordered the same way as actual
+        // FS operations (since they are not atomic), therefore MT record merging should be a
+        // _commutative_ & _associative_ operation (ie one that would work even in case records
+        // get re-ordered), which is
+        //   - Possible for file-sizes (since file-sizes only ever grow, we can simply
+        //     take the max of the old and new records)
+        //   - Not possible for is-deleted flags*
+        //
+        // *However, we're assuming that the case of concurrent write and deletion of the same
+        // file is _impossible_ -- it would only be possible with a concurrent upsert and
+        // rollback operation (affecting the same log-file), which is implausible, b/c either
+        // of the following would have to be true:
+        //   - We're appending to a failed log-file (then the other writer is trying to
+        //     roll it back concurrently, before its own write)
+        //   - Rollback (of a completed instant) is running concurrently with an append (meaning
+        //     that restore is running concurrently with a write, which is also not supported
+        //     currently)
+        if (newFileInfo.getIsDeleted()) {
+          if (oldFileInfo.getIsDeleted()) {
+            LOG.warn("A file is repeatedly deleted in the files partition of the metadata table: " + key);

Review Comment:
   gotcha
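The invariant the quoted comment describes -- the merge function must be commutative and associative because metadata-table records may arrive re-ordered, so sizes are combined with `max()` and a delete tombstones the entry -- can be sketched outside Hudi. Below is a minimal, self-contained illustration; `FileInfo`, `merge`, and `MergeSketch` are hypothetical stand-ins for `HoodieMetadataFileInfo` and the lambda in `combineFileSystemMetadata`, not Hudi's actual API:

```java
// Sketch of an order-insensitive merge for file-listing metadata records.
public class MergeSketch {

  // Minimal stand-in for HoodieMetadataFileInfo: a size and a deleted flag.
  static final class FileInfo {
    final long size;
    final boolean deleted;

    FileInfo(long size, boolean deleted) {
      this.size = size;
      this.deleted = deleted;
    }
  }

  // Merge two records for the same file key.
  // - Sizes only ever grow, so max() gives the same answer regardless of
  //   which record is treated as "old" (commutative) and of grouping
  //   across many records (associative).
  // - A delete wins and returns null, which (as with Map.merge) signals
  //   that the entry should be dropped from the combined listing.
  static FileInfo merge(FileInfo oldInfo, FileInfo newInfo) {
    if (newInfo.deleted) {
      return null;
    }
    return new FileInfo(Math.max(oldInfo.size, newInfo.size), false);
  }

  public static void main(String[] args) {
    FileInfo a = new FileInfo(100, false);
    FileInfo b = new FileInfo(250, false);

    // Commutativity: merging in either order yields the same size.
    System.out.println(merge(a, b).size == merge(b, a).size); // prints "true"

    // A delete tombstones the entry.
    System.out.println(merge(a, new FileInfo(0, true)) == null); // prints "true"
  }
}
```

Returning `null` to drop the entry mirrors the contract of `java.util.Map#merge`, whose remapping function removes the key when it returns `null` -- which is why the real payload's lambda can return `null` for the deleted case.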