[GitHub] [hudi] danny0405 commented on a change in pull request #4336: [HUDI-3032] Do not clean the log files right after compaction for met…
danny0405 commented on a change in pull request #4336:
URL: https://github.com/apache/hudi/pull/4336#discussion_r773696840

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java ##
@@ -706,7 +706,20 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
     }
   }

-  protected void doClean(AbstractHoodieWriteClient writeClient, String instantTime) {
+  protected void cleanIfNecessary(AbstractHoodieWriteClient writeClient, String instantTime) {
+    Option lastCompletedCompactionInstant = metadataMetaClient.reloadActiveTimeline()

Review comment:
> reloadActiveTimeline() seems to be called right before calling this function

I don't think so. We should still reload the timeline in case an inline compaction is triggered and completes.

> check if HoodieTableMetadata already has this function

`HoodieTableMetadata` has this function, but as its name says, it is a metadata reader, while the logic now lives in the writer.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on a change in pull request #4336: [HUDI-3032] Do not clean the log files right after compaction for met…
danny0405 commented on a change in pull request #4336:
URL: https://github.com/apache/hudi/pull/4336#discussion_r772953860

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java ##
@@ -706,7 +706,20 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
     }
   }

-  protected void doClean(AbstractHoodieWriteClient writeClient, String instantTime) {
+  protected void cleanIfNecessary(AbstractHoodieWriteClient writeClient, String instantTime) {
+    Option lastCompletedCompactionInstant = metadataMetaClient.reloadActiveTimeline()

Review comment:
Agreed. Where should I put the util method?
[GitHub] [hudi] danny0405 commented on a change in pull request #4336: [HUDI-3032] Do not clean the log files right after compaction for met…
danny0405 commented on a change in pull request #4336:
URL: https://github.com/apache/hudi/pull/4336#discussion_r772951102

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java ##
@@ -706,7 +706,20 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
     }
   }

-  protected void doClean(AbstractHoodieWriteClient writeClient, String instantTime) {
+  protected void cleanIfNecessary(AbstractHoodieWriteClient writeClient, String instantTime) {
+    Option lastCompletedCompactionInstant = metadataMetaClient.reloadActiveTimeline()

Review comment:
> is reloadActiveTimeline() necessary here?

Yes, it is necessary, because we cannot be sure whether the compaction was triggered and completed successfully.
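The point about reloading can be illustrated with a minimal sketch. It assumes a meta client that hands out a cached timeline snapshot until it is explicitly reloaded; `TimelineReloadSketch`, `completeInstant`, and the instant names are hypothetical stand-ins for illustration, not the Hudi API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a meta client with a cached timeline; not the Hudi API.
final class TimelineReloadSketch {
    // Instants that have actually completed (what a fresh listing would show).
    private final List<String> storage = new ArrayList<>();
    // Snapshot taken at the last reload; may be stale.
    private List<String> cached = Collections.emptyList();

    public void completeInstant(String instant) {
        storage.add(instant);
    }

    // Returns the cached snapshot without re-reading storage.
    public List<String> getActiveTimeline() {
        return cached;
    }

    // Re-reads storage, replaces the cached snapshot, and returns it.
    public List<String> reloadActiveTimeline() {
        cached = Collections.unmodifiableList(new ArrayList<>(storage));
        return cached;
    }

    public static void main(String[] args) {
        TimelineReloadSketch client = new TimelineReloadSketch();
        client.completeInstant("001.deltacommit");
        client.reloadActiveTimeline();

        // An inline compaction completes after the last reload.
        client.completeInstant("002.commit");

        // Stale snapshot does not see the compaction: prints false.
        System.out.println(client.getActiveTimeline().contains("002.commit"));
        // Reloaded snapshot sees it: prints true.
        System.out.println(client.reloadActiveTimeline().contains("002.commit"));
    }
}
```

Under this assumption, a clean decision made on the cached snapshot would miss a compaction that finished in between, which is the case the reload guards against.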
[GitHub] [hudi] danny0405 commented on a change in pull request #4336: [HUDI-3032] Do not clean the log files right after compaction for met…
danny0405 commented on a change in pull request #4336:
URL: https://github.com/apache/hudi/pull/4336#discussion_r772930657

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java ##
@@ -706,7 +706,20 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
     }
   }

-  protected void doClean(AbstractHoodieWriteClient writeClient, String instantTime) {
+  protected void cleanIfNecessary(AbstractHoodieWriteClient writeClient, String instantTime) {
+    Option lastCompletedCompactionInstant = metadataMetaClient.reloadActiveTimeline()
+        .getCommitTimeline().filterCompletedInstants().lastInstant();
+    if (lastCompletedCompactionInstant.isPresent()
+        && metadataMetaClient.getActiveTimeline().filterCompletedInstants()
+            .findInstantsAfter(lastCompletedCompactionInstant.get().getTimestamp()).countInstants() < 3) {
+      // do not clean the log files immediately after compaction to give some buffer time for metadata table reader,

Review comment:
I guess so. We also need some protection logic for MOR table log files, or even COW table parquet files: if we write to a COW table very frequently (say, every 10 seconds), a reader would very likely hit a `FileNotFoundException`.
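The guard in the hunk above can be sketched in isolation as follows. This is a simplified model, not the PR's code: `CleanGuardSketch`, its `Instant` class, and `shouldSkipClean` are hypothetical names, timestamps are assumed to be fixed-width strings that sort lexicographically, and the threshold of 3 is taken from the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the "do not clean right after compaction" check.
// All names here are hypothetical; only the threshold (3) comes from the diff.
final class CleanGuardSketch {

    /** Hypothetical completed instant on the timeline. */
    public static final class Instant {
        public final String timestamp;     // fixed width, compared lexicographically
        public final boolean isCompaction; // true for a completed compaction commit

        public Instant(String timestamp, boolean isCompaction) {
            this.timestamp = timestamp;
            this.isCompaction = isCompaction;
        }
    }

    static final int MIN_INSTANTS_AFTER_COMPACTION = 3;

    /** Returns true when cleaning should be skipped for now. */
    public static boolean shouldSkipClean(List<Instant> completedInstants) {
        // Find the timestamp of the latest completed compaction, if any.
        String lastCompaction = null;
        for (Instant instant : completedInstants) {
            if (instant.isCompaction
                && (lastCompaction == null || instant.timestamp.compareTo(lastCompaction) > 0)) {
                lastCompaction = instant.timestamp;
            }
        }
        if (lastCompaction == null) {
            return false; // no compaction yet, so nothing to protect
        }
        // Count completed instants strictly after that compaction.
        int instantsAfter = 0;
        for (Instant instant : completedInstants) {
            if (instant.timestamp.compareTo(lastCompaction) > 0) {
                instantsAfter++;
            }
        }
        return instantsAfter < MIN_INSTANTS_AFTER_COMPACTION;
    }

    public static void main(String[] args) {
        List<Instant> timeline = new ArrayList<>();
        timeline.add(new Instant("001", false));
        timeline.add(new Instant("002", true));  // compaction
        timeline.add(new Instant("003", false));
        System.out.println(shouldSkipClean(timeline)); // only 1 instant after compaction
    }
}
```

The idea is that readers which opened the pre-compaction log files get a few commits of buffer time before those files become eligible for cleaning.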
[GitHub] [hudi] danny0405 commented on a change in pull request #4336: [HUDI-3032] Do not clean the log files right after compaction for met…
danny0405 commented on a change in pull request #4336:
URL: https://github.com/apache/hudi/pull/4336#discussion_r771085416

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java ##
@@ -706,7 +706,20 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
     }
   }

-  protected void doClean(AbstractHoodieWriteClient writeClient, String instantTime) {
+  protected void cleanIfNecessary(AbstractHoodieWriteClient writeClient, String instantTime) {
+    Option lastCompletedCompactionInstant = metadataMetaClient.reloadActiveTimeline()
+        .getCommitTimeline().filterCompletedInstants().lastInstant();
+    if (lastCompletedCompactionInstant.isPresent()
+        && metadataMetaClient.getActiveTimeline().filterCompletedInstants()
+            .findInstantsAfter(lastCompletedCompactionInstant.get().getTimestamp()).countInstants() < 3) {
+      // do not clean the log files immediately after compaction to give some buffer time for metadata table reader,

Review comment:
Currently the default compaction interval is 10 delta commits, the minimum number of retained commits is 20, and the maximum is 30. By the time the second compaction is scheduled and triggered, there are about 22 commits on the timeline, which then triggers cleaning immediately.