[GitHub] [hudi] danny0405 commented on a change in pull request #4336: [HUDI-3032] Do not clean the log files right after compaction for met…

2021-12-22 Thread GitBox


danny0405 commented on a change in pull request #4336:
URL: https://github.com/apache/hudi/pull/4336#discussion_r773696840



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##
@@ -706,7 +706,20 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
     }
   }
 
-  protected void doClean(AbstractHoodieWriteClient writeClient, String instantTime) {
+  protected void cleanIfNecessary(AbstractHoodieWriteClient writeClient, String instantTime) {
+    Option lastCompletedCompactionInstant = metadataMetaClient.reloadActiveTimeline()

Review comment:
   > reloadActiveTimeline() seems to be called right before calling this function
   
   I don't think so. We should still reload the timeline in case an inline compaction triggers and completes.
   
   > check if HoodieTableMetadata already has this function
   
   `HoodieTableMetadata` has this function, but as its name says, it is a metadata reader, while this logic now lives in the writer.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on a change in pull request #4336: [HUDI-3032] Do not clean the log files right after compaction for met…

2021-12-21 Thread GitBox


danny0405 commented on a change in pull request #4336:
URL: https://github.com/apache/hudi/pull/4336#discussion_r772953860



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##
@@ -706,7 +706,20 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
     }
   }
 
-  protected void doClean(AbstractHoodieWriteClient writeClient, String instantTime) {
+  protected void cleanIfNecessary(AbstractHoodieWriteClient writeClient, String instantTime) {
+    Option lastCompletedCompactionInstant = metadataMetaClient.reloadActiveTimeline()

Review comment:
   Agreed. Where should I put the util method?








[GitHub] [hudi] danny0405 commented on a change in pull request #4336: [HUDI-3032] Do not clean the log files right after compaction for met…

2021-12-21 Thread GitBox


danny0405 commented on a change in pull request #4336:
URL: https://github.com/apache/hudi/pull/4336#discussion_r772951102



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##
@@ -706,7 +706,20 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
     }
   }
 
-  protected void doClean(AbstractHoodieWriteClient writeClient, String instantTime) {
+  protected void cleanIfNecessary(AbstractHoodieWriteClient writeClient, String instantTime) {
+    Option lastCompletedCompactionInstant = metadataMetaClient.reloadActiveTimeline()

Review comment:
   > is reloadActiveTimeline() necessary here?
   
   Yes, it is necessary, because we cannot be sure whether the compaction was triggered and completed successfully.
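
A minimal sketch of why the reload matters, assuming a meta client that caches a timeline snapshot in memory. `TimelineReloadSketch`, `writeInstant`, and the list-backed storage are hypothetical stand-ins for illustration, not Hudi classes; only the method names `getActiveTimeline` and `reloadActiveTimeline` are borrowed from the diff.

```java
import java.util.ArrayList;
import java.util.List;

public class TimelineReloadSketch {
    // Hypothetical stand-in for the instants persisted on storage.
    private final List<String> storedInstants = new ArrayList<>();
    // Snapshot held in memory; it only changes when reloadActiveTimeline() runs.
    private List<String> cachedTimeline = List.of();

    // Simulates another action (e.g. an inline compaction) completing an instant.
    public void writeInstant(String timestamp) {
        storedInstants.add(timestamp);
    }

    // Returns the cached snapshot, which may be stale.
    public List<String> getActiveTimeline() {
        return cachedTimeline;
    }

    // Refreshes the snapshot from storage and returns it.
    public List<String> reloadActiveTimeline() {
        cachedTimeline = List.copyOf(storedInstants);
        return cachedTimeline;
    }

    public static void main(String[] args) {
        TimelineReloadSketch client = new TimelineReloadSketch();
        client.writeInstant("001");
        client.reloadActiveTimeline();

        // An inline compaction completes after the last reload.
        client.writeInstant("002");

        System.out.println(client.getActiveTimeline().size());    // stale snapshot: 1
        System.out.println(client.reloadActiveTimeline().size()); // fresh snapshot: 2
    }
}
```

Without the reload, a check for the last completed compaction would run against the stale snapshot and miss the compaction that just finished.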








[GitHub] [hudi] danny0405 commented on a change in pull request #4336: [HUDI-3032] Do not clean the log files right after compaction for met…

2021-12-21 Thread GitBox


danny0405 commented on a change in pull request #4336:
URL: https://github.com/apache/hudi/pull/4336#discussion_r772930657



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##
@@ -706,7 +706,20 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
     }
   }
 
-  protected void doClean(AbstractHoodieWriteClient writeClient, String instantTime) {
+  protected void cleanIfNecessary(AbstractHoodieWriteClient writeClient, String instantTime) {
+    Option lastCompletedCompactionInstant = metadataMetaClient.reloadActiveTimeline()
+        .getCommitTimeline().filterCompletedInstants().lastInstant();
+    if (lastCompletedCompactionInstant.isPresent()
+        && metadataMetaClient.getActiveTimeline().filterCompletedInstants()
+            .findInstantsAfter(lastCompletedCompactionInstant.get().getTimestamp()).countInstants() < 3) {
+      // do not clean the log files immediately after compaction to give some buffer time for metadata table reader,

Review comment:
   I guess so. We also need some protection logic for MOR table log files, or even COW table parquet files: if we write to a COW table very frequently (e.g. every 10 seconds), a reader would very likely encounter a `FileNotFoundException`.
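
The guard in the diff above can be sketched in isolation as follows. This is a simplified model, not the Hudi implementation: `CleanGuardSketch` and `shouldClean` are hypothetical names, the timeline is a plain list of timestamp strings, and the sketch assumes that instants are ordered by lexicographic comparison of fixed-width timestamps and that the truncated `if` body in the diff skips cleaning.

```java
import java.util.List;

public class CleanGuardSketch {
    // Mirrors the "< 3" threshold in the diff; the constant name is hypothetical.
    static final int MIN_INSTANTS_AFTER_COMPACTION = 3;

    /**
     * Returns true when cleaning may proceed: either no compaction has
     * completed yet, or enough completed instants have followed the last
     * compaction to give readers some buffer time.
     */
    public static boolean shouldClean(List<String> completedInstants, String lastCompaction) {
        if (lastCompaction == null) {
            return true; // no completed compaction, so the guard does not apply
        }
        long instantsAfter = completedInstants.stream()
            .filter(ts -> ts.compareTo(lastCompaction) > 0)
            .count();
        return instantsAfter >= MIN_INSTANTS_AFTER_COMPACTION;
    }

    public static void main(String[] args) {
        List<String> timeline = List.of("001", "002", "003", "004");
        // Only "004" follows the compaction at "003": skip cleaning.
        System.out.println(shouldClean(timeline, "003")); // false
        // "002", "003", "004" follow the compaction at "001": cleaning may run.
        System.out.println(shouldClean(timeline, "001")); // true
    }
}
```

The same shape of check could in principle protect MOR log files or COW parquet files, but where that guard would live is not settled in this thread.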








[GitHub] [hudi] danny0405 commented on a change in pull request #4336: [HUDI-3032] Do not clean the log files right after compaction for met…

2021-12-16 Thread GitBox


danny0405 commented on a change in pull request #4336:
URL: https://github.com/apache/hudi/pull/4336#discussion_r771085416



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
##
@@ -706,7 +706,20 @@ protected void compactIfNecessary(AbstractHoodieWriteClient writeClient, String
     }
   }
 
-  protected void doClean(AbstractHoodieWriteClient writeClient, String instantTime) {
+  protected void cleanIfNecessary(AbstractHoodieWriteClient writeClient, String instantTime) {
+    Option lastCompletedCompactionInstant = metadataMetaClient.reloadActiveTimeline()
+        .getCommitTimeline().filterCompletedInstants().lastInstant();
+    if (lastCompletedCompactionInstant.isPresent()
+        && metadataMetaClient.getActiveTimeline().filterCompletedInstants()
+            .findInstantsAfter(lastCompletedCompactionInstant.get().getTimestamp()).countInstants() < 3) {
+      // do not clean the log files immediately after compaction to give some buffer time for metadata table reader,

Review comment:
   Currently the default compaction delta commits is 10, the min retained commits is 20, and the max retained commits is 30.
   
   When the second compaction is scheduled and triggered, there are about 22 commits on the timeline, which would then trigger cleaning immediately.
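
The figure of about 22 can be reproduced with a small calculation, assuming each compaction adds one instant of its own on top of the delta commits that precede it (my reading of the numbers above, not something the thread states). `TimelineCountSketch` and `instantsAtCompaction` are hypothetical names used only for this illustration.

```java
public class TimelineCountSketch {
    /**
     * Number of instants on the timeline right after the n-th compaction,
     * assuming a compaction fires after every `deltaCommitsPerCompaction`
     * delta commits and contributes one instant itself.
     */
    public static int instantsAtCompaction(int deltaCommitsPerCompaction, int compactionNumber) {
        return compactionNumber * (deltaCommitsPerCompaction + 1);
    }

    public static void main(String[] args) {
        int deltaCommits = 10;     // default compaction delta commits, per the comment
        int minRetainCommits = 20; // min retained commits, per the comment

        int atSecondCompaction = instantsAtCompaction(deltaCommits, 2);
        System.out.println(atSecondCompaction);                    // 22
        System.out.println(atSecondCompaction > minRetainCommits); // true
    }
}
```

Under these assumptions the timeline already exceeds the min retained count of 20 when the second compaction completes, which matches the observation that cleaning kicks in right after it.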



