nsivabalan commented on a change in pull request #3646:
URL: https://github.com/apache/hudi/pull/3646#discussion_r803321267



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
##########
@@ -76,6 +76,11 @@
       .withDocumentation("Number of commits to retain, without cleaning. This will be retained for num_of_commits * time_between_commits "
           + "(scheduled). This also directly translates into how much data retention the table supports for incremental queries.");
 
+  public static final ConfigProperty<String> CLEANER_HOURS_RETAINED = ConfigProperty.key("hoodie.cleaner.hours.retained")
+          .defaultValue("24")
+          .withDocumentation("Number of hours for which commits need to be retained. This config provides a more flexible option as"
+          + "compared to number of commits retained for cleaning service");

Review comment:
       Sounds good.
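For context on the trade-off in this hunk: the commits-based setting retains roughly num_of_commits * time_between_commits of history, so the time window it actually covers moves whenever the commit cadence changes, while an hours-based setting pins the window directly. The sketch below is a hypothetical helper (`RetentionMath` is not part of Hudi) and assumes a fixed commit interval purely to make the arithmetic concrete.

```java
// Hypothetical helper, not part of Hudi: relates an hours-based retention
// window to the commits-based retention it is meant to complement.
final class RetentionMath {
  private RetentionMath() {}

  /** Commits that fit into the window, assuming a fixed commit interval. */
  static int equivalentCommitsRetained(int hoursRetained, int minutesBetweenCommits) {
    if (hoursRetained < 0 || minutesBetweenCommits <= 0) {
      throw new IllegalArgumentException("hours must be >= 0 and interval must be > 0");
    }
    // Integer division: a partial trailing interval does not count as a commit.
    return (hoursRetained * 60) / minutesBetweenCommits;
  }

  /** Minutes of history that a fixed number of retained commits covers. */
  static int minutesCoveredByCommits(int commitsRetained, int minutesBetweenCommits) {
    return commitsRetained * minutesBetweenCommits;
  }

  public static void main(String[] args) {
    // 24 hours (the proposed default) with one commit every 30 minutes.
    System.out.println(equivalentCommitsRetained(24, 30)); // 48
    // Same 48 commits, but the writer now commits every 7 minutes.
    System.out.println(minutesCoveredByCommits(48, 7)); // 336
  }
}
```

At the proposed default of 24 hours and one commit every 30 minutes, the two settings line up at 48 commits; if the writer speeds up to one commit every 7 minutes, those same 48 commits cover only 336 minutes (5.6 hours), whereas the hours-based window stays at 24 hours.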

Review comment:
       But can we also state this explicitly in the documentation? Something like
   "This policy cleans up commits for which more than the configured number of hours have elapsed since their timestamp", or something along those lines. I'll let you make the call.
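A minimal sketch of the rule the suggested documentation would describe: a commit becomes eligible for cleaning once more than the configured number of hours has elapsed since its timestamp. It assumes second-granularity instant times in a fixed-width `yyyyMMddHHmmss` form and a caller-supplied `now`; `HoursRetentionRule` and its methods are hypothetical names, not Hudi APIs, and the "always keep the latest version of a file group" part of the policy is deliberately left out here.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration, not Hudi code.
final class HoursRetentionRule {
  // Assumption: instant times look like "20220210153000".
  private static final DateTimeFormatter INSTANT_FORMAT =
      DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

  private HoursRetentionRule() {}

  /** Instant time exactly hoursRetained hours before now, in the same format. */
  static String cutoffInstant(LocalDateTime now, int hoursRetained) {
    return now.minusHours(hoursRetained).format(INSTANT_FORMAT);
  }

  /** True if the commit is strictly older than the cutoff. */
  static boolean isEligibleForCleaning(String commitTime, String cutoff) {
    // Fixed-width timestamps of this form sort chronologically,
    // so a plain string comparison is enough.
    return commitTime.compareTo(cutoff) < 0;
  }

  public static void main(String[] args) {
    LocalDateTime now = LocalDateTime.of(2022, 2, 10, 12, 0, 0);
    String cutoff = cutoffInstant(now, 5);
    System.out.println(cutoff); // 20220210070000
    System.out.println(isEligibleForCleaning("20220210065959", cutoff)); // true
    System.out.println(isEligibleForCleaning("20220210070000", cutoff)); // false
  }
}
```

Comparing strings rather than parsed times mirrors how instant times are usually ordered on a timeline; it only works because every timestamp has the same width and field order.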

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java
##########
@@ -330,6 +349,19 @@ public CleanPlanner(HoodieEngineContext context, HoodieTable<T, I, K, O> hoodieT
     }
     return deletePaths;
   }
+
+  /**
+   * This method finds the files to be cleaned based on the number of hours. If {@code config.getCleanerHoursRetained()} is set to 5,
+   * all the files with commit time earlier than 5 hours will be removed. Also the latest file for any file group is retained.
+   * This policy gives much more flexibility to users for retaining data for running incremental queries as compared to
+   * KEEP_LATEST_COMMITS cleaning policy. The default number of hours is 5.
+   * @param partitionPath partition path to check
+   * @return list of files to clean
+   */
+  private List<CleanFileInfo> getFilesToCleanKeepingLatestHours(String partitionPath) {
+    int commitsToRetain = 0;

Review comment:
       Got it. We can pass 0 directly as the second argument; there's no need to declare the variable.
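Following the Javadoc in this hunk, a simplified sketch of the per-partition selection: within each file group the newest version is always kept, and any older version whose commit time falls before the cutoff is marked for deletion. This is a hypothetical illustration, not `CleanPlanner` code: `KeepLatestByHoursSketch` and `filesToClean` are made-up names, a file group is modeled only by the commit times of its versions, and any further retention rules the real planner applies are not covered. Note also that the config hunk proposes a default of 24 hours, while this Javadoc says 5.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of "keep latest by hours", not Hudi code.
final class KeepLatestByHoursSketch {
  private KeepLatestByHoursSketch() {}

  /**
   * @param commitTimesByFileGroup file group id -> commit times of its versions
   * @param cutoff instant time (yyyyMMddHHmmss) before which versions may go
   * @return "fileGroupId@commitTime" entries to delete
   */
  static List<String> filesToClean(Map<String, List<String>> commitTimesByFileGroup, String cutoff) {
    List<String> toDelete = new ArrayList<>();
    for (Map.Entry<String, List<String>> group : commitTimesByFileGroup.entrySet()) {
      List<String> times = new ArrayList<>(group.getValue());
      times.sort(Comparator.reverseOrder()); // newest first
      // Start at index 1: the latest version of every file group is retained.
      for (int i = 1; i < times.size(); i++) {
        if (times.get(i).compareTo(cutoff) < 0) {
          toDelete.add(group.getKey() + "@" + times.get(i));
        }
      }
    }
    return toDelete;
  }

  public static void main(String[] args) {
    Map<String, List<String>> groups = new TreeMap<>(); // sorted for stable output
    groups.put("fg1", Arrays.asList("20220210010000", "20220210090000", "20220210050000"));
    groups.put("fg2", Arrays.asList("20220210030000")); // old, but its only version
    groups.put("fg3", Arrays.asList("20220210080000", "20220210100000"));
    // Cutoff for 5 hours retained at 2022-02-10 12:00:00.
    System.out.println(filesToClean(groups, "20220210070000"));
    // [fg1@20220210050000, fg1@20220210010000]
  }
}
```

`fg2` shows why the "latest file is retained" clause matters: its only version is older than the cutoff, yet it survives, so the file group stays readable.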




-- 
This is an automated message from the Apache Git Service.