satishkotha commented on a change in pull request #1964:
URL: https://github.com/apache/hudi/pull/1964#discussion_r475920279



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
##########
@@ -296,6 +300,42 @@ public boolean isBeforeTimelineStarts(String instant) {
     return details.apply(instant);
   }
 
+  /**
+   * Returns partitions that have been modified in the timeline. This includes internal operations such as clean.
+   * Note that this only returns data for completed instants.
+   */
+  public List<String> getPartitionsMutated() {
+    return filterCompletedInstants().getInstants().flatMap(s -> {
+      switch (s.getAction()) {
+        case HoodieTimeline.COMMIT_ACTION:
+        case HoodieTimeline.DELTA_COMMIT_ACTION:
+          try {
+            HoodieCommitMetadata commitMetadata = HoodieCommitMetadata.fromBytes(getInstantDetails(s).get(), HoodieCommitMetadata.class);
+            return commitMetadata.getPartitionToWriteStats().keySet().stream();
+          } catch (IOException e) {
+            throw new HoodieIOException("Failed to get partitions written between " + firstInstant() + " " + lastInstant(), e);
+          }
+        case HoodieTimeline.CLEAN_ACTION:

Review comment:
       The method takes in a timeline, so Hive sync passes only the "commit" 
timeline to it and gets back only the partitions modified by commit instants. 
For clarity, I also added a separate method that looks only at commits. Let me 
know if you have any suggestions.
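
To illustrate the point about filtering the timeline before asking for mutated partitions, here is a minimal, self-contained sketch. It does not use the Hudi classes: `Instant`, `partitionsMutated`, `commitsOnly`, and `sampleTimeline` are hypothetical stand-ins, the action strings are illustrative placeholders for the `HoodieTimeline` constants, and de-duplicating the result is a choice made for this sketch rather than something the diff shows.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PartitionsMutatedSketch {

  // Hypothetical stand-in for a completed instant: an action plus the partitions it touched.
  static final class Instant {
    final String action;
    final List<String> partitions;

    Instant(String action, List<String> partitions) {
      this.action = action;
      this.partitions = partitions;
    }
  }

  // Collects every partition touched by any instant in the given timeline,
  // in first-seen order and without duplicates.
  static List<String> partitionsMutated(List<Instant> timeline) {
    return timeline.stream()
        .flatMap(i -> i.partitions.stream())
        .distinct()
        .collect(Collectors.toList());
  }

  // Narrows a timeline to commit-like instants, which is roughly what a caller
  // such as Hive sync would do before asking for mutated partitions.
  static List<Instant> commitsOnly(List<Instant> timeline) {
    return timeline.stream()
        .filter(i -> i.action.equals("commit") || i.action.equals("deltacommit"))
        .collect(Collectors.toList());
  }

  // Small illustrative timeline: two writes and one clean.
  static List<Instant> sampleTimeline() {
    return Arrays.asList(
        new Instant("commit", Arrays.asList("2020/08/01")),
        new Instant("clean", Arrays.asList("2020/07/01")),
        new Instant("deltacommit", Arrays.asList("2020/08/01", "2020/08/02")));
  }

  public static void main(String[] args) {
    List<Instant> timeline = sampleTimeline();
    // Full timeline: the partition touched only by the clean is included.
    System.out.println(partitionsMutated(timeline));
    // Commit-only timeline: the clean-only partition drops out.
    System.out.println(partitionsMutated(commitsOnly(timeline)));
  }
}
```

In this sketch the general method stays agnostic about actions, and the caller decides which instants matter by choosing the timeline it passes in.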





