yihua commented on code in PR #7561:
URL: https://github.com/apache/hudi/pull/7561#discussion_r1059107210
##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##
@@ -385,21 +386,44 @@ public HoodieMetastoreConfig getMetastoreConfig() {
}
/**
- * Returns fresh new archived commits as a timeline from startTs (inclusive).
- *
- * This is costly operation if really early endTs is specified.
- * Be caution to use this only when the time range is short.
- *
- * This method is not thread safe.
+ * Returns the cached archived timeline from startTs (inclusive).
*
- * @return Archived commit timeline
+ * @param startTs The start instant time (inclusive) of the archived timeline.
+ * @return the archived timeline.
*/
public HoodieArchivedTimeline getArchivedTimeline(String startTs) {
-return new HoodieArchivedTimeline(this, startTs);
+return getArchivedTimeline(startTs, true);
+ }
+
+ /**
+ * Returns the cached archived timeline if using in-memory cache or a fresh new archived
+ * timeline if not using cache, from startTs (inclusive).
+ *
+ * Instantiating an archived timeline is costly operation if really early startTs is
+ * specified.
+ *
+ * This method is not thread safe.
+ *
+ * @param startTs The start instant time (inclusive) of the archived timeline.
+ * @param useCache Whether to use in-memory cache.
+ * @return the archived timeline based on the arguments.
+ */
+ public HoodieArchivedTimeline getArchivedTimeline(String startTs, boolean useCache) {
+if (useCache) {
+ return archivedTimelineMap.computeIfAbsent(startTs, this::instantiateArchivedTimeline);
Review Comment:
The assumption is that the cache only ever holds one `startTs`, so there is no
need to clear it; the cache is discarded when the meta client's lifecycle ends.
I can clear it whenever a new `startTs` comes in.
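
To make the "clear on a new `startTs`" option concrete, here is a minimal
sketch. It is not the code in this PR: `SingleStartTsCache`, its `loader`
function, and the `loadCount` counter are hypothetical stand-ins for the meta
client, `instantiateArchivedTimeline`, and the cost of loading an archived
timeline. Like the method under review, it is not thread safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for the meta client's cache; not a Hudi class. */
final class SingleStartTsCache<V> {
  private final Map<String, V> cache = new HashMap<>();
  private final Function<String, V> loader;
  private int loadCount = 0;

  SingleStartTsCache(Function<String, V> loader) {
    this.loader = loader;
  }

  /** Returns the cached value for startTs, evicting any entry for a different startTs. */
  V get(String startTs) {
    if (!cache.containsKey(startTs)) {
      // A new startTs arrived: drop the previous entry so at most one is retained.
      cache.clear();
    }
    return cache.computeIfAbsent(startTs, ts -> {
      loadCount++; // counts how often the costly load actually runs
      return loader.apply(ts);
    });
  }

  int size() {
    return cache.size();
  }

  int loadCount() {
    return loadCount;
  }
}
```

With this shape, repeated calls with the same `startTs` reuse one instance,
and a call with a different `startTs` replaces it instead of growing the map.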
##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineUtils.java:
##
@@ -210,11 +210,30 @@ public static HoodieDefaultTimeline getTimeline(HoodieTableMetaClient metaClient
return activeTimeline;
}
+ /**
+ * Returns a Hudi timeline with commits after the given instant time (exclusive).
+ *
+ * @param metaClient {@link HoodieTableMetaClient} instance.
+ * @param exclusiveStartInstantTime Start instant time (exclusive).
+ * @return Hudi timeline.
+ */
+ public static HoodieTimeline getCommitsTimelineAfter(
+ HoodieTableMetaClient metaClient, String exclusiveStartInstantTime) {
+HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();
+HoodieDefaultTimeline timeline =
+activeTimeline.isBeforeTimelineStarts(exclusiveStartInstantTime)
+? metaClient.getArchivedTimeline(exclusiveStartInstantTime)
+.mergeTimeline(activeTimeline)
+: activeTimeline;
+return timeline.getCommitsTimeline()
+.findInstantsAfter(exclusiveStartInstantTime, Integer.MAX_VALUE);
+ }
Review Comment:
We need to scan all the instants since `exclusiveStartInstantTime` to figure
out the touched partitions. It is possible that `exclusiveStartInstantTime` is
before the start of the active timeline, in which case we still need to scan
the archived timeline (see #6662 for details). In most cases,
`exclusiveStartInstantTime` is after the start of the active timeline, so the
archived timeline is not loaded.
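
The branch described above can be sketched outside Hudi as follows. This is
only an illustration under stated assumptions: instant times are modeled as
fixed-width strings that sort lexicographically, `activeInstants` is a sorted
list standing in for the active timeline, and `archivedLoader` is a
hypothetical function standing in for `metaClient.getArchivedTimeline(...)`.
The empty-timeline handling is a simplification and may differ from
`isBeforeTimelineStarts`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch of the timeline selection; not the Hudi implementation. */
final class CommitsAfterSketch {
  /**
   * Returns the instants strictly after exclusiveStart. The archived loader is
   * invoked only when exclusiveStart precedes the first active instant.
   */
  static List<String> commitsAfter(
      List<String> activeInstants,
      Function<String, List<String>> archivedLoader,
      String exclusiveStart) {
    // Assumption: an empty active timeline never triggers the archived load.
    boolean beforeActiveStart = !activeInstants.isEmpty()
        && exclusiveStart.compareTo(activeInstants.get(0)) < 0;

    List<String> merged = new ArrayList<>();
    if (beforeActiveStart) {
      // Costly path: only taken when the start falls before the active timeline.
      merged.addAll(archivedLoader.apply(exclusiveStart));
    }
    merged.addAll(activeInstants);

    // Keep only instants strictly after the exclusive start.
    return merged.stream()
        .filter(ts -> ts.compareTo(exclusiveStart) > 0)
        .collect(Collectors.toList());
  }
}
```

In the common case the loader is never called, which is why the archived
timeline usually stays unloaded.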