yihua commented on code in PR #7561:
URL: https://github.com/apache/hudi/pull/7561#discussion_r1059107210
##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##########
@@ -385,21 +386,44 @@ public HoodieMetastoreConfig getMetastoreConfig() {
   }
 
   /**
-   * Returns fresh new archived commits as a timeline from startTs (inclusive).
-   *
-   * <p>This is costly operation if really early endTs is specified.
-   * Be caution to use this only when the time range is short.
-   *
-   * <p>This method is not thread safe.
+   * Returns the cached archived timeline from startTs (inclusive).
    *
-   * @return Archived commit timeline
+   * @param startTs The start instant time (inclusive) of the archived timeline.
+   * @return the archived timeline.
    */
   public HoodieArchivedTimeline getArchivedTimeline(String startTs) {
-    return new HoodieArchivedTimeline(this, startTs);
+    return getArchivedTimeline(startTs, true);
+  }
+
+  /**
+   * Returns the cached archived timeline if using in-memory cache or a fresh new archived
+   * timeline if not using cache, from startTs (inclusive).
+   * <p>
+   * Instantiating an archived timeline is costly operation if really early startTs is
+   * specified.
+   * <p>
+   * This method is not thread safe.
+   *
+   * @param startTs The start instant time (inclusive) of the archived timeline.
+   * @param useCache Whether to use in-memory cache.
+   * @return the archived timeline based on the arguments.
+   */
+  public HoodieArchivedTimeline getArchivedTimeline(String startTs, boolean useCache) {
+    if (useCache) {
+      return archivedTimelineMap.computeIfAbsent(startTs, this::instantiateArchivedTimeline);

Review Comment:
The assumption is that the cache holds only one `startTs`, so there is no need to clear it; the cache is discarded when the lifecycle of the meta client ends. I can make it clear the cache whenever a new `startTs` comes in.
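The comment offers to clear the cache whenever a new `startTs` comes in. A minimal sketch of that policy follows, assuming a plain `HashMap` like the `archivedTimelineMap` in the diff and a generic value type standing in for `HoodieArchivedTimeline`. The class `SingleStartTsCache` is hypothetical and not part of Hudi.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch (not Hudi code): a cache keyed by startTs that keeps at
 * most one entry. V stands in for HoodieArchivedTimeline.
 */
public class SingleStartTsCache<V> {
  private final Map<String, V> cache = new HashMap<>();
  // Stands in for something like instantiateArchivedTimeline (assumption).
  private final Function<String, V> loader;

  public SingleStartTsCache(Function<String, V> loader) {
    this.loader = loader;
  }

  /** Not thread safe, like getArchivedTimeline in the diff. */
  public V get(String startTs) {
    if (!cache.containsKey(startTs)) {
      // A different startTs evicts the previously cached timeline.
      cache.clear();
    }
    // Loads only on a cache miss; repeated calls with the same startTs reuse the entry.
    return cache.computeIfAbsent(startTs, loader);
  }

  public int size() {
    return cache.size();
  }

  public static void main(String[] args) {
    int[] loads = {0};
    SingleStartTsCache<String> cache = new SingleStartTsCache<>(ts -> {
      loads[0]++;
      return "archived timeline from " + ts;
    });
    cache.get("20230101000000");
    cache.get("20230101000000"); // cache hit, no reload
    cache.get("20230102000000"); // new startTs: previous entry is cleared first
    System.out.println(loads[0] + " loads, " + cache.size() + " cached entry"); // 2 loads, 1 cached entry
  }
}
```

Clearing before `computeIfAbsent` bounds memory to a single archived timeline, so a long-lived meta client would not accumulate one timeline per `startTs` it has ever been asked for.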
##########
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineUtils.java:
##########
@@ -210,11 +210,30 @@ public static HoodieDefaultTimeline getTimeline(HoodieTableMetaClient metaClient
     return activeTimeline;
   }
 
+  /**
+   * Returns a Hudi timeline with commits after the given instant time (exclusive).
+   *
+   * @param metaClient {@link HoodieTableMetaClient} instance.
+   * @param exclusiveStartInstantTime Start instant time (exclusive).
+   * @return Hudi timeline.
+   */
+  public static HoodieTimeline getCommitsTimelineAfter(
+      HoodieTableMetaClient metaClient, String exclusiveStartInstantTime) {
+    HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();
+    HoodieDefaultTimeline timeline =
+        activeTimeline.isBeforeTimelineStarts(exclusiveStartInstantTime)
+            ? metaClient.getArchivedTimeline(exclusiveStartInstantTime)
+            .mergeTimeline(activeTimeline)
+            : activeTimeline;
+    return timeline.getCommitsTimeline()
+        .findInstantsAfter(exclusiveStartInstantTime, Integer.MAX_VALUE);
+  }

Review Comment:
We need to scan all the instants after `exclusiveStartInstantTime` to figure out the touched partitions. It is possible that `exclusiveStartInstantTime` is before the start of the active timeline, in which case we still need to scan the archived timeline (see #6662 for details). In most cases, `exclusiveStartInstantTime` should be after the start of the active timeline, so the archived timeline is not loaded.

-- 
This is an automated message from the Apache Git Service.
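The decision in the `getCommitsTimelineAfter` review comment can be modeled with plain sorted lists. This is a simplified sketch, not Hudi's API: `CommitsAfterSketch.commitsAfter` is hypothetical, the instant times are short strings standing in for Hudi timestamps (which compare lexicographically in time order), and "before the timeline starts" is approximated as "before the first active instant".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical model of the archived/active decision; not Hudi code. */
public class CommitsAfterSketch {

  /**
   * Returns the instants strictly after exclusiveStart. The archived instants
   * are loaded only when exclusiveStart precedes the first active instant.
   */
  static List<String> commitsAfter(List<String> active,
                                   Function<String, List<String>> loadArchived,
                                   String exclusiveStart) {
    // Simplified stand-in for activeTimeline.isBeforeTimelineStarts(...).
    boolean beforeActiveStart =
        !active.isEmpty() && exclusiveStart.compareTo(active.get(0)) < 0;
    List<String> merged = new ArrayList<>();
    if (beforeActiveStart) {
      // Archived instants are older than active ones, so appending keeps order.
      merged.addAll(loadArchived.apply(exclusiveStart));
    }
    merged.addAll(active);
    List<String> result = new ArrayList<>();
    for (String instant : merged) {
      if (instant.compareTo(exclusiveStart) > 0) { // exclusive start
        result.add(instant);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> archivedStore = List.of("001", "002", "003");
    List<String> active = List.of("004", "005");
    int[] archivedLoads = {0};
    Function<String, List<String>> loader = startTs -> {
      archivedLoads[0]++;
      List<String> out = new ArrayList<>();
      for (String t : archivedStore) {
        if (t.compareTo(startTs) >= 0) { // archived startTs is inclusive
          out.add(t);
        }
      }
      return out;
    };
    System.out.println(commitsAfter(active, loader, "002")); // [003, 004, 005]
    System.out.println(commitsAfter(active, loader, "004")); // [005]
    System.out.println("archived loads: " + archivedLoads[0]); // archived loads: 1
  }
}
```

Passing the loader as a function makes the common case cheap: when the start instant falls inside the active timeline, the archived side is never touched.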