[GitHub] [hudi] yihua commented on a diff in pull request #7561: [HUDI-5477][DO NOT MERGE] Optimize timeline loading in Hudi sync client

2022-12-29 Thread GitBox


yihua commented on code in PR #7561:
URL: https://github.com/apache/hudi/pull/7561#discussion_r1059134988


##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##
@@ -385,21 +386,44 @@ public HoodieMetastoreConfig getMetastoreConfig() {
   }
 
   /**
-   * Returns fresh new archived commits as a timeline from startTs (inclusive).
-   *
-   * This is costly operation if really early endTs is specified.
-   * Be caution to use this only when the time range is short.
-   *
-   * This method is not thread safe.
+   * Returns the cached archived timeline from startTs (inclusive).
    *
-   * @return Archived commit timeline
+   * @param startTs The start instant time (inclusive) of the archived timeline.
+   * @return the archived timeline.
     */
   public HoodieArchivedTimeline getArchivedTimeline(String startTs) {
-    return new HoodieArchivedTimeline(this, startTs);
+    return getArchivedTimeline(startTs, true);
+  }
+
+  /**
+   * Returns the cached archived timeline if using in-memory cache or a fresh new archived
+   * timeline if not using cache, from startTs (inclusive).
+   *
+   * Instantiating an archived timeline is costly operation if really early startTs is
+   * specified.
+   *
+   * This method is not thread safe.
+   *
+   * @param startTs  The start instant time (inclusive) of the archived timeline.
+   * @param useCache Whether to use in-memory cache.
+   * @return the archived timeline based on the arguments.
+   */
+  public HoodieArchivedTimeline getArchivedTimeline(String startTs, boolean useCache) {
+    if (useCache) {
+      return archivedTimelineMap.computeIfAbsent(startTs, this::instantiateArchivedTimeline);

Review Comment:
   This is fixed.
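   For illustration, the cache-or-fresh behavior described in the new Javadoc can be sketched without any Hudi dependencies. This is a minimal sketch, not the PR's code: `ArchivedTimelineStub`, `MetaClientSketch`, and the `instantiations` counter are hypothetical stand-ins. The uncached branch returning a fresh instance is inferred from the Javadoc, since the diff hunk stops at the `computeIfAbsent` line.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for HoodieArchivedTimeline; it only records its start instant.
final class ArchivedTimelineStub {
  final String startTs;

  ArchivedTimelineStub(String startTs) {
    this.startTs = startTs;
  }
}

// Hypothetical sketch of the cache-or-fresh lookup from the diff above.
// Like the reviewed method, it is not thread safe (plain HashMap, no locking).
final class MetaClientSketch {
  private final Map<String, ArchivedTimelineStub> archivedTimelineMap = new HashMap<>();
  int instantiations = 0; // counts expensive loads, for illustration only

  ArchivedTimelineStub getArchivedTimeline(String startTs) {
    // The single-argument overload defaults to the cached path.
    return getArchivedTimeline(startTs, true);
  }

  ArchivedTimelineStub getArchivedTimeline(String startTs, boolean useCache) {
    if (useCache) {
      // Load once per startTs, then reuse the cached instance.
      return archivedTimelineMap.computeIfAbsent(startTs, this::instantiateArchivedTimeline);
    }
    // Bypass the cache and always build a fresh timeline.
    return instantiateArchivedTimeline(startTs);
  }

  private ArchivedTimelineStub instantiateArchivedTimeline(String startTs) {
    instantiations++;
    return new ArchivedTimelineStub(startTs);
  }
}
```

   Repeated cached calls with the same `startTs` pay the loading cost once; passing `useCache = false` always pays it again.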



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org





yihua commented on code in PR #7561:
URL: https://github.com/apache/hudi/pull/7561#discussion_r1059107210


##
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##
@@ -385,21 +386,44 @@ public HoodieMetastoreConfig getMetastoreConfig() {
   }
 
   /**
-   * Returns fresh new archived commits as a timeline from startTs (inclusive).
-   *
-   * This is costly operation if really early endTs is specified.
-   * Be caution to use this only when the time range is short.
-   *
-   * This method is not thread safe.
+   * Returns the cached archived timeline from startTs (inclusive).
    *
-   * @return Archived commit timeline
+   * @param startTs The start instant time (inclusive) of the archived timeline.
+   * @return the archived timeline.
     */
   public HoodieArchivedTimeline getArchivedTimeline(String startTs) {
-    return new HoodieArchivedTimeline(this, startTs);
+    return getArchivedTimeline(startTs, true);
+  }
+
+  /**
+   * Returns the cached archived timeline if using in-memory cache or a fresh new archived
+   * timeline if not using cache, from startTs (inclusive).
+   *
+   * Instantiating an archived timeline is costly operation if really early startTs is
+   * specified.
+   *
+   * This method is not thread safe.
+   *
+   * @param startTs  The start instant time (inclusive) of the archived timeline.
+   * @param useCache Whether to use in-memory cache.
+   * @return the archived timeline based on the arguments.
+   */
+  public HoodieArchivedTimeline getArchivedTimeline(String startTs, boolean useCache) {
+    if (useCache) {
+      return archivedTimelineMap.computeIfAbsent(startTs, this::instantiateArchivedTimeline);

Review Comment:
   The assumption is that the cache holds only one `startTs`, so there is no need to clear it; the cache is discarded when the meta client's lifecycle ends. I can make it clear the cache whenever a new `startTs` comes in.
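   A minimal sketch of the alternative mentioned above (clearing the cache when a new `startTs` comes in), assuming a plain `HashMap` and a caller-supplied loader. `SingleStartTsCache` is a hypothetical name, not a Hudi class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical single-entry cache: keeps only the most recently requested
// startTs and drops the previous entry when a different startTs arrives.
// Not thread safe.
final class SingleStartTsCache<V> {
  private final Map<String, V> cache = new HashMap<>();
  private final Function<String, V> loader;

  SingleStartTsCache(Function<String, V> loader) {
    this.loader = loader;
  }

  V get(String startTs) {
    if (!cache.containsKey(startTs)) {
      // A new startTs evicts whatever was cached before.
      cache.clear();
    }
    // Loads only on a miss; a hit returns the cached value unchanged.
    return cache.computeIfAbsent(startTs, loader);
  }

  int size() {
    return cache.size();
  }
}
```

   The trade-off: memory stays bounded to a single archived timeline, but a caller that alternates between two `startTs` values reloads on every switch.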



##
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineUtils.java:
##
@@ -210,11 +210,30 @@ public static HoodieDefaultTimeline getTimeline(HoodieTableMetaClient metaClient
     return activeTimeline;
   }
 
+  /**
+   * Returns a Hudi timeline with commits after the given instant time (exclusive).
+   *
+   * @param metaClient                {@link HoodieTableMetaClient} instance.
+   * @param exclusiveStartInstantTime Start instant time (exclusive).
+   * @return Hudi timeline.
+   */
+  public static HoodieTimeline getCommitsTimelineAfter(
+      HoodieTableMetaClient metaClient, String exclusiveStartInstantTime) {
+    HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();
+    HoodieDefaultTimeline timeline =
+        activeTimeline.isBeforeTimelineStarts(exclusiveStartInstantTime)
+            ? metaClient.getArchivedTimeline(exclusiveStartInstantTime)
+            .mergeTimeline(activeTimeline)
+            : activeTimeline;
+    return timeline.getCommitsTimeline()
+        .findInstantsAfter(exclusiveStartInstantTime, Integer.MAX_VALUE);
+  }

Review Comment:
   We need to scan all the instants since `exclusiveStartInstantTime` to figure out the touched partitions. It is possible that `exclusiveStartInstantTime` is before the start of the active timeline, in which case we still need to scan the archived timeline (see #6662 for details). In most cases, `exclusiveStartInstantTime` should be after the start of the active timeline, so the archived timeline is not loaded.
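   The decision described above can be sketched with plain sorted lists of instant times instead of Hudi timeline objects. This is a hypothetical simplification: `CommitsAfterSketch`, `commitsAfter`, and `archivedLoader` are illustrative names; `isBeforeTimelineStarts` is approximated by comparing against the first active instant; and instant times are assumed to be fixed-width timestamps, so string order matches time order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Supplier;

// Hypothetical sketch of the branch in getCommitsTimelineAfter, using instant-time
// strings instead of Hudi timelines. Fixed-width timestamps are assumed, so
// String.compareTo orders instants chronologically.
final class CommitsAfterSketch {

  static List<String> commitsAfter(
      List<String> activeInstants,           // sorted ascending; assumed non-empty
      Supplier<List<String>> archivedLoader, // expensive; invoked only when needed
      String exclusiveStartInstantTime) {
    TreeSet<String> merged = new TreeSet<>(activeInstants);
    String activeStart = activeInstants.get(0);
    // Only a start earlier than the first active instant requires the archived
    // instants; otherwise the active timeline alone covers the range.
    if (exclusiveStartInstantTime.compareTo(activeStart) < 0) {
      merged.addAll(archivedLoader.get());
    }
    // Keep instants strictly after the start (exclusive), in ascending order.
    return new ArrayList<>(merged.tailSet(exclusiveStartInstantTime, false));
  }
}
```

   In the common case the start falls inside the active timeline and the loader is never invoked, which mirrors why the archived timeline is usually not loaded.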


