[GitHub] [hudi] danny0405 commented on a diff in pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table

2023-06-09 Thread via GitHub


danny0405 commented on code in PR #8900:
URL: https://github.com/apache/hudi/pull/8900#discussion_r1224329290


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandleFactory.java:
##
@@ -47,6 +50,7 @@ public static  HoodieMergeHandle create(
   String fileId,
   TaskContextSupplier taskContextSupplier,
   Option keyGeneratorOpt) {
+LOG.info("Create update handle for fileId {} and partition path {} at commit {}", fileId, partitionPath, instantTime);
 if (table.requireSortedRecords()) {

Review Comment:
   Can we just remove this log? It's verbose.
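   If the message still has troubleshooting value, one alternative to deleting it outright is demoting it to debug level so it only appears when debug logging is enabled. A minimal, self-contained sketch; it uses `java.util.logging` in place of Hudi's SLF4J logger, and the helper name is hypothetical:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class MergeHandleLogDemo {
    private static final Logger LOG = Logger.getLogger(MergeHandleLogDemo.class.getName());

    // Hypothetical helper mirroring the reviewed code path: build the message,
    // but log it at FINE (debug) so it is suppressed under the default INFO level.
    static String describeHandle(String fileId, String partitionPath, String instantTime) {
        String msg = String.format(
            "Create update handle for fileId %s and partition path %s at commit %s",
            fileId, partitionPath, instantTime);
        LOG.log(Level.FINE, msg); // silent unless FINE logging is enabled
        return msg;
    }

    public static void main(String[] args) {
        // Nothing is printed by the logger at the default level; the message
        // is still available when debugging is switched on.
        System.out.println(describeHandle("file-1", "2023/06/09", "001"));
    }
}
```

   With SLF4J the equivalent would be `LOG.debug(...)` with the same `{}` placeholders, which also avoids the string-formatting cost when the level is disabled.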



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandleFactory.java:
##
@@ -79,6 +83,7 @@ public static  HoodieMergeHandle create(
   HoodieBaseFile dataFileToBeMerged,
   TaskContextSupplier taskContextSupplier,
   Option keyGeneratorOpt) {
+LOG.info("Get updateHandle for fileId {} and partitionPath {} at commit {}", fileId, partitionPath, instantTime);
 if (table.requireSortedRecords()) {

Review Comment:
   Can we just remove this log? It's verbose.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on a diff in pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table

2023-06-09 Thread via GitHub


danny0405 commented on code in PR #8900:
URL: https://github.com/apache/hudi/pull/8900#discussion_r1223977198


##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestStreamWriteOperatorCoordinator.java:
##
@@ -233,48 +233,49 @@ void testSyncMetadataTable() throws Exception {
 assertThat(completedTimeline.lastInstant().get().getTimestamp(), startsWith(HoodieTableMetadata.SOLO_COMMIT_TIMESTAMP));
 
 // test metadata table compaction
-// write another 4 commits
-for (int i = 1; i < 5; i++) {
+// write another 9 commits to trigger compaction twice. Since default clean version to retain is 2.

Review Comment:
   > The reason for making the change is to support restore
   
   First of all, I'm confused why this change is related to restore. The change is for MDT log compaction, right? Can we address the restore issue in another PR?
   
   > then the next cleaner job after compaction would remove the previous file slice, thereby blocking restore on the metadata table or losing data.
   
   Does the new cleaning strategy solve the issue? Even if we keep at least 2 versions for each file group, it does not ensure we can restore to a long-history commit.






[GitHub] [hudi] danny0405 commented on a diff in pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table

2023-06-09 Thread via GitHub


danny0405 commented on code in PR #8900:
URL: https://github.com/apache/hudi/pull/8900#discussion_r1223968227


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##
@@ -61,6 +61,12 @@ public class HoodieCompactionConfig extends HoodieConfig {
   + "but users are expected to trigger async job for execution. If `hoodie.compact.inline` is set to true, regular writers will do both scheduling and "
   + "execution inline for compaction");
 
+  public static final ConfigProperty ENABLE_LOG_COMPACTION = ConfigProperty
+  .key("hoodie.log.compaction.enable")
+  .defaultValue("false")
+  .sinceVersion("0.14")
+  .withDocumentation("By enabling log compaction through this config, log compaction will also get enabled for the metadata table.");

Review Comment:
   It seems only the MDT uses log compaction. If that is true, I kind of think this belongs in an MDT config.
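   For illustration only: an MDT-scoped option might live under the `hoodie.metadata.` namespace instead of the shared compaction namespace. The key name `hoodie.metadata.log.compaction.enable` and the tiny `ConfigProperty`-style holder below are assumptions sketched for this discussion, not Hudi's actual API:

```java
// Minimal stand-in for Hudi's ConfigProperty builder, just to show the shape
// of a metadata-table-scoped option; all names here are hypothetical.
class MetadataConfigSketch {
    static final class ConfigProperty {
        final String key;
        final String defaultValue;
        final String sinceVersion;
        final String doc;
        ConfigProperty(String key, String defaultValue, String sinceVersion, String doc) {
            this.key = key;
            this.defaultValue = defaultValue;
            this.sinceVersion = sinceVersion;
            this.doc = doc;
        }
    }

    // A metadata-scoped key keeps the DT and MDT toggles independent, which is
    // the separation the review comment is asking about.
    static final ConfigProperty METADATA_LOG_COMPACTION_ENABLE = new ConfigProperty(
        "hoodie.metadata.log.compaction.enable", // hypothetical key name
        "false",
        "0.14",
        "Enable log compaction for the metadata table only.");

    public static void main(String[] args) {
        System.out.println(METADATA_LOG_COMPACTION_ENABLE.key
            + " default=" + METADATA_LOG_COMPACTION_ENABLE.defaultValue);
    }
}
```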






[GitHub] [hudi] danny0405 commented on a diff in pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table

2023-06-09 Thread via GitHub


danny0405 commented on code in PR #8900:
URL: https://github.com/apache/hudi/pull/8900#discussion_r1223970182


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataWriter.java:
##
@@ -97,6 +98,11 @@ public interface HoodieTableMetadataWriter extends Serializable, AutoCloseable {
*/
   BaseHoodieWriteClient getWriteClient();
 
+  /**
+   * Returns the meta client for the metadata table.
+   */
+  HoodieTableMetaClient getMetadataMetaClient();

Review Comment:
   Just remove unrelated changes.






[GitHub] [hudi] danny0405 commented on a diff in pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table

2023-06-08 Thread via GitHub


danny0405 commented on code in PR #8900:
URL: https://github.com/apache/hudi/pull/8900#discussion_r1223901787


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java:
##
@@ -158,7 +165,8 @@ protected List filterPartitionPathsByStrategy(HoodieWriteConfig writeCon
 return partitionPaths;
   }
 
-  protected boolean filterFileSlice(FileSlice fileSlice, String lastCompletedInstantTime, Set pendingFileGroupIds) {
+  protected boolean filterFileSlice(FileSlice fileSlice, String lastCompletedInstantTime,
+Set pendingFileGroupIds, Option instantRange) {
 return fileSlice.getLogFiles().count() > 0 && !pendingFileGroupIds.contains(fileSlice.getFileGroupId());

Review Comment:
   Okay, it is overridden; a little obscure.






[GitHub] [hudi] danny0405 commented on a diff in pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table

2023-06-08 Thread via GitHub


danny0405 commented on code in PR #8900:
URL: https://github.com/apache/hudi/pull/8900#discussion_r1223899704


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java:
##
@@ -158,7 +165,8 @@ protected List filterPartitionPathsByStrategy(HoodieWriteConfig writeCon
 return partitionPaths;
   }
 
-  protected boolean filterFileSlice(FileSlice fileSlice, String lastCompletedInstantTime, Set pendingFileGroupIds) {
+  protected boolean filterFileSlice(FileSlice fileSlice, String lastCompletedInstantTime,
+Set pendingFileGroupIds, Option instantRange) {
 return fileSlice.getLogFiles().count() > 0 && !pendingFileGroupIds.contains(fileSlice.getFileGroupId());

Review Comment:
   Has the `instantRange` been used anywhere?






[GitHub] [hudi] danny0405 commented on a diff in pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table

2023-06-08 Thread via GitHub


danny0405 commented on code in PR #8900:
URL: https://github.com/apache/hudi/pull/8900#discussion_r1223894630


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -1021,17 +1023,46 @@ private void runPendingTableServicesOperations(BaseHoodieWriteClient writeClient
* deltacommit.
*/
  protected void compactIfNecessary(BaseHoodieWriteClient writeClient, String latestDeltacommitTime) {
+
+// Check if there are any pending compaction or log compaction instants in the timeline.
+// If pending compact/logcompaction operations are found abort scheduling new compaction/logcompaction operations.
+Option pendingLogCompactionInstant =
+metadataMetaClient.getActiveTimeline().filterPendingLogCompactionTimeline().firstInstant();
+Option pendingCompactionInstant =
+metadataMetaClient.getActiveTimeline().filterPendingCompactionTimeline().firstInstant();
+if (pendingLogCompactionInstant.isPresent() || pendingCompactionInstant.isPresent()) {
+  LOG.info(String.format("Not scheduling compaction or logcompaction, since a pending compaction instant %s or logcompaction %s instant is present", pendingCompactionInstant, pendingLogCompactionInstant));
+  return;
+}
+
 // Trigger compaction with suffixes based on the same instant time. This ensures that any future
 // delta commits synced over will not have an instant time lesser than the last completed instant on the
 // metadata table.
 final String compactionInstantTime = HoodieTableMetadataUtil.createCompactionTimestamp(latestDeltacommitTime);
+
 // we need to avoid checking compaction w/ same instant again.
 // let's say we trigger compaction after C5 in MDT and so compaction completes with C4001. but C5 crashed before completing in MDT.
 // and again w/ C6, we will re-attempt compaction at which point latest delta commit is C4 in MDT.
 // and so we try compaction w/ instant C4001. So, we can avoid compaction if we already have compaction w/ same instant time.
-if (!metadataMetaClient.getActiveTimeline().filterCompletedInstants().containsInstant(compactionInstantTime)
-&& writeClient.scheduleCompactionAtInstant(compactionInstantTime, Option.empty())) {
+if (metadataMetaClient.getActiveTimeline().filterCompletedInstants().containsInstant(compactionInstantTime)) {
+  LOG.info(String.format("Compaction with same %s time is already present in the timeline.", compactionInstantTime));
+  return;
+} else if (writeClient.scheduleCompactionAtInstant(compactionInstantTime, Option.empty())) {
+  LOG.info("Compaction is scheduled for timestamp " + compactionInstantTime);
   writeClient.compact(compactionInstantTime);
+} else if (metadataWriteConfig.isLogCompactionEnabled()) {
+  // Schedule and execute log compaction with suffixes based on the same instant time. This ensures that any future
+  // delta commits synced over will not have an instant time lesser than the last completed instant on the
+  // metadata table.
+  final String logCompactionInstantTime = HoodieTableMetadataUtil.createLogCompactionTimestamp(latestDeltacommitTime);
+  if (metadataMetaClient.getActiveTimeline().filterCompletedInstants().containsInstant(logCompactionInstantTime)) {
+LOG.info(String.format("Log compaction with same %s time is already present in the timeline.", logCompactionInstantTime));
+return;

Review Comment:
   This `return` can be eliminated.
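   To illustrate the point: a `return` whose only job is to skip the rest of the enclosing branch can be expressed with an if/else instead, which is what makes the extra `return` in the diff above removable. A stripped-down sketch (stub logic and hypothetical names, not the actual Hudi method):

```java
class ReturnEliminationSketch {
    // Early-return style, as in the diff: the guard returns to skip scheduling.
    static String maybeLogCompact(boolean instantAlreadyPresent) {
        if (instantAlreadyPresent) {
            return "skipped";
        }
        return "scheduled";
    }

    // Same behavior with if/else and no early return, which is the shape the
    // review comment suggests once the guard is the last step of the branch.
    static String maybeLogCompactNoReturn(boolean instantAlreadyPresent) {
        String outcome;
        if (instantAlreadyPresent) {
            outcome = "skipped";
        } else {
            outcome = "scheduled";
        }
        return outcome;
    }

    public static void main(String[] args) {
        System.out.println(maybeLogCompact(true) + " == " + maybeLogCompactNoReturn(true));
        System.out.println(maybeLogCompact(false) + " == " + maybeLogCompactNoReturn(false));
    }
}
```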






[GitHub] [hudi] danny0405 commented on a diff in pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table

2023-06-08 Thread via GitHub


danny0405 commented on code in PR #8900:
URL: https://github.com/apache/hudi/pull/8900#discussion_r1223872493


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -985,9 +985,10 @@ public void performTableServices(Option inFlightInstantTimestamp) {
 
   // Do timeline validation before scheduling compaction/logcompaction operations.
   if (validateTimelineBeforeSchedulingCompaction(inFlightInstantTimestamp, latestDeltacommitTime)) {
+LOG.info("No inflight commits present in either of the timelines. "
++ "Proceeding to check compaction.");

Review Comment:
   Do we need a log like this? The compaction would produce many log lines anyway.



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -1021,17 +1023,46 @@ private void runPendingTableServicesOperations(BaseHoodieWriteClient writeClient
* deltacommit.
*/
  protected void compactIfNecessary(BaseHoodieWriteClient writeClient, String latestDeltacommitTime) {
+
+// Check if there are any pending compaction or log compaction instants in the timeline.
+// If pending compact/logcompaction operations are found abort scheduling new compaction/logcompaction operations.
+Option pendingLogCompactionInstant =
+metadataMetaClient.getActiveTimeline().filterPendingLogCompactionTimeline().firstInstant();

Review Comment:
   Wondering how this could happen; all the compactions are inline for MDT, right?



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##
@@ -111,6 +112,10 @@ public static HoodieWriteConfig createMetadataWriteConfig(
 // deltacommits having corresponding completed commits. Therefore, we need to compact all fileslices of all
 // partitions together requiring UnBoundedCompactionStrategy.
 .withCompactionStrategy(new UnBoundedCompactionStrategy())
+// Check if log compaction is enabled, this is needed for tables with lot of records.
+.withLogCompactionEnabled(writeConfig.isLogCompactionEnabled())
+// This config is only used if enableLogCompactionForMetadata is set.

Review Comment:
   Yeah, log compaction for MDT and DT should not share the same config option, I think.



##
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/TestStreamWriteOperatorCoordinator.java:
##
@@ -233,48 +233,49 @@ void testSyncMetadataTable() throws Exception {
 assertThat(completedTimeline.lastInstant().get().getTimestamp(), startsWith(HoodieTableMetadata.SOLO_COMMIT_TIMESTAMP));
 
 // test metadata table compaction
-// write another 4 commits
-for (int i = 1; i < 5; i++) {
+// write another 9 commits to trigger compaction twice. Since default clean version to retain is 2.

Review Comment:
   Why do we need to change the clean strategy then?






[GitHub] [hudi] danny0405 commented on a diff in pull request #8900: [HUDI-6334] Integrate logcompaction table service to metadata table and provides various bugfixes to metadata table

2023-06-08 Thread via GitHub


danny0405 commented on code in PR #8900:
URL: https://github.com/apache/hudi/pull/8900#discussion_r1223869176


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandleFactory.java:
##
@@ -47,6 +50,7 @@ public static  HoodieMergeHandle create(
   String fileId,
   TaskContextSupplier taskContextSupplier,
   Option keyGeneratorOpt) {
+LOG.info("Get updateHandle for fileId " + fileId + " and partitionPath " + partitionPath + " at commit " + instantTime);

Review Comment:
   `Create update handle for fileId ... and partition path ...`


