nsivabalan commented on code in PR #9106:
URL: https://github.com/apache/hudi/pull/9106#discussion_r1252401000


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -116,11 +134,10 @@ public static HoodieWriteConfig createMetadataWriteConfig(
             // Below config is only used if isLogCompactionEnabled is set.
             .withLogCompactionBlocksThreshold(writeConfig.getMetadataLogCompactBlocksThreshold())
             .build())
-        .withParallelism(parallelism, parallelism)
-        .withDeleteParallelism(parallelism)
-        .withRollbackParallelism(parallelism)
-        .withFinalizeWriteParallelism(parallelism)
-        .withAllowMultiWriteOnSameInstant(true)
+        .withStorageConfig(HoodieStorageConfig.newBuilder().hfileMaxFileSize(maxHFileSizeBytes)
+            .logFileMaxSize(maxLogFileSizeBytes).logFileDataBlockMaxSize(maxLogBlockSizeBytes).build())
+        .withRollbackParallelism(defaultParallelism)
+        .withFinalizeWriteParallelism(defaultParallelism)

Review Comment:
   did you remove `.withAllowMultiWriteOnSameInstant(true)` intentionally?
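
   If the drop was unintentional, the flag can be restored in the same builder chain. A minimal sketch, reusing only the builder calls already visible in the hunk above (the stated reason for the flag is my assumption, not from the PR):

   ```java
   // Fragment of the HoodieWriteConfig builder chain from the hunk above,
   // with the dropped flag restored at the end.
   .withStorageConfig(HoodieStorageConfig.newBuilder().hfileMaxFileSize(maxHFileSizeBytes)
       .logFileMaxSize(maxLogFileSizeBytes).logFileDataBlockMaxSize(maxLogBlockSizeBytes).build())
   .withRollbackParallelism(defaultParallelism)
   .withFinalizeWriteParallelism(defaultParallelism)
   // Assumption: kept so that concurrent actions (e.g. ingest plus async
   // table services) can commit to the MDT on the same instant time.
   .withAllowMultiWriteOnSameInstant(true)
   ```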



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataWriteUtils.java:
##########
@@ -91,8 +108,9 @@ public static HoodieWriteConfig createMetadataWriteConfig(
         .withCleanConfig(HoodieCleanConfig.newBuilder()
             .withAsyncClean(DEFAULT_METADATA_ASYNC_CLEAN)
             .withAutoClean(false)
-            .withCleanerParallelism(parallelism)
-            .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_COMMITS)
+            .withCleanerParallelism(defaultParallelism)
+            .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS)
+            .retainFileVersions(2)

Review Comment:
   I understand it could be a larger change, but the file-versions policy makes sense in general. If Uber has been running with file versions for 6+ months, we should do a round of testing on our end and can then possibly proceed.
   However, incremental cleaning may not kick in under this policy, so for large MDTs I'm wondering whether there will be a latency hit.
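
   For reference, a sketch of the clean config as it would read after this change, with names taken from the hunk above; the comment on incremental cleaning reflects the concern raised here, not verified behavior:

   ```java
   // Sketch: MDT clean config with the file-versions policy. Under
   // KEEP_LATEST_FILE_VERSIONS the cleaner retains the N latest versions of
   // each file group rather than working off an earliest-commit-to-retain,
   // which is why incremental cleaning may not kick in.
   HoodieCleanConfig cleanConfig = HoodieCleanConfig.newBuilder()
       .withAsyncClean(DEFAULT_METADATA_ASYNC_CLEAN)
       .withAutoClean(false)
       .withCleanerParallelism(defaultParallelism)
       .withCleanerPolicy(HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS)
       .retainFileVersions(2) // keep the 2 latest versions per file group
       .build();
   ```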



##########
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##########
@@ -341,7 +341,11 @@ private void ensurePartitionsLoadedCorrectly(List<String> partitionList) {
         long beginTs = System.currentTimeMillis();
         // Not loaded yet
         try {
-          LOG.info("Building file system view for partitions " + partitionSet);
+          if (partitionSet.size() < 100) {
+            LOG.info("Building file system view for partitions: " + partitionSet);

Review Comment:
   Yes, maybe we should reconsider the frequency of logging here. For example, log once every 100 partitions or something; I'm not sure we gain much by logging this for every partition. A rough sketch of that shape follows.
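
   Something along these lines, as an illustrative sketch; `loadPartition` and the stride of 100 are placeholders, not the actual code:

   ```java
   // Illustrative only: summarize progress at a fixed stride instead of
   // printing the full partition set for every build.
   int loaded = 0;
   for (String partitionPath : partitionSet) {
     loadPartition(partitionPath); // hypothetical per-partition load step
     loaded++;
     if (loaded % 100 == 0 || loaded == partitionSet.size()) {
       LOG.info("Built file system view for " + loaded + " / "
           + partitionSet.size() + " partitions");
     }
   }
   ```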



##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -537,7 +538,8 @@ public HoodieTableMetaClient getMetadataMetaClient() {
   }
 
   public Map<String, String> stats() {
-    return metrics.map(m -> m.getStats(true, metadataMetaClient, this)).orElse(new HashMap<>());
+    Set<String> allMetadataPartitionPaths = Arrays.stream(MetadataPartitionType.values()).map(MetadataPartitionType::getPartitionPath).collect(Collectors.toSet());
+    return metrics.map(m -> m.getStats(true, metadataMetaClient, this, allMetadataPartitionPaths)).orElse(new HashMap<>());

Review Comment:
   HoodieMetadataMetrics.getStats(boolean detailed, HoodieTableMetaClient metaClient, HoodieTableMetadata metadata) reloads the timeline.
   Can we move the reload out to the caller, so that we don't reload it once per MDT partition's stats? Something along the lines of the sketch below.
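
   i.e. something like this sketch; it assumes getStats is changed to use the already-loaded timeline instead of reloading internally, which is my assumption about how the refactor could look:

   ```java
   // Sketch: reload the MDT timeline once in the caller rather than inside
   // getStats() for each metadata partition's stats.
   public Map<String, String> stats() {
     Set<String> allMetadataPartitionPaths = Arrays.stream(MetadataPartitionType.values())
         .map(MetadataPartitionType::getPartitionPath)
         .collect(Collectors.toSet());
     metadataMetaClient.reloadActiveTimeline(); // single reload, up front
     return metrics.map(m -> m.getStats(true, metadataMetaClient, this, allMetadataPartitionPaths))
         .orElse(new HashMap<>());
   }
   ```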



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -176,7 +176,7 @@ private void initMetadataReader() {
     }
 
     try {
-      this.metadata = new HoodieBackedTableMetadata(engineContext, dataWriteConfig.getMetadataConfig(), dataWriteConfig.getBasePath());
+      this.metadata = new HoodieBackedTableMetadata(engineContext, dataWriteConfig.getMetadataConfig(), dataWriteConfig.getBasePath(), true);

Review Comment:
   The rationale is that the metadata writer itself is short-lived, existing just to commit one instant, so we should be fine enabling reuse here?
   Do we even expect to see any improvement, since this reader is meant for just one write to the MDT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
