yihua commented on code in PR #18412:
URL: https://github.com/apache/hudi/pull/18412#discussion_r3036403812


##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/buffer/LogScanningRecordBufferLoader.java:
##########
@@ -54,7 +55,14 @@ protected <T> List<String> 
scanLogFiles(HoodieReaderContext<T> readerContext, Ho
       
readStats.setTotalUpdatedRecordsCompacted(logRecordReader.getNumMergedRecordsInLog());
       readStats.setTotalLogFilesCompacted(logRecordReader.getTotalLogFiles());
       readStats.setTotalLogRecords(logRecordReader.getTotalLogRecords());
-      readStats.setTotalLogBlocks(logRecordReader.getTotalLogBlocks());
+      readStats.setTotalLogBlocks(logRecordReader.getTotalValidLogBlocks());

Review Comment:
   🤖 This silently changes the semantics of `totalLogBlocks` in commit 
metadata. Previously it reported all scanned blocks (`getTotalLogBlocks()`), 
now it reports only valid blocks after rollback/compaction filtering 
(`getTotalValidLogBlocks()`). Since this value is persisted in 
`HoodieWriteStat` and surfaced through 
`HoodieCommitMetadata.getTotalLogBlocksCompacted()` and CLI commands, existing 
consumers will see different numbers for the same workload. Could you keep 
`setTotalLogBlocks` using `getTotalLogBlocks()` (total scanned) and use the new 
`setTotalValidLogBlocks` for the valid-only count?



##########
hudi-common/src/main/java/org/apache/hudi/common/table/read/buffer/LogScanningRecordBufferLoader.java:
##########
@@ -54,7 +55,14 @@ protected <T> List<String> 
scanLogFiles(HoodieReaderContext<T> readerContext, Ho
       
readStats.setTotalUpdatedRecordsCompacted(logRecordReader.getNumMergedRecordsInLog());
       readStats.setTotalLogFilesCompacted(logRecordReader.getTotalLogFiles());
       readStats.setTotalLogRecords(logRecordReader.getTotalLogRecords());
-      readStats.setTotalLogBlocks(logRecordReader.getTotalLogBlocks());
+      readStats.setTotalLogBlocks(logRecordReader.getTotalValidLogBlocks());
+      
readStats.setTotalValidLogBlocks(logRecordReader.getTotalValidLogBlocks());
+      readStats.setTotalLogBlocksSize(logRecordReader.getTotalLogBlocksSize());
+      
readStats.setTotalLogBlocksScanTimeMs(logRecordReader.getBlocksScanDuration());
+      
readStats.setTotalLogSizeCompacted(logRecordReader.getBlocksStats().stream()

Review Comment:
   🤖 This computes `totalLogSizeCompacted` by summing only data block sizes 
from `blocksStats`, but `blocksStats` is only populated for data blocks 
(AVRO/HFILE/PARQUET) — delete blocks are excluded. The old code used 
`CompactionStrategy.TOTAL_LOG_FILE_SIZE` which was the total log file size. 
This will under-report compacted log size when delete blocks are present. Is 
that intentional?



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java:
##########
@@ -277,6 +299,7 @@ && 
compareTimestamps(logBlock.getLogBlockHeader().get(INSTANT_TIME), GREATER_THA
           }
           if (instantRange.isPresent() && 
!instantRange.get().isInRange(instantTime)) {
             // filter the log block by instant range

Review Comment:
   🤖 Using WARN level for an instant that's outside the requested range seems 
like normal filtering behavior rather than something concerning. Could this be 
noisy in production? I'd suggest DEBUG level here — an instant being filtered 
by range is expected, not a warning.



##########
hudi-common/src/main/java/org/apache/hudi/common/table/log/BaseHoodieLogRecordReader.java:
##########
@@ -306,8 +332,14 @@ && 
compareTimestamps(logBlock.getLogBlockHeader().get(INSTANT_TIME), GREATER_THA
                   logBlock.getLogBlockHeader().get(TARGET_INSTANT_TIME);
               targetRollbackInstants.add(targetInstantForCommandBlock);
               orderedInstantsList.remove(targetInstantForCommandBlock);
-              instantToBlocksMap.remove(targetInstantForCommandBlock);
+              List<HoodieLogBlock> rolledBackBlocks = 
instantToBlocksMap.remove(targetInstantForCommandBlock);
+              if (rolledBackBlocks != null) {
+                numBlocksRolledBack += rolledBackBlocks.size();
+              }
+              LOG.info("Reading a rollback block with instant {} and target 
instant {}",
+                  instantTime, targetInstantForCommandBlock);
             } else {
+              LOG.info("Reading a command block with instant {} whose 
operation is not supported", instantTime);

Review Comment:
   🤖 Good point. The `LOG.error` before the throw is added at line 345 in the 
new code (`LOG.error("Reading a command block with instant {} whose operation 
is not supported", instantTime)`), so I think this is already addressed in the 
current diff.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to