voonhous opened a new issue, #19001: URL: https://github.com/apache/hudi/issues/19001
### Describe the problem

In `HoodieAppendHandle.close()`, for every produced `WriteStatus` the handle calls `storage.getPathInfo(<logFilePath>).getLength()` to record the final log file size. On object stores (S3/GCS) each call is a remote HEAD request, so a delta commit touching K file groups issues K extra round trips, one per append handle, purely to read a size the handle already knows.

The size is fully determined by the writes the handle just performed. `HoodieLogFormatWriter.appendBlocks` returns an `AppendResult` with the start offset and the total bytes appended (covering every on-disk byte: magic, header, content, footers, reverse-pointer long), and these already populate the delta write stat's `logOffset` and `fileSizeInBytes`. Appends within a handle are contiguous, so a log file's length equals `logOffset + fileSizeInBytes`.

### Proposed fix

In `close()`, set each log file's final size to `stat.getLogOffset() + stat.getFileSizeInBytes()` instead of issuing a `getPathInfo`/HEAD per file. The value is byte-identical to what `getPathInfo(<logFilePath>).getLength()` returns (any pre-block bytes are absorbed into `logOffset`, and `closeStream()` appends no trailer), and it removes one remote round trip per log file per file group on the MOR write path.

This is the same class of change as the merged clean-path `getPathInfo` removal (#18963). Will raise a PR for this.
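To illustrate the invariant the proposal relies on (contiguous appends, so final length equals start offset plus bytes appended), here is a minimal local-file sketch. It is not Hudi code: `LogSizeSketch`, `appendBlock`, `derivedLength`, and `run` are hypothetical names, the block sizes are arbitrary, and `Files.size` merely stands in for the `getPathInfo(...).getLength()` lookup.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LogSizeSketch {

  // Arithmetic from the issue: final length = start offset + bytes appended.
  static long derivedLength(long logOffset, long fileSizeInBytes) {
    return logOffset + fileSizeInBytes;
  }

  // Appends one block to the file and returns the number of bytes written.
  static long appendBlock(Path file, byte[] block) throws IOException {
    try (OutputStream out = Files.newOutputStream(
        file, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
      out.write(block);
    }
    return block.length;
  }

  // Returns {derived length, length reported by the file system}.
  static long[] run() {
    try {
      Path file = Files.createTempFile("log-size-sketch", ".log");
      try {
        // Bytes already in the file before this "handle" starts writing;
        // they are accounted for by the start offset.
        Files.write(file, new byte[128]);
        long logOffset = 128L;

        // Contiguous appends; the writer accumulates the bytes it wrote.
        long fileSizeInBytes = 0L;
        fileSizeInBytes += appendBlock(file, new byte[40]);
        fileSizeInBytes += appendBlock(file, new byte[32]);

        long derived = derivedLength(logOffset, fileSizeInBytes);
        long actual = Files.size(file); // the metadata lookup the proposal avoids
        return new long[] {derived, actual};
      } finally {
        Files.deleteIfExists(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    long[] lengths = run();
    System.out.println("derived=" + lengths[0] + ", actual=" + lengths[1]);
  }
}
```

Both values should come out equal (200 bytes here). The equality only holds while nothing else writes to the file between the offset being captured and the last append, which is the contiguity assumption stated above.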
