voonhous commented on code in PR #19002:
URL: https://github.com/apache/hudi/pull/19002#discussion_r3415163077
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java:
##########
@@ -564,14 +564,15 @@ public List<WriteStatus> close() {
writer = null;
}
- // update final size, once for all log files
- // TODO we can actually deduce file size purely from AppendResult (based on offset and size
- // of the appended block)
+ // Set the final on-disk size of each log file. Appends within an append handle are contiguous,
+ // so a log file's length equals its start offset plus the total bytes appended to it. That is
+ // exactly what fs.getFileStatus().getLength() returns, and both values are already captured by
+ // the AppendResult stats (logOffset and the accumulated fileSizeInBytes). Deriving the size this
+ // way avoids a getPathInfo/HEAD per log file, which is a remote round trip per file group on
+ // object stores.
for (WriteStatus status : statuses) {
- long logFileSize = storage.getPathInfo(
- new StoragePath(config.getBasePath(), status.getStat().getPath()))
- .getLength();
- status.getStat().setFileSizeInBytes(logFileSize);
+ HoodieDeltaWriteStat stat = (HoodieDeltaWriteStat) status.getStat();
+ stat.setFileSizeInBytes(stat.getLogOffset() + stat.getFileSizeInBytes());
Review Comment:
Done, extracted `appendedBytes` so the before-state is explicit.
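
For readers following along, here is a rough, self-contained sketch of the shape this takes. It is not the actual patch: `DeltaStat` is a hypothetical stand-in for `HoodieDeltaWriteStat` that carries only the two fields involved, and `finalizeSizes` is an illustrative name. The point is only that `fileSizeInBytes` holds the accumulated appended bytes before the loop and the full on-disk length after it, so reading it into a local first keeps the two meanings apart.

```java
import java.util.ArrayList;
import java.util.List;

public class LogFileSizeSketch {

  // Hypothetical stand-in for HoodieDeltaWriteStat: only the fields used below.
  static final class DeltaStat {
    private final long logOffset;   // offset in the log file where this handle started appending
    private long fileSizeInBytes;   // before finalizing: bytes appended; after: total file length

    DeltaStat(long logOffset, long appendedBytes) {
      this.logOffset = logOffset;
      this.fileSizeInBytes = appendedBytes;
    }

    long getLogOffset() {
      return logOffset;
    }

    long getFileSizeInBytes() {
      return fileSizeInBytes;
    }

    void setFileSizeInBytes(long fileSizeInBytes) {
      this.fileSizeInBytes = fileSizeInBytes;
    }
  }

  // Derives each log file's final length locally instead of asking storage for it.
  // Assumes appends are contiguous, so length = start offset + bytes appended.
  static void finalizeSizes(List<DeltaStat> stats) {
    for (DeltaStat stat : stats) {
      // Capture the before-state explicitly: at this point the field is the appended byte count.
      long appendedBytes = stat.getFileSizeInBytes();
      stat.setFileSizeInBytes(stat.getLogOffset() + appendedBytes);
    }
  }

  public static void main(String[] args) {
    List<DeltaStat> stats = new ArrayList<>();
    stats.add(new DeltaStat(0L, 2048L));    // new log file, 2048 bytes appended
    stats.add(new DeltaStat(1024L, 512L));  // existing log file, appended at offset 1024
    finalizeSizes(stats);
    for (DeltaStat stat : stats) {
      System.out.println(stat.getLogOffset() + " -> " + stat.getFileSizeInBytes());
    }
  }
}
```

Note that the loop must run exactly once per stat. Running it a second time would add the offset again, which is one more reason to name the intermediate value.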