hudi-agent commented on code in PR #18904:
URL: https://github.com/apache/hudi/pull/18904#discussion_r3368453431
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/heartbeat/HoodieHeartbeatClient.java:
##########
@@ -197,20 +223,31 @@ public boolean isHeartbeatExpired(String instantTime)
throws IOException {
private void updateHeartbeat(String instantTime) throws
HoodieHeartbeatException {
try {
Long newHeartbeatTime = System.currentTimeMillis();
- OutputStream outputStream =
- this.storage.create(
- new StoragePath(heartbeatFolderPath, instantTime), true);
- outputStream.close();
+ writeHeartbeatFile(instantTime);
Heartbeat heartbeat = instantToHeartbeatMap.get(instantTime);
if (heartbeat.getLastHeartbeatTime() != null &&
isHeartbeatExpired(instantTime)) {
Review Comment:
🤖 The comment below correctly explains why we must stop refreshing once a
lapse has been observed — but this `lastHeartbeatTime != null` guard means the
same protection isn't applied to the first-write-timeout case. If the
synchronous `writeHeartbeatFile` from `start()` times out and the subsequent
ticks also time out for longer than `maxAllowableHeartbeatIntervalInMs` before
one finally lands, a concurrent LAZY-policy cleaner could have rolled back this
instant in the gap (no file on DFS → static
`HoodieHeartbeatUtils.isHeartbeatExpired` returns expired →
`getInstantsToRollbackForLazyCleanPolicy` picks it up). When a tick finally
succeeds, this guard skips the expiry check and sets a fresh
`lastHeartbeatTime`, so the commit-time `abortIfHeartbeatExpired()` then sees a
healthy in-memory time and proceeds — exactly the "commit on top of rolled-back
files" outcome the comment is trying to prevent. Was this edge case considered?
One option is to also abort if `lastHeartbea
tTime` is null and `currentTime - <start()-time>` already exceeds
`maxAllowableHeartbeatIntervalInMs`.
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]