danny0405 commented on code in PR #18904:
URL: https://github.com/apache/hudi/pull/18904#discussion_r3345902379
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/heartbeat/HoodieHeartbeatClient.java:
##########
@@ -197,20 +222,26 @@ public boolean isHeartbeatExpired(String instantTime)
throws IOException {
private void updateHeartbeat(String instantTime) throws
HoodieHeartbeatException {
try {
Long newHeartbeatTime = System.currentTimeMillis();
- OutputStream outputStream =
- this.storage.create(
- new StoragePath(heartbeatFolderPath, instantTime), true);
- outputStream.close();
+ writeHeartbeatFile(instantTime);
Heartbeat heartbeat = instantToHeartbeatMap.get(instantTime);
if (heartbeat.getLastHeartbeatTime() != null &&
isHeartbeatExpired(instantTime)) {
- log.error("Aborting, missed generating heartbeat within allowable
interval {} ms", this.maxAllowableHeartbeatIntervalInMs);
- // Since TimerTask allows only java.lang.Runnable, cannot throw an
exception and bubble to the caller thread, hence
- // explicitly interrupting the timer thread.
- Thread.currentThread().interrupt();
+ // A previous refresh was delayed past the tolerable interval (e.g.
due to a slow storage write
+ // or driver pressure). Do NOT interrupt the timer thread here: that
would permanently kill all
+ // future heartbeats for this instant, turning a transient delay into
a permanent blackout.
+ // Enforcement is done at commit time in
HeartbeatUtils.abortIfHeartbeatExpired(), which is the
+ // correct and sole enforcement point.
+ log.warn("Missed generating heartbeat for instant {} within allowable
interval {} ms; continuing to refresh",
Review Comment:
makes sense somehow, but I do see some risk for correctness: when failed
writes rollback strategy is configured as LAZY, the async cleaner would
possibility rollback the current instant by removing some data files(not remove
the metadata files on timeline yet), and then the write finish to commit, then
the commit got data loss.
Should we also increase the tolerale missing cnt, the current default is 2,
should we change it to 10 or 20
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]