This is an automated email from the ASF dual-hosted git repository.
feiwang pushed a commit to branch branch-0.6
in repository https://gitbox.apache.org/repos/asf/celeborn.git
The following commit(s) were added to refs/heads/branch-0.6 by this push:
new a9c55dd53 [CELEBORN-2083] For `WorkerStatusTracker`, log error for
`recordWorkerFailure`
a9c55dd53 is described below
commit a9c55dd538adffc71dba950f96ee524314e6b409
Author: Wang, Fei <[email protected]>
AuthorDate: Sun Jul 27 22:46:20 2025 -0700
[CELEBORN-2083] For `WorkerStatusTracker`, log error for
`recordWorkerFailure`
### What changes were proposed in this pull request?
For `WorkerStatusTracker`, log at error level in `recordWorkerFailure` to separate
these messages from the worker status changes reported in the application heartbeat response.
### Why are the changes needed?
Currently, `WorkerStatusTracker` logs at warning level in two cases:
1. status change from application heartbeat response
https://github.com/apache/celeborn/blob/ae40222351cbeb1a9bdd398d461255a0739f3cac/client/src/main/scala/org/apache/celeborn/client/WorkerStatusTracker.scala#L213-L214
2. `recordWorkerFailure` on some failures, such as `connectFailedWorkers`.
In our use case, the Celeborn cluster is very large and worker statuses
change frequently, so the log for case 1 is very noisy.
I think that:
1. case 2 is more critical, so it should use error level;
2. case 1 may be normal for a large Celeborn cluster, so warning level
is fine.
With separate log levels, we can mute the noisy status-change messages from the
application heartbeat response by setting the log level for
`WorkerStatusTracker` to error.
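A minimal sketch of why separating the levels makes the heartbeat noise mutable, assuming a hypothetical threshold-based logger. `SketchLogger`, `TrackerSketch`, `WarnLevel`, `ErrorLevel`, and the worker names are illustrative only, not Celeborn or logging-library APIs. Messages below the configured threshold are dropped, so a threshold of error hides the warning-level heartbeat status changes while keeping the error-level `recordWorkerFailure` reports.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical severity levels, ordered from least to most severe.
sealed abstract class Level(val rank: Int)
case object WarnLevel extends Level(1)
case object ErrorLevel extends Level(2)

// Hypothetical logger that keeps only messages at or above its threshold.
final class SketchLogger(threshold: Level) {
  val emitted: ListBuffer[(Level, String)] = ListBuffer.empty

  def log(level: Level, msg: String): Unit =
    if (level.rank >= threshold.rank) emitted += ((level, msg))
}

// Illustrative stand-in for the two logging paths described above.
final class TrackerSketch(logger: SketchLogger) {
  // Case 1: status change from the application heartbeat response (stays at warning).
  def onHeartbeatStatusChange(worker: String): Unit =
    logger.log(WarnLevel, s"Worker status changed: $worker")

  // Case 2: recordWorkerFailure (error level after this change).
  def recordWorkerFailure(worker: String): Unit =
    logger.log(ErrorLevel, s"Reporting failed workers: $worker")
}

object LogLevelDemo {
  // Runs both paths once and returns whatever the logger kept.
  def run(threshold: Level): List[(Level, String)] = {
    val logger = new SketchLogger(threshold)
    val tracker = new TrackerSketch(logger)
    tracker.onHeartbeatStatusChange("worker-a")
    tracker.recordWorkerFailure("worker-b")
    logger.emitted.toList
  }

  def main(args: Array[String]): Unit = {
    println(run(WarnLevel))  // both messages are kept
    println(run(ErrorLevel)) // only the failure report is kept
  }
}
```

In a real deployment the threshold would come from the logging backend's configuration for the `WorkerStatusTracker` logger rather than from code; the sketch only illustrates the filtering effect.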
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Code review.
Closes #3392 from turboFei/log_level_worker_status.
Authored-by: Wang, Fei <[email protected]>
Signed-off-by: Wang, Fei <[email protected]>
(cherry picked from commit 7ab6268e38a30699c1be86dd9298fd8233564f77)
Signed-off-by: Wang, Fei <[email protected]>
---
.../src/main/scala/org/apache/celeborn/client/WorkerStatusTracker.scala | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/client/src/main/scala/org/apache/celeborn/client/WorkerStatusTracker.scala b/client/src/main/scala/org/apache/celeborn/client/WorkerStatusTracker.scala
index f065f2c3e..698032014 100644
--- a/client/src/main/scala/org/apache/celeborn/client/WorkerStatusTracker.scala
+++ b/client/src/main/scala/org/apache/celeborn/client/WorkerStatusTracker.scala
@@ -124,7 +124,7 @@ class WorkerStatusTracker(
val failedWorkersMsg = failedWorkers.asScala.map { case (worker, (status, time)) =>
s"${worker.readableAddress()} ${status.name()} ${Utils.formatTimestamp(time)}"
}.mkString("\n")
- logWarning(
+ logError(
s"""
|Reporting failed workers:
|$failedWorkersMsg$currentFailedWorkers""".stripMargin)