This is an automated email from the ASF dual-hosted git repository.

feiwang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/celeborn.git


The following commit(s) were added to refs/heads/main by this push:
     new 7ab6268e3 [CELEBORN-2083] For `WorkerStatusTracker`, log error for `recordWorkerFailure`
7ab6268e3 is described below

commit 7ab6268e38a30699c1be86dd9298fd8233564f77
Author: Wang, Fei <[email protected]>
AuthorDate: Sun Jul 27 22:46:20 2025 -0700

    [CELEBORN-2083] For `WorkerStatusTracker`, log error for `recordWorkerFailure`
    
    ### What changes were proposed in this pull request?
    
    In `WorkerStatusTracker`, log the failures reported through `recordWorkerFailure` at error level, so that they can be told apart from status changes received in the application heartbeat response.
    
    ### Why are the changes needed?
    
    Currently, `WorkerStatusTracker` logs at warning level in two cases:
    1. status change from application heartbeat response
    
https://github.com/apache/celeborn/blob/ae40222351cbeb1a9bdd398d461255a0739f3cac/client/src/main/scala/org/apache/celeborn/client/WorkerStatusTracker.scala#L213-L214
    
    2. `recordWorkerFailure` on certain failures, such as `connectFailedWorkers`.
    
    In our use case, the Celeborn cluster is very large and worker status changes frequently, so the log for case 1 is very noisy.
    
    I think that:
    1. Case 2 is more critical and should use the error level.
    2. Case 1 may be normal for a large Celeborn cluster, so the warning level is fine.
    
    With separate log levels, we can mute the noisy status-change messages from the application heartbeat response by setting the log level for `WorkerStatusTracker` to error.
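    
    As a rough illustration, the fragment below sketches how that could look in a Log4j 2 properties file. It assumes the application logs through Log4j 2 and that the logger name is the fully qualified class name `org.apache.celeborn.client.WorkerStatusTracker` (inferred from the file path touched by this change); the `workerStatusTracker` key is an arbitrary label.
    
    ```properties
    # Assumption: the logger is named after the class; adjust if your setup differs.
    logger.workerStatusTracker.name = org.apache.celeborn.client.WorkerStatusTracker
    # Only error-level messages (case 2) are emitted; warnings (case 1) are dropped.
    logger.workerStatusTracker.level = error
    ```
    
    Other loggers are unaffected, so the rest of the client keeps its configured level.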
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    Code review.
    
    Closes #3392 from turboFei/log_level_worker_status.
    
    Authored-by: Wang, Fei <[email protected]>
    Signed-off-by: Wang, Fei <[email protected]>
---
 .../src/main/scala/org/apache/celeborn/client/WorkerStatusTracker.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/client/src/main/scala/org/apache/celeborn/client/WorkerStatusTracker.scala b/client/src/main/scala/org/apache/celeborn/client/WorkerStatusTracker.scala
index f065f2c3e..698032014 100644
--- a/client/src/main/scala/org/apache/celeborn/client/WorkerStatusTracker.scala
+++ b/client/src/main/scala/org/apache/celeborn/client/WorkerStatusTracker.scala
@@ -124,7 +124,7 @@ class WorkerStatusTracker(
       val failedWorkersMsg = failedWorkers.asScala.map { case (worker, (status, time)) =>
         s"${worker.readableAddress()}   ${status.name()}   ${Utils.formatTimestamp(time)}"
       }.mkString("\n")
-      logWarning(
+      logError(
         s"""
            |Reporting failed workers:
            |$failedWorkersMsg$currentFailedWorkers""".stripMargin)
