RexXiong commented on PR #3650:
URL: https://github.com/apache/celeborn/pull/3650#issuecomment-4211073033

   Overall, this change looks good to me. The approach of retrieving failure 
counts directly from `TaskSetManager.numFailures` is more accurate than 
manually counting failed task attempts, especially for cases like container 
preemption where Spark doesn't increment the failure count.
   
   One suggestion: It would be helpful to add a test case that verifies the 
failure count is correctly incremented after an actual task failure (e.g., 
simulate a task failure and then verify that `getTaskFailureCount` returns the 
expected increased value). Currently, the test only validates boundary 
conditions (initial value, out-of-bounds indices), but doesn't cover the actual 
failure counting scenario.
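
   A minimal sketch of the kind of test meant here, written against a 
hypothetical stand-in rather than the real Spark or Celeborn classes. 
`FakeTaskSet`, `failTask`, and the `-1` returned for out-of-bounds indices are 
assumptions made for illustration; only the names `numFailures` and 
`getTaskFailureCount` come from this PR, and the real signatures may differ.

```java
public class TaskFailureCountSketch {

  /** Hypothetical stand-in for the per-task failure counters kept by TaskSetManager. */
  static final class FakeTaskSet {
    final int[] numFailures;

    FakeTaskSet(int numTasks) {
      this.numFailures = new int[numTasks];
    }

    /** Simulates a failed attempt; only failures that count toward the limit bump the counter. */
    void failTask(int index, boolean countsTowardsFailures) {
      if (countsTowardsFailures) {
        numFailures[index]++;
      }
    }
  }

  /** Hypothetical lookup: returns the counter, or -1 (an assumed sentinel) when out of bounds. */
  static int getTaskFailureCount(FakeTaskSet taskSet, int index) {
    if (index < 0 || index >= taskSet.numFailures.length) {
      return -1;
    }
    return taskSet.numFailures[index];
  }

  static void check(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }

  public static void main(String[] args) {
    FakeTaskSet taskSet = new FakeTaskSet(2);

    // Boundary conditions, as in the existing test.
    check(getTaskFailureCount(taskSet, 0) == 0, "initial count should be 0");
    check(getTaskFailureCount(taskSet, 5) == -1, "out-of-bounds index should be rejected");

    // The missing scenario: a real failure must raise the count.
    taskSet.failTask(0, true);
    check(getTaskFailureCount(taskSet, 0) == 1, "count should increase after a failure");

    // A preemption-style failure must leave the count unchanged.
    taskSet.failTask(0, false);
    check(getTaskFailureCount(taskSet, 0) == 1, "uncounted failure should not change the count");

    // Other tasks are unaffected.
    check(getTaskFailureCount(taskSet, 1) == 0, "other task should still be at 0");
  }
}
```

   In the real test, the failure would need to be driven through Spark's 
scheduler rather than by mutating a counter directly.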
   
   ---
   by claude


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.