DavidMcLaughlin opened a new issue #30: Count number of times partitioned tasks 
reenter the cluster as healthy
URL: https://github.com/apache/aurora/issues/30
 
 
   Currently, when a task is PARTITIONED and then marked LOST, Aurora 
schedules a replacement. The original task can later report that it is 
healthy, at which point Aurora kills it because it has already been replaced. 
Receiving this signal is a strong indicator that extending the timeouts would 
avoid unnecessary churn in the cluster. 
   
   Add a metric that counts how often this happens. 
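
   To illustrate the counting logic being requested, here is a minimal sketch in Python. It is not Aurora code: the class, the method names, the metric name `partitioned_tasks_reentered_healthy`, and the use of `RUNNING` as the "healthy" report are all assumptions made for the example. Only the PARTITIONED and LOST states come from the description above.

```python
from collections import Counter

# PARTITIONED and LOST are named in the issue; RUNNING is an assumed
# stand-in for "the task reported itself as healthy".
PARTITIONED, LOST, RUNNING = "PARTITIONED", "LOST", "RUNNING"


class PartitionReentryCounter:
    """Hypothetical tracker: counts tasks that went PARTITIONED -> LOST
    and later reported healthy again."""

    def __init__(self):
        self.stats = Counter()              # metric name -> count
        self._last_state = {}               # task id -> last scheduler state
        self._lost_after_partition = set()  # tasks lost due to a partition

    def on_state_change(self, task_id, new_state):
        # Remember tasks whose LOST transition came directly from PARTITIONED.
        if new_state == LOST and self._last_state.get(task_id) == PARTITIONED:
            self._lost_after_partition.add(task_id)
        self._last_state[task_id] = new_state

    def on_status_update(self, task_id, reported_state):
        # A healthy report from a task already declared LOST after a
        # partition is the event the issue asks to count.
        if reported_state == RUNNING and task_id in self._lost_after_partition:
            self._lost_after_partition.discard(task_id)  # count each task once
            self.stats["partitioned_tasks_reentered_healthy"] += 1
            return True
        return False
```

   In this sketch each task is counted at most once, and a task that became LOST without first being PARTITIONED is ignored, so the counter only reflects replacements that a longer timeout might have avoided.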

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
