rmatharu opened a new pull request #1347: SAMZA-2511 : Adding logic to handle 
container stop fail 
URL: https://github.com/apache/samza/pull/1347
 
 
   Problem: The standby container manager does not handle container-stop
failures.
   These events can occur when a certificate or authentication issue arises
while a container stop is being executed. The standby-container failover flow
relies on the container stop succeeding; when the stop fails, the failover
never completes. As a result, the active container for which the failover was
initiated is never started again.
   A container-placement action that runs into a container-stop failure is
declared failed.
   Cause: See the problem description above.
   Fix: The fix is for the standby container manager to intercept and handle
these events, continuing the failover by selecting another standby container
(if one is present, i.e., rf > 2), using a standby host, or falling back to
any host.
   API changes: None
   Upgrade Instructions: None
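
   The fallback order described under "Fix" can be illustrated with a minimal
sketch. This is not the code in the pull request: the class, enum, method, and
parameter names below are all hypothetical, and the sketch assumes the manager
knows the IDs of the standby containers and the list of standby hosts at the
time the stop failure is reported.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; none of these types are Samza classes.
class StopFailFallbackSketch {

    // The three fallback options named in the PR description, in order.
    enum Kind { ANOTHER_STANDBY_CONTAINER, STANDBY_HOST, ANY_HOST }

    // Result of the decision: which option was chosen and, if applicable,
    // the standby container ID or host name it refers to.
    static final class Decision {
        final Kind kind;
        final Optional<String> target;

        Decision(Kind kind, Optional<String> target) {
            this.kind = kind;
            this.target = target;
        }
    }

    // Decide how to continue a failover after stopping a standby failed.
    // failedStandbyId: the standby container whose stop failed.
    // standbyContainerIds: all standby containers of the active container.
    // standbyHosts: hosts known to have run a standby (assumed input).
    static Decision onStandbyStopFailed(String failedStandbyId,
                                        List<String> standbyContainerIds,
                                        List<String> standbyHosts) {
        // 1. Prefer another standby container, skipping the failed one.
        for (String id : standbyContainerIds) {
            if (!id.equals(failedStandbyId)) {
                return new Decision(Kind.ANOTHER_STANDBY_CONTAINER, Optional.of(id));
            }
        }
        // 2. Otherwise try a standby host, if any is known.
        if (!standbyHosts.isEmpty()) {
            return new Decision(Kind.STANDBY_HOST, Optional.of(standbyHosts.get(0)));
        }
        // 3. Otherwise fall back to any host.
        return new Decision(Kind.ANY_HOST, Optional.empty());
    }
}
```

   The point of the ordering is that the failover always ends in some
placement request, so the active container is not left unstarted when a stop
fails.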

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
