kishorvpatil opened a new pull request #3297:
URL: https://github.com/apache/storm/pull/3297


   ## What is the purpose of the change
   
   Currently, the Storm version of the topology determines whether RPC 
heartbeats are used. On large clusters with beefier machines, each supervisor 
can host hundreds of workers, so multiple supervisor daemons going down at the 
same time can put a lot of load on Nimbus.
   Current issues:
   * With 2.x topologies, RPC heartbeats are used regardless of Pacemaker 
availability.
   * The call _sendSupervisorWorkerHeartbeat_ only checks that the supervisor 
is up.
   * The worker should kill itself if its assignment has changed (regression).
   * Supervisor timer threads are not named.
   * Nimbus should check whether Pacemaker is in use before expecting 
heartbeat RPC calls.
   
   With this change, if Pacemaker is used, the behavior is:
   
   1. The worker does not call the supervisor.
   2. The worker sends heartbeats to Pacemaker periodically.
   3. The supervisor does not send worker heartbeats to Nimbus.
   4. Nimbus checks whether heartbeats should be expected from RPC calls.
   5. If the supervisor is down, the worker kills itself on reassignment, so 
the worker does not hang around without checking for reassignments.
   6. The worker restarts itself if its assignments have changed. Typically the 
supervisor notices the change in assignment and restarts the worker, but if 
the supervisor is down, this is a good backup.
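
The decisions listed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the class `HeartbeatRoutingSketch`, the enum `HeartbeatTarget`, and every method name are hypothetical, and the non-Pacemaker branch is simplified to "use RPC" even though the actual choice also depends on the topology's Storm version.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch only: none of these names come from the Storm code base.
class HeartbeatRoutingSketch {

    // Where a worker reports its heartbeat.
    enum HeartbeatTarget { PACEMAKER, SUPERVISOR_RPC }

    // Items 1 and 2: with Pacemaker the worker skips the supervisor call
    // and heartbeats to Pacemaker instead.
    // Assumption: without Pacemaker the RPC path is used.
    static HeartbeatTarget workerHeartbeatTarget(boolean pacemakerInUse) {
        return pacemakerInUse ? HeartbeatTarget.PACEMAKER : HeartbeatTarget.SUPERVISOR_RPC;
    }

    // Item 3: the supervisor forwards worker heartbeats to Nimbus
    // only when Pacemaker is not in use.
    static boolean supervisorForwardsHeartbeats(boolean pacemakerInUse) {
        return !pacemakerInUse;
    }

    // Item 4: Nimbus expects heartbeat RPC calls only when Pacemaker is not in use.
    static boolean nimbusExpectsRpcHeartbeats(boolean pacemakerInUse) {
        return !pacemakerInUse;
    }

    // Items 5 and 6: the worker compares the assignment it was started with
    // against the latest one and exits when they differ, so a change is
    // handled even if the supervisor is down.
    // Assumption: an assignment is modeled here as a list of executor ids.
    static boolean workerShouldExit(List<Integer> runningAssignment,
                                    List<Integer> latestAssignment) {
        return !Objects.equals(runningAssignment, latestAssignment);
    }

    public static void main(String[] args) {
        boolean pacemakerInUse = true;
        System.out.println("worker heartbeats to: " + workerHeartbeatTarget(pacemakerInUse));
        System.out.println("supervisor forwards to nimbus: "
                + supervisorForwardsHeartbeats(pacemakerInUse));
        System.out.println("nimbus expects RPC heartbeats: "
                + nimbusExpectsRpcHeartbeats(pacemakerInUse));
        System.out.println("worker exits after rebalance: "
                + workerShouldExit(List.of(1, 2), List.of(3, 4)));
    }
}
```

Keeping the assignment check inside the worker means a rebalance still takes effect while the supervisor is stopped, which is the scenario exercised in the testing steps.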
   
   
   ## How was the change tested
   
   Set up a cluster with Pacemaker and validate that:
   1. The worker sends heartbeats to Pacemaker instead of calling 
_sendSupervisorWorkerHeartbeat_.
   2. After stopping the supervisor and rebalancing the topology, the worker 
dies because its assignments have changed (worker.log contains a message about 
the change in assignment).
   3. The supervisor does not send executor heartbeats to Nimbus.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

