+1
On Fri, Dec 18, 2015 at 3:46 PM, Jianxia Chen <[email protected]> wrote:
> Hi all,
>
> I am trying to implement the stats for the Geode membership service health
> monitor, which monitors the health of the members of the distributed system
> by heartbeats. I will describe the stats that will be implemented. Please
> take a look and let me know what you think.
>
> Assuming you have basic knowledge of Geode, here is a very brief description
> of how the health monitor works. Every member exchanges heartbeat messages
> with its neighbors to make sure that its neighbor is alive. If, for some
> reason, a member doesn't receive a heartbeat from its neighbor, the member
> will send suspect member messages to the coordinator reporting the issue.
> Upon receiving the suspect member message, the coordinator will perform a
> final check with the suspect member by exchanging final check messages
> (similar to heartbeat) with the suspect member. Depending on the result of
> the final check, the coordinator can decide whether to keep or remove the
> suspect member from membership. For details of the health monitor, please
> refer to GEODE-77 and/or GMSHealthMonitor.java.
>
> The proposed stats for the health monitor are:
>
> 1) The number of heartbeat requests a member has sent
> 2) The number of heartbeat requests a member has received
> 3) The number of heartbeat (responses) a member has sent
> 4) The number of heartbeat (responses) a member has received
> 5) The number of suspect member messages a member has sent
> 6) The number of suspect member messages a member has received
> 7) The number of final check requests a member has sent
> 8) The number of final check requests a member has received
> 9) The number of final check responses a member has sent
> 10) The number of final check responses a member has received
>
> Note that there are two different types of final checks (TCP based and UDP
> based), so there are additional stats for these two types of final checks:
>
> 11) The number of TCP final check requests a member has sent
> 12) The number of TCP final check requests a member has received
> 13) The number of TCP final check responses a member has sent
> 14) The number of TCP final check responses a member has received
> 15) The number of UDP final check requests a member has sent
> 16) The number of UDP final check requests a member has received
> 17) The number of UDP final check responses a member has sent
> 18) The number of UDP final check responses a member has received
>
> Thanks,
> Jianxia
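
For reference, here is a rough sketch of how the 18 proposed counters could be held. It is only an illustration under stated assumptions: it uses plain java.util.concurrent counters rather than Geode's own statistics API, and every name in it (HealthMonitorStatsSketch, the Stat constants, inc, get) is hypothetical and not taken from GMSHealthMonitor.java. It also assumes that the generic final check stats (7-10) are the sum of the TCP and UDP ones (11-18), which the proposal does not state explicitly.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical holder for the proposed health monitor counters.
final class HealthMonitorStatsSketch {

  // One constant per proposed stat, in the order of the numbered list.
  enum Stat {
    HEARTBEAT_REQUESTS_SENT, HEARTBEAT_REQUESTS_RECEIVED,
    HEARTBEATS_SENT, HEARTBEATS_RECEIVED,
    SUSPECT_MESSAGES_SENT, SUSPECT_MESSAGES_RECEIVED,
    FINAL_CHECK_REQUESTS_SENT, FINAL_CHECK_REQUESTS_RECEIVED,
    FINAL_CHECK_RESPONSES_SENT, FINAL_CHECK_RESPONSES_RECEIVED,
    TCP_FINAL_CHECK_REQUESTS_SENT, TCP_FINAL_CHECK_REQUESTS_RECEIVED,
    TCP_FINAL_CHECK_RESPONSES_SENT, TCP_FINAL_CHECK_RESPONSES_RECEIVED,
    UDP_FINAL_CHECK_REQUESTS_SENT, UDP_FINAL_CHECK_REQUESTS_RECEIVED,
    UDP_FINAL_CHECK_RESPONSES_SENT, UDP_FINAL_CHECK_RESPONSES_RECEIVED
  }

  // The map is fully populated in the constructor and never modified
  // afterwards, so concurrent lookups are safe; LongAdder handles the
  // concurrent increments.
  private final Map<Stat, LongAdder> counters = new EnumMap<>(Stat.class);

  HealthMonitorStatsSketch() {
    for (Stat s : Stat.values()) {
      counters.put(s, new LongAdder());
    }
  }

  // Assumption: a TCP or UDP final check also counts toward the
  // matching generic final check stat. Returns null for other stats.
  private static Stat aggregateOf(Stat s) {
    switch (s) {
      case TCP_FINAL_CHECK_REQUESTS_SENT:
      case UDP_FINAL_CHECK_REQUESTS_SENT:
        return Stat.FINAL_CHECK_REQUESTS_SENT;
      case TCP_FINAL_CHECK_REQUESTS_RECEIVED:
      case UDP_FINAL_CHECK_REQUESTS_RECEIVED:
        return Stat.FINAL_CHECK_REQUESTS_RECEIVED;
      case TCP_FINAL_CHECK_RESPONSES_SENT:
      case UDP_FINAL_CHECK_RESPONSES_SENT:
        return Stat.FINAL_CHECK_RESPONSES_SENT;
      case TCP_FINAL_CHECK_RESPONSES_RECEIVED:
      case UDP_FINAL_CHECK_RESPONSES_RECEIVED:
        return Stat.FINAL_CHECK_RESPONSES_RECEIVED;
      default:
        return null;
    }
  }

  // Increment a stat and, for TCP/UDP final checks, its aggregate.
  void inc(Stat stat) {
    counters.get(stat).increment();
    Stat aggregate = aggregateOf(stat);
    if (aggregate != null) {
      counters.get(aggregate).increment();
    }
  }

  // Current value of a stat.
  long get(Stat stat) {
    return counters.get(stat).sum();
  }
}
```

With something like this, the code that sends a TCP final check request would call inc(Stat.TCP_FINAL_CHECK_REQUESTS_SENT) and the generic count would follow automatically. If the generic and per-transport stats are meant to be tracked independently, the aggregateOf step would simply be dropped.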
