Hi all,

I am trying to implement the stats for the Geode membership service health
monitor, which monitors the health of the members of the distributed system
via heartbeats. The stats I plan to implement are described below. Please
take a look and let me know what you think.

Assuming you have basic knowledge of Geode, here is a very brief description
of how the health monitor works. Every member exchanges heartbeat messages
with its neighbors to make sure that each neighbor is alive. If, for some
reason, a member doesn't receive a heartbeat from a neighbor, it sends a
suspect member message to the coordinator to report the issue. Upon
receiving the suspect member message, the coordinator performs a final
check on the suspect member by exchanging final check messages (similar to
heartbeats) with it. Depending on the result of the final check, the
coordinator decides whether to keep the suspect member in the membership
or remove it. For details of the health monitor, please refer to GEODE-77
and/or GMSHealthMonitor.java.
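
To make the member-side part of the flow above concrete, here is a minimal
sketch of the "missed heartbeat, so suspect the neighbor" decision. It is
not taken from GMSHealthMonitor.java: the class name HeartbeatTracker, its
methods, the timeout parameter, and the rule that a neighbor which was
never heard from is not suspected are all assumptions made for
illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; not the actual GMSHealthMonitor logic.
class HeartbeatTracker {
    // Assumed parameter: how long a neighbor may stay silent before it is suspected.
    private final long timeoutMillis;
    // Last time (in ms) a heartbeat was received from each neighbor.
    private final Map<String, Long> lastHeard = new HashMap<>();

    HeartbeatTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever a heartbeat arrives from a neighbor.
    void recordHeartbeat(String neighbor, long nowMillis) {
        lastHeard.put(neighbor, nowMillis);
    }

    // True if the neighbor has been heard from before and has now been
    // silent for longer than the timeout. In the flow described above,
    // this is the point where a suspect member message would be sent
    // to the coordinator.
    boolean shouldSuspect(String neighbor, long nowMillis) {
        Long last = lastHeard.get(neighbor);
        return last != null && nowMillis - last > timeoutMillis;
    }
}
```

The coordinator-side final check is left out here, since its outcome
depends on the TCP or UDP exchange with the suspect member.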

The proposed stats for health monitor are:

1) The number of heartbeat requests a member has sent
2) The number of heartbeat requests a member has received
3) The number of heartbeat (responses) a member has sent
4) The number of heartbeat (responses) a member has received
5) The number of suspect member messages a member has sent
6) The number of suspect member messages a member has received
7) The number of final check requests a member has sent
8) The number of final check requests a member has received
9) The number of final check responses a member has sent
10) The number of final check responses a member has received

Note that there are two different types of final checks (TCP based and UDP
based), so there are additional stats for each of these two types:

11) The number of TCP final check requests a member has sent
12) The number of TCP final check requests a member has received
13) The number of TCP final check responses a member has sent
14) The number of TCP final check responses a member has received
15) The number of UDP final check requests a member has sent
16) The number of UDP final check requests a member has received
17) The number of UDP final check responses a member has sent
18) The number of UDP final check responses a member has received
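
As a rough illustration of how the 18 counters above could be held in one
place, here is a minimal sketch. It deliberately does not use Geode's
statistics framework: the class name HealthMonitorStatsSketch, the enum
constant names, and the increment/get methods are all hypothetical, and
plain java.util.concurrent counters stand in for whatever the real
implementation would use.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch only; names are illustrative, not Geode APIs.
class HealthMonitorStatsSketch {
    // One constant per proposed stat, in the same order as the list (1-18).
    enum Stat {
        HEARTBEAT_REQUESTS_SENT, HEARTBEAT_REQUESTS_RECEIVED,
        HEARTBEATS_SENT, HEARTBEATS_RECEIVED,
        SUSPECTS_SENT, SUSPECTS_RECEIVED,
        FINAL_CHECK_REQUESTS_SENT, FINAL_CHECK_REQUESTS_RECEIVED,
        FINAL_CHECK_RESPONSES_SENT, FINAL_CHECK_RESPONSES_RECEIVED,
        TCP_FINAL_CHECK_REQUESTS_SENT, TCP_FINAL_CHECK_REQUESTS_RECEIVED,
        TCP_FINAL_CHECK_RESPONSES_SENT, TCP_FINAL_CHECK_RESPONSES_RECEIVED,
        UDP_FINAL_CHECK_REQUESTS_SENT, UDP_FINAL_CHECK_REQUESTS_RECEIVED,
        UDP_FINAL_CHECK_RESPONSES_SENT, UDP_FINAL_CHECK_RESPONSES_RECEIVED
    }

    // The map is fully populated in the constructor and never modified
    // afterwards; only the LongAdder values change, so concurrent
    // increments from several threads are safe.
    private final Map<Stat, LongAdder> counters = new EnumMap<>(Stat.class);

    HealthMonitorStatsSketch() {
        for (Stat s : Stat.values()) {
            counters.put(s, new LongAdder());
        }
    }

    // Adds one to the given counter.
    void increment(Stat stat) {
        counters.get(stat).increment();
    }

    // Returns the current value of the given counter.
    long get(Stat stat) {
        return counters.get(stat).sum();
    }
}
```

In this sketch every counter is independent. Whether stats 7-10 should be
the totals of the corresponding TCP and UDP stats is not decided here, so
the sketch does not roll the TCP/UDP counters up into them.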

Thanks,
Jianxia
