The corosync_cfg_get_node_addrs() call does not fill the whole of the
addrs field passed in. Specifically, it only writes the address
family and IP address, leaving the port number untouched.
If the port number contains junk, then that can get passed into the
kernel by dlm_controld where it is subsequently used in the comparison
that checks for valid cluster nodes in a connection. If this happens
then an otherwise valid connection can be rejected and the dlm will hang.
I've seen this quite often on s390 but I don't see any reason why it
might not also be causing intermittent connection problems on other archs.
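
As a rough illustration of the failure mode (this is not dlm_controld or
kernel code), the sketch below uses a hypothetical fill_family_and_ip()
that, like the call described above, writes only the family and IP, and a
hypothetical same_node() that also compares the port. The 0xA5 fill simply
simulates stale junk in the buffer; all names and the address are made up
for the example.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical stand-in for a lookup that writes only the address
 * family and IP address and never touches the port field. */
static void fill_family_and_ip(struct sockaddr_in *sin, const char *ip)
{
	int ok;

	sin->sin_family = AF_INET;
	ok = inet_pton(AF_INET, ip, &sin->sin_addr);
	assert(ok == 1);
	(void)ok;
}

/* Hypothetical node check that compares family, IP and port. */
static int same_node(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
	return a->sin_family == b->sin_family &&
	       a->sin_addr.s_addr == b->sin_addr.s_addr &&
	       a->sin_port == b->sin_port;
}

/* Returns 1 if a freshly looked-up address matches a known node whose
 * port is 0. zero_first selects whether the buffer is cleared before
 * the lookup, which is the fix proposed in this patch. */
static int lookup_matches(int zero_first)
{
	struct sockaddr_in known, looked_up;

	memset(&known, 0, sizeof(known));
	fill_family_and_ip(&known, "192.0.2.1");

	/* Simulate stale junk left in the caller's buffer. */
	memset(&looked_up, 0xA5, sizeof(looked_up));
	if (zero_first)
		memset(&looked_up, 0, sizeof(looked_up));
	fill_family_and_ip(&looked_up, "192.0.2.1");

	return same_node(&known, &looked_up);
}
```

Without the memset the leftover port bytes make the comparison fail even
though the family and IP are identical; with it, the two addresses match.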

Signed-off-by: Christine Caulfield <ccaul...@redhat.com>
diff --git a/dlm_controld/member.c b/dlm_controld/member.c
index d4031ee..10351ec 100644
--- a/dlm_controld/member.c
+++ b/dlm_controld/member.c
@@ -132,6 +132,7 @@ static void quorum_callback(quorum_handle_t h, uint32_t quorate,
 	quorum_node_count = 0;
 	memset(&quorum_nodes, 0, sizeof(quorum_nodes));
+	memset(&addrs, 0, sizeof(addrs));
 	for (i = 0; i < node_list_entries; i++)
 		quorum_nodes[quorum_node_count++] = node_list[i];