Hi
We have Pacemaker 1.0.6 running on Corosync 1.2.1 on RHEL x86_64.
While building the code there were errors related to pointer types
(GPOINTER_TO_INT in pacemaker/lib/common/remote.c:295). I changed the include
references from /usr/lib/glib-2.0/include to /usr/lib64/glib-2.0/include to get
rid of the compilation errors.
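For anyone hitting the same build error, here is a minimal sketch of why the include directory matters. It assumes the usual arrangement where the arch-specific glibconfig.h casts the pointer straight to an int on 32-bit builds and goes through a long first on 64-bit builds. The macro and function names below are local stand-ins invented for illustration, not the real GLib definitions.

```c
#include <assert.h>

/* Local stand-ins, NOT the real GLib macros.
 * 32-bit style: cast the pointer straight to int. On x86_64 this is a
 * "cast from pointer to integer of different size" diagnostic. */
#define PTR_TO_INT_32BIT_STYLE(p) ((int)(p))
/* 64-bit style: go through long (pointer-sized on Linux) first. */
#define PTR_TO_INT_64BIT_STYLE(p) ((int)(long)(p))

/* Hypothetical helper: stash a small int in a pointer and read it back,
 * the way callback user-data is often passed around. */
int roundtrip_small_int(int value)
{
    void *p;

    /* Holds on both 32-bit and 64-bit Linux; the casts rely on it. */
    assert(sizeof(long) == sizeof(void *));

    p = (void *)(long)value;
    return PTR_TO_INT_64BIT_STYLE(p);
}
```

If the 32-bit header directory is picked up on a 64-bit build, the first style of cast is what the compiler sees, which would be consistent with the pointer-type errors described above.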
After starting Corosync, crmd fails and the local node is always shown as
offline in a two-node cluster. The following errors are logged repeatedly in
the /var/log/messages file:
crmd: [3180]: info: do_cib_control: CIB connection established
...
crmd: [3180]: WARN:lrm_signon: can not initiate connection
crmd: [3180]: WARN: do_lrm_control: Failed to sign on to the LRM 3 (30 max) times
...
crmd is restarted after 30 tries.
While debugging crmd I found that the connect() call returns -1 when
connecting to the socket file /usr/var/run/heartbeat/lrm_cmd_sock:
File: ./lib/clplumbing/ipcsocket.c <Reusable-Cluster-Components-6c8645d6a4c2, Cluster Glue>
Line number: 962
connect(<fd>,
        {sun_family = 1, sun_path = "/usr/var/run/heartbeat/lrm_cmd_sock", '\0' <repeats 72 times>}
)
This call returns -1.
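The -1 by itself does not say why the connection failed; the reason is in errno. Below is a minimal sketch of a standalone check that repeats the same kind of connect() and hands back the errno value. The helper name try_unix_connect is hypothetical (it is not part of Cluster Glue), and the use of SOCK_STREAM is an assumption about how the socket is opened.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper, not part of Cluster Glue: try to connect to a
 * Unix domain stream socket. Returns 0 on success, otherwise the errno
 * value reported by socket() or connect(). */
int try_unix_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd;
    int err = 0;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path))
        return ENAMETOOLONG;

    /* Assumption: a stream socket, as is common for local IPC. */
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;      /* AF_UNIX is 1 on Linux */
    strcpy(addr.sun_path, path);    /* length checked above */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;                /* save before close() can change it */

    close(fd);
    return err;
}
```

Calling try_unix_connect("/usr/var/run/heartbeat/lrm_cmd_sock") from a small main() and printing strerror() of the result should narrow things down. For example, ECONNREFUSED would mean the socket file exists but no process is listening on it, ENOENT would mean the path does not exist, and EACCES would point to permissions.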
Further information:
# ls -l /usr/var/run/heartbeat/lrm_cmd_sock
srwxrwxrwx 1 root root 0 Apr 9 19:49 /usr/var/run/heartbeat/lrm_cmd_sock
# cat /etc/passwd | grep hacluster
hacluster:x:501:501::/home/hacluster:/bin/bash
[r...@ibhost common]# cat /etc/group | grep ha
haldaemon:x:68:
hacluster:x:501:
haclient:x:502:hacluster
I am trying to find out why the local node is shown as offline by the
"crm status" command. Any help would be appreciated.
Thanks,
chajo
_______________________________________________
Pacemaker mailing list
http://oss.clusterlabs.org/mailman/listinfo/pacemaker