Hi,
We have a simple 2-node cluster running CMAN and Pacemaker on CentOS 6.
The problem is that on boot each machine (even when "alone", i.e. with
the second machine off) reports a cman timeout saying
"Timed-out waiting for cluster".
*If I start the services manually an hour or so later, everything works
fine.* But as soon as I reboot one machine, that machine will typically
time out again on startup.
The issue does not seem to be related to firewalling (I retried the
failed start with the firewalls open: no change), nor to multicast
communication, as the cluster is set up to use unicast over a bonded
interface that serves no other purpose.
On the machine that times out, I do not see ANY communication attempts
with the other node (running tcpdump on both machines), so I suppose it
is waiting for something local to happen.
Also, I do not see ANY log entries related to the failed start; the
logfiles only get updated once the cluster has started successfully.
What might be the cause here? Could that be a fencing problem (we have
IPMI fencing configured which works)? Here is the cluster.conf:
<cluster config_version="13" name="fw-cluster">
  <logging logfile="/var/log/cluster/cluster.log"
           logfile_priority="info" syslog_priority="crit" to_logfile="yes"
           to_syslog="yes"/>
  <dlm protocol="sctp"/>
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"
                skip_undefined="1"/>
  <clusternodes>
    <clusternode name="gw1" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="gw1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="gw2" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="gw2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" transport="udpu" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
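
In case it helps anyone reading along, the settings from the
configuration above that I mention in this post (transport, two_node,
expected_votes, post_join_delay, node names) can be pulled out with a
short Python sketch. This is only a summary helper, not a diagnosis.
The function name summarize_cluster_conf and the trimmed SAMPLE_CONF
string are my own illustrative assumptions; on a real node the text
would normally be read from /etc/cluster/cluster.conf instead.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above, embedded so the sketch is
# self-contained (assumption: only the elements summarized below matter here).
SAMPLE_CONF = """<cluster config_version="13" name="fw-cluster">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"
                skip_undefined="1"/>
  <clusternodes>
    <clusternode name="gw1" nodeid="1"/>
    <clusternode name="gw2" nodeid="2"/>
  </clusternodes>
  <cman expected_votes="1" transport="udpu" two_node="1"/>
</cluster>"""


def summarize_cluster_conf(xml_text):
    """Return selected cluster.conf settings as a dict (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    cman = root.find("cman")
    fenced = root.find("fence_daemon")
    return {
        "cluster": root.get("name"),
        # Node names in document order, wherever they appear in the tree.
        "nodes": [node.get("name") for node in root.iter("clusternode")],
        # Attribute values are kept as strings, exactly as written in the file.
        "transport": cman.get("transport") if cman is not None else None,
        "two_node": cman.get("two_node") if cman is not None else None,
        "expected_votes": cman.get("expected_votes") if cman is not None else None,
        "post_join_delay": fenced.get("post_join_delay") if fenced is not None else None,
    }


if __name__ == "__main__":
    for key, value in summarize_cluster_conf(SAMPLE_CONF).items():
        print(f"{key}: {value}")
```

Missing elements simply come back as None, so the same helper can be
pointed at a different cluster.conf without raising.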