On 2013-03-21 14:31, Patrick Hemmer wrote:
> I've got a 2-node cluster where it seems last night one of the nodes
> went offline, and I can't see any reason why.
>
> Attached are the logs from the 2 nodes (the relevant timeframe seems to
> be 2013-03-21 between 06:05 and 06:10).
> This is on ubuntu 12.04

Looks like your non-redundant cluster communication was interrupted at
around that time, for whatever reason, and your cluster split-brained.
Does the DRBD replication use a different network connection? If so, why
not use it for a redundant ring setup? You should also use STONITH.

I also wonder why you have defined "expected_votes='1'" in your
cluster.conf.

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

>
> # crm status
> ============
> Last updated: Thu Mar 21 13:17:21 2013
> Last change: Thu Mar 14 14:42:18 2013 via crm_shadow on i-a706d8ff
> Stack: cman
> Current DC: i-a706d8ff - partition WITHOUT quorum
> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> 2 Nodes configured, unknown expected votes
> 5 Resources configured.
> ============
>
> Online: [ i-a706d8ff ]
> OFFLINE: [ i-3307d96b ]
>
>  dns-postgresql (ocf::cloud:route53): Started i-a706d8ff
>  Master/Slave Set: ms-drbd-postgresql [drbd-postgresql]
>      Masters: [ i-a706d8ff ]
>      Stopped: [ drbd-postgresql:0 ]
>  fs-drbd-postgresql (ocf::heartbeat:Filesystem): Started i-a706d8ff
>  postgresql (ocf::heartbeat:pgsql): Started i-a706d8ff
>
>
> # cman_tool nodes
> Node       Sts  Inc   Joined               Name
> 181480898  M    4     2013-03-14 14:25:27  i-3307d96b
> 181481642  M    5132  2013-03-21 06:07:40  i-a706d8ff
>
>
> # cman_tool status
> Version: 6.2.0
> Config Version: 1
> Cluster Name: cloudapp-servic
> Cluster Id: 63629
> Cluster Member: Yes
> Cluster Generation: 5132
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 1
> Total votes: 2
> Node votes: 1
> Quorum: 2
> Active subsystems: 4
> Flags:
> Ports Bound: 0
> Node name: i-3307d96b
> Node ID: 181480898
> Multicast addresses: 255.255.255.255
> Node addresses: 10.209.45.194
>
>
>
> # cat /etc/cluster/cluster.conf
> <?xml version="1.0" ?>
> <cluster name='cloudapp-servic' config_version='1'>
>   <logging to_logfile='no' syslog_facility='local2'
> syslog_priority='debug' />
>   <cman expected_votes='1' transport='udpu' />
>   <clusternodes>
>     <clusternode nodeid='181480898' name='i-3307d96b'>
>       <fence>
>         <method name='pcmk-redirect'>
>           <device name='pcmk' port='i-3307d96b' />
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode nodeid='181481642' name='i-a706d8ff'>
>       <fence>
>         <method name='pcmk-redirect'>
>           <device name='pcmk' port='i-a706d8ff' />
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>
>   <fencedevices>
>     <fencedevice name="pcmk" agent="fence_pcmk" />
>   </fencedevices>
> </cluster>
>
>
>
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
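
For illustration, the redundant ring and the expected_votes question could translate into cluster.conf changes roughly like the sketch below. Several things here are assumptions and not taken from the posted setup: the `-alt` host names are hypothetical and would have to resolve to each node's address on the second network (for example the DRBD replication link); `two_node='1'` is the setting cman normally expects alongside `expected_votes='1'` in a two-node cluster; and whether a redundant ring works with `transport='udpu'` depends on the cman/corosync version in use, so that needs checking before relying on it.

```xml
<?xml version="1.0" ?>
<!-- config_version is bumped because the file content changed -->
<cluster name='cloudapp-servic' config_version='2'>
  <logging to_logfile='no' syslog_facility='local2' syslog_priority='debug' />
  <!-- two_node='1' allows a single surviving node to stay quorate;
       it is only meaningful together with expected_votes='1' -->
  <cman two_node='1' expected_votes='1' transport='udpu' />
  <clusternodes>
    <clusternode nodeid='181480898' name='i-3307d96b'>
      <!-- hypothetical name: address of this node on the second network -->
      <altname name='i-3307d96b-alt' />
      <fence>
        <method name='pcmk-redirect'>
          <device name='pcmk' port='i-3307d96b' />
        </method>
      </fence>
    </clusternode>
    <clusternode nodeid='181481642' name='i-a706d8ff'>
      <!-- hypothetical name: address of this node on the second network -->
      <altname name='i-a706d8ff-alt' />
      <fence>
        <method name='pcmk-redirect'>
          <device name='pcmk' port='i-a706d8ff' />
        </method>
      </fence>
    </clusternode>
  </clusternodes>

  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk" />
  </fencedevices>
</cluster>
```

The `fence_pcmk` redirect only hands fencing requests over to Pacemaker, so a working STONITH resource still has to be configured on the Pacemaker side. With `two_node='1'` in particular, both halves of a split keep quorum, which makes fencing a requirement and not an option.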