Hi,

Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On Thu, Feb 07, 2008 at 05:00:14PM +0100, Sebastian Reitenbach wrote:
> > Hi,
> >
> > I have a 4 node cluster, and wanted to set up a quorum server, so that I do
> > not need three running cluster nodes to get quorum. The quorumd IP address
> > is a shared IP on another two node cluster.
> >
> > I've done the following tests, the quorumd from a 2.1.2 version of
> > heartbeat, the cluster nodes had 2.1.3 version:
> >
> > start quorumd
> > start first cluster node -> (node becomes DC, contacting the quorumd) cluster
> > gets quorum
> > start second cluster node -> cluster still has quorum
> > stop DC, -> see other node becoming DC, and contacting quorum server,
> > cluster still has quorum
> > kill quorumd, then see RST packets going back to cluster node (the DC tries
> > to contact the quorumd every second) -> cluster still has quorum
> > wait 5 minutes -> cluster still has quorum
> > try to start/stop a node or resource, add or remove a resource -> this works,
> > then the cluster recognizes the lost quorum
>
> After any of these actions the cluster loses quorum? Or is it
> just after the node restart?

I added a dummy resource at a time when the quorumd was not reachable. The
resource got created. The default target role is stopped, so the Dummy was
stopped. Before I was able to make the Dummy active, the cluster recognized
that it had lost quorum and refused to make the Dummy active.

>
> > then restart the quorumd -> see answers going back from quorumd to DC node,
> > but cluster does not get quorum again
> > wait 5 minutes -> cluster still has no quorum
>
> I can recall that somebody else already complained about the same
> issue.

Most likely me some months ago, fiddling around with 2.1.2 ;)

>
> > restart heartbeat on one of the cluster nodes -> cluster recognizes the
> > availability of quorumd and gets quorum again
> >
> > Setting a node to standby does not make the cluster recognize that the
> > quorum got lost, or is available again.
> >
> > I also have seen, when there is a firewall that drops packets instead of
> > answering with RST when the quorumd is down, that the rate at which the DC
> > tries to reconnect to the quorumd drops to about once a minute, but that is
> > OK, as I'd guess it's waiting for timeouts.
>
> Yes, looks like a TCP/IP property.
>
> > So in my eyes, using a quorumd does more harm than good, but maybe I
> > did something wrong?
>
> Since it has been working, you probably set it up ok. You should
> open a bugzilla for this. Sorry that I can't offer more help on
> the matter now.
>
> BTW, did you also test a split brain situation where one of the
> nodes can talk to the quorumd?

No, I have decided to run the cluster without quorumd for now.
Nevertheless, I'll create a bugzilla entry.

cheers
Sebastian
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems