Hi,
Dejan Muhamedagic <[EMAIL PROTECTED]> wrote: 
> Hi,
> 
> On Thu, Feb 07, 2008 at 05:00:14PM +0100, Sebastian Reitenbach wrote:
> > Hi,
> > 
> > I have a 4 node cluster, and wanted to set up a quorum server, so that I
> > do not need three running cluster nodes to get quorum. The quorumd IP
> > address is a shared IP on another two node cluster.
> > 
> > I ran the following tests. The quorumd was from heartbeat 2.1.2, and the
> > cluster nodes ran 2.1.3:
> > 
> > 
> > 
> > start quorumd
> > start first cluster node -> node becomes DC and contacts the quorumd,
> >   cluster gets quorum
> > start second cluster node -> cluster still has quorum
> > stop DC -> the other node becomes DC and contacts the quorum server,
> >   cluster still has quorum
> > kill quorumd, then see RST packets going back to the cluster node (the DC
> >   tries to contact the quorumd every second) -> cluster still has quorum
> > wait 5 minutes -> cluster still has quorum
> > try to start or stop a node or resource, or add or remove a resource ->
> >   this works, then the cluster recognizes the lost quorum
> 
> After any of these actions the cluster loses quorum? Or is it
> just after the node restart?
I added a dummy resource at a time when the quorumd was not reachable. The
resource got created. The default target role is stopped, so the Dummy was
stopped. Before I was able to make the Dummy active, the cluster recognized
that it had lost quorum and refused to make the Dummy active.

> 
> > then restart the quorumd -> see answers going back from quorumd to the
> >   DC node, but the cluster still has no quorum
> > wait 5 minutes -> cluster still has no quorum
> 
> I can recall that somebody else already complained about the same
> issue.
Most likely that was me some months ago, fiddling around with 2.1.2 ;)

> 
> > restart heartbeat on one of the cluster nodes -> cluster recognizes the
> >   availability of quorumd and gets quorum again
> > 
> > Setting a node to standby does not make the cluster recognize that the
> > quorum got lost or is available again.
> > 
> > I have also seen that when there is a firewall that drops packets
> > instead of answering with RST while the quorumd is down, the rate at
> > which the DC tries to reconnect to the quorumd drops to about once a
> > minute, but that is OK, as I'd guess it's waiting for timeouts.
> 
> Yes, looks like a TCP/IP property.
> 
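The difference between the RST case and the dropped-packet case can be seen with a small probe. This is only a minimal sketch, not anything taken from heartbeat or quorumd: the function name `probe`, the host and port you pass it, and the 3 second timeout are all assumptions chosen for illustration.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a single TCP connection attempt (illustrative helper).

    Returns "open" if the connection was accepted, "refused" if the peer
    answered with RST (nothing is listening), or "timeout" if no answer
    arrived within `timeout` seconds (e.g. a firewall dropping packets).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # An RST came back, so the caller learns immediately and can retry soon.
        return "refused"
    except socket.timeout:
        # No reply at all; the caller only learns after the timeout expires.
        return "timeout"
```

A "refused" result comes back almost at once, which fits a retry every second. A "timeout" result only comes back after the wait has expired, which would fit the much lower reconnect rate seen behind a firewall that drops packets.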
> > So in my eyes, using a quorumd does more harm than good, but maybe I
> > did something wrong?
> 
> Since it has been working, you probably set it up ok. You should
> open a bugzilla for this. Sorry that I can't offer more help on
> the matter now.
> 
> BTW, did you also test a split brain situation where one of the
> nodes can talk to the quorumd?
No, I have decided to run the cluster without quorumd for now.
Nevertheless, I'll create a bugzilla entry.

cheers
Sebastian

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
