Ok Javier, so now I know you don't want the fencing, and the reason :-)
Set <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="-1"/> and use the fence_manual agent.

2012/6/20 Javier Vela <[email protected]>:
> I don't use fencing because with HA-LVM I thought that I didn't need it.
> But also because both nodes are VMs in VMware. I know that there is a
> module to do fencing with VMware, but I prefer to avoid it. I'm not in
> control of the VMware infrastructure, and the VMware admins probably
> won't give me the tools to use this module.
>
> Regards, Javi
>
>> Fencing is critical, and running a cluster without fencing, even with
>> qdisk, is not supported. Manual fencing is also not supported. The
>> *only* way to have a reliable cluster, testing or production, is to use
>> fencing.
>>
>> Why do you not wish to use it?
>>
>> On 06/20/2012 09:43 AM, Javier Vela wrote:
>>> As I read, if you use HA-LVM you don't need fencing because of VG
>>> tagging. Is it absolutely mandatory to use fencing with qdisk?
>>>
>>> If it is, I suppose I can use fence_manual, but in production I also
>>> won't use fencing.
>>>
>>> Regards, Javi.
>>>
>>> Date: Wed, 20 Jun 2012 14:45:28 +0200
>>> From: [email protected]
>>> To: [email protected]
>>> Subject: Re: [Linux-cluster] Node can't join already quorated cluster
>>>
>>> If you don't want to use a real fence device, because you are only
>>> running some tests, you have to use the fence_manual agent.
>>>
>>> 2012/6/20 Javier Vela <[email protected]>:
>>>
>>> Hi, I have a very strange problem, and after searching through lots
>>> of forums, I haven't found the solution. This is the scenario: a
>>> two-node cluster with Red Hat 5.7, HA-LVM, no fencing, and a quorum
>>> disk. I start qdiskd, cman and rgmanager on one node.
>>> After about 5 minutes the fencing step finally finishes and the
>>> cluster becomes quorate with 2 votes:
>>>
>>> [root@node2 ~]# clustat
>>> Cluster Status for test_cluster @ Wed Jun 20 05:56:39 2012
>>> Member Status: Quorate
>>>
>>>  Member Name                    ID   Status
>>>  ------ ----                    ---- ------
>>>  node1-hb                        1   Offline
>>>  node2-hb                        2   Online, Local, rgmanager
>>>  /dev/mapper/vg_qdisk-lv_qdisk   0   Online, Quorum Disk
>>>
>>>  Service Name       Owner (Last)   State
>>>  ------- ----       ----- ------   -----
>>>  service:postgres   node2          started
>>>
>>> Now I start the second node. When cman reaches the fencing step, it
>>> hangs for approximately 5 minutes and finally fails. clustat says:
>>>
>>> [root@node1 ~]# clustat
>>> Cluster Status for test_cluster @ Wed Jun 20 06:01:12 2012
>>> Member Status: Inquorate
>>>
>>>  Member Name                    ID   Status
>>>  ------ ----                    ---- ------
>>>  node1-hb                        1   Online, Local
>>>  node2-hb                        2   Offline
>>>  /dev/mapper/vg_qdisk-lv_qdisk   0   Offline
>>>
>>> And in /var/log/messages I can see these errors:
>>>
>>> Jun 20 06:02:12 node1 openais[6098]: [TOTEM] entering OPERATIONAL state.
>>> Jun 20 06:02:12 node1 openais[6098]: [CLM  ] got nodejoin message 15.15.2.10
>>> Jun 20 06:02:13 node1 dlm_controld[5386]: connect to ccs error -111, check ccsd or cluster status
>>> Jun 20 06:02:13 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>>> Jun 20 06:02:13 node1 ccsd[6090]: Error while processing connect: Connection refused
>>> Jun 20 06:02:13 node1 ccsd[6090]: Initial status:: Inquorate
>>> Jun 20 06:02:13 node1 gfs_controld[5392]: connect to ccs error -111, check ccsd or cluster status
>>> Jun 20 06:02:13 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>>> Jun 20 06:02:13 node1 ccsd[6090]: Error while processing connect: Connection refused
>>> Jun 20 06:02:14 node1 openais[6098]: [TOTEM] entering GATHER state from 9.
>>> Jun 20 06:02:14 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>>> Jun 20 06:02:14 node1 ccsd[6090]: Error while processing connect: Connection refused
>>> Jun 20 06:02:14 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>>> Jun 20 06:02:14 node1 ccsd[6090]: Error while processing connect: Connection refused
>>> Jun 20 06:02:15 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>>> Jun 20 06:02:15 node1 ccsd[6090]: Error while processing connect: Connection refused
>>> Jun 20 06:02:15 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>>> Jun 20 06:02:15 node1 ccsd[6090]: Error while processing connect: Connection refused
>>> Jun 20 06:02:15 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>>> Jun 20 06:02:15 node1 ccsd[6090]: Error while processing connect: Connection refused
>>> Jun 20 06:02:16 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>>> Jun 20 06:02:16 node1 ccsd[6090]: Error while processing connect: Connection refused
>>> Jun 20 06:02:16 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>>> Jun 20 06:02:16 node1 ccsd[6090]: Error while processing connect: Connection refused
>>> Jun 20 06:02:17 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>>> Jun 20 06:02:17 node1 ccsd[6090]: Error while processing connect: Connection refused
>>> Jun 20 06:02:17 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>>> Jun 20 06:02:17 node1 ccsd[6090]: Error while processing connect: Connection refused
>>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] entering GATHER state from 0.
>>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] Creating commit token because I am the rep.
>>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] Storing new sequence id for ring 15c
>>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] entering COMMIT state.
>>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] entering RECOVERY state.
>>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] position [0] member 15.15.2.10:
>>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] previous ring seq 344 rep 15.15.2.10
>>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] aru e high delivered e received flag 1
>>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] Did not need to originate any messages in recovery.
>>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] Sending initial ORF token
>>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] entering OPERATIONAL state.
>>> Jun 20 06:02:18 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>>> Jun 20 06:02:18 node1 ccsd[6090]: Error while processing connect: Connection refused
>>> Jun 20 06:02:18 node1 openais[6098]: [TOTEM] entering GATHER state from 9.
>>> Jun 20 06:02:18 node1 ccsd[6090]: Cluster is not quorate. Refusing connection.
>>>
>>> And the quorum disk:
>>>
>>> [root@node2 ~]# mkqdisk -L -d
>>> mkqdisk v0.6.0
>>> /dev/mapper/vg_qdisk-lv_qdisk:
>>> /dev/vg_qdisk/lv_qdisk:
>>>         Magic:                eb7a62c2
>>>         Label:                cluster_qdisk
>>>         Created:              Thu Jun  7 09:23:34 2012
>>>         Host:                 node1
>>>         Kernel Sector Size:   512
>>>         Recorded Sector Size: 512
>>>
>>> Status block for node 1
>>>         Last updated by node 2
>>>         Last updated on Wed Jun 20 06:17:23 2012
>>>         State: Evicted
>>>         Flags: 0000
>>>         Score: 0/0
>>>         Average Cycle speed: 0.000500 seconds
>>>         Last Cycle speed: 0.000000 seconds
>>>         Incarnation: 4fe1a06c4fe1a06c
>>> Status block for node 2
>>>         Last updated by node 2
>>>         Last updated on Wed Jun 20 07:09:38 2012
>>>         State: Master
>>>         Flags: 0000
>>>         Score: 0/0
>>>         Average Cycle speed: 0.001000 seconds
>>>         Last Cycle speed: 0.000000 seconds
>>>         Incarnation: 4fe1a06c4fe1a06c
>>>
>>> On the other node I don't see any errors in /var/log/messages.
>>> One strange thing: if I start cman on both nodes at the same time,
>>> everything works fine and both nodes become quorate (until I reboot
>>> one node and the problem appears again). I've checked that multicast
>>> is working properly: with iperf I can send and receive multicast
>>> packets, and with tcpdump I've seen the packets that openais sends
>>> while cman is trying to start. I've read about a bug in RH 5.3 with
>>> the same behaviour, but it was solved in RH 5.4.
>>>
>>> I don't have SELinux enabled, and iptables is also disabled. Here is
>>> the cluster.conf, simplified (with fewer services and resources). I
>>> want to point out one thing: I have allow_kill="0" in order to avoid
>>> fencing errors when qdiskd tries to fence a failed node. As <fence/>
>>> is empty, before adding this setting I got a lot of messages in
>>> /var/log/messages about failed fencing.
>>>
>>> <?xml version="1.0"?>
>>> <cluster alias="test_cluster" config_version="15" name="test_cluster">
>>>         <fence_daemon clean_start="0" post_fail_delay="0"
>>>                       post_join_delay="-1"/>
>>>         <clusternodes>
>>>                 <clusternode name="node1-hb" nodeid="1" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>                 <clusternode name="node2-hb" nodeid="2" votes="1">
>>>                         <fence/>
>>>                 </clusternode>
>>>         </clusternodes>
>>>         <cman two_node="0" expected_votes="3"/>
>>>         <fencedevices/>
>>>         <rm log_facility="local4" log_level="7">
>>>                 <failoverdomains>
>>>                         <failoverdomain name="test_cluster_fo"
>>>                                         nofailback="1" ordered="1" restricted="1">
>>>                                 <failoverdomainnode name="node1-hb" priority="1"/>
>>>                                 <failoverdomainnode name="node2-hb" priority="2"/>
>>>                         </failoverdomain>
>>>                 </failoverdomains>
>>>                 <resources/>
>>>                 <service autostart="1" domain="test_cluster_fo"
>>>                          exclusive="0" name="postgres" recovery="relocate">
>>>                         <ip address="172.24.119.44" monitor_link="1"/>
>>>                         <lvm name="vg_postgres" vg_name="vg_postgres"
>>>                              lv_name="postgres"/>
>>>                         <fs device="/dev/vg_postgres/postgres"
>>>                             force_fsck="1" force_unmount="1" fstype="ext3"
>>>                             mountpoint="/var/lib/pgsql" name="postgres"
>>>                             self_fence="0"/>
>>>                         <script file="/etc/init.d/postgresql"
>>>                                 name="postgres"/>
>>>                 </service>
>>>         </rm>
>>>         <totem consensus="4000" join="60" token="20000"
>>>                token_retransmits_before_loss_const="20"/>
>>>         <quorumd allow_kill="0" interval="1" label="cluster_qdisk"
>>>                  tko="10" votes="1">
>>>                 <heuristic program="/usr/share/cluster/check_eth_link.sh eth0"
>>>                            score="1" interval="2" tko="3"/>
>>>         </quorumd>
>>> </cluster>
>>>
>>> The /etc/hosts:
>>>
>>> 172.24.119.10   node1
>>> 172.24.119.34   node2
>>> 15.15.2.10      node1-hb node1-hb.localdomain
>>> 15.15.2.11      node2-hb node2-hb.localdomain
>>>
>>> And the versions:
>>>
>>> Red Hat Enterprise Linux Server release 5.7 (Tikanga)
>>> cman-2.0.115-85.el5
>>> rgmanager-2.0.52-21.el5
>>> openais-0.80.6-30.el5
>>>
>>> I don't know what else I should try, so if you can give me some
>>> ideas, I will be very pleased.
>>>
>>> Regards, Javi.
>>>
>>> --
>>> Linux-cluster mailing list
>>> [email protected]
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>> --
>>> this is my life and I live it for as long as God wills
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.com

--
this is my life and I live it for as long as God wills
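For anyone following the suggestion at the top of the thread, this is a minimal sketch of how the relevant cluster.conf fragments could look with fence_manual wired in. The node names come from the thread; the device name "human" is illustrative, and manual fencing is unsupported for production use:

```xml
<!-- Sketch only: manual fencing for testing, per the suggestion above.
     fence_manual blocks recovery until an operator acknowledges the fence. -->
<fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="-1"/>
<clusternodes>
        <clusternode name="node1-hb" nodeid="1" votes="1">
                <fence>
                        <method name="manual">
                                <device name="human" nodename="node1-hb"/>
                        </method>
                </fence>
        </clusternode>
        <clusternode name="node2-hb" nodeid="2" votes="1">
                <fence>
                        <method name="manual">
                                <device name="human" nodename="node2-hb"/>
                        </method>
                </fence>
        </clusternode>
</clusternodes>
<fencedevices>
        <fencedevice agent="fence_manual" name="human"/>
</fencedevices>
```

A pending manual fence then has to be acknowledged by an operator on a surviving node, typically with fence_ack_manual, before the cluster resumes recovery.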
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
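A closing note on the vote arithmetic in the posted cluster.conf: with two 1-vote nodes, a 1-vote quorum disk, and expected_votes="3", the quorum threshold is floor(3/2) + 1 = 2 votes, so one node plus a working qdisk is quorate, while one node with the qdisk offline is not — exactly the two clustat states shown above. A small illustrative sketch (the helper functions are hypothetical, not part of cman):

```python
# Hypothetical illustration of CMAN-style simple-majority quorum for the
# posted config: two 1-vote nodes plus a 1-vote quorum disk, expected_votes=3.
def quorum_threshold(expected_votes):
    # Simple majority: more than half of the expected votes.
    return expected_votes // 2 + 1

def is_quorate(node_votes, qdisk_votes, expected_votes):
    # A partition is quorate when its visible votes reach the threshold.
    return sum(node_votes) + qdisk_votes >= quorum_threshold(expected_votes)

# expected_votes="3": quorum needs 2 votes.
assert quorum_threshold(3) == 2
# One node alone is inquorate (node1's state above, with the qdisk Offline)...
assert not is_quorate([1], qdisk_votes=0, expected_votes=3)
# ...but one node plus the qdisk is quorate (node2's state above).
assert is_quorate([1], qdisk_votes=1, expected_votes=3)
```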

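One tuning detail worth double-checking in the posted config: qdiskd declares a node dead after interval × tko seconds (here 1 × 10 = 10 s), and a commonly cited rule of thumb is that the totem token timeout should be at least twice that eviction time, which the posted token="20000" (20 s) just meets. A quick sanity-check sketch under that assumption (the rule of thumb and helper names are not official cman formulas):

```python
# Hedged sanity check of the qdisk vs. totem timings posted above, using the
# common rule of thumb: totem token timeout >= 2 * qdiskd eviction time.
def qdisk_eviction_seconds(interval_s, tko):
    # qdiskd evicts a node after `tko` consecutive missed status updates.
    return interval_s * tko

def token_covers_qdisk(token_ms, interval_s, tko):
    # totem token is configured in milliseconds, qdiskd timings in seconds.
    return token_ms / 1000.0 >= 2 * qdisk_eviction_seconds(interval_s, tko)

# Values from the thread: quorumd interval="1" tko="10", totem token="20000".
assert qdisk_eviction_seconds(1, 10) == 10
assert token_covers_qdisk(20000, 1, 10)  # 20 s token vs. 10 s eviction time
```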