Are you saying that if you manually destroy the guest and then start it up, it works?

I don't think your problem is with fencing; I think it's that the two guests are not joining the cluster correctly. The fencing part seems to be working. Do the logs in /var/log/messages show that one node successfully fenced the other? What is the output of group_tool on both nodes after they have come up? That should help you debug it. I don't think it's relevant, but this item from the FAQ may help: http://sources.redhat.com/cluster/wiki/FAQ/Fencing#fence_stuck
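Roughly, something like this on both nodes right after the fenced guest comes back up should show where it is getting stuck (a generic sketch, not specific to your hosts):

    clustat                          # membership as this node sees it
    cman_tool nodes                  # cman's view of the members
    group_tool                       # state of the fence/dlm/rgmanager groups
    grep -i fenc /var/log/messages   # did fenced log a successful fence?
    grep -i dlm /var/log/messages    # dlm/clvmd errors while the node rejoins

If group_tool shows the fence or dlm group sitting in a join/transition state on either node, that is where to dig.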
Joel

On Wed, Sep 22, 2010 at 7:08 PM, Rakovec Jost <[email protected]> wrote:
> Hi
>
> anybody any idea? Please help!!
>
> Now I can fence the node, but after booting it can't connect to the cluster.
>
> on dom0:
>
> fence_xvmd -LX -I xenbr0 -U xen:/// -fdddddddddddddd
>
> ipv4_connect: Connecting to client
> ipv4_connect: Success; fd = 12
> Rebooting domain oelcl21...
> [REBOOT] Calling virDomainDestroy(0x99cede0)
> libvir: Xen error : Domain not found: xenUnifiedDomainLookupByName
> [[ XML Domain Info ]]
> <domain type='xen' id='41'>
>   <name>oelcl21</name>
>   <uuid>07e31b27-1ff1-4754-4f58-221e8d2057d6</uuid>
>   <memory>1048576</memory>
>   <currentMemory>1048576</currentMemory>
>   <vcpu>2</vcpu>
>   <bootloader>/usr/bin/pygrub</bootloader>
>   <os>
>     <type>linux</type>
>   </os>
>   <clock offset='utc'/>
>   <on_poweroff>destroy</on_poweroff>
>   <on_reboot>restart</on_reboot>
>   <on_crash>restart</on_crash>
>   <devices>
>     <disk type='block' device='disk'>
>       <driver name='phy'/>
>       <source dev='/dev/vg_datastore/oelcl21'/>
>       <target dev='xvda' bus='xen'/>
>     </disk>
>     <disk type='block' device='disk'>
>       <driver name='phy'/>
>       <source dev='/dev/vg_datastore/skupni1'/>
>       <target dev='xvdb' bus='xen'/>
>       <shareable/>
>     </disk>
>     <interface type='bridge'>
>       <mac address='00:16:3e:7c:60:aa'/>
>       <source bridge='xenbr0'/>
>       <script path='/etc/xen/scripts/vif-bridge'/>
>       <target dev='vif41.0'/>
>     </interface>
>     <console type='pty' tty='/dev/pts/2'>
>       <source path='/dev/pts/2'/>
>       <target port='0'/>
>     </console>
>   </devices>
> </domain>
> [[ XML END ]]
> Calling virDomainCreateLinux()..
>
> on domU - node1:
>
> fence_xvm -H oelcl21 -ddd
>
> clustat on node1:
>
> [r...@oelcl11 ~]# clustat
> Cluster Status for cluster2 @ Wed Sep 22 11:04:49 2010
> Member Status: Quorate
>
>  Member Name        ID   Status
>  ------ ----        ---- ------
>  oelcl11            1    Online, Local, rgmanager
>  oelcl21            2    Online, rgmanager
>
>  Service Name       Owner (Last)       State
>  ------- ----       ----- ------       -----
>  service:web        oelcl11            started
> [r...@oelcl11 ~]#
>
> but node2 waits for 300 s and can't connect:
>
> Starting daemons... done
> Starting fencing... Sep 22 10:41:06 oelcl21 kernel: eth0: no IPv6 routers present
> done
> [  OK  ]
>
> [r...@oelcl21 ~]# clustat
> Cluster Status for cluster2 @ Wed Sep 22 11:04:19 2010
> Member Status: Quorate
>
>  Member Name        ID   Status
>  ------ ----        ---- ------
>  oelcl11            1    Online
>  oelcl21            2    Online, Local
>
> [r...@oelcl21 ~]#
>
> br
> jost
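A quick way to rule out libvirt connectivity on dom0, since fence_xvmd looks the guest up through libvirt (a rough sketch using the xen:/// URI and the domain name from the output above):

    virsh -c xen:/// list --all        # is oelcl21 listed, and in what state?
    virsh -c xen:/// dominfo oelcl21   # the name/UUID libvirt has for the guest

The "Domain not found" line above may just be the lookup after virDomainDestroy(); the interesting part is whether the recreated guest shows up again under the same name.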
>
> ________________________________________
> From: [email protected] [[email protected]]
> On Behalf Of Rakovec Jost [[email protected]]
> Sent: Monday, September 13, 2010 9:31 AM
> To: linux clustering
> Subject: Re: [Linux-cluster] fence in xen
>
> Hi
>
> Q: does fence_xvmd also have to run in the domUs?
>
> Because I notice that if I run this on the host while fence_xvmd is running:
>
> [r...@oelcl1 ~]# fence_xvm -H oelcl2 -ddd -o null
> Debugging threshold is now 3
> -- args @ 0x7fffe3f71fb0 --
> args->addr = 225.0.0.12
> args->domain = oelcl2
> args->key_file = /etc/cluster/fence_xvm.key
> args->op = 0
> args->hash = 2
> args->auth = 2
> args->port = 1229
> args->ifindex = 0
> args->family = 2
> args->timeout = 30
> args->retr_time = 20
> args->flags = 0
> args->debug = 3
> -- end args --
> Reading in key file /etc/cluster/fence_xvm.key into 0x7fffe3f70f60 (4096 max size)
> Actual key length = 4096 bytes
> Sending to 225.0.0.12 via 127.0.0.1
> Sending to 225.0.0.12 via 10.9.131.80
> Sending to 225.0.0.12 via 10.9.131.83
> Sending to 225.0.0.12 via 192.168.122.1
> Waiting for connection from XVM host daemon.
> Issuing TCP challenge
> Responding to TCP challenge
> TCP Exchange + Authentication done...
> Waiting for return value from XVM host
> Remote: Operation was successful
>
> but if I try to fence --> reboot, then I get:
>
> [r...@oelcl1 ~]# fence_xvm -H oelc2
> Remote: Operation was successful
> [r...@oelcl1 ~]#
>
> but host2 does not reboot.
>
> If fence_xvmd is not running on the hosts, then I get a timeout:
>
> [r...@oelcl1 sysconfig]# fence_xvm -H oelcl2 -ddd -o null
> Debugging threshold is now 3
> -- args @ 0x7fff1a6b5580 --
> args->addr = 225.0.0.12
> args->domain = oelcl2
> args->key_file = /etc/cluster/fence_xvm.key
> args->op = 0
> args->hash = 2
> args->auth = 2
> args->port = 1229
> args->ifindex = 0
> args->family = 2
> args->timeout = 30
> args->retr_time = 20
> args->flags = 0
> args->debug = 3
> -- end args --
> Reading in key file /etc/cluster/fence_xvm.key into 0x7fff1a6b4530 (4096 max size)
> Actual key length = 4096 bytes
> Sending to 225.0.0.12 via 127.0.0.1
> Sending to 225.0.0.12 via 10.9.131.80
> Waiting for connection from XVM host daemon.
> Sending to 225.0.0.12 via 127.0.0.1
> Sending to 225.0.0.12 via 10.9.131.80
> Waiting for connection from XVM host daemon.
>
> Q: how can I check whether multicast is OK?
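On the multicast question: one rough way to check, assuming the 225.0.0.12 address and port 1229 defaults shown in the output above, is to watch the dom0 bridge with tcpdump while a node sends a harmless null request:

    # on dom0: watch for fence_xvm multicast traffic on the bridge
    tcpdump -n -i xenbr0 host 225.0.0.12

    # on the node, in another terminal: a request that does nothing
    fence_xvm -o null -H oelcl2 -ddd

If nothing arrives on xenbr0, repeat the capture on virbr0 and eth0; that tells you which interface fence_xvmd actually needs to listen on.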
> Q: on which network interface must fence_xvmd run on dom0?
>
> I notice that on the hosts-domU there is:
>
> virbr0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00
>           inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
>           inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:40 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 b)  TX bytes:7212 (7.0 KiB)
>
> also virbr0
>
> and on dom0 guest:
>
> [r...@vm5 ~]# fence_xvmd -fdd -I xenbr0
> -- args @ 0xbfd26234 --
> args->addr = 225.0.0.12
> args->domain = (null)
> args->key_file = /etc/cluster/fence_xvm.key
> args->op = 2
> args->hash = 2
> args->auth = 2
> args->port = 1229
> args->ifindex = 7
> args->family = 2
> args->timeout = 30
> args->retr_time = 20
> args->flags = 1
> args->debug = 2
> -- end args --
> Opened ckpt vm_states
> My Node ID = 1
> Domain                 UUID                                 Owner State
> ------                 ----                                 ----- -----
> Domain-0               00000000-0000-0000-0000-000000000000 00001 00001
> oelcl1                 2a53022c-5836-68f0-4514-02a5a0b07e81 00001 00002
> oelcl2                 dd268dd4-f012-e0f7-7c77-aa8a58e1e6ab 00001 00002
> oelcman                09c783bd-9107-0916-ebbf-bd27bcc8babe 00001 00002
> Storing oelcl1
> Storing oelcl2
>
> [r...@vm5 ~]# fence_xvmd -fdd -I virbr0
> -- args @ 0xbfd26234 --
> args->addr = 225.0.0.12
> args->domain = (null)
> args->key_file = /etc/cluster/fence_xvm.key
> args->op = 2
> args->hash = 2
> args->auth = 2
> args->port = 1229
> args->ifindex = 7
> args->family = 2
> args->timeout = 30
> args->retr_time = 20
> args->flags = 1
> args->debug = 2
> -- end args --
> Opened ckpt vm_states
> My Node ID = 1
> Domain                 UUID                                 Owner State
> ------                 ----                                 ----- -----
> Domain-0               00000000-0000-0000-0000-000000000000 00001 00001
> oelcl1                 2a53022c-5836-68f0-4514-02a5a0b07e81 00001 00002
> oelcl2                 dd268dd4-f012-e0f7-7c77-aa8a58e1e6ab 00001 00002
> oelcman                09c783bd-9107-0916-ebbf-bd27bcc8babe 00001 00002
> Storing oelcl1
> Storing oelcl2
>
> No matter which interface I take, the fence is not done.
>
> thx
>
> br jost
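Two more things worth ruling out when the null operation succeeds but a real fence does nothing (generic suggestions, not confirmed causes here): the key file has to be byte-identical on dom0 and every domU, and nothing may filter the fence_xvm traffic:

    # run on dom0 and on each domU; the checksums must match
    md5sum /etc/cluster/fence_xvm.key

    # on dom0 and the domUs: look for rules dropping multicast 225.0.0.12
    # or the TCP connection back to the caller on port 1229
    iptables -L -n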
>
> _____________________________________
> From: [email protected] [[email protected]]
> On Behalf Of Rakovec Jost [[email protected]]
> Sent: Saturday, September 11, 2010 6:36 PM
> To: [email protected]
> Subject: [Linux-cluster] fence in xen
>
> Hi list!
>
> I have a question about fence_xvm.
>
> The situation is: one physical server with Xen --> dom0 with 2 domUs. The cluster works fine between the domUs (reboot, relocate).
>
> I'm using Red Hat 5.5.
>
> The problem is with fencing from dom0 with "fence_xvm -H oelcl2": the domU is destroyed, but when it is booted back the domU can't join the cluster. The domU boots for a very long time --> FENCED_START_TIMEOUT=300.
>
> On the console I get, after node2 is up:
>
> node2:
>
> INFO: task clurgmgrd:2127 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> clurgmgrd     D 0000000000000010     0  2127   2126                 (NOTLB)
>  ffff88006f08dda8 0000000000000286 ffff88007cc0b810 0000000000000000
>  0000000000000003 ffff880072009860 ffff880072f6b0c0 00000000000455ec
>  ffff880072009a48 ffffffff802649d7
> Call Trace:
>  [<ffffffff802649d7>] _read_lock_irq+0x9/0x19
>  [<ffffffff8021420e>] filemap_nopage+0x193/0x360
>  [<ffffffff80263a7e>] __mutex_lock_slowpath+0x60/0x9b
>  [<ffffffff80263ac8>] .text.lock.mutex+0xf/0x14
>  [<ffffffff88424b64>] :dlm:dlm_new_lockspace+0x2c/0x860
>  [<ffffffff80222b08>] __up_read+0x19/0x7f
>  [<ffffffff802d0abb>] __kmalloc+0x8f/0x9f
>  [<ffffffff8842b6fa>] :dlm:device_write+0x438/0x5e5
>  [<ffffffff80217377>] vfs_write+0xce/0x174
>  [<ffffffff80217bc4>] sys_write+0x45/0x6e
>  [<ffffffff802602f9>] tracesys+0xab/0xb6
>
> During boot on node2:
>
> Starting clvmd: dlm: Using TCP for communications
> clvmd startup timed out
> [FAILED]
>
> node2:
>
> [r...@oelcl2 init.d]# clustat
> Cluster Status for cluster1 @ Sat Sep 11 18:11:21 2010
> Member Status: Quorate
>
>  Member Name        ID   Status
>  ------ ----        ---- ------
>  oelcl1             1    Online
>  oelcl2             2    Online, Local
>
> [r...@oelcl2 init.d]#
>
> On the first node:
>
> [r...@oelcl1 ~]# clustat
> Cluster Status for cluster1 @ Sat Sep 11 18:12:07 2010
> Member Status: Quorate
>
>  Member Name        ID   Status
>  ------ ----        ---- ------
>  oelcl1             1    Online, Local, rgmanager
>  oelcl2             2    Online, rgmanager
>
>  Service Name       Owner (Last)       State
>  ------- ----       ----- ------       -----
>  service:webby      oelcl1             started
> [r...@oelcl1 ~]#
>
> And then I have to destroy both domUs on the host and create them again to get node2 working.
>
> I have used the how-tos at https://access.redhat.com/kb/docs/DOC-5937 and
> http://sources.redhat.com/cluster/wiki/VMClusterCookbook
>
> cluster config on dom0:
>
> <?xml version="1.0"?>
> <cluster alias="vmcluster" config_version="1" name="vmcluster">
>         <clusternodes>
>                 <clusternode name="vm5" nodeid="1" votes="1"/>
>         </clusternodes>
>         <cman/>
>         <fencedevices/>
>         <rm/>
>         <fence_xvmd/>
> </cluster>
>
> cluster config on domU:
>
> <?xml version="1.0"?>
> <cluster alias="cluster1" config_version="49" name="cluster1">
>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="4"/>
>         <clusternodes>
>                 <clusternode name="oelcl1.name.comi" nodeid="1" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device domain="oelcl1" name="xenfence1"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="oelcl2.name.com" nodeid="2" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device domain="oelcl2" name="xenfence1"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="1" two_node="1"/>
>         <fencedevices>
>                 <fencedevice agent="fence_xvm" name="xenfence1"/>
>         </fencedevices>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="prefer_node1" nofailback="0" ordered="1" restricted="1">
>                                 <failoverdomainnode name="oelcl1.name.com" priority="1"/>
>                                 <failoverdomainnode name="oelcl2.name.com" priority="2"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources>
>                         <ip address="xx.xx.xx.xx" monitor_link="1"/>
>                         <fs device="/dev/xvdb1" force_fsck="0" force_unmount="0" fsid="8669" fstype="ext3" mountpoint="/var/www/html" name="docroot" self_fence="0"/>
>                         <script file="/etc/init.d/httpd" name="apache_s"/>
>                 </resources>
>                 <service autostart="1" domain="prefer_node1" exclusive="0" name="webby" recovery="relocate">
>                         <ip ref="xx.xx.xx.xx"/>
>                         <fs ref="docroot"/>
>                         <script ref="apache_s"/>
>                 </service>
>         </rm>
> </cluster>
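Since fence_xvmd matches guests by their Xen domain name, it is also worth double-checking that the domain= values above (oelcl1, oelcl2) are exactly the names dom0 reports; a quick sketch:

    # on dom0: the authoritative domain names
    xm list

    # on a domU: the names the fence agent will ask dom0 to reboot
    grep 'domain=' /etc/cluster/cluster.conf

If those do not line up, fence_xvmd may never find the right guest to reboot.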
>
> fence process on dom0:
>
> [r...@vm5 cluster]# ps -ef |grep fenc
> root     18690     1  0 17:40 ?        00:00:00 /sbin/fenced
> root     18720     1  0 17:40 ?        00:00:00 /sbin/fence_xvmd -I xenbr0
> root     22633 14524  0 18:21 pts/3    00:00:00 grep fenc
> [r...@vm5 cluster]#
>
> and on domU:
>
> [r...@oelcl1 ~]# ps -ef|grep fen
> root      1523     1  0 17:41 ?        00:00:00 /sbin/fenced
> root     13695  2902  0 18:22 pts/0    00:00:00 grep fen
> [r...@oelcl1 ~]#
>
> Does somebody have any idea why the fencing doesn't work?
>
> thx
>
> br
>
> jost
>
> --
> Linux-cluster mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/linux-cluster
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
