Re: [ClusterLabs] Still Beginner STONITH Problem
On 7/20/20 5:05 PM, Stefan Schmitz wrote:
> Hello,
>
> I have now deleted the previous stonith resource and added two new
> ones, one for each server. The commands I used for that:
>
> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
> hostlist="host2" pcmk_host_list="server2ubuntu1,server4ubuntu1"
> hypervisor_uri="qemu+ssh://192.168.1.13/system"
>
> # pcs -f stonith_cfg stonith create stonith_id_2 external/libvirt
> hostlist="Host4" pcmk_host_list="server2ubuntu1,server4ubuntu1"
> hypervisor_uri="qemu+ssh://192.168.1.21/system"

As already mentioned, external/libvirt is the wrong fence agent. You have to use fence_xvm, as you already tried on the command line. You don't need hostlist, and probably not pcmk_host_list either, since fence_xvm queries fence_virtd for the possible targets. If you don't have a 1:1 match between the guest names in libvirt and the node names, you will definitely need a pcmk_host_map="{pacemaker-node1}:{guest-name1};...". And you have to set the attribute multicast_address=... . The reference to libvirt/the hypervisor lives solely in fence_virtd. If you intended to switch over to a solution without the fence_virtd service, with a fence agent that talks directly to the hypervisor, forget my comments - I haven't done that with libvirt so far.

Klaus

> The behaviour is now somewhat different but it still does not work. I
> guess I am doing something completely wrong in setting up the
> stonith resource?
>
> The pcs status command shows two running stonith resources on one
> server but two stopped ones on the other. Additionally there is a
> failed fencing action. The one server showing running stonith is
> marked as unclean, and fencing wants to reboot it but fails doing so.
>
> Any advice on how to proceed would be greatly appreciated.
>
> The shortened pcs status outputs of each of the VMs:
>
> # pcs status of server2ubuntu1
> [...]
> Node List:
>   * Node server2ubuntu1: UNCLEAN (online)
>   * Online: [ server4ubuntu1 ]
>
> Full List of Resources:
> [...]
>   * stonith_id_1 (stonith:external/libvirt): Started server2ubuntu1
>   * stonith_id_2 (stonith:external/libvirt): Started server2ubuntu1
>
> Failed Resource Actions:
>   * stonith_id_1_start_0 on server4ubuntu1 'error' (1): call=228, status='complete', exitreason='', last-rc-change='1970-01-08 01:35:45 +01:00', queued=4391ms, exec=2890ms
>   * stonith_id_2_start_0 on server4ubuntu1 'error' (1): call=229, status='complete', exitreason='', last-rc-change='1970-01-08 01:35:45 +01:00', queued=2230ms, exec=-441815ms
>   * r0_pacemaker_stop_0 on server2ubuntu1 'error' (1): call=198, status='Timed Out', exitreason='', last-rc-change='1970-01-08 01:33:54 +01:00', queued=321ms, exec=115529ms
>   * stonith_id_1_start_0 on server2ubuntu1 'error' (1): call=196, status='complete', exitreason='', last-rc-change='1970-01-08 01:33:54 +01:00', queued=161ms, exec=-443582ms
>   * stonith_id_2_start_0 on server2ubuntu1 'error' (1): call=197, status='complete', exitreason='', last-rc-change='1970-01-08 01:33:54 +01:00', queued=69ms, exec=-444042ms
>
> Failed Fencing Actions:
>   * reboot of server2ubuntu1 failed: delegate=, client=pacemaker-controld.2002, origin=server4ubuntu1, last-failed='2020-07-20 16:51:49 +02:00'
>
> # pcs status of server4ubuntu1
> [...]
> Node List:
>   * Node server2ubuntu1: UNCLEAN (online)
>   * Online: [ server4ubuntu1 ]
>
> Full List of Resources:
> [...]
>   * stonith_id_1 (stonith:external/libvirt): FAILED server4ubuntu1
>   * stonith_id_2 (stonith:external/libvirt): FAILED server4ubuntu1
>
> Failed Resource Actions:
>   * stonith_id_1_start_0 on server4ubuntu1 'error' (1): call=248, status='complete', exitreason='', last-rc-change='1970-01-08 01:45:07 +01:00', queued=350ms, exec=516901ms
>   * stonith_id_2_start_0 on server4ubuntu1 'error' (1): call=249, status='complete', exitreason='', last-rc-change='1970-01-08 01:45:07 +01:00', queued=149ms, exec=515438ms
>   * stonith_id_1_start_0 on server2ubuntu1 'error' (1): call=215, status='complete', exitreason='', last-rc-change='1970-01-08 01:44:53 +01:00', queued=189ms, exec=534334ms
>   * stonith_id_2_start_0 on server2ubuntu1 'error' (1): call=216, status='complete', exitreason='', last-rc-change='1970-01-08 01:44:53 +01:00', queued=82ms, exec=564228ms
>
> Failed Fencing Actions:
>   * reboot of server2ubuntu1 failed: delegate=, client=pacemaker-controld.2002, origin=server4ubuntu1, last-failed='2020-07-20 16:51:49 +02:00'
>
> kind regards
> Stefan Schmitz
>
> On 20.07.2020 at 13:51, Stefan Schmitz wrote:
>>
>> On 20.07.2020 at 13:36, Klaus Wenninger wrote:
>>> On 7/20/20 1:10 PM, Stefan Schmitz wrote: Hello, thank you all very much for your help so far! We have now managed to capture the multicast traffic originating from one host when issuing the command "fence_xvm -o list" on the other host. Now the tcpdump at least
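For reference, a minimal sketch of the per-host fence_xvm resources described above. The guest names kvm101/kvm102, the node-to-guest mapping and the two multicast addresses are assumptions - adjust them to the real placement and to whatever each host's fence_virtd listener is actually configured for:

# pcs -f stonith_cfg stonith create fence_host2 fence_xvm \
      multicast_address="225.0.0.12" key_file="/etc/cluster/fence_xvm.key" \
      pcmk_host_map="server2ubuntu1:kvm101"
# pcs -f stonith_cfg stonith create fence_host4 fence_xvm \
      multicast_address="225.0.0.13" key_file="/etc/cluster/fence_xvm.key" \
      pcmk_host_map="server4ubuntu1:kvm102"

With a map like this each device only ever fences the one guest behind "its" fence_virtd, which also sidesteps the "first responder wins" issue discussed later in the thread.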
Re: [ClusterLabs] Still Beginner STONITH Problem
Am 20.07.2020 um 13:36 schrieb Klaus Wenninger: On 7/20/20 1:10 PM, Stefan Schmitz wrote: Hello, thank you all very much for your help so far! We have no managed to capture the mulitcast traffic originating from one host when issuing the command "fence_xvm -o list" on the other host. Now the tcpdump at least looks exactly the same on all 4 servers, hosts and guest. I can not tell how and why this just started working, but I got our Datacenter Techs final report this morning, that there are no problems present. Am 19.07.2020 um 09:32 schrieb Andrei Borzenkov: external/libvirt is unrelated to fence_xvm Could you please explain that a bit more? Do you mean that the current problem of the dysfunctional Stonith/fencing is unrelated to libvirt? Hadn't spotted that ... sry What he meant is if you are using fence_virtd-service on the host(s) then the matching fencing-resource is based on fence_xvm and not external/libvirt. The libvirt-stuff is handled by the daemon running on your host. fence_xvm opens TCP listening socket, sends request and waits for connection to this socket (from fence_virtd) which is used to submit actual fencing operation. Only the first connection request is handled. So first host that responds will be processed. Local host is likely always faster to respond than remote host. Thank you for the explanation, I get that. But what would you suggest to remedy this situation? We have been using libvirt and fence_xvm because of the clusterlabs wiki articles and the suggestions in this mailing list. Is there anything you suggest we need to change to make this Cluster finally work? Guess what he meant, what I've already suggested before and what is as well described in the article linked is having totally separate configurations for each host. If you are using different multicast-addresses or unicast - as Andrei is suggesting and which I haven't used before - probably doesn't matter. (Unless of course something is really blocking multicast ...) And you have to setup one fencing-resource per host (fence_xvm) that has the address configured you've setup on each of the hosts. Thank you for thte explanation. I sadly cannot access the articles. I take, totally separate configurations means having a stonith resource configured in the cluster for each host. So for now I will delete the current resource and try to configure two new ones. Am 18.07.2020 um 02:36 schrieb Reid Wahl: However, when users want to configure fence_xvm for multiple hosts with the libvirt backend, I have typically seen them configure multiple fence_xvm devices (one per host) and configure a different multicast address on each host. I do have an Red Hat Account but not a payed subscription, which sadly is needed to access the articles you have linked. We have installed fence_virt on both hosts since the beginning, if that is what you mean by " multiple fence_xvm devices (one per host)". They were however both configured to use the same multicast IP Adress, which we now changed so that each hosts fence_xvm install uses a different multicast IP. Sadly this does not seem to change anything in the behaviour. What is interesting though is, that i ran again fence_xvm -c changed the multicast IP to 225.0.0.13 (from .12). I killed and restarted the daemon multiple times after that. When I now run #fence_xvm -o list without specifiying an IP adress tcpdump on the other host still shows the old IP as the originating one. 
tcpdum on other host: Host4.54001 > 225.0.0.12.zented: [udp sum ok] UDP, length 176 Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr 225.0.0.12 to_in { }] Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr 225.0.0.12 to_in { }] Only when I specify the other IP it apparently really gets used: # fence_xvm -a 225.0.0.13 -o list tcpdum on other host: Host4.46011 > 225.0.0.13.zented: [udp sum ok] UDP, length 176 Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr 225.0.0.13 to_in { }] Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr 225.0.0.13 to_in { }] Am 17.07.2020 um 16:49 schrieb Strahil Nikolov: The simplest way to check if the libvirt's network is NAT (or not) is to try to ssh from the first VM to the second one. That does work without any issue. I can ssh to any server in our network, host or guest, without a problem. Does that mean there is no natting involved? Am 17.07.2020 um 16:41 schrieb Klaus Wenninger: How does your VM part of the network-config look like? # cat ifcfg-br0 DEVICE=br0 TYPE=Bridge BOOTPROTO=static ONBOOT=yes IPADDR=192.168.1.13 NETMASK=255.255.0.0 GATEWAY=192.168.1.1 NM_CONTROLLED=no IPV6_AUTOCONF=yes IPV6_DEFROUTE=yes IPV6_PEERDNS=yes IPV6_PEERROUTES=yes IPV6_FAILURE_FATAL=no I am at a loss an do not know why this is NAT. I am aware what NAT means, but what am I supposed to reconfigure here to dolve the problem? As long as you stay within the subnet you are
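A possible explanation for the "old" address still showing up, assuming fence_xvm behaves as its man page describes: the fence_xvm client does not read /etc/fence_virt.conf and simply defaults to 225.0.0.12, so after moving a listener to 225.0.0.13 both the manual tests and, later, the stonith resource have to be told the new address explicitly:

# systemctl restart fence_virtd       # make the daemon pick up the edited listener address
# fence_xvm -a 225.0.0.13 -o list     # query that daemon on its new address

The same 225.0.0.13 then has to appear as multicast_address=... in whichever fencing resource is supposed to reach this particular host.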
Re: [ClusterLabs] Still Beginner STONITH Problem
On 7/20/20 1:10 PM, Stefan Schmitz wrote: > Hello, > > thank you all very much for your help so far! > > We have no managed to capture the mulitcast traffic originating from > one host when issuing the command "fence_xvm -o list" on the other > host. Now the tcpdump at least looks exactly the same on all 4 > servers, hosts and guest. I can not tell how and why this just started > working, but I got our Datacenter Techs final report this morning, > that there are no problems present. > > > > Am 19.07.2020 um 09:32 schrieb Andrei Borzenkov: > >external/libvirt is unrelated to fence_xvm > > Could you please explain that a bit more? Do you mean that the current > problem of the dysfunctional Stonith/fencing is unrelated to libvirt? Hadn't spotted that ... sry What he meant is if you are using fence_virtd-service on the host(s) then the matching fencing-resource is based on fence_xvm and not external/libvirt. The libvirt-stuff is handled by the daemon running on your host. > > >fence_xvm opens TCP listening socket, sends request and waits for > >connection to this socket (from fence_virtd) which is used to submit > >actual fencing operation. Only the first connection request is handled. > >So first host that responds will be processed. Local host is likely > >always faster to respond than remote host. > > Thank you for the explanation, I get that. But what would you suggest > to remedy this situation? We have been using libvirt and fence_xvm > because of the clusterlabs wiki articles and the suggestions in this > mailing list. Is there anything you suggest we need to change to make > this Cluster finally work? Guess what he meant, what I've already suggested before and what is as well described in the article linked is having totally separate configurations for each host. If you are using different multicast-addresses or unicast - as Andrei is suggesting and which I haven't used before - probably doesn't matter. (Unless of course something is really blocking multicast ...) And you have to setup one fencing-resource per host (fence_xvm) that has the address configured you've setup on each of the hosts. > > > Am 18.07.2020 um 02:36 schrieb Reid Wahl: > >However, when users want to configure fence_xvm for multiple hosts > with the libvirt backend, I have typically seen them configure > multiple fence_xvm devices (one per host) and configure a different > multicast address on each host. > > I do have an Red Hat Account but not a payed subscription, which sadly > is needed to access the articles you have linked. > > We have installed fence_virt on both hosts since the beginning, if > that is what you mean by " multiple fence_xvm devices (one per host)". > They were however both configured to use the same multicast IP Adress, > which we now changed so that each hosts fence_xvm install uses a > different multicast IP. Sadly this does not seem to change anything in > the behaviour. > What is interesting though is, that i ran again fence_xvm -c changed > the multicast IP to 225.0.0.13 (from .12). I killed and restarted the > daemon multiple times after that. > When I now run #fence_xvm -o list without specifiying an IP adress > tcpdump on the other host still shows the old IP as the originating one. 
> tcpdum on other host: > Host4.54001 > 225.0.0.12.zented: [udp sum ok] UDP, length 176 > Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr > 225.0.0.12 to_in { }] > Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr > 225.0.0.12 to_in { }] > > Only when I specify the other IP it apparently really gets used: > # fence_xvm -a 225.0.0.13 -o list > tcpdum on other host: > Host4.46011 > 225.0.0.13.zented: [udp sum ok] UDP, length 176 > Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr > 225.0.0.13 to_in { }] > Host4 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr > 225.0.0.13 to_in { }] > > > > > Am 17.07.2020 um 16:49 schrieb Strahil Nikolov: > >The simplest way to check if the libvirt's network is NAT (or not) > is to try to ssh from the first VM to the second one. > That does work without any issue. I can ssh to any server in our > network, host or guest, without a problem. Does that mean there is no > natting involved? > > > > Am 17.07.2020 um 16:41 schrieb Klaus Wenninger: > >How does your VM part of the network-config look like? > # cat ifcfg-br0 > DEVICE=br0 > TYPE=Bridge > BOOTPROTO=static > ONBOOT=yes > IPADDR=192.168.1.13 > NETMASK=255.255.0.0 > GATEWAY=192.168.1.1 > NM_CONTROLLED=no > IPV6_AUTOCONF=yes > IPV6_DEFROUTE=yes > IPV6_PEERDNS=yes > IPV6_PEERROUTES=yes > IPV6_FAILURE_FATAL=no > > > >> I am at a loss an do not know why this is NAT. I am aware what NAT > >> means, but what am I supposed to reconfigure here to dolve the > problem? > >As long as you stay within the subnet you are running on your bridge > >you won't get natted but once it starts to route via the host the > libvirt > >default bridge will be natted. > >What you
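A sketch of what "totally separate configurations" can look like: the two hosts keep identical fence_virt.conf files except for the listener address (the concrete addresses are only examples and must match what the stonith resources later use):

    # host2: /etc/fence_virt.conf (listener section only)
    listeners {
            multicast {
                    key_file = "/etc/cluster/fence_xvm.key";
                    address = "225.0.0.12";
                    ...
            }
    }

    # Host4: identical, but with
                    address = "225.0.0.13";

Each cluster node then gets one fence_xvm stonith resource per host, each carrying the matching multicast_address.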
Re: [ClusterLabs] Still Beginner STONITH Problem
Hello,

thank you all very much for your help so far!

We have now managed to capture the multicast traffic originating from one host when issuing the command "fence_xvm -o list" on the other host. Now the tcpdump at least looks exactly the same on all 4 servers, hosts and guests. I cannot tell how and why this just started working, but I got our datacenter techs' final report this morning that there are no problems present.

On 19.07.2020 at 09:32, Andrei Borzenkov wrote:
>external/libvirt is unrelated to fence_xvm

Could you please explain that a bit more? Do you mean that the current problem of the dysfunctional Stonith/fencing is unrelated to libvirt?

>fence_xvm opens TCP listening socket, sends request and waits for
>connection to this socket (from fence_virtd) which is used to submit
>actual fencing operation. Only the first connection request is handled.
>So first host that responds will be processed. Local host is likely
>always faster to respond than remote host.

Thank you for the explanation, I get that. But what would you suggest to remedy this situation? We have been using libvirt and fence_xvm because of the clusterlabs wiki articles and the suggestions on this mailing list. Is there anything you suggest we need to change to make this cluster finally work?

On 18.07.2020 at 02:36, Reid Wahl wrote:
>However, when users want to configure fence_xvm for multiple hosts with the libvirt backend, I have typically seen them configure multiple fence_xvm devices (one per host) and configure a different multicast address on each host.

I do have a Red Hat account but not a paid subscription, which sadly is needed to access the articles you have linked.

We have installed fence_virt on both hosts since the beginning, if that is what you mean by "multiple fence_xvm devices (one per host)". They were however both configured to use the same multicast IP address, which we have now changed so that each host's fence_xvm install uses a different multicast IP. Sadly this does not seem to change anything in the behaviour.

What is interesting though is that I ran fence_virtd -c again and changed the multicast IP to 225.0.0.13 (from .12). I killed and restarted the daemon multiple times after that. When I now run "fence_xvm -o list" without specifying an IP address, tcpdump on the other host still shows the old IP as the originating one.
# cat ifcfg-br0
DEVICE=br0
TYPE=Bridge
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.1.13
NETMASK=255.255.0.0
GATEWAY=192.168.1.1
NM_CONTROLLED=no
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no

>> I am at a loss and do not know why this is NAT. I am aware what NAT
>> means, but what am I supposed to reconfigure here to solve the problem?
>As long as you stay within the subnet you are running on your bridge
>you won't get natted, but once it starts to route via the host the libvirt
>default bridge will be natted.
>What you can do is connect the bridges on your 2 hosts via layer 2.
>Possible ways should be OpenVPN, knet, VLAN on your switches ...
>(and yes - a cable )
>If your guests are using DHCP you should probably configure
>fixed IPs for those MACs.

All our servers have fixed IPs; DHCP is not used anywhere in our network for dynamic IP assignment.

Regarding the "check if the VMs are natted": is this solved by the ssh test suggested by Strahil Nikolov? Can I assume natting is not a problem here, or do we still have to take measures?

kind regards
Stefan Schmitz

On 18.07.2020 at 02:36, Reid Wahl wrote:
> I'm not sure that the libvirt backend is intended to be used in this
> way, with multiple hosts using the same multicast address. From the
> fence_virt.conf man page:
>
> ~~~
> BACKENDS
> libvirt
>     The libvirt plugin is the simplest plugin. It is used in
>     environments where routing fencing requests between
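A few checks that can answer the NAT question more directly than the ssh test; kvm101 and br0 are taken from the interface listings elsewhere in the thread, so treat the exact names as assumptions:

# virsh domiflist kvm101                  # which bridge/network the guest NIC is attached to
# ip link show master br0                 # tap devices currently enslaved to br0
# iptables -t nat -L POSTROUTING -n -v    # MASQUERADE rules that libvirt adds for NAT networks

If the guest interface sits on br0 (the bridge carrying the 192.168.0.0/16 addresses) and no MASQUERADE rule matches that subnet, traffic between the VMs and the hosts stays plain layer 2 and NAT is not in the picture.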
Re: [ClusterLabs] Still Beginner STONITH Problem
02.07.2020 18:18, stefan.schm...@farmpartner-tec.com пишет: > Hello, > > I hope someone can help with this problem. We are (still) trying to get > Stonith to achieve a running active/active HA Cluster, but sadly to no > avail. > > There are 2 Centos Hosts. On each one there is a virtual Ubuntu VM. The > Ubuntu VMs are the ones which should form the HA Cluster. > > The current status is this: > > # pcs status > Cluster name: pacemaker_cluster > WARNING: corosync and pacemaker node names do not match (IPs used in > setup?) > Stack: corosync > Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with > quorum > Last updated: Thu Jul 2 17:03:53 2020 > Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on > server4ubuntu1 > > 2 nodes configured > 13 resources configured > > Online: [ server2ubuntu1 server4ubuntu1 ] > > Full list of resources: > > stonith_id_1 (stonith:external/libvirt): Stopped external/libvirt is unrelated to fence_xvm > Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker] > Masters: [ server4ubuntu1 ] > Slaves: [ server2ubuntu1 ] > Master/Slave Set: WebDataClone [WebData] > Masters: [ server2ubuntu1 server4ubuntu1 ] > Clone Set: dlm-clone [dlm] > Started: [ server2ubuntu1 server4ubuntu1 ] > Clone Set: ClusterIP-clone [ClusterIP] (unique) > ClusterIP:0 (ocf::heartbeat:IPaddr2): Started > server2ubuntu1 > ClusterIP:1 (ocf::heartbeat:IPaddr2): Started > server4ubuntu1 > Clone Set: WebFS-clone [WebFS] > Started: [ server4ubuntu1 ] > Stopped: [ server2ubuntu1 ] > Clone Set: WebSite-clone [WebSite] > Started: [ server4ubuntu1 ] > Stopped: [ server2ubuntu1 ] > > Failed Actions: > * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201, > status=Error, exitreason='', > last-rc-change='Thu Jul 2 14:37:35 2020', queued=0ms, exec=3403ms > * r0_pacemaker_monitor_6 on server2ubuntu1 'master' (8): call=203, > status=complete, exitreason='', > last-rc-change='Thu Jul 2 14:38:39 2020', queued=0ms, exec=0ms > * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202, > status=Error, exitreason='', > last-rc-change='Thu Jul 2 14:37:39 2020', queued=0ms, exec=3411ms > > > The stonith resoursce is stopped and does not seem to work. > On both hosts the command > # fence_xvm -o list > kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on > > returns the local VM. Apparently it connects through the Virtualization > interface because it returns the VM name not the Hostname of the client > VM. I do not know if this is how it is supposed to work? > fence_xvm opens TCP listening socket, sends request and waits for connection to this socket (from fence_virtd) which is used to submit actual fencing operation. Only the first connection request is handled. So first host that responds will be processed. Local host is likely always faster to respond than remote host. > In the local network, every traffic is allowed. No firewall is locally > active, just the connections leaving the local network are firewalled. > Hence there are no coneection problems between the hosts and clients. > For example we can succesfully connect from the clients to the Hosts: > > # nc -z -v -u 192.168.1.21 1229 > Ncat: Version 7.50 ( https://nmap.org/ncat ) > Ncat: Connected to 192.168.1.21:1229. > Ncat: UDP packet sent successfully > Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds. > > # nc -z -v -u 192.168.1.13 1229 > Ncat: Version 7.50 ( https://nmap.org/ncat ) > Ncat: Connected to 192.168.1.13:1229. 
> Ncat: UDP packet sent successfully
> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>
> On the Ubuntu VMs we created and configured the stonith resource
> according to the howto provided here:
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>
> The actual line we used:
> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
> hostlist="Host4,host2" hypervisor_uri="qemu+ssh://192.168.1.21/system"

Again - external/libvirt is completely unrelated to fence_virt.

> But as you can see in the pcs status output, stonith is stopped and
> exits with an unknown error.
>
> Can somebody please advise on how to proceed or what additional
> information is needed to solve this problem?
> Any help would be greatly appreciated! Thank you in advance.
>
> Kind regards
> Stefan Schmitz
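Once the daemon is reachable, a way to test the whole fencing path end to end from the cluster nodes themselves (this assumes fence-virt is installed on the guests and that they carry the same /etc/cluster/fence_xvm.key as the hosts; kvm101/kvm102 stand in for the real guest names):

# fence_xvm -a 225.0.0.12 -o list               # should list the guest(s) managed by that host's fence_virtd
# fence_xvm -a 225.0.0.12 -H kvm101 -o status   # status query for one specific domain

Only when these commands work from both VMs does it make sense to put the same address and domain parameters into the cluster's stonith resources.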
Re: [ClusterLabs] Still Beginner STONITH Problem
I'm not sure that the libvirt backend is intended to be used in this way, with multiple hosts using the same multicast address. From the fence_virt.conf man page: ~~~ BACKENDS libvirt The libvirt plugin is the simplest plugin. It is used in environments where routing fencing requests between multiple hosts is not required, for example by a user running a cluster of virtual machines on a single desktop computer. libvirt-qmf The libvirt-qmf plugin acts as a QMFv2 Console to the libvirt-qmf daemon in order to route fencing requests over AMQP to the appropriate computer. cpg The cpg plugin uses corosync CPG and libvirt to track virtual machines and route fencing requests to the appropriate computer. ~~~ I'm not an expert on fence_xvm or libvirt. It's possible that this is a viable configuration with the libvirt backend. However, when users want to configure fence_xvm for multiple hosts with the libvirt backend, I have typically seen them configure multiple fence_xvm devices (one per host) and configure a different multicast address on each host. If you have a Red Hat account, see also: - https://access.redhat.com/solutions/2386421#comment-1209661 - https://access.redhat.com/solutions/2386421#comment-1209801 On Fri, Jul 17, 2020 at 7:49 AM Strahil Nikolov wrote: > The simplest way to check if the libvirt's network is NAT (or not) is to > try to ssh from the first VM to the second one. > > I should admit that I was lost when I tried to create a routed network > in KVM, so I can't help with that. > > Best Regards, > Strahil Nikolov > > На 17 юли 2020 г. 16:56:44 GMT+03:00, "stefan.schm...@farmpartner-tec.com" > написа: > >Hello, > > > >I have now managed to get # fence_xvm -a 225.0.0.12 -o list to list at > >least its local Guest again. It seems the fence_virtd was not working > >properly anymore. > > > >Regarding the Network XML config > > > ># cat default.xml > > > > default > > > > > > > > > > > > > > > > > > > >I have used "virsh net-edit default" to test other network Devices on > >the hosts but this did not change anything. > > > >Regarding the statement > > > > > If it is created by libvirt - this is NAT and you will never > > > receive output from the other host. > > > >I am at a loss an do not know why this is NAT. I am aware what NAT > >means, but what am I supposed to reconfigure here to dolve the problem? > >Any help would be greatly appreciated. > >Thank you in advance. > > > >Kind regards > >Stefan Schmitz > > > > > >Am 15.07.2020 um 16:48 schrieb stefan.schm...@farmpartner-tec.com: > >> > >> Am 15.07.2020 um 16:29 schrieb Klaus Wenninger: > >>> On 7/15/20 4:21 PM, stefan.schm...@farmpartner-tec.com wrote: > Hello, > > > Am 15.07.2020 um 15:30 schrieb Klaus Wenninger: > > On 7/15/20 3:15 PM, Strahil Nikolov wrote: > >> If it is created by libvirt - this is NAT and you will never > >> receive output from the other host. > > And twice the same subnet behind NAT is probably giving > > issues at other places as well. > > And if using DHCP you have to at least enforce that both sides > > don't go for the same IP at least. > > But all no explanation why it doesn't work on the same host. > > Which is why I was asking for running the service on the > > bridge to check if that would work at least. So that we > > can go forward step by step. > > I just now finished trying and testing it on both hosts. > I ran # fence_virtd -c on both hosts and entered different network > devices. On both I tried br0 and the kvm10x.0. 
> >>> According to your libvirt-config I would have expected > >>> the bridge to be virbr0. > >> > >> I understand that, but an "virbr0" Device does not seem to exist on > >any > >> of the two hosts. > >> > >> # ip link show > >> 1: lo: mtu 65536 qdisc noqueue state UNKNOWN > >mode > >> DEFAULT group default qlen 1000 > >> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 > >> 2: eno1: mtu 1500 qdisc mq > >> master bond0 state UP mode DEFAULT group default qlen 1000 > >> link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff > >> 3: enp216s0f0: mtu 1500 qdisc noop state DOWN > >mode > >> DEFAULT group default qlen 1000 > >> link/ether ac:1f:6b:26:69:dc brd ff:ff:ff:ff:ff:ff > >> 4: eno2: mtu 1500 qdisc mq > >> master bond0 state UP mode DEFAULT group default qlen 1000 > >> link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff > >> 5: enp216s0f1: mtu 1500 qdisc noop state DOWN > >mode > >> DEFAULT group default qlen 1000 > >> link/ether ac:1f:6b:26:69:dd brd ff:ff:ff:ff:ff:ff > >> 6: bond0: mtu 1500 qdisc > >> noqueue master br0 state UP mode DEFAULT group default qlen 1000 > >> link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff > >> 7: br0: mtu 1500 qdisc noqueue > >state > >> UP mode DEFAULT group default qlen 1000 > >> link/ether 0c:c4:7a:fb:30:1a brd
Re: [ClusterLabs] Still Beginner STONITH Problem
The simplest way to check if the libvirt's network is NAT (or not) is to try to ssh from the first VM to the second one. I should admit that I was lost when I tried to create a routed network in KVM, so I can't help with that. Best Regards, Strahil Nikolov На 17 юли 2020 г. 16:56:44 GMT+03:00, "stefan.schm...@farmpartner-tec.com" написа: >Hello, > >I have now managed to get # fence_xvm -a 225.0.0.12 -o list to list at >least its local Guest again. It seems the fence_virtd was not working >properly anymore. > >Regarding the Network XML config > ># cat default.xml > > default > > > > > > > > > >I have used "virsh net-edit default" to test other network Devices on >the hosts but this did not change anything. > >Regarding the statement > > > If it is created by libvirt - this is NAT and you will never > > receive output from the other host. > >I am at a loss an do not know why this is NAT. I am aware what NAT >means, but what am I supposed to reconfigure here to dolve the problem? >Any help would be greatly appreciated. >Thank you in advance. > >Kind regards >Stefan Schmitz > > >Am 15.07.2020 um 16:48 schrieb stefan.schm...@farmpartner-tec.com: >> >> Am 15.07.2020 um 16:29 schrieb Klaus Wenninger: >>> On 7/15/20 4:21 PM, stefan.schm...@farmpartner-tec.com wrote: Hello, Am 15.07.2020 um 15:30 schrieb Klaus Wenninger: > On 7/15/20 3:15 PM, Strahil Nikolov wrote: >> If it is created by libvirt - this is NAT and you will never >> receive output from the other host. > And twice the same subnet behind NAT is probably giving > issues at other places as well. > And if using DHCP you have to at least enforce that both sides > don't go for the same IP at least. > But all no explanation why it doesn't work on the same host. > Which is why I was asking for running the service on the > bridge to check if that would work at least. So that we > can go forward step by step. I just now finished trying and testing it on both hosts. I ran # fence_virtd -c on both hosts and entered different network devices. On both I tried br0 and the kvm10x.0. >>> According to your libvirt-config I would have expected >>> the bridge to be virbr0. >> >> I understand that, but an "virbr0" Device does not seem to exist on >any >> of the two hosts. 
>> >> # ip link show >> 1: lo: mtu 65536 qdisc noqueue state UNKNOWN >mode >> DEFAULT group default qlen 1000 >> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 >> 2: eno1: mtu 1500 qdisc mq >> master bond0 state UP mode DEFAULT group default qlen 1000 >> link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff >> 3: enp216s0f0: mtu 1500 qdisc noop state DOWN >mode >> DEFAULT group default qlen 1000 >> link/ether ac:1f:6b:26:69:dc brd ff:ff:ff:ff:ff:ff >> 4: eno2: mtu 1500 qdisc mq >> master bond0 state UP mode DEFAULT group default qlen 1000 >> link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff >> 5: enp216s0f1: mtu 1500 qdisc noop state DOWN >mode >> DEFAULT group default qlen 1000 >> link/ether ac:1f:6b:26:69:dd brd ff:ff:ff:ff:ff:ff >> 6: bond0: mtu 1500 qdisc >> noqueue master br0 state UP mode DEFAULT group default qlen 1000 >> link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff >> 7: br0: mtu 1500 qdisc noqueue >state >> UP mode DEFAULT group default qlen 1000 >> link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff >> 8: kvm101.0: mtu 1500 qdisc >pfifo_fast >> master br0 state UNKNOWN mode DEFAULT group default qlen 1000 >> link/ether fe:16:3c:ba:10:6c brd ff:ff:ff:ff:ff:ff >> >> >> After each reconfiguration I ran #fence_xvm -a 225.0.0.12 -o list On the second server it worked with each device. After that I reconfigured back to the normal device, bond0, on which it did not work anymore, it worked now again! # fence_xvm -a 225.0.0.12 -o list kvm102 >bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on But anyhow not on the first server, it did not work with any >device. # fence_xvm -a 225.0.0.12 -o list always resulted in Timed out waiting for response Operation failed Am 15.07.2020 um 15:15 schrieb Strahil Nikolov: > If it is created by libvirt - this is NAT and you will never >receive output from the other host. > To my knowledge this is configured by libvirt. At least I am not >aware having changend or configured it in any way. Up until today I did >not even know that file existed. Could you please advise on what I need >to do to fix this issue? Kind regards > Is pacemaker/corosync/knet btw. using the same interfaces/IPs? > > Klaus >> >> Best Regards, >> Strahil Nikolov >> >> На 15 юли 2020 г. 15:05:48 GMT+03:00, >> "stefan.schm...@farmpartner-tec.com" >> написа: >>> Hello,
Re: [ClusterLabs] Still Beginner STONITH Problem
On 7/17/20 3:56 PM, stefan.schm...@farmpartner-tec.com wrote:
> Hello,
>
> I have now managed to get # fence_xvm -a 225.0.0.12 -o list to list at
> least its local Guest again. It seems the fence_virtd was not working
> properly anymore.
>
> Regarding the Network XML config
>
> # cat default.xml
> default
>
> I have used "virsh net-edit default" to test other network Devices on
> the hosts but this did not change anything.

I have a similar networking setup with libvirt and it behaves as expected - qemu tap devices are enslaved to that bridge. But I have explicitly configured nat ( ) rather than relying on the default. And instead of a DHCP range I'm using fixed IP assignment.

What does the VM part of your network config look like? I have something like:

> Regarding the statement
>
> > If it is created by libvirt - this is NAT and you will never
> > receive output from the other host.
>
> I am at a loss and do not know why this is NAT. I am aware what NAT
> means, but what am I supposed to reconfigure here to solve the problem?

As long as you stay within the subnet you are running on your bridge you won't get natted, but once it starts to route via the host, the libvirt default bridge will be natted. What you can do is connect the bridges on your 2 hosts via layer 2. Possible ways should be OpenVPN, knet, VLAN on your switches ... (and yes - a cable ;-) ). If your guests are using DHCP you should probably configure fixed IPs for those MACs.

Klaus

> Any help would be greatly appreciated.
> Thank you in advance.
>
> Kind regards
> Stefan Schmitz
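For completeness, a sketch of what "keeping the guests on the hosts' existing layer-2 segment" can look like in libvirt, so that no NAT is involved at all. This is an assumption about how the guests could be wired up, not a description of the virtualization software actually used here - in the domain XML the guest NIC is attached straight to the host bridge:

    <interface type='bridge'>
      <source bridge='br0'/>
      <model type='virtio'/>
    </interface>

Alternatively, a libvirt network with <forward mode='bridge'/> pointing at br0 achieves the same thing. Either way both VMs and both hosts then share one subnet/broadcast domain, and the fence_xvm multicast never has to cross a natted virbr0.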
Re: [ClusterLabs] Still Beginner STONITH Problem
Hello, I have now managed to get # fence_xvm -a 225.0.0.12 -o list to list at least its local Guest again. It seems the fence_virtd was not working properly anymore. Regarding the Network XML config # cat default.xml default I have used "virsh net-edit default" to test other network Devices on the hosts but this did not change anything. Regarding the statement > If it is created by libvirt - this is NAT and you will never > receive output from the other host. I am at a loss an do not know why this is NAT. I am aware what NAT means, but what am I supposed to reconfigure here to dolve the problem? Any help would be greatly appreciated. Thank you in advance. Kind regards Stefan Schmitz Am 15.07.2020 um 16:48 schrieb stefan.schm...@farmpartner-tec.com: Am 15.07.2020 um 16:29 schrieb Klaus Wenninger: On 7/15/20 4:21 PM, stefan.schm...@farmpartner-tec.com wrote: Hello, Am 15.07.2020 um 15:30 schrieb Klaus Wenninger: On 7/15/20 3:15 PM, Strahil Nikolov wrote: If it is created by libvirt - this is NAT and you will never receive output from the other host. And twice the same subnet behind NAT is probably giving issues at other places as well. And if using DHCP you have to at least enforce that both sides don't go for the same IP at least. But all no explanation why it doesn't work on the same host. Which is why I was asking for running the service on the bridge to check if that would work at least. So that we can go forward step by step. I just now finished trying and testing it on both hosts. I ran # fence_virtd -c on both hosts and entered different network devices. On both I tried br0 and the kvm10x.0. According to your libvirt-config I would have expected the bridge to be virbr0. I understand that, but an "virbr0" Device does not seem to exist on any of the two hosts. # ip link show 1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: eno1: mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000 link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff 3: enp216s0f0: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ac:1f:6b:26:69:dc brd ff:ff:ff:ff:ff:ff 4: eno2: mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000 link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff 5: enp216s0f1: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ac:1f:6b:26:69:dd brd ff:ff:ff:ff:ff:ff 6: bond0: mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000 link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff 7: br0: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff 8: kvm101.0: mtu 1500 qdisc pfifo_fast master br0 state UNKNOWN mode DEFAULT group default qlen 1000 link/ether fe:16:3c:ba:10:6c brd ff:ff:ff:ff:ff:ff After each reconfiguration I ran #fence_xvm -a 225.0.0.12 -o list On the second server it worked with each device. After that I reconfigured back to the normal device, bond0, on which it did not work anymore, it worked now again! # fence_xvm -a 225.0.0.12 -o list kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on But anyhow not on the first server, it did not work with any device. # fence_xvm -a 225.0.0.12 -o list always resulted in Timed out waiting for response Operation failed Am 15.07.2020 um 15:15 schrieb Strahil Nikolov: If it is created by libvirt - this is NAT and you will never receive output from the other host. 
To my knowledge this is configured by libvirt. At least I am not aware having changend or configured it in any way. Up until today I did not even know that file existed. Could you please advise on what I need to do to fix this issue? Kind regards Is pacemaker/corosync/knet btw. using the same interfaces/IPs? Klaus Best Regards, Strahil Nikolov На 15 юли 2020 г. 15:05:48 GMT+03:00, "stefan.schm...@farmpartner-tec.com" написа: Hello, Am 15.07.2020 um 13:42 Strahil Nikolov wrote: By default libvirt is using NAT and not routed network - in such case, vm1 won't receive data from host2. Can you provide the Networks' xml ? Best Regards, Strahil Nikolov # cat default.xml default I just checked this and the file is identical on both hosts. kind regards Stefan Schmitz На 15 юли 2020 г. 13:19:59 GMT+03:00, Klaus Wenninger написа: On 7/15/20 11:42 AM, stefan.schm...@farmpartner-tec.com wrote: Hello, Am 15.07.2020 um 06:32 Strahil Nikolov wrote: How did you configure the network on your ubuntu 20.04 Hosts ? I tried to setup bridged connection for the test setup , but obviously I'm missing something. Best Regards, Strahil Nikolov on the hosts (CentOS) the bridge
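A way to make the ssh test more conclusive: ssh only shows connectivity, not whether the packets are translated on the way. From one VM, something like the following can be run (192.168.1.14 is a placeholder for the other VM's address):

# ip route get 192.168.1.14        # should leave directly via the local interface, with no "via <gateway>" hop
# ip neigh | grep 192.168.1.14     # an ARP entry with the peer's real MAC means plain layer 2, no routing/NAT

If both look like that, the VMs talk to each other on one subnet and NAT is not what is breaking the multicast queries.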
Re: [ClusterLabs] Still Beginner STONITH Problem
If it is created by libvirt - this is NAT and you will never receive output from the other host. Best Regards, Strahil Nikolov На 15 юли 2020 г. 15:05:48 GMT+03:00, "stefan.schm...@farmpartner-tec.com" написа: >Hello, > >Am 15.07.2020 um 13:42 Strahil Nikolov wrote: >> By default libvirt is using NAT and not routed network - in such >case, vm1 won't receive data from host2. >> >> Can you provide the Networks' xml ? >> >> Best Regards, >> Strahil Nikolov >> > ># cat default.xml > > default > > > > > > > > > >I just checked this and the file is identical on both hosts. > >kind regards >Stefan Schmitz > > >> На 15 юли 2020 г. 13:19:59 GMT+03:00, Klaus Wenninger > написа: >>> On 7/15/20 11:42 AM, stefan.schm...@farmpartner-tec.com wrote: Hello, Am 15.07.2020 um 06:32 Strahil Nikolov wrote: > How did you configure the network on your ubuntu 20.04 Hosts ? I > tried to setup bridged connection for the test setup , but >>> obviously > I'm missing something. > > Best Regards, > Strahil Nikolov > on the hosts (CentOS) the bridge config looks like that.The >bridging and configuration is handled by the virtualization software: # cat ifcfg-br0 DEVICE=br0 TYPE=Bridge BOOTPROTO=static ONBOOT=yes IPADDR=192.168.1.21 NETMASK=255.255.0.0 GATEWAY=192.168.1.1 NM_CONTROLLED=no IPV6_AUTOCONF=yes IPV6_DEFROUTE=yes IPV6_PEERDNS=yes IPV6_PEERROUTES=yes IPV6_FAILURE_FATAL=no Am 15.07.2020 um 09:50 Klaus Wenninger wrote: > Guess it is not easy to have your servers connected physically for >>> a try. > But maybe you can at least try on one host to have virt_fenced & >VM > on the same bridge - just to see if that basic pattern is working. I am not sure if I understand you correctly. What do you by having them on the same bridge? The bridge device is configured on the >host by the virtualization software. >>> I meant to check out which bridge the interface of the VM is >enslaved >>> to and to use that bridge as interface in /etc/fence_virt.conf. >>> Get me right - just for now - just to see if it is working for this >one >>> host and the corresponding guest. > Well maybe still sbdy in the middle playing IGMPv3 or the request >>> for > a certain source is needed to shoot open some firewall or >>> switch-tables. I am still waiting for the final report from our Data Center techs. >I hope that will clear up somethings. Additionally I have just noticed that apparently since switching >>> from IGMPv3 to IGMPv2 and back the command "fence_xvm -a 225.0.0.12 -o list" is no completely broken. Before that switch this command at least returned the local VM. Now >>> it returns: Timed out waiting for response Operation failed I am a bit confused by that, because all we did was running >commands like "sysctl -w net.ipv4.conf.all.force_igmp_version =" with the different Version umbers and #cat /proc/net/igmp shows that V3 is >>> used again on every device just like before...?! kind regards Stefan Schmitz > На 14 юли 2020 г. 11:06:42 GMT+03:00, > "stefan.schm...@farmpartner-tec.com" > написа: >> Hello, >> >> >> Am 09.07.2020 um 19:10 Strahil Nikolov wrote: >>> Have you run 'fence_virtd -c' ? >> Yes I had run that on both Hosts. The current config looks like >>> that >> and >> is identical on both. 
>>
>> cat fence_virt.conf
>> fence_virtd {
>>      listener = "multicast";
>>      backend = "libvirt";
>>      module_path = "/usr/lib64/fence-virt";
>> }
>>
>> listeners {
>>      multicast {
>>              key_file = "/etc/cluster/fence_xvm.key";
>>              address = "225.0.0.12";
>>              interface = "bond0";
>>              family = "ipv4";
>>              port = "1229";
>>      }
>> }
>>
>> backends {
>>      libvirt {
>>              uri = "qemu:///system";
>>      }
>> }
>>
>> The situation is still that no matter on what host I issue the
>> "fence_xvm -a 225.0.0.12 -o list" command, both guest systems receive
>> the traffic. The local guest, but also the guest on the other host. I
>> reckon that means the traffic is not filtered by any network device,
>> like switches or firewalls. Since the guest on the other host receives
>> the packets, the traffic must reach the physical server and network
>> device and is then routed to the VM on that host.
>> But still, the traffic is not shown on the host itself.
>>
>> Further, the local firewalls on both hosts are set to let each and every
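One variant worth trying, in line with the suggestion elsewhere in the thread to run the listener on the bridge the guests are actually attached to (br0 in the interface listings) rather than on the bond - a sketch of just the listener block, everything else unchanged:

    listeners {
            multicast {
                    key_file = "/etc/cluster/fence_xvm.key";
                    address = "225.0.0.12";
                    interface = "br0";
                    family = "ipv4";
                    port = "1229";
            }
    }

fence_virtd has to be restarted after the change, and on the second host the same block would carry that host's own multicast address.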
Re: [ClusterLabs] Still Beginner STONITH Problem
By default libvirt is using NAT and not routed network - in such case, vm1 won't receive data from host2. Can you provide the Networks' xml ? Best Regards, Strahil Nikolov На 15 юли 2020 г. 13:19:59 GMT+03:00, Klaus Wenninger написа: >On 7/15/20 11:42 AM, stefan.schm...@farmpartner-tec.com wrote: >> Hello, >> >> >> Am 15.07.2020 um 06:32 Strahil Nikolov wrote: >>> How did you configure the network on your ubuntu 20.04 Hosts ? I >>> tried to setup bridged connection for the test setup , but >obviously >>> I'm missing something. >>> >>> Best Regards, >>> Strahil Nikolov >>> >> >> on the hosts (CentOS) the bridge config looks like that.The bridging >> and configuration is handled by the virtualization software: >> >> # cat ifcfg-br0 >> DEVICE=br0 >> TYPE=Bridge >> BOOTPROTO=static >> ONBOOT=yes >> IPADDR=192.168.1.21 >> NETMASK=255.255.0.0 >> GATEWAY=192.168.1.1 >> NM_CONTROLLED=no >> IPV6_AUTOCONF=yes >> IPV6_DEFROUTE=yes >> IPV6_PEERDNS=yes >> IPV6_PEERROUTES=yes >> IPV6_FAILURE_FATAL=no >> >> >> >> Am 15.07.2020 um 09:50 Klaus Wenninger wrote: >> > Guess it is not easy to have your servers connected physically for >a >> try. >> > But maybe you can at least try on one host to have virt_fenced & VM >> > on the same bridge - just to see if that basic pattern is working. >> >> I am not sure if I understand you correctly. What do you by having >> them on the same bridge? The bridge device is configured on the host >> by the virtualization software. >I meant to check out which bridge the interface of the VM is enslaved >to and to use that bridge as interface in /etc/fence_virt.conf. >Get me right - just for now - just to see if it is working for this one >host and the corresponding guest. >> >> >> >Well maybe still sbdy in the middle playing IGMPv3 or the request >for >> >a certain source is needed to shoot open some firewall or >switch-tables. >> >> I am still waiting for the final report from our Data Center techs. I >> hope that will clear up somethings. >> >> >> Additionally I have just noticed that apparently since switching >from >> IGMPv3 to IGMPv2 and back the command "fence_xvm -a 225.0.0.12 -o >> list" is no completely broken. >> Before that switch this command at least returned the local VM. Now >it >> returns: >> Timed out waiting for response >> Operation failed >> >> I am a bit confused by that, because all we did was running commands >> like "sysctl -w net.ipv4.conf.all.force_igmp_version =" with the >> different Version umbers and #cat /proc/net/igmp shows that V3 is >used >> again on every device just like before...?! >> >> kind regards >> Stefan Schmitz >> >> >>> На 14 юли 2020 г. 11:06:42 GMT+03:00, >>> "stefan.schm...@farmpartner-tec.com" >>> написа: Hello, Am 09.07.2020 um 19:10 Strahil Nikolov wrote: > Have you run 'fence_virtd -c' ? Yes I had run that on both Hosts. The current config looks like >that and is identical on both. cat fence_virt.conf fence_virtd { listener = "multicast"; backend = "libvirt"; module_path = "/usr/lib64/fence-virt"; } listeners { multicast { key_file = "/etc/cluster/fence_xvm.key"; address = "225.0.0.12"; interface = "bond0"; family = "ipv4"; port = "1229"; } } backends { libvirt { uri = "qemu:///system"; } } The situation is still that no matter on what host I issue the "fence_xvm -a 225.0.0.12 -o list" command, both guest systems >receive the traffic. The local guest, but also the guest on the other host. >I reckon that means the traffic is not filtered by any network >device, like switches or firewalls. 
Since the guest on the other host >receives the packages, the traffic must reach te physical server and networkdevice and is then routed to the VM on that host. But still, the traffic is not shown on the host itself. Further the local firewalls on both hosts are set to let each and >every traffic pass. Accept to any and everything. Well at least as far as >I can see. Am 09.07.2020 um 22:34 Klaus Wenninger wrote: > makes me believe that > the whole setup doesn't lookas I would have > expected (bridges on each host where theguest > has a connection to and where ethernet interfaces > that connect the 2 hosts are part of as well On each physical server the networkcards are bonded to achieve >failure safety (bond0). The guest are connected over a bridge(br0) but apparently our virtualization softrware creates an own device named after the guest (kvm101.0). There is no direct connection between the servers, but as I said earlier, the multicast traffic does reach the VMs so I assume there
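One way to narrow down where the queries get lost, using the interface names and address from the configuration above (run the capture on the host while the peer issues the query):

# tcpdump -i br0 -n 'udp port 1229 or igmp'     # on the host; repeat with -i bond0 to compare

If the request from the other side shows up here but fence_virtd never answers, the listener is bound to the wrong interface or address; if it never shows up at all, the problem is still in the network path between the hosts.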
Re: [ClusterLabs] Still Beginner STONITH Problem
Am 15.07.2020 um 16:29 schrieb Klaus Wenninger: On 7/15/20 4:21 PM, stefan.schm...@farmpartner-tec.com wrote: Hello, Am 15.07.2020 um 15:30 schrieb Klaus Wenninger: On 7/15/20 3:15 PM, Strahil Nikolov wrote: If it is created by libvirt - this is NAT and you will never receive output from the other host. And twice the same subnet behind NAT is probably giving issues at other places as well. And if using DHCP you have to at least enforce that both sides don't go for the same IP at least. But all no explanation why it doesn't work on the same host. Which is why I was asking for running the service on the bridge to check if that would work at least. So that we can go forward step by step. I just now finished trying and testing it on both hosts. I ran # fence_virtd -c on both hosts and entered different network devices. On both I tried br0 and the kvm10x.0. According to your libvirt-config I would have expected the bridge to be virbr0. I understand that, but an "virbr0" Device does not seem to exist on any of the two hosts. # ip link show 1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: eno1: mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000 link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff 3: enp216s0f0: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ac:1f:6b:26:69:dc brd ff:ff:ff:ff:ff:ff 4: eno2: mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000 link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff 5: enp216s0f1: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether ac:1f:6b:26:69:dd brd ff:ff:ff:ff:ff:ff 6: bond0: mtu 1500 qdisc noqueue master br0 state UP mode DEFAULT group default qlen 1000 link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff 7: br0: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 0c:c4:7a:fb:30:1a brd ff:ff:ff:ff:ff:ff 8: kvm101.0: mtu 1500 qdisc pfifo_fast master br0 state UNKNOWN mode DEFAULT group default qlen 1000 link/ether fe:16:3c:ba:10:6c brd ff:ff:ff:ff:ff:ff After each reconfiguration I ran #fence_xvm -a 225.0.0.12 -o list On the second server it worked with each device. After that I reconfigured back to the normal device, bond0, on which it did not work anymore, it worked now again! # fence_xvm -a 225.0.0.12 -o list kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on But anyhow not on the first server, it did not work with any device. # fence_xvm -a 225.0.0.12 -o list always resulted in Timed out waiting for response Operation failed Am 15.07.2020 um 15:15 schrieb Strahil Nikolov: If it is created by libvirt - this is NAT and you will never receive output from the other host. To my knowledge this is configured by libvirt. At least I am not aware having changend or configured it in any way. Up until today I did not even know that file existed. Could you please advise on what I need to do to fix this issue? Kind regards Is pacemaker/corosync/knet btw. using the same interfaces/IPs? Klaus Best Regards, Strahil Nikolov На 15 юли 2020 г. 15:05:48 GMT+03:00, "stefan.schm...@farmpartner-tec.com" написа: Hello, Am 15.07.2020 um 13:42 Strahil Nikolov wrote: By default libvirt is using NAT and not routed network - in such case, vm1 won't receive data from host2. Can you provide the Networks' xml ? Best Regards, Strahil Nikolov # cat default.xml default I just checked this and the file is identical on both hosts. 
kind regards Stefan Schmitz На 15 юли 2020 г. 13:19:59 GMT+03:00, Klaus Wenninger написа: On 7/15/20 11:42 AM, stefan.schm...@farmpartner-tec.com wrote: Hello, Am 15.07.2020 um 06:32 Strahil Nikolov wrote: How did you configure the network on your ubuntu 20.04 Hosts ? I tried to setup bridged connection for the test setup , but obviously I'm missing something. Best Regards, Strahil Nikolov on the hosts (CentOS) the bridge config looks like that.The bridging and configuration is handled by the virtualization software: # cat ifcfg-br0 DEVICE=br0 TYPE=Bridge BOOTPROTO=static ONBOOT=yes IPADDR=192.168.1.21 NETMASK=255.255.0.0 GATEWAY=192.168.1.1 NM_CONTROLLED=no IPV6_AUTOCONF=yes IPV6_DEFROUTE=yes IPV6_PEERDNS=yes IPV6_PEERROUTES=yes IPV6_FAILURE_FATAL=no Am 15.07.2020 um 09:50 Klaus Wenninger wrote: Guess it is not easy to have your servers connected physically for a try. But maybe you can at least try on one host to have virt_fenced & VM on the same bridge - just to see if that basic pattern is working. I am not sure if I understand you correctly. What do you by having them on the same bridge? The bridge device is configured on the host by the virtualization software. I meant to check out which bridge the interface of the VM is enslaved to
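Since no virbr0 exists on either host, it may simply be that libvirt's "default" NAT network is defined but never started, in which case the default.xml discussed earlier is not what the guests are actually using. A quick check (assuming the standard libvirt CLI is available on the CentOS hosts):

# virsh net-list --all      # shows whether the "default" network is active and autostarted

If it is inactive and the guest taps (kvm101.0 here, presumably kvm102.0 on the other host) hang off br0, the NAT definition in default.xml has no effect on this problem.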
Re: [ClusterLabs] Still Beginner STONITH Problem
On 7/15/20 4:21 PM, stefan.schm...@farmpartner-tec.com wrote: > Hello, > > > Am 15.07.2020 um 15:30 schrieb Klaus Wenninger: >> On 7/15/20 3:15 PM, Strahil Nikolov wrote: >>> If it is created by libvirt - this is NAT and you will never >>> receive output from the other host. >> And twice the same subnet behind NAT is probably giving >> issues at other places as well. >> And if using DHCP you have to at least enforce that both sides >> don't go for the same IP at least. >> But all no explanation why it doesn't work on the same host. >> Which is why I was asking for running the service on the >> bridge to check if that would work at least. So that we >> can go forward step by step. > > I just now finished trying and testing it on both hosts. > I ran # fence_virtd -c on both hosts and entered different network > devices. On both I tried br0 and the kvm10x.0. According to your libvirt-config I would have expected the bridge to be virbr0. > > After each reconfiguration I ran #fence_xvm -a 225.0.0.12 -o list > On the second server it worked with each device. After that I > reconfigured back to the normal device, bond0, on which it did not > work anymore, it worked now again! > # fence_xvm -a 225.0.0.12 -o list > kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on > > But anyhow not on the first server, it did not work with any device. > # fence_xvm -a 225.0.0.12 -o list always resulted in > Timed out waiting for response > Operation failed > > > > Am 15.07.2020 um 15:15 schrieb Strahil Nikolov: > > If it is created by libvirt - this is NAT and you will never receive > output from the other host. > > > To my knowledge this is configured by libvirt. At least I am not aware > having changend or configured it in any way. Up until today I did not > even know that file existed. Could you please advise on what I need to > do to fix this issue? > > Kind regards > > > > >> Is pacemaker/corosync/knet btw. using the same interfaces/IPs? >> >> Klaus >>> >>> Best Regards, >>> Strahil Nikolov >>> >>> На 15 юли 2020 г. 15:05:48 GMT+03:00, >>> "stefan.schm...@farmpartner-tec.com" >>> написа: Hello, Am 15.07.2020 um 13:42 Strahil Nikolov wrote: > By default libvirt is using NAT and not routed network - in such case, vm1 won't receive data from host2. > Can you provide the Networks' xml ? > > Best Regards, > Strahil Nikolov > # cat default.xml default I just checked this and the file is identical on both hosts. kind regards Stefan Schmitz > На 15 юли 2020 г. 13:19:59 GMT+03:00, Klaus Wenninger написа: >> On 7/15/20 11:42 AM, stefan.schm...@farmpartner-tec.com wrote: >>> Hello, >>> >>> >>> Am 15.07.2020 um 06:32 Strahil Nikolov wrote: How did you configure the network on your ubuntu 20.04 Hosts ? I tried to setup bridged connection for the test setup , but >> obviously I'm missing something. Best Regards, Strahil Nikolov >>> on the hosts (CentOS) the bridge config looks like that.The bridging >>> and configuration is handled by the virtualization software: >>> >>> # cat ifcfg-br0 >>> DEVICE=br0 >>> TYPE=Bridge >>> BOOTPROTO=static >>> ONBOOT=yes >>> IPADDR=192.168.1.21 >>> NETMASK=255.255.0.0 >>> GATEWAY=192.168.1.1 >>> NM_CONTROLLED=no >>> IPV6_AUTOCONF=yes >>> IPV6_DEFROUTE=yes >>> IPV6_PEERDNS=yes >>> IPV6_PEERROUTES=yes >>> IPV6_FAILURE_FATAL=no >>> >>> >>> >>> Am 15.07.2020 um 09:50 Klaus Wenninger wrote: Guess it is not easy to have your servers connected physically for >> a >>> try. 
But maybe you can at least try on one host to have virt_fenced & VM on the same bridge - just to see if that basic pattern is working. >>> I am not sure if I understand you correctly. What do you mean by having >>> them on the same bridge? The bridge device is configured on the host >>> by the virtualization software. >> I meant to check out which bridge the interface of the VM is enslaved >> to and to use that bridge as interface in /etc/fence_virt.conf. >> Get me right - just for now - just to see if it is working for this one >> host and the corresponding guest. >>> Well maybe still somebody in the middle playing IGMPv3 or the request >> for a certain source is needed to shoot open some firewall or >> switch-tables. >>> I am still waiting for the final report from our Data Center techs. I >>> hope that will clear up some things. >>> >>> Additionally I have just noticed that apparently since switching >> from >>> IGMPv3 to IGMPv2 and back the command "fence_xvm -a 225.0.0.12 -o >>> list" is now completely broken.
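For reference, one way to check which bridge the guest's interface is actually enslaved to, and to point fence_virtd at it, is sketched below. The guest name kvm102 and the bridge name br0 are only the examples already used in this thread; the names on a given system may differ:

# virsh domiflist kvm102       (the "Source" column shows the bridge the guest NIC is plugged into)
# brctl show                   (or "bridge link show"; lists the bridges and their member ports)

Then set that bridge in /etc/fence_virt.conf, e.g. interface = "br0"; and restart the daemon:

# systemctl restart fence_virtd
# fence_xvm -a 225.0.0.12 -o list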
Re: [ClusterLabs] Still Beginner STONITH Problem
Hello, Am 15.07.2020 um 15:30 schrieb Klaus Wenninger: On 7/15/20 3:15 PM, Strahil Nikolov wrote: If it is created by libvirt - this is NAT and you will never receive output from the other host. And twice the same subnet behind NAT is probably giving issues at other places as well. And if using DHCP you have to at least enforce that both sides don't go for the same IP at least. But all no explanation why it doesn't work on the same host. Which is why I was asking for running the service on the bridge to check if that would work at least. So that we can go forward step by step. I just now finished trying and testing it on both hosts. I ran # fence_virtd -c on both hosts and entered different network devices. On both I tried br0 and the kvm10x.0. After each reconfiguration I ran #fence_xvm -a 225.0.0.12 -o list On the second server it worked with each device. After that I reconfigured back to the normal device, bond0, on which it did not work anymore, it worked now again! # fence_xvm -a 225.0.0.12 -o list kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on But anyhow not on the first server, it did not work with any device. # fence_xvm -a 225.0.0.12 -o list always resulted in Timed out waiting for response Operation failed Am 15.07.2020 um 15:15 schrieb Strahil Nikolov: > If it is created by libvirt - this is NAT and you will never receive output from the other host. > To my knowledge this is configured by libvirt. At least I am not aware having changend or configured it in any way. Up until today I did not even know that file existed. Could you please advise on what I need to do to fix this issue? Kind regards Is pacemaker/corosync/knet btw. using the same interfaces/IPs? Klaus Best Regards, Strahil Nikolov На 15 юли 2020 г. 15:05:48 GMT+03:00, "stefan.schm...@farmpartner-tec.com" написа: Hello, Am 15.07.2020 um 13:42 Strahil Nikolov wrote: By default libvirt is using NAT and not routed network - in such case, vm1 won't receive data from host2. Can you provide the Networks' xml ? Best Regards, Strahil Nikolov # cat default.xml default I just checked this and the file is identical on both hosts. kind regards Stefan Schmitz На 15 юли 2020 г. 13:19:59 GMT+03:00, Klaus Wenninger написа: On 7/15/20 11:42 AM, stefan.schm...@farmpartner-tec.com wrote: Hello, Am 15.07.2020 um 06:32 Strahil Nikolov wrote: How did you configure the network on your ubuntu 20.04 Hosts ? I tried to setup bridged connection for the test setup , but obviously I'm missing something. Best Regards, Strahil Nikolov on the hosts (CentOS) the bridge config looks like that.The bridging and configuration is handled by the virtualization software: # cat ifcfg-br0 DEVICE=br0 TYPE=Bridge BOOTPROTO=static ONBOOT=yes IPADDR=192.168.1.21 NETMASK=255.255.0.0 GATEWAY=192.168.1.1 NM_CONTROLLED=no IPV6_AUTOCONF=yes IPV6_DEFROUTE=yes IPV6_PEERDNS=yes IPV6_PEERROUTES=yes IPV6_FAILURE_FATAL=no Am 15.07.2020 um 09:50 Klaus Wenninger wrote: Guess it is not easy to have your servers connected physically for a try. But maybe you can at least try on one host to have virt_fenced & VM on the same bridge - just to see if that basic pattern is working. I am not sure if I understand you correctly. What do you by having them on the same bridge? The bridge device is configured on the host by the virtualization software. I meant to check out which bridge the interface of the VM is enslaved to and to use that bridge as interface in /etc/fence_virt.conf. Get me right - just for now - just to see if it is working for this one host and the corresponding guest. 
Well maybe still somebody in the middle playing IGMPv3 or the request for a certain source is needed to shoot open some firewall or switch-tables. I am still waiting for the final report from our Data Center techs. I hope that will clear up some things. Additionally I have just noticed that apparently since switching from IGMPv3 to IGMPv2 and back the command "fence_xvm -a 225.0.0.12 -o list" is now completely broken. Before that switch this command at least returned the local VM. Now it returns: Timed out waiting for response Operation failed I am a bit confused by that, because all we did was run commands like "sysctl -w net.ipv4.conf.all.force_igmp_version=" with the different version numbers, and #cat /proc/net/igmp shows that V3 is used again on every device just like before...?! kind regards Stefan Schmitz On 14 July 2020 11:06:42 GMT+03:00, "stefan.schm...@farmpartner-tec.com" wrote: Hello, Am 09.07.2020 um 19:10 Strahil Nikolov wrote: Have you run 'fence_virtd -c' ? Yes I had run that on both Hosts. The current config looks like this and is identical on both. cat fence_virt.conf fence_virtd { listener = "multicast"; backend = "libvirt"; module_path = "/usr/lib64/fence-virt"; } listeners {
Re: [ClusterLabs] Still Beginner STONITH Problem
On 7/15/20 3:15 PM, Strahil Nikolov wrote: > If it is created by libvirt - this is NAT and you will never receive output > from the other host. And twice the same subnet behind NAT is probably giving issues at other places as well. And if using DHCP you have to at least enforce that both sides don't go for the same IP at least. But all no explanation why it doesn't work on the same host. Which is why I was asking for running the service on the bridge to check if that would work at least. So that we can go forward step by step. Is pacemaker/corosync/knet btw. using the same interfaces/IPs? Klaus > > Best Regards, > Strahil Nikolov > > На 15 юли 2020 г. 15:05:48 GMT+03:00, "stefan.schm...@farmpartner-tec.com" > написа: >> Hello, >> >> Am 15.07.2020 um 13:42 Strahil Nikolov wrote: >>> By default libvirt is using NAT and not routed network - in such >> case, vm1 won't receive data from host2. >>> Can you provide the Networks' xml ? >>> >>> Best Regards, >>> Strahil Nikolov >>> >> # cat default.xml >> >> default >> >> >> >> >> >> >> >> >> >> I just checked this and the file is identical on both hosts. >> >> kind regards >> Stefan Schmitz >> >> >>> На 15 юли 2020 г. 13:19:59 GMT+03:00, Klaus Wenninger >> написа: On 7/15/20 11:42 AM, stefan.schm...@farmpartner-tec.com wrote: > Hello, > > > Am 15.07.2020 um 06:32 Strahil Nikolov wrote: >> How did you configure the network on your ubuntu 20.04 Hosts ? I >> tried to setup bridged connection for the test setup , but obviously >> I'm missing something. >> >> Best Regards, >> Strahil Nikolov >> > on the hosts (CentOS) the bridge config looks like that.The >> bridging > and configuration is handled by the virtualization software: > > # cat ifcfg-br0 > DEVICE=br0 > TYPE=Bridge > BOOTPROTO=static > ONBOOT=yes > IPADDR=192.168.1.21 > NETMASK=255.255.0.0 > GATEWAY=192.168.1.1 > NM_CONTROLLED=no > IPV6_AUTOCONF=yes > IPV6_DEFROUTE=yes > IPV6_PEERDNS=yes > IPV6_PEERROUTES=yes > IPV6_FAILURE_FATAL=no > > > > Am 15.07.2020 um 09:50 Klaus Wenninger wrote: >> Guess it is not easy to have your servers connected physically for a > try. >> But maybe you can at least try on one host to have virt_fenced & >> VM >> on the same bridge - just to see if that basic pattern is working. > I am not sure if I understand you correctly. What do you by having > them on the same bridge? The bridge device is configured on the >> host > by the virtualization software. I meant to check out which bridge the interface of the VM is >> enslaved to and to use that bridge as interface in /etc/fence_virt.conf. Get me right - just for now - just to see if it is working for this >> one host and the corresponding guest. > >> Well maybe still sbdy in the middle playing IGMPv3 or the request for >> a certain source is needed to shoot open some firewall or switch-tables. > I am still waiting for the final report from our Data Center techs. >> I > hope that will clear up somethings. > > > Additionally I have just noticed that apparently since switching from > IGMPv3 to IGMPv2 and back the command "fence_xvm -a 225.0.0.12 -o > list" is no completely broken. > Before that switch this command at least returned the local VM. Now it > returns: > Timed out waiting for response > Operation failed > > I am a bit confused by that, because all we did was running >> commands > like "sysctl -w net.ipv4.conf.all.force_igmp_version =" with the > different Version umbers and #cat /proc/net/igmp shows that V3 is used > again on every device just like before...?! 
> > kind regards > Stefan Schmitz > > >> На 14 юли 2020 г. 11:06:42 GMT+03:00, >> "stefan.schm...@farmpartner-tec.com" >> написа: >>> Hello, >>> >>> >>> Am 09.07.2020 um 19:10 Strahil Nikolov wrote: Have you run 'fence_virtd -c' ? >>> Yes I had run that on both Hosts. The current config looks like that >>> and >>> is identical on both. >>> >>> cat fence_virt.conf >>> fence_virtd { >>> listener = "multicast"; >>> backend = "libvirt"; >>> module_path = "/usr/lib64/fence-virt"; >>> } >>> >>> listeners { >>> multicast { >>> key_file = "/etc/cluster/fence_xvm.key"; >>> address = "225.0.0.12"; >>> interface = "bond0"; >>> family = "ipv4"; >>> port = "1229"; >>> } >>> >>> } >>> >>> backends { >>> libvirt { >>> uri = "qemu:///system"; >>> } >>> >>> } >>> >>> >>> The situation is still
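Regarding the question whether pacemaker/corosync/knet uses the same interfaces/IPs, that can be checked on the cluster nodes along these lines (a sketch; ring numbering and key names vary between corosync versions):

# corosync-cfgtool -s                            (shows the rings/links and the local addresses in use)
# corosync-cmapctl | grep -e totem -e nodelist   (dumps the totem/nodelist configuration corosync actually loaded)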
Re: [ClusterLabs] Still Beginner STONITH Problem
Hello, Am 15.07.2020 um 13:42 Strahil Nikolov wrote: By default libvirt is using NAT and not routed network - in such case, vm1 won't receive data from host2. Can you provide the Networks' xml ? Best Regards, Strahil Nikolov # cat default.xml default I just checked this and the file is identical on both hosts. kind regards Stefan Schmitz На 15 юли 2020 г. 13:19:59 GMT+03:00, Klaus Wenninger написа: On 7/15/20 11:42 AM, stefan.schm...@farmpartner-tec.com wrote: Hello, Am 15.07.2020 um 06:32 Strahil Nikolov wrote: How did you configure the network on your ubuntu 20.04 Hosts ? I tried to setup bridged connection for the test setup , but obviously I'm missing something. Best Regards, Strahil Nikolov on the hosts (CentOS) the bridge config looks like that.The bridging and configuration is handled by the virtualization software: # cat ifcfg-br0 DEVICE=br0 TYPE=Bridge BOOTPROTO=static ONBOOT=yes IPADDR=192.168.1.21 NETMASK=255.255.0.0 GATEWAY=192.168.1.1 NM_CONTROLLED=no IPV6_AUTOCONF=yes IPV6_DEFROUTE=yes IPV6_PEERDNS=yes IPV6_PEERROUTES=yes IPV6_FAILURE_FATAL=no Am 15.07.2020 um 09:50 Klaus Wenninger wrote: Guess it is not easy to have your servers connected physically for a try. But maybe you can at least try on one host to have virt_fenced & VM on the same bridge - just to see if that basic pattern is working. I am not sure if I understand you correctly. What do you by having them on the same bridge? The bridge device is configured on the host by the virtualization software. I meant to check out which bridge the interface of the VM is enslaved to and to use that bridge as interface in /etc/fence_virt.conf. Get me right - just for now - just to see if it is working for this one host and the corresponding guest. Well maybe still sbdy in the middle playing IGMPv3 or the request for a certain source is needed to shoot open some firewall or switch-tables. I am still waiting for the final report from our Data Center techs. I hope that will clear up somethings. Additionally I have just noticed that apparently since switching from IGMPv3 to IGMPv2 and back the command "fence_xvm -a 225.0.0.12 -o list" is no completely broken. Before that switch this command at least returned the local VM. Now it returns: Timed out waiting for response Operation failed I am a bit confused by that, because all we did was running commands like "sysctl -w net.ipv4.conf.all.force_igmp_version =" with the different Version umbers and #cat /proc/net/igmp shows that V3 is used again on every device just like before...?! kind regards Stefan Schmitz На 14 юли 2020 г. 11:06:42 GMT+03:00, "stefan.schm...@farmpartner-tec.com" написа: Hello, Am 09.07.2020 um 19:10 Strahil Nikolov wrote: Have you run 'fence_virtd -c' ? Yes I had run that on both Hosts. The current config looks like that and is identical on both. cat fence_virt.conf fence_virtd { listener = "multicast"; backend = "libvirt"; module_path = "/usr/lib64/fence-virt"; } listeners { multicast { key_file = "/etc/cluster/fence_xvm.key"; address = "225.0.0.12"; interface = "bond0"; family = "ipv4"; port = "1229"; } } backends { libvirt { uri = "qemu:///system"; } } The situation is still that no matter on what host I issue the "fence_xvm -a 225.0.0.12 -o list" command, both guest systems receive the traffic. The local guest, but also the guest on the other host. I reckon that means the traffic is not filtered by any network device, like switches or firewalls. 
Since the guest on the other host receives the packets, the traffic must reach the physical server and network device and is then routed to the VM on that host. But still, the traffic is not shown on the host itself. Further the local firewalls on both hosts are set to let all traffic pass: accept, for anything and everything. Well, at least as far as I can see. Am 09.07.2020 um 22:34 Klaus Wenninger wrote: makes me believe that the whole setup doesn't look as I would have expected (bridges on each host where the guest has a connection to and where ethernet interfaces that connect the 2 hosts are part of as well On each physical server the network cards are bonded to achieve failure safety (bond0). The guests are connected over a bridge (br0) but apparently our virtualization software creates its own device named after the guest (kvm101.0). There is no direct connection between the servers, but as I said earlier, the multicast traffic does reach the VMs so I assume there is no problem with that. Am 09.07.2020 um 20:18 Vladislav Bogdanov wrote: First, you need to ensure that your switch (or all switches in the path) have igmp snooping enabled on host ports (and probably interconnects along the path between your hosts). Second,
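The libvirt network defined by default.xml can also be inspected directly; whether it is NAT shows up in its <forward> element (the network name "default" is taken from the file name above and may need adjusting):

# virsh net-list --all           (which libvirt networks exist and are active)
# virsh net-dumpxml default      (look for <forward mode='nat'/> and the bridge name, typically virbr0)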
Re: [ClusterLabs] Still Beginner STONITH Problem
On 7/15/20 11:42 AM, stefan.schm...@farmpartner-tec.com wrote: > Hello, > > > Am 15.07.2020 um 06:32 Strahil Nikolov wrote: >> How did you configure the network on your ubuntu 20.04 Hosts ? I >> tried to setup bridged connection for the test setup , but obviously >> I'm missing something. >> >> Best Regards, >> Strahil Nikolov >> > > on the hosts (CentOS) the bridge config looks like that.The bridging > and configuration is handled by the virtualization software: > > # cat ifcfg-br0 > DEVICE=br0 > TYPE=Bridge > BOOTPROTO=static > ONBOOT=yes > IPADDR=192.168.1.21 > NETMASK=255.255.0.0 > GATEWAY=192.168.1.1 > NM_CONTROLLED=no > IPV6_AUTOCONF=yes > IPV6_DEFROUTE=yes > IPV6_PEERDNS=yes > IPV6_PEERROUTES=yes > IPV6_FAILURE_FATAL=no > > > > Am 15.07.2020 um 09:50 Klaus Wenninger wrote: > > Guess it is not easy to have your servers connected physically for a > try. > > But maybe you can at least try on one host to have virt_fenced & VM > > on the same bridge - just to see if that basic pattern is working. > > I am not sure if I understand you correctly. What do you by having > them on the same bridge? The bridge device is configured on the host > by the virtualization software. I meant to check out which bridge the interface of the VM is enslaved to and to use that bridge as interface in /etc/fence_virt.conf. Get me right - just for now - just to see if it is working for this one host and the corresponding guest. > > > >Well maybe still sbdy in the middle playing IGMPv3 or the request for > >a certain source is needed to shoot open some firewall or switch-tables. > > I am still waiting for the final report from our Data Center techs. I > hope that will clear up somethings. > > > Additionally I have just noticed that apparently since switching from > IGMPv3 to IGMPv2 and back the command "fence_xvm -a 225.0.0.12 -o > list" is no completely broken. > Before that switch this command at least returned the local VM. Now it > returns: > Timed out waiting for response > Operation failed > > I am a bit confused by that, because all we did was running commands > like "sysctl -w net.ipv4.conf.all.force_igmp_version =" with the > different Version umbers and #cat /proc/net/igmp shows that V3 is used > again on every device just like before...?! > > kind regards > Stefan Schmitz > > >> На 14 юли 2020 г. 11:06:42 GMT+03:00, >> "stefan.schm...@farmpartner-tec.com" >> написа: >>> Hello, >>> >>> >>> Am 09.07.2020 um 19:10 Strahil Nikolov wrote: Have you run 'fence_virtd -c' ? >>> Yes I had run that on both Hosts. The current config looks like that >>> and >>> is identical on both. >>> >>> cat fence_virt.conf >>> fence_virtd { >>> listener = "multicast"; >>> backend = "libvirt"; >>> module_path = "/usr/lib64/fence-virt"; >>> } >>> >>> listeners { >>> multicast { >>> key_file = "/etc/cluster/fence_xvm.key"; >>> address = "225.0.0.12"; >>> interface = "bond0"; >>> family = "ipv4"; >>> port = "1229"; >>> } >>> >>> } >>> >>> backends { >>> libvirt { >>> uri = "qemu:///system"; >>> } >>> >>> } >>> >>> >>> The situation is still that no matter on what host I issue the >>> "fence_xvm -a 225.0.0.12 -o list" command, both guest systems receive >>> the traffic. The local guest, but also the guest on the other host. I >>> reckon that means the traffic is not filtered by any network device, >>> like switches or firewalls. Since the guest on the other host receives >>> the packages, the traffic must reach te physical server and >>> networkdevice and is then routed to the VM on that host. 
>>> But still, the traffic is not shown on the host itself. >>> >>> Further the local firewalls on both hosts are set to let each and every >>> >>> traffic pass. Accept to any and everything. Well at least as far as I >>> can see. >>> >>> >>> Am 09.07.2020 um 22:34 Klaus Wenninger wrote: makes me believe that the whole setup doesn't lookas I would have expected (bridges on each host where theguest has a connection to and where ethernet interfaces that connect the 2 hosts are part of as well >>> >>> On each physical server the networkcards are bonded to achieve failure >>> safety (bond0). The guest are connected over a bridge(br0) but >>> apparently our virtualization softrware creates an own device named >>> after the guest (kvm101.0). >>> There is no direct connection between the servers, but as I said >>> earlier, the multicast traffic does reach the VMs so I assume there is >>> no problem with that. >>> >>> >>> Am 09.07.2020 um 20:18 Vladislav Bogdanov wrote: First, you need to ensure that your switch (or all switches in the path) have igmp snooping enabled on host ports (and probably interconnects along the path between your hosts). Second, you need an igmp querier to be enabled somewhere near (better to have it enabled on a
Re: [ClusterLabs] Still Beginner STONITH Problem
Hello, Am 15.07.2020 um 06:32 Strahil Nikolov wrote: How did you configure the network on your ubuntu 20.04 Hosts ? I tried to setup bridged connection for the test setup , but obviously I'm missing something. Best Regards, Strahil Nikolov on the hosts (CentOS) the bridge config looks like that.The bridging and configuration is handled by the virtualization software: # cat ifcfg-br0 DEVICE=br0 TYPE=Bridge BOOTPROTO=static ONBOOT=yes IPADDR=192.168.1.21 NETMASK=255.255.0.0 GATEWAY=192.168.1.1 NM_CONTROLLED=no IPV6_AUTOCONF=yes IPV6_DEFROUTE=yes IPV6_PEERDNS=yes IPV6_PEERROUTES=yes IPV6_FAILURE_FATAL=no Am 15.07.2020 um 09:50 Klaus Wenninger wrote: > Guess it is not easy to have your servers connected physically for a try. > But maybe you can at least try on one host to have virt_fenced & VM > on the same bridge - just to see if that basic pattern is working. I am not sure if I understand you correctly. What do you by having them on the same bridge? The bridge device is configured on the host by the virtualization software. >Well maybe still sbdy in the middle playing IGMPv3 or the request for >a certain source is needed to shoot open some firewall or switch-tables. I am still waiting for the final report from our Data Center techs. I hope that will clear up somethings. Additionally I have just noticed that apparently since switching from IGMPv3 to IGMPv2 and back the command "fence_xvm -a 225.0.0.12 -o list" is no completely broken. Before that switch this command at least returned the local VM. Now it returns: Timed out waiting for response Operation failed I am a bit confused by that, because all we did was running commands like "sysctl -w net.ipv4.conf.all.force_igmp_version =" with the different Version umbers and #cat /proc/net/igmp shows that V3 is used again on every device just like before...?! kind regards Stefan Schmitz На 14 юли 2020 г. 11:06:42 GMT+03:00, "stefan.schm...@farmpartner-tec.com" написа: Hello, Am 09.07.2020 um 19:10 Strahil Nikolov wrote: Have you run 'fence_virtd -c' ? Yes I had run that on both Hosts. The current config looks like that and is identical on both. cat fence_virt.conf fence_virtd { listener = "multicast"; backend = "libvirt"; module_path = "/usr/lib64/fence-virt"; } listeners { multicast { key_file = "/etc/cluster/fence_xvm.key"; address = "225.0.0.12"; interface = "bond0"; family = "ipv4"; port = "1229"; } } backends { libvirt { uri = "qemu:///system"; } } The situation is still that no matter on what host I issue the "fence_xvm -a 225.0.0.12 -o list" command, both guest systems receive the traffic. The local guest, but also the guest on the other host. I reckon that means the traffic is not filtered by any network device, like switches or firewalls. Since the guest on the other host receives the packages, the traffic must reach te physical server and networkdevice and is then routed to the VM on that host. But still, the traffic is not shown on the host itself. Further the local firewalls on both hosts are set to let each and every traffic pass. Accept to any and everything. Well at least as far as I can see. Am 09.07.2020 um 22:34 Klaus Wenninger wrote: makes me believe that the whole setup doesn't lookas I would have expected (bridges on each host where theguest has a connection to and where ethernet interfaces that connect the 2 hosts are part of as well On each physical server the networkcards are bonded to achieve failure safety (bond0). 
The guests are connected over a bridge (br0) but apparently our virtualization software creates its own device named after the guest (kvm101.0). There is no direct connection between the servers, but as I said earlier, the multicast traffic does reach the VMs so I assume there is no problem with that. Am 09.07.2020 um 20:18 Vladislav Bogdanov wrote: First, you need to ensure that your switch (or all switches in the path) have igmp snooping enabled on host ports (and probably interconnects along the path between your hosts). Second, you need an igmp querier to be enabled somewhere near (better to have it enabled on a switch itself). Please verify that you see its queries on hosts. Next, you probably need to make your hosts use IGMPv2 (not 3) as many switches still cannot understand v3. This is doable by sysctl; there are many articles on the internet. I have sent a query to our data center techs, who were already analyzing whether multicast traffic is somewhere blocked or hindered. So far the answer is, "multicast is explicitly allowed in the local network and no packets are filtered or dropped". I am still waiting for a final report though. In the meantime I have switched from IGMPv3 to IGMPv2 on every involved server, hosts and guests, via the mentioned sysctl. The
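For completeness, the sysctl switching referred to above looks roughly like this (the per-interface name bond0 is only an example; already-joined groups may need to be re-joined, for instance by restarting fence_virtd, before the change becomes visible on the wire):

# sysctl -w net.ipv4.conf.all.force_igmp_version=2
# sysctl -w net.ipv4.conf.bond0.force_igmp_version=2
# cat /proc/net/igmp                                  (shows per device which IGMP version is in use)
# sysctl -w net.ipv4.conf.all.force_igmp_version=0    (0 = back to the kernel default, IGMPv3)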
Re: [ClusterLabs] Still Beginner STONITH Problem
How did you configure the network on your ubuntu 20.04 Hosts ? I tried to setup bridged connection for the test setup , but obviously I'm missing something. Best Regards, Strahil Nikolov На 14 юли 2020 г. 11:06:42 GMT+03:00, "stefan.schm...@farmpartner-tec.com" написа: >Hello, > > >Am 09.07.2020 um 19:10 Strahil Nikolov wrote: > >Have you run 'fence_virtd -c' ? >Yes I had run that on both Hosts. The current config looks like that >and >is identical on both. > >cat fence_virt.conf >fence_virtd { > listener = "multicast"; > backend = "libvirt"; > module_path = "/usr/lib64/fence-virt"; >} > >listeners { > multicast { > key_file = "/etc/cluster/fence_xvm.key"; > address = "225.0.0.12"; > interface = "bond0"; > family = "ipv4"; > port = "1229"; > } > >} > >backends { > libvirt { > uri = "qemu:///system"; > } > >} > > >The situation is still that no matter on what host I issue the >"fence_xvm -a 225.0.0.12 -o list" command, both guest systems receive >the traffic. The local guest, but also the guest on the other host. I >reckon that means the traffic is not filtered by any network device, >like switches or firewalls. Since the guest on the other host receives >the packages, the traffic must reach te physical server and >networkdevice and is then routed to the VM on that host. >But still, the traffic is not shown on the host itself. > >Further the local firewalls on both hosts are set to let each and every > >traffic pass. Accept to any and everything. Well at least as far as I >can see. > > >Am 09.07.2020 um 22:34 Klaus Wenninger wrote: > > makes me believe that > > the whole setup doesn't lookas I would have > > expected (bridges on each host where theguest > > has a connection to and where ethernet interfaces > > that connect the 2 hosts are part of as well > >On each physical server the networkcards are bonded to achieve failure >safety (bond0). The guest are connected over a bridge(br0) but >apparently our virtualization softrware creates an own device named >after the guest (kvm101.0). >There is no direct connection between the servers, but as I said >earlier, the multicast traffic does reach the VMs so I assume there is >no problem with that. > > >Am 09.07.2020 um 20:18 Vladislav Bogdanov wrote: > > First, you need to ensure that your switch (or all switches in the > > path) have igmp snooping enabled on host ports (and probably > > interconnects along the path between your hosts). > > >> Second, you need an igmp querier to be enabled somewhere near (better >> to have it enabled on a switch itself). Please verify that you see >its > > queries on hosts. > > > > Next, you probably need to make your hosts to use IGMPv2 (not 3) as > > many switches still can not understand v3. This is doable by sysctl, > > find on internet, there are many articles. > > >I have send an query to our Data center Techs who are analyzing this >and >were already on it analyzing if multicast Traffic is somewhere blocked >or hindered. So far the answer is, "multicast ist explictly allowed in >the local network and no packets are filtered or dropped". I am still >waiting for a final report though. > >In the meantime I have switched IGMPv3 to IGMPv2 on every involved >server, hosts and guests via the mentioned sysctl. The switching itself > >was successful, according to "cat /proc/net/igmp" but sadly did not >better the behavior. It actually led to that no VM received the >multicast traffic anymore too. 
> >kind regards >Stefan Schmitz > > >Am 09.07.2020 um 22:34 schrieb Klaus Wenninger: >> On 7/9/20 5:17 PM, stefan.schm...@farmpartner-tec.com wrote: >>> Hello, >>> Well, theory still holds I would say. I guess that the multicast-traffic from the other host or the guestsdoesn't get to the daemon on the host. Can't you just simply check if there are any firewall rules configuredon the host kernel? >>> >>> I hope I did understand you corretcly and you are referring to >iptables? >> I didn't say iptables because it might have been >> nftables - but yesthat is what I was referring to. >> Guess to understand the config the output is >> lacking verbositybut it makes me believe that >> the whole setup doesn't lookas I would have >> expected (bridges on each host where theguest >> has a connection to and where ethernet interfaces >> that connect the 2 hosts are part of as well - >> everythingconnected via layer 2 basically). >>> Here is the output of the current rules. Besides the IP of the guest >>> the output is identical on both hosts: >>> >>> # iptables -S >>> -P INPUT ACCEPT >>> -P FORWARD ACCEPT >>> -P OUTPUT ACCEPT >>> >>> # iptables -L >>> Chain INPUT (policy ACCEPT) >>> target prot opt source destination >>> >>> Chain FORWARD (policy ACCEPT) >>> target prot opt source destination >>>
Re: [ClusterLabs] Still Beginner STONITH Problem
On 7/14/20 10:06 AM, stefan.schm...@farmpartner-tec.com wrote: > Hello, > > > Am 09.07.2020 um 19:10 Strahil Nikolov wrote: > >Have you run 'fence_virtd -c' ? > Yes I had run that on both Hosts. The current config looks like that > and is identical on both. > > cat fence_virt.conf > fence_virtd { > listener = "multicast"; > backend = "libvirt"; > module_path = "/usr/lib64/fence-virt"; > } > > listeners { > multicast { > key_file = "/etc/cluster/fence_xvm.key"; > address = "225.0.0.12"; > interface = "bond0"; > family = "ipv4"; > port = "1229"; > } > > } > > backends { > libvirt { > uri = "qemu:///system"; > } > > } > > > The situation is still that no matter on what host I issue the > "fence_xvm -a 225.0.0.12 -o list" command, both guest systems receive > the traffic. The local guest, but also the guest on the other host. I > reckon that means the traffic is not filtered by any network device, > like switches or firewalls. Since the guest on the other host receives > the packages, the traffic must reach te physical server and > networkdevice and is then routed to the VM on that host. > But still, the traffic is not shown on the host itself. > > Further the local firewalls on both hosts are set to let each and > every traffic pass. Accept to any and everything. Well at least as far > as I can see. > > > Am 09.07.2020 um 22:34 Klaus Wenninger wrote: > > makes me believe that > > the whole setup doesn't lookas I would have > > expected (bridges on each host where theguest > > has a connection to and where ethernet interfaces > > that connect the 2 hosts are part of as well > > On each physical server the networkcards are bonded to achieve failure > safety (bond0). The guest are connected over a bridge(br0) but > apparently our virtualization softrware creates an own device named > after the guest (kvm101.0). > There is no direct connection between the servers, but as I said > earlier, the multicast traffic does reach the VMs so I assume there is > no problem with that. Guess it is not easy to have your servers connected physically for a try. But maybe you can at least try on one host to have virt_fenced & VM on the same bridge - just to see if that basic pattern is working. > > > Am 09.07.2020 um 20:18 Vladislav Bogdanov wrote: > > First, you need to ensure that your switch (or all switches in the > > path) have igmp snooping enabled on host ports (and probably > > interconnects along the path between your hosts). > > > > Second, you need an igmp querier to be enabled somewhere near (better > > to have it enabled on a switch itself). Please verify that you see its > > queries on hosts. > > > > Next, you probably need to make your hosts to use IGMPv2 (not 3) as > > many switches still can not understand v3. This is doable by sysctl, > > find on internet, there are many articles. > > > I have send an query to our Data center Techs who are analyzing this > and were already on it analyzing if multicast Traffic is somewhere > blocked or hindered. So far the answer is, "multicast ist explictly > allowed in the local network and no packets are filtered or dropped". > I am still waiting for a final report though. > > In the meantime I have switched IGMPv3 to IGMPv2 on every involved > server, hosts and guests via the mentioned sysctl. The switching > itself was successful, according to "cat /proc/net/igmp" but sadly did > not better the behavior. It actually led to that no VM received the > multicast traffic anymore too. 
Well maybe still somebody in the middle playing IGMPv3 or the request for a certain source is needed to shoot open some firewall or switch-tables. > > kind regards > Stefan Schmitz > > > Am 09.07.2020 um 22:34 schrieb Klaus Wenninger: >> On 7/9/20 5:17 PM, stefan.schm...@farmpartner-tec.com wrote: >>> Hello, >>> Well, theory still holds I would say. I guess that the multicast-traffic from the other host or the guests doesn't get to the daemon on the host. Can't you just simply check if there are any firewall rules configured on the host kernel? >>> >>> I hope I did understand you correctly and you are referring to >>> iptables? >> I didn't say iptables because it might have been >> nftables - but yes, that is what I was referring to. >> Guess to understand the config the output is >> lacking verbosity, but it makes me believe that >> the whole setup doesn't look as I would have >> expected (bridges on each host where the guest >> has a connection to and where ethernet interfaces >> that connect the 2 hosts are part of as well - >> everything connected via layer 2 basically). >>> Here is the output of the current rules. Besides the IP of the guest >>> the output is identical on both hosts: >>> >>> # iptables -S >>> -P INPUT ACCEPT >>> -P FORWARD ACCEPT >>> -P OUTPUT ACCEPT >>> >>> # iptables -L >>> Chain INPUT (policy ACCEPT) >>>
Re: [ClusterLabs] Still Beginner STONITH Problem
Hello, Am 09.07.2020 um 19:10 Strahil Nikolov wrote: >Have you run 'fence_virtd -c' ? Yes I had run that on both Hosts. The current config looks like that and is identical on both. cat fence_virt.conf fence_virtd { listener = "multicast"; backend = "libvirt"; module_path = "/usr/lib64/fence-virt"; } listeners { multicast { key_file = "/etc/cluster/fence_xvm.key"; address = "225.0.0.12"; interface = "bond0"; family = "ipv4"; port = "1229"; } } backends { libvirt { uri = "qemu:///system"; } } The situation is still that no matter on what host I issue the "fence_xvm -a 225.0.0.12 -o list" command, both guest systems receive the traffic. The local guest, but also the guest on the other host. I reckon that means the traffic is not filtered by any network device, like switches or firewalls. Since the guest on the other host receives the packages, the traffic must reach te physical server and networkdevice and is then routed to the VM on that host. But still, the traffic is not shown on the host itself. Further the local firewalls on both hosts are set to let each and every traffic pass. Accept to any and everything. Well at least as far as I can see. Am 09.07.2020 um 22:34 Klaus Wenninger wrote: > makes me believe that > the whole setup doesn't lookas I would have > expected (bridges on each host where theguest > has a connection to and where ethernet interfaces > that connect the 2 hosts are part of as well On each physical server the networkcards are bonded to achieve failure safety (bond0). The guest are connected over a bridge(br0) but apparently our virtualization softrware creates an own device named after the guest (kvm101.0). There is no direct connection between the servers, but as I said earlier, the multicast traffic does reach the VMs so I assume there is no problem with that. Am 09.07.2020 um 20:18 Vladislav Bogdanov wrote: > First, you need to ensure that your switch (or all switches in the > path) have igmp snooping enabled on host ports (and probably > interconnects along the path between your hosts). > > Second, you need an igmp querier to be enabled somewhere near (better > to have it enabled on a switch itself). Please verify that you see its > queries on hosts. > > Next, you probably need to make your hosts to use IGMPv2 (not 3) as > many switches still can not understand v3. This is doable by sysctl, > find on internet, there are many articles. I have send an query to our Data center Techs who are analyzing this and were already on it analyzing if multicast Traffic is somewhere blocked or hindered. So far the answer is, "multicast ist explictly allowed in the local network and no packets are filtered or dropped". I am still waiting for a final report though. In the meantime I have switched IGMPv3 to IGMPv2 on every involved server, hosts and guests via the mentioned sysctl. The switching itself was successful, according to "cat /proc/net/igmp" but sadly did not better the behavior. It actually led to that no VM received the multicast traffic anymore too. kind regards Stefan Schmitz Am 09.07.2020 um 22:34 schrieb Klaus Wenninger: On 7/9/20 5:17 PM, stefan.schm...@farmpartner-tec.com wrote: Hello, Well, theory still holds I would say. I guess that the multicast-traffic from the other host or the guestsdoesn't get to the daemon on the host. Can't you just simply check if there are any firewall rules configuredon the host kernel? I hope I did understand you corretcly and you are referring to iptables? 
I didn't say iptables because it might have been nftables - but yes, that is what I was referring to. Guess to understand the config the output is lacking verbosity, but it makes me believe that the whole setup doesn't look as I would have expected (bridges on each host where the guest has a connection to and where ethernet interfaces that connect the 2 hosts are part of as well - everything connected via layer 2 basically). Here is the output of the current rules. Besides the IP of the guest the output is identical on both hosts:

# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT

# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
SOLUSVM_TRAFFIC_IN   all  --  anywhere   anywhere
SOLUSVM_TRAFFIC_OUT  all  --  anywhere   anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain SOLUSVM_TRAFFIC_IN (1 references)
target     prot opt source               destination
           all  --  anywhere             192.168.1.14

Chain SOLUSVM_TRAFFIC_OUT (1 references)
target     prot opt source               destination
           all  --  192.168.1.14         anywhere

kind regards
Stefan Schmitz
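Besides iptables it may be worth ruling out the other host-side packet filters as well; the checks below only read state and change nothing (the bridge-related ones assume br0 is a plain Linux bridge):

# nft list ruleset                              (nftables rules, which iptables -L does not necessarily show)
# ebtables -L                                   (link-layer filtering on the bridge)
# lsmod | grep br_netfilter                     (if loaded, bridged traffic is also run through iptables)
# sysctl net.bridge.bridge-nf-call-iptables     (only present while br_netfilter is loaded)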
Re: [ClusterLabs] Still Beginner STONITH Problem
Have you run 'fence_virtd -c' ? I made a silly mistake last time when I deployed it and the daemon was not listening on the right interface. Netstat can check this out. Also, As far as I know hosts use unicast to reply to the VMs (thus tcp/1229 and not udp/1229). If you have a developer account for Red Hat, you can check https://access.redhat.com/solutions/917833 Best Regards, Strahil Nikolov На 9 юли 2020 г. 17:01:13 GMT+03:00, "stefan.schm...@farmpartner-tec.com" написа: >Hello, > >thanks for the advise. I have worked through that list as follows: > > > - key deployed on the Hypervisours > > - key deployed on the VMs >I created the key file a while ago once on one host and distributed it >to every other host and guest. Right now it resides on all 4 machines >in >the same path: /etc/cluster/fence_xvm.key >Is there maybe a a corosync/Stonith or other function which checks the >keyfiles for any corruption or errors? > > > > - fence_virtd running on both Hypervisours >It is running on each host: ># ps aux |grep fence_virtd >root 62032 0.0 0.0 251568 4496 ?Ss Jun29 0:00 >fence_virtd > > >> - Firewall opened (1229/udp for the hosts, 1229/tcp for the >guests) > >Command on one host: >fence_xvm -a 225.0.0.12 -o list > >tcpdump on the guest residing on the other host: >host2.55179 > 225.0.0.12.1229: [udp sum ok] UDP, length 176 >host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr >225.0.0.12 to_in { }] >host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr >225.0.0.12 to_in { }] > >At least to me it looks like the VMs are reachable by the multicast >traffic. >Additionally, no matter on which host I execute the fence_xvm command, >tcpdum shows the same traffic on both guests. >But on the other hand, at the same time, tcpdump shows nothing on the >other host. Just to be sure I have flushed iptables beforehand on each >host. Is there maybe a problem? > > > > - fence_xvm on both VMs >fence_xvm is installed on both VMs ># which fence_xvm >/usr/sbin/fence_xvm > >Could you please advise on how to proceed? Thank you in advance. >Kind regards >Stefan Schmitz > >Am 08.07.2020 um 20:24 schrieb Strahil Nikolov: >> Erm...network/firewall is always "green". Run tcpdump on Host1 and >VM2 (not on the same host). >> Then run again 'fence_xvm -o list' and check what is captured. >> >> In summary, you need: >> - key deployed on the Hypervisours >> - key deployed on the VMs >> - fence_virtd running on both Hypervisours >> - Firewall opened (1229/udp for the hosts, 1229/tcp for the >guests) >> - fence_xvm on both VMs >> >> In your case , the primary suspect is multicast traffic. >> >> Best Regards, >> Strahil Nikolov >> >> На 8 юли 2020 г. 16:33:45 GMT+03:00, >"stefan.schm...@farmpartner-tec.com" > написа: >>> Hello, >>> I can't find fence_virtd for Ubuntu18, but it is available for Ubuntu20. >>> >>> We have now upgraded our Server to Ubuntu 20.04 LTS and installed >the >>> packages fence-virt and fence-virtd. >>> >>> The command "fence_xvm -a 225.0.0.12 -o list" on the Hosts still >just >>> returns the single local VM. >>> >>> The same command on both VMs results in: >>> # fence_xvm -a 225.0.0.12 -o list >>> Timed out waiting for response >>> Operation failed >>> >>> But just as before, trying to connect from the guest to the host via >nc >>> >>> just works fine. >>> #nc -z -v -u 192.168.1.21 1229 >>> Connection to 192.168.1.21 1229 port [udp/*] succeeded! >>> >>> So the hosts and service basically is reachable. 
>>> >>> I have spoken to our Firewall tech, he has assured me, that no local >>> traffic is hindered by anything. Be it multicast or not. >>> Software Firewalls are not present/active on any of our servers. >>> >>> Ubuntu guests: >>> # ufw status >>> Status: inactive >>> >>> CentOS hosts: >>> systemctl status firewalld >>> ● firewalld.service - firewalld - dynamic firewall daemon >>> Loaded: loaded (/usr/lib/systemd/system/firewalld.service; >disabled; >>> vendor preset: enabled) >>> Active: inactive (dead) >>> Docs: man:firewalld(1) >>> >>> >>> Any hints or help on how to remedy this problem would be greatly >>> appreciated! >>> >>> Kind regards >>> Stefan Schmitz >>> >>> >>> Am 07.07.2020 um 10:54 schrieb Klaus Wenninger: On 7/7/20 10:33 AM, Strahil Nikolov wrote: > I can't find fence_virtd for Ubuntu18, but it is available for >>> Ubuntu20. > > Your other option is to get an iSCSI from your quorum system and >use >>> that for SBD. > For watchdog, you can use 'softdog' kernel module or you can use >KVM >>> to present one to the VMs. > You can also check the '-P' flag for SBD. With kvm please use the qemu-watchdog and try to prevent using softdogwith SBD. Especially if you are aiming for a production-cluster ... Adding something like that to libvirt-xml should do the trick: >>> function='0x0'/> > > Best
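Regarding the netstat check mentioned at the top of this message, a sketch of what that could look like on a host; which sockets fence_virtd actually opens depends on the configured listener, so treat the details as assumptions:

# netstat -ulpn | grep fence_virtd      (UDP side, the multicast listener on port 1229)
# netstat -tlpn | grep 1229             (TCP side, if anything is listening there for the replies)
# ss -ulpn 'sport = :1229'              (the same information via ss)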
Re: [ClusterLabs] Still Beginner STONITH Problem
On 7/9/20 5:17 PM, stefan.schm...@farmpartner-tec.com wrote: > Hello, > > > Well, theory still holds I would say. > > > > I guess that the multicast-traffic from the other host > > or the guests doesn't get to the daemon on the host. > > Can't you just simply check if there are any firewall > > rules configured on the host kernel? > > I hope I did understand you correctly and you are referring to iptables? I didn't say iptables because it might have been nftables - but yes, that is what I was referring to. Guess to understand the config the output is lacking verbosity, but it makes me believe that the whole setup doesn't look as I would have expected (bridges on each host where the guest has a connection to and where ethernet interfaces that connect the 2 hosts are part of as well - everything connected via layer 2 basically). > Here is the output of the current rules. Besides the IP of the guest > the output is identical on both hosts: > > # iptables -S > -P INPUT ACCEPT > -P FORWARD ACCEPT > -P OUTPUT ACCEPT > > # iptables -L > Chain INPUT (policy ACCEPT) > target prot opt source destination > > Chain FORWARD (policy ACCEPT) > target prot opt source destination > SOLUSVM_TRAFFIC_IN all -- anywhere anywhere > SOLUSVM_TRAFFIC_OUT all -- anywhere anywhere > > Chain OUTPUT (policy ACCEPT) > target prot opt source destination > > Chain SOLUSVM_TRAFFIC_IN (1 references) > target prot opt source destination > all -- anywhere 192.168.1.14 > > Chain SOLUSVM_TRAFFIC_OUT (1 references) > target prot opt source destination > all -- 192.168.1.14 anywhere > > kind regards > Stefan Schmitz
Re: [ClusterLabs] Still Beginner STONITH Problem
On 7/9/20 8:18 PM, Vladislav Bogdanov wrote: > Hi. > > This thread is getting too long. > > First, you need to ensure that your switch (or all switches in the > path) have igmp snooping enabled on host ports (and probably > interconnects along the path between your hosts). > > Second, you need an igmp querier to be enabled somewhere near (better > to have it enabled on a switch itself). Please verify that you see its > queries on hosts. > > Next, you probably need to make your hosts to use IGMPv2 (not 3) as > many switches still can not understand v3. This is doable by sysctl, > find on internet, there are many articles. Switch configuration might be in the way as well but as the problem exists in the communication between a host and the guest running on that host I would rather bet for firewall rules on the host(s). > > These advices are also applicable for running corosync itself in > multicast mode. > > Best, > Vladislav > > Thu, 02/07/2020 в 17:18 +0200, stefan.schm...@farmpartner-tec.com > wrote: >> Hello, >> >> I hope someone can help with this problem. We are (still) trying to >> get >> Stonith to achieve a running active/active HA Cluster, but sadly to >> no >> avail. >> >> There are 2 Centos Hosts. On each one there is a virtual Ubuntu VM. >> The >> Ubuntu VMs are the ones which should form the HA Cluster. >> >> The current status is this: >> >> # pcs status >> Cluster name: pacemaker_cluster >> WARNING: corosync and pacemaker node names do not match (IPs used in >> setup?) >> Stack: corosync >> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition >> with >> quorum >> Last updated: Thu Jul 2 17:03:53 2020 >> Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on >> server4ubuntu1 >> >> 2 nodes configured >> 13 resources configured >> >> Online: [ server2ubuntu1 server4ubuntu1 ] >> >> Full list of resources: >> >> stonith_id_1 (stonith:external/libvirt): Stopped >> Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker] >> Masters: [ server4ubuntu1 ] >> Slaves: [ server2ubuntu1 ] >> Master/Slave Set: WebDataClone [WebData] >> Masters: [ server2ubuntu1 server4ubuntu1 ] >> Clone Set: dlm-clone [dlm] >> Started: [ server2ubuntu1 server4ubuntu1 ] >> Clone Set: ClusterIP-clone [ClusterIP] (unique) >> ClusterIP:0(ocf::heartbeat:IPaddr2): Started >> server2ubuntu1 >> ClusterIP:1(ocf::heartbeat:IPaddr2): Started >> server4ubuntu1 >> Clone Set: WebFS-clone [WebFS] >> Started: [ server4ubuntu1 ] >> Stopped: [ server2ubuntu1 ] >> Clone Set: WebSite-clone [WebSite] >> Started: [ server4ubuntu1 ] >> Stopped: [ server2ubuntu1 ] >> >> Failed Actions: >> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): >> call=201, >> status=Error, exitreason='', >> last-rc-change='Thu Jul 2 14:37:35 2020', queued=0ms, >> exec=3403ms >> * r0_pacemaker_monitor_6 on server2ubuntu1 'master' (8): >> call=203, >> status=complete, exitreason='', >> last-rc-change='Thu Jul 2 14:38:39 2020', queued=0ms, exec=0ms >> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): >> call=202, >> status=Error, exitreason='', >> last-rc-change='Thu Jul 2 14:37:39 2020', queued=0ms, >> exec=3411ms >> >> >> The stonith resoursce is stopped and does not seem to work. >> On both hosts the command >> # fence_xvm -o list >> kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 >> on >> >> returns the local VM. Apparently it connects through the >> Virtualization >> interface because it returns the VM name not the Hostname of the >> client >> VM. I do not know if this is how it is supposed to work? 
>> >> In the local network, every traffic is allowed. No firewall is >> locally >> active, just the connections leaving the local network are >> firewalled. >> Hence there are no coneection problems between the hosts and clients. >> For example we can succesfully connect from the clients to the Hosts: >> >> # nc -z -v -u 192.168.1.21 1229 >> Ncat: Version 7.50 ( >> https://nmap.org/ncat >> ) >> Ncat: Connected to 192.168.1.21:1229. >> Ncat: UDP packet sent successfully >> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds. >> >> # nc -z -v -u 192.168.1.13 1229 >> Ncat: Version 7.50 ( >> https://nmap.org/ncat >> ) >> Ncat: Connected to 192.168.1.13:1229. >> Ncat: UDP packet sent successfully >> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds. >> >> >> On the Ubuntu VMs we created and configured the the stonith resource >> according to the howto provided here: >> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf >> >> >> The actual line we used: >> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt >> hostlist="Host4,host2" >> hypervisor_uri="qemu+ssh://192.168.1.21/system" >> >> >> But as you can see in in the pcs status output, stonith is stopped >> and >> exits
Re: [ClusterLabs] Still Beginner STONITH Problem
Hi. This thread is getting too long. First, you need to ensure that your switch (or all switches in the path) have igmp snooping enabled on host ports (and probably interconnects along the path between your hosts). Second, you need an igmp querier to be enabled somewhere near (better to have it enabled on a switch itself). Please verify that you see its queries on hosts. Next, you probably need to make your hosts to use IGMPv2 (not 3) as many switches still can not understand v3. This is doable by sysctl, find on internet, there are many articles. These advices are also applicable for running corosync itself in multicast mode. Best, Vladislav Thu, 02/07/2020 в 17:18 +0200, stefan.schm...@farmpartner-tec.com wrote: > Hello, > > I hope someone can help with this problem. We are (still) trying to > get > Stonith to achieve a running active/active HA Cluster, but sadly to > no > avail. > > There are 2 Centos Hosts. On each one there is a virtual Ubuntu VM. > The > Ubuntu VMs are the ones which should form the HA Cluster. > > The current status is this: > > # pcs status > Cluster name: pacemaker_cluster > WARNING: corosync and pacemaker node names do not match (IPs used in > setup?) > Stack: corosync > Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition > with > quorum > Last updated: Thu Jul 2 17:03:53 2020 > Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on > server4ubuntu1 > > 2 nodes configured > 13 resources configured > > Online: [ server2ubuntu1 server4ubuntu1 ] > > Full list of resources: > > stonith_id_1 (stonith:external/libvirt): Stopped > Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker] > Masters: [ server4ubuntu1 ] > Slaves: [ server2ubuntu1 ] > Master/Slave Set: WebDataClone [WebData] > Masters: [ server2ubuntu1 server4ubuntu1 ] > Clone Set: dlm-clone [dlm] > Started: [ server2ubuntu1 server4ubuntu1 ] > Clone Set: ClusterIP-clone [ClusterIP] (unique) > ClusterIP:0(ocf::heartbeat:IPaddr2): Started > server2ubuntu1 > ClusterIP:1(ocf::heartbeat:IPaddr2): Started > server4ubuntu1 > Clone Set: WebFS-clone [WebFS] > Started: [ server4ubuntu1 ] > Stopped: [ server2ubuntu1 ] > Clone Set: WebSite-clone [WebSite] > Started: [ server4ubuntu1 ] > Stopped: [ server2ubuntu1 ] > > Failed Actions: > * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): > call=201, > status=Error, exitreason='', > last-rc-change='Thu Jul 2 14:37:35 2020', queued=0ms, > exec=3403ms > * r0_pacemaker_monitor_6 on server2ubuntu1 'master' (8): > call=203, > status=complete, exitreason='', > last-rc-change='Thu Jul 2 14:38:39 2020', queued=0ms, exec=0ms > * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): > call=202, > status=Error, exitreason='', > last-rc-change='Thu Jul 2 14:37:39 2020', queued=0ms, > exec=3411ms > > > The stonith resoursce is stopped and does not seem to work. > On both hosts the command > # fence_xvm -o list > kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 > on > > returns the local VM. Apparently it connects through the > Virtualization > interface because it returns the VM name not the Hostname of the > client > VM. I do not know if this is how it is supposed to work? > > In the local network, every traffic is allowed. No firewall is > locally > active, just the connections leaving the local network are > firewalled. > Hence there are no coneection problems between the hosts and clients. 
> For example we can successfully connect from the clients to the Hosts: > > # nc -z -v -u 192.168.1.21 1229 > Ncat: Version 7.50 ( > https://nmap.org/ncat > ) > Ncat: Connected to 192.168.1.21:1229. > Ncat: UDP packet sent successfully > Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds. > > # nc -z -v -u 192.168.1.13 1229 > Ncat: Version 7.50 ( > https://nmap.org/ncat > ) > Ncat: Connected to 192.168.1.13:1229. > Ncat: UDP packet sent successfully > Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds. > > > On the Ubuntu VMs we created and configured the stonith resource > according to the howto provided here: > https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf > > > The actual line we used: > # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt > hostlist="Host4,host2" > hypervisor_uri="qemu+ssh://192.168.1.21/system" > > > But as you can see in the pcs status output, stonith is stopped > and > exits with an unknown error. > > Can somebody please advise on how to proceed or what additional > information is needed to solve this problem? > Any help would be greatly appreciated! Thank you in advance. > > Kind regards > Stefan Schmitz
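On the host side the Linux bridge has its own IGMP snooping and querier settings that fit the advice above; a sketch of how to look at them (br0 is an assumption, and on many setups the decisive settings live on the physical switch instead):

# cat /sys/class/net/br0/bridge/multicast_snooping       (1 = the bridge filters multicast by IGMP membership)
# cat /sys/class/net/br0/bridge/multicast_querier        (1 = the bridge itself sends IGMP queries)
# echo 1 > /sys/class/net/br0/bridge/multicast_querier   (enable a local querier if no switch provides one)
# tcpdump -ni br0 ip proto igmp                          (verify that queries and membership reports are actually seen)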
Re: [ClusterLabs] Still Beginner STONITH Problem
Hello, > Well, theory still holds I would say. > > I guess that the multicast-traffic from the other host > or the guestsdoesn't get to the daemon on the host. > Can't you just simply check if there are any firewall > rules configuredon the host kernel? I hope I did understand you corretcly and you are referring to iptables? Here is the output of the current rules. Besides the IP of the guest the output is identical on both hosts: # iptables -S -P INPUT ACCEPT -P FORWARD ACCEPT -P OUTPUT ACCEPT # iptables -L Chain INPUT (policy ACCEPT) target prot opt source destination Chain FORWARD (policy ACCEPT) target prot opt source destination SOLUSVM_TRAFFIC_IN all -- anywhere anywhere SOLUSVM_TRAFFIC_OUT all -- anywhere anywhere Chain OUTPUT (policy ACCEPT) target prot opt source destination Chain SOLUSVM_TRAFFIC_IN (1 references) target prot opt source destination all -- anywhere 192.168.1.14 Chain SOLUSVM_TRAFFIC_OUT (1 references) target prot opt source destination all -- 192.168.1.14 anywhere kind regards Stefan Schmitz Am 09.07.2020 um 16:30 schrieb Klaus Wenninger: On 7/9/20 4:01 PM, stefan.schm...@farmpartner-tec.com wrote: Hello, thanks for the advise. I have worked through that list as follows: - key deployed on the Hypervisours - key deployed on the VMs I created the key file a while ago once on one host and distributed it to every other host and guest. Right now it resides on all 4 machines in the same path: /etc/cluster/fence_xvm.key Is there maybe a a corosync/Stonith or other function which checks the keyfiles for any corruption or errors? - fence_virtd running on both Hypervisours It is running on each host: # ps aux |grep fence_virtd root 62032 0.0 0.0 251568 4496 ? Ss Jun29 0:00 fence_virtd - Firewall opened (1229/udp for the hosts, 1229/tcp for the guests) Command on one host: fence_xvm -a 225.0.0.12 -o list tcpdump on the guest residing on the other host: host2.55179 > 225.0.0.12.1229: [udp sum ok] UDP, length 176 host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr 225.0.0.12 to_in { }] host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr 225.0.0.12 to_in { }] At least to me it looks like the VMs are reachable by the multicast traffic. Additionally, no matter on which host I execute the fence_xvm command, tcpdum shows the same traffic on both guests. But on the other hand, at the same time, tcpdump shows nothing on the other host. Just to be sure I have flushed iptables beforehand on each host. Is there maybe a problem? Well, theory still holds I would say. I guess that the multicast-traffic from the other host or the guestsdoesn't get to the daemon on the host. Can't you just simply check if there are any firewall rules configuredon the host kernel? - fence_xvm on both VMs fence_xvm is installed on both VMs # which fence_xvm /usr/sbin/fence_xvm Could you please advise on how to proceed? Thank you in advance. Kind regards Stefan Schmitz Am 08.07.2020 um 20:24 schrieb Strahil Nikolov: Erm...network/firewall is always "green". Run tcpdump on Host1 and VM2 (not on the same host). Then run again 'fence_xvm -o list' and check what is captured. In summary, you need: - key deployed on the Hypervisours - key deployed on the VMs - fence_virtd running on both Hypervisours - Firewall opened (1229/udp for the hosts, 1229/tcp for the guests) - fence_xvm on both VMs In your case , the primary suspect is multicast traffic. Best Regards, Strahil Nikolov На 8 юли 2020 г. 
16:33:45 GMT+03:00, "stefan.schm...@farmpartner-tec.com" написа: Hello, I can't find fence_virtd for Ubuntu18, but it is available for Ubuntu20. We have now upgraded our Server to Ubuntu 20.04 LTS and installed the packages fence-virt and fence-virtd. The command "fence_xvm -a 225.0.0.12 -o list" on the Hosts still just returns the single local VM. The same command on both VMs results in: # fence_xvm -a 225.0.0.12 -o list Timed out waiting for response Operation failed But just as before, trying to connect from the guest to the host via nc just works fine. #nc -z -v -u 192.168.1.21 1229 Connection to 192.168.1.21 1229 port [udp/*] succeeded! So the hosts and service basically is reachable. I have spoken to our Firewall tech, he has assured me, that no local traffic is hindered by anything. Be it multicast or not. Software Firewalls are not present/active on any of our servers. Ubuntu guests: # ufw status Status: inactive CentOS hosts: systemctl status firewalld ● firewalld.service - firewalld - dynamic firewall daemon Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled) Active: inactive (dead) Docs: man:firewalld(1) Any hints or help on how to remedy this problem would be greatly appreciated!
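Two caveats about the checks above, with a minimal sketch of what one might run on the CentOS hosts: an inactive firewalld does not prove that no netfilter rules are loaded (rules can also come from iptables.service or, where nftables is in use, from an nft ruleset), and the "succeeded" output of nc -z -u only means that no ICMP port-unreachable came back, not that fence_virtd answered.

# iptables -S
# nft list ruleset
# ss -ulpn | grep 1229

The last command should normally show fence_virtd bound to UDP port 1229 on the host; if it does not, the problem is on the daemon side rather than in the network.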
Re: [ClusterLabs] Still Beginner STONITH Problem
On 7/9/20 4:01 PM, stefan.schm...@farmpartner-tec.com wrote: > Hello, > > thanks for the advise. I have worked through that list as follows: > > > - key deployed on the Hypervisours > > - key deployed on the VMs > I created the key file a while ago once on one host and distributed it > to every other host and guest. Right now it resides on all 4 machines > in the same path: /etc/cluster/fence_xvm.key > Is there maybe a a corosync/Stonith or other function which checks the > keyfiles for any corruption or errors? > > > > - fence_virtd running on both Hypervisours > It is running on each host: > # ps aux |grep fence_virtd > root 62032 0.0 0.0 251568 4496 ? Ss Jun29 0:00 > fence_virtd > > > > - Firewall opened (1229/udp for the hosts, 1229/tcp for the guests) > > Command on one host: > fence_xvm -a 225.0.0.12 -o list > > tcpdump on the guest residing on the other host: > host2.55179 > 225.0.0.12.1229: [udp sum ok] UDP, length 176 > host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr > 225.0.0.12 to_in { }] > host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr > 225.0.0.12 to_in { }] > > At least to me it looks like the VMs are reachable by the multicast > traffic. > Additionally, no matter on which host I execute the fence_xvm command, > tcpdum shows the same traffic on both guests. > But on the other hand, at the same time, tcpdump shows nothing on the > other host. Just to be sure I have flushed iptables beforehand on each > host. Is there maybe a problem? Well, theory still holds I would say. I guess that the multicast-traffic from the other host or the guestsdoesn't get to the daemon on the host. Can't you just simply check if there are any firewall rules configuredon the host kernel? > > > > - fence_xvm on both VMs > fence_xvm is installed on both VMs > # which fence_xvm > /usr/sbin/fence_xvm > > Could you please advise on how to proceed? Thank you in advance. > Kind regards > Stefan Schmitz > > Am 08.07.2020 um 20:24 schrieb Strahil Nikolov: >> Erm...network/firewall is always "green". Run tcpdump on Host1 and >> VM2 (not on the same host). >> Then run again 'fence_xvm -o list' and check what is captured. >> >> In summary, you need: >> - key deployed on the Hypervisours >> - key deployed on the VMs >> - fence_virtd running on both Hypervisours >> - Firewall opened (1229/udp for the hosts, 1229/tcp for the guests) >> - fence_xvm on both VMs >> >> In your case , the primary suspect is multicast traffic. >> >> Best Regards, >> Strahil Nikolov >> >> На 8 юли 2020 г. 16:33:45 GMT+03:00, >> "stefan.schm...@farmpartner-tec.com" >> написа: >>> Hello, >>> I can't find fence_virtd for Ubuntu18, but it is available for Ubuntu20. >>> >>> We have now upgraded our Server to Ubuntu 20.04 LTS and installed the >>> packages fence-virt and fence-virtd. >>> >>> The command "fence_xvm -a 225.0.0.12 -o list" on the Hosts still just >>> returns the single local VM. >>> >>> The same command on both VMs results in: >>> # fence_xvm -a 225.0.0.12 -o list >>> Timed out waiting for response >>> Operation failed >>> >>> But just as before, trying to connect from the guest to the host via nc >>> >>> just works fine. >>> #nc -z -v -u 192.168.1.21 1229 >>> Connection to 192.168.1.21 1229 port [udp/*] succeeded! >>> >>> So the hosts and service basically is reachable. >>> >>> I have spoken to our Firewall tech, he has assured me, that no local >>> traffic is hindered by anything. Be it multicast or not. >>> Software Firewalls are not present/active on any of our servers. 
>>> >>> Ubuntu guests: >>> # ufw status >>> Status: inactive >>> >>> CentOS hosts: >>> systemctl status firewalld >>> ● firewalld.service - firewalld - dynamic firewall daemon >>> Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; >>> vendor preset: enabled) >>> Active: inactive (dead) >>> Docs: man:firewalld(1) >>> >>> >>> Any hints or help on how to remedy this problem would be greatly >>> appreciated! >>> >>> Kind regards >>> Stefan Schmitz >>> >>> >>> Am 07.07.2020 um 10:54 schrieb Klaus Wenninger: On 7/7/20 10:33 AM, Strahil Nikolov wrote: > I can't find fence_virtd for Ubuntu18, but it is available for >>> Ubuntu20. > > Your other option is to get an iSCSI from your quorum system and use >>> that for SBD. > For watchdog, you can use 'softdog' kernel module or you can use KVM >>> to present one to the VMs. > You can also check the '-P' flag for SBD. With kvm please use the qemu-watchdog and try to prevent using softdogwith SBD. Especially if you are aiming for a production-cluster ... Adding something like that to libvirt-xml should do the trick: >>> function='0x0'/> > > Best Regards, > Strahil Nikolov > > На 7 юли 2020 г. 10:11:38 GMT+03:00, >>> "stefan.schm...@farmpartner-tec.com" >>> написа: >>> What does 'virsh list' >>> give you onthe
Re: [ClusterLabs] Still Beginner STONITH Problem
Hello, thanks for the advise. I have worked through that list as follows: > - key deployed on the Hypervisours > - key deployed on the VMs I created the key file a while ago once on one host and distributed it to every other host and guest. Right now it resides on all 4 machines in the same path: /etc/cluster/fence_xvm.key Is there maybe a a corosync/Stonith or other function which checks the keyfiles for any corruption or errors? > - fence_virtd running on both Hypervisours It is running on each host: # ps aux |grep fence_virtd root 62032 0.0 0.0 251568 4496 ?Ss Jun29 0:00 fence_virtd > - Firewall opened (1229/udp for the hosts, 1229/tcp for the guests) Command on one host: fence_xvm -a 225.0.0.12 -o list tcpdump on the guest residing on the other host: host2.55179 > 225.0.0.12.1229: [udp sum ok] UDP, length 176 host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr 225.0.0.12 to_in { }] host2 > igmp.mcast.net: igmp v3 report, 1 group record(s) [gaddr 225.0.0.12 to_in { }] At least to me it looks like the VMs are reachable by the multicast traffic. Additionally, no matter on which host I execute the fence_xvm command, tcpdum shows the same traffic on both guests. But on the other hand, at the same time, tcpdump shows nothing on the other host. Just to be sure I have flushed iptables beforehand on each host. Is there maybe a problem? > - fence_xvm on both VMs fence_xvm is installed on both VMs # which fence_xvm /usr/sbin/fence_xvm Could you please advise on how to proceed? Thank you in advance. Kind regards Stefan Schmitz Am 08.07.2020 um 20:24 schrieb Strahil Nikolov: Erm...network/firewall is always "green". Run tcpdump on Host1 and VM2 (not on the same host). Then run again 'fence_xvm -o list' and check what is captured. In summary, you need: - key deployed on the Hypervisours - key deployed on the VMs - fence_virtd running on both Hypervisours - Firewall opened (1229/udp for the hosts, 1229/tcp for the guests) - fence_xvm on both VMs In your case , the primary suspect is multicast traffic. Best Regards, Strahil Nikolov На 8 юли 2020 г. 16:33:45 GMT+03:00, "stefan.schm...@farmpartner-tec.com" написа: Hello, I can't find fence_virtd for Ubuntu18, but it is available for Ubuntu20. We have now upgraded our Server to Ubuntu 20.04 LTS and installed the packages fence-virt and fence-virtd. The command "fence_xvm -a 225.0.0.12 -o list" on the Hosts still just returns the single local VM. The same command on both VMs results in: # fence_xvm -a 225.0.0.12 -o list Timed out waiting for response Operation failed But just as before, trying to connect from the guest to the host via nc just works fine. #nc -z -v -u 192.168.1.21 1229 Connection to 192.168.1.21 1229 port [udp/*] succeeded! So the hosts and service basically is reachable. I have spoken to our Firewall tech, he has assured me, that no local traffic is hindered by anything. Be it multicast or not. Software Firewalls are not present/active on any of our servers. Ubuntu guests: # ufw status Status: inactive CentOS hosts: systemctl status firewalld ● firewalld.service - firewalld - dynamic firewall daemon Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled) Active: inactive (dead) Docs: man:firewalld(1) Any hints or help on how to remedy this problem would be greatly appreciated! Kind regards Stefan Schmitz Am 07.07.2020 um 10:54 schrieb Klaus Wenninger: On 7/7/20 10:33 AM, Strahil Nikolov wrote: I can't find fence_virtd for Ubuntu18, but it is available for Ubuntu20. 
Your other option is to get an iSCSI from your quorum system and use that for SBD. For watchdog, you can use 'softdog' kernel module or you can use KVM to present one to the VMs. You can also check the '-P' flag for SBD. With kvm please use the qemu-watchdog and try to prevent using softdogwith SBD. Especially if you are aiming for a production-cluster ... Adding something like that to libvirt-xml should do the trick: Best Regards, Strahil Nikolov На 7 юли 2020 г. 10:11:38 GMT+03:00, "stefan.schm...@farmpartner-tec.com" написа: What does 'virsh list' give you onthe 2 hosts? Hopefully different names for the VMs ... Yes, each host shows its own # virsh list IdName Status 2 kvm101 running # virsh list IdName State 1 kvm102 running Did you try 'fence_xvm -a {mcast-ip} -o list' on the guests as well? fence_xvm sadly does not work on the Ubuntu guests. The howto said to install "yum install fence-virt fence-virtd" which do not exist as such in Ubuntu 18.04. After we tried to find the appropiate packages we installed "libvirt-clients" and "multipath-tools". Is there
Re: [ClusterLabs] Still Beginner STONITH Problem
On 7/8/20 8:24 PM, Strahil Nikolov wrote: > Erm...network/firewall is always "green". Run tcpdump on Host1 and VM2 > (not on the same host). > Then run again 'fence_xvm -o list' and check what is captured. > > In summary, you need: > - key deployed on the Hypervisours > - key deployed on the VMs > - fence_virtd running on both Hypervisours > - Firewall opened (1229/udp for the hosts, 1229/tcp for the guests) > - fence_xvm on both VMs > > In your case , the primary suspect is multicast traffic. Or just a simple port-access issue ... firewalld is not the only way you can setup some kind of firewall on your local machine. iptables.service might be active for instance. I have no personal experience with multiple hosts & fence_xvm. So when you have solved your primary issue you might still consider running 2 parallel setups. I've read about a recommendation to do so and I have a vague memory about an email-thread stating some issues. Anybody here can state that multiple-hosts with a single multicast-ip is working reliably? > > Best Regards, > Strahil Nikolov > > На 8 юли 2020 г. 16:33:45 GMT+03:00, "stefan.schm...@farmpartner-tec.com" > написа: >> Hello, >> >>> I can't find fence_virtd for Ubuntu18, but it is available for >>> Ubuntu20. >> We have now upgraded our Server to Ubuntu 20.04 LTS and installed the >> packages fence-virt and fence-virtd. >> >> The command "fence_xvm -a 225.0.0.12 -o list" on the Hosts still just >> returns the single local VM. >> >> The same command on both VMs results in: >> # fence_xvm -a 225.0.0.12 -o list >> Timed out waiting for response >> Operation failed >> >> But just as before, trying to connect from the guest to the host via nc >> >> just works fine. >> #nc -z -v -u 192.168.1.21 1229 >> Connection to 192.168.1.21 1229 port [udp/*] succeeded! >> >> So the hosts and service basically is reachable. >> >> I have spoken to our Firewall tech, he has assured me, that no local >> traffic is hindered by anything. Be it multicast or not. >> Software Firewalls are not present/active on any of our servers. >> >> Ubuntu guests: >> # ufw status >> Status: inactive >> >> CentOS hosts: >> systemctl status firewalld >> ● firewalld.service - firewalld - dynamic firewall daemon >> Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; >> vendor preset: enabled) >>Active: inactive (dead) >> Docs: man:firewalld(1) >> >> >> Any hints or help on how to remedy this problem would be greatly >> appreciated! >> >> Kind regards >> Stefan Schmitz >> >> >> Am 07.07.2020 um 10:54 schrieb Klaus Wenninger: >>> On 7/7/20 10:33 AM, Strahil Nikolov wrote: I can't find fence_virtd for Ubuntu18, but it is available for >> Ubuntu20. Your other option is to get an iSCSI from your quorum system and use >> that for SBD. For watchdog, you can use 'softdog' kernel module or you can use KVM >> to present one to the VMs. You can also check the '-P' flag for SBD. >>> With kvm please use the qemu-watchdog and try to >>> prevent using softdogwith SBD. >>> Especially if you are aiming for a production-cluster ... >>> >>> Adding something like that to libvirt-xml should do the trick: >>> >>> >> function='0x0'/> >>> >>> Best Regards, Strahil Nikolov На 7 юли 2020 г. 10:11:38 GMT+03:00, >> "stefan.schm...@farmpartner-tec.com" >> написа: >> What does 'virsh list' >> give you onthe 2 hosts? Hopefully different names for >> the VMs ... 
> Yes, each host shows its own > > # virsh list > IdName Status > > 2 kvm101 running > > # virsh list > IdName State > > 1 kvm102 running > > > >> Did you try 'fence_xvm -a {mcast-ip} -o list' on the >> guests as well? > fence_xvm sadly does not work on the Ubuntu guests. The howto said >> to > install "yum install fence-virt fence-virtd" which do not exist as > such > in Ubuntu 18.04. After we tried to find the appropiate packages we > installed "libvirt-clients" and "multipath-tools". Is there maybe > something misisng or completely wrong? > Though we can connect to both hosts using "nc -z -v -u >> 192.168.1.21 > 1229", that just works fine. > >>> without fence-virt you can't expect the whole thing to work. >>> maybe you can build it for your ubuntu-version from sources of >>> a package for another ubuntu-version if it doesn't exist yet. >>> btw. which pacemaker-version are you using? >>> There was a convenience-fix on the master-branch for at least >>> a couple of days (sometimes during 2.0.4 release-cycle) that >>> wasn't compatible with fence_xvm. >> Usually, the biggest problem is the multicast traffic - as in >> many >> environments it can
Re: [ClusterLabs] Still Beginner STONITH Problem
Erm...network/firewall is always "green". Run tcpdump on Host1 and VM2 (not on the same host). Then run again 'fence_xvm -o list' and check what is captured. In summary, you need: - key deployed on the Hypervisours - key deployed on the VMs - fence_virtd running on both Hypervisours - Firewall opened (1229/udp for the hosts, 1229/tcp for the guests) - fence_xvm on both VMs In your case , the primary suspect is multicast traffic. Best Regards, Strahil Nikolov На 8 юли 2020 г. 16:33:45 GMT+03:00, "stefan.schm...@farmpartner-tec.com" написа: >Hello, > >>I can't find fence_virtd for Ubuntu18, but it is available for >>Ubuntu20. > >We have now upgraded our Server to Ubuntu 20.04 LTS and installed the >packages fence-virt and fence-virtd. > >The command "fence_xvm -a 225.0.0.12 -o list" on the Hosts still just >returns the single local VM. > >The same command on both VMs results in: ># fence_xvm -a 225.0.0.12 -o list >Timed out waiting for response >Operation failed > >But just as before, trying to connect from the guest to the host via nc > >just works fine. >#nc -z -v -u 192.168.1.21 1229 >Connection to 192.168.1.21 1229 port [udp/*] succeeded! > >So the hosts and service basically is reachable. > >I have spoken to our Firewall tech, he has assured me, that no local >traffic is hindered by anything. Be it multicast or not. >Software Firewalls are not present/active on any of our servers. > >Ubuntu guests: ># ufw status >Status: inactive > >CentOS hosts: >systemctl status firewalld >● firewalld.service - firewalld - dynamic firewall daemon > Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; >vendor preset: enabled) >Active: inactive (dead) > Docs: man:firewalld(1) > > >Any hints or help on how to remedy this problem would be greatly >appreciated! > >Kind regards >Stefan Schmitz > > >Am 07.07.2020 um 10:54 schrieb Klaus Wenninger: >> On 7/7/20 10:33 AM, Strahil Nikolov wrote: >>> I can't find fence_virtd for Ubuntu18, but it is available for >Ubuntu20. >>> >>> Your other option is to get an iSCSI from your quorum system and use >that for SBD. >>> For watchdog, you can use 'softdog' kernel module or you can use KVM >to present one to the VMs. >>> You can also check the '-P' flag for SBD. >> With kvm please use the qemu-watchdog and try to >> prevent using softdogwith SBD. >> Especially if you are aiming for a production-cluster ... >> >> Adding something like that to libvirt-xml should do the trick: >> >> > function='0x0'/> >> >> >>> >>> Best Regards, >>> Strahil Nikolov >>> >>> На 7 юли 2020 г. 10:11:38 GMT+03:00, >"stefan.schm...@farmpartner-tec.com" > написа: > What does 'virsh list' > give you onthe 2 hosts? Hopefully different names for > the VMs ... Yes, each host shows its own # virsh list IdName Status 2 kvm101 running # virsh list IdName State 1 kvm102 running > Did you try 'fence_xvm -a {mcast-ip} -o list' on the > guests as well? fence_xvm sadly does not work on the Ubuntu guests. The howto said >to install "yum install fence-virt fence-virtd" which do not exist as such in Ubuntu 18.04. After we tried to find the appropiate packages we installed "libvirt-clients" and "multipath-tools". Is there maybe something misisng or completely wrong? Though we can connect to both hosts using "nc -z -v -u >192.168.1.21 1229", that just works fine. >> without fence-virt you can't expect the whole thing to work. >> maybe you can build it for your ubuntu-version from sources of >> a package for another ubuntu-version if it doesn't exist yet. >> btw. 
which pacemaker-version are you using? >> There was a convenience-fix on the master-branch for at least >> a couple of days (sometimes during 2.0.4 release-cycle) that >> wasn't compatible with fence_xvm. > Usually, the biggest problem is the multicast traffic - as in >many > environments it can be dropped by firewalls. To make sure I have requested our Datacenter techs to verify that multicast Traffic can move unhindered in our local Network. But in >the past on multiple occasions they have confirmed, that local traffic >is not filtered in any way. But Since now I have never specifically >asked for multicast traffic, which I now did. I am waiting for an answer >to that question. kind regards Stefan Schmitz Am 06.07.2020 um 11:24 schrieb Klaus Wenninger: > On 7/6/20 10:10 AM, stefan.schm...@farmpartner-tec.com wrote: >> Hello, >> # fence_xvm -o list kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on >>> This should show both
Re: [ClusterLabs] Still Beginner STONITH Problem
Hello, >I can't find fence_virtd for Ubuntu18, but it is available for >Ubuntu20. We have now upgraded our Server to Ubuntu 20.04 LTS and installed the packages fence-virt and fence-virtd. The command "fence_xvm -a 225.0.0.12 -o list" on the Hosts still just returns the single local VM. The same command on both VMs results in: # fence_xvm -a 225.0.0.12 -o list Timed out waiting for response Operation failed But just as before, trying to connect from the guest to the host via nc just works fine. #nc -z -v -u 192.168.1.21 1229 Connection to 192.168.1.21 1229 port [udp/*] succeeded! So the hosts and service basically is reachable. I have spoken to our Firewall tech, he has assured me, that no local traffic is hindered by anything. Be it multicast or not. Software Firewalls are not present/active on any of our servers. Ubuntu guests: # ufw status Status: inactive CentOS hosts: systemctl status firewalld ● firewalld.service - firewalld - dynamic firewall daemon Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled) Active: inactive (dead) Docs: man:firewalld(1) Any hints or help on how to remedy this problem would be greatly appreciated! Kind regards Stefan Schmitz Am 07.07.2020 um 10:54 schrieb Klaus Wenninger: On 7/7/20 10:33 AM, Strahil Nikolov wrote: I can't find fence_virtd for Ubuntu18, but it is available for Ubuntu20. Your other option is to get an iSCSI from your quorum system and use that for SBD. For watchdog, you can use 'softdog' kernel module or you can use KVM to present one to the VMs. You can also check the '-P' flag for SBD. With kvm please use the qemu-watchdog and try to prevent using softdogwith SBD. Especially if you are aiming for a production-cluster ... Adding something like that to libvirt-xml should do the trick: Best Regards, Strahil Nikolov На 7 юли 2020 г. 10:11:38 GMT+03:00, "stefan.schm...@farmpartner-tec.com" написа: What does 'virsh list' give you onthe 2 hosts? Hopefully different names for the VMs ... Yes, each host shows its own # virsh list IdName Status 2 kvm101 running # virsh list IdName State 1 kvm102 running Did you try 'fence_xvm -a {mcast-ip} -o list' on the guests as well? fence_xvm sadly does not work on the Ubuntu guests. The howto said to install "yum install fence-virt fence-virtd" which do not exist as such in Ubuntu 18.04. After we tried to find the appropiate packages we installed "libvirt-clients" and "multipath-tools". Is there maybe something misisng or completely wrong? Though we can connect to both hosts using "nc -z -v -u 192.168.1.21 1229", that just works fine. without fence-virt you can't expect the whole thing to work. maybe you can build it for your ubuntu-version from sources of a package for another ubuntu-version if it doesn't exist yet. btw. which pacemaker-version are you using? There was a convenience-fix on the master-branch for at least a couple of days (sometimes during 2.0.4 release-cycle) that wasn't compatible with fence_xvm. Usually, the biggest problem is the multicast traffic - as in many environments it can be dropped by firewalls. To make sure I have requested our Datacenter techs to verify that multicast Traffic can move unhindered in our local Network. But in the past on multiple occasions they have confirmed, that local traffic is not filtered in any way. But Since now I have never specifically asked for multicast traffic, which I now did. I am waiting for an answer to that question. 
kind regards Stefan Schmitz Am 06.07.2020 um 11:24 schrieb Klaus Wenninger: On 7/6/20 10:10 AM, stefan.schm...@farmpartner-tec.com wrote: Hello, # fence_xvm -o list kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on This should show both VMs, so getting to that point will likely solve your problem. fence_xvm relies on multicast, there could be some obscure network configuration to get that working on the VMs. You said you tried on both hosts. What does 'virsh list' give you onthe 2 hosts? Hopefully different names for the VMs ... Did you try 'fence_xvm -a {mcast-ip} -o list' on the guests as well? Did you try pinging via the physical network that is connected tothe bridge configured to be used for fencing? If I got it right fence_xvm should supportcollecting answersfrom multiple hosts but I found a suggestion to do a setup with 2 multicast-addresses & keys for each host. Which route did you go? Klaus Thank you for pointing me in that direction. We have tried to solve that but with no success. We were using an howto provided here https://wiki.clusterlabs.org/wiki/Guest_Fencing Problem is, it specifically states that the tutorial does not yet support the case where guests are running on multiple
Re: [ClusterLabs] Still Beginner STONITH Problem
>With kvm please use the qemu-watchdog and try to >prevent using softdog with SBD. >Especially if you are aiming for a production-cluster ... You can tell it to the previous company I worked for :D . All clusters were using softdog on SLES 11/12 even though the hardware had its own. We had no issues with fencing, but we got plenty of SAN issues to test the fencing :) Best Regards, Strahil Nikolov
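If SBD with a watchdog is considered as an alternative, one way to see which watchdog driver a guest actually ended up with (assuming wdctl from util-linux is available) is:

# lsmod | grep -E 'softdog|i6300esb'
# wdctl /dev/watchdog0

wdctl prints the identity of the device, so a guest using the qemu-emulated watchdog should report the emulated device (e.g. an i6300ESB timer) rather than softdog.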
Re: [ClusterLabs] Still Beginner STONITH Problem
I can't find fence_virtd for Ubuntu18, but it is available for Ubuntu20. Your other option is to get an iSCSI from your quorum system and use that for SBD. For watchdog, you can use 'softdog' kernel module or you can use KVM to present one to the VMs. You can also check the '-P' flag for SBD. Best Regards, Strahil Nikolov На 7 юли 2020 г. 10:11:38 GMT+03:00, "stefan.schm...@farmpartner-tec.com" написа: > >What does 'virsh list' > >give you onthe 2 hosts? Hopefully different names for > >the VMs ... > >Yes, each host shows its own > ># virsh list > IdName Status > > 2 kvm101 running > ># virsh list > IdName State > > 1 kvm102 running > > > > >Did you try 'fence_xvm -a {mcast-ip} -o list' on the > >guests as well? > >fence_xvm sadly does not work on the Ubuntu guests. The howto said to >install "yum install fence-virt fence-virtd" which do not exist as >such >in Ubuntu 18.04. After we tried to find the appropiate packages we >installed "libvirt-clients" and "multipath-tools". Is there maybe >something misisng or completely wrong? >Though we can connect to both hosts using "nc -z -v -u 192.168.1.21 >1229", that just works fine. > > > >Usually, the biggest problem is the multicast traffic - as in many > >environments it can be dropped by firewalls. > >To make sure I have requested our Datacenter techs to verify that >multicast Traffic can move unhindered in our local Network. But in the >past on multiple occasions they have confirmed, that local traffic is >not filtered in any way. But Since now I have never specifically asked >for multicast traffic, which I now did. I am waiting for an answer to >that question. > > >kind regards >Stefan Schmitz > >Am 06.07.2020 um 11:24 schrieb Klaus Wenninger: >> On 7/6/20 10:10 AM, stefan.schm...@farmpartner-tec.com wrote: >>> Hello, >>> > # fence_xvm -o list > kvm102 >bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 > on >>> This should show both VMs, so getting to that point will likely >solve your problem. fence_xvm relies on multicast, there could be some obscure network configuration to get that working on the VMs. >> You said you tried on both hosts. What does 'virsh list' >> give you onthe 2 hosts? Hopefully different names for >> the VMs ... >> Did you try 'fence_xvm -a {mcast-ip} -o list' on the >> guests as well? >> Did you try pinging via the physical network that is >> connected tothe bridge configured to be used for >> fencing? >> If I got it right fence_xvm should supportcollecting >> answersfrom multiple hosts but I found a suggestion >> to do a setup with 2 multicast-addresses & keys for >> each host. >> Which route did you go? >> >> Klaus >>> >>> Thank you for pointing me in that direction. We have tried to solve >>> that but with no success. We were using an howto provided here >>> https://wiki.clusterlabs.org/wiki/Guest_Fencing >>> >>> Problem is, it specifically states that the tutorial does not yet >>> support the case where guests are running on multiple hosts. There >are >>> some short hints what might be necessary to do, but working through >>> those sadly just did not work nor where there any clues which would >>> help us finding a solution ourselves. So now we are completely stuck >>> here. >>> >>> Has someone the same configuration with Guest VMs on multiple hosts? >>> And how did you manage to get that to work? What do we need to do to >>> resolve this? Is there maybe even someone who would be willing to >take >>> a closer look at our server? Any help would be greatly appreciated! 
>>> >>> Kind regards >>> Stefan Schmitz >>> >>> >>> >>> Am 03.07.2020 um 02:39 schrieb Ken Gaillot: On Thu, 2020-07-02 at 17:18 +0200, >stefan.schm...@farmpartner-tec.com wrote: > Hello, > > I hope someone can help with this problem. We are (still) trying >to > get > Stonith to achieve a running active/active HA Cluster, but sadly >to > no > avail. > > There are 2 Centos Hosts. On each one there is a virtual Ubuntu >VM. > The > Ubuntu VMs are the ones which should form the HA Cluster. > > The current status is this: > > # pcs status > Cluster name: pacemaker_cluster > WARNING: corosync and pacemaker node names do not match (IPs used >in > setup?) > Stack: corosync > Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition > with > quorum > Last updated: Thu Jul 2 17:03:53 2020 > Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on > server4ubuntu1 > > 2 nodes configured > 13 resources configured > > Online: [ server2ubuntu1 server4ubuntu1 ] > > Full list of resources: > > stonith_id_1 (stonith:external/libvirt):
Re: [ClusterLabs] Still Beginner STONITH Problem
On 7/7/20 11:12 AM, Strahil Nikolov wrote: >> With kvm please use the qemu-watchdog and try to >> prevent using softdog with SBD. >> Especially if you are aiming for a production-cluster ... > You can tell it to the previous company I worked for :D . > All clusters were using softdog on SLES 11/12 despite the hardware had it's > own. Yes, I know opinions regarding softdog diverge a bit. Going through some possible kernel paths at least leaves a bad taste. That doesn't mean you will have issues, though. It is just something where testing won't give you an easy answer, and it may well depend heavily on the hardware you are running on. As long as there are better possibilities one should at least consider them. I remember having defaulted to softdog on a pre-configured product installer, with the documentation stating that softdog has its shortcomings and advising to configure something else if available, provided you know what you are doing and have tested it (testing whether a hardware watchdog actually fires is easy, while it is nearly impossible to verify by testing that softdog is really reliable enough). Klaus > > We had no issues with fencing, but we got plenty of san issues to test the > fencing :) > > Best Regards, > Strahil Nikolov
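Klaus's exact libvirt-xml snippet is not shown in this archive (only a trailing function='0x0'/> fragment remains in some of the quoted copies). A typical qemu watchdog definition in the libvirt domain XML, added for example via "virsh edit <domain>", looks roughly like the sketch below; the PCI slot value is only illustrative, libvirt can also assign the address automatically if the <address> line is omitted, and the action attribute can be chosen to suit SBD:

<watchdog model='i6300esb' action='reset'>
  <!-- illustrative PCI address; libvirt will pick one if this line is left out -->
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</watchdog>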
Re: [ClusterLabs] Still Beginner STONITH Problem
On 7/7/20 10:33 AM, Strahil Nikolov wrote: > I can't find fence_virtd for Ubuntu18, but it is available for Ubuntu20. > > Your other option is to get an iSCSI from your quorum system and use that for > SBD. > For watchdog, you can use 'softdog' kernel module or you can use KVM to > present one to the VMs. > You can also check the '-P' flag for SBD. With kvm please use the qemu-watchdog and try to prevent using softdogwith SBD. Especially if you are aiming for a production-cluster ... Adding something like that to libvirt-xml should do the trick: > > Best Regards, > Strahil Nikolov > > На 7 юли 2020 г. 10:11:38 GMT+03:00, "stefan.schm...@farmpartner-tec.com" > написа: >>> What does 'virsh list' >>> give you onthe 2 hosts? Hopefully different names for >>> the VMs ... >> Yes, each host shows its own >> >> # virsh list >> IdName Status >> >> 2 kvm101 running >> >> # virsh list >> IdName State >> >> 1 kvm102 running >> >> >> >>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the >>> guests as well? >> fence_xvm sadly does not work on the Ubuntu guests. The howto said to >> install "yum install fence-virt fence-virtd" which do not exist as >> such >> in Ubuntu 18.04. After we tried to find the appropiate packages we >> installed "libvirt-clients" and "multipath-tools". Is there maybe >> something misisng or completely wrong? >> Though we can connect to both hosts using "nc -z -v -u 192.168.1.21 >> 1229", that just works fine. >> without fence-virt you can't expect the whole thing to work. maybe you can build it for your ubuntu-version from sources of a package for another ubuntu-version if it doesn't exist yet. btw. which pacemaker-version are you using? There was a convenience-fix on the master-branch for at least a couple of days (sometimes during 2.0.4 release-cycle) that wasn't compatible with fence_xvm. >>> Usually, the biggest problem is the multicast traffic - as in many >>> environments it can be dropped by firewalls. >> To make sure I have requested our Datacenter techs to verify that >> multicast Traffic can move unhindered in our local Network. But in the >> past on multiple occasions they have confirmed, that local traffic is >> not filtered in any way. But Since now I have never specifically asked >> for multicast traffic, which I now did. I am waiting for an answer to >> that question. >> >> >> kind regards >> Stefan Schmitz >> >> Am 06.07.2020 um 11:24 schrieb Klaus Wenninger: >>> On 7/6/20 10:10 AM, stefan.schm...@farmpartner-tec.com wrote: Hello, >> # fence_xvm -o list >> kvm102 >> bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 >> on > This should show both VMs, so getting to that point will likely >> solve > your problem. fence_xvm relies on multicast, there could be some > obscure network configuration to get that working on the VMs. >>> You said you tried on both hosts. What does 'virsh list' >>> give you onthe 2 hosts? Hopefully different names for >>> the VMs ... >>> Did you try 'fence_xvm -a {mcast-ip} -o list' on the >>> guests as well? >>> Did you try pinging via the physical network that is >>> connected tothe bridge configured to be used for >>> fencing? >>> If I got it right fence_xvm should supportcollecting >>> answersfrom multiple hosts but I found a suggestion >>> to do a setup with 2 multicast-addresses & keys for >>> each host. >>> Which route did you go? >>> >>> Klaus Thank you for pointing me in that direction. We have tried to solve that but with no success. 
We were using an howto provided here https://wiki.clusterlabs.org/wiki/Guest_Fencing Problem is, it specifically states that the tutorial does not yet support the case where guests are running on multiple hosts. There >> are some short hints what might be necessary to do, but working through those sadly just did not work nor where there any clues which would help us finding a solution ourselves. So now we are completely stuck here. Has someone the same configuration with Guest VMs on multiple hosts? And how did you manage to get that to work? What do we need to do to resolve this? Is there maybe even someone who would be willing to >> take a closer look at our server? Any help would be greatly appreciated! Kind regards Stefan Schmitz Am 03.07.2020 um 02:39 schrieb Ken Gaillot: > On Thu, 2020-07-02 at 17:18 +0200, >> stefan.schm...@farmpartner-tec.com > wrote: >> Hello, >> >> I hope someone can help with this problem. We are (still) trying >> to >> get >> Stonith to achieve a running active/active HA Cluster, but sadly >> to >> no >> avail. >> >> There are 2 Centos
Re: [ClusterLabs] Still Beginner STONITH Problem
>What does 'virsh list' >give you onthe 2 hosts? Hopefully different names for >the VMs ... Yes, each host shows its own # virsh list IdName Status 2 kvm101 running # virsh list IdName State 1 kvm102 running >Did you try 'fence_xvm -a {mcast-ip} -o list' on the >guests as well? fence_xvm sadly does not work on the Ubuntu guests. The howto said to install "yum install fence-virt fence-virtd" which do not exist as such in Ubuntu 18.04. After we tried to find the appropiate packages we installed "libvirt-clients" and "multipath-tools". Is there maybe something misisng or completely wrong? Though we can connect to both hosts using "nc -z -v -u 192.168.1.21 1229", that just works fine. >Usually, the biggest problem is the multicast traffic - as in many >environments it can be dropped by firewalls. To make sure I have requested our Datacenter techs to verify that multicast Traffic can move unhindered in our local Network. But in the past on multiple occasions they have confirmed, that local traffic is not filtered in any way. But Since now I have never specifically asked for multicast traffic, which I now did. I am waiting for an answer to that question. kind regards Stefan Schmitz Am 06.07.2020 um 11:24 schrieb Klaus Wenninger: On 7/6/20 10:10 AM, stefan.schm...@farmpartner-tec.com wrote: Hello, # fence_xvm -o list kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on This should show both VMs, so getting to that point will likely solve your problem. fence_xvm relies on multicast, there could be some obscure network configuration to get that working on the VMs. You said you tried on both hosts. What does 'virsh list' give you onthe 2 hosts? Hopefully different names for the VMs ... Did you try 'fence_xvm -a {mcast-ip} -o list' on the guests as well? Did you try pinging via the physical network that is connected tothe bridge configured to be used for fencing? If I got it right fence_xvm should supportcollecting answersfrom multiple hosts but I found a suggestion to do a setup with 2 multicast-addresses & keys for each host. Which route did you go? Klaus Thank you for pointing me in that direction. We have tried to solve that but with no success. We were using an howto provided here https://wiki.clusterlabs.org/wiki/Guest_Fencing Problem is, it specifically states that the tutorial does not yet support the case where guests are running on multiple hosts. There are some short hints what might be necessary to do, but working through those sadly just did not work nor where there any clues which would help us finding a solution ourselves. So now we are completely stuck here. Has someone the same configuration with Guest VMs on multiple hosts? And how did you manage to get that to work? What do we need to do to resolve this? Is there maybe even someone who would be willing to take a closer look at our server? Any help would be greatly appreciated! Kind regards Stefan Schmitz Am 03.07.2020 um 02:39 schrieb Ken Gaillot: On Thu, 2020-07-02 at 17:18 +0200, stefan.schm...@farmpartner-tec.com wrote: Hello, I hope someone can help with this problem. We are (still) trying to get Stonith to achieve a running active/active HA Cluster, but sadly to no avail. There are 2 Centos Hosts. On each one there is a virtual Ubuntu VM. The Ubuntu VMs are the ones which should form the HA Cluster. The current status is this: # pcs status Cluster name: pacemaker_cluster WARNING: corosync and pacemaker node names do not match (IPs used in setup?) 
Stack: corosync Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum Last updated: Thu Jul 2 17:03:53 2020 Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on server4ubuntu1 2 nodes configured 13 resources configured Online: [ server2ubuntu1 server4ubuntu1 ] Full list of resources: stonith_id_1 (stonith:external/libvirt): Stopped Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker] Masters: [ server4ubuntu1 ] Slaves: [ server2ubuntu1 ] Master/Slave Set: WebDataClone [WebData] Masters: [ server2ubuntu1 server4ubuntu1 ] Clone Set: dlm-clone [dlm] Started: [ server2ubuntu1 server4ubuntu1 ] Clone Set: ClusterIP-clone [ClusterIP] (unique) ClusterIP:0 (ocf::heartbeat:IPaddr2): Started server2ubuntu1 ClusterIP:1 (ocf::heartbeat:IPaddr2): Started server4ubuntu1 Clone Set: WebFS-clone [WebFS] Started: [ server4ubuntu1 ] Stopped: [ server2ubuntu1 ] Clone Set: WebSite-clone [WebSite] Started: [ server4ubuntu1 ] Stopped: [ server2ubuntu1 ] Failed Actions: * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201, status=Error,
Re: [ClusterLabs] Still Beginner STONITH Problem
As far as I know fence_xvm supports multiple hosts, but you need to open the port on both Hypervisour (udp) and Guest (tcp). 'fence_xvm -o list' should provide a list of VMs from all hosts that responded (and have the key). Usually, the biggest problem is the multicast traffic - as in many environments it can be dropped by firewalls. Best Regards, Strahil Nikolov На 6 юли 2020 г. 12:24:08 GMT+03:00, Klaus Wenninger написа: >On 7/6/20 10:10 AM, stefan.schm...@farmpartner-tec.com wrote: >> Hello, >> >> >> # fence_xvm -o list >> >> kvm102 >bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 >> >> on >> >> >This should show both VMs, so getting to that point will likely >solve >> >your problem. fence_xvm relies on multicast, there could be some >> >obscure network configuration to get that working on the VMs. >You said you tried on both hosts. What does 'virsh list' >give you onthe 2 hosts? Hopefully different names for >the VMs ... >Did you try 'fence_xvm -a {mcast-ip} -o list' on the >guests as well? >Did you try pinging via the physical network that is >connected tothe bridge configured to be used for >fencing? >If I got it right fence_xvm should supportcollecting >answersfrom multiple hosts but I found a suggestion >to do a setup with 2 multicast-addresses & keys for >each host. >Which route did you go? > >Klaus >> >> Thank you for pointing me in that direction. We have tried to solve >> that but with no success. We were using an howto provided here >> https://wiki.clusterlabs.org/wiki/Guest_Fencing >> >> Problem is, it specifically states that the tutorial does not yet >> support the case where guests are running on multiple hosts. There >are >> some short hints what might be necessary to do, but working through >> those sadly just did not work nor where there any clues which would >> help us finding a solution ourselves. So now we are completely stuck >> here. >> >> Has someone the same configuration with Guest VMs on multiple hosts? >> And how did you manage to get that to work? What do we need to do to >> resolve this? Is there maybe even someone who would be willing to >take >> a closer look at our server? Any help would be greatly appreciated! >> >> Kind regards >> Stefan Schmitz >> >> >> >> Am 03.07.2020 um 02:39 schrieb Ken Gaillot: >>> On Thu, 2020-07-02 at 17:18 +0200, >stefan.schm...@farmpartner-tec.com >>> wrote: Hello, I hope someone can help with this problem. We are (still) trying to get Stonith to achieve a running active/active HA Cluster, but sadly to no avail. There are 2 Centos Hosts. On each one there is a virtual Ubuntu VM. The Ubuntu VMs are the ones which should form the HA Cluster. The current status is this: # pcs status Cluster name: pacemaker_cluster WARNING: corosync and pacemaker node names do not match (IPs used >in setup?) 
Stack: corosync Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum Last updated: Thu Jul 2 17:03:53 2020 Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on server4ubuntu1 2 nodes configured 13 resources configured Online: [ server2ubuntu1 server4ubuntu1 ] Full list of resources: stonith_id_1 (stonith:external/libvirt): Stopped Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker] Masters: [ server4ubuntu1 ] Slaves: [ server2ubuntu1 ] Master/Slave Set: WebDataClone [WebData] Masters: [ server2ubuntu1 server4ubuntu1 ] Clone Set: dlm-clone [dlm] Started: [ server2ubuntu1 server4ubuntu1 ] Clone Set: ClusterIP-clone [ClusterIP] (unique) ClusterIP:0 (ocf::heartbeat:IPaddr2): Started server2ubuntu1 ClusterIP:1 (ocf::heartbeat:IPaddr2): Started server4ubuntu1 Clone Set: WebFS-clone [WebFS] Started: [ server4ubuntu1 ] Stopped: [ server2ubuntu1 ] Clone Set: WebSite-clone [WebSite] Started: [ server4ubuntu1 ] Stopped: [ server2ubuntu1 ] Failed Actions: * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201, status=Error, exitreason='', last-rc-change='Thu Jul 2 14:37:35 2020', queued=0ms, exec=3403ms * r0_pacemaker_monitor_6 on server2ubuntu1 'master' (8): call=203, status=complete, exitreason='', last-rc-change='Thu Jul 2 14:38:39 2020', queued=0ms, >exec=0ms * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202, status=Error, exitreason='', last-rc-change='Thu Jul 2 14:37:39 2020', queued=0ms, exec=3411ms The stonith resoursce is stopped and does not seem to work. On both hosts the command # fence_xvm -o list kvm102
Re: [ClusterLabs] Still Beginner STONITH Problem
On 7/6/20 10:10 AM, stefan.schm...@farmpartner-tec.com wrote: > Hello, > > >> # fence_xvm -o list > >> kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 > >> on > > >This should show both VMs, so getting to that point will likely solve > >your problem. fence_xvm relies on multicast, there could be some > >obscure network configuration to get that working on the VMs. You said you tried on both hosts. What does 'virsh list' give you onthe 2 hosts? Hopefully different names for the VMs ... Did you try 'fence_xvm -a {mcast-ip} -o list' on the guests as well? Did you try pinging via the physical network that is connected tothe bridge configured to be used for fencing? If I got it right fence_xvm should supportcollecting answersfrom multiple hosts but I found a suggestion to do a setup with 2 multicast-addresses & keys for each host. Which route did you go? Klaus > > Thank you for pointing me in that direction. We have tried to solve > that but with no success. We were using an howto provided here > https://wiki.clusterlabs.org/wiki/Guest_Fencing > > Problem is, it specifically states that the tutorial does not yet > support the case where guests are running on multiple hosts. There are > some short hints what might be necessary to do, but working through > those sadly just did not work nor where there any clues which would > help us finding a solution ourselves. So now we are completely stuck > here. > > Has someone the same configuration with Guest VMs on multiple hosts? > And how did you manage to get that to work? What do we need to do to > resolve this? Is there maybe even someone who would be willing to take > a closer look at our server? Any help would be greatly appreciated! > > Kind regards > Stefan Schmitz > > > > Am 03.07.2020 um 02:39 schrieb Ken Gaillot: >> On Thu, 2020-07-02 at 17:18 +0200, stefan.schm...@farmpartner-tec.com >> wrote: >>> Hello, >>> >>> I hope someone can help with this problem. We are (still) trying to >>> get >>> Stonith to achieve a running active/active HA Cluster, but sadly to >>> no >>> avail. >>> >>> There are 2 Centos Hosts. On each one there is a virtual Ubuntu VM. >>> The >>> Ubuntu VMs are the ones which should form the HA Cluster. >>> >>> The current status is this: >>> >>> # pcs status >>> Cluster name: pacemaker_cluster >>> WARNING: corosync and pacemaker node names do not match (IPs used in >>> setup?) 
>>> Stack: corosync >>> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition >>> with >>> quorum >>> Last updated: Thu Jul 2 17:03:53 2020 >>> Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on >>> server4ubuntu1 >>> >>> 2 nodes configured >>> 13 resources configured >>> >>> Online: [ server2ubuntu1 server4ubuntu1 ] >>> >>> Full list of resources: >>> >>> stonith_id_1 (stonith:external/libvirt): Stopped >>> Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker] >>> Masters: [ server4ubuntu1 ] >>> Slaves: [ server2ubuntu1 ] >>> Master/Slave Set: WebDataClone [WebData] >>> Masters: [ server2ubuntu1 server4ubuntu1 ] >>> Clone Set: dlm-clone [dlm] >>> Started: [ server2ubuntu1 server4ubuntu1 ] >>> Clone Set: ClusterIP-clone [ClusterIP] (unique) >>> ClusterIP:0 (ocf::heartbeat:IPaddr2): Started >>> server2ubuntu1 >>> ClusterIP:1 (ocf::heartbeat:IPaddr2): Started >>> server4ubuntu1 >>> Clone Set: WebFS-clone [WebFS] >>> Started: [ server4ubuntu1 ] >>> Stopped: [ server2ubuntu1 ] >>> Clone Set: WebSite-clone [WebSite] >>> Started: [ server4ubuntu1 ] >>> Stopped: [ server2ubuntu1 ] >>> >>> Failed Actions: >>> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): >>> call=201, >>> status=Error, exitreason='', >>> last-rc-change='Thu Jul 2 14:37:35 2020', queued=0ms, >>> exec=3403ms >>> * r0_pacemaker_monitor_6 on server2ubuntu1 'master' (8): >>> call=203, >>> status=complete, exitreason='', >>> last-rc-change='Thu Jul 2 14:38:39 2020', queued=0ms, exec=0ms >>> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): >>> call=202, >>> status=Error, exitreason='', >>> last-rc-change='Thu Jul 2 14:37:39 2020', queued=0ms, >>> exec=3411ms >>> >>> >>> The stonith resoursce is stopped and does not seem to work. >>> On both hosts the command >>> # fence_xvm -o list >>> kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 >>> on >> >> This should show both VMs, so getting to that point will likely solve >> your problem. fence_xvm relies on multicast, there could be some >> obscure network configuration to get that working on the VMs. >> >>> returns the local VM. Apparently it connects through the >>> Virtualization >>> interface because it returns the VM name not the Hostname of the >>> client >>> VM. I do not know if this is how it is supposed to work? >> >> Yes, fence_xvm knows only about the VM names. >> >> To get pacemaker to be able to use it for fencing the
Re: [ClusterLabs] Still Beginner STONITH Problem
Hello, >> # fence_xvm -o list >> kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 >> on >This should show both VMs, so getting to that point will likely solve >your problem. fence_xvm relies on multicast, there could be some >obscure network configuration to get that working on the VMs. Thank you for pointing me in that direction. We have tried to solve that but with no success. We were using an howto provided here https://wiki.clusterlabs.org/wiki/Guest_Fencing Problem is, it specifically states that the tutorial does not yet support the case where guests are running on multiple hosts. There are some short hints what might be necessary to do, but working through those sadly just did not work nor where there any clues which would help us finding a solution ourselves. So now we are completely stuck here. Has someone the same configuration with Guest VMs on multiple hosts? And how did you manage to get that to work? What do we need to do to resolve this? Is there maybe even someone who would be willing to take a closer look at our server? Any help would be greatly appreciated! Kind regards Stefan Schmitz Am 03.07.2020 um 02:39 schrieb Ken Gaillot: On Thu, 2020-07-02 at 17:18 +0200, stefan.schm...@farmpartner-tec.com wrote: Hello, I hope someone can help with this problem. We are (still) trying to get Stonith to achieve a running active/active HA Cluster, but sadly to no avail. There are 2 Centos Hosts. On each one there is a virtual Ubuntu VM. The Ubuntu VMs are the ones which should form the HA Cluster. The current status is this: # pcs status Cluster name: pacemaker_cluster WARNING: corosync and pacemaker node names do not match (IPs used in setup?) Stack: corosync Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum Last updated: Thu Jul 2 17:03:53 2020 Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on server4ubuntu1 2 nodes configured 13 resources configured Online: [ server2ubuntu1 server4ubuntu1 ] Full list of resources: stonith_id_1 (stonith:external/libvirt): Stopped Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker] Masters: [ server4ubuntu1 ] Slaves: [ server2ubuntu1 ] Master/Slave Set: WebDataClone [WebData] Masters: [ server2ubuntu1 server4ubuntu1 ] Clone Set: dlm-clone [dlm] Started: [ server2ubuntu1 server4ubuntu1 ] Clone Set: ClusterIP-clone [ClusterIP] (unique) ClusterIP:0(ocf::heartbeat:IPaddr2): Started server2ubuntu1 ClusterIP:1(ocf::heartbeat:IPaddr2): Started server4ubuntu1 Clone Set: WebFS-clone [WebFS] Started: [ server4ubuntu1 ] Stopped: [ server2ubuntu1 ] Clone Set: WebSite-clone [WebSite] Started: [ server4ubuntu1 ] Stopped: [ server2ubuntu1 ] Failed Actions: * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201, status=Error, exitreason='', last-rc-change='Thu Jul 2 14:37:35 2020', queued=0ms, exec=3403ms * r0_pacemaker_monitor_6 on server2ubuntu1 'master' (8): call=203, status=complete, exitreason='', last-rc-change='Thu Jul 2 14:38:39 2020', queued=0ms, exec=0ms * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202, status=Error, exitreason='', last-rc-change='Thu Jul 2 14:37:39 2020', queued=0ms, exec=3411ms The stonith resoursce is stopped and does not seem to work. On both hosts the command # fence_xvm -o list kvm102 bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on This should show both VMs, so getting to that point will likely solve your problem. fence_xvm relies on multicast, there could be some obscure network configuration to get that working on the VMs. returns the local VM. 
>> Apparently it connects through the Virtualization interface because it
>> returns the VM name, not the hostname of the client VM. I do not know
>> if this is how it is supposed to work?
>
> Yes, fence_xvm knows only about the VM names. To get pacemaker to be
> able to use it for fencing the cluster nodes, you have to add a
> pcmk_host_map parameter to the fencing resource. It looks like
> pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."
>
>> In the local network, all traffic is allowed. No firewall is locally
>> active; only the connections leaving the local network are firewalled.
>> Hence there are no connection problems between the hosts and clients.
>> For example, we can successfully connect from the clients to the hosts:
>>
>> # nc -z -v -u 192.168.1.21 1229
>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>> Ncat: Connected to 192.168.1.21:1229.
>> Ncat: UDP packet sent successfully
>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>
>> # nc -z -v -u 192.168.1.13 1229
>> Ncat: Version 7.50 ( https://nmap.org/ncat )
>> Ncat: Connected to 192.168.1.13:1229.
>> Ncat: UDP packet sent successfully
>> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>>
>> On the Ubuntu VMs we created and configured the stonith resource
>> according to the howto provided here:
>> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
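For the multi-host case described above, each CentOS host has to run fence_virtd with the multicast listener and the libvirt backend, and the same key file has to be present on both hosts and both guests. What follows is only a minimal sketch of /etc/fence_virt.conf, not the configuration actually in use in this thread: the interface name br0 is an assumption (it should be whichever bridge connects the guests to the 192.168.1.x network), and the address, port, and key-file path are simply the usual fence-virt defaults, matching the port 1229 probed with nc in this thread.

    fence_virtd {
            listener = "multicast";
            backend = "libvirt";
    }

    listeners {
            multicast {
                    key_file = "/etc/cluster/fence_xvm.key";
                    address = "225.0.0.12";
                    port = "1229";
                    interface = "br0";
                    family = "ipv4";
            }
    }

    backends {
            libvirt {
                    uri = "qemu:///system";
            }
    }

After restarting fence_virtd on both hosts, running

    # fence_xvm -a 225.0.0.12 -k /etc/cluster/fence_xvm.key -o list

on a guest should, as pointed out above, show both VMs; if it still returns only the local one, the multicast request from that guest is most likely not reaching the other host's fence_virtd (or the reply cannot get back).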
Re: [ClusterLabs] Still Beginner STONITH Problem
On Thu, 2020-07-02 at 17:18 +0200, stefan.schm...@farmpartner-tec.com wrote:
> Hello,
>
> I hope someone can help with this problem. We are (still) trying to get
> Stonith to achieve a running active/active HA Cluster, but sadly to no
> avail.
>
> There are 2 CentOS hosts. On each one there is a virtual Ubuntu VM. The
> Ubuntu VMs are the ones which should form the HA Cluster.
>
> The current status is this:
>
> # pcs status
> Cluster name: pacemaker_cluster
> WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
> Stack: corosync
> Current DC: server2ubuntu1 (version 1.1.18-2b07d5c5a9) - partition with quorum
> Last updated: Thu Jul 2 17:03:53 2020
> Last change: Thu Jul 2 14:33:14 2020 by root via cibadmin on server4ubuntu1
>
> 2 nodes configured
> 13 resources configured
>
> Online: [ server2ubuntu1 server4ubuntu1 ]
>
> Full list of resources:
>
>  stonith_id_1 (stonith:external/libvirt): Stopped
>  Master/Slave Set: r0_pacemaker_Clone [r0_pacemaker]
>      Masters: [ server4ubuntu1 ]
>      Slaves: [ server2ubuntu1 ]
>  Master/Slave Set: WebDataClone [WebData]
>      Masters: [ server2ubuntu1 server4ubuntu1 ]
>  Clone Set: dlm-clone [dlm]
>      Started: [ server2ubuntu1 server4ubuntu1 ]
>  Clone Set: ClusterIP-clone [ClusterIP] (unique)
>      ClusterIP:0 (ocf::heartbeat:IPaddr2): Started server2ubuntu1
>      ClusterIP:1 (ocf::heartbeat:IPaddr2): Started server4ubuntu1
>  Clone Set: WebFS-clone [WebFS]
>      Started: [ server4ubuntu1 ]
>      Stopped: [ server2ubuntu1 ]
>  Clone Set: WebSite-clone [WebSite]
>      Started: [ server4ubuntu1 ]
>      Stopped: [ server2ubuntu1 ]
>
> Failed Actions:
> * stonith_id_1_start_0 on server2ubuntu1 'unknown error' (1): call=201,
>   status=Error, exitreason='',
>   last-rc-change='Thu Jul 2 14:37:35 2020', queued=0ms, exec=3403ms
> * r0_pacemaker_monitor_6 on server2ubuntu1 'master' (8): call=203,
>   status=complete, exitreason='',
>   last-rc-change='Thu Jul 2 14:38:39 2020', queued=0ms, exec=0ms
> * stonith_id_1_start_0 on server4ubuntu1 'unknown error' (1): call=202,
>   status=Error, exitreason='',
>   last-rc-change='Thu Jul 2 14:37:39 2020', queued=0ms, exec=3411ms
>
> The stonith resource is stopped and does not seem to work.
> On both hosts the command
> # fence_xvm -o list
> kvm102                  bab3749c-15fc-40b7-8b6c-d4267b9f0eb9 on

This should show both VMs, so getting to that point will likely solve
your problem. fence_xvm relies on multicast, there could be some
obscure network configuration to get that working on the VMs.

> returns the local VM. Apparently it connects through the Virtualization
> interface because it returns the VM name, not the hostname of the client
> VM. I do not know if this is how it is supposed to work?

Yes, fence_xvm knows only about the VM names. To get pacemaker to be
able to use it for fencing the cluster nodes, you have to add a
pcmk_host_map parameter to the fencing resource. It looks like
pcmk_host_map="nodename1:vmname1;nodename2:vmname2;..."

> In the local network, all traffic is allowed. No firewall is locally
> active; only the connections leaving the local network are firewalled.
> Hence there are no connection problems between the hosts and clients.
> For example, we can successfully connect from the clients to the hosts:
>
> # nc -z -v -u 192.168.1.21 1229
> Ncat: Version 7.50 ( https://nmap.org/ncat )
> Ncat: Connected to 192.168.1.21:1229.
> Ncat: UDP packet sent successfully
> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>
> # nc -z -v -u 192.168.1.13 1229
> Ncat: Version 7.50 ( https://nmap.org/ncat )
> Ncat: Connected to 192.168.1.13:1229.
> Ncat: UDP packet sent successfully
> Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
>
> On the Ubuntu VMs we created and configured the stonith resource
> according to the howto provided here:
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/pdf/Clusters_from_Scratch/Pacemaker-1.1-Clusters_from_Scratch-en-US.pdf
>
> The actual line we used:
> # pcs -f stonith_cfg stonith create stonith_id_1 external/libvirt
>   hostlist="Host4,host2" hypervisor_uri="qemu+ssh://192.168.1.21/system"
>
> But as you can see in the pcs status output, stonith is stopped and
> exits with an unknown error.
>
> Can somebody please advise on how to proceed or what additional
> information is needed to solve this problem?
> Any help would be greatly appreciated! Thank you in advance.
>
> Kind regards
> Stefan Schmitz

--
Ken Gaillot
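To tie the pcmk_host_map format from the reply above to this particular cluster, here is a hedged sketch of what the fencing resource could look like once fence_xvm is used instead of external/libvirt. The resource name fence_vms and the key-file path are made up for the example, and the vm-name placeholders have to be replaced with the real libvirt domain names of the two guests (only kvm102 is known from the output in this thread); the file-based pcs workflow is the same one already used above.

    # pcs -f stonith_cfg stonith create fence_vms fence_xvm \
          pcmk_host_map="server2ubuntu1:<vm-name-on-host2>;server4ubuntu1:<vm-name-on-Host4>" \
          key_file=/etc/cluster/fence_xvm.key
    # pcs cluster cib-push stonith_cfg

The essential part is the nodename:vmname mapping, which lets pacemaker translate the cluster node names (which fence_xvm does not know) into the VM names that fence_virtd reports.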