Re: [Pacemaker] sbd fencing race
Hi,

On Tue, Nov 25, 2014 at 04:20:32PM +0100, emmanuel segura wrote:
> Hi list,
>
> Last night I had a cluster in a fencing race using sbd as stonith

Can you give a bit more details?

> device. I would like to know what the effect is of using start-delay
> in my stonith resource, in this way:
>
> primitive stonith-sbd stonith:external/sbd \
>         params sbd_device=/dev/mapper/SBD \
>         op start interval=0 start-delay=5

Yes, that could help with a stonith deathmatch. Normally, you have a
stonith resource running on one node. On split brain, the other node
also starts the resource in order to shoot the first node. That's where
start-delay comes into play.

Ultimate resource for the issue: http://ourobengr.com/ha/

Cheers,

Dejan

> Thanks

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
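A related way to break such a race is a randomized rather than fixed delay. The sketch below assumes the installed Pacemaker supports the generic pcmk_delay_max stonith parameter (the delay value is illustrative):

```
# Sketch only: pcmk_delay_max (a generic stonith-resource parameter,
# assumed available in this Pacemaker version) makes each node wait a
# random time before fencing, so two surviving nodes are unlikely to
# shoot each other simultaneously.
primitive stonith-sbd stonith:external/sbd \
        params sbd_device=/dev/mapper/SBD pcmk_delay_max=30
```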
Re: [Pacemaker] sbd fencing race
But I would like to know if pacemaker needs to start sbd on the node
where the sbd resource isn't running in order to fence the other nodes,
because I don't see any start action on the second node:

message_2cd.txt:Nov 23 11:43:28 node01 sbd: [69794]: WARN: CIB: We do NOT have quorum!
message_2cd.txt:Nov 23 11:43:28 node01 sbd: [69791]: WARN: Pacemaker health check: UNHEALTHY
message_2cd.txt:Nov 23 11:43:28 node01 pengine: [69823]: notice: LogActions: Leave stonith-sbd(Started node01)
message_2ch.txt:Nov 23 11:43:28 s02srv002ch sbd: [97640]: WARN: CIB: We do NOT have quorum!
message_2ch.txt:Nov 23 11:43:28 node02 sbd: [97640]: WARN: CIB: We do NOT have quorum!
message_2ch.txt:Nov 23 11:43:28 node02 sbd: [97637]: WARN: Pacemaker health check: UNHEALTHY
message_2ch.txt:Nov 23 11:43:28 node02 pengine: [97679]: WARN: custom_action: Action stonith-sbd_stop_0 on node01 is unrunnable (offline)
message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Delivery process handling /dev/mapper/SBD01B0298700230
message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Writing reset to node slot node01
message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Messaging delay: 40

Thanks

2014-11-26 10:26 GMT+01:00 Dejan Muhamedagic deja...@fastmail.fm:
> On Tue, Nov 25, 2014 at 04:20:32PM +0100, emmanuel segura wrote:
> > Last night I had a cluster in a fencing race using sbd as stonith
>
> Can you give a bit more details?
>
> > device. I would like to know what the effect is of using start-delay
> > in my stonith resource, in this way:
> >
> > primitive stonith-sbd stonith:external/sbd \
> >         params sbd_device=/dev/mapper/SBD \
> >         op start interval=0 start-delay=5
>
> Yes, that could help with a stonith deathmatch. Normally, you have a
> stonith resource running on one node. On split brain, the other node
> also starts the resource in order to shoot the first node. That's
> where start-delay comes into play.
> Ultimate resource for the issue: http://ourobengr.com/ha/
>
> Cheers,
>
> Dejan

--
this is my life and I live it as long as God wills
[Pacemaker] Problem with ClusterIP
Hi!

I've been using clusterip for a while now without any problem in
Active/Passive clusters (2 nodes). On my last install, I'm facing quite
an annoying problem. Despite the same clusterip configuration I've
always used, the interface is now up on both nodes, which ends with an
IP conflict.

I'm looking for ideas to investigate the causes of such a problem. If
anybody can help me on this, I would be grateful.

Cheers

--
Anne
http://mageia.org
Re: [Pacemaker] Avoid monitoring of resources on nodes
Daniel Dehennin daniel.dehen...@baby-gnu.org writes:

> I'll try to find how to make the change directly in XML.

OK, looking at the git history, this feature seems to be available only
on the master branch and not yet released. I do not have that feature in
my pacemaker version.

Does it sound normal? I have:

- an asymmetrical opt-in cluster[1]
- a group of resources with an INFINITY location on a specific node

And the excluded nodes are fenced because of many monitor errors for
this resource.

Regards.

Footnotes:
[1] http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Pacemaker_Explained/_asymmetrical_opt_in_clusters.html

--
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
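For reference, the unreleased feature discussed here appears to be the resource-discovery option on location constraints, which later shipped in Pacemaker 1.1.13. A sketch of the XML form, with illustrative ids and names:

```xml
<!-- Sketch only: resource-discovery="never" (assumed to be the feature
     in question) tells Pacemaker not to probe this resource on the
     matching node; the id, resource, and node names are illustrative. -->
<rsc_location id="loc-no-probe-services" rsc="services"
              node="excluded-node" score="-INFINITY"
              resource-discovery="never"/>
```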
Re: [Pacemaker] Problem with ClusterIP
On Wednesday, 26 November 2014 at 12:01:36, Anne Nicolas wrote:
> Hi!
>
> I've been using clusterip for a while now without any problem in
> Active/Passive clusters (2 nodes).

Could you please explain how you could use ClusterIP in an
active/passive cluster? ClusterIP is for use in an active/active
cluster. See man iptables and look for the CLUSTERIP target.

> On my last install, I'm facing quite an annoying problem. Despite the
> same clusterip configuration I've always used, the interface is now up
> on both nodes, which ends with an IP conflict.

Please explain in more detail. What is your config? What do you expect
the cluster to do? What really happens? Where is the problem?

> I'm looking for ideas to investigate the causes of such a problem. If
> anybody can help me on this, I would be grateful.

Yes, I think I can help ;-)

Kind regards,

Michael Schwartzkopff

--
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64, +49 (162) 165 0044
Franziskanerstraße 15, 81669 München

Registered office: München, Amtsgericht München: HRB 199263
Board: Patrick Ben Koetter, Marc Schiffbauer
Chairman of the supervisory board: Florian Kirstein
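For context, the iptables CLUSTERIP target Michael mentions (for genuine active/active setups) is configured along these lines; the address, cluster MAC, and node numbers below are illustrative:

```
# Sketch only: run on each node, with --local-node set to 1 or 2.
# CLUSTERIP requires a multicast MAC shared by all nodes.
iptables -I INPUT -d 172.16.16.11 -i eth0 -j CLUSTERIP --new \
         --hashmode sourceip --clustermac 01:00:5E:00:00:20 \
         --total-nodes 2 --local-node 1
```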
Re: [Pacemaker] Problem with ClusterIP
On 26/11/2014 12:23, Michael Schwartzkopff wrote:
> Could you please explain how you could use ClusterIP in an
> active/passive cluster? ClusterIP is for use in an active/active
> cluster. See man iptables and look for the CLUSTERIP target.
>
> Please explain in more detail. What is your config? What do you expect
> the cluster to do? What really happens? Where is the problem?

Maybe my explanation was not that clear. Here is my configuration
(crm configure show):

node $id=17435146 pogcupsvr
node $id=34212362 pogcupsvr2
primitive apache ocf:heartbeat:apache \
        params configfile=/etc/httpd/conf/httpd.conf \
        op start interval=0 timeout=40s \
        op stop interval=0 timeout=60s
primitive clusterip ocf:heartbeat:IPaddr2 \
        params ip=172.16.16.11 cidr_netmask=24 nic=eth0 \
        meta target-role=Started
...
property $id=cib-bootstrap-options \
        dc-version=1.1.7-2.mga1-ee0730e13d124c3d58f00016c3376a1de5323cff \
        cluster-infrastructure=corosync \
        stonith-enabled=false \
        no-quorum-policy=ignore
rsc_defaults $id=rsc-options \
        resource-stickiness=100

So I started the primary node (pogcupsvr). The configuration was checked
and ok. Then I started the second node (pogcupsvr2). This time all the
configuration looked ok, no errors, but when I checked the network
configuration, eth0 was up on both nodes with the same IP address, of
course, instead of having it up only on the primary node. What I
expected (and in all other tests it was the case) is that eth0 is up
only on the primary node and used by the apache server.

I'm looking for ideas to investigate the causes of such a problem.
> > If anybody can help me on this, I would be grateful.
>
> Yes, I think I can help ;-)

Thanks for that :)

--
Anne
http://mageia.org
Re: [Pacemaker] Avoid monitoring of resources on nodes
26.11.2014 14:21, Daniel Dehennin wrote:
> Does it sound normal? I have:
>
> - an asymmetrical opt-in cluster
> - a group of resources with an INFINITY location on a specific node
>
> And the excluded nodes are fenced because of many monitor errors for
> this resource.

Nodes may be fenced because of a resource _only_ if the resource fails
to stop. I can only guess what exactly happens:

* the cluster probes all resources on all nodes (to prevent that, you
  need the feature mentioned by David)
* some of the resource probes return something other than "not running"
* the cluster tries to stop those resources
* the stop fails
* the node is fenced

You need to find out exactly which resource returns an error on probe
and fix that agent (actually you do not use OCF agents but rather
upstart jobs and LSB scripts).

The above is for the case where all nodes have the mysql job and both
scripts installed. If pacemaker decides to fence because one of them is
missing - that would be a bug.
Re: [Pacemaker] Problem with ClusterIP
On Wednesday, 26 November 2014 at 12:54:20, Anne Nicolas wrote:
> Maybe my explanation was not that clear.

Yes.

> Here is my configuration (crm configure show):
>
> [...]
>
> So I started the primary node (pogcupsvr). The configuration was
> checked and ok. Then I started the second node (pogcupsvr2). This time
> all the configuration looked ok, no errors, but when I checked the
> network configuration, eth0 was up on both nodes with the same IP
> address, of course, instead of having it up only on the primary node.

If that is your config, then the start of the IP address on BOTH nodes
is really bad. This should not happen and is definitely an error.

BUT: I doubt that this is your complete config, because it would not
work anyway. The cluster would start the IP address on one node and the
webserver on the other node. Please paste the complete config. Then the
community will be able to help.

Kind regards,

Michael Schwartzkopff
Re: [Pacemaker] Problem with ClusterIP
On 26/11/2014 13:07, Michael Schwartzkopff wrote:
> If that is your config, then the start of the IP address on BOTH nodes
> is really bad. This should not happen and is definitely an error.
>
> BUT: I doubt that this is your complete config, because it would not
> work anyway. The cluster would start the IP address on one node and
> the webserver on the other node. Please paste the complete config.
> Then the community would be able to help.

Here is the complete configuration:

node $id=17435146 pogcupsvr
node $id=34212362 pogcupsvr2
primitive apache ocf:heartbeat:apache \
        params configfile=/etc/httpd/conf/httpd.conf \
        op start interval=0 timeout=40s \
        op stop interval=0 timeout=60s
primitive clusterip ocf:heartbeat:IPaddr2 \
        params ip=172.16.16.11 cidr_netmask=24 nic=eth0 \
        meta target-role=Started
primitive drbdserv ocf:linbit:drbd \
        params drbd_resource=server \
        op monitor interval=60s
primitive fsserv ocf:heartbeat:Filesystem \
        params device=/dev/drbd/by-res/server directory=/clusterfs fstype=ext4
primitive libvirt-guests lsb:libvirt-guests
primitive libvirtd lsb:libvirtd
primitive mysql ocf:heartbeat:mysql \
        params binary=/usr/bin/mysqld_safe config=/etc/my.cnf datadir=/clusterfs/mysql \
        op start interval=0 timeout=40s \
        op stop interval=0 timeout=60s \
        meta target-role=Started
primitive named lsb:named
primitive samba lsb:smb
group services fsserv clusterip libvirtd samba apache mysql
ms drbdservClone drbdserv \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
colocation fs_on_drbd inf: fsserv drbdservClone:Master
order fsserv-after-drbdserv inf: drbdservClone:promote fsserv:start
property $id=cib-bootstrap-options \
        dc-version=1.1.7-2.mga1-ee0730e13d124c3d58f00016c3376a1de5323cff \
        cluster-infrastructure=corosync \
        stonith-enabled=false \
        no-quorum-policy=ignore
rsc_defaults $id=rsc-options \
        resource-stickiness=100

--
Anne
http://mageia.org
Re: [Pacemaker] Problem with ClusterIP
Anne,

Are you expecting eth0 to actually be put in the down state, as with the
"ifconfig eth0 down" command? If so, the IPaddr2 resource does not do
that. What it is used for is to configure a second IP address on the NIC
that can be moved from node to node. Can you clarify that?

Also, can you paste the output of the "ip addr" command as well? The
full configuration from "crm configure show" would also be helpful.

Thanks,
Keith

Keith Ouellette
kei...@fibermountain.com
700 West Johnson Avenue
Cheshire, CT 06410
www.fibermountain.com
P. (203) 806-4046  C. (860) 810-4877  F. (845) 358-7882

Disclaimer: The information contained in this communication is
confidential, may be privileged and is intended for the exclusive use of
the above named addressee(s). If you are not the intended recipient(s),
you are expressly prohibited from copying, distributing, disseminating,
or in any other way using any information contained within this
communication. If you have received this communication in error, please
contact the sender by telephone or by response via mail. We have taken
precautions to minimize the risk of transmitting software viruses, but
we advise you to carry out your own virus checks on this message, as
well as any attachments. We cannot accept liability for any loss or
damage caused by software viruses.

-----Original Message-----
From: Anne Nicolas [mailto:enna...@gmail.com]
Sent: Wednesday, November 26, 2014 6:54 AM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Problem with ClusterIP

> Maybe my explanation was not that clear. Here is my configuration
> (crm configure show):
>
> [...]
>
> So I started the primary node (pogcupsvr). Then I started the second
> node (pogcupsvr2). This time all the configuration looked ok, no
> errors, but when I checked the network configuration, eth0 was up on
> both nodes with the same IP address, of course, instead of having it
> up only on the primary node. What I expected is that eth0 is up only
> on the primary node and used by the apache server.
>
> I'm looking for ideas to investigate the causes of such a problem.

--
Anne
http://mageia.org
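As a pointer for the check Keith asks for: on the node currently running the clusterip resource, "ip addr" should list the cluster address as an additional inet entry on eth0, and on the other node that entry should be absent. The address below is the one from Anne's config:

```
# Run on both nodes; only one of them should show 172.16.16.11:
ip addr show dev eth0 | grep 'inet '
```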
Re: [Pacemaker] Problem with ClusterIP
2014-11-26 13:22 GMT+01:00 Keith Ouellette kei...@fibermountain.com:
> Are you expecting eth0 to actually be put in the down state, as with
> the "ifconfig eth0 down" command? If so, the IPaddr2 resource does not
> do that. What it is used for is to configure a second IP address on
> the NIC that can be moved from node to node. Can you clarify that?
>
> Also, can you paste the output of the "ip addr" command as well? The
> full configuration from "crm configure show" would also be helpful.

Maybe I've misunderstood the documentation, but I just reused
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/#_perform_a_failover

It seemed to work as expected at the beginning. I sent my full
configuration a few minutes ago.

--
Anne
http://www.mageia.org
Re: [Pacemaker] sbd fencing race
On Wed, Nov 26, 2014 at 11:13:41AM +0100, emmanuel segura wrote:
> But I would like to know if pacemaker needs to start sbd on the node
> where the sbd resource isn't running in order to fence the other
> nodes, because I don't see any start action on the second node:

That's strange. I'd expect that a stonith resource needs to be started
(enabled) first. Perhaps that changed, as seems to be the case judging
by the logs below. I cannot offer any more advice here, but I would
still like to know the circumstances and how it happened that the nodes
shot each other.

Thanks,

Dejan

> message_2cd.txt:Nov 23 11:43:28 node01 sbd: [69794]: WARN: CIB: We do NOT have quorum!
> message_2cd.txt:Nov 23 11:43:28 node01 sbd: [69791]: WARN: Pacemaker health check: UNHEALTHY
> message_2cd.txt:Nov 23 11:43:28 node01 pengine: [69823]: notice: LogActions: Leave stonith-sbd(Started node01)
> message_2ch.txt:Nov 23 11:43:28 s02srv002ch sbd: [97640]: WARN: CIB: We do NOT have quorum!
> message_2ch.txt:Nov 23 11:43:28 node02 sbd: [97640]: WARN: CIB: We do NOT have quorum!
> message_2ch.txt:Nov 23 11:43:28 node02 sbd: [97637]: WARN: Pacemaker health check: UNHEALTHY
> message_2ch.txt:Nov 23 11:43:28 node02 pengine: [97679]: WARN: custom_action: Action stonith-sbd_stop_0 on node01 is unrunnable (offline)
> message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Delivery process handling /dev/mapper/SBD01B0298700230
> message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Writing reset to node slot node01
> message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Messaging delay: 40
>
> [...]
Re: [Pacemaker] Problem with ClusterIP
On 26/11/2014 13:43, Michael Schwartzkopff wrote:
> On Wednesday, 26 November 2014 13:22:53, you wrote:
>> On 26/11/2014 13:07, Michael Schwartzkopff wrote:
>>> On Wednesday, 26 November 2014 12:54:20, Anne Nicolas wrote:
>>>> On 26/11/2014 12:23, Michael Schwartzkopff wrote:
>>>>> On Wednesday, 26 November 2014 12:01:36, Anne Nicolas wrote:
>>>>>> Hi!
>>>>>>
>>>>>> I've been using ClusterIP for a while now without any problem in
>>>>>> active/passive clusters (2 nodes).
>>>>>
>>>>> Could you please explain how you use ClusterIP in an active/passive
>>>>> cluster? ClusterIP is for use in an active/active cluster. See
>>>>> man iptables and look for the CLUSTERIP target.
>>>>>
>>>>> Please explain in more detail. What is your config? What do you
>>>>> expect the cluster to do? What really happens? Where is the problem?
>>>>
>>>> Maybe my explanation was not that clear.
>>>
>>> Yes.
>>>
>>>> Here is my configuration (crm configure show):
>>>>
>>>> node $id=17435146 pogcupsvr
>>>> node $id=34212362 pogcupsvr2
>>>> primitive apache ocf:heartbeat:apache \
>>>>     params configfile=/etc/httpd/conf/httpd.conf \
>>>>     op start interval=0 timeout=40s \
>>>>     op stop interval=0 timeout=60s
>>>> primitive clusterip ocf:heartbeat:IPaddr2 \
>>>>     params ip=172.16.16.11 cidr_netmask=24 nic=eth0 \
>>>>     meta target-role=Started
>>>> ...
>>>> property $id=cib-bootstrap-options \
>>>>     dc-version=1.1.7-2.mga1-ee0730e13d124c3d58f00016c3376a1de5323cff \
>>>>     cluster-infrastructure=corosync \
>>>>     stonith-enabled=false \
>>>>     no-quorum-policy=ignore
>>>> rsc_defaults $id=rsc-options \
>>>>     resource-stickiness=100
>>>>
>>>> So I started the primary node (pogcupsvr); the configuration was
>>>> checked and OK. Then I started the second node (pogcupsvr2). This
>>>> time everything looked OK, no errors, but when I checked the network
>>>> configuration, eth0 was up on both nodes with the same IP address
>>>> (of course), instead of the address being up only on the primary node.
>>>
>>> If that is your config, then the start of the IP address on BOTH
>>> nodes is really bad. This should not happen and is definitely an
>>> error. BUT: I doubt that this is your complete config, because this
>>> would not work anyway. The cluster would start the IP address on one
>>> node and the webserver on the other node. Please paste the complete
>>> config; then the community will be able to help.
>>
>> Here is the complete configuration:
>>
>> node $id=17435146 pogcupsvr
>> node $id=34212362 pogcupsvr2
>> primitive apache ocf:heartbeat:apache \
>>     params configfile=/etc/httpd/conf/httpd.conf \
>>     op start interval=0 timeout=40s \
>>     op stop interval=0 timeout=60s
>> primitive clusterip ocf:heartbeat:IPaddr2 \
>>     params ip=172.16.16.11 cidr_netmask=24 nic=eth0 \
>>     meta target-role=Started
>> primitive drbdserv ocf:linbit:drbd \
>>     params drbd_resource=server \
>>     op monitor interval=60s
>> primitive fsserv ocf:heartbeat:Filesystem \
>>     params device=/dev/drbd/by-res/server directory=/clusterfs fstype=ext4
>> primitive libvirt-guests lsb:libvirt-guests
>> primitive libvirtd lsb:libvirtd
>> primitive mysql ocf:heartbeat:mysql \
>>     params binary=/usr/bin/mysqld_safe config=/etc/my.cnf datadir=/clusterfs/mysql \
>>     op start interval=0 timeout=40s \
>>     op stop interval=0 timeout=60s \
>>     meta target-role=Started
>> primitive named lsb:named
>> primitive samba lsb:smb
>> group services fsserv clusterip libvirtd samba apache mysql
>> ms drbdservClone drbdserv \
>>     meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>> colocation fs_on_drbd inf: fsserv drbdservClone:Master
>> order fsserv-after-drbdserv inf: drbdservClone:promote fsserv:start
>> property $id=cib-bootstrap-options \
>>     dc-version=1.1.7-2.mga1-ee0730e13d124c3d58f00016c3376a1de5323cff \
>>     cluster-infrastructure=corosync \
>>     stonith-enabled=false \
>>     no-quorum-policy=ignore
>> rsc_defaults $id=rsc-options \
>>     resource-stickiness=100
>
> OK. The config seems to be OK, but I would make the constraints work
> on the group, not on fsserv. Since fsserv is the first resource in the
> group, though, everything should be OK.
>
> Now:
>
>> This time everything looked OK, no errors, but when I checked the
>> network configuration, eth0 was up on both nodes with the same IP
>> address (of course), instead of the address being up only on the
>> primary node.
>
> Could you please paste the output of the command
> "ip addr list dev eth0" on both nodes?
>
> Kind regards,
> Michael Schwartzkopff

In fact, I read your message and it just turned on a light in some part of my brain... I checked the interface configuration and discovered an IP address on both sides... and for sure that could not work. Removing it just did the trick and everything is OK now.

Sorry for the noise, but thanks for helping to find this stupid mistake :)

--
Anne
http://mageia.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
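The root cause above was a statically configured address colliding with the cluster-managed one. A minimal sketch of the check Michael asked for: compare the `ip -o addr list dev eth0` output from both nodes and verify the cluster IP is active exactly once. The sample output strings below are assumed values for illustration, not taken from the thread.

```shell
# Sample 'ip -o addr list dev eth0' output captured from each node
# (hypothetical addresses; 172.16.16.11 is the cluster IP from the config).
node1_out="2: eth0    inet 172.16.16.10/24 brd 172.16.16.255 scope global eth0
2: eth0    inet 172.16.16.11/24 brd 172.16.16.255 scope global secondary eth0"
node2_out="2: eth0    inet 172.16.16.11/24 brd 172.16.16.255 scope global eth0"

# The cluster IP must appear on exactly one node; more than one
# occurrence means a duplicate (e.g. a leftover static address in the
# distribution's interface config files, as happened here).
count=$(printf '%s\n%s\n' "$node1_out" "$node2_out" | grep -c '172\.16\.16\.11/')
if [ "$count" -ne 1 ]; then
    echo "WARNING: cluster IP active $count times across the nodes"
fi
```

With the sample data above, the duplicate on pogcupsvr2 is flagged; after removing the static address, the count drops to 1 and the check stays silent.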
Re: [Pacemaker] sbd fencing race
I think pacemaker doesn't care about the sbd resource status when it needs to make a fencing call. That's what I think, but I hope someone will give me more information.

Thanks

2014-11-26 15:11 GMT+01:00 Dejan Muhamedagic deja...@fastmail.fm:
> On Wed, Nov 26, 2014 at 11:13:41AM +0100, emmanuel segura wrote:
>> But I would like to know whether pacemaker needs to start sbd on the
>> node where the sbd resource isn't running in order to fence the other
>> nodes, because I don't see any start action on the second node:
>
> That's strange. I'd expect that a stonith resource needs to be started
> (enabled) first. Perhaps that changed, as seems to be the case judging
> by the logs below. I cannot offer any more advice here, but would
> still like to know the circumstances and how it happened that the
> nodes shot each other.
>
> Thanks,
> Dejan
>
>> :::
>> message_2cd.txt:Nov 23 11:43:28 node01 sbd: [69794]: WARN: CIB: We do NOT have quorum!
>> message_2cd.txt:Nov 23 11:43:28 node01 sbd: [69791]: WARN: Pacemaker health check: UNHEALTHY
>> message_2cd.txt:Nov 23 11:43:28 node01 pengine: [69823]: notice: LogActions: Leave stonith-sbd (Started node01)
>> message_2ch.txt:Nov 23 11:43:28 s02srv002ch sbd: [97640]: WARN: CIB: We do NOT have quorum!
>> :
>> message_2ch.txt:Nov 23 11:43:28 node02 sbd: [97640]: WARN: CIB: We do NOT have quorum!
>> message_2ch.txt:Nov 23 11:43:28 node02 sbd: [97637]: WARN: Pacemaker health check: UNHEALTHY
>> message_2ch.txt:Nov 23 11:43:28 node02 pengine: [97679]: WARN: custom_action: Action stonith-sbd_stop_0 on node01 is unrunnable (offline)
>> message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Delivery process handling /dev/mapper/SBD01B0298700230
>> message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Writing reset to node slot node01
>> message_2ch.txt:Nov 23 11:43:28 node02 sbd: [157717]: info: Messaging delay: 40
>>
>> Thanks
>>
>> 2014-11-26 10:26 GMT+01:00 Dejan Muhamedagic deja...@fastmail.fm:
>>> Hi,
>>>
>>> On Tue, Nov 25, 2014 at 04:20:32PM +0100, emmanuel segura wrote:
>>>> Hi list,
>>>>
>>>> Last night I had a cluster in a fencing race, using sbd as the
>>>> stonith device.
>>>
>>> Can you give a bit more detail?
>>>
>>>> I would like to know the effect of using start-delay in my stonith
>>>> resource in this way:
>>>>
>>>> primitive stonith-sbd stonith:external/sbd \
>>>>     params sbd_device=/dev/mapper/SBD \
>>>>     op start interval=0 start-delay=5
>>>
>>> Yes, that could help with a stonith deathmatch. Normally, you have a
>>> stonith resource running on one node. On split brain, the other node
>>> also starts the resource in order to shoot the first node. That's
>>> where start-delay comes into play.
>>>
>>> The ultimate resource on the issue: http://ourobengr.com/ha/
>>>
>>> Cheers,
>>> Dejan

--
esta es mi vida e me la vivo hasta que dios quiera
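A static start-delay as discussed above works, but it hard-codes which node loses the race. A sketch of the alternative in newer Pacemaker releases, the `pcmk_delay_max` stonith parameter, which adds a random delay before a fencing action is executed (device path taken from the thread; the parameter's availability depends on the Pacemaker version):

```
primitive stonith-sbd stonith:external/sbd \
    params sbd_device=/dev/mapper/SBD pcmk_delay_max=30s
```

With this, each node waits a random time of up to 30 seconds before issuing its fencing request, so in a split brain one node almost always shoots first and survives.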
Re: [Pacemaker] [ha-wg] [Cluster-devel] [Linux-HA] [RFC] Organizing HA Summit 2015
On 2014-11-25T16:46:01, David Vossel dvos...@redhat.com wrote:

Okay, okay, apparently we have got enough topics to discuss. I'll grumble a bit more about Brno, but let's get the organisation of that thing on track... Sigh. Always so much work!

I'm assuming arrival on the 3rd and departure on the 6th would be the plan?

>> Personally I'm interested in talking about scaling - with
>> pacemaker-remoted and/or a new messaging/membership layer.
>
> If we're going to talk about scaling, we should throw in our new
> docker support in the same discussion. Docker lends itself well to the
> pet vs cattle analogy. I see management of docker with pacemaker
> making quite a bit of sense now that we have the ability to scale into
> the cattle territory.

While we're on that, I'd like to throw in a heretic thought and suggest that one might want to look at etcd and fleetd.

>> Other design-y topics:
>> - SBD

Point taken. I have actually not forgotten this, Andrew, and am reading your development. I probably just need to pull the code over...

>> - degraded mode
>> - improved notifications
>> - containerisation of services (cgroups, docker, virt)
>> - resource-agents (upstream releases, handling of pull requests, testing)
>
> Yep, we definitely need to talk about the resource-agents.

Agreed.

>> User-facing topics could include recent features (i.e.
>> pacemaker-remoted, crm_resource --restart) and common deployment
>> scenarios (e.g. NFS) that people get wrong.
>
> Adding to the list, it would be a good idea to talk about deployment
> integration testing: what's going on with the phd project and why it's
> important regardless of whether you're interested in what the project
> functionally does.

OK. So QA is within scope as well. It seems the agenda will fill up quite nicely.

Regards,
Lars

--
Architect Storage/HA
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Re: [Pacemaker] [Cluster-devel] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015
On 11/26/2014 4:41 PM, Lars Marowsky-Bree wrote:
> On 2014-11-25T16:46:01, David Vossel dvos...@redhat.com wrote:
>
> Okay, okay, apparently we have got enough topics to discuss. I'll
> grumble a bit more about Brno, but let's get the organisation of that
> thing on track... Sigh. Always so much work!
>
> I'm assuming arrival on the 3rd and departure on the 6th would be the
> plan?

Yes, that's correct. Devconf starts on the 6th.

Fabio

> [...]
Re: [Pacemaker] Fencing of bare-metal remote nodes
26.11.2014 18:36, David Vossel wrote:
> ----- Original Message -----
>> 25.11.2014 23:41, David Vossel wrote:
>>> ----- Original Message -----
>>>> Hi!
>>>>
>>>> Is subj implemented? Trying echo c > /proc/sysrq-trigger on remote
>>>> nodes, and no fencing occurs.
>>>
>>> Yes, fencing remote-nodes works. Are you certain your fencing
>>> devices can handle fencing the remote-node? Fencing a remote-node
>>> requires a cluster node to invoke the agent that actually performs
>>> the fencing action on the remote-node.
>>
>> Yes, if I invoke the fencing action manually ('crm node fence rnode'
>> in crmsh syntax), the node is fenced. So the issue seems to be
>> related to detecting the need for fencing. Comments in the related
>> git commits are a little terse in this area, so could you please
>> explain what exactly needs to happen on a remote node to initiate
>> fencing? I tried so far:
>>
>> * killing pacemaker_remoted when no resources are running - systemd
>>   restarted it and crmd reconnected after some time;
>> * crashing the kernel when no resources are running;
>> * crashing the kernel during a massive start of resources - this
>>   last one should definitely cause fencing.
>>
>> No fencing happened. In the last case the start actions 'hung' and
>> were failed by timeout (it is rather long); the node was not even
>> listed as failed. My customer asked me to stop crashing nodes because
>> one of them does not boot anymore (I like that modern UEFI hardware
>> very much), so it is hard for me to play more with that.
>
> What version of pacemaker are you using? I've made changes in this
> area recently. Can you provide a crm_report?

It's c191bf3. crm_report is ready, but I still wait for approval from a customer to send it.

Best,
Vladislav
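For reference, fencing a bare-metal remote node needs a fencing device that the cluster nodes can invoke against it. A minimal configuration sketch in crmsh syntax; the hostnames, credentials, and the choice of an IPMI agent are all hypothetical, not taken from the thread:

```
# Remote-node connection resource (runs on a cluster node)
primitive rnode ocf:pacemaker:remote \
    params server=rnode.example.com \
    op monitor interval=30s

# Fencing device covering the remote node; executed by a cluster node
primitive fence-rnode stonith:fence_ipmilan \
    params ipaddr=rnode-ipmi.example.com login=admin passwd=secret \
        pcmk_host_list=rnode
```

The cluster node running fence-rnode performs the IPMI reset when the remote node's pacemaker_remoted connection is declared failed, since the remote node itself cannot participate in its own fencing.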
Re: [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015
25.11.2014 12:54, Lars Marowsky-Bree wrote:
> ...
> OK, let's switch tracks a bit. What *topics* do we actually have? Can
> we fill two days? Where would we want to collect them?

Just my 2c:

- It would be interesting to get some bird's-eye-view information on what C APIs corosync and pacemaker currently provide to application developers (one immediate use case is in-app monitoring of cluster events).

- One more (more developer-oriented) topic could be support for a resource "degraded" state. From the user perspective it would be nice to have. One immediate example is an iSCSI connection to several portals: when some portals are not accessible, the connection may still work, but in a degraded state.

Best,
Vladislav
Re: [Pacemaker] Fencing of bare-metal remote nodes
----- Original Message -----
> 26.11.2014 18:36, David Vossel wrote:
>> ----- Original Message -----
>>> 25.11.2014 23:41, David Vossel wrote:
>>>> ----- Original Message -----
>>>>> Hi!
>>>>>
>>>>> Is subj implemented? Trying echo c > /proc/sysrq-trigger on remote
>>>>> nodes, and no fencing occurs.
>>>>
>>>> Yes, fencing remote-nodes works. Are you certain your fencing
>>>> devices can handle fencing the remote-node? Fencing a remote-node
>>>> requires a cluster node to invoke the agent that actually performs
>>>> the fencing action on the remote-node.
>>>
>>> Yes, if I invoke the fencing action manually ('crm node fence rnode'
>>> in crmsh syntax), the node is fenced. So the issue seems to be
>>> related to detecting the need for fencing. Comments in the related
>>> git commits are a little terse in this area, so could you please
>>> explain what exactly needs to happen on a remote node to initiate
>>> fencing? I tried so far:
>>>
>>> * killing pacemaker_remoted when no resources are running - systemd
>>>   restarted it and crmd reconnected after some time.

This should definitely cause the remote-node to be fenced. I tested this earlier today after reading you were having problems, and my setup fenced the remote-node correctly.

>>> * crashing the kernel when no resources are running.

If a remote-node connection is lost and pacemaker was able to verify the node was clean before the connection was lost, pacemaker will attempt to reconnect to the remote-node without issuing a fencing request. I could see why both fencing and not fencing in this situation could be desired. Maybe I should make it an option.

>>> * crashing the kernel during a massive start of resources - this
>>>   last one should definitely cause fencing.

This should definitely cause the remote node to be fenced.

>> What version of pacemaker are you using? I've made changes in this
>> area recently. Can you provide a crm_report?
>
> It's c191bf3. crm_report is ready, but I still wait for approval from
> a customer to send it.

Great. I really need to see what you all are doing. Outside of my own setup, I have not seen many setups where pacemaker remote is deployed on bare-metal nodes. It is possible something in your configuration exposes some edge case I haven't encountered yet. There's a US holiday Thursday and Friday, so I won't be able to look at this until next week.

-- Vossel
Re: [Pacemaker] [ha-wg-technical] [ha-wg] [Cluster-devel] [Linux-HA] [RFC] Organizing HA Summit 2015
On 27 Nov 2014, at 2:41 am, Lars Marowsky-Bree l...@suse.com wrote:
> While we're on that, I'd like to throw in a heretic thought and
> suggest that one might want to look at etcd and fleetd.

Nod. I suspect the next evolutionary step will be to sit on a NoSQL/big-data kind of table somehow. I was intending to head down that path last year when I did all that cib work.

>> Other design-y topics:
>> - SBD
>
> Point taken. I have actually not forgotten this, Andrew, and am
> reading your development. I probably just need to pull the code
> over...

ok

> [...]
Re: [Pacemaker] Suicide fencing and watchdog questions
On 25 Nov 2014, at 10:37 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote:
> Hi,
>
> Is there any information on how watchdog integration is intended to
> work? What are the currently-evaluated use cases for it? It seems to
> be forcibly disabled if SBD is not detected...

Are you referring to no-quorum-policy=suicide?

> Also, is there any way to make a node (in a one-node cluster ;) )
> suicide if it detects that fencing is required? Technically, that can
> be done with IPMI 'power cycle' or 'power reset' commands - but the
> node (and thus the whole cluster) will not know whether fencing
> succeeded, because if it received the answer, then fencing failed.
> Otherwise the node will be hard-rebooted and thus cleaned up.
>
> Best,
> Vladislav
Re: [Pacemaker] Suicide fencing and watchdog questions
27.11.2014 03:43, Andrew Beekhof wrote:
> On 25 Nov 2014, at 10:37 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote:
>> Hi,
>>
>> Is there any information on how watchdog integration is intended to
>> work? What are the currently-evaluated use cases for it? It seems to
>> be forcibly disabled if SBD is not detected...
>
> Are you referring to no-quorum-policy=suicide?

That too. But my main intention was to understand what value that feature can bring at all. I tried to enable it without SBD or no-quorum-policy=suicide, and the watchdog was not fired up. Then I looked at the sources and realized that it is enabled only when SBD is detected, and is not actually managed by the cluster option.

>> Also, is there any way to make a node (in a one-node cluster ;) )
>> suicide if it detects that fencing is required? Technically, that can
>> be done with IPMI 'power cycle' or 'power reset' commands - but the
>> node (and thus the whole cluster) will not know whether fencing
>> succeeded, because if it received the answer, then fencing failed.
>> Otherwise the node will be hard-rebooted and thus cleaned up.
>>
>> Best,
>> Vladislav
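For context, Pacemaker's watchdog self-fencing is driven by sbd, which matches the observation above that the feature is only enabled when SBD is detected. A sketch of the pieces involved; the device paths are assumed, and the option names are as shipped by the sbd package and Pacemaker's cluster properties:

```
# /etc/sysconfig/sbd
SBD_DEVICE=/dev/mapper/SBD        # shared block device (optional for
                                  # watchdog-only operation in newer sbd)
SBD_WATCHDOG_DEV=/dev/watchdog    # hardware or softdog watchdog device

# Cluster property (set e.g. via 'crm configure property ...'):
# stonith-watchdog-timeout=10s
# tells Pacemaker it may assume a node has self-fenced via the
# watchdog this long after losing contact with it.
```

With sbd running and feeding the watchdog, a node that loses quorum or becomes unhealthy stops petting the watchdog and hard-resets itself, which is the suicide behavior asked about in the thread.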