Re: [Pacemaker] Clustermon issue
On 18/12/2014 16:21, Marco Querci wrote:
> Hi all, I have a pacemaker + corosync cluster installed on CentOS 6.5, with a ClusterMon resource whose external_agent parameter is set. Before the last pacemaker update, event notifications to the external script worked perfectly. After updating pacemaker to version 1.1.11, the ClusterMon resource continues to work but has stopped notifying the external agent. I followed setup instructions found on the internet but I can't figure out why it doesn't work as expected. Any help will be appreciated. Many thanks.

Hello, please paste your full configuration here so we understand how you use the ClusterMon stuff. Remember that on RHEL 6.x, SNMP support is not built in; but that's probably why you use an external_agent. I just need to make sure by reading your configuration. Cheers, Florian Crouzat

___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
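For reference, a ClusterMon resource calling an external agent is commonly set up along these lines (a sketch: the resource name, the clone, and the /usr/local/bin/notify.sh script path are illustrative, not from the poster's actual configuration):

```shell
# Run crm_mon in daemon mode on every node; extra_options passes -E,
# which points crm_mon at the external agent script to invoke on events.
pcs resource create ClusterMonitor ocf:pacemaker:ClusterMon \
    extra_options="-E /usr/local/bin/notify.sh" --clone
```

The agent script receives event details through CRM_notify_* environment variables; checking that the script exists and is executable on every node is a common first debugging step.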
Re: [Pacemaker] How to get external-agent to work?
On 26/03/2014 18:55, John Wei wrote:
> I am trying to receive notification through an external program on CentOS 6.5 with pacemaker. I have tried crm_mon -E and the ClusterMon RA; none of them seem to work. Anyone have insight into what I might have done wrong?

Maybe get information from this old howto I wrote a couple of months ago: http://floriancrouzat.net/2013/01/monitor-a-pacemaker-cluster-with-ocfpacemakerclustermon-andor-external-agent/ or provide us with more info, logs, configurations, etc. -- Cheers, Florian Crouzat
Re: [Pacemaker] question on resource colocation
On 18/12/2013 17:20, Brusq, Jerome wrote:
> Dear all, it is maybe a stupid question… Sorry, I'm a new user of pacemaker.

Not at all.

> I have 2 nodes and 5 resources (resource1, resource2, resource3, resource4 and resource5). The 5 resources have to run together on the same node. I tried the group feature and colocation, but I always have the same issue: when node2 is powered off, node1 is active and all resources are up. If resource3 fails, pacemaker stops resource4 and resource5 as well (I have read this is the normal behavior)…

This is because a group is a syntactic shortcut for colocation + ordering. The important word here being "ordering". If you stop 3, then 4 and 5 will stop as they *require* 3 to be up.

> Is there a way for pacemaker to restart only resource3?

I think you would have to remove your group and create all the colocation constraints and no ordering constraints. I'm not sure it's possible to create a group of unordered resources. -- Cheers, Florian Crouzat
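That suggestion could be sketched in crm shell syntax as follows (constraint names are illustrative and the snippet is untested):

```shell
# Pin resources 2-5 to wherever resource1 runs, with no ordering
# constraints, so a failure of resource3 restarts only resource3.
colocation coloc-r2-with-r1 inf: resource2 resource1
colocation coloc-r3-with-r1 inf: resource3 resource1
colocation coloc-r4-with-r1 inf: resource4 resource1
colocation coloc-r5-with-r1 inf: resource5 resource1
```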
Re: [Pacemaker] does adding a second ring actually work with cman?
On 16/12/2013 17:28, Brian J. Murrell wrote:
> So is there something I am misunderstanding or is this not actually working? The node I was trying this on is EL6.4.

Redundant ring protocol in CMAN works fine with >=RHEL6.4 (I only use it in active mode though). Is it possible that lotus-5vm8 (from DNS) and lotus-5vm8-ring1 (from /etc/hosts) resolve to the same IP (10.128.0.206), which could confuse cman and make it decide that there is only one ring? In my case, to be extra safe, and because I want everything to keep working without the network in case of admin interventions/failures (whatever), I always put both entries in /etc/hosts, eg:

192.168.10.1 node01 node01.example.com
192.168.150.1 node01_alt node01_alt.example.com

-- Cheers, Florian Crouzat
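For context, the second ring in a cman cluster.conf is declared with an <altname> per node; a minimal excerpt under the example hostnames above (illustrative, not the poster's actual file):

```xml
<!-- excerpt from /etc/cluster/cluster.conf: one node with two rings -->
<clusternodes>
  <clusternode name="node01" nodeid="1">
    <altname name="node01_alt"/>
  </clusternode>
</clusternodes>
```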
Re: [Pacemaker] howto group resources without having an order
On 26/11/2013 10:49, Bauer, Stefan (IZLBW Extern) wrote:
> The thing is, that resource sets are not configurable without editing the xml directly. True?

Not true at all.

> Crm configure only knows group, order and colocation.

Nope, eg (from memory, check the syntax, but it exists):

order foo INF: ( A B ) C D ( E F G )
colocation bar INF: ( E F G ) D C ( B A )

> I just don't want to mess with the raw xml files if there are other options.

You should never edit xml, indeed. -- Cheers, Florian Crouzat
Re: [Pacemaker] howto group resources without having an order
On 26/11/2013 10:19, Bauer, Stefan (IZLBW Extern) wrote:
> Hi, thank you for your input - unfortunately I want to go another path if possible, so as not to have to change more parts of my configuration.

So basically you want to fix your non-working configuration without changing your (non-working) configuration? Right, that seems reasonable.

> I have set up so far:
> group cluster1 p_eth0 p_conntrackd
> location groupwithping cluster1 \
>     rule id="groupwithping-rule" pingd: defined pingd
> colocation cluster inf: p_eth0 p_conntrackd
> Now I cannot simply add p_openvpn1 + p_openvpn2 to the above colocation because then the order is active. Why is colocation even taking care of the order of the resources?!

No, you cannot.

> If I change it to: colocation cluster inf: p_eth0 p_conntrackd p_openvpn1 p_openvpn2 - I cannot start openvpn2 without having openvpn1 up. This is not what I want. Thank you. Stefan

What you want, I already told you. -- Cheers, Florian Crouzat
Re: [Pacemaker] howto group resources without having an order
On 26/11/2013 07:40, Bauer, Stefan (IZLBW Extern) wrote:
> Dear Developers & Users, I have 4 resources: p_eth0, p_conntrackd, p_openvpn1, p_openvpn2. Right now, I use group and colocation to let p_eth0 and p_conntrackd start in the right order (first eth0, then conntrackd). I now want to also include p_openvpn1 + 2, but not have them in any order. Meaning: running on the same cluster node but independent from each other. I want openvpn2 not to depend on openvpn1 in order to start (that's the default behavior iirc without groups/orders). Any help is greatly appreciated. Best regards, Stefan

Use resource sets (both for ordering [1] and colocation [2]), and play with the values of the parameters "sequential=" and "require-all=".

[1] - http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-sets-ordering.html
[2] - http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-sets-collocation.html

-- Cheers, Florian Crouzat
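A sketch of that advice in crm shell syntax, using the poster's resource names (in crmsh, resources inside parentheses form a sequential=false set, i.e. they are unordered relative to each other; verify against your crmsh version):

```shell
# eth0 before conntrackd, then both VPNs with no ordering between them
order o-stack inf: p_eth0 p_conntrackd ( p_openvpn1 p_openvpn2 )

# keep everything on the same node
colocation c-stack inf: ( p_openvpn1 p_openvpn2 ) p_conntrackd p_eth0
```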
Re: [Pacemaker] configuration files of cluster
On 18/11/2013 15:58, Dvorak Andreas wrote:
> Dear all, now that I have finished the configuration of my first pacemaker cluster, I would like to back up the configuration files, but I do not find any.

The configuration and cluster state are shared across the cluster nodes and physically stored in an XML file called the CIB (Cluster Information Base). At any given time, the CIB stores the exact state of the cluster (with scores, roles, and such).

> Can you please tell me what needs to be put in a backup? With "pcs config" I get a nice output, but I would like to have that in one or more files.

Assuming you just want to save your starting configuration and not the state of the cluster or anything complex, saving the output of "pcs config" is alright, and it can easily be restored. But you could also create "shadow CIBs" and restart from them (~ save points). In any case, I'd say: do not touch the XML files.

> My cluster installation: Pacemaker 1.1.8, cman 3.0.12.1, pcs 0.9.26, ccs 0.16.2, resource-agents 3.9.2, corosync 1.4.1. Best regards, Andreas Dvorak

-- Cheers, Florian Crouzat
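Both approaches can be sketched as follows (the file and shadow-CIB names are illustrative; cibadmin and crm_shadow ship with pacemaker):

```shell
# Human-readable snapshot of the configuration
pcs config > /root/cluster-config.txt

# Raw CIB export, restorable later with cibadmin
cibadmin --query > /root/cib-backup.xml

# Shadow CIB as a save point: work on a copy, then commit or discard
crm_shadow --create before-change
# ... make and test changes against the shadow copy ...
crm_shadow --commit before-change    # or --delete to throw it away
```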
Re: [Pacemaker] running scripts in a resource
On 15/11/2013 13:32, Dvorak Andreas wrote:
> Dear all, I would like to create a resource that sets up routes on a node. I have three scripts, check_routes.sh, delete_routes.sh and setup_routes.sh, but unfortunately I cannot find documentation on how to do that. Can somebody please help me?

I'm quite sure there are dedicated resource agents to manage routes *but* if you want to script the stuff yourself, then create an LSB resource (eg: lsb:my_routes) which adds the routes in its start() section, checks for them in its status() section and removes them in its stop() section. Make sure your return codes are LSB compliant: clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ap-lsb.html -- Cheers, Florian Crouzat
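A minimal sketch of such an init script (the route, script name and grep pattern are illustrative; the exit codes follow the LSB convention linked above, notably status() returning 3 for "not running"):

```shell
#!/bin/sh
# /etc/init.d/my_routes - hypothetical LSB wrapper around a static route
ROUTE="192.0.2.0/24 via 10.0.0.1"   # example route, adjust to taste

case "$1" in
    start)
        ip route replace $ROUTE || exit 1   # 1 = generic failure
        exit 0
        ;;
    stop)
        ip route del $ROUTE 2>/dev/null
        exit 0              # stopping a stopped service is a success
        ;;
    status)
        # LSB: 0 = running, 3 = not running
        ip route show | grep -q "^192.0.2.0/24" && exit 0
        exit 3
        ;;
    *)
        echo "Usage: $0 {start|stop|status}"
        exit 2              # 2 = invalid argument(s)
        ;;
esac
```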
Re: [Pacemaker] Offline Cluster edit
On 15/10/2013 09:39, Robert Lindgren wrote:
> I have a cluster that is offline, and I can't start it to do edits (since IPs and such would conflict with the old cluster). What is the preferred way of doing the edits (changing IPs) so that I can start the cluster?

Can't you start just one of the nodes, unplugged from the network, with the wrong configuration; edit and update the configuration with pcs/crmsh; replug the network when it's all okay; then have the other nodes rejoin the cluster (with the old conf)? They automagically update to the new conf as they see theirs is outdated, and there are no conflicts in the entire process. -- Cheers, Florian Crouzat
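That procedure could be sketched as follows on an EL6/cman stack (the resource name vip_public and the IP are hypothetical):

```shell
# On the single, network-unplugged node:
service cman start && service pacemaker start

# Change the conflicting VIP with pcs (or use "crm configure edit")
pcs resource update vip_public ip=192.0.2.10

# Once the config is correct: replug the network and start the remaining
# nodes; they fetch the newer CIB when they rejoin.
```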
Re: [Pacemaker] create 2-node Active/Passive firewall cluster
On 19/09/2013 11:43, David Lang wrote:
> On Thu, 19 Sep 2013, Florian Crouzat wrote:
>> On 18/09/2013 20:34, Jeff Weber wrote: [...]
>
> I've been running active/failover firewall clusters with heartbeat since about 2000, and one suggestion that I would make: if you can leave all the daemons running all the time, the failover process is far more robust (and faster, since you don't have daemons to start). If you set net.ipv4.ip_nonlocal_bind you can even have the daemons start up binding to VIP addresses that don't yet exist. If you do not have to have the daemons bound to the VIP, the fact that they are always running on the backup box gives you a quick way to check whether a failover would solve the problem, by having a client connect directly to the second box. The drawback is that someone may configure something to point directly at a box and not at a VIP, and you won't detect it (without log analysis) until the box they point at actually goes down. David Lang

I never thought about that; it seems it could be interesting, especially with slow (start|stop)ing daemons such as squid. In my case, my daemons would be protected by the "passive firewall state" that my nodes have when they don't host resources. Thanks for bringing this up. -- Cheers, Florian Crouzat
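The nonlocal-bind setting David mentions is a one-line sysctl (sketch):

```shell
# Allow daemons to bind() to addresses not (yet) present on any interface
sysctl -w net.ipv4.ip_nonlocal_bind=1

# Persist it across reboots
echo "net.ipv4.ip_nonlocal_bind = 1" >> /etc/sysctl.conf
```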
Re: [Pacemaker] very slow pacemaker/corosync shutdown
On 19/09/2013 00:25, David Lang wrote:
> I'm frequently running into a problem where shutting down pacemaker/corosync takes a very long time (several minutes).

Just to be 100% sure: you always respect the stop order? Pacemaker *then* CMAN/corosync? -- Cheers, Florian Crouzat
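On EL6 with the cman stack, that stop order is simply (sketch):

```shell
# Stop order matters: pacemaker first, then the membership layer
service pacemaker stop
service cman stop
```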
Re: [Pacemaker] create 2-node Active/Passive firewall cluster
On 18/09/2013 20:34, Jeff Weber wrote:
> I am looking to create a 2-node Active/Passive firewall cluster. I am an experienced Linux user, but new to HA clusters. I have scanned "Clusters From Scratch" and "Pacemaker Explained". I found these docs helpful, but a bit overwhelming, being new to HA clusters.
>
> My goals:
> * create 2-node Active/Passive firewall cluster
> * Each FW node has an external and an internal interface
> * Cluster software presents external and internal VIPs
> * VIPs must be co-located on the same node
> * One node is preferred for VIP locations
> * If any interface fails on the node currently hosting the VIPs, the VIPs move to the other node
>
> For simplicity's sake, I'll start by creating VIPs, and add firewall plumbing to the VIPs in the future. My config: CentOS-6.3 based distro + corosync-1.4.1-1, pacemaker-1.1.8-1, pcs-0.9.26-1, resource-agents-3.9.2-12 and all required dependencies.
>
> My questions: This sounds like a common use case, but I could not find an example/HOWTO. Did I miss it? Do I have the correct HA cluster packages and versions to start work? Do I also need the cman and ccs packages? How many interfaces should each cluster node have? 2 interfaces (internal, external) or 3 interfaces (internal, external, monitor)? Do I need to configure corosync.conf/totem/interface/bindnetaddr, and if so, bind to what net?
>
> $1M question: How to configure the cluster to monitor all internal and external cluster interfaces, and perform failover? Here's my estimate:
> * create external VIP as IPaddr2 and bind to external interfaces
> * create internal VIP as IPaddr2 and bind to internal interfaces
> * co-locate both VIPs together
> * specify a location constraint for the preferred node
>
> Any help would be appreciated, thanks, Jeff

I have several two-node firewall clusters running pacemaker+cman (since EL6.4) and they work perfectly. My setup is as follows: both nodes boot in a "passive" firewall state (via chkconfig). In this state, only corosync traffic is allowed between nodes (and admin access on non-VIP IPs).
From that state, they both start cman+pacemaker, and via a location preference + 3 ping nodes, the node with the best score starts the resources. Resources are a group of 30+ IPaddr2, iptables and custom daemons such as bind, postfix, ldirectord, etc. All resources are colocated and ordered so they all run on the same node and start in the correct order (first I get the VIPs, then I start the firewall, then I bind the daemons, etc.). VIPs are not really monitored, as pacemaker doesn't really do that: it just checks the IP is present with some sort of "sudo ip addr ls | fgrep ", so if you unplug the network cable, it won't see it. That's where you define your ping nodes wisely, so that you can monitor the connectivity of certain subnets/gateways from all nodes and decide which one is the best connected in case of an incident. If you like, I can paste configuration files (cluster.conf + CIB). -- Cheers, Florian Crouzat
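The ping-node part of such a setup is commonly expressed like this in crm shell syntax (a sketch: the host list, multiplier and the grp_firewall group name are illustrative, not the author's actual configuration):

```shell
# Clone a ping resource probing three reference hosts from every node
primitive p_ping ocf:pacemaker:ping \
    params host_list="192.0.2.1 192.0.2.2 192.0.2.3" multiplier="1000" \
    op monitor interval="10s"
clone cl_ping p_ping

# Prefer the node with the best connectivity score for the resource group
location loc_best_connected grp_firewall \
    rule pingd: defined pingd
```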
Re: [Pacemaker] CMAN nodes online
On 16/09/2013 17:30, Gopalakrishnan N wrote:
> Hey Guys, the OS I am running is CentOS 6.4 (64bit) and I have disabled iptables and SELinux. My goal is to make Apache Tomcat HA; as a first step I thought of testing with Apache. My network setup is like this: Node1 is connected to the switch, Node2 is connected to the switch. My cluster.conf file is as follows,
> [root@test01 ~]# cat /etc/cluster/cluster.conf
> And at some point in time I am able to see both nodes are registered,
> [root@test01 ~]# cman_tool nodes
> Node Sts Inc Joined Name
> 1 M 104 2013-09-17 02:01:11 test01
> 2 M 108 2013-09-17 02:15:07 test02
> And at other times, with crm_mon -1, I get the following:
> [root@test01 ~]# crm_mon -1
> Last updated: Tue Sep 17 02:47:27 2013
> Last change: Tue Sep 17 00:42:43 2013 via crmd on test01
> Stack: cman
> Current DC: NONE
> 4 Nodes configured, 2 expected votes
> 0 Resources configured.
> Node test01: UNCLEAN (offline)
> Node test01.iopextech.com: UNCLEAN (offline)
> Node test02: UNCLEAN (offline)
> Node test02.iopextech.com: UNCLEAN (offline)
> Thanks.

Ok, if you are still unhappy with CMAN, check your switch for multicast. From here on, I assume you are happy with cman; otherwise there would be no reason to start pacemaker, but you did, so you get my point ;) This kind of error often indicates that your /etc/hosts or DNS is wrong and pacemaker has a hard time mapping hostnames to IPs, etc. (not sure about the internals there...). Get rid of the invalid nodes by editing the configuration; stop pacemaker; fix your resolution and retry. In the entire process: I usually use FQDNs in my cluster.conf file, I add them and the corresponding ring IP to my /etc/hosts file on all nodes, and I use the same FQDNs in my pacemaker configuration. Cheers

On Mon, Sep 16, 2013 at 8:40 PM, Florian Crouzat <gen...@floriancrouzat.net> wrote:
> [...]
Re: [Pacemaker] CMAN nodes online
On 16/09/2013 14:18, Gopalakrishnan N wrote:
> Do I need to have a crossover cable between each node? Is it mandatory?

Nope, it isn't. In your case, I'd check the network architecture and/or firewalling regarding multicast. You probably either have wrong iptables rules or a switch dropping your multicast corosync ring(s). Also, please, as Andreas said: try to communicate with us in a more efficient way: more context, more information and more output (pasted somewhere). We are happy to help people, but we don't have to waste our time trying to understand what these exact people don't tell us because they are lazy. PS: also use 'corosync-objctl'; it's a good command to debug rings and configurations. Cheers, Florian.

On Mon, Sep 16, 2013 at 8:01 PM, Gopalakrishnan N <gopalakrishnan...@gmail.com> wrote:
> Again, when I restarted pacemaker and cman, the nodes are not online; back to square 1. node1 shows only node1 online, and node2 shows only node2 online. I don't know what's happening in the background... Any advice would be appreciated.. Thanks.

On Mon, Sep 16, 2013 at 6:47 PM, Gopalakrishnan N wrote:
> Hi guys, I got it; basically it took some time to propagate and now the two nodes are showing online... Thanks.

On Mon, Sep 16, 2013 at 6:39 PM, Gopalakrishnan N wrote:
> I have configured CMAN as per the link http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html#_configuring_cman but when I type cman_tool nodes, only one node is online even though cluster.conf has propagated to the other node as well. What could be the reason? On node1, cman_tool nodes shows only node1 online; on node2 it shows only node2 online. How to make the two nodes show as online, even though the CMAN service is running on both nodes? Thanks in advance.
> Regards, Gopal

-- Cheers, Florian Crouzat
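The corosync-objctl debugging suggestion above can be sketched as follows on corosync 1.x (the grep pattern is illustrative):

```shell
# Dump the runtime object database (totem settings, known members)
corosync-objctl | grep -i member

# Show ring status per configured interface
corosync-cfgtool -s
```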
Re: [Pacemaker] Pacemaker with Apache Tomcat
On 09/09/2013 17:34, Gopalakrishnan N wrote:
> Hi, any tutorial to install pacemaker with Apache Tomcat... Regards, Gopal

Yes: first set up a pacemaker cluster by following the guides (Clusters from Scratch) at http://clusterlabs.org/doc/ and then ask your eventual questions about managing a tomcat resource. -- Cheers, Florian Crouzat
Re: [Pacemaker] Does ocf:heartbeat:IPaddr2 support binding the virtual IP on a bond interface?
On 28/08/2013 19:18, Xiaomin Zhang wrote:
> Actually I don't know how to specify the bond interface to assign this virtual IP.

$ sudo crm ra meta IPaddr2

Search for "nic", and make sure the underlying interface is up, as pacemaker doesn't do "ifup" but creates aliases on already-created interfaces (cf. the prerequisite in the "nic" section). -- Cheers, Florian Crouzat
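For example, a VIP pinned to a bond interface could look like this in crm shell syntax (a sketch; the resource name and IP are illustrative):

```shell
# VIP on bond0; the bond itself must already be up (IPaddr2 only adds
# an address on an existing interface, it does not bring interfaces up)
primitive vip_bond ocf:heartbeat:IPaddr2 \
    params ip="192.0.2.50" nic="bond0" cidr_netmask="24" \
    op monitor interval="10s"
```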
Re: [Pacemaker] Using "avoids" location constraint
On 08/07/2013 09:49, Andrew Morgan wrote:
> I'm attempting to implement a 3-node cluster where only 2 nodes are there to actually run the services and the 3rd is there to form a quorum (so that the cluster stays up when one of the 2 'workload' nodes fails). To this end, I added a location "avoids" constraint so that the services (including drbd) don't get placed on the 3rd node (drbd3)...
>
> pcs constraint location ms_drbd avoids drbd3.localdomain
>
> The problem is that this constraint doesn't appear to be enforced, and I see failed actions where Pacemaker has attempted to start the services on drbd3. In most cases I can just ignore the error, but if I attempt to migrate the services using "pcs move" then it causes a fatal startup loop for drbd. If I migrate by adding an extra location constraint preferring the other workload node, then I can migrate ok. I'm using Oracle Linux 6.4; drbd83-utils 8.3.11; corosync 1.4.1; cman 3.0.12.1; Pacemaker 1.1.8 & pcs 1.1.8.

I'm no quorum-node expert, but I believe your initial design isn't optimal. You could probably even run with only two (real) nodes and no-quorum-policy=ignore + fencing (for data integrity) [1]. This is what most (all?) people with two-node clusters do. But if you really believe you need to be quorate, then I think you need to define your third node as a quorum node in corosync/cman (not sure how since EL6.4 and CMAN), and I cannot find a valid link. IIRC with such a definition, you won't need the location constraints.

[1] http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_perform_a_failover.html#_quorum_and_two_node_clusters

-- Cheers, Florian Crouzat
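The two-node alternative mentioned in [1] boils down to one cluster property plus working fencing (sketch; the fence devices themselves are site-specific and omitted here):

```shell
# Let a node keep running resources when it loses quorum...
pcs property set no-quorum-policy=ignore

# ...which is only safe with STONITH enabled and fence devices configured
pcs property set stonith-enabled=true
```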
Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs
On 29/06/2013 01:22, Andrew Beekhof wrote:
> On 29/06/2013, at 12:22 AM, Digimer wrote:
>
> On 06/28/2013 06:21 AM, Andrew Beekhof wrote: On 28/06/2013, at 5:22 PM, Lars Marowsky-Bree wrote: On 2013-06-27T12:53:01, Digimer wrote:
>
> primitive fence_n01_psu1_off stonith:fence_apc_snmp \
>     params ipaddr="an-p01" pcmk_reboot_action="off" port="1" pcmk_host_list="an-c03n01.alteeve.ca"
> primitive fence_n01_psu1_on stonith:fence_apc_snmp \
>     params ipaddr="an-p01" pcmk_reboot_action="on" port="1" pcmk_host_list="an-c03n01.alteeve.ca"
>
> So every device twice, including location constraints? I see potential for optimization by improving how the fence code handles this ... That's abhorrently complex. (And I'm not sure the 'action' parameter ought to be overwritten.)
>
> I'm not crazy about it either because it means the device is tied to a specific command. But it seems to be something all the RHCS people try to do... Maybe something in the rhcs water cooler made us all mad... ;) Glad you got it working, though.
>
> location loc_fence_n01_ipmi fence_n01_ipmi -inf: an-c03n01.alteeve.ca [...]
>
> I'm not sure you need any of these location constraints, by the way. Did you test if it works without them? Again, this is after just one test. I will want to test it several more times before I consider it reliable. Ideally, I would love to hear Andrew or others confirm this looks sane/correct. It looks correct, but not quite sane. ;-) That seems not to be something you can address, though.
>
> I'm thinking that fencing topology should be smart enough to, if multiple fencing devices are specified, know how to expand them to "first all off (if off fails anywhere, it's a failure), then all on (if on fails, it is not a failure)". That'd greatly simplify the syntax. The RH agents have apparently already been updated to support multiple ports.
>
> I'm really not keen on having stonith-ng doing this. This doesn't help people who have dual power rails/PDUs for power redundancy.
> I'm yet to be convinced that having two PDUs is helping those people in the first place. If it were actually useful, I suspect more than two/three people would have asked for it in the last decade.

Well, it's probably because many people are still toying around with pacemaker, and I assume that not many advanced RHCS users have yet tried to translate their current RHCS clusters to pacemaker. Digimer and I did, and we both failed to reproduce the equivalent configuration we had in our RHCS setups. I suspect more and more people will hit this issue sooner or later. Anyway, whatever follows in terms of configuration primitives or API, thanks to Digimer's tests we now have something (even if inelegant) working :) -- Cheers, Florian Crouzat
Re: [Pacemaker] crm ptest does not show graphics
Le 20/06/2013 14:42, Michael Schwartzkopff a écrit : Am Donnerstag, 20. Juni 2013, 13:43:10 schrieb Florian Crouzat: My original question was about crm. It was about an old tool that doesn't work in your environment. I was just trying to guide you in the right direction with a working and updated tool instead of fixing something prehistoric. Now if you are not happy with my answers, then good luck. Nothing to do here anymore. -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] crm ptest does not show graphics
Le 20/06/2013 13:06, Michael Schwartzkopff a écrit : Am Donnerstag, 20. Juni 2013, 11:03:53 schrieb Florian Crouzat: > I usually use "crm_simulate -S -L -VVV". From inside the crm subshell??? Nope. $ type -a crm_simulate crm_simulate is /usr/sbin/crm_simulate $ sudo yum whatprovides $(type -a crm_simulate) [...] pacemaker-cli-1.1.8-7.el6.x86_64 : Command line tools for controlling Pacemaker clusters Repo: installed -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] crm ptest does not show graphics
Le 19/06/2013 10:31, Michael Schwartzkopff a écrit : Hi, When I enter: # crm configure (... do some changes ...) and enter # crm(live)# ptest I believe ptest is not recommended anymore and might even be deprecated and replaced by "crm_simulate". I usually use "crm_simulate -S -L -VVV". Actually, ptest is documented under the 1.0 version[1] of the doc while crm_simulate is documented under the 1.1 version[2] which is the most up-to-date one. [1] - http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-config-testing-changes.html [2] - http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-config-testing-changes.html -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Can I use Pacemaker release 1.1.8 for production clusters?
Le 10/06/2013 16:46, Michael Furman a écrit : Hi all! According to the Wiki http://clusterlabs.org/wiki/Releases This page has not been updated since 10:49, 11 February 2011 Even numbered release series (eg. 0.6.x, 1.0.x) are recommended for production clusters. I need to install Pacemaker on Centos 6 machines. Unfortunately, the main Centos repository contains only 1.1.8-7.el6 version. Questions: Can I use Pacemaker release 1.1.8 for production clusters (we want to work with the Centos repository)? I hope so Do you expect to change existing features in 1.1.8? I believe it's not really a tech preview anymore since EL6.4 so I'd expect things not to move a lot anymore until RHEL7 Do you have uncompleted features in 1.1.8? What repository contains Pacemaker release 1.0.12? Thanks for your help, Michael -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Removing resource from group without disturbing remaining resources in group
Le 06/06/2013 17:29, Dejan Muhamedagic a écrit : Approximative syntax, do not blame me ! > >* crm configure property maintenance-mode=true >* crm resource stop R1 # it won't stop as it's in maintenance-mode A recent crmsh knows that it can delete a resource if maintenance-mode is set. Oh. Okay, thanks :) -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] failcount,start/stop-failure in crm_mon
Le 06/06/2013 16:43, Vadym Chepkov a écrit : On Jun 6, 2013, at 10:29 AM, Wolfgang Routschka wrote: Hi, one question today about deleting start/stop error in crm_mon. How can I delete failure/errors in crm_mon without having to restart/refresh resources? crm resource cleanup some-resource Beware that resources can move when failcounts are cleared. Eg: removing a +INF failcount makes the node eligible to host this resource again, and depending on your configuration, resource could move back there instantly. -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
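A hedged sketch of the cleanup plus the caveat above (resource name is an example; these commands obviously need a live cluster):

```
# Before clearing a +INF failcount, preview what the cluster would do
# once the node becomes eligible again:
crm_simulate -S -L -VVV

# Then clear the failcount and failure history for the resource:
crm resource cleanup some-resource
```

If the preview shows the resource moving back and you don't want that, consider resource stickiness before cleaning up.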
Re: [Pacemaker] Removing resource from group without disturbing remaining resources in group
Le 06/06/2013 16:35, Andreas Mock a écrit : Hi all, is there a way to remove a resource from a group without disturbing the other resources in the group? The following example: - G1 has R1 R2 R3 - All resources are started - Stopping R1 would cause a stop of R2 R3 - So, the idea was: * crm configure edit => remove R1 from the group while running * stop resource * delete resource BUT: At some point (which we couldn't find out at the moment) all remaining resources of the group are restarted. It seems that the change of the implicit dependency tree of the initial group forces a rebuild of that tree including a restart of that group. (Andrew: Is this assumption right?) So, is there a way to add/remove resources from a group without disturbing the other resources? It's clear to me that the resources would restart if the node assignment changed after the removal. Hints welcome.

Approximative syntax, do not blame me!
* crm configure property maintenance-mode=true
* crm resource stop R1 # it won't stop as it's in maintenance-mode
* crm configure delete R1
* crm configure show # verify that all references to R1 are gone
* crm resource reprobe # the cluster double-checks the status of declared resources and sees that everything is fine and R1 doesn't exist anymore
* crm_mon -Arf1 # double check that everything is "started (unmanaged)" and R1 is gone
* crm_simulate -S -L -VVV # optional, to check what would happen when leaving maintenance-mode
* crm configure property maintenance-mode=false

If something goes wrong while in maintenance-mode, "crm resource cleanup foo" might be handy. Nothing should move, start or stop until you leave maintenance-mode anyway. I use this scenario very often, to add or remove IPaddr2 resources from a group of 30+ IPaddr2 resources. 
-- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] failed actions after resource creation
Le 06/06/2013 16:25, andreas graeper a écrit : hi and thanks. (better sentences: i will do my best) Okay, on the inactive node there is actually only /etc/init.d/nfs and neither nfs-common nor nfs-kernel-server. is monitor not only looking for the running service on the active node, but for the situation on the inactive node, too ? Well, the main goal of a cluster is the ability to move resources between members in case of failure of the hosting node, so yes, of course pacemaker checks the capacity of all cluster nodes to host the resources. .. so i would have expected that the missing nfs-kernel-server was reported, too. i guess this can be handled only with an init-script 'nfs' (same name on both nodes) that is starting/killing nfs-common/nfs-kernel-server ? or is there another solution ? If you want to clusterize nfs, then have all your nodes nfs-ready; otherwise I really don't see what your inactive node is good for. what is monitor doing in case of a resource managed by an lsb-script ? is it calling `service xxx status` ? Yes. what does the monitor expect on a node where the service is running / not running ? http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ap-lsb.html thanks in advance andreas You are welcome. 2013/6/6 Florian Crouzat mailto:gen...@floriancrouzat.net>> Le 06/06/2013 15:49, andreas graeper a écrit : p_nfscommon_monitor_0 (node=linag, call=189, rc=5, status=complete): not installed Sounds obvious: "not installed". Node "linag" is missing some daemons/scripts, probably nfs-related. Check your nfs packages and configuration on both nodes, node1 should be missing something. what can i do ? Better sentences. -- Cheers, Florian Crouzat -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] failed actions after resource creation
Le 06/06/2013 15:49, andreas graeper a écrit : p_nfscommon_monitor_0 (node=linag, call=189, rc=5, status=complete): not installed Sounds obvious: "not installed". Node "linag" is missing some daemons/scripts , probably nfs-related. Check your nfs packages and configuration on both nodes, node1 should be missing something. what can i do ? Better sentences. -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] target: 7 vs. rc: 0 error
Le 06/06/2013 15:47, ESWAR RAO a écrit : Please let me know if there's any chance that I can monitor the already running resources ? That sentence makes no sense to me. As I said, you must not start resource from outside the cluster (runlevels). Let the resource-manager (pacemaker) start them and all your troubles will seem so faraway. And if you want to /add/ a new resource in the cluster configuration and that this very resource is already running from outside the cluster, then you should activate maintenance-mode, define the new resource, reprobe so the cluster sees it is running, and exit maintenance-mode. -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
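The procedure Florian describes can be sketched as follows (approximative crm syntax; the resource name and agent are taken from the related thread and are illustrative only):

```
crm configure property maintenance-mode=true
crm configure primitive oc_d1 lsb:testd1 op monitor interval="3s"
crm resource reprobe   # the cluster probes and sees oc_d1 already running
crm_mon -Arf1          # verify oc_d1 shows as Started (unmanaged)
crm configure property maintenance-mode=false
```

Because the probe finds the resource already running before the cluster leaves maintenance-mode, no stop/start is triggered.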
Re: [Pacemaker] target: 7 vs. rc: 0 error
Le 06/06/2013 11:40, ESWAR RAO a écrit : Hi All, Can someone help me with my setup below: I am trying to monitor RA on a 2 node cluster setup. The resources are already running before start of HB+pacemaker. + #crm configure primitive oc_d1 lsb::testd1 meta allow-migrate="true" migration-threshold="1" failure-timeout="30s" op monitor interval="3s" #crm configure clone oc_d1_clone oc_d1 #crm configure primitive oc_d2 lsb::testd2 meta allow-migrate="true" migration-threshold="3" failure-timeout="30s" op monitor interval="5s" #crm configure clone oc_d2_clone oc_d2 + But the resources are getting stopped on one node with below errors: [11519]: WARN: status_from_rc: Action 9 (oc_d1:0_monitor_0) on ubuntu191 failed (target: 7 vs. rc: 0): Error Jun 06 14:45:22 ubuntu191 pengine: [11712]: notice: LogActions: Stop oc_d1:0 (ubuntu191) Clone Set: oc_d1_clone [oc_d1] Started: [ ubuntu190 ] Stopped: [ oc_d1:1 ] Clone Set: oc_d2_clone [oc_d2] Started: [ ubuntu190 ] Stopped: [ oc_d2:1 ] Thanks Eswar 1/ There is no question in your email. 2/ You should not start your HA resources in the runlevels if you intend to have them clusterized: pacemaker (aka: the resource manager) should start them. 3/ Finally, if you persist against point 2, check what return-code 7 means, and read the sources of your RA to find what could possibly return 7. -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
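For reference, "target: 7 vs. rc: 0" compares the OCF return code Pacemaker expected with the one the probe actually got: the initial probe of a supposedly stopped clone instance expects 7 (not running), but the daemon started from the runlevels answers 0 (running). A tiny sketch of that mapping (the helper function is illustrative only, not a Pacemaker API):

```shell
#!/bin/sh
# Map the two OCF return codes involved in a "target: 7 vs. rc: 0" error.
# Illustrative helper only -- not part of Pacemaker.
ocf_rc_name() {
  case "$1" in
    0) echo "OCF_SUCCESS (running)" ;;
    7) echo "OCF_NOT_RUNNING (stopped)" ;;
    *) echo "other ($1)" ;;
  esac
}

ocf_rc_name 7   # what the probe expected on a node not yet running the resource
ocf_rc_name 0   # what it got, because the daemon was started outside the cluster
```

Which is why letting pacemaker start the daemons (or adopting them under maintenance-mode) makes the probe result match the expectation.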
Re: [Pacemaker] group resource starting parallel
Le 05/06/2013 16:23, Wolfgang Routschka a écrit : Hi Guys, one question about group resources starting in parallel, configured with crmshell (Scientific Linux 64 with pacemaker 1.1.8-7, cman-3.0.12.1-49 and crmsh-1.2.5-55). In my 2 node cluster I configured a group with 40 ip-address resources for easy managing. Now I want the resources to start in parallel. In my crmshell I cannot use the option "meta ordered=false" - that option is no longer available, as far as I know. After searching I found "resource sets", so I hope they are the correct way to parallelize my resources, but in my opinion I can't configure resource sets in crmshell. How can I configure my resources to start in parallel? Greetings Wolfgang Well, as one of the primary authors of pacemaker once said[1] "Unordered and/or uncolocated groups are an abomination." I don't know if his position has moved but, a group being a syntactic shortcut for ordering+colocation, trying to make it behave otherwise might not be a good idea, even if I understand your need to address a group of 40 resources in a single command. Question: are the couple of seconds (if not a single second) required to start 40 IPaddr2 RAs one after another really too long to wait for you ? Why /must/ you start them in parallel ? [1] - http://oss.clusterlabs.org/pipermail/pacemaker/2011-January/008969.html -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] lsb resource manager
Le 04/06/2013 11:55, andreas graeper a écrit : but pacemaker should realize that lsb:xxx did not start ?! what is to do ? maybe the init scripts return is not correct ?! Check your init-script against lsb compliance: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ap-lsb.html -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] more newbie questions
Le 04/06/2013 06:44, Alex Samad - Yieldbroker a écrit : but I am looking for info on the op start, op monitor, op stop - where can I find that? Googling shows these in examples but doesn't explain them. Operations are not tied to a resource agent but are generic: start, stop, monitor, and eventually promote/demote, etc. More information in the official documentation, here http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_resource_operations.html -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
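As a hedged illustration of what those operations look like on a primitive (crm syntax; the resource name, agent and timeout values here are made up, not from the thread):

```
primitive example-db ocf:heartbeat:mysql \
    op start   interval="0"   timeout="120s" \
    op stop    interval="0"   timeout="120s" \
    op monitor interval="10s" timeout="30s"
```

start/stop use interval="0" (they are one-shot actions); only monitor recurs at its interval.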
Re: [Pacemaker] Shutdown of pacemaker service takes 20 minutes
Le 30/05/2013 13:57, Johan Huysmans a écrit : When my resource has received the stop command, it will stop, but this takes some time. When the status is monitored during shutdown of the resource this will fail, and as the resource is configured with on-fail="block", the resource is set to unmanaged. Is this a bug, or can I work around this issue? You should probably override the default timeout for the stop() operation with a custom value of your choice. See table 5.3 http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_resource_operations.html#_monitoring_resources_for_failure Eg:

primitive foo ocf:x:y \
    op monitor on-fail="restart" interval="10s" \
    op start interval="0" timeout="2min" \
    op stop interval="0" timeout="5min"

-- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] HA for apache, doesnot work with pacemaker
Le 29/05/2013 11:48, Gopi Krishna B a écrit : Allow from 10.102.228.60 Maybe you need Allow from 127.0.0.1, or ::1 (localhost). Anyway, proceed step by step and validate your apache setup first, especially that the /status page works fine. Then you can try to add HA to it, and provide pacemaker logs once you are sure apache is properly configured but pacemaker still fails to bring it up. -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] newbie question(s)
Le 24/05/2013 15:36, Nick Khamis a écrit : We are looking to put something together but don't want to use cman for certain things and pacemaker for others. This will be strictly an active/active setup using OpenAIS and pacemaker. Has that been decoupled, stabilized and sorted out? I don't have the slightest idea, sorry. I never used any of these products; I just run a couple of good ol' cman+pacemaker 2-node clusters without shared storage. -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] newbie question(s)
Le 24/05/2013 04:15, Alex Samad - Yieldbroker a écrit : -Original Message- From: Florian Crouzat [mailto:gen...@floriancrouzat.net] Sent: Thursday, 23 May 2013 6:27 PM To: pacemaker@oss.clusterlabs.org Subject: Re: [Pacemaker] newbie question(s) [snip] You could also wait for a failover where the VIP or (any resource) will fail to properly stop, the cluster doesn't know what do to on stop-failures (beside fencing), it freezes in this weird state => you have two slaves on your network until an admin fixes it. True, what would you use for quorum (it's a 2 node cluster)? I never used quorum on any 2 nodes cluster, by definition it makes no sense, so I always used no-quorum-policy=ignore. If you really want quorum, you need a third player, either a dedicated quorum pseudo-node (don't know much about these) or a third node. Hope it helps... -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] newbie question(s)
Le 22/05/2013 02:13, Alex Samad - Yieldbroker a écrit : > >Any help or suggestions muchly appreciated > > > >Also, fencing! Not sure that I need it. The app is always running on both nodes, it's just the ip address that is shared A cluster without fencing is not a cluster, by definition. Proper fencing should be at least two different methods, eg: IPMI + PDU. Some people live with clusters without fencing; they mostly rely on luck, and luck is not very "High Availability". If you want proof that you need it, just drop all corosync network traffic => split brain: you have two masters on your network until an admin fixes it. You could also wait for a failover where the VIP (or any resource) fails to properly stop; the cluster doesn't know what to do on stop-failures (besides fencing), so it freezes in this weird state => you have two slaves on your network until an admin fixes it. -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Post script question
Le 22/05/2013 15:02, Daniel Gullin a écrit : Hi, is there any possibility to use a "post-script" when a failover has happened ? I have a corosync/pacemaker installation with two services, filesystem and IP, and two nodes. The system should be in active/passive mode. When a failover has happened, the passive node should mount the shared disk and migrate the shared IP as well; when that is finished I want pacemaker to run a script on the "new" active node. Could I do this ? Thanks Daniel Maybe there are more elegant ways I haven't heard about, but you can totally create an lsb resource firing a custom script of yours in its start() section and, by smart usage of a group or colocation+ordering, force this resource to be started after all the others. Beware of the lsb compliance of your resource... -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
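A minimal sketch of such an lsb "post-script" resource (the script name, lock-file path and hook command are all assumptions; in production the script would live in /etc/init.d/, /tmp is used here only so the sketch can be exercised). The key LSB detail is that status must exit 0 when "running" and 3 when "stopped":

```shell
#!/bin/sh
# Write a minimal LSB wrapper whose start() fires a custom hook, then
# exercise its start/status/stop cycle.
cat > /tmp/post-failover <<'EOF'
#!/bin/sh
LOCK="${LOCK:-/tmp/post-failover.lock}"
case "$1" in
  start)
    echo "running post-failover hook"   # replace with your real command
    touch "$LOCK"
    ;;
  stop)
    rm -f "$LOCK"
    ;;
  status)
    [ -f "$LOCK" ] && exit 0 || exit 3  # LSB: 0 = running, 3 = stopped
    ;;
  *)
    echo "Usage: $0 {start|stop|status}" >&2
    exit 2
    ;;
esac
exit 0
EOF
chmod +x /tmp/post-failover

/tmp/post-failover start
/tmp/post-failover status && echo "status: running"
/tmp/post-failover stop
/tmp/post-failover status || echo "status: stopped"
```

Declared as something like "primitive post-script lsb:post-failover" (name is an assumption) and ordered after the group, its start() would fire once everything else is up.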
Re: [Pacemaker] newbie question(s)
tat can fail 99 more times on dc1wwwrp02 before being forced off May 21 09:09:28 dc1wwwrp01 pengine[2355]: notice: process_pe_message: Transition 5487: PEngine Input stored in: /var/lib/pengine/pe-input-1485.bz2 May 21 09:09:28 dc1wwwrp01 crmd[2356]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ] May 21 09:09:28 dc1wwwrp01 crmd[2356]: info: do_te_invoke: Processing graph 5487 (ref=pe_calc-dc-1369091368-5548) derived from /var/lib/pengine/pe-input-1485.bz2 May 21 09:09:28 dc1wwwrp01 crmd[2356]: notice: run_graph: Transition 5487 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-1485.bz2): Complete May 21 09:09:28 dc1wwwrp01 crmd[2356]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ] May 21 09:12:35 dc1wwwrp01 cib[2351]: info: cib_stats: Processed 1 operations (0.00us average, 0% utilization) in the last 10min May 21 09:23:12 dc1wwwrp01 crm_resource[5165]:error: unpack_rsc_op: Preventing ybrpstat from re-starting on dc1wwwrp01: operation monitor failed 'insufficient privileges' (rc=4) May 21 09:23:12 dc1wwwrp01 crm_resource[5165]:error: unpack_rsc_op: Preventing ybrpstat from re-starting on dc1wwwrp02: operation monitor failed 'insufficient privileges' (rc=4) May 21 09:23:12 dc1wwwrp01 cib[2351]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='dc1wwwrp01']//lrm_resource[@id='ybrpstat'] (origin=local/crmd/5589, version=0.101.36): ok (rc=0) May 21 09:23:12 dc1wwwrp01 crmd[2356]: info: delete_resource: Removing resource ybrpstat for 5165_crm_resource (internal) on dc1wwwrp01 May 21 09:23:12 dc1wwwrp01 crmd[2356]: info: notify_deleted: Notifying 5165_crm_resource on dc1wwwrp01 that ybrpstat was deleted May 21 09:23:12 dc1wwwrp01 crmd[2356]: warning: decode_transition_key: Bad UUID 
(crm-resource-5165) in sscanf result (3) for 0:0:crm-resource-5165 May 21 09:23:12 dc1wwwrp01 attrd[2354]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-ybrpstat () May 21 09:23:12 dc1wwwrp01 cib[2351]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='dc1wwwrp01']//lrm_resource[@id='ybrpstat'] (origin=local/crmd/5590, version=0.101.37): ok (rc=0) May 21 09:23:12 dc1wwwrp01 crmd[2356]: info: abort_transition_graph: te_update_diff:320 - Triggered transition abort (complete=1, tag=lrm_rsc_op, id=ybrpstat_last_0, magic=0:0;3:3572:0:c348b36c-f6dd-4a7d-ac5b-01a3b8ce3c34, cib=0.101.37) : Resource op removal May 21 09:23:12 dc1wwwrp01 crmd[2356]: info: abort_transition_graph: te_update_diff:320 - Triggered transition abort (complete=1, tag=lrm_rsc_op, id=ybrpstat_last_0, magic=0:0;3:3572:0:c348b36c-f6dd-4a7d-ac5b-01a3b8ce3c34, cib=0.101.37) : Resource op removal From node2 === May 21 09:20:03 dc1wwwrp02 lrmd: [2045]: info: rsc:ybrpip:16: monitor May 21 09:23:12 dc1wwwrp02 lrmd: [2045]: info: cancel_op: operation monitor[16] on ocf::IPaddr2::ybrpip for client 2048, its parameters: CRM_meta_name=[monitor] cidr_netmask=[24] crm_feature_set=[3.0.6] CRM_meta_timeout=[2] CRM_meta_interval=[5000] ip=[10.32.21.10] cancelled May 21 09:23:12 dc1wwwrp02 lrmd: [2045]: info: rsc:ybrpip:20: stop May 21 09:23:12 dc1wwwrp02 cib[2043]: info: apply_xml_diff: Digest mis-match: expected dcee73fe6518ac0d4b3429425d5dfc16, calculated 4a39d2ad25d50af2ec19b5b24252aef8 May 21 09:23:12 dc1wwwrp02 cib[2043]: notice: cib_process_diff: Diff 0.101.36 -> 0.101.37 not applied to 0.101.36: Failed application of an update diff May 21 09:23:12 dc1wwwrp02 cib[2043]: info: cib_server_process_diff: Requesting re-sync from peer May 21 09:23:12 dc1wwwrp02 cib[2043]: notice: cib_server_process_diff: Not applying diff 0.101.36 -> 0.101.37 (sync in progress) May 21 09:23:12 dc1wwwrp02 cib[2043]: notice: cib_server_process_diff: Not applying 
diff 0.101.37 -> 0.102.1 (sync in progress) May 21 09:23:12 dc1wwwrp02 cib[2043]: notice: cib_server_process_diff: Not applying diff 0.102.1 -> 0.102.2 (sync in progress) May 21 09:23:12 dc1wwwrp02 cib[2043]: notice: cib_server_process_diff: Not applying diff 0.102.2 -> 0.102.3 (sync in progress) May 21 09:23:12 dc1wwwrp02 cib[2043]: notice: cib_server_process_diff: Not applying diff 0.102.3 -> 0.102.4 (sync in progress) Any help or suggestions muchly appreciated Also, fencing! Thanks Alex -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] question about interface failover
Le 18/05/2013 20:23, christopher barry a écrit : On Fri, 2013-05-17 at 10:41 +0200, Florian Crouzat wrote: Le 16/05/2013 21:45, christopher barry a écrit : Greetings, I've setup a new 2-node mysql cluster using * drbd 8.3.1.3 * corosync 1.4.2 * pacemaker 117 on Debian Wheezy nodes. failover seems to be working fine for everything except the ips manually configured on the interfaces. This sentence makes no sense to me. The cluster will not failover something that is not clusterized (a 'manually' configured IP...) What are you trying to achieve exactly ? Also, could you pastebin the output of "crm_mon -Arf1" I find it more easy to read. see config here: http://pastebin.aquilenet.fr/?9eb51f6fb7d65fda#/YvSiYFocOzogAmPU9g +g09RcJvhHbgrY1JuN7D+gA4= If I bring down an interface, when the cluster restarts it, it only starts it with the vip - the original ip and route have been removed. Makes sense if you added the 'original' IP manually... You should have non-VIP in /etc/sysconfig/network/ifcfg-* But then again, please precise what you are trying to achieve. not sure what to do to make sure the permanent ip and the routes get restored. I'm not all that versed on the cluster commandline yet, and I'm using LCMC for most of my usage. (@howard2.rjmetrics.com)-(14:00 / Sat May 18) [-][~]# crm_mon -Arf1 Last updated: Sat May 18 14:00:27 2013 Last change: Thu May 16 17:33:07 2013 via crm_attribute on howard3.rjmetrics.com Stack: openais Current DC: howard3.rjmetrics.com - partition with quorum Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff 2 Nodes configured, 2 expected votes 6 Resources configured. 
Online: [ howard3.rjmetrics.com howard2.rjmetrics.com ]

Full list of resources:

 Master/Slave Set: ms_drbd_mysql [p_drbd_mysql]
     Masters: [ howard2.rjmetrics.com ]
     Slaves: [ howard3.rjmetrics.com ]
 Resource Group: g_mysql
     p_fs_mysql       (ocf::heartbeat:Filesystem): Started howard2.rjmetrics.com
     ClusterPrivateIP (ocf::heartbeat:IPaddr2):    Started howard2.rjmetrics.com
     ClusterPublicIP  (ocf::heartbeat:IPaddr2):    Started howard2.rjmetrics.com
     p_mysql          (ocf::heartbeat:mysql):      Started howard2.rjmetrics.com

Node Attributes:
* Node howard3.rjmetrics.com:
    + master-p_drbd_mysql:0: 1000
* Node howard2.rjmetrics.com:
    + master-p_drbd_mysql:1: 1

Migration summary:
* Node howard3.rjmetrics.com:
   p_drbd_mysql:1: migration-threshold=100 fail-count=1
* Node howard2.rjmetrics.com:
   ClusterPublicIP: migration-threshold=100 fail-count=1

Failed actions:
    p_drbd_mysql:1_promote_0 (node=howard3.rjmetrics.com, call=29, rc=-2, status=Timed Out): unknown exec error
    ClusterPublicIP_monitor_3 (node=howard2.rjmetrics.com, call=122, rc=7, status=complete): not running

howard2 and howard3 are the two clustered servers. During testing, when I ifdown either eth0 or eth1, the cluster starts the vip back up, but the other non-vip IPs and routes do not get started. I'm running Debian, so these are configured in /etc/network/interfaces. Saying 'manually' configured was misleading on my part, sorry about that. Mhh, I cannot reproduce right now, but I was pretty sure that IPaddr2 used "ip addr add X.X.X.X/YY dev ZZ", so I was expecting that ifdowning device ZZ would prevent pacemaker from re-upping the VIP, as the underlying device doesn't exist anymore. It's even proved by the fact that the non-vip doesn't come up again: IPaddr2 doesn't ifup, it adds an alias to an existing device. See "sudo crm ra meta IPaddr2" and search for "nic=". Anyway, "ifdown" is not a valid use case to test your cluster; it doesn't represent any possible valid production scenario. eth0 is the public interface, and eth1 is the private interface. 
eth2 and eth3 are bonded as bond0, use jumbo frames, and are crossover cabled between the nodes. The test I was doing was to pull cables from eth0 and eth1, which hung the cluster. My assumption is that I need to add more configuration elements to manage the other IPs, and also set up some ping hosts that, when unreachable, will initiate failover. What would help me, I think, is an example config or pointers to how to add these elements. Well, without digging much into your configuration, you need ping-nodes, yes, so that your most-connected node "wins", and you also need fencing, which is mandatory on any cluster. Here's a sample configuration for ping nodes and a location constraint so that the most-connected node hosts the resource "foo": primitive ping-gw-sw1-sw2 ocf:pacemaker:ping \ params host_list="192.168.10.1 192.168.2.11 192.168.2.12" dampen="35s" attempts="2" timeout="2" multiplier="100" \ op monitor interval="
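A complete hedged sketch of the same ping-node + location-constraint idea (crm syntax; the monitor interval, clone and constraint names are assumptions; the host_list IPs are taken from the message above):

```
primitive ping-gw ocf:pacemaker:ping \
    params host_list="192.168.10.1 192.168.2.11 192.168.2.12" \
           dampen="35s" attempts="2" timeout="2" multiplier="100" \
    op monitor interval="15s"
clone cl-ping-gw ping-gw
location loc-foo-on-connected foo \
    rule -inf: not_defined pingd or pingd lte 0
```

The ping RA maintains a node attribute (named "pingd" by default) scored by multiplier x reachable hosts; the location rule bans "foo" from any node that can reach none of them.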
Re: [Pacemaker] question about interface failover
Le 16/05/2013 21:45, christopher barry a écrit : Greetings, I've set up a new 2-node mysql cluster using * drbd 8.3.1.3 * corosync 1.4.2 * pacemaker 117 on Debian Wheezy nodes. failover seems to be working fine for everything except the ips manually configured on the interfaces. This sentence makes no sense to me. The cluster will not fail over something that is not clusterized (a 'manually' configured IP...) What are you trying to achieve exactly ? Also, could you pastebin the output of "crm_mon -Arf1"? I find it easier to read. see config here: http://pastebin.aquilenet.fr/?9eb51f6fb7d65fda#/YvSiYFocOzogAmPU9g +g09RcJvhHbgrY1JuN7D+gA4= If I bring down an interface, when the cluster restarts it, it only starts it with the vip - the original ip and route have been removed. Makes sense if you added the 'original' IP manually... You should have the non-VIP IPs in /etc/sysconfig/network/ifcfg-* But then again, please clarify what you are trying to achieve. not sure what to do to make sure the permanent ip and the routes get restored. I'm not all that versed on the cluster commandline yet, and I'm using LCMC for most of my usage. -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] ClusterMon Resource starting multiple instances of crm_mon
On 09/05/2013 16:40, Steven Bambling wrote:

I'm having some issues getting cluster monitoring set up and configured on a 3-node multi-state cluster. I'm using Florian's blog as an example: http://floriancrouzat.net/2013/01/monitor-a-pacemaker-cluster-with-ocfpacemakerclustermon-andor-external-agent/.

Btw, if any clarification is required on this blog post, let me know ;)

--
Cheers,
Florian Crouzat
Re: [Pacemaker] cman based cluster fencing device fence_pcmk
On 11/04/2013 16:49, Wolfgang Routschka wrote:

Hi all, one question today about a CMAN-based cluster on RHEL 6 (and clones) with the fencing device agent fence_pcmk. In my scenario the stonith device is an IBM IPMI management interface (IMM), so I want to use fence_ipmilan from the package resource-agents. After reading the RHEL quickstart guide http://clusterlabs.org/quickstart-redhat.html and Clusters from Scratch 8.2.3 "Configuring CMAN Fencing" http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/ch08s02s03.html I'm not sure how I can use fence_ipmilan for the stonith device resource, because in my configuration the fencing device is fence_pcmk: "ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk". Greetings, Wolfgang

All "ccs" commands address CMAN. With these, you only configure CMAN to delegate fencing to Pacemaker. Then, with regular Pacemaker commands, by defining the usual fencing primitives (fence_ipmilan, fence_wti, ...), you implement the actual stonith fencing. Do not mix the two concepts.

--
Cheers,
Florian Crouzat
Re: [Pacemaker] ping resource polling skew
On 20/03/2013 04:11, Quentin Smith wrote:

Is there any way to get Pacemaker to delay resource transitions until at least one full polling cycle has happened, so that in the event of an outage of the ping target, resources stay put where they are running?

There is the "dampen" parameter; use a high value, like 3 or more times the monitor interval, to give all nodes the chance to detect the dead target(s). That should help.

Does that actually help in this case? My understanding is that the dampen parameter will delay the attribute change for each host, but those delays will still tick down separately for each node, resulting in exactly the same behavior, just delayed by dampen seconds.

I have had the same questions, and I was quite surprised to see this issue isn't really mentioned anywhere. So far, I've been relying on the dampen parameter. Here is my resource definition:

primitive ping-nq-sw-swsec ocf:pacemaker:ping \
    params host_list="192.168.10.1 192.168.2.11 192.168.2.12" dampen="35s" attempts="2" timeout="2" multiplier="100" \
    op monitor interval="15s"

As I understand it, a node cannot trigger any transition until 35s (dampen) has passed since that particular node lost a ping-node. And by setting a monitor interval of 15s, I can be sure that within those 35s, all nodes will have marked that ping-node as dead and will continue to have a common score => nothing moves (35s > 2*15s, so all nodes have pinged at least twice during the dampen delay).

Hope that helps.

--
Cheers,
Florian Crouzat
Re: [Pacemaker] Running a custom script
On 12/03/2013 12:17, Michael Smith wrote:

Hi, I think it's quite simple but I can't see how to do it. I have a two-node cluster serving up a Samba share and allowing access via SFTP; these I have all working fine, but for the following. In order to have Samba and SFTP working together on the same directories, I need to do a "mount -o bind". This needs to be done when the cluster fails over; it's a one-off event that doesn't have any running process or require a start/stop script. I have written the script, but I now need it to run as a clustered resource, once the file system has been mounted (drbd) and before I start Samba and set the virtual IP. Hope you can help. Thanks, Michael Smith

Define an lsb resource (see Pacemaker Explained); it can be anything from an init script to a custom script, as long as it's LSB compliant [1]. You can have it behave like any other shipped resource agent.

[1] - http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/ap-lsb.html

--
Cheers,
Florian Crouzat
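What "LSB compliant" mainly boils down to is the exit-code contract: start and stop are idempotent, and status exits 0 when running and 3 when stopped. A minimal sketch of such a script — hypothetical, with a state file standing in for the real mount/umount commands so the contract is easy to see:

```shell
#!/bin/sh
# Hypothetical LSB-style agent skeleton. A real version would run
# "mount --bind" / "umount" where the state file is touched/removed.
STATE="${STATE:-/tmp/bindmount.state}"

bindmount_agent() {
    case "$1" in
        start)
            # idempotent: starting an already-started service must succeed
            touch "$STATE"
            ;;
        stop)
            # idempotent: stopping an already-stopped service must succeed
            rm -f "$STATE"
            ;;
        status)
            # LSB status exit codes: 0 = running, 3 = not running
            if [ -e "$STATE" ]; then return 0; else return 3; fi
            ;;
        *)
            return 2    # LSB: invalid or excess arguments
            ;;
    esac
}

bindmount_agent start
bindmount_agent status && echo "running"
bindmount_agent stop
bindmount_agent status || echo "stopped"
```

Dropped into /etc/init.d and made executable, such a script can then be declared as, e.g., `primitive bindmount lsb:bindmount`, ordered after the Filesystem resource and before Samba.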
Re: [Pacemaker] set the timeout
On 11/03/2013 15:44, fatcha...@gmx.de wrote:

Hi, how can I adjust the timeout for the start of a mysqld or an httpd in a config? It seems to me as if this is a little too short for our systems. One of my configurations looks like this:

See http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/_resource_operations.html

It's the same as defining a timeout for a "monitor" operation; you can do it for any operation. E.g.:

primitive foo lsb:bar \
    op monitor on-fail="restart" interval="10s" \
    op start interval="0" timeout="3min" \
    op stop interval="0" timeout="1min"

Don't forget stop timeouts or you'll (probably) get fenced ;)

--
Cheers,
Florian Crouzat
Re: [Pacemaker] Monitor process, migrate only ip resources
On 20/02/2013 09:07, Grant Bagdasarian wrote:

3) When I only stop the kamailio process on the primary node, the process is restarted again. Which is also good, but I thought it would migrate everything to the secondary node when the kamailio process stopped.

Pacemaker always tries to restart a failed resource on its original hosting node. Only if that fails will it migrate the resource(s) to another node.

If there is something wrong on the primary node which causes the kamailio process to keep crashing and restarting, it could be a hazard for our production environment.

I guess the question is: is failcount incremented when a failed resource *is* restarted on the same node and no transition occurs?

--
Cheers,
Florian Crouzat
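For the record, the failcount is incremented on every failure even when the resource is restarted in place, and the `migration-threshold` meta attribute caps how many local failures are tolerated before the resource is forced away. A hedged sketch (names and values are illustrative, not from the thread):

```
primitive kamailio lsb:kamailio \
    op monitor interval="10s" on-fail="restart" \
    meta migration-threshold="3" failure-timeout="120s"
```

With this, a third failure on the primary node moves the resource to the other node, addressing the crash-loop hazard; failure-timeout lets the failcount expire after two minutes so the node eventually becomes eligible again.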
Re: [Pacemaker] Monitor process, migrate only ip resources
On 19/02/2013 13:54, Grant Bagdasarian wrote:

Hello, I wish to monitor a certain running process and migrate floating IP addresses when this process stops running. My current configuration is as follows:

crm(live)configure# show
node $id="8fe81814-6e85-454f-b77b-5783cc18f4c6" proxy1
node $id="ceb5c90f-ee6a-44b9-b722-78781f6a61ab" proxy2
primitive sip_ip ocf:heartbeat:IPaddr \
    params ip="10.0.0.1" cidr_netmask="255.255.255.0" nic="eth1" \
    op monitor interval="40s" timeout="20s"
primitive sip_ip_2 ocf:heartbeat:IPaddr \
    params ip="10.0.0.2" cidr_netmask="255.255.255.0" nic="eth1" \
    op monitor interval="40s" timeout="20s"
primitive sip_ip_3 ocf:heartbeat:IPaddr \
    params ip="10.0.0.3" cidr_netmask="255.255.255.0" nic="eth1" \
    op monitor interval="40s" timeout="20s"
location sip_ip_pref sip_ip 100: proxy1
location sip_ip_pref_2 sip_ip_2 101: proxy1
location sip_ip_pref_3 sip_ip_3 102: proxy1
property $id="cib-bootstrap-options" \
    dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
    cluster-infrastructure="Heartbeat" \
    stonith-enabled="false"

A couple of days ago our kamailio process stopped and the IP resources weren't migrated to our secondary node.

Of course, why would they be? The secondary node already has the kamailio process running. First, you remove kamailio from the runlevels so that it doesn't start on server boot and only the cluster manages it.

How do I configure HA so that the kamailio process is monitored every x seconds and, when it has stopped, the three IP addresses are migrated to the secondary node?

Then, you define a kamailio resource (possibly by using lsb:kamailio if there is no real resource agent). Finally, you create a group with your 3 IPs + kamailio. Remember, groups are ordered and colocated sets of resources that start from left to right and are stopped in the opposite order. Possibly, you define a location constraint for the group to prefer proxy1. In the end, it might be something like: ...
primitive sip_ip ocf:heartbeat:IPaddr \
    params ip="10.0.0.1" cidr_netmask="255.255.255.0" nic="eth1" \
    op monitor interval="40s" timeout="20s"
primitive sip_ip_2 ocf:heartbeat:IPaddr \
    params ip="10.0.0.2" cidr_netmask="255.255.255.0" nic="eth1" \
    op monitor interval="40s" timeout="20s"
primitive sip_ip_3 ocf:heartbeat:IPaddr \
    params ip="10.0.0.3" cidr_netmask="255.255.255.0" nic="eth1" \
    op monitor interval="40s" timeout="20s"
primitive kamailio lsb:kamailio \
    op monitor interval="40s" timeout="20s"
group SIP sip_ip sip_ip_2 sip_ip_3 kamailio
location SIP_prefer_proxy1 SIP 100: proxy1
...

Note that with such a config, kamailio is restarted when it is migrated, which I assume is something you want, so that it can bind on the sip_ip...

--
Cheers,
Florian Crouzat
Re: [Pacemaker] Online add a new node to cluster communicating by UDPU
On 12/02/2013 10:39, Lars Marowsky-Bree wrote:

On 2013-02-12T10:24:29, Florian Crouzat wrote:

There might be other ways, probably cleaner, but you can always:

"always" is relative. This doesn't work for services like GFS2/OCFS2/cLVM2. Regards, Lars

Yeah, right. I don't use shared storage in my clusters, so my mind skipped that part; you are right.

--
Cheers,
Florian Crouzat
Re: [Pacemaker] Online add a new node to cluster communicating by UDPU
On 12/02/2013 10:10, Michal Fiala wrote:

Hello, is there a way to online add a new node to a corosync/pacemaker cluster where nodes communicate by unicast UDP? Thanks, Michal

There might be other ways, probably cleaner, but you can always:

* put the cluster into maintenance-mode
* shut down the cluster stack on all nodes (pacemaker + corosync)
* reconfigure the ring(s) on all nodes
* start corosync on all nodes
* test connectivity (corosync-objctl | fgrep members)
* start pacemaker

--
Cheers,
Florian Crouzat
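With corosync 1.x and the udpu transport, "reconfigure the ring(s)" concretely means adding a member block for the new node to corosync.conf on every node. A sketch with assumed addresses:

```
totem {
    version: 2
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        member {
            memberaddr: 192.168.1.1
        }
        member {
            memberaddr: 192.168.1.2
        }
        # the node being added
        member {
            memberaddr: 192.168.1.3
        }
    }
}
```

Every existing node needs the new member entry before corosync is restarted; otherwise the nodes will disagree about the ring membership.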
Re: [Pacemaker] crm_mon SNMP support
On 05/02/2013 02:36, Andrew Beekhof wrote:

On Fri, Feb 1, 2013 at 8:21 PM, Florian Crouzat wrote: On 01/02/2013 03:48, Andrew Beekhof wrote: On Tue, Jan 22, 2013 at 3:18 AM, Florian Crouzat wrote: On 29/11/2012 22:10, Andrew Beekhof wrote:

Not so fast :-) crm_mon supports:

-E, --external-agent=value      A program to run when resource operations take place.
-e, --external-recipient=value  A recipient for your program (assuming you want the program to send something to someone).

so without recompiling, you can call a script - possibly it could call something that sends out snmp alerts ;-)

So, I took a first shot at writing an external-agent script that would somehow reproduce the behavior of crm_mon when SNMP support is built in. Basically, to refocus the discussion: I've written this script because I want to be alerted via SNMP on most cluster events, but sadly my version of crm_mon doesn't have SNMP support (RHEL6), so I cannot use this feature combined with ocf:pacemaker:ClusterMon - but I can use crm_mon's ability to trigger an external agent (script, binary, ...).

Script: http://files.floriancrouzat.net/clusterMon.sh - it respects PCMK-MIB.txt.

Is that something you'd like to include with pacemaker?

Well, I'm really not sure it would be useful for anyone, as it is not generic at all and highly oriented toward my needs.

Looks pretty configurable to me.

Maybe it would fit better in an "example" section of Pacemaker Explained.

That also works :)

Well, do as you please, I'm by no means able to choose :) Not sure what you meant by "Is that something you'd like to include with pacemaker?" but feel free to commit the script to any "extra" or "scripts" folder in the pacemaker sources, maybe remove my name and add the required license(s) to it, or just paste the few bash lines I produced into the correct section of the correct documentation.
I did some polishing on the comments in the script since Chapter 7 of Pacemaker Explained has been published (http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Pacemaker_Explained/s-notification-external.html)

--
Cheers,
Florian Crouzat
Re: [Pacemaker] crm_mon SNMP support
On 01/02/2013 03:48, Andrew Beekhof wrote:

On Tue, Jan 22, 2013 at 3:18 AM, Florian Crouzat wrote: On 29/11/2012 22:10, Andrew Beekhof wrote:

Not so fast :-) crm_mon supports:

-E, --external-agent=value      A program to run when resource operations take place.
-e, --external-recipient=value  A recipient for your program (assuming you want the program to send something to someone).

so without recompiling, you can call a script - possibly it could call something that sends out snmp alerts ;-)

So, I took a first shot at writing an external-agent script that would somehow reproduce the behavior of crm_mon when SNMP support is built in. Basically, to refocus the discussion: I've written this script because I want to be alerted via SNMP on most cluster events, but sadly my version of crm_mon doesn't have SNMP support (RHEL6), so I cannot use this feature combined with ocf:pacemaker:ClusterMon - but I can use crm_mon's ability to trigger an external agent (script, binary, ...).

Script: http://files.floriancrouzat.net/clusterMon.sh - it respects PCMK-MIB.txt.

Is that something you'd like to include with pacemaker?

Well, I'm really not sure it would be useful for anyone, as it is not generic at all and highly oriented toward my needs. Maybe it would fit better in an "example" section of Pacemaker Explained.

ps: yes I know, lots of comments and few actual lines of code, but that's just because http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#ch-notification hasn't been published yet.

/me kicks off a rebuild now.

Thank you for that, hope my poor writing skills won't be noticed =)

--
Cheers,
Florian Crouzat
Re: [Pacemaker] crm_mon SNMP support
On 29/11/2012 22:10, Andrew Beekhof wrote:

Not so fast :-) crm_mon supports:

-E, --external-agent=value      A program to run when resource operations take place.
-e, --external-recipient=value  A recipient for your program (assuming you want the program to send something to someone).

so without recompiling, you can call a script - possibly it could call something that sends out snmp alerts ;-)

So, I took a first shot at writing an external-agent script that would somehow reproduce the behavior of crm_mon when SNMP support is built in. Basically, to refocus the discussion: I've written this script because I want to be alerted via SNMP on most cluster events, but sadly my version of crm_mon doesn't have SNMP support (RHEL6), so I cannot use this feature combined with ocf:pacemaker:ClusterMon - but I can use crm_mon's ability to trigger an external agent (script, binary, ...).

Script: http://files.floriancrouzat.net/clusterMon.sh - it respects PCMK-MIB.txt.

ps: yes I know, lots of comments and few actual lines of code, but that's just because http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Pacemaker_Explained/#ch-notification hasn't been published yet.

Any directions, hints or corrections are welcome. Florian.

--
Cheers,
Florian Crouzat
Re: [Pacemaker] crm_mon SNMP support
On 07/12/2012 03:19, Andrew Beekhof wrote:

On Fri, Dec 7, 2012 at 2:58 AM, Florian Crouzat wrote:

I cannot find any good place in Pacemaker Explained to write this new content. If advised (in terms of table of contents), I'll happily provide a patch.

I reckon just before https://github.com/ClusterLabs/pacemaker/blob/master/doc/Pacemaker_Explained/en-US/Ch-Advanced-Options.txt#L78 is as good as any :)

Actually, there was an obvious place: the undocumented Chapter 7, "Receiving Notification for Cluster Events". Please find attached a git patch. It's basically my first time with both asciidoc and git, and I'm not a native English speaker, so please be indulgent.

--
Cheers,
Florian Crouzat

diff --git a/doc/Pacemaker_Explained/en-US/Ch-Notifications.txt b/doc/Pacemaker_Explained/en-US/Ch-Notifications.txt
index e69de29..ad99977 100644
--- a/doc/Pacemaker_Explained/en-US/Ch-Notifications.txt
+++ b/doc/Pacemaker_Explained/en-US/Ch-Notifications.txt
@@ -0,0 +1,126 @@
+= Receiving Notification for Cluster Events =
+
+anchor:ch-notifications[Chapter 7, Receiving Notification for Cluster Events]
+indexterm:[Resource,Notification]
+
+A Pacemaker cluster is an event-driven system. In this context, an event is a
+resource failure or configuration change (not exhaustive).
+
+The +ocf:pacemaker:ClusterMon+ resource can monitor the cluster status and
+trigger alerts on each cluster event. This resource runs +crm_mon+ in the
+background at regular intervals (configurable) and uses +crm_mon+ capabilities
+to send emails (SMTP) or SNMP traps, or to execute an external program via the
++extra_options+ parameter.
+
+[NOTE]
+Depending on your system settings and compilation settings, SNMP or email
+alerts might be unavailable. Check the +crm_mon --help+ output to see if these
+options are available to you. In any case, executing an external agent will
+always be available, and you can have this agent send emails, SNMP traps,
+or trigger whatever action you develop.
+
+[[s-notification-snmp]]
+== Configuring SNMP Notifications ==
+indexterm:[Resource,Notification,SNMP]
+
+Requires an IP to send SNMP traps to, and an SNMP community.
+The Pacemaker MIB is found in _/usr/share/snmp/mibs/PCMK-MIB.txt_
+
+.Configuring ClusterMon to send SNMP traps
+[source,XML]
+(XML example lost in the archive's HTML rendering)
+
+[[s-notification-email]]
+== Configuring Email Notifications ==
+indexterm:[Resource,Notification,SMTP,Email]
+
+Requires a user to send mail alerts to. "Mail-From", SMTP relay and Subject prefix can also be configured.
+
+.Configuring ClusterMon to send email alerts
+[source,XML]
+(XML example lost in the archive's HTML rendering)
+
+[[s-notification-external]]
+== Configuring Notifications via External-Agent ==
+
+Requires a program (external-agent) to run when resource operations take
+place, and an external-recipient (IP address, email address, URI, ...). When
+triggered, the external-agent is fed with dynamically filled environment
+variables describing precisely the cluster event that occurred. By making
+smart usage of these variables in your external-agent code, you can trigger
+any action.
+
+.Configuring ClusterMon to execute an external-agent
+[source,XML]
+(XML example lost in the archive's HTML rendering)
+
+.Environment Variables Passed to the External Agent
+[width="95%",cols="1m,2<",options="header",align="center"]
+|=========================================================
+
+|Environment Variable
+|Description
+
+|CRM_notify_recipient
+| The static external-recipient from the resource definition.
+ indexterm:[Environment Variable,CRM_notify_recipient]
+
+|CRM_notify_node
+| The node on which the status change happened.
+ indexterm:[Environment Variable,CRM_notify_node]
+
+|CRM_notify_rsc
+| The name of the resource that changed status.
+ indexterm:[Environment Variable,CRM_notify_rsc]
+
+|CRM_notify_task
+| The operation that caused the status change.
+ indexterm:[Environment Variable,CRM_notify_task]
+
+|CRM_notify_desc
+| The textual output of the relevant error code of the operation (if any) that caused the status change.
+ indexterm:[Environment Variable,CRM_notify_desc]
+
+|CRM_notify_rc
+| The return code of the operation.
+ indexterm:[Environment Variable,CRM_notify_rc]
+
+|CRM_notify_target_rc
+| The expected return code of the operation.
+ indexterm:[Environment Variable,CRM_notify_target_rc]
+
+|CRM_notify_status
+| The numerical representation of the status of the operation.
+ indexterm:[Environment Variable,CRM_notify_status]
+
+|=========================================================
diff --git a/doc/Pacemaker_Explained/en-US/Pacemaker_Explained.xml b/doc/Pacemaker_Explained/en-US/Pacemaker_Explained.xml
index 54662d8..aa1eab4 100644
--- a/doc/Pacemaker_Explained/en-US/Pacemaker_Explained.x
Re: [Pacemaker] crm_mon SNMP support
On 05/12/2012 01:38, Andrew Beekhof wrote:

On Tuesday, December 4, 2012, Florian Crouzat wrote: On 03/12/2012 03:27, Andrew Beekhof wrote: On Sat, Dec 1, 2012 at 1:07 AM, Florian Crouzat wrote: On 29/11/2012 22:10, Andrew Beekhof wrote:

Not so fast :-) crm_mon supports:

-E, --external-agent=value      A program to run when resource operations take place.
-e, --external-recipient=value  A recipient for your program (assuming you want the program to send something to someone).

so without recompiling, you can call a script - possibly it could call something that sends out snmp alerts ;-)

Oh, great! I had a hard time understanding these two options and how they relate; you helped me on IRC, but I'll reply here so there is a trace in case someone else is also interested.

Thanks for that. I really need to make some time to document this.

If you have a suggestion as to where this documentation should go, I might propose a patch. I'm not sure crm_mon --help or man crm_mon can be more verbose than they already are; giving a full example and mentioning the ENV variables to use in the external-agent etc. is too long for these brief docs. What do you think?

The proper place would be "Pacemaker Explained". Happily, it lives in the source tree (doc/Pacemaker_Explained/en-US/Ch-*) in asciidoc format. A patch with the details above would be most welcome :)

I cannot find any good place in Pacemaker Explained to write this new content. If advised (in terms of table of contents), I'll happily provide a patch.
Here is my resource:

primitive ClusterMon ocf:pacemaker:ClusterMon \
    params user="root" update="30" extra_options="-E /usr/local/bin/foo.sh -e 192.168.1.2" \
    op monitor on-fail="restart" interval="10" \
    meta target-role="Started"
clone ClusterMon-clone ClusterMon

Here is the content of my script:

$ cat /usr/local/bin/foo.sh
#!/bin/bash
(
echo CRM_notify_recipient $CRM_notify_recipient
echo CRM_notify_node $CRM_notify_node
echo CRM_notify_rsc $CRM_notify_rsc
echo CRM_notify_task $CRM_notify_task
echo CRM_notify_desc $CRM_notify_desc
echo CRM_notify_rc $CRM_notify_rc
echo CRM_notify_target_rc $CRM_notify_target_rc
echo CRM_notify_status $CRM_notify_status
echo
) > /tmp/pacemaker.log

Finally, this is the resulting log of one execution. The script is executed on each cluster operation/transition (monitor, stop, start, etc.).

$ cat /tmp/pacemaker.log
CRM_notify_recipient 192.168.1.2
CRM_notify_node scoresby2.lyra-network.com
CRM_notify_rsc F
CRM_notify_task monitor
CRM_notify_desc ok
CRM_notify_rc 0
CRM_notify_target_rc 0
CRM_notify_status 0

One just has to do some scripting with these variables to match one's needs. In my case, I guess I want an SNMP trap whenever CRM_notify_rc != 0.

Thanks

--
Cheers,
Florian Crouzat
Re: [Pacemaker] crm_mon SNMP support
On 03/12/2012 03:27, Andrew Beekhof wrote:

On Sat, Dec 1, 2012 at 1:07 AM, Florian Crouzat wrote: On 29/11/2012 22:10, Andrew Beekhof wrote:

Not so fast :-) crm_mon supports:

-E, --external-agent=value      A program to run when resource operations take place.
-e, --external-recipient=value  A recipient for your program (assuming you want the program to send something to someone).

so without recompiling, you can call a script - possibly it could call something that sends out snmp alerts ;-)

Oh, great! I had a hard time understanding these two options and how they relate; you helped me on IRC, but I'll reply here so there is a trace in case someone else is also interested.

Thanks for that. I really need to make some time to document this.

If you have a suggestion as to where this documentation should go, I might propose a patch. I'm not sure crm_mon --help or man crm_mon can be more verbose than they already are; giving a full example and mentioning the ENV variables to use in the external-agent etc. is too long for these brief docs. What do you think?

Here is my resource:

primitive ClusterMon ocf:pacemaker:ClusterMon \
    params user="root" update="30" extra_options="-E /usr/local/bin/foo.sh -e 192.168.1.2" \
    op monitor on-fail="restart" interval="10" \
    meta target-role="Started"
clone ClusterMon-clone ClusterMon

Here is the content of my script:

$ cat /usr/local/bin/foo.sh
#!/bin/bash
(
echo CRM_notify_recipient $CRM_notify_recipient
echo CRM_notify_node $CRM_notify_node
echo CRM_notify_rsc $CRM_notify_rsc
echo CRM_notify_task $CRM_notify_task
echo CRM_notify_desc $CRM_notify_desc
echo CRM_notify_rc $CRM_notify_rc
echo CRM_notify_target_rc $CRM_notify_target_rc
echo CRM_notify_status $CRM_notify_status
echo
) > /tmp/pacemaker.log

Finally, this is the resulting log of one execution. The script is executed on each cluster operation/transition (monitor, stop, start, etc.).
$ cat /tmp/pacemaker.log
CRM_notify_recipient 192.168.1.2
CRM_notify_node scoresby2.lyra-network.com
CRM_notify_rsc F
CRM_notify_task monitor
CRM_notify_desc ok
CRM_notify_rc 0
CRM_notify_target_rc 0
CRM_notify_status 0

One just has to do some scripting with these variables to match one's needs. In my case, I guess I want an SNMP trap whenever CRM_notify_rc != 0.

Thanks

--
Cheers,
Florian Crouzat
Re: [Pacemaker] Getting Started
On 03/12/2012 15:24, Brett Maton wrote:

Hi list, I'm new to corosync / pacemaker so please forgive my ignorance! I currently have Postgres streaming replication between node1 (master) and node2 (slave, hot standby); the replication user authenticates to the master using an md5 password. All good there... My goal is to use pacemaker / heartbeat to move the VIP and promote node2 if node1 fails, without using drbd or pgpool. What I'm having trouble with is finding resources for learning what I need to configure with regards to corosync / pacemaker to implement failover. All of the guides I've found use DRBD and/or a much more robust network configuration.

I'm currently using CentOS 6.3 with PostgreSQL 9.2, corosync-1.4.1-7.el6_3.1.x86_64, pacemaker-1.1.7-6.el6.x86_64.

node1 192.168.0.1
node2 192.168.0.2
dbVIP 192.168.0.101

Any help and suggested reading appreciated. Thanks in advance, Brett

Well, if you don't need shared storage and only a VIP over which postgres runs, I guess the official guide should be good: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html-single/Clusters_from_Scratch/

Forget the drbd stuff, and base your configuration on the httpd example that collocates a VIP and an httpd daemon in an active/passive two-node cluster (Chapter 6).

--
Cheers,
Florian Crouzat
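In the spirit of that httpd example, a minimal active/passive sketch for this setup might look like the following. Resource names and operation timeouts are assumptions; note that actually promoting a streaming-replication standby needs the ocf:heartbeat:pgsql agent in master/slave mode rather than a plain lsb resource, which only restarts a standalone daemon on the other node:

```
primitive dbVIP ocf:heartbeat:IPaddr2 \
    params ip="192.168.0.101" cidr_netmask="24" \
    op monitor interval="30s"
primitive postgres lsb:postgresql-9.2 \
    op monitor interval="30s" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="120s"
group db-group dbVIP postgres
location db-prefer-node1 db-group 100: node1
```

The group ensures the VIP is up before postgres starts, both on the same node, and that they fail over together.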
Re: [Pacemaker] crm_mon SNMP support
On 29/11/2012 22:10, Andrew Beekhof wrote:

Not so fast :-) crm_mon supports:

-E, --external-agent=value      A program to run when resource operations take place.
-e, --external-recipient=value  A recipient for your program (assuming you want the program to send something to someone).

so without recompiling, you can call a script - possibly it could call something that sends out snmp alerts ;-)

Oh, great! I had a hard time understanding these two options and how they relate; you helped me on IRC, but I'll reply here so there is a trace in case someone else is also interested.

Here is my resource:

primitive ClusterMon ocf:pacemaker:ClusterMon \
    params user="root" update="30" extra_options="-E /usr/local/bin/foo.sh -e 192.168.1.2" \
    op monitor on-fail="restart" interval="10" \
    meta target-role="Started"
clone ClusterMon-clone ClusterMon

Here is the content of my script:

$ cat /usr/local/bin/foo.sh
#!/bin/bash
(
echo CRM_notify_recipient $CRM_notify_recipient
echo CRM_notify_node $CRM_notify_node
echo CRM_notify_rsc $CRM_notify_rsc
echo CRM_notify_task $CRM_notify_task
echo CRM_notify_desc $CRM_notify_desc
echo CRM_notify_rc $CRM_notify_rc
echo CRM_notify_target_rc $CRM_notify_target_rc
echo CRM_notify_status $CRM_notify_status
echo
) > /tmp/pacemaker.log

Finally, this is the resulting log of one execution. The script is executed on each cluster operation/transition (monitor, stop, start, etc.).

$ cat /tmp/pacemaker.log
CRM_notify_recipient 192.168.1.2
CRM_notify_node scoresby2.lyra-network.com
CRM_notify_rsc F
CRM_notify_task monitor
CRM_notify_desc ok
CRM_notify_rc 0
CRM_notify_target_rc 0
CRM_notify_status 0

One just has to do some scripting with these variables to match one's needs. In my case, I guess I want an SNMP trap whenever CRM_notify_rc != 0.
Thanks

--
Florian Crouzat
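The "CRM_notify_rc != 0" filtering mentioned above can be sketched as a small external agent. This is a hypothetical example, not the clusterMon.sh from the thread: the alert action is left as a stub, and the trailing lines simulate two events the way crm_mon would deliver them. Strictly speaking, a failure is when CRM_notify_rc differs from CRM_notify_target_rc, since some operations expect a non-zero return code:

```shell
#!/bin/bash
# Hypothetical external agent: log every event, alert only on failures.
LOG="${LOG:-$(mktemp /tmp/pacemaker-notify.XXXXXX)}"

notify() {
    printf '%s %s/%s on %s: rc=%s expected=%s\n' \
        "$(date +%FT%T)" "$CRM_notify_rsc" "$CRM_notify_task" \
        "$CRM_notify_node" "$CRM_notify_rc" "$CRM_notify_target_rc" >> "$LOG"
    if [ "$CRM_notify_rc" != "$CRM_notify_target_rc" ]; then
        # Stub: replace with snmptrap / mail / whatever fits your monitoring.
        echo "ALERT $CRM_notify_rsc $CRM_notify_task on $CRM_notify_node" >> "$LOG"
    fi
}

# Simulated events (crm_mon exports these variables itself in real use):
CRM_notify_rsc=F CRM_notify_task=monitor CRM_notify_node=node1 \
    CRM_notify_rc=0 CRM_notify_target_rc=0 notify
CRM_notify_rsc=F CRM_notify_task=start CRM_notify_node=node1 \
    CRM_notify_rc=1 CRM_notify_target_rc=0 notify

grep -c ALERT "$LOG"    # one failing event logged as an alert
```

Hooked up via extra_options="-E /path/to/this.sh", crm_mon runs the script once per operation with the CRM_notify_* variables already exported.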
Re: [Pacemaker] crm_mon SNMP support
On 29/11/2012 01:27, Andrew Beekhof wrote:

On Wed, Nov 28, 2012 at 2:34 AM, Florian Crouzat wrote:

Hi, I have the following resource in my current production configuration:

primitive SNMPMonitor ocf:heartbeat:ClusterMon \
    params pidfile="/var/run/crm_mon.pid" extra_options="-S 192.168.2.3 -C public" \
    op monitor on-fail="restart" interval="10s"

It was working a couple of months ago, and I haven't touched it since. Apparently, I missed a couple of changelogs :/ I was investigating why I wasn't receiving SNMP traps anymore during the last couple of migrations/changes in the cluster state, and I found out that my version of crm_mon is compiled without SNMP (or email) support.

$ sudo crm_mon -$ && cat /etc/redhat-release
Pacemaker 1.1.6-3.el6
Written by Andrew Beekhof
CentOS release 6.2 (Final)

I found the following changelog:

* Mon Sep 26 2011 Andrew Beekhof 1.1.6-2
- Do not build in support for heartbeat, snmp, esmtp by default
- Create a package for cluster unaware libraries to minimize our footprint on non-cluster nodes
- Better package descriptions

What are my options, knowing that I'm in a PCI-DSS environment that forbids any compiler in production, and that I'd rather not maintain an snmp-enabled version of the package myself?

I'm not familiar with the term PCI-DSS... does that allow you to rebuild src.rpm packages? If so, just run:

rpmbuild --with snmp --rebuild pacemaker-.src.rpm

Yes I can (tm). FYI, PCI-DSS defines the security requirements that you must follow whenever you handle credit card data (e.g. you work in the credit card industry). Among many other things, it forbids compilers in production. Still, I could recompile in my lab, either from scratch with gcc/make or as you suggested. But I have so many things to keep up to date and running that I'm not sure I'll manage to keep pacemaker-cli up to date, and PCI-DSS also requires that you update every package within a month of every update/erratum.
I guess there will never be a pacemaker-cli-snmp package, so I don't have any options anymore except hire someone to start packaging stuff =) Thanks for your suggestions though. -- Cheers, Florian Crouzat ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
[Pacemaker] crm_mon SNMP support
Hi,

I have the following resource in my current production configuration:

primitive SNMPMonitor ocf:heartbeat:ClusterMon \
  params pidfile="/var/run/crm_mon.pid" extra_options="-S 192.168.2.3 -C public" \
  op monitor on-fail="restart" interval="10s"

It was working a couple of months ago, and I haven't touched it since. Apparently, I missed a couple of changelogs :/ I was investigating why I wasn't receiving SNMP traps anymore during the last couple of migrations/changes in the cluster state, and found out that my version of crm_mon is compiled without SNMP (or email) support.

$ sudo crm_mon -$ && cat /etc/redhat-release
Pacemaker 1.1.6-3.el6
Written by Andrew Beekhof
CentOS release 6.2 (Final)

I found the following changelog entries:

* Mon Sep 26 2011 Andrew Beekhof 1.1.6-2
- Do not build in support for heartbeat, snmp, esmtp by default
- Create a package for cluster unaware libraries to minimze our footprint on non-cluster nodes
- Better package descriptions

What are my options, knowing that I'm in a PCI-DSS environment that forbids any compiler in production, and that I'd rather not maintain a snmp-enabled version of the package myself?

--
Cheers,
Florian Crouzat
Re: [Pacemaker] Recommendations in reducing failover response time
Le 05/11/2012 17:05, Arturo Borrero Gonzalez a écrit :
> Wich version of pacemaker/RAs are you using? thanks, Regards.

$ sudo rpm -qa | grep -e pacemaker -e corosync -e resource-agent
corosynclib-1.4.1-4.el6_2.3.x86_64
pacemaker-libs-1.1.6-3.el6.x86_64
pacemaker-cli-1.1.6-3.el6.x86_64
resource-agents-3.9.2-7.el6.x86_64
pacemaker-1.1.6-3.el6.x86_64
pacemaker-cluster-libs-1.1.6-3.el6.x86_64
corosync-1.4.1-4.el6_2.3.x86_64

$ cat /etc/redhat-release
CentOS release 6.2 (Final)

--
Cheers,
Florian Crouzat
Re: [Pacemaker] Recommendations in reducing failover response time
> primitive p_xorp lsb:/etc/init.d/xorp \
>   op monitor interval="5"
> group g_ipv p_ipv_vlan27 p_ipv p_ipv_vlan7 p_ipv_vlan6 p_ipv_vlan54 p_ipv_vlan51 p_ipv_vlan31 p_ipv_vlan34 p_ipv_nat p_ipv_vlan23 p_ipv_vlan10 p_ipv_vlan28 p_ipv_openvpn p_ipv_v6 p_ipv_v6_vlan51 p_ipv_v6_vlan31 p_ipv_v6_vlan6 p_ipv_v6_vlan7 p_ipv_v6_vlan27 p_ipv_v6_vlan54 p_ipv_v6_vlan34
> colocation dhcp-ipv inf: p_dhcp g_ipv
> colocation firewall-ipv inf: p_firewall g_ipv
> colocation openvpn-ipv inf: p_openvpn p_ipv
> colocation radvd-ipv inf: p_radvd g_ipv
> colocation xorp-ipv inf: p_xorp g_ipv
> property $id="cib-bootstrap-options" \
>   dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>   cluster-infrastructure="openais" \
>   expected-quorum-votes="2" \
>   stonith-enabled="false" \
>   no-quorum-policy="ignore" \
>   last-lrm-refresh="1352111523"
> rsc_defaults $id="rsc-options" \
>   resource-stickiness="100"

I think these colocation constraints are missing ordering constraints. For security reasons, you should make sure that the VIPs are created first, then the firewall is started, and finally everything else; otherwise openvpn or radvd could start /before/ the firewall. Colocation doesn't imply ordering. (I had a hard time understanding that one...)

In my configuration, I aggregated all that stuff into one colocation set plus one ordering set. Maybe you could see if it fits better and/or helps with the duration of the failover. E.g.:

group IPHA eth0.10HA eth0.11HA eth0.12HA eth0.13HA eth0.14HA eth0.15HA eth0.16HA eth0.18HA eth0.19HA eth0.20HA eth0.21HA eth0.22HA eth0.23-1HA eth0.23-254HA eth0.24HA eth0.26HA eth0.2HA eth0.3-10HA eth0.3-15HA eth0.3-30HA eth0.4HA eth0.5-10HA eth0.5-215HA eth0.5-230HA eth0.7HA eth0.8HA eth0HA eth1-2HA eth1-3HA eth1-pubHA eth1.2HA eth1.97HA eth1.98HA eth1.99HA eth0.27HA eth0.28HA eth0.29HA eth0.30HA \
  meta target-role="Started" \
  meta globally-unique="false" target-role="Started"
[...]
colocation c_foo inf: ( bind ldirectord ldirectordBDD ldirectordMasterSlave openvpn stunnel ) IPHA firewall
order o_foo inf: IPHA firewall ( bind ldirectord ldirectordBDD ldirectordMasterSlave openvpn stunnel )

Well, overall, sorry I didn't really help you; I just wanted to highlight some configuration tweaks, as I run almost the same cluster.

--
Cheers,
Florian Crouzat
Re: [Pacemaker] lsb: could not parse meta-data
Le 22/10/2012 14:23, vishal kumar a écrit :
> Hi, please do suggest me where I am going wrong. Thanks for the help.

See:

crm ra help meta

Then try something like:

crm ra meta sshd lsb  # parameter order matters

Anyway, you won't learn anything from the meta-data of an LSB init script, because it's just a script (not cluster-aware, not a real resource agent); it's not multistate, nothing like that, only start/stop/status and the default mandatory settings.

--
Cheers,
Florian Crouzat
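For readers new to the topic, this is roughly what "just a script" means. A minimal LSB-style init script sketch follows; the `fw_*` names, the pidfile path, and the `sleep` placeholder daemon are all hypothetical, not from the thread. The important part is the exit codes: LSB status must return 0 when running and 3 when stopped, which is what Pacemaker's monitor operation relies on.

```shell
#!/bin/sh
# Sketch of a minimal LSB-style init script (hypothetical placeholder daemon).
# LSB status exit codes: 0 = running, 3 = not running.

PIDFILE="${TMPDIR:-/tmp}/mydaemon.pid"   # placeholder pidfile location

fw_start() {
  sleep 300 &                # placeholder for a real daemon
  echo $! > "$PIDFILE"       # record the daemon's PID
}

fw_stop() {
  [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" 2>/dev/null
  rm -f "$PIDFILE"
}

fw_status() {
  if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    return 0                 # running
  else
    return 3                 # stopped
  fi
}

case "${1:-}" in
  start)  fw_start ;;
  stop)   fw_stop ;;
  status) fw_status ;;
esac
```

Pacemaker maps its monitor operation onto the script's status action, so getting these return codes right is what makes an ordinary script usable as an lsb: resource.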
Re: [Pacemaker] resource can not be started
Le 18/10/2012 03:13, Mia Lueng a écrit :
> I just set "one" order rule:
> order rg_apache_order : res_ip_apache res_apache
> You mean the rule
> group rg_apache2 res_apache res_ip_apache \
>   meta target-role="Started"
> also indicates an order rule?

Yep, and a colocation, as I said. It's all in the doc:
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html-single/Pacemaker_Explained/index.html#group-resources

--
Cheers,
Florian Crouzat
Re: [Pacemaker] resource can not be started
Le 17/10/2012 11:56, Mia Lueng a écrit :
> group rg_apache2 res_apache res_ip_apache \
>   meta target-role="Started"
> order rg_apache_order : res_ip_apache res_apache

This doesn't look correct. A group is already a shortcut for an ordered colocation. Either use colocation + ordering, or a group, not "almost both".

--
Cheers,
Florian Crouzat
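To spell out the equivalence (a sketch in crm shell syntax, reusing the thread's resource names): a group's member order defines its start order, so the two forms below express the same intent. Note that in the configuration quoted above, the group lists res_apache before res_ip_apache, which implicitly orders apache first, contradicting the explicit order rule.

```
# Form 1: a group (implicit colocation + ordering, left to right;
# the IP comes first so apache starts on the node holding the VIP)
group rg_apache2 res_ip_apache res_apache

# Form 2: the same thing spelled out as explicit constraints
colocation apache_with_ip inf: res_apache res_ip_apache
order ip_before_apache inf: res_ip_apache res_apache
```

Mixing a group with a conflicting order constraint gives the policy engine contradictory requirements, which is a likely reason the resource cannot start.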
Re: [Pacemaker] Resource agent IPaddr2 failed to start
Le 09/10/2012 11:17, Soni Maula Harriz a écrit :
> This is the configuration:
> crm configure show
> node cluster1
> node cluster2
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>   params ip="xxx.xxx.xxx.289" cidr_netmask="32" \
>   op monitor interval="30s"
> property $id="cib-bootstrap-options" \
>   dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
>   cluster-infrastructure="openais" \
>   expected-quorum-votes="2" \
>   stonith-enabled="false"

Off-topic advice: for a two-node cluster such as yours, you *must* set the property no-quorum-policy="ignore".

--
Cheers,
Florian Crouzat
Re: [Pacemaker] Resource agent IPaddr2 failed to start
Le 09/10/2012 10:39, Soni Maula Harriz a écrit :
> Dear all, I'm a newbie in clustering. I have been following the 'Cluster from Scratch' tutorial. I use CentOS 6.3 and installed pacemaker and corosync with: yum install pacemaker corosync
> These are the versions I got: Pacemaker 1.1.7-6.el6, Corosync Cluster Engine version '1.4.1'.
> This time I have been this far: adding an IPaddr2 resource (http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_adding_a_resource.html). Everything goes well before adding the IPaddr2 resource. When I run 'crm status', it prints out:
>
> Last updated: Tue Oct 9 14:58:30 2012
> Last change: Tue Oct 9 13:53:41 2012 via cibadmin on cluster1
> Stack: openais
> Current DC: cluster1 - partition with quorum
> Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
> Online: [ cluster1 cluster2 ]
> Failed actions:
>   ClusterIP_start_0 (node=cluster1, call=3, rc=6, status=complete): not configured
>
> This is the error I got from /var/log/messages:
>
> Oct 8 17:07:16 cluster2 IPaddr2(ClusterIP)[15969]: ERROR: [/usr/lib64/heartbeat/findif -C] failed
> Oct 8 17:07:16 cluster2 crmd[15937]: warning: status_from_rc: Action 4 (ClusterIP_start_0) on cluster2 failed (target: 0 vs. rc: 6): Error
> Oct 8 17:07:16 cluster2 pengine[15936]: error: unpack_rsc_op: Preventing ClusterIP from re-starting anywhere in the cluster : operation start failed 'not configured' (rc=6)
>
> I have been searching through Google but can't find the right solution for my problem. I have stopped the firewall and disabled SELinux. Any help would be appreciated.

Just a wild guess: you haven't created the associated network interface. E.g., for this resource to work:

primitive eth0.21HA ocf:heartbeat:IPaddr2 \
  params ip="10.0.8.1" cidr_netmask="29" nic="eth0.21" \
  op monitor on-fail="restart" interval="10s"

you *must* have "ifup-ed" the eth0.21 interface beforehand, otherwise Pacemaker cannot apply the VIP because there is no interface. You can create the eth0.21 interface with a fake IP such as 127.10.10.10/32, of course.

Quote of "sudo crm ra meta IPaddr2":

nic (string, [eth0]): Network interface
The base network interface on which the IP address will be brought online. If left empty, the script will try and determine this from the routing table. Do NOT specify an alias interface in the form eth0:1 or anything here; rather, specify the base interface only.
Prerequisite: *There must be at least one static IP address, which is not managed by the cluster, assigned to the network interface.* If you can not assign any static IP address on the interface, modify this kernel parameter: sysctl -w net.ipv4.conf.all.promote_secondaries=1 (or per device)

--
Cheers,
Florian Crouzat
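Concretely, the prerequisite above could be satisfied before starting the cluster with something like the following sketch (run as root; the VLAN id and the 127.10.10.10/32 placeholder address follow the example in the reply, so adjust both to your topology):

```shell
# Create the VLAN interface the IPaddr2 resource will use (sketch;
# eth0.21 / VLAN id 21 match the example configuration above)
ip link add link eth0 name eth0.21 type vlan id 21

# Assign a static placeholder address that the cluster does not manage,
# satisfying the IPaddr2 prerequisite quoted from the RA meta-data
ip addr add 127.10.10.10/32 dev eth0.21
ip link set eth0.21 up

# If no real static address can live on the interface, keep secondary
# addresses from vanishing when a primary is removed:
sysctl -w net.ipv4.conf.all.promote_secondaries=1
```

With the base interface up, the RA's findif helper can resolve the nic and the "not configured" (rc=6) start failure should go away.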
Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?
Le 24/08/2012 01:36, Andrew Martin a écrit :
> The dampen parameter tells the cluster to wait before making any decision, so that if the IP comes back online within the dampen period then no action is taken. Is this correct?

This is also my understanding of this parameter.

--
Cheers,
Florian Crouzat
Re: [Pacemaker] Loss of ocf:pacemaker:ping target forces resources to restart?
Le 22/08/2012 18:23, Andrew Martin a écrit :
> Hello, I have a 3-node Pacemaker + Heartbeat cluster (two real nodes and one quorum node that cannot run resources) running on Ubuntu 12.04 Server amd64. This cluster has a DRBD resource that it mounts and then runs a KVM virtual machine from. I have configured the cluster to use ocf:pacemaker:ping with two other devices on the network (192.168.0.128, 192.168.0.129), and set constraints to move the resources to the most well-connected node (whichever node can see more of these two devices):
>
> primitive p_ping ocf:pacemaker:ping \
>   params name="p_ping" host_list="192.168.0.128 192.168.0.129" multiplier="1000" attempts="8" debug="true" \
>   op start interval="0" timeout="60" \
>   op monitor interval="10s" timeout="60"
> ...
> clone cl_ping p_ping \
>   meta interleave="true"
> ...
> location loc_run_on_most_connected g_vm \
>   rule $id="loc_run_on_most_connected-rule" p_ping: defined p_ping
>
> Today, 192.168.0.128's network cable was unplugged for a few seconds and then plugged back in. During this time, pacemaker recognized that it could not ping 192.168.0.128 and restarted all of the resources, but left them on the same node. My understanding was that since neither node could ping 192.168.0.128 during this period, pacemaker would do nothing with the resources (leave them running). It would only migrate or restart the resources if, for example, node2 could ping 192.168.0.128 but node1 could not (move the resources to where things are better connected). Is this understanding incorrect? If so, is there a way I can change my configuration so that it will only restart/migrate resources if one node is found to be better connected? Can you tell me why these resources were restarted? I have attached the syslog as well as my full CIB configuration.
> Thanks, Andrew Martin

This is an interesting question and I'm also interested in answers.

I had the same observations, and there is also the case where the monitor() operations aren't synced across all nodes, so: node1 issues a monitor() on the ping resource and finds the ping node dead; node2 hasn't pinged yet, so node1 moves things to node2; but node2 now issues a monitor() and also finds the ping node dead.

The only solution I found was to adjust the dampen parameter to at least 2 * the monitor() interval, so that I can be *sure* that all nodes have issued a monitor() and have all decreased their scores, so that when a decision occurs, nothing moves. It's been a long time since I tested this; my cluster is very, very stable. I guess I should retry to validate that it's still a working trick.

dampen (integer, [5s]): Dampening interval
The time to wait (dampening) for further changes to occur

E.g.:

primitive ping-nq-sw-swsec ocf:pacemaker:ping \
  params host_list="192.168.10.1 192.168.2.11 192.168.2.12" dampen="35s" attempts="2" timeout="2" multiplier="100" \
  op monitor interval="15s"

--
Cheers,
Florian Crouzat
Re: [Pacemaker] ping - resource failover
Le 13/08/2012 08:09, Nicolai Langfeldt a écrit :
> On 2012-08-10 16:30, Josh wrote:
>> location location_www1-vip www1-vip \
>>   rule $id="location_www1-vip-rule" pingd: defined pingd
> It will probably work better if you:
> * Write the rule correctly ;-)

"pingd: defined pingd" is a perfectly valid syntax, as explained in this post:
http://www.woodwose.net/thatremindsme/2011/04/the-pacemaker-ping-resource-agent/
I happen to use it just fine.

--
Cheers,
Florian Crouzat
Re: [Pacemaker] group resource - altering default order
Le 14/06/2012 14:55, Nicolaas Stuyt a écrit :
> Hello, is there a way to affect a group's ordering behavior? The default appears to be left to right, which I can understand. However, if I wish to create a group of "like primitives" that are not dependent on each other (for example, a list of ocf:heartbeat:Filesystem primitives), then I care less about the order. I was hoping to specify a meta characteristic like "lazy", I guess; something like meta order="lazy" or "not-applicable". This ordering behavior seems to exist in colocation as well.
>
> What I wish to achieve is to take a filesystem resource down, say for maintenance, and then allow it to come back when maintenance is completed, without affecting the other Filesystem primitives down the list. To provide a visual illustration, I created a cloned group of dummies using ocf:pacemaker:Dummy and stopped just the last dummy primitive:
>
> Clone Set: cl_dummies [grp_dummies]
>   Resource Group: grp_dummies:0
>     dummy1:0 (ocf::pacemaker:Dummy): Started node1
>     dummy2:0 (ocf::pacemaker:Dummy): Started node1
>     dummy3:0 (ocf::pacemaker:Dummy): Started node1
>     dummy4:0 (ocf::pacemaker:Dummy): Started node1
>     dummy5:0 (ocf::pacemaker:Dummy): Stopped
>   Resource Group: grp_dummies:1
>     dummy1:1 (ocf::pacemaker:Dummy): Started node2
>     dummy2:1 (ocf::pacemaker:Dummy): Started node2
>     dummy3:1 (ocf::pacemaker:Dummy): Started node2
>     dummy4:1 (ocf::pacemaker:Dummy): Started node2
>     dummy5:1 (ocf::pacemaker:Dummy): Stopped
>
> When all are running, crm_mon presents this as:
>
> Clone Set: cl_dummies [grp_dummies]
>   Started: [ node1 node2 ]
>
> What I want to achieve is to take a middle dummy primitive out but leave the rest running. When I do that, for example on dummy3, dummies 4 and 5 also get taken out, such that I am left with this:
>
> Clone Set: cl_dummies [grp_dummies]
>   Resource Group: grp_dummies:0
>     dummy1:0 (ocf::pacemaker:Dummy): Started node1
>     dummy2:0 (ocf::pacemaker:Dummy): Started node1
>     dummy3:0 (ocf::pacemaker:Dummy): Stopped
>     dummy4:0 (ocf::pacemaker:Dummy): Stopped
>     dummy5:0 (ocf::pacemaker:Dummy): Stopped
>   Resource Group: grp_dummies:1
>     dummy1:1 (ocf::pacemaker:Dummy): Started node2
>     dummy2:1 (ocf::pacemaker:Dummy): Started node2
>     dummy3:1 (ocf::pacemaker:Dummy): Stopped
>     dummy4:1 (ocf::pacemaker:Dummy): Stopped
>     dummy5:1 (ocf::pacemaker:Dummy): Stopped
>
> What I would like is this:
>
> Clone Set: cl_dummies [grp_dummies]
>   Resource Group: grp_dummies:0
>     dummy1:0 (ocf::pacemaker:Dummy): Started node1
>     dummy2:0 (ocf::pacemaker:Dummy): Started node1
>     dummy3:0 (ocf::pacemaker:Dummy): Stopped
>     dummy4:0 (ocf::pacemaker:Dummy): Started node1
>     dummy5:0 (ocf::pacemaker:Dummy): Started node1
>   Resource Group: grp_dummies:1
>     dummy1:1 (ocf::pacemaker:Dummy): Started node2
>     dummy2:1 (ocf::pacemaker:Dummy): Started node2
>     dummy3:1 (ocf::pacemaker:Dummy): Stopped
>     dummy4:1 (ocf::pacemaker:Dummy): Started node2
>     dummy5:1 (ocf::pacemaker:Dummy): Started node2
>
> A workaround I have considered is to edit the group and re-sort the list so that the Filesystem primitive I'm interested in working on is named last, but I wonder if I'll remember ;-) to do that the next time I need to perform maintenance. Or is there a better way to do this that I haven't been introduced to yet?
> Regards, Nick

It's already been discussed a couple of times. See http://oss.clusterlabs.org/pipermail/pacemaker/2011-March/009435.html and follow the links/answers.

--
Cheers,
Florian Crouzat
Re: [Pacemaker] Problem with state: UNCLEAN (OFFLINE)
Le 08/06/2012 13:01, Juan M. Sierra a écrit :
> Problem with state: UNCLEAN (OFFLINE)
>
> Hello, I'm trying to get up an ldirectord service with pacemaker, but I found a problem with the unclean (offline) state. The initial state of my cluster was this:
>
> Online: [ node2 node1 ]
> node1-STONITH (stonith:external/ipmi): Started node2
> node2-STONITH (stonith:external/ipmi): Started node1
> Clone Set: Connected
>   Started: [ node2 node1 ]
> Clone Set: ldirector-activo-activo
>   Started: [ node2 node1 ]
> ftp-vip (ocf::heartbeat:IPaddr): Started node1
> web-vip (ocf::heartbeat:IPaddr): Started node2
> Migration summary:
> * Node node1: pingd=2000
> * Node node2: pingd=2000
>   node2-STONITH: migration-threshold=100 fail-count=100
>
> Then I removed the electric connection of node1, and the state was:
>
> Node node1 (8b2aede9-61bb-4a5a-aef6-25fbdefdddfd): UNCLEAN (offline)
> Online: [ node2 ]
> node1-STONITH (stonith:external/ipmi): Started node2 FAILED
> Clone Set: Connected
>   Started: [ node2 ]
>   Stopped: [ ping:1 ]
> Clone Set: ldirector-activo-activo
>   Started: [ node2 ]
>   Stopped: [ ldirectord:1 ]
> web-vip (ocf::heartbeat:IPaddr): Started node2
> Migration summary:
> * Node node2: pingd=2000
>   node2-STONITH: migration-threshold=100 fail-count=100
>   node1-STONITH: migration-threshold=100 fail-count=100
> Failed actions:
>   node2-STONITH_start_0 (node=node2, call=22, rc=2, status=complete): invalid parameter
>   node1-STONITH_monitor_6 (node=node2, call=11, rc=14, status=complete): status: unknown
>   node1-STONITH_start_0 (node=node2, call=34, rc=1, status=complete): unknown error
>
> I was hoping that node2 would take over the ftp-vip resource, but it didn't happen that way. node1 stayed in an unclean state and node2 didn't take over its resources. When I put the electric connection of node1 back and it recovered, then node2 took over the ftp-vip resource. I've seen some similar conversations here. Please, could you give me some idea about this subject, or point me to a thread where this is discussed? Thanks a lot!
> Regards,

It has been discussed for resource failover, but I guess it's the same:
http://oss.clusterlabs.org/pipermail/pacemaker/2012-May/014260.html

The motto here (I discovered it a couple of days ago) is "better have a hung cluster than a corrupted one, especially with shared filesystems/resources". So: node1 failed, but node2 wasn't able to confirm its death because stonith apparently failed; the design choice is then for the cluster to hang while waiting for a way to learn the real state of node1 (at reboot, in this case).

--
Cheers,
Florian Crouzat
Re: [Pacemaker] ldirectord ocf errors
Le 04/06/2012 19:52, Jake Smith a écrit :
> ----- Original Message -----
> From: "Dennis Jacobfeuerborn"
> To: pacemaker@oss.clusterlabs.org
> Sent: Monday, June 4, 2012 12:55:48 PM
> Subject: [Pacemaker] ldirectord ocf errors
>
>> Hi, I'm trying to get started with ldirectord on a CentOS 6.2 system, but things don't seem to work well at the moment. I get the following when I try to set up an ocf:heartbeat:ldirectord resource:

Where did you get that RA? It's not part of the resource-agents-3.9.2-7.el6.x86_64 package for CentOS 6.2, and my system seems to be up to date with the default repos and EPEL. I'm currently running 3 ldirectord resources as LSB scripts, so I'm interested.

--
Cheers,
Florian Crouzat
Re: [Pacemaker] R: Pacemaker installation - Failed dependencies
Le 22/05/2012 12:03, Chiesa Stefano a écrit :
> Hello Rene, thanks for your answer. Yours is a "general consideration" anyway, right? In the Pacemaker installation log below, "heartbeat" is not mentioned, so the failed dependencies are related to Pacemaker. Even if in the future I use corosync, I'll face the same errors, or not? Sorry in case I didn't understand your answer. Stefano.

Since EL6.0, linux-ha is part of the base repository. You must remove your Clusterlabs repo file, I guess.

--
Cheers,
Florian Crouzat
Re: [Pacemaker] ping resource
Le 11/05/2012 14:56, fatcha...@gmx.de a écrit :
> When I deactivate the interface card which is connected to the default gateway on one node, nothing happens.

Actually, something does happen: the ping score changes. You can check its value with:

cibadmin -Q | grep pingd
or
crm_mon -Arf1 | fgrep ping

> primitive ping-gateway ocf:pacemaker:ping \
>   params host_list="192.168.xxx.1" multiplier="100"
> clone pingclone ping-gateway \
>   meta interleave="true"
> Any suggestions are welcome

But since you haven't linked your resources to the ping score, nothing moves. I recommend you read these links:
* http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch09s03s03.html
* http://www.woodwose.net/thatremindsme/2011/04/the-pacemaker-ping-resource-agent/

--
Cheers,
Florian Crouzat
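For illustration, the missing link is a location constraint keyed on the node attribute that the ping clone maintains. A sketch in crm shell syntax: `myresource` is a placeholder name, and "pingd" is the attribute name the ping RA uses by default (the configuration above does not override it with a name parameter).

```
# Forbid nodes whose ping attribute is undefined or zero,
# i.e. nodes that cannot reach the gateway at all
location loc_ping_connected myresource \
  rule -inf: not_defined pingd or pingd lte 0
```

With such a rule in place, deactivating the gateway-facing interface drops the node's pingd score to 0 and the constraint pushes the resource to the surviving node.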
Re: [Pacemaker] How to stop a resource?
Le 15/03/2012 12:50, Tim Ward a écrit :
> So, does anyone have any ideas as to what is going on here, and/or how to actually stop and then delete something? Thanks!

$ crm
configure
property maintenance-mode=true
commit
quit

$ /etc/init.d/myResource stop

$ crm
resource cleanup myResource   # so that it sees it's actually stopped
cd configure
edit                          # now you can delete myResource
commit
property maintenance-mode=false
commit
quit

The maintenance-mode might not be necessary depending on your groups, colocations and orderings.

--
Cheers,
Florian Crouzat
Re: [Pacemaker] Surprisingly fast start of resources on cluster failover.
Le 07/03/2012 22:08, Lars Ellenberg a écrit :
> Did you also time it as
> time /etc/init.d/firewall > out.txt 2> err.txt

Yes, it's the same: 65 seconds versus 24 from Pacemaker. It only prints ~250 lines.

> Other than the above suggestion, did you verify that it ends up doing the same thing when started from pacemaker, compared to when started by you from the command line? Did you compare the results?

Absolutely the same; this is my core firewall, and it sure works fine when started from pacemaker. It's probably just a side effect of the server reboot, and/or the fact that it doesn't yet handle named, ldirectord, stunnel and such, because in the pacemaker resource group "firewall" goes first.

--
Cheers,
Florian Crouzat
[Pacemaker] Surprisingly fast start of resources on cluster failover.
Hi,

On a two-node active/passive cluster, I placed a location constraint of 50 for #uname node1. As soon as it was applied, things moved from node2 to node1: right.

I have an LSB init script defined as a resource:

$ crm configure show firewall
primitive firewall lsb:firewall \
  op monitor on-fail="restart" interval="10s" \
  op start interval="0" timeout="3min" \
  op stop interval="0" timeout="1min" \
  meta target-role="Started"

This LSB script takes a long time to start, at least 55 seconds when fired from my shell over ssh. It logs a couple of things to std{out,err}. I have Florian's rsyslog config: https://github.com/fghaas/pacemaker/blob/syslog/extra/rsyslog/pacemaker.conf.in

So, while node1 was taking over, I noticed in /var/log/pacemaker/lrmd.log that it only took 24 seconds to start that resource:

2012-03-06T07:20:11.844573+01:00 node1 lrmd: [9322]: info: rsc:firewall:129: start
2012-03-06T07:20:11.864758+01:00 node1 lrmd: [9322]: info: RA output: (firewall:start:stdout) Starting. Becoming active
[...]
2012-03-06T07:20:35.133591+01:00 node1 lrmd: [9322]: info: RA output: (firewall:start:stderr) #033[33;01m*#033[0m New rules are now applied.

My question: how come pacemaker starts a resource twice as fast as I do from the CLI?

--
Florian Crouzat
Re: [Pacemaker] colocation quandery
Le 22/02/2012 17:06, Jean-Francois Malouin a écrit :
> I can't wrap my head around it, so I created a test environment to play around with the different scenarios. But... is it possible to use crm to create the colocation as in Example 6.16 in http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-sets-collocation.html ? I've only been able to stuff the relevant xml bits into the cib using cibadmin.

In crm syntax, Figure 6.4 is:

colocation foo inf: A B ( C D E ) F G

Same goes for ordered sets.

--
Cheers,
Florian Crouzat
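As a sketch of the "same goes for ordered sets" remark, reusing the same placeholder resource names from the colocation example, the crm shell form of an ordered set would look like:

```
# Start A, then B, then C/D/E (members of a parenthesized set
# have no ordering among themselves), then F, then G
order bar inf: A B ( C D E ) F G
```

The parentheses delimit a resource set, so this single constraint replaces the chain of pairwise order constraints you would otherwise need, just as with the colocation set above.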
Re: [Pacemaker] iptables cluster
Le 13/02/2012 10:21, Karlis Kisis a écrit :
> Question #2: The whole clustering thingy works by stopping the service on one node and starting it on the other. In my case, I would not want iptables to be stopped but instead restarted with a "passive" config, like blocking all traffic from outside (instead of dropping the firewall entirely). How would I go about it? Custom scripts?

Yes. In fact, I have such a setup: I created an LSB-compliant init script for iptables (/etc/init.d/firewall) and added an lsb:firewall resource.

/etc/init.d/firewall start(): /usr/local/firewall/firewall.sh
/etc/init.d/firewall stop(): /usr/local/firewall/firewall-passive.sh

As for the status() function, you'd have to decide on a way to know which state you are in.

--
Cheers,
Florian Crouzat
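One possible way to implement that status() function, sketched below under an assumed convention: the active ruleset (firewall.sh) creates a sentinel chain whose presence marks the "active" state, and the passive ruleset (firewall-passive.sh) deletes it. The ACTIVE_FW chain name is hypothetical, not from the thread.

```shell
#!/bin/sh
# status() sketch for an active/passive iptables init script.
# Assumes firewall.sh runs "iptables -N ACTIVE_FW" and
# firewall-passive.sh runs "iptables -X ACTIVE_FW" (hypothetical convention).
status() {
  if iptables -n -L ACTIVE_FW >/dev/null 2>&1; then
    echo "firewall is active"
    return 0    # LSB: service is running
  else
    echo "firewall is passive"
    return 3    # LSB: service is not running
  fi
}
```

With this convention, Pacemaker's monitor operation sees the passive configuration as "stopped", so a failover correctly starts the active ruleset on the other node.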
Re: [Pacemaker] Cluster Shutdown Fails - causes nodes to hang
Le 30/01/2012 01:42, Gruen, Wolfgang a écrit :
> Issuing /etc/init.d/corosync stop or /etc/init.d/pacemaker stop causes

On a running cluster, you need to stop pacemaker first; it's mandatory AFAIK.

--
Cheers,
Florian Crouzat
Re: [Pacemaker] How to flush the arp cache of a router?
Le 26/01/2012 01:05, ge...@riseup.net a écrit :
> I tried to put it into a file and made it executable, and used lsb: to call it (which didn't work). Then I googled for hours to find out how to call scripts from within crm, but had no success...

You should be able to run any script you want using an lsb RA. Just make sure your init script is LSB-compliant:
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/ap-lsb.html
and this should work.

Cheers,
Florian Crouzat
Re: [Pacemaker] Failed actions: ...not installed
Le 24/01/2012 13:21, Stallmann, Andreas a écrit :
> Any ideas on how to get this running?

If you want to debug what pacemaker does with your RA, I'd suggest you insert

set -x

at the top of /usr/lib/ocf/resource.d/heartbeat/mysql and read the (now flooded) logs.

Cheers,
Florian.