[Linux-HA] Some questions about High Availability
Hi, I'm new to heartbeat; I just started reading about it a week ago. I'm using Ubuntu and installed heartbeat 2.1.4 on my two nodes. Here is their ha.cf:

logfacility local0
keepalive 2
deadtime 5
udpport 694
bcast eth5
auto_failback on
node node1
node node2

And here is haresources (I'm testing only apache; the IP address doesn't matter):

node1 apache2

I stopped the heartbeat service on node1 and apache started on node2; then I started the heartbeat service on node1 and apache started on node1 but stayed active on node2. Shouldn't the one on node2 have been stopped?

Another question: if I stop the apache service on one node, it doesn't start on the other node. Shouldn't it be started, since apache is down? How does heartbeat monitor a process to see whether it is up or down?

And another question: I'm used to VMware HA for high availability between VMs. It keeps a copy of the VM and starts it if the main one goes down, and the VMs' data are synchronized with each other. My goal when I started reading about heartbeat was to achieve the same thing with physical machines, but after some research I realised that heartbeat doesn't do this alone; I need other software to synchronize data between the nodes. What I want to ask is which one is best for synchronizing ALL data between the two nodes, including data that may be in use and data related to hardware (my two nodes have identical hardware), so that one node mirrors the other just like in VMware HA. Or maybe heartbeat isn't the right solution for this goal? I've heard about LVS for HA and load balancing, and about Linux HPC, and I was thinking about some hardware options too.

One more question: I've read somewhere that you can have a heartbeat server used to manage multiple pairs of nodes. Is this right? Is there a GUI for this, like a web interface?

Thanks.

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
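For reference, the directives in that ha.cf do the following (annotations are mine; the values are unchanged from the post):

```
logfacility local0   # log through syslog, facility local0
keepalive 2          # send a heartbeat packet every 2 seconds
deadtime 5           # declare the peer dead after 5 seconds of silence
udpport 694          # UDP port used for heartbeat traffic
bcast eth5           # broadcast heartbeats on interface eth5
auto_failback on     # move resources back when the preferred node returns
node node1           # cluster members, as reported by uname -n
node node2
```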
Re: [Linux-HA] Some questions about High Availability
The heartbeat package is deprecated in favour of corosync + pacemaker. There is no planned future development of heartbeat. I'd recommend reading the Clusters From Scratch tutorial available on the pacemaker website.

On 11/30/2012 02:08 PM, Rodrigo Abrantes Antunes wrote:
> Hi, I'm new to heartbeat, just started to read about it a week ago. I'm
> using ubuntu, I installed heartbeat 2.1.4 in my two nodes. [...]
> Is there a gui for this like a web interface? Thanks.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
"What if the cure for cancer is trapped in the mind of a person without access to education?"
Re: [Linux-HA] Some questions about High Availability
On 11/30/2012 01:08 PM, Rodrigo Abrantes Antunes wrote:
> Hi, I'm new to heartbeat, just started to read about it a week ago. I'm
> using ubuntu, I installed heartbeat 2.1.4 in my two nodes. Here is ha.cf
> of them: [...]

I think you may need "crm no" as well. What's your eth5 connected to?

> And here is haresources (I'm testing only apache, the ip address don't
> matter): node1 apache2 [...] then I started heartbeat service in node1 and
> apache started in node1 but stayed active in node2. Shouldn't the one in
> node2 be stopped?

Yes, it should be. Read the logs (tail -f on one console while stopping heartbeat on another).

> Another question: If I stop apache service in one node it don't start in
> the other node, shouldn't it be started since apache is down? How
> heartbeat monitors the process to see if it is down or up?

No, because in your configuration heartbeat doesn't monitor services. Install mon and write a custom alert script that runs "/usr/share/heartbeat/hb_standby all" when the process dies.

> What I want to ask is wich one is the best to syncronize ALL data between
> the 2 nodes including data that may be in use

Simple setup: put it all on one filesystem and put that on DRBD. See http://www.drbd.org/users-guide-8.3/s-heartbeat-r1.html

Also recommended: drbdlinks.

HTH
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
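The mon-based approach described above could look roughly like this. This is a sketch, not a tested mon configuration: the function name, the watched process name, and the wiring into mon are assumptions; only the hb_standby path comes from the reply.

```shell
#!/bin/sh
# Sketch of a mon-style alert script: if the watched process has died,
# ask heartbeat to hand all resources to the standby node.

check_and_failover() {
    proc="$1"        # process name to watch, e.g. apache2
    failover="$2"    # failover helper, e.g. /usr/share/heartbeat/hb_standby

    if pgrep -x "$proc" >/dev/null 2>&1; then
        return 0     # process is alive; nothing to do
    fi
    "$failover" all  # process is gone; release all resources to the peer
}

# Example wiring (hypothetical; run from mon's alert hook):
# check_and_failover apache2 /usr/share/heartbeat/hb_standby
```

Note that this only triggers failover; restarting the failed service locally first (and failing over only if that fails) would need additional logic.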
[Linux-HA] Some help on understanding how HA issues are addressed by pacemaker
Hi, I am looking into using your facilities to have high availability on my system. I am trying to figure some things out, and I hope you could help me. I am interested in knowing how pacemaker migrates a VIP and how a split-brain situation is addressed by your facilities.

To be specific, I am interested in the following setup: 2 Linux machines, each running a load balancer and a Tomcat instance. If I understand correctly, pacemaker will be responsible for assigning the main VIP to one of the nodes. My questions are:

1) Will pacemaker monitor/restart the load balancers on each machine in case of a crash?
2) How does pacemaker decide to migrate the VIP to the other node?
3) Do the pacemakers on each machine communicate? If yes, how do you handle network failure? Could I end up with split-brain?
4) Generally, how is split-brain addressed using pacemaker?
5) Could pacemaker monitor Tomcat?

As you can see, I am interested in maintaining quorum in a two-node configuration. If you can help me with this info to find a proper direction, it would be much appreciated!

Thank you
Re: [Linux-HA] Some help on understanding how HA issues are addressed by pacemaker
----- Original Message -----
From: Hermes Flying flyingher...@yahoo.com
To: linux-ha@lists.linux-ha.org
Sent: Friday, November 30, 2012 4:04:34 PM
Subject: [Linux-HA] Some help on understanding how HA issues are addressed by pacemaker

> Hi, I am looking into using your facilities to have high availability on
> my system. [...] If you can help me with this info to find a proper
> direction it would be much appreciated! Thank you

Hey,

You may or may not have looked at this already, but this is a good place to start: http://clusterlabs.org/doc/

Read chapter one of this document:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html

Running through this 2-node cluster exercise will likely answer many of your questions:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html

-- Vossel
Re: [Linux-HA] Some help on understanding how HA issues are addressed by pacemaker
On 11/30/2012 05:04 PM, Hermes Flying wrote:
> To be specific: I am interested in the following setup: 2 linux machines.
> Each machine runs a load balancer and a Tomcat instance. If I understand
> correctly pacemaker will be responsible to assign the main VIP to one of
> the nodes.

Pacemaker can handle virtual IP addresses, but it's more of a fail-over system than a load-balancing, round-robin system. For load balancing, look at Red Hat's LVS.

> 1) Will pacemaker monitor/restart the load balancers on each machine in
> case of crash?

It can monitor/recover/relocate any service that uses init.d-style scripts. If a script/service responds properly to stop, status and start, you're good to go.

> 2) How does pacemaker decide to migrate the VIP to the other node?

At its most simple: when the machine hosting the VIP fails, the VIP will relocate. You can control how, when and where the VIP fails back (look at 'resource stickiness').

> 3) Do the pacemakers in each machine communicate? If yes how do you handle
> network failure? Could I end up with split-brain?

Pacemaker uses corosync for cluster membership, quorum and fencing. A properly configured fence device (aka stonith) will prevent a split-brain. If you disable fencing or fail to set it up properly, split-brains are possible and even likely.

> 4) Generally how is split-brain addressed using pacemaker?

Fencing, to prevent it.

> 5) Could pacemaker monitor Tomcat?

If it supports stop, start and status, yes.

> As you can see I am interested in maintain quorum in a two-node
> configuration.

Quorum needs to be disabled in a two-node cluster. This is fine with good fencing.

To learn more, please see the documentation available here: http://clusterlabs.org/doc/

--
Digimer
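Put together, the answers above map to a configuration along these lines. This is a hypothetical crm shell sketch: the resource names, IP address, and the lsb:tomcat6 script name are made up for illustration, and the stonith primitive itself is omitted because fence-device details are site-specific.

```
property stonith-enabled="true" \
         no-quorum-policy="ignore"      # quorum disabled for 2 nodes, as noted;
                                        # safe only with working fencing
primitive p_vip ocf:heartbeat:IPaddr2 \
        params ip="192.168.1.100" cidr_netmask="24" \
        op monitor interval="30s"
primitive p_tomcat lsb:tomcat6 \
        op monitor interval="60s"       # pacemaker polls status, restarts on failure
colocation vip_with_tomcat inf: p_vip p_tomcat
rsc_defaults resource-stickiness="100"  # discourages automatic fail-back
```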
Re: [Linux-HA] master/slave drbd resource STILL will not failover
30.11.2012 00:14, Robinson, Eric wrote:
> Bump... does anyone have some insight on this? Google is not turning up
> anything useful.
>
> Our newest cluster will not fail over master/slave drbd resources. It
> works fine manually using drbdadm from a shell prompt, but when we try it
> using 'crm node standby' and letting the cluster manage the resource,
> crm_mon just keeps saying the resource FAILED. We see a lot of these
> messages in the corosync.log file:
>
> drbd(p_drbd1)[12814]: 2012/11/27_15:31:59 DEBUG: ha02_mysql: Calling drbdadm -c /etc/drbd.conf primary ha02_mysql
> drbd(p_drbd1)[12814]: 2012/11/27_15:31:59 ERROR: ha02_mysql: Called drbdadm -c /etc/drbd.conf primary ha02_mysql
> drbd(p_drbd1)[12814]: 2012/11/27_15:31:59 ERROR: ha02_mysql: Exit code 11
>
> There is no indication of what may be causing the 'Exit code 11'. Here is
> a link to the corosync log, taken from the standby server (ha09a) where we
> are trying to fail the resource over to:
> http://www.psmnv.com/downloads/corosync1.log
>
> Here is what I have installed:
>
> corosync-1.4.1-7.el6_3.1.x86_64
> corosynclib-1.4.1-7.el6_3.1.x86_64
> pacemaker-1.1.8-4.el6.x86_64
> pacemaker-cli-1.1.8-4.el6.x86_64
> pacemaker-cluster-libs-1.1.8-4.el6.x86_64
> pacemaker-libs-1.1.8-4.el6.x86_64
>
> Following is my crm config. It's pretty basic.
>
> node ha09a \
>         attributes standby=off
> node ha09b \
>         attributes standby=off
> primitive p_drbd0 ocf:linbit:drbd \
>         params drbd_resource=ha01_mysql \
>         op monitor interval=60s
> primitive p_drbd1 ocf:linbit:drbd \
>         params drbd_resource=ha02_mysql \
>         op monitor interval=45s
> primitive p_vip_clust08 ocf:heartbeat:IPaddr2 \
>         params ip=192.168.10.210 cidr_netmask=32 \
>         op monitor interval=30s
> primitive p_vip_clust09 ocf:heartbeat:IPaddr2 \
>         params ip=192.168.10.211 cidr_netmask=32 \
>         op monitor interval=30s
> ms ms_drbd0 p_drbd0 \
>         meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Master
> ms ms_drbd1 p_drbd1 \
>         meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Master

Try to set 'target-role=Started' in both of them.

Vladislav
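Applied to the config above, that suggestion would change only the meta attribute on each ms resource (sketch; ms_drbd1 would get the same edit). The idea is to let the cluster's own promotion scores decide where the master runs, rather than pinning the target role to Master.

```
ms ms_drbd0 p_drbd0 \
        meta master-max=1 master-node-max=1 clone-max=2 \
             clone-node-max=1 notify=true target-role=Started
```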