Re: [Linux-HA] Heartbeat 2.1.4 and 2.9.9 together?
haresources clusters should be fine. For CRM clusters it depends on
whether you go for 1.0 or 0.6.

On Fri, May 1, 2009 at 10:32 PM, Mike Sweetser - Adhost <mik...@adhost.com> wrote:

> Hello:
>
> I'm looking to migrate an existing Heartbeat 2.1.4 installation to
> 2.9.9. Would it be possible to upgrade the servers one at a time, which
> would require running one server with 2.1.4 and one server with 2.9.9
> for a short period? Would there be any incompatibility issues in doing
> so?
>
> Thank You,
> Mike Sweetser
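For what it's worth, the one-node-at-a-time procedure usually looks like
the outline below. This is only a sketch: package names, init scripts and
the exact install command vary by distribution.

    # On the node to be upgraded first:
    /etc/init.d/heartbeat stop      # resources fail over to the peer
    # ... install the new heartbeat packages here (rpm/zypper/apt) ...
    /etc/init.d/heartbeat start     # node rejoins the cluster

    # Check membership and resources before touching the second node:
    crm_mon -1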
Re: [Linux-HA] Problems With SLES11 + DRBD
darren.mans...@opengi.co.uk wrote:

> Hello everyone. Long post, sorry.
>
> I've been trying to get SLES11 with Pacemaker 1.0 / OpenAIS working for
> most of this week without success so far. I thought I may as well
> bundle my problems into one mail to see if anyone can offer any advice.
>
> Goal: I'm trying to get a 2-node Active/Passive cluster working with
> DRBD replication, an ext3 FS on top of DRBD and a virtual IP. I want
> the active node to have a mounted FS that I can serve requests from
> using ProFTPD or another FTP daemon. If the active node fails I want
> the cluster to migrate all 4 resources (DRBD, FS, ProFTPD, virtual IP)
> across to the other node. I don't have any STONITH devices at the
> moment.
>
> Approach: We are going with SLES11 with Pacemaker 1.0.3 and OpenAIS
> 0.80.3, after already using SLES10 SP2 with Heartbeat 2.1.4 and
> ldirectord in a live running 2-node Active/Active cluster. We are using
> LVM under DRBD for future disk expansion.
>
> Problem 1 - Using the DRBD OCF RA: I wanted to use the latest and
> greatest, so I tried the DRBD OCF RA following this howto:
> http://www.clusterlabs.org/wiki/DRBD_HowTo_1.0 . The configuration
> works and I can manually migrate resources, but if I just reboot the
> node that has the drbd resource on it, I see the resource get migrated
> to the other node for about 2 seconds and then stopped.
>
> Normal operation:
>
> Last updated: Fri May 1 16:33:00 2009
> Current DC: gihub2 - partition with quorum

And this is your reason. The no-quorum-policy default is stop (you even
configured it, see below), which means: do not run any resources if you
do not have quorum. The node is alone, so it does not have quorum. If
you want it to run things anyway, set no-quorum-policy to ignore. That
would be the old heartbeat behaviour.

> Version: 1.0.3-0080ec086ae9c20ad5c4c3562000c0ad68374f0a
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
>
> Online: [ gihub1 gihub2 ]
>
> drbd0 (ocf::heartbeat:drbd): Started gihub1
>
> Reboot gihub1:
>
> Last updated: Fri May 1 16:35:34 2009
> Current DC: gihub2 - partition with quorum
> Version: 1.0.3-0080ec086ae9c20ad5c4c3562000c0ad68374f0a
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
>
> Online: [ gihub2 ]
> OFFLINE: [ gihub1 ]
>
> drbd0 (ocf::heartbeat:drbd): Started gihub2
>
> Then after a couple of seconds:
>
> Last updated: Fri May 1 16:37:11 2009
> Current DC: gihub2 - partition WITHOUT quorum
> Version: 1.0.3-0080ec086ae9c20ad5c4c3562000c0ad68374f0a
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
>
> Online: [ gihub2 ]
> OFFLINE: [ gihub1 ]
>
> /var/log/messages says:
>
> May 1 16:46:33 gihub2 openais[5362]: [TOTEM] The token was lost in the OPERATIONAL state.
> May 1 16:46:33 gihub2 openais[5362]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
> May 1 16:46:33 gihub2 openais[5362]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
> May 1 16:46:33 gihub2 openais[5362]: [TOTEM] entering GATHER state from 2.
> May 1 16:46:36 gihub2 kernel: drbd0: conn( WFConnection -> Disconnecting )
> May 1 16:46:36 gihub2 kernel: drbd0: Discarding network configuration.
> May 1 16:46:36 gihub2 kernel: drbd0: Connection closed
> May 1 16:46:36 gihub2 kernel: drbd0: conn( Disconnecting -> StandAlone )
> May 1 16:46:36 gihub2 kernel: drbd0: receiver terminated
> May 1 16:46:36 gihub2 kernel: drbd0: Terminating receiver thread
> May 1 16:46:36 gihub2 kernel: drbd0: disk( UpToDate -> Diskless )
> May 1 16:46:36 gihub2 kernel: drbd0: drbd_bm_resize called with capacity == 0
> May 1 16:46:36 gihub2 kernel: drbd0: worker terminated
> May 1 16:46:36 gihub2 kernel: drbd0: Terminating worker thread
> May 1 16:46:36 gihub2 openais[5362]: [TOTEM] entering GATHER state from 0.
> May 1 16:46:36 gihub2 openais[5362]: [TOTEM] Creating commit token because I am the rep.
> May 1 16:46:36 gihub2 openais[5362]: [TOTEM] Saving state aru 6b high seq received 6b
> May 1 16:46:36 gihub2 lrmd: [5370]: info: rsc:drbd0: stop
> May 1 16:46:36 gihub2 cib: [5369]: notice: ais_dispatch: Membership 400: quorum lost
> May 1 16:46:36 gihub2 openais[5362]: [TOTEM] Storing new sequence id for ring 190
> May 1 16:46:36 gihub2 openais[5362]: [TOTEM] entering COMMIT state.
> May 1 16:46:36 gihub2 openais[5362]: [TOTEM] entering RECOVERY state.
> May 1 16:46:36 gihub2 openais[5362]: [TOTEM] position [0] member 2.21.4.41:
> May 1 16:46:36 gihub2 openais[5362]: [TOTEM] previous ring seq 396 rep 2.21.4.40
> May 1 16:46:36 gihub2 openais[5362]: [TOTEM] aru 6b high delivered 6b received flag 1
> May 1 16:46:36 gihub2 openais[5362]: [TOTEM] Did not need to originate any messages in recovery.
> May 1 16:46:36 gihub2 openais[5362]: [TOTEM] Sending
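For completeness, the property Dominik mentions is a single cluster
option. A minimal sketch with the Pacemaker 1.0 crm shell (the same
attribute can also be set via cibadmin or by editing the XML directly):

    crm configure property no-quorum-policy=ignore

With only two nodes this is the usual choice, since the surviving node
can never have quorum on its own; the trade-off is that without STONITH
a split brain lets both nodes run the resources.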
Re: [Linux-HA] Problems With SLES11 + DRBD
Dominik Klein wrote:

> darren.mans...@opengi.co.uk wrote:
>
> > [...]
> >
> > Normal operation:
> >
> > Last updated: Fri May 1 16:33:00 2009
> > Current DC: gihub2 - partition with quorum
>
> And this is your reason. Bla. (read below)
>
> The no-quorum-policy default is stop (you even configured it, see
> below), which means: do not run any resources if you do not have
> quorum. The node is alone, so it does not have quorum. If you want it
> to run things anyway, set no-quorum-policy to ignore. That would be
> the old heartbeat behaviour.
>
> > [...]
> >
> > Then after a couple of seconds:
> >
> > Last updated: Fri May 1 16:37:11 2009
> > Current DC: gihub2 - partition WITHOUT quorum

Here you are without quorum. Sorry.

Regards
Dominik

> > [rest of the status output and /var/log/messages excerpt as in the
> > previous message]
Re: [Linux-HA] crm CLI
Ciao,

attached is my cib.xml, where I already have a group and a location
constraint with score=100. If I understood correctly, the score relates
to all the resources, so if one of them is missing the score is less
than 100 and everything will be migrated to the standby node. Is that
true? If yes, then I need to find another way to monitor a mount
resource, because if I disconnect the SAN's fibre cable I still see the
mounted filesystems and the group's score is 100 :-((

thanks
cristina

On Apr 24, 2009, at 3:52 PM, Dejan Muhamedagic wrote:

> Ciao,
>
> On Fri, Apr 24, 2009 at 09:29:12AM +0200, Cristina Bulfon wrote:
>
> > Ciao,
> >
> > I tried to build the pacemaker rpm w/o success :-(( I will try
> > again, and if it fails I am going to compile from source.
>
> You can just grab the source rpm, comment out the offending line in
> lib/ais/Makefile.am and do rpmbuild -bb. First install all the build
> requirements (heartbeat/openais-dev).
>
> > Anyway, I thought to use pacemaker because I understand it is the
> > better way to modify cib.xml, but if there is any other way to do it
> > I will.
>
> pacemaker is the best if you're starting now. The crm shell will help
> you avoid xml in case you're allergic to it.
>
> > However, my focus point is this: my resource group is composed of 4
> > single resources: 2 Filesystem, IPaddr and AFS. For better
> > comprehension, here is the haresources file (from my point of view
> > it is more readable than cib.xml):
>
> True, but cib is more powerful :)
>
> > a.roma1.infn.it \
> >     IPaddr::X.X.X.31/24/eth0 \
> >     Filesystem::/dev/AFS/sda3::/vicepa/::xfs \
> >     Filesystem::/dev/AFS/sda1::/usr/afs::ext3 \
> >     afs
> >
> > AFS depends on the filesystems, so if for any reason /vicepa is not
> > reachable, everything has to be stopped and the resources passed to
> > the other node. So, is there a way to do this with cib.xml? Any
> > advice on how to monitor the above resource, or the group?
>
> Just put everything in a group. Add a location constraint with some
> score (say 100) to prefer that node. And you should get a fencing
> device and create stonith resources. Good luck!
>
> Thanks,
> Dejan
>
> > Thanks
> > cristina
> >
> > On Apr 21, 2009, at 3:22 PM, Dejan Muhamedagic wrote:
> >
> > > Ciao,
> > >
> > > On Tue, Apr 21, 2009 at 08:37:17AM +0200, Cristina Bulfon wrote:
> > >
> > > > Ciao,
> > > >
> > > > I am lost :-) Dejan, if I understand correctly you mean that I
> > > > have to rebuild all the packages (heartbeat, pacemaker and ais)
> > > > starting from source.
> > >
> > > No, just the pacemaker. Or wait until the new packages are built.
> > > Unfortunately, I can't say how long it may take.
> > >
> > > Thanks,
> > > Dejan
> > >
> > > > thanks
> > > > cristina
> > > >
> > > > On Apr 20, 2009, at 1:08 PM, Dejan Muhamedagic wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > On Fri, Apr 17, 2009 at 03:39:32PM +0100, Jason Fitzpatrick wrote:
> > > > >
> > > > > > Hi Cristina
> > > > > >
> > > > > > that repo should have all the required files in it
> > > > >
> > > > > These are quite old. The rhel4 packages weren't built for
> > > > > quite some time. The build log says:
> > > > >
> > > > >   cc1: error: unrecognized command line option "-Wno-pointer-sign"
> > > > >
> > > > > That's in lib/ais/Makefile.am. You could build the package
> > > > > yourself, just remove this option beforehand.
> > > > >
> > > > > Thanks,
> > > > > Dejan
> > > > >
> > > > > > Jason
> > > > > >
> > > > > > heartbeat-2.99.2-6.2.i386.rpm
> > > > > >   http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_4/i386/heartbeat-2.99.2-6.2.i386.rpm
> > > > > > heartbeat-common-2.99.2-6.2.i386.rpm
> > > > > >   http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_4/i386/heartbeat-common-2.99.2-6.2.i386.rpm
> > > > > > heartbeat-debug-2.99.2-6.2.i386.rpm
> > > > > >   http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_4/i386/heartbeat-debug-2.99.2-6.2.i386.rpm
> > > > > > heartbeat-devel-2.99.2-6.2.i386.rpm
> > > > > >   http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_4/i386/heartbeat-devel-2.99.2-6.2.i386.rpm
> > > > > > heartbeat-ldirectord-2.99.2-6.2.i386.rpm
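On the monitoring question: the Filesystem RA's monitor action can be
told to actually touch the filesystem rather than just check the mount
table, via the OCF_CHECK_LEVEL operation attribute (10 is a read test,
20 a read/write test). A read/write monitor should notice a dead SAN
path even while the stale mount still shows up in mount output. A rough
sketch in crm shell syntax, with made-up resource names, on top of
Dejan's group-plus-location advice (the lsb:afs line assumes the afs
init script from the haresources file above):

    primitive fs_vicepa ocf:heartbeat:Filesystem \
        params device="/dev/AFS/sda3" directory="/vicepa" fstype="xfs" \
        op monitor interval="20s" timeout="60s" OCF_CHECK_LEVEL="20"
    primitive fs_usrafs ocf:heartbeat:Filesystem \
        params device="/dev/AFS/sda1" directory="/usr/afs" fstype="ext3" \
        op monitor interval="20s" timeout="60s" OCF_CHECK_LEVEL="20"
    primitive ip_afs ocf:heartbeat:IPaddr \
        params ip="X.X.X.31" cidr_netmask="24" nic="eth0"
    primitive afs_server lsb:afs
    group grp_afs fs_vicepa fs_usrafs ip_afs afs_server
    location loc_afs grp_afs 100: a.roma1.infn.it

When a monitor fails, the whole group is stopped and restarted on the
other node (subject to migration-threshold and the usual caveats about
fencing). And on the rebuild Dejan suggested, roughly (the RPM paths
below are RHEL4 defaults and may differ on your system):

    rpm -ivh pacemaker-*.src.rpm
    # remove -Wno-pointer-sign from lib/ais/Makefile.am in the source
    # tarball under /usr/src/redhat/SOURCES, repack it, then:
    rpmbuild -bb /usr/src/redhat/SPECS/pacemaker.spec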
Re: [Linux-HA] [Pacemaker] new doc about stonith/fencing
-----Original Message-----
From: Peter Kruse <p...@q-leap.com>
Sent: 04.05.09 15:19:06
To: pacema...@oss.clusterlabs.org
Subject: Re: [Linux-HA] [Pacemaker] new doc about stonith/fencing

Hi Peter,

> If the PDU becomes unavailable and shortly afterwards the host is
> unavailable as well, then assume the host is down and fenced
> successfully.

'assume' is the bad word here. Stonith is there so that the cluster does
NOT have to assume anything, but can be SURE that the cluster is in a
predictable state. IMHO you answered your question yourself. ;-)

But I'm also interested in Dejan's answer.

Best regards
Andreas
Re: [Linux-HA] [Pacemaker] new doc about stonith/fencing
Hi Dejan,

Dejan Muhamedagic wrote:

> As usual, constructive criticism/suggestions/etc are welcome.

Thanks for sharing. Allow me to bring up a topic that from my point of
view is important. You have written:

> The lights-out devices (IBM RSA, HP iLO, Dell DRAC) are becoming
> increasingly popular and in the future they may even become standard
> equipment of off-the-shelf computers. They are, however, inferior to
> UPS devices, because they share a power supply with their host (a
> cluster node). If a node stays without power, the device supposed to
> control it would be just as useless. Even though this is obvious to
> us, the cluster manager is not in the know and will try to fence the
> node in vain. This will continue forever because all other resource
> operations would wait for the fencing/stonith operation to succeed.

This is the same problem with PDUs, as they share the same power supply
with the host as well. Is there any intention to deal with this issue?
I'm thinking of a "powerfail" algorithm: if the PDU becomes unavailable
and shortly afterwards the host is unavailable as well, then assume the
host is down and fenced successfully. This would be true if the PDU
(and with it the host) loses power. At the moment it looks as if
stonith without such an algorithm is a SPoF by design, because after a
single failure (power loss) the cluster is not able to bring up the
resources again.

Looking forward to your comments,

Peter
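To make the proposed heuristic concrete, here is a purely illustrative
shell sketch; no shipped stonith plugin implements this, and the
addresses plus the ping-based liveness test are assumptions for the
example only:

    #!/bin/sh
    # Sketch of the proposed "powerfail" heuristic: if the PDU and its
    # host vanish together, presume a shared power loss, i.e. the node
    # is off and may be treated as successfully fenced.
    PDU_ADDR=192.168.1.100      # assumed management address of the PDU
    NODE_ADDR=192.168.1.10      # assumed address of the cluster node

    alive() { ping -c 3 -W 2 "$1" >/dev/null 2>&1; }

    if ! alive "$PDU_ADDR"; then
        sleep 10                # give the host time to drop off too
        if ! alive "$NODE_ADDR"; then
            exit 0              # both gone: assume node is powered down
        fi
    fi
    exit 1                      # cannot conclude the node is down

Andreas's objection above applies to exactly this exit 0: the script
concludes "down" from absence of evidence, which is weaker than the
guarantee stonith is meant to provide.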
Re: [Linux-HA] V1 and VLANs
On Monday, 4 May 2009 at 15:08:56, m...@bortal.de wrote:

> Hello List,
>
> I would like to use VLANs with heartbeat v1. My /etc/network/interfaces
> looks like this:
>
>   iface vlan20 inet static
>       vlan-raw-device eth2
>       address 192.168.100.1
>       netmask 255.255.255.0
>
> How do I bring up such a device with haresources?
>
> Thanks,
> Mario

No idea about v1, but in v2 the CRM takes control of IP addresses. The
underlying VLAN interfaces have to be created by the system during
startup.

--
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
Mobile: +49 - 174 - 343 28 75
Mail: mi...@multinet.de
Web: www.multinet.de
Registered office: 85630 Grasbrunn
Commercial register: Amtsgericht München HRB 114375
Managing directors: Günter Jurgeneit, Hubert Martens
---
PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42
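The same applies to v1: heartbeat never creates interfaces itself, so as
long as the OS brings vlan20 up at boot (exactly what the
/etc/network/interfaces stanza above does on Debian), haresources only
needs to place the cluster IP on that interface. A sketch, with a
made-up service address:

    # /etc/ha.d/haresources -- preferred node, then its resources.
    # IPaddr::<address>/<prefix>/<interface> puts the virtual IP on the
    # pre-existing vlan20 device.
    node1 IPaddr::192.168.100.10/24/vlan20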