Re: [Linux-HA] Heartbeat 2.1.4 and 2.9.9 together?

2009-05-04 Thread Andrew Beekhof
For haresources clusters it should be fine.
For crm clusters it depends on whether you go for 1.0 or 0.6.
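
Roughly, one node at a time (a sketch only; exact package names and the
init script path depend on your distribution):

   /etc/init.d/heartbeat stop    # on the node being upgraded; resources fail over
   rpm -Uvh heartbeat-*.rpm      # install the new packages (plus pacemaker, if split out)
   /etc/init.d/heartbeat start   # node rejoins the cluster on the new version
   crm_mon -1                    # check both nodes are back online before doing the next one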

On Fri, May 1, 2009 at 10:32 PM, Mike Sweetser - Adhost
mik...@adhost.com wrote:
 Hello:

 I'm looking to migrate an existing Heartbeat 2.1.4 installation to
 2.9.9.  Would it be possible to upgrade the servers one at a time, which
 would require running one server with 2.1.4 and one server with 2.9.9
 for a short period?  Would there be any incompatibility issues in doing
 so?

 Thank You,
 Mike Sweetser

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Problems With SLES11 + DRBD

2009-05-04 Thread Dominik Klein
darren.mans...@opengi.co.uk wrote:
 Hello everyone. Long post, sorry.
 
  
 
 I've been trying to get SLES11 with Pacemaker 1.0 / OpenAIS working for
 most of this week without success so far. I thought I may as well bundle
 my problems into one mail to see if anyone can offer any advice.
 
  
 
 Goal: I'm trying to get a 2 node Active/Passive cluster working with
 DRBD replication, an ext3 FS on top of DRBD and a virtual IP. I want the
 active node to have a mounted FS that I can serve requests from using
 ProFTPD or another FTP daemon. If the active node fails I want the
 cluster to migrate all 4 resources (DRBD, FS, ProFTPD, Virtual IP)
 across to the other node. I don't have any STONITH devices at the
 moment.
 
  
 
 Approach: We are going with SLES11 with Pacemaker 1.0.3 and OpenAIS
 0.80.3, after already using SLES10SP2 with Heartbeat 2.1.4 and
 ldirectord in a live running 2-node Active/Active cluster. We are using
 LVM under DRBD for future disk expansion.
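 
 For context, the configuration is roughly of this shape (a simplified
 sketch with placeholder names, addresses and device paths, not the real
 values):
 
   primitive drbd0 ocf:heartbeat:drbd \
           params drbd_resource="r0" \
           op monitor interval="59s" role="Master" timeout="30s" \
           op monitor interval="60s" role="Slave" timeout="30s"
   ms ms-drbd0 drbd0 \
           meta clone-max="2" notify="true" globally-unique="false"
   primitive fs0 ocf:heartbeat:Filesystem \
           params device="/dev/drbd0" directory="/data" fstype="ext3"
   primitive ip0 ocf:heartbeat:IPaddr2 \
           params ip="192.168.0.100" cidr_netmask="24"
   primitive ftpd lsb:proftpd
   group grp-ftp fs0 ip0 ftpd
   colocation ftp-on-drbd inf: grp-ftp ms-drbd0:Master
   order drbd-before-ftp inf: ms-drbd0:promote grp-ftp:start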
 
  
 
 Problem1 - Using DRBD OCF RA: I wanted to use the latest and greatest
 for the approaches, so tried the DRBD OCF RA following this howto:
 http://www.clusterlabs.org/wiki/DRBD_HowTo_1.0 . The configuration works
 and I can manually migrate resources but if I just reboot the node that
 has the drbd resource on it I see the resource gets migrated to the
 other node for about 2 seconds then is stopped:
 
  
 
 Normal operation:
 
 
 
 Last updated: Fri May  1 16:33:00 2009
 
 Current DC: gihub2 - partition with quorum

And this is your reason. The no-quorum-policy default is stop (you
even configured it, see below), which means do not run any resources if
you do not have quorum. The node is alone, so it does not have quorum.

If you want it to run things anyway, set no-quorum-policy to ignore.
That would be the old heartbeat behaviour.
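
For example, with the crm shell that is a one-liner:

   crm configure property no-quorum-policy=ignore

(or the equivalent: crm_attribute --type crm_config --name no-quorum-policy
--update ignore).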

 Version: 1.0.3-0080ec086ae9c20ad5c4c3562000c0ad68374f0a
 
 2 Nodes configured, 2 expected votes
 
 1 Resources configured.
 
 
 
  
 
 Online: [ gihub1 gihub2 ]
 
  
 
 drbd0   (ocf::heartbeat:drbd):  Started gihub1
 
  
 
  
 
 Reboot gihub1:
 
 
 
 Last updated: Fri May  1 16:35:34 2009
 
 Current DC: gihub2 - partition with quorum
 
 Version: 1.0.3-0080ec086ae9c20ad5c4c3562000c0ad68374f0a
 
 2 Nodes configured, 2 expected votes
 
 1 Resources configured.
 
 
 
  
 
 Online: [ gihub2 ]
 
 OFFLINE: [ gihub1 ]
 
  
 
 drbd0   (ocf::heartbeat:drbd):  Started gihub2
 
  
 
  
 
 Then after a couple of seconds:
 
 
 
 Last updated: Fri May  1 16:37:11 2009
 
 Current DC: gihub2 - partition WITHOUT quorum
 
 Version: 1.0.3-0080ec086ae9c20ad5c4c3562000c0ad68374f0a
 
 2 Nodes configured, 2 expected votes
 
 1 Resources configured.
 
 
 
  
 
 Online: [ gihub2 ]
 
 OFFLINE: [ gihub1 ]
 
  
 
  
 
 /var/log/messages says:
 
 May  1 16:46:33 gihub2 openais[5362]: [TOTEM] The token was lost in the
 OPERATIONAL state.
 
 May  1 16:46:33 gihub2 openais[5362]: [TOTEM] Receive multicast socket
 recv buffer size (262142 bytes).
 
 May  1 16:46:33 gihub2 openais[5362]: [TOTEM] Transmit multicast socket
 send buffer size (262142 bytes).
 
 May  1 16:46:33 gihub2 openais[5362]: [TOTEM] entering GATHER state from
 2.
 
 May  1 16:46:36 gihub2 kernel: drbd0: conn( WFConnection -> Disconnecting )
 
 May  1 16:46:36 gihub2 kernel: drbd0: Discarding network configuration.
 
 May  1 16:46:36 gihub2 kernel: drbd0: Connection closed
 
 May  1 16:46:36 gihub2 kernel: drbd0: conn( Disconnecting -> StandAlone )
 
 May  1 16:46:36 gihub2 kernel: drbd0: receiver terminated
 
 May  1 16:46:36 gihub2 kernel: drbd0: Terminating receiver thread
 
 May  1 16:46:36 gihub2 kernel: drbd0: disk( UpToDate -> Diskless )
 
 May  1 16:46:36 gihub2 kernel: drbd0: drbd_bm_resize called with
 capacity == 0
 
 May  1 16:46:36 gihub2 kernel: drbd0: worker terminated
 
 May  1 16:46:36 gihub2 kernel: drbd0: Terminating worker thread
 
 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] entering GATHER state from
 0.
 
 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] Creating commit token
 because I am the rep.
 
 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] Saving state aru 6b high
 seq received 6b
 
 May  1 16:46:36 gihub2 lrmd: [5370]: info: rsc:drbd0: stop
 
 May  1 16:46:36 gihub2 cib: [5369]: notice: ais_dispatch: Membership
 400: quorum lost
 
 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] Storing new sequence id
 for ring 190
 
 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] entering COMMIT state.
 
 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] entering RECOVERY state.
 
 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] position [0] member
 2.21.4.41:
 
 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] previous ring seq 396 rep
 2.21.4.40
 
 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] aru 6b high delivered 6b
 received flag 1
 
 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] Did not need to originate
 any messages in recovery.
 
 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] Sending 

Re: [Linux-HA] Problems With SLES11 + DRBD

2009-05-04 Thread Dominik Klein
Dominik Klein wrote:
 darren.mans...@opengi.co.uk wrote:
 Hello everyone. Long post, sorry.

  

 I've been trying to get SLES11 with Pacemaker 1.0 / OpenAIS working for
 most of this week without success so far. I thought I may as well bundle
 my problems into one mail to see if anyone can offer any advice.

  

 Goal: I'm trying to get a 2 node Active/Passive cluster working with
 DRBD replication, an ext3 FS on top of DRBD and a virtual IP. I want the
 active node to have a mounted FS that I can serve requests from using
 ProFTPD or another FTP daemon. If the active node fails I want the
 cluster to migrate all 4 resources (DRBD, FS, ProFTPD, Virtual IP)
 across to the other node. I don't have any STONITH devices at the
 moment.

  

 Approach: We are going with SLES11 with Pacemaker 1.0.3 and OpenAIS
 0.80.3, after already using SLES10SP2 with Heartbeat 2.1.4 and
 ldirectord in a live running 2-node Active/Active cluster. We are using
 LVM under DRBD for future disk expansion.

  

 Problem1 - Using DRBD OCF RA: I wanted to use the latest and greatest
 for the approaches, so tried the DRBD OCF RA following this howto:
 http://www.clusterlabs.org/wiki/DRBD_HowTo_1.0 . The configuration works
 and I can manually migrate resources but if I just reboot the node that
 has the drbd resource on it I see the resource gets migrated to the
 other node for about 2 seconds then is stopped:

  

 Normal operation:

 

 Last updated: Fri May  1 16:33:00 2009

 Current DC: gihub2 - partition with quorum
 
 And this is your reason. 

Bla. (read below)

 The no-quorum-policy default is stop (you
 even configured it, see below), which means do not run any resources if
 you do not have quorum. The node is alone, so it does not have quorum.
 
 If you want it to run things anyway, set no-quorum-policy to ignore.
 That would be the old heartbeat behaviour.
 
 Version: 1.0.3-0080ec086ae9c20ad5c4c3562000c0ad68374f0a

 2 Nodes configured, 2 expected votes

 1 Resources configured.

 

  

 Online: [ gihub1 gihub2 ]

  

 drbd0   (ocf::heartbeat:drbd):  Started gihub1

  

  

 Reboot gihub1:

 

 Last updated: Fri May  1 16:35:34 2009

 Current DC: gihub2 - partition with quorum

 Version: 1.0.3-0080ec086ae9c20ad5c4c3562000c0ad68374f0a

 2 Nodes configured, 2 expected votes

 1 Resources configured.

 

  

 Online: [ gihub2 ]

 OFFLINE: [ gihub1 ]

  

 drbd0   (ocf::heartbeat:drbd):  Started gihub2

  

  

 Then after a couple of seconds:

 

 Last updated: Fri May  1 16:37:11 2009

 Current DC: gihub2 - partition WITHOUT quorum

Here you are without quorum.
Sorry.

Regards
Dominik

 Version: 1.0.3-0080ec086ae9c20ad5c4c3562000c0ad68374f0a

 2 Nodes configured, 2 expected votes

 1 Resources configured.

 

  

 Online: [ gihub2 ]

 OFFLINE: [ gihub1 ]

  

  

 /var/log/messages says:

 May  1 16:46:33 gihub2 openais[5362]: [TOTEM] The token was lost in the
 OPERATIONAL state.

 May  1 16:46:33 gihub2 openais[5362]: [TOTEM] Receive multicast socket
 recv buffer size (262142 bytes).

 May  1 16:46:33 gihub2 openais[5362]: [TOTEM] Transmit multicast socket
 send buffer size (262142 bytes).

 May  1 16:46:33 gihub2 openais[5362]: [TOTEM] entering GATHER state from
 2.

 May  1 16:46:36 gihub2 kernel: drbd0: conn( WFConnection -> Disconnecting )

 May  1 16:46:36 gihub2 kernel: drbd0: Discarding network configuration.

 May  1 16:46:36 gihub2 kernel: drbd0: Connection closed

 May  1 16:46:36 gihub2 kernel: drbd0: conn( Disconnecting -> StandAlone )

 May  1 16:46:36 gihub2 kernel: drbd0: receiver terminated

 May  1 16:46:36 gihub2 kernel: drbd0: Terminating receiver thread

 May  1 16:46:36 gihub2 kernel: drbd0: disk( UpToDate -> Diskless )

 May  1 16:46:36 gihub2 kernel: drbd0: drbd_bm_resize called with
 capacity == 0

 May  1 16:46:36 gihub2 kernel: drbd0: worker terminated

 May  1 16:46:36 gihub2 kernel: drbd0: Terminating worker thread

 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] entering GATHER state from
 0.

 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] Creating commit token
 because I am the rep.

 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] Saving state aru 6b high
 seq received 6b

 May  1 16:46:36 gihub2 lrmd: [5370]: info: rsc:drbd0: stop

 May  1 16:46:36 gihub2 cib: [5369]: notice: ais_dispatch: Membership
 400: quorum lost

 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] Storing new sequence id
 for ring 190

 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] entering COMMIT state.

 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] entering RECOVERY state.

 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] position [0] member
 2.21.4.41:

 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] previous ring seq 396 rep
 2.21.4.40

 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] aru 6b high delivered 6b
 received flag 1

 May  1 16:46:36 gihub2 openais[5362]: [TOTEM] Did not need to originate
 any messages in recovery.

 May  1 16:46:36 gihub2 

Re: [Linux-HA] crm CLI

2009-05-04 Thread Cristina Bulfon

Ciao,

Attached is my cib.xml, where I already have a group and a location
constraint with score=100.
If I understood correctly, the score relates to all the resources, so if
one of them is not running the score drops below 100 and everything will
be migrated to the standby node. Is that true?


If so, then I need to find another way to monitor a mount resource,
because if I disconnect the SAN's fiber cable I still see the mounted
filesystems and the group's score is still 100 :-((
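
(One idea I am considering - just a sketch, assuming the
ocf::heartbeat:Filesystem agent honours OCF_CHECK_LEVEL - is a deeper
monitor on the Filesystem resource, so the monitor reads/writes a status
file on the mount point and fails when the SAN path is gone:

   primitive fs_vicepa ocf:heartbeat:Filesystem \
           params device="/dev/AFS/sda3" directory="/vicepa" fstype="xfs" \
           op monitor interval="60s" timeout="60s" OCF_CHECK_LEVEL="20"

If the crm shell refuses OCF_CHECK_LEVEL on the op line, it can be set as
an instance attribute of the monitor operation in the XML instead.)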


thanks

cristina



On Apr 24, 2009, at 3:52 PM, Dejan Muhamedagic wrote:


Ciao,

On Fri, Apr 24, 2009 at 09:29:12AM +0200, Cristina Bulfon wrote:

Ciao,

I tried to build the pacemaker rpm without success :-((
I will try another time, and if it fails I am going to compile from
source.


You can just grab the source rpm and comment out the offending
line in lib/ais/Makefile.am and do rpmbuild -bb. First install
all the build requirements (heartbeat/openais-dev).
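
Something along these lines (version numbers and the tarball name are
placeholders; on RHEL4 the rpm build tree is usually /usr/src/redhat):

   rpm -ivh pacemaker-1.0.x-y.src.rpm
   cd /usr/src/redhat/SOURCES
   tar xjf pacemaker-1.0.x.tar.bz2
   # remove the offending option from lib/ais/Makefile.am
   sed -i 's/-Wno-pointer-sign//' pacemaker-1.0.x/lib/ais/Makefile.am
   tar cjf pacemaker-1.0.x.tar.bz2 pacemaker-1.0.x
   rpmbuild -bb /usr/src/redhat/SPECS/pacemaker.spec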

Anyway, I thought to use pacemaker because I understand it is the better
way to modify cib.xml, but if there is any other way to do it I will.


pacemaker is the best if you're starting now. The crm shell will
help you avoid xml in case you're allergic to it.


However, my focus point is:

- my resource group is composed of 4 single resources: 2 Filesystem,
IPaddr and AFS.
For better comprehension, the haresources file follows (from my point of
view it is more readable than cib.xml):


True, but cib is more powerful :)


a.roma1.infn.it \
   IPaddr::X.X.X.31/24/eth0 \
   Filesystem::/dev/AFS/sda3::/vicepa/::xfs \
   Filesystem::/dev/AFS/sda1::/usr/afs::ext3 \
   afs

AFS depends on the filesystems, so if for any reason /vicepa is not
reachable, everything has to be stopped and the resources passed to the
other node.
So, is there a way to do that with cib.xml? Any advice on how to monitor
the above resource or the group?


Just put everything in a group. Add a location constraint with
some score (say 100) to prefer that node. And you should get a
fencing device and create stonith resources.
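
A rough crm shell sketch of that, translating your haresources line
(resource IDs and the afs init script name are just guesses, adjust to
your setup):

   primitive ip_afs ocf:heartbeat:IPaddr \
           params ip="X.X.X.31" cidr_netmask="24" nic="eth0"
   primitive fs_vicepa ocf:heartbeat:Filesystem \
           params device="/dev/AFS/sda3" directory="/vicepa" fstype="xfs" \
           op monitor interval="20s" timeout="40s"
   primitive fs_usrafs ocf:heartbeat:Filesystem \
           params device="/dev/AFS/sda1" directory="/usr/afs" fstype="ext3" \
           op monitor interval="20s" timeout="40s"
   primitive afs lsb:afs
   group grp_afs fs_vicepa fs_usrafs ip_afs afs
   location prefer_a grp_afs 100: a.roma1.infn.it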

Good luck!

Thanks,

Dejan


Thanks

cristina

On Apr 21, 2009, at 3:22 PM, Dejan Muhamedagic wrote:


Ciao,

On Tue, Apr 21, 2009 at 08:37:17AM +0200, Cristina Bulfon wrote:

Ciao,

I am lost :-)
Dejan, if I understand correctly you mean that I have to rebuild all the
packages (heartbeat, pacemaker and ais) starting from source.


No, just the pacemaker. Or wait until the new packages are built.
Unfortunately, I can't say how long it may take.

Thanks,

Dejan


thanks

cristina

On Apr 20, 2009, at 1:08 PM, Dejan Muhamedagic wrote:


Hi,

On Fri, Apr 17, 2009 at 03:39:32PM +0100, Jason Fitzpatrick wrote:

Hi Cristina

that repo should have all the required files in it.


These are quite old. The rhel4 packages weren't built for quite some time. The
build log says:

cc1: error: unrecognized command line option -Wno-pointer-sign

That's in lib/ais/Makefile.am. You could build the package
yourself, just remove this option beforehand.

Thanks,

Dejan


Jason

heartbeat-2.99.2-6.2.i386.rpm
http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_4/i386/heartbeat-2.99.2-6.2.i386.rpm
12-Apr-2009 13:36  1.5M

heartbeat-common-2.99.2-6.2.i386.rpm
http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_4/i386/heartbeat-common-2.99.2-6.2.i386.rpm
12-Apr-2009 13:36  1.3M

heartbeat-debug-2.99.2-6.2.i386.rpm
http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_4/i386/heartbeat-debug-2.99.2-6.2.i386.rpm
12-Apr-2009 13:36  698K

heartbeat-devel-2.99.2-6.2.i386.rpm
http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_4/i386/heartbeat-devel-2.99.2-6.2.i386.rpm
12-Apr-2009 13:36  197K

heartbeat-ldirectord-2.99.2-6.2.i386.rpm

Re: [Linux-HA] [Pacemaker] new doc about stonith/fencing

2009-05-04 Thread Andreas Mock
 -----Original Message-----
 From: Peter Kruse p...@q-leap.com
 Sent: 04.05.09 15:19:06
 To: pacema...@oss.clusterlabs.org
 Subject: Re: [Linux-HA] [Pacemaker] new doc about stonith/fencing

Hi Peter,

 If the PDU becomes unavailable and shortly afterwards the host is unavailable
 as well, then assume the host is down and fenced successfully.

'assume' is the bad word here. Stonith is there so that the cluster does NOT
have to assume anything, but can be SURE that the state of the cluster is
predictable.

IMHO you answered your question yourself.  ;-)

But I'm also interested in Dejan's answer.

Best regards
Andreas

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] [Pacemaker] new doc about stonith/fencing

2009-05-04 Thread Peter Kruse
Hi Dejan,

Dejan Muhamedagic wrote:
 As usual, constructive criticism/suggestions/etc are welcome.

Thanks for sharing.
Allow me to bring up a topic that, from my point of view, is important.
You have written:

 The lights-out devices (IBM RSA, HP iLO, Dell DRAC) are becoming increasingly
 popular and in the future they may even become standard equipment on
 off-the-shelf computers. They are, however, inferior to UPS devices, because
 they share a power supply with their host (a cluster node). If a node stays
 without power, the device supposed to control it would be just as useless.
 Even though this is obvious to us, the cluster manager is not in the know and
 will try to fence the node in vain. This will continue forever because all
 other resource operations would wait for the fencing/stonith operation to
 succeed.

This is the same problem with PDUs, as they share the same power supply with
the host as well.  Is there any intention to deal with this issue?  I'm
thinking of a powerfail algorithm:

If the PDU becomes unavailable and shortly afterwards the host is unavailable
as well, then assume the host is down and fenced successfully.

This would be true if the PDU (and with it the host) loses power.
At the moment it looks as though stonith without such an algorithm is
a SPoF by design, because after a single failure (power loss), the
cluster is not able to bring up the resources again.

Looking forward to your comments,

   Peter

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] V1 and VLANs

2009-05-04 Thread Michael Schwartzkopff
On Monday, 4 May 2009 15:08:56, m...@bortal.de wrote:
 Hello List,

 I would like to use VLANs with heartbeat v1. My /etc/network/interfaces
 looks like this:

 iface vlan20 inet static
 vlan-raw-device eth2
 address 192.168.100.1
 netmask 255.255.255.0


 How do I bring up such a device with haresources?

 Thanks,
 Mario
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

No idea about v1, but in v2 the CRM takes control of IP addresses. The
underlying VLAN interfaces have to be created by the system during startup.
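
That said, an untested sketch for v1: as long as the system brings vlan20
up at boot, the IPaddr spec in haresources should be able to name the
interface directly (node name and address are placeholders):

   node1  IPaddr::192.168.100.10/24/vlan20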

-- 
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: mi...@multinet.de
web: www.multinet.de

Registered office: 85630 Grasbrunn
Commercial register: Amtsgericht München HRB 114375
Managing directors: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems