[Linux-HA] Restarting a resource that failed to start

2007-04-12 Thread Piotr Kaczmarzyk
Hi, I'm using version 2.0.8 and I tried to provide a highly available squid service. I wrote my own OCF script which was tested in two versions: ver 1. 'Start' function started squid, waited a few seconds, then tried to connect to port 8080, issued an HTTP request and returned either $OCF
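
The start-time health check described above (connect to port 8080, send an HTTP request, map the outcome to an OCF return code) can be sketched in Python as follows. This is a minimal illustration, not the poster's script: it assumes squid listens on 127.0.0.1:8080, that any HTTP status line counts as healthy, and that failure maps to OCF_ERR_GENERIC, since the original message is cut off at `$OCF`. Real OCF agents are usually shell scripts.

```python
import socket

# Standard OCF return codes (values from the OCF resource agent API).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def http_probe(host: str = "127.0.0.1", port: int = 8080, timeout: float = 5.0) -> int:
    """Connect to the proxy port, send one HTTP request and check for an HTTP reply.

    Any response that starts with an HTTP status line counts as healthy, even an
    error page; mapping failures to OCF_ERR_GENERIC is an assumption.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Hypothetical request target: squid answers proxy-style requests,
            # possibly with an error page, which still shows it is serving.
            sock.sendall(b"GET http://localhost/ HTTP/1.0\r\n\r\n")
            reply = b""
            while len(reply) < 5:
                chunk = sock.recv(64)
                if not chunk:  # peer closed the connection early
                    break
                reply += chunk
    except OSError:  # connection refused, reset or timed out
        return OCF_ERR_GENERIC
    return OCF_SUCCESS if reply.startswith(b"HTTP/") else OCF_ERR_GENERIC
```

Returning a failure code from start when the probe fails lets the cluster treat the start as failed, rather than assuming squid came up after a fixed sleep.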

Re: [Linux-HA] UDP Checksum error in heartbeat packets

2007-04-12 Thread Dominik Klein
That's an awful lot of bad packets, it seems to me. My best guess is that it's a hardware problem. Having a look at all packets on the interface, I actually see these errors not only for the heartbeat packets, but also for other (TCP and UDP) packets. Plus, I also see them on another device (

[Linux-HA] Memory Leaks

2007-04-12 Thread Hariharan Jayaraman
Hi All, We are using Linux-HA to provide an HA solution for a 2-node system. We have observed memory leaks in the crmd process and seen it grow beyond 700MB. I am currently running 2.0.8 with a patch that fixes a few memory leak issues in crmd. I am trying to use valgrind and efence to nail
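
Before reaching for valgrind, it can help to record how crmd's resident memory grows over time. A minimal Python sketch, assuming a Linux /proc filesystem and that you look up crmd's PID yourself (for example with `pidof crmd`); the helper names `vmrss_kib` and `sample_rss` are hypothetical.

```python
import time
from pathlib import Path


def vmrss_kib(status_text: str) -> int:
    """Extract the VmRSS value (in kB) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line format: "VmRSS:\t  123456 kB"
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def sample_rss(pid: int, samples: int = 5, interval_s: float = 60.0) -> list[int]:
    """Read the resident set size of a process at fixed intervals (Linux only)."""
    readings = []
    for i in range(samples):
        readings.append(vmrss_kib(Path(f"/proc/{pid}/status").read_text()))
        if i + 1 < samples:
            time.sleep(interval_s)
    return readings
```

A series that keeps rising under a steady workload is consistent with a leak, while a plateau suggests normal caching; either way the numbers are useful context for a bug report.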

Re: [Linux-HA] Two heartbeats on one machine

2007-04-12 Thread Alan Robertson
Jaroslav Prodelal wrote: > Hello! > > I'd like to ask you about running two HA systems (two heartbeats) on > one machine. Is it possible? > > We'd like to set up 2-node clusters between 3 machines, which should > look like this: > > > M1 - [HB1-2] - M2 >

Aperi (Re: [Linux-HA] Heartbeat versus Novell Cluster Services)

2007-04-12 Thread Robert Wipfel
>>> On Thu, Apr 12, 2007 at 11:42 AM, in message <[EMAIL PROTECTED]>, Lars Marowsky-Bree <[EMAIL PROTECTED]> wrote: > On 2007-04-12T09:21:02, Robert Wipfel <[EMAIL PROTECTED]> wrote: > >> But first, it's probably appropriate to comment on Novell's >> contributions to open source software in gen

[Linux-HA] 2 nodes A/P using pingd+stonith problem

2007-04-12 Thread Yann Dille
Dear Heartbeat user community -and Masters-, I'm having a lot of trouble getting a "simple" DRBD/NFS Active/Passive config working in a 2-node cluster as soon as I add features to increase availability in case of network failure: STONITH (suicide) and pingd (failover if the
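
The pingd part of such a setup is often reasoned about with a simple scoring model: pingd publishes a node attribute equal to the number of reachable ping nodes times its multiplier (the `-m` option), and a location rule that takes its score from that attribute favours the better-connected node. The Python sketch below models only that arithmetic; it ignores stickiness, colocation, STONITH and failure handling, and the node names and multiplier are made up.

```python
def pingd_value(reachable: int, multiplier: int) -> int:
    """Attribute value pingd is assumed to publish: reachable ping nodes x multiplier."""
    return reachable * multiplier


def preferred_node(reachable_by_node: dict[str, int], multiplier: int = 100) -> str:
    """Pick the node with the highest pingd-derived score (ties go to the first name alphabetically)."""
    scores = {node: pingd_value(n, multiplier) for node, n in reachable_by_node.items()}
    # max() returns the first maximal element, so iterating in sorted order breaks ties by name.
    return max(sorted(scores), key=lambda node: scores[node])
```

With a single ping node the scores can only be 0 or 100, so when both nodes lose the ping target there is no preference left; that is one reason to use several ping nodes.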

[Linux-HA] Two heartbeats on one machine

2007-04-12 Thread Jaroslav Prodelal
Hello! I'd like to ask you about running two HA systems (two heartbeats) on one machine. Is it possible? We'd like to set up 2-node clusters between 3 machines, which should look like this: M1 - [HB1-2] - M2 / \ [HB1-3]

Re: [Linux-HA] Re: heartbeat does not start when the stonith device is not available

2007-04-12 Thread Alan Robertson
Lars Marowsky-Bree wrote: > On 2007-04-12T08:05:29, Alan Robertson <[EMAIL PROTECTED]> wrote: > >> So, if you would consider an R2/CRM/CIB configuration, it interacts with >> STONITH in a completely different way, which might not be ideal either, >> but it should start the other resources (you can

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-12 Thread Andrew Beekhof
On 4/12/07, Lars Marowsky-Bree <[EMAIL PROTECTED]> wrote: On 2007-04-12T08:58:15, Andrew Beekhof <[EMAIL PROTECTED]> wrote: > >> >So I thought that probe is maybe never unset > >> > >> correct > > > >http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=1479 > > lmb - not the same thing. the r
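
The underlying question in this thread is how an agent tells a probe (a one-off monitor) from a recurring monitor. One common approach is to look at the interval passed in by the CRM, treating a missing or zero interval as a probe. A minimal Python sketch, assuming the variable is `OCF_RESKEY_interval` as named in the subject line; other releases may pass it under a different name, and given the behaviour discussed here, test this against your Heartbeat version before relying on it. The helper `is_probe` is hypothetical.

```python
import os
from typing import Mapping


def is_probe(action: str, env: Mapping[str, str] = os.environ) -> bool:
    """Treat a monitor with no interval, or an interval of 0, as a one-off probe.

    OCF_RESKEY_interval is the variable named in this thread (assumption:
    check which name your version actually exports).
    """
    if action != "monitor":
        return False
    raw = env.get("OCF_RESKEY_interval", "0").strip() or "0"
    try:
        return int(raw) == 0
    except ValueError:
        # Unexpected format (e.g. "10s"): assume it is a recurring monitor.
        return False
```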

Re: [Linux-HA] Heartbeat versus Novell Cluster Services

2007-04-12 Thread Lars Marowsky-Bree
On 2007-04-12T09:21:02, Robert Wipfel <[EMAIL PROTECTED]> wrote: > But first, it's probably appropriate to comment on Novell's > contributions to open source software in general, and the Heartbeat > project in particular. Everyone of course knows Lars and Andrew; > the architect and lead developer

Re: [Linux-HA] Re: heartbeat does not start when the stonith device is not available

2007-04-12 Thread Lars Marowsky-Bree
On 2007-04-12T08:05:29, Alan Robertson <[EMAIL PROTECTED]> wrote: > So, if you would consider an R2/CRM/CIB configuration, it interacts with > STONITH in a completely different way, which might not be ideal either, > but it should start the other resources (you can _make_ it start the > other reso

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-12 Thread Lars Marowsky-Bree
On 2007-04-12T08:58:15, Andrew Beekhof <[EMAIL PROTECTED]> wrote: > >> >So I thought that probe is maybe never unset > >> > >> correct > > > >http://www.osdl.org/developer_bugzilla/show_bug.cgi?id=1479 > > lmb - not the same thing. the resource is not deleted in between the > two types of monito

Re: [Linux-HA] UDP Checksum error in heartbeat packets

2007-04-12 Thread Alan Robertson
Dominik Klein wrote: > Hi > > I use heartbeat 2.0.7 from openSuSE 10.2 > > 10.250.250.27 is master > 10.250.250.28 is backup > They are connected with direct ethernet cable > > When watching UDP heartbeats with tshark, my master machine says this > (checksum errors): > 16:27:24.140509 10.250.250

Re: [Linux-HA] Heartbeat versus Novell Cluster Services

2007-04-12 Thread Robert Wipfel
>>> On Thu, Apr 12, 2007 at 7:58 AM, in message <[EMAIL PROTECTED]>, Alan Robertson <[EMAIL PROTECTED]> wrote: > Sander van Vugt wrote: >> Hi, >> >> Just like to know your opinion about the following. A pure Linux shop >> would of course definitely go for Heartbeat as the solution for high >> av

Re: [Linux-HA] Re: ping/ping_group directives failing WAS: unable to start pingd (cannot start resource groups as a result

2007-04-12 Thread Alan Robertson
Terry L. Inzauro wrote: > Alan Robertson wrote: >> Terry L. Inzauro wrote: >>> Andrew Beekhof wrote: On 4/11/07, Terry L. Inzauro <[EMAIL PROTECTED]> wrote: > list, > > this is a continuation of another thread that was started a few weeks > back. the original thread was >

[Linux-HA] UDP Checksum error in heartbeat packets

2007-04-12 Thread Dominik Klein
Hi, I use heartbeat 2.0.7 from openSuSE 10.2. 10.250.250.27 is master, 10.250.250.28 is backup. They are connected with a direct Ethernet cable. When watching UDP heartbeats with tshark, my master machine says this (checksum errors): 16:27:24.140509 10.250.250.28 -> 10.250.250.27 UDP Source port: 327
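
The value tshark validates here is the standard UDP checksum: a 16-bit one's-complement sum over an IPv4 pseudo-header plus the UDP header and payload (RFC 768, RFC 1071). A minimal Python sketch for recomputing it from captured bytes, assuming IPv4 and that you have already extracted the UDP header and payload; the function names are illustrative.

```python
import socket
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum (RFC 1071), padding odd-length input with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    """UDP checksum over the IPv4 pseudo-header plus the UDP header and payload.

    The checksum field inside udp_segment (bytes 6-7) must be zero when calling.
    """
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, socket.IPPROTO_UDP, len(udp_segment)))
    csum = ~ones_complement_sum16(pseudo + udp_segment) & 0xFFFF
    return csum or 0xFFFF  # RFC 768: a computed 0 is transmitted as all ones
```

A receiver accepts a packet when the same sum, taken over the segment with its checksum field filled in, comes out as 0xFFFF; that is the condition being reported as violated above.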

Re: [Linux-HA] Re: ping/ping_group directives failing WAS: unable to start pingd (cannot start resource groups as a result

2007-04-12 Thread Terry L. Inzauro
Alan Robertson wrote: > Terry L. Inzauro wrote: >> Andrew Beekhof wrote: >>> On 4/11/07, Terry L. Inzauro <[EMAIL PROTECTED]> wrote: list, this is a continuation of another thread that was started a few weeks back. the original thread was started in regards to the set

Re: [Linux-HA] Getting the status of the node

2007-04-12 Thread Alan Robertson
Mark Eisenblaetter wrote: > hi, > > I can't get used to the structure of the site. I have to search a long > time for particular information. When using the search function, the > hits are from both versions. > > so I think that I don't find all the information on this site A lot of information appl

Re: [Linux-HA] Permission on cib.xml

2007-04-12 Thread Alan Robertson
Benjamin Watine wrote: > Andrew Beekhof wrote: >> On 4/11/07, Benjamin Watine <[EMAIL PROTECTED]> wrote: >>> Alan Robertson wrote: >>> > Benjamin Watine wrote: >>> >> Alan Robertson wrote: >>> >>> Benjamin Watine wrote: >>> Hi >>> >>> I'm trying to chmod 660 cib.xml to give

Re: [Linux-HA] Re: heartbeat does not start when the stonith device is not available

2007-04-12 Thread Alan Robertson
Martin wrote: >> The only way to verify the configuration is to talk to it. > Yes - and if there is a problem talking to it, heartbeat should complain > loudly, but it should then continue to run and provide services. Currently > it exits. > >> You surely >> don't want to find out 2 years later

Re: [Linux-HA] Resource

2007-04-12 Thread Alan Robertson
maike wrote: > Hi people, I have a situation and > Resource X is (potentially) active on 2 nodes > How can I work with that? Do you mean that you want it to be active on two nodes? Do you mean that we made it active on two nodes? Do you mean that a system administrator made it active on two nodes? D

Re: [Linux-HA] no failback

2007-04-12 Thread Alan Robertson
Bernd Eichenberg wrote: > Hi all, > > I have to warn you, because I'm a newbie at HA and I'm German > with very limited English skills. > I hope you understand my problem, and thanks for that. > > My problem is that there is no failback after node 1 is up > again. > Node 1 is Suse 8 with heartbeat

Re: [Linux-HA] Heartbeat versus Novell Cluster Services

2007-04-12 Thread Alan Robertson
Sander van Vugt wrote: > Hi, > > Just like to know your opinion about the following. A pure Linux shop > would of course definitely go for Heartbeat as the solution for high > availability. However, in an environment that comes from Novell's > NetWare, Novell Cluster Services (NCS) would be the be

Re: [Linux-HA] Re: ping/ping_group directives failing WAS: unable to start pingd (cannot start resource groups as a result

2007-04-12 Thread Alan Robertson
Terry L. Inzauro wrote: > Andrew Beekhof wrote: >> On 4/11/07, Terry L. Inzauro <[EMAIL PROTECTED]> wrote: >>> list, >>> >>> this is a continuation of another thread that was started a few weeks >>> back. the original thread was >>> started in regards >>> to the setup of pingd. this thread is in r

[Linux-HA] Re: ping/ping_group directives failing WAS: unable to start pingd (cannot start resource groups as a result

2007-04-12 Thread Terry L. Inzauro
Andrew Beekhof wrote: > On 4/11/07, Terry L. Inzauro <[EMAIL PROTECTED]> wrote: >> list, >> >> this is a continuation of another thread that was started a few weeks >> back. the original thread was >> started in regards >> to the setup of pingd. this thread is in regards to pingd not being >> able

[Linux-HA] Re: ping/ping_group directives failing WAS: unable to start pingd (cannot start resource groups as a result

2007-04-12 Thread Alan Robertson
Andrew Beekhof wrote: > On 4/11/07, Terry L. Inzauro <[EMAIL PROTECTED]> wrote: >> list, >> >> this is a continuation of another thread that was started a few weeks >> back. the original thread was >> started in regards >> to the setup of pingd. this thread is in regards to pingd not being >> able

[Linux-HA] no failback

2007-04-12 Thread Bernd Eichenberg
Hi all, I have to warn you, because I'm a newbie at HA and I'm German with very limited English skills. I hope you understand my problem, and thanks for that. My problem is that there is no failback after node 1 is up again. Node 1 is Suse 8 with the heartbeat that came with it. Node 2 is Debian 3 with

Re: [Linux-HA] Heartbeat versus Novell Cluster Services

2007-04-12 Thread Yan Fitterer
Oh - and I forgot... UI was _much_ nicer last time I looked. And nowhere near as buggy as the HB GUI. Yan Fitterer wrote: > NCS has better integration with EVMS, and has data-network heartbeat. It > does not therefore require STONITH. > > It has had much more testing than HB for large clusters as

Re: [Linux-HA] Heartbeat versus Novell Cluster Services

2007-04-12 Thread Yan Fitterer
NCS has better integration with EVMS, and has data-network heartbeat. It does not therefore require STONITH. It has had much more testing than HB for large clusters as well. 20+ node clusters are not uncommon. Yan Sander van Vugt wrote: > Hi, > > Just like to know your opinion about the followi

[Linux-HA] ping/ping_group directives failing WAS: unable to start pingd (cannot start resource groups as a result

2007-04-12 Thread Andrew Beekhof
On 4/11/07, Terry L. Inzauro <[EMAIL PROTECTED]> wrote: list, this is a continuation of another thread that was started a few weeks back. the original thread was started in regards to the setup of pingd. this thread is in regards to pingd not being able to start for whatever reason and i susp

Re: [Linux-HA] Permission on cib.xml

2007-04-12 Thread Benjamin Watine
Andrew Beekhof wrote: On 4/11/07, Benjamin Watine <[EMAIL PROTECTED]> wrote: Alan Robertson wrote: > Benjamin Watine wrote: >> Alan Robertson wrote: >>> Benjamin Watine wrote: Hi I'm trying to chmod 660 cib.xml to give w/r access to hacluster and haclient. I do it on

Re: [Linux-HA] pingd not failing over

2007-04-12 Thread Andrew Beekhof
i hate to pester, but where are the "fail counts" kept track of and what maintains them? they are stored in the status section and are maintained by the tengine process (which increases it whenever a monitor action fails) there is also a CLI tool called crm_failcount that can be used to view a
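
The bookkeeping described in this answer (a per-node, per-resource counter that goes up whenever a monitor action fails, and that an administrator can inspect or clear) can be modelled in a few lines of Python. This is a toy illustration only: in Heartbeat the real counts live in the CIB status section and are updated by the tengine, and the class name `FailCounts` is hypothetical.

```python
class FailCounts:
    """Toy model of per-node, per-resource fail counts.

    The real values are stored in the CIB status section; this class only
    illustrates how they change.
    """

    def __init__(self) -> None:
        self._counts: dict[tuple[str, str], int] = {}

    def record_monitor(self, node: str, resource: str, succeeded: bool) -> None:
        # Only failed monitor results increase the counter.
        if not succeeded:
            key = (node, resource)
            self._counts[key] = self._counts.get(key, 0) + 1

    def get(self, node: str, resource: str) -> int:
        return self._counts.get((node, resource), 0)

    def reset(self, node: str, resource: str) -> None:
        # Corresponds to clearing the real count, which crm_failcount can do.
        self._counts.pop((node, resource), None)
```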

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-12 Thread Andrew Beekhof
On 4/11/07, Lars Marowsky-Bree <[EMAIL PROTECTED]> wrote: On 2007-04-10T18:29:10, Andrew Beekhof <[EMAIL PROTECTED]> wrote: > >Apr 10 17:30:25 ha-test-1 process[26425]: Returnig 7 > >Apr 10 17:30:40 ha-test-1 process[26493]: Maintainance = > >Apr 10 17:30:40 ha-test-1 process[26493]: OCF_RESKEY_