[Linux-HA] strange 3 node behavior with named

2009-04-24 Thread Jiann-Ming Su
I have a 3 node cluster with CRM.  One node is configured for quorum
only.  I have the Default Resource Stickiness set to INFINITY for the
cluster.  When I take the quorum node down, the two other nodes seem
to renegotiate.  This seems completely unnecessary, as the two
remaining nodes still have quorum between them and the resources should
not move.  While the cluster is working out which node should keep
running the services, it tries to start an already running named process:

Apr 24 18:54:30 dhcp1-admin lrmd: [2901]: info: RA output: (named:start:stdout) Starting named:
Apr 24 18:54:30 dhcp1-admin lrmd: [2901]: info: RA output: (named:start:stdout) named: already running[FAILED]
Apr 24 18:54:30 dhcp1-admin lrmd: [2901]: info: RA output: (named:start:stdout)

Of the 8 other resources managed by heartbeat, none was restarted; named is
the only one it tried to start again.  This leaves the entire cluster in a
half-working state.  Cleaning up the resource gets the cluster back to
full health.  Why is this only happening with the named process?  Thanks
for any insights.




-- 
Jiann-Ming Su
"I have to decide between two equally frightening options.
 If I wanted to do that, I'd vote." --Duckman
"The system's broke, Hank.  The election baby has peed in
the bath water.  You got to throw 'em both out."  --Dale Gribble
"Those who vote decide nothing.
Those who count the votes decide everything."  --Joseph Stalin
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] OCF and hb_gui problem

2009-04-24 Thread Jiann-Ming Su
On Thu, Apr 16, 2009 at 10:47 AM, Dejan Muhamedagic  wrote:
> Hi,
>
> On Wed, Apr 15, 2009 at 09:37:46PM -0400, Jiann-Ming Su wrote:
>> On Tue, Apr 14, 2009 at 9:44 AM, Jiann-Ming Su  wrote:
>> >
>> > Yep, that's what mine looks like.  It's installed with the same custom
>> > rpm package that I've used on our other Linux HA clusters (12 systems
>> > total).  It's only affecting this one cluster member.  The other two
>> > systems in this particular cluster work fine.  And, this one was
>> > working fine at one point since I had used it in the past to configure
>> > the cluster it is in.  I'll try forcing a reinstall of the rpm package
>> > and see if that fixes the problem.
>> >
>>
>> Rather than force a reinstall, I simply did a freshen and verify with
>> rpm.  According to rpm, the installed files seem to be fine.
>
> rpm --verify?
>

--verify and --freshen

>
> Strange. You can set debug to 1 or 2 in ha.cf. Try then
> lrmadmin -C and lrmadmin -T ocf/lsb. Then look for messages from
> lrmd.
>

I even set debug to 3.  No logs when I run:

# /usr/lib/heartbeat/lrmadmin -T ocf
There are 0 RAs:

With "lsb" there are 63 RAs listed.

> BTW, did you try to read directories/files under /usr/lib/ocf as a normal 
> user?

I'm using root on the systems that work and the one that doesn't.


[Linux-HA] stonith and pingd question

2009-04-24 Thread Annette Jäkel
Hi,
I don't understand the purpose of a stonith clone resource. I have a two-node
cluster, and each node has to stonith the other node via its IP. So I have
two stonith resources and a placement constraint for each of them. But if I
define only one stonith resource and clone it, the params would be the same
on each clone, wouldn't they?

So far I have no pingd resource. If I define a pingd resource, do I
still need a stonith resource, or is it an either/or decision? It seems to
me that stonith does more than pingd: pingd only checks communication and
decides to fence a resource, whereas stonith can decide to shut down or
reboot a whole node (depending on the stonith resource used). But does
stonith depend on the results of the pingd checks? If I define ucasts in
ha.cf, a stonith resource and a pingd resource, when would a node be shot?

Thanks for a hint.
Best regards,
Annette





Re: [Linux-HA] stonith and pingd question

2009-04-24 Thread Dejan Muhamedagic
Hi,

On Fri, Apr 24, 2009 at 05:11:17PM +0200, Annette Jäkel wrote:
> Hi,
> I dont understand the meaning of a stonith clone resource. I have a two node
> cluster and every node has to stonith to the IP of the other node. So I have
> two stonith resources and a placement for each of them. But if I define only
> one stonith resource and clone it, the params would be the same on each
> clone, wouldn't it?

Depends on the plugin, i.e. if it supports multiple hostnames.
Most support only one and in that case obviously you need two
resources (clones won't make sense in that case).

> Until now I have no pingd resource. If I define a pingd resource, do I
> further need a stonith resource or is it an either...or decision?

It's not. You need both.

> Seems to
> me stonith is more than pingd because pingd only checks communication and
> decide to fence a resource but stonith can make a decision to

pingd can't fence. It can only help CRM decide where to place
resources.

> shutdown/reboot a whole node (regardingly to the stonith resource used). But
> depends stonith from the results of pingd checks?

No.

> If I define ucasts in
> ha.cf, stonith and a pingd resource - when would a stonith be shoot?

Typical cases for fencing are when we can't talk to a node, hence we
don't know what's going on there, and when a resource fails to
stop, because then we don't know which state that resource is in.
There could be others. For example, you can set on-fail="fence"
for any operation on any resource.
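
To make the last point concrete, here is a minimal sketch in crm shell
syntax. The resource id, IP address, intervals and timeout are made up for
illustration; only on-fail="fence" comes from the text above. The snippet
merely writes the definition to a file so it can be reviewed before it is
loaded on a cluster that already has working stonith resources.

```shell
#!/bin/sh
# Hypothetical example: a primitive whose failed stop escalates to fencing.
# The id "ip-example", the address and the timings are assumptions.
FENCE_CRM="${FENCE_CRM:-${TMPDIR:-/tmp}/on-fail-fence.crm}"

cat > "$FENCE_CRM" <<'EOF'
primitive ip-example ocf:heartbeat:IPaddr \
    params ip="192.0.2.10" \
    op monitor interval="10s" \
    op stop interval="0" timeout="60s" on-fail="fence"
EOF

# After review, load it on a cluster node:
#   crm configure load update "$FENCE_CRM"
```

With this setting a failed stop of ip-example should lead the CRM to fence
the node instead of leaving the resource in an unknown state.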

Thanks,

Dejan

P.S. See the other recent post and take a look at the document. I
hope that there are still people reading documentation :)

> Thanks for a hint.
> Best regards,
> Annette


Re: [Linux-HA] clvm with opensuse 11.1

2009-04-24 Thread Karl Katzke

Howdy! Thanks for the responses.  

Surprisingly, clvmd does work as-is in 11.1, as long as you have
openais and pacemaker installed and running already. However, it does
not (obviously) show up as a resource controlled by pacemaker in
crm_mon; "0 resources configured." ocfs2 also works, but has to be
started manually on reboot because obviously clvmd isn't running yet ...
(see below)   

I'm a newbie to pacemaker configuration, so I haven't yet figured out
how to get clvmd set up as a service with pacemaker. I have it started
manually right now. This obviously isn't proper, and some RTFM would
definitely solve the issue, but I'd like to go home at some point
tonight...  

Queries, some of which are ocfs2-specific and you can skip them if you
can't answer:  
1) Where are the dummy examples for adding clvmd to pacemaker? 
2) What other pieces do I need to add to /etc/init.d/ and set to start?
 
3) Right now, after starting clvmd I need to run pvscan and lvscan
before I can see the lvm2 volume, whereupon I can start ocfs2 and mount
the volume. How do I automate this process, or will pacemaker do it for
me once I have it configured properly?  
4) I'm using the standard o2cb cluster stack; do I need to change this
for pacemaker? (again, obviously it's working, but...)  
5) Fencing. Does ocfs2 handle the fencing, or do I need to find and
install a fencing manager that I haven't found so far? 

I sincerely apologize for the newbie questions, and I promise that "I'm
blogging this" and once I've got it working and repeatable, it'll be
googleable.  

-K 

>>> Lars Marowsky-Bree  4/24/2009 6:24 AM >>>
On 2009-04-23T18:44:35, Karl Katzke  wrote:

>
> Howdy!
>
> I'm attempting to get CLVM set up on OpenSuSE 11.1 with pacemaker and
> openais. So far I *seem* to have most things installed, but I'm having
> difficulty getting the lvm groups to talk.

I don't think cLVM in 11.1 works as-is. Which packages did you
install?

We fixed it in SLE11 HA, but that was post-11.1; though Factory should
have all the code, so should the openSUSE build service.

> Before I start reinventing the wheel and asking a whole load of stupid
> questions, has anyone produced a howto on getting clvmd set up with
> lvm2? I did find an older howto;
> http://sources.redhat.com/cluster/doc/usage.txt -- but it seems to be
> quite outdated at this point, and requires a bunch of packages and
> resources that aren't even available in the OpenSuSE tree.

It is quite simple; change the locking type in lvm.conf to 3, add the
ocf:pacemaker:controld and ocf:lvm2:clvmd resources as clones to the
configuration, and then vgcreate -c etc will simply start working.

It's also documented in the SLE11 HA docs; they are LGPL, but due to the
release pressure, we've not yet sync'ed the latest copy to the outside I
think.


Regards,
Lars

--
SuSE Labs, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



[Linux-HA] new doc about stonith/fencing

2009-04-24 Thread Dejan Muhamedagic
Hi,

Trying to make it a bit less mysterious, I wrote something about
fencing and stonith quite a while ago and then forgot to share
the link. Sorry about that.

Here it is:

http://www.clusterlabs.org/mediawiki/images/f/f2/Crm_fencing.pdf

As usual, constructive criticism/suggestions/etc are welcome.
I won't be able to read your impressions for the next two weeks,
but will certainly look forward to seeing them afterwards.

Cheers,

Dejan


Re: [Linux-HA] clvm with opensuse 11.1

2009-04-24 Thread Dejan Muhamedagic
Hi,

On Fri, Apr 24, 2009 at 01:24:50PM +0200, Lars Marowsky-Bree wrote:
> On 2009-04-23T18:44:35, Karl Katzke  wrote:
> 
> > 
> > Howdy! 
> > 
> > I'm attempting to get CLVM set up on OpenSuSE 11.1 with pacemaker and 
> > openais. So far I *seem* to have most things installed, but I'm having 
> > difficulty getting the lvm groups to talk.  
> 
> I don't think cLVM in 11.1 works as-is. Which packages did you install?
> 
> We fixed it in SLE11 HA, but that was post-11.1; though Factory should
> have all the code, so should the openSUSE build service.
> 
> > Before I start reinventing the wheel and asking a whole load of stupid
> > questions, has anyone produced a howto on getting clvmd set up with
> > lvm2? I did find an older howto;
> > http://sources.redhat.com/cluster/doc/usage.txt -- but it seems to be
> > quite outdated at this point, and requires a bunch of packages and
> > resources that aren't even available in the OpenSuSE tree.  
> 
> It is quite simple; change the locking type in lvm.conf to 3, add the
> ocf:pacemaker:controld and ocf:lvm2:clvmd resources as clones to the
> configuration, and then vgcreate -c etc will simply start working.

Looks like a good candidate for a crm shell template ;-)

Dejan

> It's also documented in the SLE11 HA docs; they are LGPL, but due to the
> release pressure, we've not yet sync'ed the latest copy to the outside I
> think.
> 
> 
> Regards,
> Lars
> 
> -- 
> SuSE Labs, OPS Engineering, Novell, Inc.
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 


Re: [Linux-HA] Snmp monitoring of HA

2009-04-24 Thread Dejan Muhamedagic
Hi,

On Fri, Apr 24, 2009 at 10:11:19AM +0200, elziege7 wrote:
> > ipfail works only with v1. For v2 use pingd.
> >
> > Due to a bug, the IPaddr resource is not converted properly
> > (parameters are mixed). You'll have to edit the CIB and fix them.
> > Use cibadmin.
> 
> Thanks for your help Dejan!
> 
> I have changed the ha.cf (pingd instead of ipfail) and corrected the
> IP Adress Resource with cibadmin (nic and subnet were interchanged)
> restarted hearbeat.
> 
> It works better now, but there is still one problem. The shared IP is
> assigned correctly to the master node and the DRBD device is also
> mounted correctly. The problem is that the applications that HA should
> handle for me (postfix, dovecot and mysql) are being started and
> stopped every few seconds on the master node.

That's probably because these LSB resource agents (aka init scripts)
are not exactly LSB compliant. The logs are too frightening to
read in full, but you can filter for just what's happening with resources:

grep 'lrmd.*' /var/log/ha-log

The "info: rsc:" lines tell you what is about to be run:

> lrmd[11800]: 2009/04/24_08:34:42 info: rsc:dovecot_4: stop

and then just follow your resource from there.

For instance:

> lrmd[11800]: 2009/04/24_08:34:44 info: RA output:
> (dovecot_4:monitor:stderr) Usage: /etc/init.d/dovecot
> {start|stop|restart|force-reload}

This RA is missing the status action which is required. Either
find one which has it, or implement one yourself, or write an OCF
RA and contribute :)
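
As a rough illustration of the missing piece, an LSB-style status action
could look like the sketch below. The pid file path and the function name
lsb_status are assumptions for illustration, not taken from the actual
dovecot init script. LSB expects status to exit 0 when the service is
running, 1 when it is dead but a pid file exists, and 3 when it is not
running.

```shell
#!/bin/sh
# Hypothetical status helper for an LSB init script.
# PIDFILE is an assumed location; adjust it to the real daemon.
PIDFILE="${PIDFILE:-/var/run/dovecot/master.pid}"

lsb_status() {
    # No pid file: the service is not running (LSB exit code 3).
    [ -f "$PIDFILE" ] || return 3
    pid=$(cat "$PIDFILE" 2>/dev/null)
    # Pid file present and the process accepts signal 0: running (0).
    # Init scripts normally run as root, so a permission error is not expected.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    # Pid file present but no such process: dead with stale pid file (1).
    return 1
}

# In the init script's case statement one would then add:
#   status) lsb_status; exit $? ;;
```

The cluster's monitor operation on an lsb resource calls "status", so the
exit code is what tells the CRM whether the resource is up.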

It's very important that all your resource agents comply with
the LSB or OCF standard. Take a look here:

http://www.linux-ha.org/LSBResourceAgent

I'm afraid that you can't rely on your distribution to do that
correctly. The best approach, given enough patience, is to
test your resource agents by hand.

Better still, use the supplied resource agents (from heartbeat
and pacemaker) because they are more robust and written to work
with the cluster. So, perhaps you should switch from the lsb
mysql to the OCF mysql.

I hope that you got enough information to find out what's wrong
yourself ;-)

Thanks,

Dejan

> Thanks in advance for your support


Re: [Linux-HA] crm CLI

2009-04-24 Thread Dejan Muhamedagic
Ciao,

On Fri, Apr 24, 2009 at 09:29:12AM +0200, Cristina Bulfon wrote:
> Ciao,
>
> I tried to build pacemaker rpm w/o success :-((
> I will do another time and it will fail then I am going to compile the 
> source.

You can just grab the source rpm and comment out the offending
line in lib/ais/Makefile.am and do rpmbuild -bb. First install
all the build requirements (heartbeat/openais-dev).

> Anyway I thought to use pacemaker because I understand that is the better 
> way to
> modify cib.xml but if there are any other way to do it I will.

pacemaker is the best if you're starting now. The crm shell will
help you avoid xml in case you're allergic to it.

> However my focus point  is :
>
> - my resource group is composed from 4 single resources: 2 Filesystem, 
> Ipaddr  and AFS.
> for better comprehension follow the haresource file ( from my point of view 
> is more readable than cib.xml)

True, but cib is more powerful :)

> a.roma1.infn.it \
> IPaddr::X.X.X.31/24/eth0 \
> Filesystem::/dev/AFS/sda3::/vicepa/::xfs \
> Filesystem::/dev/AFS/sda1::/usr/afs::ext3 \
> afs
>
> AFS is related to the filesystems so for any reason the /vicepa is not 
> reachable everything has to be stopped and pass the resources to the other 
> node.
> So is there a way to do it with cib.xml ? some advice how to monitor the 
> above resource or the group.

Just put everything in a group. Add a location constraint with
some score (say 100) to prefer that node. And you should get a
fencing device and create stonith resources.
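
A minimal sketch of what that could look like in crm shell syntax, built
from the haresources line quoted above. The resource ids, the monitor
intervals and the use of lsb:afs (which assumes an LSB compliant
/etc/init.d/afs) are assumptions; the masked address X.X.X.31 is left as
posted and has to be replaced with the real one. The snippet only writes
the configuration to a file for review.

```shell
#!/bin/sh
# Hypothetical translation of the haresources line into crm shell syntax.
# Ids, intervals and the lsb:afs class are illustrative assumptions.
AFS_CRM="${AFS_CRM:-${TMPDIR:-/tmp}/afs-group.crm}"

cat > "$AFS_CRM" <<'EOF'
primitive ip-afs ocf:heartbeat:IPaddr \
    params ip="X.X.X.31" cidr_netmask="24" nic="eth0" \
    op monitor interval="30s"
primitive fs-vicepa ocf:heartbeat:Filesystem \
    params device="/dev/AFS/sda3" directory="/vicepa" fstype="xfs" \
    op monitor interval="30s"
primitive fs-usr-afs ocf:heartbeat:Filesystem \
    params device="/dev/AFS/sda1" directory="/usr/afs" fstype="ext3" \
    op monitor interval="30s"
primitive afs lsb:afs \
    op monitor interval="30s"
group afs-group ip-afs fs-vicepa fs-usr-afs afs
location prefer-a afs-group 100: a.roma1.infn.it
EOF

# After review (and after replacing X.X.X.31), load it on a cluster node:
#   crm configure load update "$AFS_CRM"
```

Members of a group start in the listed order and stop in reverse, and a
failed monitor on one member affects the members after it, which is the
behaviour asked for when /vicepa becomes unreachable.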

Good luck!

Thanks,

Dejan

> Thanks
>
> cristina
>
> On Apr 21, 2009, at 3:22 PM, Dejan Muhamedagic wrote:
>
>> Ciao,
>>
>> On Tue, Apr 21, 2009 at 08:37:17AM +0200, Cristina Bulfon wrote:
>>> Ciao,
>>>
>>> I am lost :-)
>>> Dejan, if I understand correctly you mean that I have to rebuild all the
>>> packages : heartbeat, pacemaker and ais
>>> starting from source.
>>
>> No, just the pacemaker. Or wait until the new packages are built.
>> Unfortunately, I can't say how long it may take.
>>
>> Thanks,
>>
>> Dejan
>>
>>> thanks
>>>
>>> cristina
>>>
>>> On Apr 20, 2009, at 1:08 PM, Dejan Muhamedagic wrote:
>>>
>>>> Hi,
>>>>
>>>> On Fri, Apr 17, 2009 at 03:39:32PM +0100, Jason Fitzpatrick wrote:
>>>>> Hi Cristina
>>>>>
>>>>> that repo should have all the required files in it,,
>>>>
>>>> These are quite old. rhel4 weren't built for quite some time. The
>>>> build log says:
>>>>
>>>> cc1: error: unrecognized command line option "-Wno-pointer-sign"
>>>>
>>>> That's in lib/ais/Makefile.am. You could build the package
>>>> yourself, just remove this option beforehand.
>>>>
>>>> Thanks,
>>>>
>>>> Dejan
>>>>
>>>>> Jason
>>>>>
>>>>> heartbeat-2.99.2-6.2.i386.rpm             12-Apr-2009 13:36  1.5M
>>>>> heartbeat-common-2.99.2-6.2.i386.rpm      12-Apr-2009 13:36  1.3M
>>>>> heartbeat-debug-2.99.2-6.2.i386.rpm       12-Apr-2009 13:36  698K
>>>>> heartbeat-devel-2.99.2-6.2.i386.rpm       12-Apr-2009 13:36  197K
>>>>> heartbeat-ldirectord-2.99.2-6.2.i386.rpm  12-Apr-2009 13:36  131K


Re: [Linux-HA] clvm with opensuse 11.1

2009-04-24 Thread Lars Marowsky-Bree
On 2009-04-23T18:44:35, Karl Katzke  wrote:

> 
> Howdy! 
> 
> I'm attempting to get CLVM set up on OpenSuSE 11.1 with pacemaker and 
> openais. So far I *seem* to have most things installed, but I'm having 
> difficulty getting the lvm groups to talk.  

I don't think cLVM in 11.1 works as-is. Which packages did you install?

We fixed it in SLE11 HA, but that was post-11.1; though Factory should
have all the code, so should the openSUSE build service.

> Before I start reinventing the wheel and asking a whole load of stupid
> questions, has anyone produced a howto on getting clvmd set up with
> lvm2? I did find an older howto;
> http://sources.redhat.com/cluster/doc/usage.txt -- but it seems to be
> quite outdated at this point, and requires a bunch of packages and
> resources that aren't even available in the OpenSuSE tree.  

It is quite simple; change the locking type in lvm.conf to 3, add the
ocf:pacemaker:controld and ocf:lvm2:clvmd resources as clones to the
configuration, and then vgcreate -c etc will simply start working.
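
Roughly, and only as a sketch: the resource agents and the locking type
below are the ones named above, while the resource ids, monitor intervals,
the interleave meta attribute and the colocation/order constraints are
assumptions added for illustration. The snippet writes the crm shell
configuration to a file for review rather than applying it.

```shell
#!/bin/sh
# Step 1 (by hand): in /etc/lvm/lvm.conf, global section, set
#   locking_type = 3
#
# Step 2: hypothetical crm shell configuration for the two clones.
# Ids, intervals, interleave and the constraints are assumptions.
CLVM_CRM="${CLVM_CRM:-${TMPDIR:-/tmp}/clvm.crm}"

cat > "$CLVM_CRM" <<'EOF'
primitive dlm ocf:pacemaker:controld \
    op monitor interval="60s"
primitive clvm ocf:lvm2:clvmd \
    op monitor interval="60s"
clone dlm-clone dlm meta interleave="true"
clone clvm-clone clvm meta interleave="true"
colocation clvm-with-dlm inf: clvm-clone dlm-clone
order dlm-before-clvm inf: dlm-clone clvm-clone
EOF

# After review, load it on a cluster node:
#   crm configure load update "$CLVM_CRM"
# Step 3: create the clustered volume group, for example
#   vgcreate -c y <vgname> <device>
```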

It's also documented in the SLE11 HA docs; they are LGPL, but due to the
release pressure, we've not yet sync'ed the latest copy to the outside I
think.


Regards,
Lars

-- 
SuSE Labs, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



Re: [Linux-HA] Snmp monitoring of HA

2009-04-24 Thread elziege7
> ipfail works only with v1. For v2 use pingd.
>
> Due to a bug, the IPaddr resource is not converted properly
> (parameters are mixed). You'll have to edit the CIB and fix them.
> Use cibadmin.

Thanks for your help Dejan!

I have changed the ha.cf (pingd instead of ipfail), corrected the
IP address resource with cibadmin (nic and subnet were interchanged)
and restarted heartbeat.

It works better now, but there is still one problem. The shared IP is
assigned correctly to the master node and the DRBD device is also
mounted correctly. The problem is that the applications that HA should
handle for me (postfix, dovecot and mysql) are being started and
stopped every few seconds on the master node.

Here is the log of the master node mail1:

crmd[11803]: 2009/04/24_08:34:40 info: process_lrm_event: LRM operation postfix_5_stop_0 (call=92, rc=0) complete
crmd[11803]: 2009/04/24_08:34:42 info: do_lrm_rsc_op: Performing op=dovecot_4_stop_0 key=3:18:0:5227dfff-5bec-4089-8e11-cc08065768d1)
lrmd[11800]: 2009/04/24_08:34:42 info: rsc:dovecot_4: stop
lrmd[14142]: 2009/04/24_08:34:42 WARN: For LSB init script, no additional parameters are needed.
crmd[11803]: 2009/04/24_08:34:42 info: process_lrm_event: LRM operation dovecot_4_monitor_12 (call=90, rc=-2) Cancelled
crmd[11803]: 2009/04/24_08:34:42 info: process_lrm_event: LRM operation dovecot_4_stop_0 (call=93, rc=0) complete
crmd[11803]: 2009/04/24_08:34:43 info: do_lrm_rsc_op: Performing op=dovecot_4_start_0 key=16:18:0:5227dfff-5bec-4089-8e11-cc08065768d1)
lrmd[11800]: 2009/04/24_08:34:43 info: rsc:dovecot_4: start
lrmd[14150]: 2009/04/24_08:34:43 WARN: For LSB init script, no additional parameters are needed.
crmd[11803]: 2009/04/24_08:34:43 info: process_lrm_event: LRM operation dovecot_4_start_0 (call=94, rc=0) complete
crmd[11803]: 2009/04/24_08:34:44 info: do_lrm_rsc_op: Performing op=dovecot_4_monitor_12 key=2:18:0:5227dfff-5bec-4089-8e11-cc08065768d1)
crmd[11803]: 2009/04/24_08:34:44 info: do_lrm_rsc_op: Performing op=postfix_5_start_0 key=17:18:0:5227dfff-5bec-4089-8e11-cc08065768d1)
lrmd[11800]: 2009/04/24_08:34:44 info: rsc:postfix_5: start
lrmd[14181]: 2009/04/24_08:34:44 WARN: For LSB init script, no additional parameters are needed.
lrmd[11800]: 2009/04/24_08:34:44 info: RA output: (postfix_5:start:stdout) Starting Postfix Mail Transport Agent: postfix
lrmd[11800]: 2009/04/24_08:34:44 info: RA output: (dovecot_4:monitor:stderr) Usage: /etc/init.d/dovecot {start|stop|restart|force-reload}

crmd[11803]: 2009/04/24_08:34:44 info: process_lrm_event: LRM operation dovecot_4_monitor_12 (call=95, rc=7) complete
lrmd[11800]: 2009/04/24_08:34:44 info: RA output: (postfix_5:start:stdout) .

crmd[11803]: 2009/04/24_08:34:44 info: process_lrm_event: LRM operation postfix_5_start_0 (call=96, rc=0) complete
crmd[11803]: 2009/04/24_08:34:46 info: do_lrm_rsc_op: Performing op=postfix_5_stop_0 key=18:19:0:5227dfff-5bec-4089-8e11-cc08065768d1)
lrmd[11800]: 2009/04/24_08:34:46 info: rsc:postfix_5: stop
lrmd[14258]: 2009/04/24_08:34:46 WARN: For LSB init script, no additional parameters are needed.
lrmd[11800]: 2009/04/24_08:34:46 info: RA output: (postfix_5:stop:stdout) Stopping Postfix Mail Transport Agent: postfix
lrmd[11800]: 2009/04/24_08:34:46 info: RA output: (postfix_5:stop:stdout) .

crmd[11803]: 2009/04/24_08:34:46 info: process_lrm_event: LRM operation postfix_5_stop_0 (call=97, rc=0) complete
crmd[11803]: 2009/04/24_08:34:47 info: do_lrm_rsc_op: Performing op=dovecot_4_stop_0 key=3:19:0:5227dfff-5bec-4089-8e11-cc08065768d1)
lrmd[11800]: 2009/04/24_08:34:47 info: rsc:dovecot_4: stop
lrmd[14271]: 2009/04/24_08:34:47 WARN: For LSB init script, no additional parameters are needed.
crmd[11803]: 2009/04/24_08:34:47 info: process_lrm_event: LRM operation dovecot_4_monitor_12 (call=95, rc=-2) Cancelled
crmd[11803]: 2009/04/24_08:34:47 info: process_lrm_event: LRM operation dovecot_4_stop_0 (call=98, rc=0) complete
crmd[11803]: 2009/04/24_08:34:48 info: do_lrm_rsc_op: Performing op=dovecot_4_start_0 key=16:19:0:5227dfff-5bec-4089-8e11-cc08065768d1)
lrmd[11800]: 2009/04/24_08:34:48 info: rsc:dovecot_4: start
lrmd[14280]: 2009/04/24_08:34:48 WARN: For LSB init script, no additional parameters are needed.
crmd[11803]: 2009/04/24_08:34:48 info: process_lrm_event: LRM operation dovecot_4_start_0 (call=99, rc=0) complete
crmd[11803]: 2009/04/24_08:34:49 info: do_lrm_rsc_op: Performing op=dovecot_4_monitor_12 key=2:19:0:5227dfff-5bec-4089-8e11-cc08065768d1)
crmd[11803]: 2009/04/24_08:34:49 info: do_lrm_rsc_op: Performing op=postfix_5_start_0 key=17:19:0:5227dfff-5bec-4089-8e11-cc08065768d1)
lrmd[11800]: 2009/04/24_08:34:49 info: rsc:postfix_5: start
lrmd[14311]: 2009/04/24_08:34:49 WARN: For LSB init script, no additional parameters are needed.
lrmd[11800]: 2009/04/24_08:34:49 info: RA output: (postfix_5:start:stdout) Starting Postfix Mail Transport Agent: postfix
lrmd[11800]: 2009/04/24_08:34:49 info: RA output: (dovecot_4:monitor:stderr) Usage: /etc/init.d/d

Re: [Linux-HA] crm CLI

2009-04-24 Thread Cristina Bulfon

Ciao,

I tried to build the pacemaker rpm without success :-((
I will try once more, and if it fails again I am going to compile from
source.

Anyway, I planned to use pacemaker because I understand it is the
better way to modify cib.xml, but if there is another way to do it I
will use that.

However, my main point is this:

- My resource group is composed of 4 single resources: 2 Filesystem,
  IPaddr and AFS.
  For clarity, here is the haresources file (from my point of
  view it is more readable than cib.xml):

a.roma1.infn.it \
IPaddr::X.X.X.31/24/eth0 \
Filesystem::/dev/AFS/sda3::/vicepa/::xfs \
Filesystem::/dev/AFS/sda1::/usr/afs::ext3 \
afs

AFS depends on the filesystems, so if for any reason /vicepa is not
reachable, everything has to be stopped and the resources passed to the
other node.
Is there a way to do this with cib.xml? Any advice on how to monitor
the above resources or the group would be welcome.



Thanks

cristina

On Apr 21, 2009, at 3:22 PM, Dejan Muhamedagic wrote:

> Ciao,
>
> On Tue, Apr 21, 2009 at 08:37:17AM +0200, Cristina Bulfon wrote:
>> Ciao,
>>
>> I am lost :-)
>> Dejan, if I understand correctly you mean that I have to rebuild all the
>> packages : heartbeat, pacemaker and ais
>> starting from source.
>
> No, just the pacemaker. Or wait until the new packages are built.
> Unfortunately, I can't say how long it may take.
>
> Thanks,
>
> Dejan
>
>> thanks
>>
>> cristina
>>
>> On Apr 20, 2009, at 1:08 PM, Dejan Muhamedagic wrote:
>>
>>> Hi,
>>>
>>> On Fri, Apr 17, 2009 at 03:39:32PM +0100, Jason Fitzpatrick wrote:
>>>> Hi Cristina
>>>>
>>>> that repo should have all the required files in it,,
>>>
>>> These are quite old. rhel4 weren't built for quite some time. The
>>> build log says:
>>>
>>> cc1: error: unrecognized command line option "-Wno-pointer-sign"
>>>
>>> That's in lib/ais/Makefile.am. You could build the package
>>> yourself, just remove this option beforehand.
>>>
>>> Thanks,
>>>
>>> Dejan
>>>
>>>> Jason
>>>>
>>>> heartbeat-2.99.2-6.2.i386.rpm             12-Apr-2009 13:36  1.5M
>>>> heartbeat-common-2.99.2-6.2.i386.rpm      12-Apr-2009 13:36  1.3M
>>>> heartbeat-debug-2.99.2-6.2.i386.rpm       12-Apr-2009 13:36  698K
>>>> heartbeat-devel-2.99.2-6.2.i386.rpm       12-Apr-2009 13:36  197K
>>>> heartbeat-ldirectord-2.99.2-6.2.i386.rpm  12-Apr-2009 13:36  131K
>>>> heartbeat-resources-2.99.2-6.2.i386.rpm   12-Apr-2009 13:36  245K
>>>> libheartbeat-devel-2.99.2-6.2.i386.rpm    12-Apr-2009 13:36  152K


Re: [Linux-HA] Assymetric Clustering

2009-04-24 Thread Andrew Beekhof
On Tue, Apr 21, 2009 at 21:54, fsalas  wrote:
>
> First of all, thanks for prompt anwers
>
>
>
>
>> On Mon, 20 Apr 2009 14:11:22 -0700:
>>
>> 8.10 has 2.1.3 which is not a good choice. use at least 2.1.4 or
>> heartbeat 2.99.2 / pacemaker 1.0.3 . maybe you have to compile it
>> yourself.
>>
>>
>
> Ok, I will have a look into it, to see if I can upgrade to those versions
> Ive found that creating the missing lsb script with a dummy one with just an
> exit 5 in it, partially solve the problem, as the resources are no more in
> failed state, but stills logs a lot of errors trying to monitor it
>
>
> Thomas Mueller-14 wrote:
>>
>>
>> is the "symmetric-cluster" option true (shell: "cibadmin -Q | grep
>> symmetric" - no output means "true")?
>>
>>
>
> Ive forgotten to mention that Ive already set that option to false.
>
> Anyone knows if  2.99 or 2.1.4 version behaves different ?

Well yeah, 2.99 doesn't include the crm anymore.
That's why you need pacemaker.