Re: [Pacemaker] Problem with external/ipmi stonith plugin?!

2010-02-03 Thread Dejan Muhamedagic
Hi,

On Wed, Feb 03, 2010 at 01:21:48PM +0100, Moritz Krinke wrote:
 Hello,
 
 i'm trying to get the ipmi plugin to work using the configuration shown. The 
 plugin always reports rc 1 to pacemaker.
 I had a look at the ipmi script and verified that the cmds used do indeed 
 work (ipmitool is working, as is the connection to the ipmi device, if i set 
 the needed env vars)
 
 But, looking at the script, i find it strange that it seems to expect having 
 the hostname of the machine which is supposed to be controlled as a second 
 argument.
 I echo'd all passed parameters to logfile, but the script never gets passed a 
 second argument.
 
 Is the plugin broken or am i missing something in the configuration?
 
 I'm quite new to pacemaker/corosync so forgive me for a maybe obvious fault ;)
 
 centos5.4 x86_64
 corosync-1.2.0-1.el5
 openais-1.1.0-1.el5
 pacemaker-1.0.7-2.el5
 
 crm configure show:
 node ds10-100-64-202
 node ds10-100-64-204
 primitive STONITH202 stonith:external/ipmi \
     op monitor interval=10m timeout=1m \
     params hostname=ds10-100-64-202 ipaddr=10.100.64.203 userid=root passwd=pw interface=lan \
     meta target-role=started
 primitive STONITH204 stonith:external/ipmi \
     op monitor interval=10m timeout=1m \
     params hostname=ds10-100-64-204 ipaddr=10.100.64.205 userid=root passwd=pw interface=lan \
     meta target-role=started
 ...
 location l-STONITH202 STONITH202 -inf: ds10-100-64-202
 location l-STONITH204 STONITH204 -inf: ds10-100-64-204
 property $id=cib-bootstrap-options \
     dc-version=1.0.7-d3fa20fc76c7947d6de66db7e52526dc6bd7d782 \
     cluster-infrastructure=openais \
     expected-quorum-votes=2 \
     no-quorum-policy=ignore \
     stonith-action=poweroff
 rsc_defaults $id=rsc_defaults-options \
     resource-stickiness=100
 
 crm status:
 Failed actions:
 STONITH202_start_0 (node=ds10-100-64-204, call=24, rc=1, 
 status=complete): unknown error
 STONITH204_start_0 (node=ds10-100-64-202, call=132, rc=1, 
 status=complete): unknown error
 
 
 log:
 ...
 WARN: unpack_rsc_op: Processing failed op STONITH204_start_0 on 
 ds10-100-64-202: unknown error (1)
 ...

Nothing else in the logs? Unfortunately, the stonith logging was
not very good until a few months ago (the improvements are in the
latest 1.0.3 release). If your copy is older than that, then you'll
have to see what happens using the stonith program:

# stonith -t external/ipmi -n  (to see the list of params)
# stonith -d -t external/ipmi -p params -lS

See stonith(8).

Thanks,

Dejan

 Any Ideas/Suggestions? Thanks ;)
 
 Moritz
 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker



Re: [Pacemaker] Fwd: [Cluster-devel] Organizing Bug Squash Party for Cluster 3.x, GFS2 and more

2010-02-03 Thread Dominik Klein
Koch, Sebastian wrote:
 Ahh great, that's good news. I've never been to Australia hehe. If it would 
 be in Germany or maybe Austria i will participate and try my best to help 
 squash bugs. But i am no developer, i am more a technician.

I may be wrong here, but I think this party will have nothing to do
with people meeting in person. Will it?

Regards
Dominik



Re: [Pacemaker] Multi-level ACLs for the CIB

2010-02-03 Thread Andrew Beekhof
On Tue, Feb 2, 2010 at 6:14 AM, Yan Gao y...@novell.com wrote:

[snip]

 A configuration example:
 ..
 <acls>
   <role id="operator">
     <write id="operator-write-0" tag="nodes"/>
     <write id="operator-write-1" tag="status"/>
   </role>
   <role id="monitor">
     <read id="monitor-read-0" tag="nodes"/>
     <read id="monitor-read-1" tag="status"/>
   </role>

[snip]

Quick question, have you tried using crm_mon with a configuration like this?
I'm pretty sure you'll get nothing sensible as it can't find the resources.

Might want to think about how to deal with that...



Re: [Pacemaker] fedora 10 package update from yum

2010-02-03 Thread E-Blokos


- Original Message - 
From: Oscar Remírez de Ganuza Satrústegui oscar...@unav.es

To: pacema...@clusterlabs.org
Sent: Wednesday, February 03, 2010 2:18 AM
Subject: Re: [Pacemaker] fedora 10 package update from yum






thanks Oscar,

but now...

-- Processing Dependency: openais = 0.80.5-15.1 for package: 
libopenais-devel-0.80.5-15.1.x86_64

-- Finished Dependency Resolution
libopenais-devel-0.80.5-15.1.x86_64 from installed has depsolving problems
 -- Missing Dependency: openais = 0.80.5-15.1 is needed by package 
libopenais-devel-0.80.5-15.1.x86_64 (installed)
Error: Missing Dependency: openais = 0.80.5-15.1 is needed by package 
libopenais-devel-0.80.5-15.1.x86_64 (installed)

You could try using --skip-broken to work around the problem
You could try running: package-cleanup --problems
   package-cleanup --dupes
   rpm -Va --nofiles --nodigest

Should I remove libopenais?

Thanks 





Re: [Pacemaker] Two node DRBD cluster will not automatically failover to the secondary

2010-02-03 Thread Tom Pride
Hi Shravan,

Thank you very much for your reply.  I know it was quite a while ago that I
posted my question to the mailing list, but I've been working on other
things and have only just had the chance to come back to this.

You say that I need to set up stonith resources along with setting
stonith-enabled = true.  Well I know how to change the stonith-enabled
setting, but I have no clue as to how I go about setting up the appropriate
stonith resources to prevent DRBD from getting into a split brain
situation.  The documentation provided on the DRBD website about setting up
a 2 node cluster with Pacemaker doesn't tell you to enable stonith or
configure stonith resources. It does talk about the resource fencing options
within /etc/drbd.conf, which I have configured:

resource r0 {
  disk {
    fencing resource-only;
  }
  handlers {
    fence-peer /usr/lib/drbd/crm-fence-peer.sh;
    after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
  }


I've searched the internet high and low for example pacemaker configs that
show you how to configure stonith resources for DRBD, but I can't find
anything useful.

This howto (
http://www.howtoforge.com/installation-and-setup-guide-for-drbd-openais-pacemaker-xen-on-opensuse-11.1)
that I found spells out how to configure a cluster and even states:
"STONITH is disabled in this configuration though it is highly-recommended
in any production environment to eliminate the risk of divergent data." but
infuriatingly it doesn't tell you how.

Could you please give me some pointers or some helpful examples or perhaps
point me to someone or something that can give me a hand in this area?

Many Thanks
Tom


On Thu, Dec 17, 2009 at 2:14 PM, Shravan Mishra shravan.mis...@gmail.com wrote:

 Hi,

 For stateful resources like drbd you will have to set up stonith resources
 for them to function properly or at all.
 stonith-enabled is true by default.

 Sincerely
 Shravan

 On Thu, Dec 17, 2009 at 6:29 AM, Tom Pride tom.pr...@gmail.com wrote:

 Hi there,

 I have set up a two node DRBD cluster with pacemaker using the instructions
 provided on the drbd.org website:
 http://www.drbd.org/users-guide-emb/ch-pacemaker.html  The cluster works
 perfectly and I can migrate the resources back and forth between the two
 nodes without a problem.  However, if I try simulating a complete server
 failure of the master node by powering off the server, pacemaker does not
 then automatically bring up the remaining node as the master.  I need some
 help to find out what configuration changes I need to make in order for my
 cluster to failover automatically.

 The cluster is built on 2 Redhat EL 5.3 servers running the following
 software versions:
 drbd-8.3.6-1
 pacemaker-1.0.5-4.1
 openais-0.80.5-15.1

 Below I have listed the drbd.conf, openais.conf and the output of crm
 configuration show.  If someone could take a look at these for me and
 provide any suggestions/modifications I would be most grateful.

 Thanks,
 Tom

 /etc/drbd.conf

 global {
   usage-count no;
 }
 common {
   protocol C;
 }
 resource r0 {
   disk {
     fencing resource-only;
   }
   handlers {
     fence-peer /usr/lib/drbd/crm-fence-peer.sh;
     after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
   }
   syncer {
     rate 40M;
   }
   on mq001.back.live.cwwtf.local {
     device    /dev/drbd1;
     disk      /dev/cciss/c0d0p1;
     address   172.23.8.69:7789;
     meta-disk internal;
   }
   on mq002.back.live.cwwtf.local {
     device    /dev/drbd1;
     disk      /dev/cciss/c0d0p1;
     address   172.23.8.70:7789;
     meta-disk internal;
   }
 }


 r...@mq001:~# cat /etc/ais/openais.conf
 totem {
   version: 2
   token: 3000
   token_retransmits_before_loss_const: 10
   join: 60
   consensus: 1500
   vsftype: none
   max_messages: 20
   clear_node_high_bit: yes
   secauth: on
   threads: 0
   rrp_mode: passive
   interface {
     ringnumber: 0
     bindnetaddr: 172.59.60.0
     mcastaddr: 239.94.1.1
     mcastport: 5405
   }
   interface {
     ringnumber: 1
     bindnetaddr: 172.23.8.0
     mcastaddr: 239.94.2.1
     mcastport: 5405
   }
 }
 logging {
   to_stderr: yes
   debug: on
   timestamp: on
   to_file: no
   to_syslog: yes
   syslog_facility: daemon
 }
 amf {
   mode: disabled
 }
 service {
   ver:   0
   name:  pacemaker
   use_mgmtd: yes
 }
 aisexec {
   user:   root
   group:  root
 }


 r...@mq001:~# crm configure show
 node mq001.back.live.cwwtf.local
 node mq002.back.live.cwwtf.local
 primitive activemq-emp lsb:bbc-activemq-emp
 primitive activemq-forge-services lsb:bbc-activemq-forge-services
 primitive activemq-social lsb:activemq-social
 primitive drbd_activemq ocf:linbit:drbd \
 params drbd_resource=r0 \
 op monitor interval=15s
 primitive fs_activemq ocf:heartbeat:Filesystem \
 params device=/dev/drbd1 directory=/drbd fstype=ext3
 primitive ip_activemq ocf:heartbeat:IPaddr2 \
 params ip=172.23.8.71 nic=eth0
 group activemq fs_activemq ip_activemq activemq-forge-services
 

[Pacemaker] Auto-restart service on IP shift?

2010-02-03 Thread Erich Weiler
I think I read about this somewhere, but I can't find it now...  I have 
a cloned service on two nodes that is always running on both (named, one 
is a replicated slave).  The IP sticks to one, then floats to the other 
if the first one goes down.  Works great.


My problem is that named only binds to the IP addrs that are there when 
it comes up, so the second node doesn't bind to the floating IP when it 
floats over there because bind was already started.  If I restart named 
after the IP comes over then it works, but that isn't as automatic as 
I'd like.  ;)


Is there any way in crm to configure bind to automatically restart on 
the second node if it notices the IP floats over to it?


Thanks for the continued expert advice...!

-erich



[Pacemaker] debian/lenny: pacemaker 1.07 where are the STONITH apps/agents?

2010-02-03 Thread Bruno Voigt
I've set up a new lenny 2-node cluster with
Martin Gerhard Loschwitz's new Pacemaker 1.07 and Heartbeat 3.0.2 versions
and before being able to start using it
I'm currently desperately looking for the HP STONITH (riloe) agents/apps.

apt-cache show stonith libstonith0 tells me that they are  transitional
dummy packages.

In which repository may I find them, suitable for lenny and the above
Pacemaker/Heartbeat versions?

TIA,
Bruno



Re: [Pacemaker] debian/lenny: pacemaker 1.07 where are the STONITH apps/agents?

2010-02-03 Thread Moritz Krinke
The stonith plugins are in the cluster-glue package, which is in the yum
repositories from clusterlabs/suse; not sure about .deb's.

2010/2/3 Bruno Voigt bruno.vo...@ic3s.de

 I've set up a new lenny 2-node cluster with
 Martin Gerhard Loschwitz's new Pacemaker 1.07 and Heartbeat 3.0.2 versions
 and before being able to start using it
 I'm currently desperately looking for the HP STONITH (riloe) agents/apps.

 apt-cache show stonith libstonith0 tells me that they are  transitional
 dummy packages.

 In which repository may I find them, suitable for lenny and the above
 Pacemaker/Heartbeat versions?

 TIA,
 Bruno




[Pacemaker] thread safety problem with pacemaker and corosync integration

2010-02-03 Thread Steven Dake
For some time people have reported segfaults on startup when using
pacemaker as a plugin to corosync related to tzset in the stack trace.
I believe we had fixed this by removing the thread-unsafe usage of
localtime and strftime calls in the code base of corosync in 1.2.0.

Through further investigation, H.J. Lee has largely identified a problem
with localtime_r calling tzset, which in turn calls getenv().  If another
thread calls setenv() at about the same time, the getenv() in the first
thread could segfault.  syslog() also calls localtime_r in glibc.  On some
rare occasions Pacemaker calls setenv() while corosync executes a syslog
operation, resulting in a segfault.

POSIX is clear on this issue: tzset, localtime_r and syslog should all be
thread safe.  Unfortunately, some C library implementations of these
functions are not thread safe when used in conjunction with setenv,
because they use getenv internally (which POSIX does not require to be
thread safe).

Our short term plan is to work around these problems in glibc by doing
the following:
1) providing a getenv/setenv API inside coroapi.h so that corosync
internal code and third party plugins such as pacemaker can use a mutex
protected getenv/setenv
2) porting our syslog-direct-communication code from whitetank and
avoiding the syslog C library API call (which again uses localtime_r)
entirely
3) implementing a localtime_r replacement which does not call tzset on
each execution so that timestamp:on operational mode does not suffer
from this same problem
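
As a rough illustration of item 1 only, here is a minimal sketch of what a
mutex protected getenv/setenv pair could look like. This is not the corosync
code: the names locked_getenv and locked_setenv and the copy-into-buffer
interface are assumptions made up for this example, and the real API in
coroapi.h may look quite different.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for setenv() under strict -std= modes */
#endif

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One process-wide lock that serialises every environment access made
 * through the two wrappers below. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: copy the value of `name` into `buf` while the lock
 * is held, so the caller never keeps a pointer into the environment that a
 * later setenv() could invalidate.
 * Returns 0 on success, -1 if the variable is unset or `buf` is too small. */
int locked_getenv(const char *name, char *buf, size_t len)
{
    int rc = -1;

    pthread_mutex_lock(&env_lock);
    const char *value = getenv(name);
    if (value != NULL && strlen(value) < len) {
        strcpy(buf, value);
        rc = 0;
    }
    pthread_mutex_unlock(&env_lock);
    return rc;
}

/* Hypothetical wrapper: same arguments and return value as setenv(3),
 * executed under the shared lock. */
int locked_setenv(const char *name, const char *value, int overwrite)
{
    pthread_mutex_lock(&env_lock);
    int rc = setenv(name, value, overwrite);
    pthread_mutex_unlock(&env_lock);
    return rc;
}
```

Note that a lock like this only protects callers that go through the
wrappers. A getenv() issued inside the C library itself (for example from
tzset via localtime_r or syslog) does not take it, which is why items 2 and 3
are listed as separate steps.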

If you're suffering from this issue, please be aware that we have a root
cause and will get it resolved.

Regards
-steve




Re: [Pacemaker] Multi-level ACLs for the CIB

2010-02-03 Thread Yan Gao


Andrew Beekhof wrote:
 On Tue, Feb 2, 2010 at 6:14 AM, Yan Gao y...@novell.com wrote:
 
 [snip]
 
 A configuration example:
 ..
 <acls>
   <role id="operator">
     <write id="operator-write-0" tag="nodes"/>
     <write id="operator-write-1" tag="status"/>
   </role>
   <role id="monitor">
     <read id="monitor-read-0" tag="nodes"/>
     <read id="monitor-read-1" tag="status"/>
   </role>
 
 [snip]
 
 Quick question, have you tried using crm_mon with a configuration like this?
 I'm pretty sure you'll get nothing sensible as it can't find the resources.
Indeed. I once thought that the information from status... would be enough
for monitoring, but then realized that both the nodes and resources from
configuration... are required.

 
 Might want to think about how to deal with that...
We could either give some well defined ACLs for that, or is it possible for
crm_mon not to depend on the info from configuration?

-- 
Yan Gao y...@novell.com
Software Engineer
China Server Team, OPS Engineering, Novell, Inc.



Re: [Pacemaker] Multi-level ACLs for the CIB

2010-02-03 Thread Yan Gao


On 02/04/10 12:36, Tim Serong wrote:
 On 2/4/2010 at 02:52 PM, Yan Gao y...@novell.com wrote: 
  
 Andrew Beekhof wrote: 
 On Tue, Feb 2, 2010 at 6:14 AM, Yan Gao y...@novell.com wrote: 
  
 [snip] 
  
 A configuration example: 
 .. 
 <acls>
   <role id="operator">
     <write id="operator-write-0" tag="nodes"/>
     <write id="operator-write-1" tag="status"/>
   </role>
   <role id="monitor">
     <read id="monitor-read-0" tag="nodes"/>
     <read id="monitor-read-1" tag="status"/>
   </role>
  
 [snip] 
  
 Quick question, have you tried using crm_mon with a configuration like  
 this? 
 I'm pretty sure you'll get nothing sensible as it can't find the resources. 
 Indeed. I once thought that the information from status... would be enough
 for monitoring, but then realized that both the nodes and resources from
 configuration... are required.
  
  
 Might want to think about how to deal with that... 
 We could either give some well defined ACLs for that, or is it possible for
 crm_mon not to depend on the info from configuration?
  
 I don't think so...  cib/configuration/resources etc. is the canonical
 source for what's configured, and may include things for which there is
 no status information yet.  There's nothing in cib/status yet, for example,
 if the cluster is just starting up, yet crm_mon will still show you the
 configured nodes and resources.  I've followed the same logic with Hawk,
 too, i.e. I'm interrogating cib/configuration to see what's meant to be
 there, then later check cib/status to see if it actually is.
That makes sense. What shows up depends entirely on how much information
there is for the pe_working_set to unpack.

 
 Default ACL that grants everyone read access to configuration, maybe?
I'd prefer not to make it the default. We could set it properly for a
user/role, or in a template for users to reference, instead of breaking the
ACL policy.

Thanks,
  Yan
-- 
Yan Gao y...@novell.com
Software Engineer
China Server Team, OPS Engineering, Novell, Inc.



Re: [Pacemaker] crm_attribte failed with Multiple attributes match

2010-02-03 Thread renayama19661014
Hi Andrew,

This problem occurred in our environment.

 * RHEL5.4(x86) on Esxi 2node
 * corosync 1.1.2
 * Pacemaker-1-0-6f67420618b0.tar.gz

It sometimes occurs when I run test.sh for a long time.

[r...@srv01 ~]# ./test.sh 
(snip)
scope=status  name=master-vmrd-res: value=132
Multiple attributes match name=master-vmrd-res:
  Value: 133(id=status-srv01-master-vmrd-res:)
  Value: 133(id=status-srv01-master-vmrd-res:)
  Value: 133(id=status-srv01-master-vmrd-res:)
scope=status  name=master-vmrd-res: value=134
scope=status  name=master-vmrd-res: value=135
scope=status  name=master-vmrd-res: value=136
scope=status  name=master-vmrd-res: value=137
Multiple attributes match name=master-vmrd-res:
  Value: 138(id=status-srv01-master-vmrd-res:)
  Value: 138(id=status-srv01-master-vmrd-res:)
  Value: 138(id=status-srv01-master-vmrd-res:)
scope=status  name=master-vmrd-res: value=139
scope=status  name=master-vmrd-res: value=140
scope=status  name=master-vmrd-res: value=141


I think that this problem is very important.
A failure to update the attribute may cause a fail-over that we do 
not expect.

Is it a libxml2 problem after all? 
Is it necessary to update libxml2? 

Best Regards,
Hideo Yamauchi.


--- Andrew Beekhof and...@beekhof.net wrote:

 On Mon, Feb 1, 2010 at 5:50 AM, hj lee kerd...@gmail.com wrote:
  Sorry, my typo. I meant openais-0.80.5 in
  http://download.opensuse.org/repositories/server:/ha-clustering/CentOS_5/i386/
  I had so many problems with this openais-0.80.5. After upgrading
  corosync-1.1.2, most issues are gone.
 
 FYI, http://www.clusterlabs.org/wiki/Install now refers to packages
 from http://www.clusterlabs.org/rpm/
 I am no longer keeping download.opensuse.org updated (though someone
 else might decide to do so)
 
 




Re: [Pacemaker] fedora 10 package update from yum

2010-02-03 Thread Andrew Beekhof
On Wed, Feb 3, 2010 at 4:58 PM, E-Blokos in...@e-blokos.com wrote:

 - Original Message - From: Oscar Remírez de Ganuza Satrústegui
 oscar...@unav.es
 To: pacema...@clusterlabs.org
 Sent: Wednesday, February 03, 2010 10:56 AM
 Subject: Re: [Pacemaker] fedora 10 package update from yum




 I use

 openais-0.80.5-15.1.x86_64
 pacemaker-1.0.5-4.2.x86_64
 libopenais-devel-0.80.5-15.1.x86_64

why do you need this last one installed?


 installed from old opensuse repo

It's probably just easiest to remove the old set of packages before
installing the new ones.
I don't think the OBS ones followed the fedora naming conventions -
that might be causing problems.
