Re: [Pacemaker] Problem with external/ipmi stonith plugin?!
Hi,

On Wed, Feb 03, 2010 at 01:21:48PM +0100, Moritz Krinke wrote:
> Hello,
>
> I'm trying to get the ipmi plugin to work using the configuration shown
> below. The plugin always reports rc 1 to pacemaker.
>
> I had a look at the ipmi script and verified that the commands it uses do
> indeed work (ipmitool works, as does the connection to the IPMI device,
> if I set the needed env vars).
>
> But, looking at the script, I find it strange that it seems to expect the
> hostname of the machine which is supposed to be controlled as a second
> argument. I echoed all passed parameters to a logfile, but the script
> never gets passed a second argument.
>
> Is the plugin broken or am I missing something in the configuration?
> I'm quite new to pacemaker/corosync, so forgive me for a maybe obvious
> fault ;)
>
> centos5.4 x86_64
> corosync-1.2.0-1.el5
> openais-1.1.0-1.el5
> pacemaker-1.0.7-2.el5
>
> crm configure show:
>
> node ds10-100-64-202
> node ds10-100-64-204
> primitive STONITH202 stonith:external/ipmi \
>         op monitor interval=10m timeout=1m \
>         params hostname=ds10-100-64-202 ipaddr=10.100.64.203 userid=root passwd=pw interface=lan \
>         meta target-role=started
> primitive STONITH204 stonith:external/ipmi \
>         op monitor interval=10m timeout=1m \
>         params hostname=ds10-100-64-204 ipaddr=10.100.64.205 userid=root passwd=pw interface=lan \
>         meta target-role=started
> ...
> location l-STONITH202 STONITH202 -inf: ds10-100-64-202
> location l-STONITH204 STONITH204 -inf: ds10-100-64-204
> property $id=cib-bootstrap-options \
>         dc-version=1.0.7-d3fa20fc76c7947d6de66db7e52526dc6bd7d782 \
>         cluster-infrastructure=openais \
>         expected-quorum-votes=2 \
>         no-quorum-policy=ignore \
>         stonith-action=poweroff
> rsc_defaults $id=rsc_defaults-options \
>         resource-stickiness=100
>
> crm status:
>
> Failed actions:
>     STONITH202_start_0 (node=ds10-100-64-204, call=24, rc=1, status=complete): unknown error
>     STONITH204_start_0 (node=ds10-100-64-202, call=132, rc=1, status=complete): unknown error
>
> log:
> ...
> WARN: unpack_rsc_op: Processing failed op STONITH204_start_0 on ds10-100-64-202: unknown error (1)
> ...

Nothing else in the logs? Unfortunately, the stonith logging was not very
good until a few months ago (the improved logging is included in the latest
1.0.3 release). If your copy is older than that, then you'll have to see
what happens using the stonith program:

# stonith -t external/ipmi -n (to see the list of params)
# stonith -d -t external/ipmi -p params -lS

See stonith(8).

Thanks,

Dejan

> Any Ideas/Suggestions?
> Thanks ;)
> Moritz

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Re: [Pacemaker] Fwd: [Cluster-devel] Organizing Bug Squash Party for Cluster 3.x, GFS2 and more
Koch, Sebastian wrote:
> Ahh great, that's good news. I've never been to Australia hehe. If it
> were in Germany or maybe Austria, I would participate and try my best to
> help squash bugs. But I am no developer, I am more of a technician.

I may be wrong here, but I think this party will have nothing to do with
people meeting in person. Will it?

Regards
Dominik
Re: [Pacemaker] Multi-level ACLs for the CIB
On Tue, Feb 2, 2010 at 6:14 AM, Yan Gao y...@novell.com wrote:
> [snip]
> A configuration example:
> ..
> <acls>
>   <role id="operator">
>     <write id="operator-write-0" tag="nodes"/>
>     <write id="operator-write-1" tag="status"/>
>   </role>
>   <role id="monitor">
>     <read id="monitor-read-0" tag="nodes"/>
>     <read id="monitor-read-1" tag="status"/>
>   </role>
> [snip]

Quick question, have you tried using crm_mon with a configuration like this?
I'm pretty sure you'll get nothing sensible as it can't find the resources.
Might want to think about how to deal with that...
Re: [Pacemaker] fedora 10 package update from yum
- Original Message -
From: Oscar Remírez de Ganuza Satrústegui oscar...@unav.es
To: pacema...@clusterlabs.org
Sent: Wednesday, February 03, 2010 2:18 AM
Subject: Re: [Pacemaker] fedora 10 package update from yum

thanks Oscar, but now...

--> Processing Dependency: openais = 0.80.5-15.1 for package: libopenais-devel-0.80.5-15.1.x86_64
--> Finished Dependency Resolution
libopenais-devel-0.80.5-15.1.x86_64 from installed has depsolving problems
  --> Missing Dependency: openais = 0.80.5-15.1 is needed by package libopenais-devel-0.80.5-15.1.x86_64 (installed)
Error: Missing Dependency: openais = 0.80.5-15.1 is needed by package libopenais-devel-0.80.5-15.1.x86_64 (installed)
 You could try using --skip-broken to work around the problem
 You could try running: package-cleanup --problems
                        package-cleanup --dupes
                        rpm -Va --nofiles --nodigest

Should I remove libopenais?

Thanks
Re: [Pacemaker] Two node DRBD cluster will not automatically failover to the secondary
Hi Shravan,

Thank you very much for your reply. I know it was quite a while ago that I
posted my question to the mailing list, but I've been working on other
things and have only just had the chance to come back to this.

You say that I need to set up stonith resources along with setting
stonith-enabled = true. I know how to change the stonith-enabled setting,
but I have no clue how to set up the appropriate stonith resources to
prevent DRBD from getting into a split-brain situation. The documentation
on the DRBD website about setting up a 2 node cluster with Pacemaker
doesn't tell you to enable stonith or configure stonith resources. It does
talk about the resource fencing options in /etc/drbd.conf, which I have
configured:

resource r0 {
  disk {
    fencing resource-only;
  }
  handlers {
    fence-peer /usr/lib/drbd/crm-fence-peer.sh;
    after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
  }

I've searched the internet high and low for example pacemaker configs that
show how to configure stonith resources for DRBD, but I can't find anything
useful. This howto
(http://www.howtoforge.com/installation-and-setup-guide-for-drbd-openais-pacemaker-xen-on-opensuse-11.1)
that I found spells out how to configure a cluster and even states:
"STONITH is disabled in this configuration though it is highly-recommended
in any production environment to eliminate the risk of divergent data."
But infuriatingly it doesn't tell you how.

Could you please give me some pointers or some helpful examples, or perhaps
point me to someone or something that can give me a hand in this area?

Many Thanks
Tom

On Thu, Dec 17, 2009 at 2:14 PM, Shravan Mishra shravan.mis...@gmail.com wrote:
> Hi,
>
> For stateful resources like drbd you will have to set up stonith
> resources for them to function properly or at all. stonith-enabled is
> true by default.
>
> Sincerely
> Shravan
>
> On Thu, Dec 17, 2009 at 6:29 AM, Tom Pride tom.pr...@gmail.com wrote:
>> Hi there,
>>
>> I have set up a two node DRBD cluster with pacemaker using the
>> instructions provided on the drbd.org website:
>> http://www.drbd.org/users-guide-emb/ch-pacemaker.html
>> The cluster works perfectly and I can migrate the resources back and
>> forth between the two nodes without a problem. However, if I simulate a
>> complete server failure of the master node by powering off the server,
>> pacemaker does not automatically bring up the remaining node as the
>> master. I need some help to find out what configuration changes I need
>> to make in order for my cluster to fail over automatically.
>>
>> The cluster is built on 2 Redhat EL 5.3 servers running the following
>> software versions:
>> drbd-8.3.6-1
>> pacemaker-1.0.5-4.1
>> openais-0.80.5-15.1
>>
>> Below I have listed the drbd.conf, openais.conf and the output of crm
>> configure show. If someone could take a look at these for me and
>> provide any suggestions/modifications I would be most grateful.
>>
>> Thanks,
>> Tom
>>
>> /etc/drbd.conf
>>
>> global {
>>   usage-count no;
>> }
>> common {
>>   protocol C;
>> }
>> resource r0 {
>>   disk {
>>     fencing resource-only;
>>   }
>>   handlers {
>>     fence-peer /usr/lib/drbd/crm-fence-peer.sh;
>>     after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
>>   }
>>   syncer {
>>     rate 40M;
>>   }
>>   on mq001.back.live.cwwtf.local {
>>     device /dev/drbd1;
>>     disk /dev/cciss/c0d0p1;
>>     address 172.23.8.69:7789;
>>     meta-disk internal;
>>   }
>>   on mq002.back.live.cwwtf.local {
>>     device /dev/drbd1;
>>     disk /dev/cciss/c0d0p1;
>>     address 172.23.8.70:7789;
>>     meta-disk internal;
>>   }
>> }
>>
>> r...@mq001:~# cat /etc/ais/openais.conf
>> totem {
>>   version: 2
>>   token: 3000
>>   token_retransmits_before_loss_const: 10
>>   join: 60
>>   consensus: 1500
>>   vsftype: none
>>   max_messages: 20
>>   clear_node_high_bit: yes
>>   secauth: on
>>   threads: 0
>>   rrp_mode: passive
>>   interface {
>>     ringnumber: 0
>>     bindnetaddr: 172.59.60.0
>>     mcastaddr: 239.94.1.1
>>     mcastport: 5405
>>   }
>>   interface {
>>     ringnumber: 1
>>     bindnetaddr: 172.23.8.0
>>     mcastaddr: 239.94.2.1
>>     mcastport: 5405
>>   }
>> }
>> logging {
>>   to_stderr: yes
>>   debug: on
>>   timestamp: on
>>   to_file: no
>>   to_syslog: yes
>>   syslog_facility: daemon
>> }
>> amf {
>>   mode: disabled
>> }
>> service {
>>   ver: 0
>>   name: pacemaker
>>   use_mgmtd: yes
>> }
>> aisexec {
>>   user: root
>>   group: root
>> }
>>
>> r...@mq001:~# crm configure show
>> node mq001.back.live.cwwtf.local
>> node mq002.back.live.cwwtf.local
>> primitive activemq-emp lsb:bbc-activemq-emp
>> primitive activemq-forge-services lsb:bbc-activemq-forge-services
>> primitive activemq-social lsb:activemq-social
>> primitive drbd_activemq ocf:linbit:drbd \
>>         params drbd_resource=r0 \
>>         op monitor interval=15s
>> primitive fs_activemq ocf:heartbeat:Filesystem \
>>         params device=/dev/drbd1 directory=/drbd fstype=ext3
>> primitive ip_activemq ocf:heartbeat:IPaddr2 \
>>         params ip=172.23.8.71 nic=eth0
>> group activemq fs_activemq ip_activemq activemq-forge-services
[Pacemaker] Auto-restart service on IP shift?
I think I read about this somewhere, but I can't find it now...

I have a cloned service on two nodes that is always running on both (named;
one is a replicated slave). The IP sticks to one, then floats to the other
if the first one goes down. Works great.

My problem is that named only binds to the IP addresses that are present
when it comes up, so the second node doesn't bind to the floating IP when
it floats over there, because named was already started. If I restart named
after the IP comes over then it works, but that isn't as automatic as I'd
like. ;)

Is there any way in crm to configure named to automatically restart on the
second node when the IP floats over to it?

Thanks for the continued expert advice...!

-erich
[Pacemaker] debian/lenny: pacemaker 1.07 where are the STONITH apps/agents?
I've set up a new lenny 2-node cluster with Martin Gerhard Loschwitz's new
Pacemaker 1.07 / Heartbeat 3.0.2 versions, and before being able to start
using it I'm currently desperately looking for the HP STONITH (riloe)
agents/apps.

apt-cache show stonith libstonith0 tells me that they are transitional
dummy packages.

In which repository may I find them, suitable for lenny and the above
Pacemaker/Heartbeat versions?

TIA,
Bruno
Re: [Pacemaker] debian/lenny: pacemaker 1.07 where are the STONITH apps/agents?
The stonith plugins are in the cluster-glue package, which is in the yum
repositories by clusterlabs/suse; not sure about .deb's.

2010/2/3 Bruno Voigt bruno.vo...@ic3s.de
> I've set up a new lenny 2-node cluster with Martin Gerhard Loschwitz's
> new Pacemaker 1.07 / Heartbeat 3.0.2 versions, and before being able to
> start using it I'm currently desperately looking for the HP STONITH
> (riloe) agents/apps.
>
> apt-cache show stonith libstonith0 tells me that they are transitional
> dummy packages.
>
> In which repository may I find them, suitable for lenny and the above
> Pacemaker/Heartbeat versions?
>
> TIA,
> Bruno
[Pacemaker] thread safety problem with pacemaker and corosync integration
For some time people have reported segfaults on startup, with tzset in the
stack trace, when using pacemaker as a plugin to corosync. I believed we
had fixed this in corosync 1.2.0 by removing the thread-unsafe usage of
localtime and strftime calls from the code base.

Via further investigation, H.J. Lee mostly identified a problem with
localtime_r calling tzset calling getenv(). If at about the same time
another thread calls setenv(), the first thread's getenv() could segfault.
syslog() also calls localtime_r in glibc. On some rare occasions Pacemaker
calls setenv() while corosync executes a syslog operation, resulting in a
segfault.

POSIX is clear on this issue - tzset should be thread safe, localtime_r
should be thread safe, syslog should be thread safe. Some C library
implementations of these functions unfortunately are not thread safe when
used in conjunction with setenv, because they use getenv internally (which
is not required to be thread safe by POSIX).

Our short term plan is to work around these problems in glibc by doing the
following:

1) providing a getenv/setenv api inside coroapi.h so that corosync internal
   code and third party plugins such as pacemaker can use a mutex protected
   getenv/setenv
2) porting our syslog-direct-communication code from whitetank and avoiding
   the syslog C library api (which again uses localtime_r) entirely
3) implementing a localtime_r replacement which does not call tzset on each
   execution, so that the timestamp:on operational mode does not suffer
   from this same problem

If you're suffering from this issue, please be aware that we have a root
cause and will get it resolved.

Regards
-steve
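For illustration only, item 1 of the plan above (a mutex protected
getenv/setenv) could look roughly like the minimal sketch below. The names
locked_getenv, locked_setenv and env_lock are hypothetical and are not the
actual coroapi.h interface; the sketch also assumes a POSIX system with
pthreads and setenv() available.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for illustration; NOT the real coroapi.h API. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Copy the value of environment variable `name` into `buf` while holding
 * the lock, so the caller never keeps a pointer into the environment.
 * Returns 0 on success, -1 if the variable is unset or `buf` is too small.
 */
int locked_getenv(const char *name, char *buf, size_t buflen)
{
    int rc = -1;

    pthread_mutex_lock(&env_lock);
    const char *value = getenv(name);
    if (value != NULL && strlen(value) < buflen) {
        strcpy(buf, value); /* fits, including the terminating NUL */
        rc = 0;
    }
    pthread_mutex_unlock(&env_lock);
    return rc;
}

/* Same contract as setenv(3), but serialized against locked_getenv(). */
int locked_setenv(const char *name, const char *value, int overwrite)
{
    pthread_mutex_lock(&env_lock);
    int rc = setenv(name, value, overwrite);
    pthread_mutex_unlock(&env_lock);
    return rc;
}
```

Note that such a lock only protects callers that go through the wrappers.
A getenv() issued inside the C library (for example from tzset) does not
take it, which is presumably why the plan also includes items 2 and 3.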
Re: [Pacemaker] Multi-level ACLs for the CIB
Andrew Beekhof wrote:
> On Tue, Feb 2, 2010 at 6:14 AM, Yan Gao y...@novell.com wrote:
>> [snip]
>> A configuration example:
>> ..
>> <acls>
>>   <role id="operator">
>>     <write id="operator-write-0" tag="nodes"/>
>>     <write id="operator-write-1" tag="status"/>
>>   </role>
>>   <role id="monitor">
>>     <read id="monitor-read-0" tag="nodes"/>
>>     <read id="monitor-read-1" tag="status"/>
>>   </role>
>> [snip]
>
> Quick question, have you tried using crm_mon with a configuration like
> this? I'm pretty sure you'll get nothing sensible as it can't find the
> resources.

Indeed. I once thought that the information from status... could be enough
for monitoring, but then realized that both the nodes and the resources
from configuration... are required.

> Might want to think about how to deal with that...

We could either provide some well defined ACLs for that, or is it possible
for crm_mon not to depend on the info from configuration?

--
Yan Gao y...@novell.com
Software Engineer
China Server Team, OPS Engineering, Novell, Inc.
Re: [Pacemaker] Multi-level ACLs for the CIB
On 02/04/10 12:36, Tim Serong wrote:
> On 2/4/2010 at 02:52 PM, Yan Gao y...@novell.com wrote:
>> Andrew Beekhof wrote:
>>> On Tue, Feb 2, 2010 at 6:14 AM, Yan Gao y...@novell.com wrote:
>>>> [snip]
>>>> A configuration example:
>>>> ..
>>>> <acls>
>>>>   <role id="operator">
>>>>     <write id="operator-write-0" tag="nodes"/>
>>>>     <write id="operator-write-1" tag="status"/>
>>>>   </role>
>>>>   <role id="monitor">
>>>>     <read id="monitor-read-0" tag="nodes"/>
>>>>     <read id="monitor-read-1" tag="status"/>
>>>>   </role>
>>>> [snip]
>>>
>>> Quick question, have you tried using crm_mon with a configuration like
>>> this? I'm pretty sure you'll get nothing sensible as it can't find the
>>> resources.
>>
>> Indeed. I once thought that the information from status... could be
>> enough for monitoring, but then realized that both the nodes and the
>> resources from configuration... are required.
>>
>>> Might want to think about how to deal with that...
>>
>> We could either provide some well defined ACLs for that, or is it
>> possible for crm_mon not to depend on the info from configuration?
>
> I don't think so... cib/configuration/resources etc. is the canonical
> source for what's configured, and may include things for which there is
> no status information yet. There's nothing in cib/status yet, for
> example, if the cluster is just starting up, yet crm_mon will still show
> you the configured nodes and resources.
>
> I've followed the same logic with Hawk, too, i.e. I'm interrogating
> cib/configuration to see what's meant to be there, then later check
> cib/status to see if it actually is.

That makes sense. What shows up depends entirely on how much information
the pe_working_set has to unpack.

> Default ACL that grants everyone read access to configuration, maybe?

I'd prefer not to make it a default. We could set it properly for a
user/role, or in a template for users to reference, instead of breaking
the ACL policy.

Thanks,
Yan

--
Yan Gao y...@novell.com
Software Engineer
China Server Team, OPS Engineering, Novell, Inc.
Re: [Pacemaker] crm_attribte failed with Multiple attributes match
Hi Andrew,

This problem occurred in our environment.

* RHEL5.4(x86) on Esxi 2node
* corosync 1.1.2
* Pacemaker-1-0-6f67420618b0.tar.gz

It sometimes occurs when I run test.sh for a long time.

[r...@srv01 ~]# ./test.sh
(snip)
scope=status name=master-vmrd-res: value=132
Multiple attributes match name=master-vmrd-res:
  Value: 133(id=status-srv01-master-vmrd-res:)
  Value: 133(id=status-srv01-master-vmrd-res:)
  Value: 133(id=status-srv01-master-vmrd-res:)
scope=status name=master-vmrd-res: value=134
scope=status name=master-vmrd-res: value=135
scope=status name=master-vmrd-res: value=136
scope=status name=master-vmrd-res: value=137
Multiple attributes match name=master-vmrd-res:
  Value: 138(id=status-srv01-master-vmrd-res:)
  Value: 138(id=status-srv01-master-vmrd-res:)
  Value: 138(id=status-srv01-master-vmrd-res:)
scope=status name=master-vmrd-res: value=139
scope=status name=master-vmrd-res: value=140
scope=status name=master-vmrd-res: value=141

I think that this problem is very important. A failed attribute update may
cause a fail-over that we do not expect.

Is it a libxml2 problem after all? Is it necessary to update libxml2?

Best Regards,
Hideo Yamauchi.

--- Andrew Beekhof and...@beekhof.net wrote:
> On Mon, Feb 1, 2010 at 5:50 AM, hj lee kerd...@gmail.com wrote:
>> Sorry, my typo. I meant openais-0.80.5 in
>> http://download.opensuse.org/repositories/server:/ha-clustering/CentOS_5/i386/
>> I had so many problems with this openais-0.80.5. After upgrading to
>> corosync-1.1.2, most issues are gone.
>
> FYI, http://www.clusterlabs.org/wiki/Install now refers to packages from
> http://www.clusterlabs.org/rpm/
> I am no longer keeping download.opensuse.org updated (though someone else
> might decide to do so)
Re: [Pacemaker] fedora 10 package update from yum
On Wed, Feb 3, 2010 at 4:58 PM, E-Blokos in...@e-blokos.com wrote:
> - Original Message -
> From: Oscar Remírez de Ganuza Satrústegui oscar...@unav.es
> To: pacema...@clusterlabs.org
> Sent: Wednesday, February 03, 2010 10:56 AM
> Subject: Re: [Pacemaker] fedora 10 package update from yum
>
> I use
> openais-0.80.5-15.1.x86_64
> pacemaker-1.0.5-4.2.x86_64
> libopenais-devel-0.80.5-15.1.x86_64
>
> why do you need this last one installed?
>
> installed from old opensuse repo

It's probably just easiest to remove the old set of packages before
installing the new ones. I don't think the OBS ones followed the fedora
naming conventions - that might be causing problems.