Re: [Pacemaker] [Openais] Unusual exit code with /etc/init.d/corosync stop
On Tue, Mar 23, 2010 at 12:42 AM, Andreas Mock andreas.m...@web.de wrote:
> Hi all,
>
> I'm using corosync 1.2.0 from the packages of clusterlabs.org on openSuSE 11.2.
> A correct /etc/init.d/corosync stop issues a return code of 1

The rc code isn't coming from corosync at all.
It's coming from the last command in stop(), which is echo.

Please run the following and report the result:

    echo ; echo $?

On Fedora it produces:

    [09:14 AM] r...@f12 ~/tmp # echo ; echo $?

    0
    [09:14 AM] r...@f12 ~/tmp #

> which definitely hurts the Cluster Test Suite when stopping the cluster stack,
> assuming (IMHO correctly) that a problem-free execution of the rc script should
> return 0 and not 1.
>
> The problem is indirectly the setting of the return code variable $rtrn in the
> while loop waiting for corosync to die. The while loop is exited exactly when
> the status call delivers a 1, meaning that the process isn't there any more.
> This rc of 1 will then be delivered as the return code of the stop call.
>
> Here's the patch just to show the little change.
>
> ---8--
> --- /etc/init.d/corosync 2010-01-20 21:23:53.0 +0100
> +++ /tmp/corosync 2010-03-23 00:25:12.794065102 +0100
> @@ -138,6 +138,7 @@
>                  ;;
>          stop)
>                  stop
> +                rtrn=0
>                  ;;
>          *)
>                  echo "usage: $0 {start|stop|restart|reload|force-reload|condrestart|try-restart|status}"
> ---8--
>
> Best regards
> Andreas Mock

___
Openais mailing list
open...@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/openais
___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Re: [Pacemaker] Centos 5 and GFS
On Thu, Mar 25, 2010 at 1:53 AM, Cristian Mammoli - Apra Sistemi c.mamm...@apra.it wrote:
> Cristian Mammoli - Apra Sistemi wrote:
>> Thank you Andrew, I'll try to backport it.
>
> Checking kernel:
> Current kernel version: 2.6.18
> Minimum kernel version: 2.6.31
> FAILED!
>
> I guess I'll have to wait for RHEL6 ;-(
> Better chances with OCFS2?

No. You'll still need the DLM from the same package.
I'd suggest setting up a Fedora-12 system to get familiar with everything while you wait for RHEL6.
The getting started guide has step-by-step instructions for setting it up.
Re: [Pacemaker] configuring the monitor interval
On Wed, Mar 24, 2010 at 10:44 PM, Alan Jones falanclus...@gmail.com wrote:
> Friends,
> The ocf:pacemaker:Dummy example resource agent script specifies a default
> monitoring interval (10) which I assume is 10 seconds. This seems like the
> appropriate place to specify this interval, i.e. the resource implementation
> knows how heavyweight the monitor is and what is a good compromise, etc.

It's just a hint for GUIs to display to their users.
It's not actually used by the cluster.

> However, using the command line crm configuration I'm unable to get the
> monitor to be called without overriding this default, e.g.:
> primitive foo ocf:pacemaker:Dummy op monitor interval=20s
> Without the option, monitor isn't called; without specifying the interval
> the option fails. I'd prefer not to configure a monitor interval for each
> instance of every resource.

Then the cluster won't periodically check the resource's health.

> - Is there a way to opt in to enabling the monitor within the resource
> definition (script)?

no

> - Is there a way to configure the monitor option in the crm command syntax
> without specifying the interval?

no
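Since the monitor operation has to be configured explicitly for every resource, one way to avoid typing it repeatedly is to generate the commands. The following is only a minimal sketch under stated assumptions: the resource names (foo, bar, baz), the helper name gen_primitives and the 20s interval are illustrative and not taken from the thread, and the commands are printed rather than executed so that they can be reviewed first.

```shell
#!/bin/sh
# Sketch: print one "crm configure primitive" command per resource, each with
# an explicit monitor interval. gen_primitives, the resource names and the
# 20s value are hypothetical examples, not part of Pacemaker itself.

monitor_interval="20s"

gen_primitives() {
    for name in "$@"; do
        echo "crm configure primitive $name ocf:pacemaker:Dummy op monitor interval=$monitor_interval"
    done
}

# Print the commands for three example resources.
gen_primitives foo bar baz
```

Piping the output to sh would apply the commands; whether that is appropriate depends on the cluster and should be checked against the installed crm shell.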
Re: [Pacemaker] Collocation Resources
On Wed, Mar 24, 2010 at 10:49 PM, Travis Dolan tra...@mylasso.com wrote:
> I believe I have found the appropriate where I need to go. Looks like the
> bug is assigned to you, let me know if I am incorrect.

Perfect. Thanks.
Re: [Pacemaker] [Openais] Unusual exit code with /etc/init.d/corosync stop
-----Original Message-----
From: Andrew Beekhof and...@beekhof.net
Sent: 25.03.2010 09:15:11
To: Andreas Mock andreas.m...@web.de
Subject: Re: [Openais] Unusual exit code with /etc/init.d/corosync stop

> On Tue, Mar 23, 2010 at 12:42 AM, Andreas Mock wrote:
>> Hi all,
>>
>> I'm using corosync 1.2.0 from the packages of clusterlabs.org on openSuSE 11.2.
>> A correct /etc/init.d/corosync stop issues a return code of 1
>
> The rc code isn't coming from corosync at all.
> Its coming from the last command in stop(), which is echo.

Where in my original post did I say that the return code comes from corosync (the binary)?
Please read the mail completely. In the first sentence I just described the version and platform I'm using, and that the script /etc/init.d/corosync issues a return code of 1 when stopping worked correctly.

Some lines further down (you can see them in your quoted post) I explain, probably in bad English, what the reason for this return code is, as I investigated this problem by debugging the script /etc/init.d/corosync.

Read the rest of my mail carefully and you will get the reason for that behaviour.

a) The very last line is: exit $rtrn
b) Where is the global variable $rtrn initialized and set?
c) It gets set in the shell function status!
d) When you do a stop and the stop works, status is called for the last time in the while loop, setting $rtrn to 1.
e) This variable is never changed afterwards.
f) It is returned by the last statement, see a).

Best regards
Andreas Mock
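The chain a) to f) can be reproduced in isolation. The following is a minimal sketch, not the real init script: status, stop and rtrn mirror the names used in the thread, while fake_pidof and the alive flag are hypothetical stand-ins for pidof and the running daemon, and the loop body only simulates the daemon exiting.

```shell
#!/bin/sh
# Sketch of the behaviour described in the thread; NOT the real init script.
# fake_pidof and "alive" are hypothetical stand-ins for pidof and the daemon.

alive=1

fake_pidof() {
    if [ "$alive" -eq 1 ]; then
        echo 4242            # pretend PID while the daemon is "running"
        return 0
    fi
    return 1                 # process is gone
}

status() {
    pid=$(fake_pidof "$1" 2>/dev/null)
    rtrn=$?                  # global variable: survives after status() returns
    if [ $rtrn -ne 0 ]; then
        echo "$1 is stopped"
    else
        echo "$1 (pid $pid) is running..."
    fi
    return $rtrn
}

stop() {
    # Wait for the daemon to die; the last status call sets rtrn to 1.
    while status corosync >/dev/null; do
        alive=0              # simulate the daemon exiting
    done
    echo                     # last command in stop(): succeeds with 0
}

rtrn=0
stop
echo "stop() returned $?, but 'exit \$rtrn' would exit with $rtrn"
```

In this sketch stop() itself returns 0, because its last command is echo, yet a final exit $rtrn would still report 1, which matches the point made in d) to f).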
[Pacemaker] YaST and SuSE HA add-on - DRBD configuration
Hi,

We've got the SuSE Linux Enterprise 11 HA add-on, which comes with OpenAIS, Pacemaker and DRBD, as well as YaST modules for configuring these.

We want to run two DRBD pairs:
- One with ext3 in a standard master/slave configuration
- One with ocfs2 in an active/active configuration

I have two questions:

1. I understand I need to set net { allow-two-primaries; } in drbd.conf - is that correct? YaST doesn't have an option for this, and it overwrites the change if I put it in the file manually :-(

2. Does each DRBD device need a unique port? The default for /dev/drbd1 is 7789, and I've chosen 7788 for /dev/drbd2. Is this the correct thing to do?

Thanks
Martin
Re: [Pacemaker] Centos 5 and GFS
Andrew Beekhof wrote:
> No. You'll still need the DLM from the same package.
> I'd suggest setting up a Fedora-12 system to get familiar with everything
> while you wait for RHEL6.
> The getting started guide has step-by-step instructions for setting it up.

Unfortunately I only have a test environment, and the distro should be the same one we use at production sites (RHEL5/CentOS5).
Anyway, thanks for the time and the clarification.

Cheers
--
Cristian Mammoli
APRA SISTEMI srl
Via Brodolini,6 Jesi (AN)
tel dir. 0731 719822
Web www.apra.it
e-mail c.mamm...@apra.it
Re: [Pacemaker] [Openais] Unusual exit code with /etc/init.d/corosync stop (Steve - Please ack new patch)
On Thu, Mar 25, 2010 at 9:32 AM, Andreas Mock andreas.m...@web.de wrote:
>> The rc code isn't coming from corosync at all.
>> Its coming from the last command in stop(), which is echo.
>
> Where in my original post did I say that the return code comes from
> corosync (the binary)? Please read the mail completely. In the first sentence
> I just described the version and platform I'm using, and that the script
> /etc/init.d/corosync issues a return code of 1 when stopping worked correctly.
>
> Some lines further down (you can see them in your quoted post) I explain,
> probably in bad English, what the reason for this return code is, as I
> investigated this problem by debugging the script /etc/init.d/corosync.
>
> Read the rest of my mail carefully and you will get the reason for that
> behaviour.
>
> a) The very last line is: exit $rtrn
> b) Where is the global variable $rtrn initialized and set?
> c) It gets set in the shell function status!
> d) When you do a stop and the stop works, status is called for the last time
>    in the while loop, setting $rtrn to 1.
> e) This variable is never changed afterwards.
> f) It is returned by the last statement, see a).

Do try to calm down a little.
I made a mistake; it happens when one tries responding to 40-50 conversations a day.

Patching after stop is wrong though; the root cause is status() not using a local variable.

--- ./etc/init.d/corosync.old 2010-03-25 10:21:19.673779309 +0100
+++ ./etc/init.d/corosync 2010-03-25 10:23:47.318779319 +0100
@@ -40,13 +40,13 @@ failure()
 status()
 {
         pid=$(pidof $1 2>/dev/null)
-        rtrn=$?
-        if [ $rtrn -ne 0 ]; then
+        rc=$?
+        if [ $rc -ne 0 ]; then
                 echo "$1 is stopped"
         else
                 echo "$1 (pid $pid) is running..."
         fi
-        return $rtrn
+        return $rc
 }
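The effect of the patched status() can be checked on its own. This is a minimal sketch under stated assumptions: fake_pidof is a hypothetical stand-in for pidof that always reports that no process was found, and status_rc is an illustrative helper variable. Note that in plain sh the variable rc is still global; the patch works because it no longer reuses the name rtrn, which the script passes to exit.

```shell
#!/bin/sh
# Sketch only: status() follows the patch in the thread; fake_pidof is a
# hypothetical stand-in for pidof that always reports "no such process".

fake_pidof() { return 1; }

status() {
    pid=$(fake_pidof "$1" 2>/dev/null)
    rc=$?                    # separate name, so rtrn is left untouched
    if [ $rc -ne 0 ]; then
        echo "$1 is stopped"
    else
        echo "$1 (pid $pid) is running..."
    fi
    return $rc
}

rtrn=0                       # the value the script would pass to "exit"
status_rc=0
status corosync || status_rc=$?
echo "status returned $status_rc, rtrn is still $rtrn"
```

With the original status(), the same call would have left rtrn at 1.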
Re: [Pacemaker] Centos 5 and GFS
On Thu, Mar 25, 2010 at 10:26 AM, Cristian Mammoli - Apra Sistemi c.mamm...@apra.it wrote:
> Andrew Beekhof wrote:
>> No. You'll still need the DLM from the same package.
>> I'd suggest setting up a Fedora-12 system to get familiar with everything
>> while you wait for RHEL6.
>> The getting started guide has step-by-step instructions for setting it up.
>
> Unfortunately I only have a test environment, and the distro should be the
> same one we use at production sites (RHEL5/CentOS5).
> Anyway, thanks for the time and the clarification.

What about some VMs? Just a thought.
Re: [Pacemaker] [Openais] Unusual exit code with /etc/init.d/corosync stop (Steve - Please ack new patch)
-----Original Message-----
From: Andrew Beekhof and...@beekhof.net
Sent: 25.03.2010 10:29:50
To: Andreas Mock andreas.m...@web.de
Subject: Re: [Openais] Unusual exit code with /etc/init.d/corosync stop (Steve - Please ack new patch)

> Do try to calm down a little.

Sorry if it sounded upset. That was not my intention.
In fact I thought that the way I expressed myself was the reason you didn't understand me, and therefore I tried to sum it up in a different way. It seems to have worked.

> I made a mistake, it happens when one tries responding to 40-50
> conversations a day.

As it happens to every one of us from time to time. :-)
Be sure I have the greatest respect for your fast replies to any questions sent to the mailing list. I'm sure I'm not the only one who is thankful for that.

So, in fact I'm totally relaxed. All the more so since I read that we will get corosync 1.2.1 from you guys soon.

Best regards
Andreas Mock
Re: [Pacemaker] Question about Dual Primary DRBD + OCFS2
On Wed, Mar 24, 2010 at 12:33 PM, r...@free.fr wrote:
> Hi and thanks for your answer.
>
> Here is my hb_report with 2 tests:
>
> * node standby / node online => Inconsistent drbd (resolved by drbdadm
>   verify or a reboot of the node)
> * ifdown / kill of dlm/corosync => instant reboot => I can't find any trace
>   of this problem except my screen dump.
>
> I hope you'll see something interesting in my logs.

Oh, debian i686. Figures.
You'll need to rebuild the packages yourself; the ones from Madkiss' repo don't seem to work for i686.

> Thank you again for your help,
> Regards
>
> ----- Original Message -----
> From: Andrew Beekhof and...@beekhof.net
> To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
> Sent: Tuesday, 23 March 2010 20:12:58 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
> Subject: Re: [Pacemaker] Question about Dual Primary DRBD + OCFS2
>
> We'd need a stack trace, that screen dump doesn't help much I'm afraid.
> Try using hb_report to grab the logs etc.
> It also includes backtraces from any cores it finds.
>
> On Tue, Mar 23, 2010 at 6:55 PM, r...@free.fr wrote:
>> Hi,
>>
>> Some tests today...
>>
>> If I switch off my network interface (ifdown eth0) or if I kill (-9)
>> corosync, I get a segfault of dlm_controld and the node reboots.
>>
>> Is this normal? Are my tests too hard?
>>
>> Thanks a lot ;-)
>> Regards
>>
>> ----- Original Message -----
>> From: r...@free.fr
>> To: pacemaker@oss.clusterlabs.org
>> Sent: Monday, 22 March 2010 18:03:49 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
>> Subject: [Pacemaker] Question about Dual Primary DRBD + OCFS2
>>
>> Hi all,
>>
>> Following this doc
>> http://www.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2, I've just
>> installed 2 nodes (with some minor adjustments) and now I'm testing my
>> setup.
>>
>> If I set one node to standby and bring it online again, the other node
>> sees this node as Inconsistent. The node that just came back from standby
>> mode considers itself UpToDate.
>>
>> I don't have this problem when I reboot a node (reboot).
>>
>> I think that the problem is (from my log):
>>
>> ERROR: r0: Called drbdadm -c /etc/drbd.conf secondary r0
>> State change failed: (-12) Device is held open by someone
>>
>> I have no STONITH system :-( Is that a problem for my tests?
>>
>> Thanks to all, sorry for my English.
>>
>> Regards.
Re: [Pacemaker] WARN: Rexmit of seq ..........
Thanks for the help.
I'll try to arrange a reboot of the current DC and see if that fixes it.

Kind Regards
---
Lester Joseph
Linux Systems Administrator

-----Original Message-----
From: Lars Ellenberg [mailto:lars.ellenb...@linbit.com]
Sent: Wednesday, March 24, 2010 9:55 PM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] WARN: Rexmit of seq ..

On Wed, Mar 24, 2010 at 11:40:54AM +, Joseph, Lester wrote:
> Yes they can. That's the confusing bit.
> As I mentioned in my previous email, we rebooted the switch and the nodes
> would have lost connectivity briefly. But the switch is back online now and
> the nodes have connectivity as they did before. Yet the message is
> constantly being generated despite my actions.
> I will continue to troubleshoot.

I have this theory that

    include/heartbeat.h:195:#define MAXMSGHIST 500

may be too low and may wrap (several times?) on a busy pacemaker cluster, or if you have a very low keepalive set, once you have flaky communication problems for an extended period of time.

If someone has a way to reproduce this behaviour, we could check whether upping that define would fix it (that is, extend the period during which heartbeat can cope with flaky communication).

How to get out of there? Well...
First, get your comms in order.
Then, maybe restarting the loudest node helps?
Or on both? Or on the node that does _not_ log those messages?

Dunno... never seen this myself.

There are sporadic reports of similar messages in the archives, though, and mystical workarounds involving the deletion of some files which I very much doubt have anything to do with this particular Rexmit message...

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD(r) and LINBIT(r) are registered trademarks of LINBIT, Austria.
Re: [Pacemaker] Centos 5 and GFS
Andrew Beekhof wrote:
>>> The getting started guide has step-by-step instructions for setting it up.
>>
>> Unfortunately I only have a test environment, and the distro should be the
>> same one we use at production sites (RHEL5/CentOS5).
>> Anyway, thanks for the time and the clarification.
>
> What about some VMs? Just a thought.

The cluster itself runs VMware virtual machines, and I don't think VMware Server 2 or ESX support nested virtualization.
I'll probably try with FC12 and some dummy resources, just to do failover tests and some benchmarks of gfs2 vs ext3.

Thanks
--
Cristian Mammoli
APRA SISTEMI srl
Via Brodolini,6 Jesi (AN)
tel dir. 0731 719822
Web www.apra.it
e-mail c.mamm...@apra.it
Re: [Pacemaker] pingd fails to update CIB
I have the same problem as Quentin Smith (sticky pingd value=0).

[12:49:44 ha1] ~ rpm -qa | egrep -i "pacemaker|corosync|heartbeat|resource"
heartbeat-libs-3.0.2-2.el5
heartbeat-3.0.2-2.el5
pacemaker-libs-1.0.8-1.el5
resource-agents-1.0.1-1.el5
corosync-1.2.0-1.el5
pacemaker-1.0.8-1.el5
corosynclib-1.2.0-1.el5

[12:38:10 ha1] ~ crm configure show
node ha1.intra.genaker.net
node ha2.intra.genaker.net
primitive MyDrbd ocf:linbit:drbd \
        params drbd_resource=r0 \
        op monitor interval=15s \
        op start interval=0 timeout=240s \
        op stop interval=0 timeout=100s
primitive MyFs ocf:heartbeat:Filesystem \
        params device=/dev/drbd/by-res/r0 directory=/data fstype=ext3 \
        op start interval=0 timeout=60s \
        op stop interval=0 timeout=60s
primitive MyIp ocf:heartbeat:IPaddr2 \
        params ip=10.1.1.6 nic=eth0 cidr_netmask=32 broadcast=10.1.1.255 iflabel=Cluster \
        op monitor interval=10s \
        op start interval=0 timeout=90s \
        op stop interval=0 timeout=100s
primitive MyPing ocf:pacemaker:pingd \
        params host_list=10.1.1.1 multiplier=100 \
        op monitor interval=15s timeout=20s \
        op start interval=0 timeout=90s \
        op stop interval=0 timeout=100s
group MyGroup MyFs MyIp
ms MyMsDrbd MyDrbd \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
clone MyPingClone MyPing \
        meta globally-unique=false
location MyLocation MyMsDrbd \
        rule $id=MyLocation-rule $role=Master -inf: not_defined pingd or pingd lte 0
colocation MyColocation inf: MyGroup MyMsDrbd:Master
order MyOrder inf: MyMsDrbd:promote MyGroup:start
property $id=cib-bootstrap-options \
        cluster-infrastructure=openais \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        dc-version=1.0.8-2a76c6ac04bcccf42b89a08e55bfbd90da2fb49a \
        expected-quorum-votes=2

[12:23:54 ha1] ~ cibadmin -Q | grep "name=\"pingd\" value="
<nvpair id="status-ha2.intra.genaker.net-pingd" name="pingd" value="0"/>
<nvpair id="status-ha1.intra.genaker.net-pingd" name="pingd" value="0"/>

[12:23:59 ha1] ~ attrd_updater -R

[12:24:13 ha1] ~ cibadmin -Q | grep "name=\"pingd\" value="
<nvpair id="status-ha2.intra.genaker.net-pingd" name="pingd" value="100"/>
<nvpair id="status-ha1.intra.genaker.net-pingd" name="pingd" value="100"/>
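To compare the pingd values before and after attrd_updater -R, the attribute can be pulled out of the CIB status section with a small filter. This is a minimal sketch, assuming the nvpair attributes appear in the order id, name, value as in the output quoted in this message; pingd_values is a hypothetical helper name, and the sample input is copied from the message rather than read from a live cluster.

```shell
#!/bin/sh
# Sketch: print "<nvpair id> <value>" for every pingd attribute read on stdin.
# Assumes the attribute order id, name, value; pingd_values is a hypothetical
# helper, not a Pacemaker tool.

pingd_values() {
    sed -n 's/.*<nvpair id="\([^"]*\)" name="pingd" value="\([^"]*\)".*/\1 \2/p'
}

# Sample input taken from the message; on a cluster node this would instead
# be:  cibadmin -Q | pingd_values
sample='<nvpair id="status-ha2.intra.genaker.net-pingd" name="pingd" value="0"/>
<nvpair id="status-ha1.intra.genaker.net-pingd" name="pingd" value="0"/>'

printf '%s\n' "$sample" | pingd_values
```

Running the filter once before and once after attrd_updater -R makes the change from 0 to 100 easy to spot per node.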
[Pacemaker] showscores.sh script (patch?)
Hi all, mainly Dominik Klein - thanks for the showscores.sh script.

But I am curious why the script keeps only numeric failcount values, which means a failcount of INFINITY is silently dropped:

1. Simulating a failure

Failed actions:
    dom0-fs-Dom0_start_0 (node=vsp11.example.com, call=15, rc=5, status=complete): not installed

2. Before patch

Resource        Score  Node               Stickiness  #Fail     Migration-Threshold
dom0-drbd-Dom0  -100   vsp11.example.com  90          0
dom0-drbd-Dom0  410    vsp9.example.com   90          0
dom0-fs-Dom0    0      vsp9.example.com   0           0
dom0-fs-Dom0    -100   vsp11.example.com  0                     <== Note no #Fail value

3. After patch

Resource        Score  Node               Stickiness  #Fail     Migration-Threshold
dom0-drbd-Dom0  -100   vsp11.example.com  90          0
dom0-drbd-Dom0  410    vsp9.example.com   90          0
dom0-fs-Dom0    0      vsp9.example.com   0           0
dom0-fs-Dom0    -100   vsp11.example.com  0           INFINITY  <== Note we can see the INF value now

Would it be feasible to apply this patch:

--
--- showscores.sh 2010-03-25 14:59:21.0 +
+++ showscores-new.sh 2010-03-25 15:00:38.0 +
@@ -91,3 +91,3 @@ get_failcount() {
 #usage $0 res node
-failcount=`crm_failcount -G -r $1 -U $2 -Q 2>/dev/null|grep -o ^[0-9]*$`
+failcount=`crm_failcount -G -r $1 -U $2 -Q 2>/dev/null`
 }
--

With regards,
Tino
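The reason the #Fail column stays empty can be shown without a cluster. This is a minimal sketch: filter_old mirrors the grep from showscores.sh, while filter_new is a hypothetical alternative that keeps a filter but also lets INFINITY through. It is not the patch proposed above, which removes the filter altogether.

```shell
#!/bin/sh
# Sketch: why the numeric-only filter hides INFINITY, and a filter that keeps
# it. filter_old mirrors the grep in showscores.sh; filter_new is a
# hypothetical alternative and not the patch proposed in the thread.

filter_old() { grep -o '^[0-9]*$'; }
filter_new() { grep -E -o '^([0-9]+|INFINITY)$'; }

for value in 0 3 INFINITY; do
    # "|| true" keeps the loop going when grep finds no match.
    old=$(echo "$value" | filter_old) || true
    new=$(echo "$value" | filter_new) || true
    echo "input=$value old='$old' new='$new'"
done
```

For the input INFINITY the old filter yields an empty string, which is what shows up as the missing #Fail value in the table before the patch.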
[Pacemaker] Someone using ibmrsa-telnet external stonith plugin?
Hi all,

is anyone using the external stonith plugin 'ibmrsa-telnet'?

I found some issues in the current version that were introduced by modifications from other contributors. I want to correct these issues and submit a patch.
As the RSA and IMM boards all behave a little differently, I need people willing to test the (hopefully) corrected script on their platforms.

Please contact me. I would appreciate it.

Best regards
Andreas Mock
Re: [Pacemaker] DRBD Management Console 0.7.0
On Thu, March 25, 2010 4:29 pm, martin.br...@icw.de wrote:
> Hi Rasto,
>
> I played around with the MC and it is really a promising integrative
> approach for managing a DRBD and Pacemaker cluster. For now it is really
> nice for demonstration purposes like detaching the primary with failover.
> What I am missing is a resource cleanup (crm resource cleanup resource
> name); is this function in your release plan?

It is there. You can find it when you right-click on a resource, or in the resource's Actions menu. It is called either Restart Failed (Clean up) or Reset Fail-Count (Clean Up). This works only if you have some fail-count, or if the resource failed completely. If you want to clean up the resource without such a failure, more is planned; see this discussion:

http://www.mail-archive.com/drbd...@lists.linbit.com/msg00081.html

Basically I am trying to hide things like LRM and cleanup from the user. Cleanup could become something like "clear history", alongside a "show history". When a resource fails, a user should be able to quickly identify how to activate the resource and remove the error messages. "Cleanup" and "fail-counter" are not very descriptive in this case. Anyway, I will give it some more thought.

Rasto

--
: Dipl-Ing Rastislav Levrinc
: DRBD-MC http://www.drbd.org/mc/management-console/
: DRBD/HA support and consulting http://www.linbit.com/

DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.
[Pacemaker] Can't install resource-agents-1.0.2 on EL5
Hello all,

Sorry if this has been covered already - I dug through the lists and didn't see this problem anywhere.

In the process of migrating up to CentOS 5.4 from 5.3 I noticed that I cannot install the newer resource-agents-1.0.2 from yum. I have the older one installed (v1.0.1), and want to get it up to date. The yum transaction fails because of a missing dependency:

Setting up Update Process
Resolving Dependencies
--> Running transaction check
---> Package resource-agents.x86_64 0:1.0.2-1 set to be updated
--> Processing Dependency: libnet.so.1()(64bit) for package: resource-agents
--> Finished Dependency Resolution
resource-agents-1.0.2-1.x86_64 from linbit has depsolving problems
  --> Missing Dependency: libnet.so.1()(64bit) is needed by package resource-agents-1.0.2-1.x86_64 (linbit)
Error: Missing Dependency: libnet.so.1()(64bit) is needed by package resource-agents-1.0.2-1.x86_64 (linbit)

I have the usual default CentOS-base.repo, plus I add epel and rpmforge besides clusterlabs and linbit (we have a support contract with Linbit).

Where can I get this libnet.so.1()(64bit) dependency from? Yum can't find it:

[r...@ccsha2 ~]# yum whatprovides */libnet.so.1
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
 * addons: linux.mirrors.es.net
 * base: repo.genomics.upenn.edu
 * epel: archive.linux.duke.edu
 * extras: ftp.wallawalla.edu
 * rpmforge: apt.sw.be
 * updates: ftp.ussg.iu.edu
Excluding Packages from CentOS-5 - Extras
Finished
epel/filelists_db | 4.1 MB 00:01
No Matches found

-Thanks!

Kenneth M DeChick
Linux Systems Administrator
Community Computer Service, Inc.
(315)-255-1751 ext154
http://www.medent.com
k...@medent.com
Registered Linux User #497318

"You canna change the laws of physics, Captain; I've got to have thirty minutes!"
Re: [Pacemaker] Someone using ibmrsa-telnet external stonith plugin?
-----Original Message-----
From: Florian Haas florian.h...@linbit.com
Sent: 25.03.2010 16:23:59
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Someone using ibmrsa-telnet external stonith plugin?

> On 03/25/2010 04:09 PM, Andreas Mock wrote:
>> Hi all,
>>
>> is there someone using the external stonith plugin 'ibmrsa-telnet'?
>>
>> I found some issues introduced by modifications of other contributors
>> in the current version.
>
> Such as?

a) ha_log.sh is used to send debug messages. It is called as a bash command line without escaping dangerous characters. This leads to unwanted file creation in the directory /var/lib/heartbeat/cores/root/ (reproducible).

b) The regex patterns for the expect command don't match, so the communication doesn't work as originally intended. I contacted the contributor of that piece of code. He found that error soon after his contribution, but the correction somehow went to /dev/null.

> I am beginning to believe that everyone should start using IPMI, and
> spare themselves of these proprietary out-of-band nightmares.

There are some questions regarding this:

a) Does anyone have experience with using IPMI through the stack running on the OS? If I understand it correctly, OpenIPMI provides this kind of in-band communication. My understanding of STONITH was that there has to be a way to kill a node WITHOUT depending on the node's health. Is it safe to use the IPMI interface provided by a daemon running in the OS of the node I want to shoot?

b) The newer IMM supports IPMI through out-of-band communication over the IMM IP address. RSA II does not have this option as far as I know (updates welcome), which was the reason for writing this telnet beast. ;-)

c) Which stonith agent is the better one, 'ipmilan' or 'external/ipmi'?

Enlightening information is very welcome.

Best regards
Andreas Mock
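The class of problem described under a) can be illustrated in a few lines. This is a minimal sketch and not the plugin's actual code: fake_log is a hypothetical stand-in for ha_log.sh, log_unsafe and log_safe are illustrative names, and the message text is made up. It only shows how a message that is re-parsed as part of a command line can create stray files, and how passing it as a single quoted argument avoids that.

```shell
#!/bin/sh
# Sketch of the quoting problem; fake_log is a hypothetical stand-in for
# ha_log.sh, and the message text is invented for illustration.

fake_log() { printf 'LOG: %s\n' "$*"; }

# Unsafe: the message becomes part of a command line that the shell parses
# again, so ">" inside the message is treated as a redirection.
log_unsafe() { eval "fake_log debug $1"; }

# Safe: the message is passed as one quoted argument and is never re-parsed.
log_safe() { fake_log debug "$1"; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT      # clean up the scratch directory on exit
msg="power state > $workdir/stray_file"

log_unsafe "$msg"
if [ -e "$workdir/stray_file" ]; then
    echo "unsafe call created $workdir/stray_file"
    rm -f "$workdir/stray_file"
fi

log_safe "$msg"
if [ ! -e "$workdir/stray_file" ]; then
    echo "safe call created no file"
fi
```

How the plugin actually invokes ha_log.sh is not shown in this thread, so the fix in the real script may look different.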
Re: [Pacemaker] Someone using ibmrsa-telnet external stonith plugin?
-----Original Message-----
From: Dejan Muhamedagic deja...@fastmail.fm
Sent: 25.03.2010 16:33:55
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Someone using ibmrsa-telnet external stonith plugin?

> Just provide patches.

Hi Dejan,

see attached. It should solve two problems (see my answer to Florian Haas in this thread).

Best regards
Andreas Mock

[Attachment: ibmrsa-telnet.patch]
Re: [Pacemaker] About replacement of clone and handling of the fail number of times.
Hi Andrew,

> globally-unique=false means that :0 and :1 are actually the same resource.
> its perfectly valid for entries for both to exist on the node, but the
> PE should fold them together internally.
> in most ways it does, just not for failures (yet).

Thank you for your comment.

We were somewhat confused.
First of all, what difference does the globally-unique setting make?
And in what kind of situation would you use it?

Best Regards,
Hideo Yamauchi.

--- Andrew Beekhof and...@beekhof.net wrote:

> 2010/3/24 renayama19661...@ybb.ne.jp:
>> Hi Andrew,
>>
>>> Do you mean: why is the clone on srv01 always $clone:0 but on srv02
>>> its sometimes $clone:0 and sometimes $clone:1 ?
>>
>> yes.
>> I thought the replacement would behave the same way on both nodes,
>> because globally-unique=false is set.
>
> globally-unique=false means that :0 and :1 are actually the same resource.
> its perfectly valid for entries for both to exist on the node, but the
> PE should fold them together internally.
> in most ways it does, just not for failures (yet).