Re: [Pacemaker] [Openais] Unusual exit code with /etc/init.d/corosync stop

2010-03-25 Thread Andrew Beekhof
On Tue, Mar 23, 2010 at 12:42 AM, Andreas Mock andreas.m...@web.de wrote:
 Hi all,

 I'm using corosync 1.2.0 from the packages of clusterlabs.org on openSuSE 
 11.2.
 A correct /etc/init.d/corosync stop issues a return code of 1

The rc code isn't coming from corosync at all.
It's coming from the last command in stop(), which is echo.

Please run the following and report the result:
   echo ; echo $?

On Fedora it produces:

[09:14 AM] r...@f12 ~/tmp # echo ; echo $?

0
[09:14 AM] r...@f12 ~/tmp #


 which definitely hurts the Cluster Test Suite when stopping the cluster
 stack, assuming (IMHO correctly) that a problem-free execution of the rc
 script should return 0 and not 1.

 The problem is indirectly the setting of the return code variable $rtrn in
 the while loop waiting for corosync to die. The while loop is exited exactly
 when the status call delivers a 1, meaning that the process isn't there any
 more. This rc of 1 will then be delivered as the return code of the stop call.



 Here's the patch just to show the little change.

 ---8--

 --- /etc/init.d/corosync 2010-01-20 21:23:53.0 +0100
 +++ /tmp/corosync 2010-03-23 00:25:12.794065102 +0100
 @@ -138,6 +138,7 @@
 ;;
 stop)
 stop
 + rtrn=0
 ;;
 *)
 echo "usage: $0 {start|stop|restart|reload|force-reload|condrestart|try-restart|status}"
 ---8--



 Best regards

 Andreas Mock








 ___
 Openais mailing list
 open...@lists.linux-foundation.org
 https://lists.linux-foundation.org/mailman/listinfo/openais

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] Centos 5 and GFS

2010-03-25 Thread Andrew Beekhof
On Thu, Mar 25, 2010 at 1:53 AM, Cristian Mammoli - Apra Sistemi
c.mamm...@apra.it wrote:
 Cristian Mammoli - Apra Sistemi wrote:

 Thank you Andrew, I'll try to backport it.

 Checking kernel:
  Current kernel version: 2.6.18
  Minimum kernel version: 2.6.31
  FAILED!

 I guess I'll have to wait for RHEL6 ;-(
 Better chances with OCFS2?

No. You'll still need the DLM from the same package.
I'd suggest setting up a Fedora-12 system to get familiar with
everything while you wait for RHEL6.

The getting started guide has step by step instructions for setting it up.



Re: [Pacemaker] configuring the monitor interval

2010-03-25 Thread Andrew Beekhof
On Wed, Mar 24, 2010 at 10:44 PM, Alan Jones falanclus...@gmail.com wrote:
 Friends,
 The ocf:pacemaker:Dummy example resource agent script specifies a default
 monitoring interval (10)
 which I assume is 10 seconds.  This seems like the appropriate place to
 specify this interval, ie.
 the resource implementation knows how heavy weight the monitor is and what
 is a good compromise, etc.

It's just a hint for GUIs to display to their users.
It's not actually used by the cluster.

 However, using the command line crm configuration I'm unable to get the
 monitor to be called without
 overriding this default, e.g.:

 primitive foo ocf:pacemaker:Dummy op monitor interval=20s

 Without the option, monitor isn't called; without specifying the interval
 the option fails.
 I'd prefer not to configure a monitor interval for each instance of
 every resource.

Then the cluster won't periodically check the resource's health.


 - Is there a way to opt-in for enabling the monitor within the resource
 definition (script)?

no

 - Is there a way to configure the monitor option in the crm command syntax
 without specifying the interval?

no
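
Since the interval has to be spelled out for every resource, one way to avoid typing it repeatedly is to generate the definitions. The following is only a minimal sketch: gen_primitive, gen_all and MONITOR_INTERVAL are made-up names, and the only thing taken from the thread is the "primitive ... op monitor interval=" syntax quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print crm "primitive" lines that all share one
# monitor interval. Nothing here talks to the cluster; it only prints text.
MONITOR_INTERVAL="${MONITOR_INTERVAL:-20s}"

gen_primitive() {
    # $1 = resource id, $2 = resource agent (e.g. ocf:pacemaker:Dummy)
    printf 'primitive %s %s op monitor interval=%s\n' \
        "$1" "$2" "$MONITOR_INTERVAL"
}

gen_all() {
    # Emit one definition per resource id given as an argument.
    for id in "$@"; do
        gen_primitive "$id" ocf:pacemaker:Dummy
    done
}
```

Calling "gen_all foo bar" prints one primitive line per id; the output can be reviewed and then entered in the crm shell by hand.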



Re: [Pacemaker] Collocation Resources

2010-03-25 Thread Andrew Beekhof
On Wed, Mar 24, 2010 at 10:49 PM, Travis Dolan tra...@mylasso.com wrote:
 I believe I have found the appropriate place to go. It looks like the
 bug is assigned to you; let me know if I am incorrect.

Perfect. Thanks.



Re: [Pacemaker] [Openais] Unusual exit code with /etc/init.d/corosync stop

2010-03-25 Thread Andreas Mock
-Original Message-
From: Andrew Beekhof and...@beekhof.net
Sent: 25.03.2010 09:15:11
To: Andreas Mock andreas.m...@web.de
Subject: Re: [Openais] Unusual exit code with /etc/init.d/corosync stop

On Tue, Mar 23, 2010 at 12:42 AM, Andreas Mock wrote:
 Hi all,

 I'm using corosync 1.2.0 from the packages of clusterlabs.org on openSuSE 
 11.2.
 A correct /etc/init.d/corosync stop issues a return code of 1

The rc code isn't coming from corosync at all.
It's coming from the last command in stop(), which is echo.

Where in my original post did I say that the return code comes from  corosync 
(binary)??

Please read the mail completely. In the first sentence I just described the
version and platform I'm using and that the script /etc/init.d/corosync issues a
return code of 1 when stopping worked correctly.

Some lines further down (you can see them in your quoted post) I explain,
probably in bad English,
what the reason for this return code is, as I investigated this problem by
debugging
the script /etc/init.d/corosync.

Read the rest of my mail carefully and you get the reason for that behaviour.
a) The very last line is: exit $rtrn
b) Where is the global variable $rtrn initialized and set??
c) It gets set in shell function status!!
d) When you do a stop and the stop works, status is called for the last time
in the while loop, setting $rtrn to 1.
e) This variable is never changed afterwards.
f) It is returned by the last statement, look at a)
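
Steps a) to f) can be reproduced in a few lines. This is a simplified stand-in, not the real init script: is_running and CHECKS_LEFT are hypothetical and replace the pidof call so that the sketch runs on a machine without corosync, and the stop loop is reduced to the polling part described above.

```shell
#!/bin/sh
# Simplified stand-in for the init script's stop path.

is_running() {
    # Pretend the daemon stays alive for CHECKS_LEFT more checks.
    [ "$CHECKS_LEFT" -gt 0 ] || return 1
    CHECKS_LEFT=$((CHECKS_LEFT - 1))
    return 0
}

status() {
    is_running
    rtrn=$?          # assigns the script-wide variable, not a local one
    return $rtrn
}

stop() {
    # Poll until status reports a non-zero code, i.e. the process is gone.
    while status; do
        :
    done
}

simulate_stop() {
    CHECKS_LEFT=2
    rtrn=0           # the value the script would pass to "exit $rtrn"
    stop
    echo "$rtrn"
}
```

Running simulate_stop prints 1: the last status call inside the loop is the one that sees the process gone, and its non-zero result is what remains in $rtrn.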


Best regards
Andreas Mock





[Pacemaker] YaST and SuSE HA add-on - DRBD configuration

2010-03-25 Thread Martin Aspeli

Hi,

We've got the SuSE Linux Enterprise 11 HA add-on, which comes with 
OpenAIS, Pacemaker and DRBD, as well as YaST modules for configuring these.


We want to run two DRBD pairs:

 - One with ext3 in a standard master/slave configuration
 - One with ocfs2 in an active/active configuration

I have two questions:

 1. I understand I need to set net { allow-two-primaries; } in 
drbd.conf - is that correct? YaST doesn't have an option for this, and 
it overwrites the change if I put it in the file manually :-(


 2. Does each DRBD device need a unique port? The default for 
/dev/drbd1 is 7789, and I've chosen 7788 for /dev/drbd2. Is this the 
correct thing to do?


Thanks
Martin




Re: [Pacemaker] Centos 5 and GFS

2010-03-25 Thread Cristian Mammoli - Apra Sistemi

Andrew Beekhof wrote:


No. You'll still need the DLM from the same package.
I'd suggest setting up a Fedora-12 system to get familiar with
everything while you wait for RHEL6.

The getting started guide has step by step instructions for setting it up.


Unfortunately I only have a test environment, and the distro should be 
the same we use in production sites (RHEL5/CENTOS5).

Anyway, thanks for the time and the clarification.

Cheers
--
Cristian Mammoli
APRA SISTEMI srl
Via Brodolini,6 Jesi (AN)
tel dir. 0731 719822

Web   www.apra.it
e-mail  c.mamm...@apra.it



Re: [Pacemaker] [Openais] Unusual exit code with /etc/init.d/corosync stop (Steve - Please ack new patch)

2010-03-25 Thread Andrew Beekhof
On Thu, Mar 25, 2010 at 9:32 AM, Andreas Mock andreas.m...@web.de wrote:
 -Original Message-
 From: Andrew Beekhof and...@beekhof.net
 Sent: 25.03.2010 09:15:11
 To: Andreas Mock andreas.m...@web.de
 Subject: Re: [Openais] Unusual exit code with /etc/init.d/corosync stop

On Tue, Mar 23, 2010 at 12:42 AM, Andreas Mock wrote:
 Hi all,

 I'm using corosync 1.2.0 from the packages of clusterlabs.org on openSuSE 
 11.2.
 A correct /etc/init.d/corosync stop issues a return code of 1

The rc code isn't coming from corosync at all.
It's coming from the last command in stop(), which is echo.

 Where in my original post did I say that the return code comes from  corosync 
 (binary)??

 Please read the mail completely. In the first sentence I just described the
 version and platform I'm using and that the script /etc/init.d/corosync 
 issues a
 return code of 1 when stopping worked correctly.

 Some lines further - you can see them in your quoted post - I'll explain - 
 probably in bad English -
 what the reason for this return code is, as I investigated this problem by 
 debugging
 the script /etc/init.d/corosync.

 Read the rest of my mail carefully and you get the reason for that behaviour.
 a) The very last line is: exit $rtrn
 b) Where is the global variable $rtrn initialized and set??
 c) It gets set in shell function status!!
 d) When you do a stop and the stop works status is called the last time in 
 the while
 loop setting $rtrn to 1.
 e) This variable is never changed afterwards.
 f) It is returned by the last statement, look at a)

Do try to calm down a little.
I made a mistake; it happens when one tries responding to 40-50
conversations a day.

Patching after stop is wrong though; the root cause is status() not
using a local variable.

--- ./etc/init.d/corosync.old   2010-03-25 10:21:19.673779309 +0100
+++ ./etc/init.d/corosync   2010-03-25 10:23:47.318779319 +0100
@@ -40,13 +40,13 @@ failure()
 status()
 {
 	pid=$(pidof $1 2>/dev/null)
-	rtrn=$?
-	if [ $rtrn -ne 0 ]; then
+	rc=$?
+	if [ $rc -ne 0 ]; then
 		echo "$1 is stopped"
 	else
 		echo "$1 (pid $pid) is running..."
 	fi
-	return $rtrn
+	return $rc
 }
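
The effect of the rename can be checked with a small stand-in for the patched script. This is a sketch under assumptions: is_running and CHECKS_LEFT are hypothetical replacements for the pidof call, and the stop loop is reduced to the polling part. Note that rc is still an ordinary shell variable here, as in the patch; it simply no longer shares a name with $rtrn.

```shell
#!/bin/sh
# Simplified stand-in for the patched init script.

is_running() {
    # Pretend the daemon stays alive for CHECKS_LEFT more checks.
    [ "$CHECKS_LEFT" -gt 0 ] || return 1
    CHECKS_LEFT=$((CHECKS_LEFT - 1))
    return 0
}

status() {
    is_running
    rc=$?            # separate name, so $rtrn is left alone
    return $rc
}

stop() {
    # Poll until status reports a non-zero code, i.e. the process is gone.
    while status; do
        :
    done
}

simulate_patched_stop() {
    CHECKS_LEFT=2
    rtrn=0           # the value the script would pass to "exit $rtrn"
    stop
    echo "$rtrn"
}
```

Running simulate_patched_stop prints 0, because the polling loop no longer touches the variable that the script exits with.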



Re: [Pacemaker] Centos 5 and GFS

2010-03-25 Thread Andrew Beekhof
On Thu, Mar 25, 2010 at 10:26 AM, Cristian Mammoli - Apra Sistemi
c.mamm...@apra.it wrote:
 Andrew Beekhof wrote:

 No. You'll still need the DLM from the same package.
 I'd suggest setting up a Fedora-12 system to get familiar with
 everything while you wait for RHEL6.

 The getting started guide has step by step instructions for setting it up.

 Unfortunately I only have a test environment, and the distro should be the
 same we use in production sites (RHEL5/CENTOS5).
 Anyway, thanks for the time and the clarification.

What about some VMs?  Just a thought.



Re: [Pacemaker] [Openais] Unusual exit code with /etc/init.d/corosync stop (Steve - Please ack new patch)

2010-03-25 Thread Andreas Mock
-Original Message-
From: Andrew Beekhof and...@beekhof.net
Sent: 25.03.2010 10:29:50
To: Andreas Mock andreas.m...@web.de
Subject: Re: [Openais] Unusual exit code with /etc/init.d/corosync stop (Steve - Please ack new patch)

Do try to calm down a little.

Sorry, if it sounded upset. That was not my intention. 

In fact I thought that the way I expressed myself was the
reason you didn't understand me and therefore I tried to sum it
up in a different way. It seems to have worked.

I made a mistake, it happens when one tries responding to 40-50
conversations a day.

As it happens to every one of us from time to time.  :-)

Be sure I have the greatest respect for your fast replies to any questions
sent to the mailing list. I'm sure I'm not the only one being thankful
for that. 

So, in fact I'm totally relaxed. All the more so since I read that we'll
get corosync 1.2.1 soon from you guys.

Best regards
Andreas Mock



Re: [Pacemaker] Question about Dual Primary DRBD + OCFS2

2010-03-25 Thread Andrew Beekhof
On Wed, Mar 24, 2010 at 12:33 PM,  r...@free.fr wrote:
 Hi and thanks for your answer.
 Here is my hb_report with 2 tests:
 * node standby / node online = Inconsistent drbd (resolved by drbdadm verify
 or reboot of the node)
 * ifdown / kill of dlm/corosync = instant reboot = I can't find any trace of
 this problem except my screen dump.
 I hope you'll see something interesting in my logs.

Oh, debian i686. Figures.
You'll need to rebuild the packages yourself, the ones from Madkiss'
repo don't seem to work for i686.


 Thank you again for your help,
 Regards


 - Original Mail -
 From: Andrew Beekhof and...@beekhof.net
 To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Sent: Tuesday 23 March 2010 20:12:58 GMT +01:00 Amsterdam / Berlin / Bern /
 Rome / Stockholm / Vienna
 Subject: Re: [Pacemaker] Question about Dual Primary DRBD + OCFS2

 We'd need a stack trace, that screen dump doesn't help much I'm afraid.
 Try using hb_report to grab the logs etc.  It also includes backtraces
 from any cores it finds.

 On Tue, Mar 23, 2010 at 6:55 PM,  r...@free.fr wrote:
 Hi,

 Some tests today...
 If I switch off my network interface (ifdown eth0) or if I kill (-9)
 corosync, I get a segfault of dlm_controld and the node reboots.
 Is that normal? Are my tests too harsh?

 Thanks a lot ;-)

 Regards

 - Original Mail -
 From: r...@free.fr
 To: pacemaker@oss.clusterlabs.org
 Sent: Monday 22 March 2010 18:03:49 GMT +01:00 Amsterdam / Berlin / Bern /
 Rome / Stockholm / Vienna
 Subject: [Pacemaker] Question about Dual Primary DRBD + OCFS2

 Hi all,

 Following this doc
 http://www.clusterlabs.org/wiki/Dual_Primary_DRBD_%2B_OCFS2, I've just
 installed 2 nodes (with some minor adjustments) and now I'm testing my
 setup.
 If I put one node in standby and bring it online again, the other node sees
 this node as Inconsistent. The node that just came back from standby
 considers itself UpToDate.
 I don't have this problem when I reboot a node (reboot).
 I think that the problem is (from my log):
 ERROR: r0: Called drbdadm -c /etc/drbd.conf secondary r0
 State change failed: (-12) Device is held open by someone

 I have no STONITH system :-( Is that a problem for my tests?

 Thanks to all, and sorry for my English.
 Regards.



Re: [Pacemaker] WARN: Rexmit of seq ..........

2010-03-25 Thread Joseph, Lester
Thanks for the help.
I'll try to arrange a reboot of the current DC. See if that fixes it.

Kind Regards
---
Lester Joseph
Linux Systems Administrator


-Original Message-
From: Lars Ellenberg [mailto:lars.ellenb...@linbit.com]
Sent: Wednesday, March 24, 2010 9:55 PM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] WARN: Rexmit of seq ..

On Wed, Mar 24, 2010 at 11:40:54AM +, Joseph, Lester wrote:
 Yes they can. That's the confusing bit

 As I mentioned in previous email, we rebooted the switch and the nodes
 would have lost connectivity briefly. But the switch is back online
 now and the nodes have connectivity as they did before. Yet the
 message is constantly being generated despite my actions.

 I will continue to troubleshoot.

I have this theory that
include/heartbeat.h:195:#define MAXMSGHIST  500
may be too low and may wrap (several times?) on a busy pacemaker
cluster, or if you have very low keepalive set, once you have flaky
communication problems for an extended period of time.

If someone has a way to reproduce this behaviour,
we could check if upping that define would fix it
(extend the period during which heartbeat can cope with flaky communication).

How to get out of there?
well...

First, get your comms in order.

Then, maybe restarting the loudest node helps?
Or on both? Or on the node that does _not_ log those messages?
Dunno... never seen this myself.
Though there are sporadic reports of similar messages in the archives,
and mystical workarounds involving the deletion of some files, which
I very much doubt have anything to do with this particular Rexmit
message...


--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD(r) and LINBIT(r) are registered trademarks of LINBIT, Austria.



Re: [Pacemaker] Centos 5 and GFS

2010-03-25 Thread Cristian Mammoli - Apra Sistemi

Andrew Beekhof wrote:


The getting started guide has step by step instructions for setting it up.

Unfortunately I only have a test environment, and the distro should be the
same we use in production sites (RHEL5/CENTOS5).
Anyway, thanks for the time and the clarification.


What about some VMs?  Just a thought.


The cluster itself runs vmware virtual machines and I don't think vmware 
server2 or ESX support nested virtualization.
I'll probably try with FC12 and some dummy resources, just to do 
failover tests and some benchmark of gfs2 vs ext3.


Thanks
--
Cristian Mammoli
APRA SISTEMI srl
Via Brodolini,6 Jesi (AN)
tel dir. 0731 719822

Web   www.apra.it
e-mail  c.mamm...@apra.it



Re: [Pacemaker] pingd fails to update CIB

2010-03-25 Thread Marc Villacorta
I have the same problem as Quentin Smith (sticky pingd value=0).



[12:49:44 ha1] ~ rpm -qa | egrep -i "pacemaker|corosync|heartbeat|resource"
heartbeat-libs-3.0.2-2.el5
heartbeat-3.0.2-2.el5
pacemaker-libs-1.0.8-1.el5
resource-agents-1.0.1-1.el5
corosync-1.2.0-1.el5
pacemaker-1.0.8-1.el5
corosynclib-1.2.0-1.el5



[12:38:10 ha1] ~ crm configure show
node ha1.intra.genaker.net
node ha2.intra.genaker.net
primitive MyDrbd ocf:linbit:drbd \
        params drbd_resource=r0 \
        op monitor interval=15s \
        op start interval=0 timeout=240s \
        op stop interval=0 timeout=100s
primitive MyFs ocf:heartbeat:Filesystem \
        params device=/dev/drbd/by-res/r0 directory=/data fstype=ext3 \
        op start interval=0 timeout=60s \
        op stop interval=0 timeout=60s
primitive MyIp ocf:heartbeat:IPaddr2 \
        params ip=10.1.1.6 nic=eth0 cidr_netmask=32 broadcast=10.1.1.255 iflabel=Cluster \
        op monitor interval=10s \
        op start interval=0 timeout=90s \
        op stop interval=0 timeout=100s
primitive MyPing ocf:pacemaker:pingd \
        params host_list=10.1.1.1 multiplier=100 \
        op monitor interval=15s timeout=20s \
        op start interval=0 timeout=90s \
        op stop interval=0 timeout=100s
group MyGroup MyFs MyIp
ms MyMsDrbd MyDrbd \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
clone MyPingClone MyPing \
        meta globally-unique=false
location MyLocation MyMsDrbd \
        rule $id=MyLocation-rule $role=Master -inf: not_defined pingd or pingd lte 0
colocation MyColocation inf: MyGroup MyMsDrbd:Master
order MyOrder inf: MyMsDrbd:promote MyGroup:start
property $id=cib-bootstrap-options \
        cluster-infrastructure=openais \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        dc-version=1.0.8-2a76c6ac04bcccf42b89a08e55bfbd90da2fb49a \
        expected-quorum-votes=2



[12:23:54 ha1] ~ cibadmin -Q | grep "name=\"pingd\" value="
  <nvpair id="status-ha2.intra.genaker.net-pingd" name="pingd" value="0"/>
  <nvpair id="status-ha1.intra.genaker.net-pingd" name="pingd" value="0"/>
[12:23:59 ha1] ~ attrd_updater -R
[12:24:13 ha1] ~ cibadmin -Q | grep "name=\"pingd\" value="
  <nvpair id="status-ha2.intra.genaker.net-pingd" name="pingd" value="100"/>
  <nvpair id="status-ha1.intra.genaker.net-pingd" name="pingd" value="100"/>
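
To compare the values before and after the attrd_updater -R refresh without reading raw XML, the pingd entries can be reduced to "id value" pairs. This is a minimal sketch: pingd_values is a made-up name, and it assumes each nvpair element sits on one line with its attributes in the order id, name, value, as in the cibadmin output of this post.

```shell
#!/bin/sh
# Hypothetical filter: read CIB XML on stdin and print "<id> <value>"
# for every nvpair whose name is pingd.
pingd_values() {
    sed -n 's/.*<nvpair id="\([^"]*\)" name="pingd" value="\([^"]*\)".*/\1 \2/p'
}

# Intended use on a cluster node (not run here):
#   cibadmin -Q | pingd_values
```

A value of 0 on every node, as in the first query above, is the "sticky" state this post describes.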


[Pacemaker] showscores.sh script (patch?)

2010-03-25 Thread Maros Timko
Hi all,

this is mainly for Dominik Klein: thanks for the showscores.sh script. But I am
curious why the script keeps only numeric failcount values, which means
that a value of INFINITY is silently dropped:
1. Simulating a failure
Failed actions:
dom0-fs-Dom0_start_0 (node=vsp11.example.com, call=15, rc=5, status=complete): not installed

2. Before patch
Resource        Score  Node               Stickiness  #Fail     Migration-Threshold
dom0-drbd-Dom0  -100   vsp11.example.com  90          0
dom0-drbd-Dom0  410    vsp9.example.com   90          0
dom0-fs-Dom0    0      vsp9.example.com   0           0
dom0-fs-Dom0    -100   vsp11.example.com  0                     <== Note no #Fail value

3. After patch
Resource        Score  Node               Stickiness  #Fail     Migration-Threshold
dom0-drbd-Dom0  -100   vsp11.example.com  90          0
dom0-drbd-Dom0  410    vsp9.example.com   90          0
dom0-fs-Dom0    0      vsp9.example.com   0           0
dom0-fs-Dom0    -100   vsp11.example.com  0           INFINITY  <== Note we can see the INF value now

Would it be feasible to apply this patch:
--
--- showscores.sh   2010-03-25 14:59:21.0 +
+++ showscores-new.sh   2010-03-25 15:00:38.0 +
@@ -91,3 +91,3 @@
 get_failcount() { #usage $0 res node
-    failcount=`crm_failcount -G -r $1 -U $2 -Q 2>/dev/null|grep -o "^[0-9]*$"`
+    failcount=`crm_failcount -G -r $1 -U $2 -Q 2>/dev/null`
 }
--
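
The effect of the removed grep can be seen in isolation. In this sketch keep_numeric is the filter from the original line; keep_numeric_or_infinity is a hypothetical middle ground that is not proposed in this thread: it still rejects unexpected output but lets INFINITY through. It assumes the value arrives alone on one line, as the -Q option above appears to deliver it.

```shell
#!/bin/sh
# Filter used by the unpatched script: only purely numeric lines survive.
keep_numeric() {
    grep -o "^[0-9]*$"
}

# Hypothetical alternative: numeric values or the literal INFINITY.
keep_numeric_or_infinity() {
    grep -E '^([0-9]+|INFINITY)$'
}

# Usage with sample values instead of a live crm_failcount call:
#   echo 3        | keep_numeric               # prints 3
#   echo INFINITY | keep_numeric               # prints nothing
#   echo INFINITY | keep_numeric_or_infinity   # prints INFINITY
```

Dropping the filter entirely, as the patch does, is simpler; the alternative only matters if crm_failcount could print something other than a number or INFINITY.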

With regards,
Tino



[Pacemaker] Someone using ibmrsa-telnet external stonith plugin?

2010-03-25 Thread Andreas Mock
Hi all,

is there someone using the external stonith plugin 'ibmrsa-telnet'?

I found some issues introduced by modifications of other contributors
in the current version.

I want to correct the issues and present a patch for that.

As the RSA or IMM boards all behave a little bit differently, I need people
willing to test the (hopefully) corrected script on their platforms.

Please contact me. I would appreciate it.

Best regards
Andreas Mock



Re: [Pacemaker] DRBD Management Console 0.7.0

2010-03-25 Thread Rasto Levrinc

On Thu, March 25, 2010 4:29 pm, martin.br...@icw.de wrote:
 Hi Rasto,


 I played around with the MC and it is really a promising integrative
 approach for managing a DRBD and Pacemaker Cluster. For now it is really
 nice for demonstration purposes like detaching the primary with failover.


 What I am missing is a resource cleanup (crm resource cleanup resource
 name), is this function in your release plan?

It is there. Right-click on a resource, or open the resource's Actions
menu. It is called either Restart Failed (Clean up)
or Reset Fail-Count (Clean Up). This works only if you have some
fail-count, or the resource failed completely.

If you want to clean up the resource regardless, there is more planned; see
this discussion:

http://www.mail-archive.com/drbd...@lists.linbit.com/msg00081.html

Basically I am trying to hide things like LRM and cleanup
from the user. cleanup could be something like clear history, where
there would be also show history. When a resource fails a user should be
able to quickly identify how to activate the resource, removing the error
messages. Cleanup and fail-counter are not very descriptive in this case.
Anyway I will give it still some more thought.

Rasto

-- 
: Dipl-Ing Rastislav Levrinc
: DRBD-MC http://www.drbd.org/mc/management-console/
: DRBD/HA support and consulting http://www.linbit.com/
DRBD(R) and LINBIT(R) are registered trademarks of LINBIT, Austria.





[Pacemaker] CAn't install resource-agents-1.0.2 on EL5

2010-03-25 Thread Ken Dechick
Hello all, 

  Sorry if this has been covered already - I dug through the lists and didn't
see this problem anywhere. In the process of migrating up to CentOS 5.4 from
5.3 I noticed that I cannot install the newer resource-agents-1.0.2 from yum.
I have the older one installed (v1.0.1), and want to get it up to date. The yum
transaction fails because of a missing dependency:

Setting up Update Process
Resolving Dependencies
--> Running transaction check
---> Package resource-agents.x86_64 0:1.0.2-1 set to be updated
--> Processing Dependency: libnet.so.1()(64bit) for package: resource-agents
--> Finished Dependency Resolution
resource-agents-1.0.2-1.x86_64 from linbit has depsolving problems
  --> Missing Dependency: libnet.so.1()(64bit) is needed by package resource-agents-1.0.2-1.x86_64 (linbit)
Error: Missing Dependency: libnet.so.1()(64bit) is needed by package resource-agents-1.0.2-1.x86_64 (linbit)

I have the usual default CentOS-base.repo, plus I add epel and rpmforge
besides clusterlabs and linbit (we have support contract with Linbit).

Where can I get this libnet.so.1()(64bit) dependency from? Yum can't find it:
[r...@ccsha2 ~]# yum whatprovides */libnet.so.1
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
 * addons: linux.mirrors.es.net
 * base: repo.genomics.upenn.edu
 * epel: archive.linux.duke.edu
 * extras: ftp.wallawalla.edu
 * rpmforge: apt.sw.be
 * updates: ftp.ussg.iu.edu
Excluding Packages from CentOS-5 - Extras
Finished
epel/filelists_db| 4.1 MB 00:01
No Matches found

 
-Thanks!


Kenneth M DeChick
Linux Systems Administrator
Community Computer Service, Inc.
(315)-255-1751 ext154
http://www.medent.com
k...@medent.com
Registered Linux User #497318
-- -- -- -- -- -- -- -- -- -- --
You canna change the laws of physics, Captain; I've got to have thirtyminutes! 




Re: [Pacemaker] Someone using ibmrsa-telnet external stonith plugin?

2010-03-25 Thread Andreas Mock
-Original Message-
From: Florian Haas florian.h...@linbit.com
Sent: 25.03.2010 16:23:59
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Someone using ibmrsa-telnet external stonith plugin?

On 03/25/2010 04:09 PM, Andreas Mock wrote:
 Hi all,
 
 is there someone using the external stonith plugin 'ibmrsa-telnet'?
 
 I found some issues introduced by modifications of other contributors
 in the current version.

Such as?

a) ha_log.sh is used to send debug messages. It is called as a bash command line
without escaping dangerous characters. This leads to unwanted file creation
in the directory /var/lib/heartbeat/cores/root/ (reproducible).

b) The regex patterns for the expect command don't match, so the communication
doesn't work as originally intended. I contacted the contributor of that piece
of code. He found that error soon after his contribution, but the
correction
went to /dev/null somehow.
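
The class of problem in point a) can be shown without the plugin itself. The sketch below is an illustration under assumptions, not the plugin's code: log_unsafe and log_safe are made-up names, and echo and printf stand in for the ha_log.sh call. It only shows how a message pasted into a command string gets re-parsed by a shell, and how passing it as a single quoted argument avoids that.

```shell
#!/bin/sh
# Unsafe pattern: the message becomes part of a command line, so shell
# metacharacters inside it (here ">") are interpreted.
log_unsafe() {
    sh -c "echo $1"
}

# Safer pattern: the message stays one argument and is never re-parsed.
log_safe() {
    printf '%s\n' "$1"
}

# Usage (run in a scratch directory):
#   log_unsafe 'power state > leaked_file'   # creates a file named leaked_file
#   log_safe   'power state > leaked_file'   # just prints the message
```

With the unsafe variant, any debug message that happens to contain a redirection character leaves a stray file in the current working directory, which matches the symptom described in a).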

I am beginning to believe that everyone should start using IPMI, and
spare themselves of these proprietary out-of-band nightmares.

There are some questions regarding this:
a) Does anyone have experience with using IPMI through the stack running
on the OS? If I understand it right, OpenIPMI provides this kind of
in-band communication. My understanding of STONITH was that there has
to be a way to kill a node WITHOUT a dependency on the node's health.
Is it safe to use the IPMI interface provided by a daemon running in the OS
of the node I want to shoot?

b) The newer IMM supports IPMI through out-of-band communication
over the IMM IP address. RSA II does not have this option as far as I know
(updates welcome), which was the reason for writing this telnet beast.  ;-)

c) Which stonith agent is the better one, 'ipmilan' or 'external/ipmi'?

Enlightening information is very welcome.

Best regards
Andreas Mock



Re: [Pacemaker] Someone using ibmrsa-telnet external stonith plugin?

2010-03-25 Thread Andreas Mock
-Original Message-
From: Dejan Muhamedagic deja...@fastmail.fm
Sent: 25.03.2010 16:33:55
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Someone using ibmrsa-telnet external stonith plugin?

Just provide patches.


Hi Dejan,

see attached.

Two problems should be solved. (see answer to Florian Haas on this thread)

Best regards
Andreas Mock

ibmrsa-telnet.patch
Description: Binary data


Re: [Pacemaker] About replacement of clone and handling of the fail number of times.

2010-03-25 Thread renayama19661014
Hi Andrew,

 globally-unique=false means that :0 and :1 are actually the same resource.
 its perfectly valid for entries for both to exist on the node, but the
 PE should fold them together internally.
 
 in most ways it does, just not for failures (yet).

Thank you for the comment.
We were somewhat confused.

To begin with, what difference does the globally-unique setting make?
And in what kind of situation would you use it?

Best Regards,
Hideo Yamauchi.

--- Andrew Beekhof and...@beekhof.net wrote:

 2010/3/24  renayama19661...@ybb.ne.jp:
  Hi Andrew,
 
  Do you mean: why is the clone on srv01 always $clone:0 but on srv02
  its sometimes $clone:0 and sometimes $clone:1 ?
 
  yes.
 
  The replacement thought both nodes to be the same movement.
  Because it is globally-unique=false.
 
 globally-unique=false means that :0 and :1 are actually the same resource.
 its perfectly valid for entries for both to exist on the node, but the
 PE should fold them together internally.
 
 in most ways it does, just not for failures (yet).
 