Re: [Pacemaker] Live demo of Pacemaker Cloud on Fedora: Friday August 5th at 8am PST

2011-08-03 Thread Bob Schatz
Steven,

Are you planning on recording/taping it, in case I want to watch it later?

Thanks,

Bob



From: Steven Dake sd...@redhat.com
To: pcmk-cl...@oss.clusterlabs.org
Cc: aeolus-de...@lists.fedorahosted.org; Fedora Cloud SIG 
cl...@lists.fedoraproject.org; open...@lists.linux-foundation.org 
open...@lists.linux-foundation.org; The Pacemaker cluster resource manager 
pacemaker@oss.clusterlabs.org
Sent: Wednesday, August 3, 2011 9:42 AM
Subject: [Pacemaker] Live demo of Pacemaker Cloud on Fedora: Friday August 5th 
at 8am PST

I am extending a general invitation to the high availability communities and
other cloud community contributors to participate in a live demo I am
giving on Friday, August 5th at 8am PST (GMT-7).  The demo portion of the
session is 15 minutes and will be given first, followed by more details of
our approach to high availability.

I will use Elluminate to show the demo on my desktop machine.  To make
Elluminate work, you will need icedtea-web installed on your system, which
is not typically installed by default.
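
On Fedora, something like the following should pull it in (a sketch; the
package and command names assume the stock yum repositories):

    # install the IcedTea browser plugin / Java Web Start support
    sudo yum install icedtea-web
    # confirm javaws is available to launch the Elluminate .jnlp session
    which javaws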

You will also need a conference # and bridge code.  Please contact me
offlist with your location and I'll provide you with a hopefully toll
free conference # and bridge code.

Elluminate link:
https://sas.elluminate.com/m.jnlp?sid=819&password=M.13AB020AEBE358D265FD925A07335F

Bridge Code:  Please contact me off list with your location and I'll
respond back with dial-in information.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Fw: Configuration for FS over DRBD over LVM

2011-07-20 Thread Bob Schatz
One correction:

I removed the location constraint and simply went with this:

      colocation coloc-rule-w-master inf: glance-repos ms_drbd:Master 
glance-repos-fs-group
      order glance-order-fs-after-drbd inf: glance-repos:start ms_drbd:promote 
glance-repos-fs-group:start
      order glance-order-fs-after-drbd2 inf: glance-repos-fs-group:stop 
ms_drbd:demote ms_drbd:stop glance-repos:stop

I called out the stop of DRBD before the stop of LVM.   The syslog attached 
previously is for this configuration.


Thanks,

Bob



From: Bob Schatz bsch...@yahoo.com
To: pacemaker@oss.clusterlabs.org pacemaker@oss.clusterlabs.org
Sent: Wednesday, July 20, 2011 11:32 AM
Subject: [Pacemaker] Fw:  Configuration for FS over DRBD over LVM


I tried another test based on this thread:

http://www.gossamer-threads.com/lists/linuxha/pacemaker/65928?search_string=lvm%20drbd;#65928

I removed the location constraint and simply went with this:

        colocation coloc-rule-w-master inf: glance-repos ms_drbd:Master 
glance-repos-fs-group
        order glance-order-fs-after-drbd inf: glance-repos:start 
ms_drbd:promote glance-repos-fs-group:start
        order glance-order-fs-after-drbd2 inf: glance-repos-fs-group:stop 
ms_drbd:demote glance-repos:stop


The stop actions were called in this order:

stop file system
demote DRBD
stop LVM   *
stop DRBD *

instead of:

stop file system
demote DRBD
stop DRBD **
stop LVM **

I see these messages in the log which I believe are debug messages based on 
reading other threads:

        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-0-start-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-0-start-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-0-stop-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-0-stop-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-1-promote-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-1-promote-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-1-demote-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-1-demote-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-2-start-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-2-start-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-2-stop-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-2-stop-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-0-stop-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-0-stop-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-0-start-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-0-start-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-1-demote-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-1-demote-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-1-promote-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-1-promote-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-2-stop-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-2-stop-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-2-start-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-2-start-end

I have attached a syslog-pacemaker log of the /etc/init.d/corosync start 
through /etc/init.d/corosync stop sequence.


Thanks,

Bob

- Forwarded Message -
From: Bob Schatz bsch...@yahoo.com
To: pacemaker@oss.clusterlabs.org pacemaker@oss.clusterlabs.org
Sent: Tuesday, July 19, 2011 4:38 PM
Subject: [Pacemaker] Configuration for FS over DRBD over LVM


Hi,

I am trying to configure an FS running on top of DRBD on top of LVM or:

     FS
     |
    DRBD
     |
    LVM

I am using Pacemaker 1.0.8, Ubuntu 10.04 and DRBD 8.3.7.

Reviewing all the manuals (Pacemaker Explained 1.0, DRBD 8.4 User Guide, etc) I 
came up with this Pacemaker configuration:

node cnode-1-3-5
node cnode-1-3-6
primitive glance-drbd ocf:linbit:drbd \
        params drbd_resource=glance-repos-drbd \
        op start interval=0 timeout=240 \
        op stop interval=0 timeout=100 \
        op monitor

[Pacemaker] Fw: Configuration for FS over DRBD over LVM

2011-07-20 Thread Bob Schatz
One correction:



I removed the location constraint and simply went with this:

      colocation coloc-rule-w-master inf: glance-repos ms_drbd:Master 
glance-repos-fs-group
      order glance-order-fs-after-drbd inf: glance-repos:start ms_drbd:promote 
glance-repos-fs-group:start
      order glance-order-fs-after-drbd2 inf: glance-repos-fs-group:stop 
ms_drbd:demote ms_drbd:stop glance-repos:stop

I called out the stop of DRBD before the stop of LVM.   The syslog attached 
previously is for this configuration.


Thanks,

Bob



From: Bob Schatz bsch...@yahoo.com
To: pacemaker@oss.clusterlabs.org pacemaker@oss.clusterlabs.org
Sent: Wednesday, July 20, 2011 11:32 AM
Subject: [Pacemaker] Fw:  Configuration for FS over DRBD over LVM


I tried another test based on this thread:

http://www.gossamer-threads.com/lists/linuxha/pacemaker/65928?search_string=lvm%20drbd;#65928

I removed the location constraint and simply went with this:

        colocation coloc-rule-w-master inf: glance-repos ms_drbd:Master 
glance-repos-fs-group
        order glance-order-fs-after-drbd inf: glance-repos:start 
ms_drbd:promote glance-repos-fs-group:start
        order glance-order-fs-after-drbd2 inf: glance-repos-fs-group:stop 
ms_drbd:demote glance-repos:stop


The stop actions were called in this order:

stop file system
demote DRBD
stop LVM   *
stop DRBD *

instead of:

stop file system
demote DRBD
stop DRBD **
stop LVM **

I see these messages in the log which I believe are debug messages based on 
reading other threads:

        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-0-start-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-0-start-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-0-stop-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-0-stop-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-1-promote-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-1-promote-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-1-demote-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-1-demote-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-2-start-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-2-start-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-2-stop-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-2-stop-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-0-stop-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-0-stop-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-0-start-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-0-start-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-1-demote-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-1-demote-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-1-promote-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-1-promote-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-2-stop-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-2-stop-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-2-start-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd2-2-start-end

I have attached a syslog-pacemaker log of the /etc/init.d/corosync start 
through /etc/init.d/corosync stop sequence.


Thanks,

Bob

- Forwarded Message -
From: Bob Schatz bsch...@yahoo.com
To: pacemaker@oss.clusterlabs.org pacemaker@oss.clusterlabs.org
Sent: Tuesday, July 19, 2011 4:38 PM
Subject: [Pacemaker] Configuration for FS over DRBD over LVM


Hi,

I am trying to configure an FS running on top of DRBD on top of LVM or:

     FS
     |
    DRBD
     |
    LVM

I am using Pacemaker 1.0.8, Ubuntu 10.04 and DRBD 8.3.7.

Reviewing all the manuals (Pacemaker Explained 1.0, DRBD 8.4 User Guide, etc) I 
came up with this Pacemaker configuration:

node cnode-1-3-5
node cnode-1-3-6
primitive glance-drbd ocf:linbit:drbd \
        params drbd_resource=glance-repos-drbd \
        op start interval=0 timeout=240 \
        op stop interval=0 timeout=100 \
        op

[Pacemaker] Fw: Fw: Configuration for FS over DRBD over LVM

2011-07-20 Thread Bob Schatz
Okay, this configuration works on one node (I am waiting for a hardware problem
to be fixed before testing with the second node):

node cnode-1-3-5
node cnode-1-3-6
primitive glance-drbd ocf:linbit:drbd \
        params drbd_resource=glance-repos-drbd \
        op start interval=0 timeout=240 \
        op stop interval=0 timeout=100 \
        op monitor interval=59s role=Master timeout=30s \
        op monitor interval=61s role=Slave timeout=30s
primitive glance-fs ocf:heartbeat:Filesystem \
        params device=/dev/drbd1 directory=/glance-mount fstype=ext4 \
        op start interval=0 timeout=60 \
        op monitor interval=60 timeout=60 OCF_CHECK_LEVEL=20 \
        op stop interval=0 timeout=120
primitive glance-ip ocf:heartbeat:IPaddr2 \
        params ip=10.4.0.25 nic=br100:1 \
        op monitor interval=5s
primitive glance-repos ocf:heartbeat:LVM \
        params volgrpname=glance-repos exclusive=true \
        op start interval=0 timeout=30 \
         op stop interval=0 timeout=30
group glance-repos-fs-group glance-fs glance-ip \
         meta target-role=Started
ms ms_drbd glance-drbd \
        meta master-node-max=1 clone-max=2 clone-node-max=1 
globally-unique=false notify=true target-role=Master
colocation coloc-rule-w-master inf: ms_drbd:Master glance-repos-fs-group
colocation coloc-rule-w-master2 inf: glance-repos ms_drbd:Master
order glance-order-fs-after-drbd inf: glance-repos:start ms_drbd:start
order glance-order-fs-after-drbd-stop inf: glance-repos-fs-group:stop 
ms_drbd:demote
order glance-order-fs-after-drbd-stop2 inf: ms_drbd:demote ms_drbd:stop
order glance-order-fs-after-drbd-stop3 inf: ms_drbd:stop glance-repos:stop
order glance-order-fs-after-drbd2 inf: ms_drbd:start ms_drbd:promote
order glance-order-fs-after-drbd3 inf: ms_drbd:promote 
glance-repos-fs-group:start
property $id=cib-bootstrap-options \
        dc-version=1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd \
        cluster-infrastructure=openais \
        expected-quorum-votes=1 \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        last-lrm-refresh=1310768814

I will let everyone know how testing goes.
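
One way to sanity-check the stop ordering the policy engine computes, without
actually stopping anything, is to run ptest against the live CIB (a sketch;
ptest ships with Pacemaker 1.0.x, and the exact options and output format may
vary by build):

    # ask the policy engine which actions it would schedule, without executing them
    ptest -L -VVV 2>&1 | grep -E 'Start|Stop|Promote|Demote'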


Thanks,

Bob

- Forwarded Message -
From: Bob Schatz bsch...@yahoo.com
To: pacemaker@oss.clusterlabs.org pacemaker@oss.clusterlabs.org
Sent: Wednesday, July 20, 2011 1:38 PM
Subject: [Pacemaker]  Fw:  Configuration for FS over DRBD over LVM


One correction:



I removed the location constraint and simply went with this:

      colocation coloc-rule-w-master inf: glance-repos ms_drbd:Master 
glance-repos-fs-group
      order glance-order-fs-after-drbd inf: glance-repos:start ms_drbd:promote 
glance-repos-fs-group:start
      order glance-order-fs-after-drbd2 inf: glance-repos-fs-group:stop 
ms_drbd:demote ms_drbd:stop glance-repos:stop

I called out the stop of DRBD before the stop of LVM.   The syslog attached 
previously is for this configuration.


Thanks,

Bob



From: Bob Schatz bsch...@yahoo.com
To: pacemaker@oss.clusterlabs.org pacemaker@oss.clusterlabs.org
Sent: Wednesday, July 20, 2011 11:32 AM
Subject: [Pacemaker] Fw:  Configuration for FS over DRBD over LVM


I tried another test based on this thread:

http://www.gossamer-threads.com/lists/linuxha/pacemaker/65928?search_string=lvm%20drbd;#65928

I removed the location constraint and simply went with this:

        colocation coloc-rule-w-master inf: glance-repos ms_drbd:Master 
glance-repos-fs-group
        order glance-order-fs-after-drbd inf: glance-repos:start 
ms_drbd:promote glance-repos-fs-group:start
        order glance-order-fs-after-drbd2 inf: glance-repos-fs-group:stop 
ms_drbd:demote glance-repos:stop


The stop actions were called in this order:

stop file system
demote DRBD
stop LVM   *
stop DRBD *

instead of:

stop file system
demote DRBD
stop DRBD **
stop LVM **

I see these messages in the log which I believe are debug messages based on 
reading other threads:

        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-0-start-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-0-start-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-0-stop-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-0-stop-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-1-promote-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-1-promote-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-1-demote-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-1-demote-end
        pengine: [21021]: debug: text2task: Unsupported action: 
glance-order-fs-after-drbd-2-start-begin
        pengine: [21021]: debug: text2task: Unsupported action: 
glance

[Pacemaker] Configuration for FS over DRBD over LVM

2011-07-19 Thread Bob Schatz
Hi,

I am trying to configure an FS running on top of DRBD on top of LVM or:

     FS
     |
    DRBD
     |
    LVM

I am using Pacemaker 1.0.8, Ubuntu 10.04 and DRBD 8.3.7.

Reviewing all the manuals (Pacemaker Explained 1.0, DRBD 8.4 User Guide, etc) I 
came up with this Pacemaker configuration:

node cnode-1-3-5
node cnode-1-3-6
primitive glance-drbd ocf:linbit:drbd \
        params drbd_resource=glance-repos-drbd \
        op start interval=0 timeout=240 \
        op stop interval=0 timeout=100 \
        op monitor interval=59s role=Master timeout=30s \
        op monitor interval=61s role=Slave timeout=30s
primitive glance-fs ocf:heartbeat:Filesystem \
        params device=/dev/drbd1 directory=/glance-mount fstype=ext4 \
        op start interval=0 timeout=60 \
        op monitor interval=60 timeout=60 OCF_CHECK_LEVEL=20 \
        op stop interval=0 timeout=120
primitive glance-repos ocf:heartbeat:LVM \
        params volgrpname=glance-repos exclusive=true \
        op start interval=0 timeout=30 \
        op stop interval=0 timeout=30
group glance-repos-fs-group glance-fs
ms ms_drbd glance-drbd \
        meta master-node-max=1 clone-max=2 clone-node-max=1 
globally-unique=false notify=true target-role=Master
location drbd_on_node1 ms_drbd \
        rule $id=drbd_on_node1-rule $role=Master 100: #uname eq cnode-1-3-5
colocation coloc-rule-w-master inf: glance-repos ms_drbd:Master
order glance-order-fs-after-drbd inf: glance-repos:start ms_drbd:promote 
glance-repos-fs-group:start
property $id=cib-bootstrap-options \
        dc-version=1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd \
        cluster-infrastructure=openais \
        expected-quorum-votes=1 \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        last-lrm-refresh=1310768814

On one node, things come up cleanly.  In fact, debug messages in the agent show 
that the start() functions for the agent are called and exited in order (LVM 
start, DRBD start and Filesystem start).

The problem occurs when I do /etc/init.d/corosync stop on a single node.   
What happens is that the stop() functions are called in this order:

1. LVM stop
2. Filesystem stop
3. DRBD stop

What I have tried:

1. I tried setting the score of the order to 500 assuming that this would 
mean the colocation rule would hit first.  Still the same problem.
2. I tried leaving off the :start and :promote options on the order line. 
  The stop order was still LVM, Filesystem, and DRBD
3. I tried adding another colocation rule colocation coloc-rule-w-master2 inf: 
ms_drbd:Master glance-repos-fs-group to tie glance-repos-fs-group to the same 
node as DRBD.   Stop still had the same issue.   I assume that I will still 
need this rule when I add a second node to the test.

Any suggestions would be appreciated.

As a side note, the reason I have a group for the file system is that I would
like to add an application and IP address to the group once I get this working.
Also, the reason I have LVM under DRBD is that I want to be able to grow the
LVM volume as needed and then expand the DRBD volume.
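
The grow path that motivates this stacking would look roughly like the
following (a sketch only; the logical volume name is a placeholder, and the
drbdadm/resize2fs steps should be checked against the DRBD 8.3 documentation
before use):

    # 1. grow the backing logical volume (on both nodes)
    lvextend -L +10G /dev/glance-repos/<lv-name>
    # 2. have DRBD pick up the larger backing device (run on the Primary)
    drbdadm resize glance-repos-drbd
    # 3. grow the ext4 filesystem on the DRBD device
    resize2fs /dev/drbd1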


Thanks in advance,

Bob
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Question regarding starting of master/slave resources and ELECTIONs

2011-04-15 Thread Bob Schatz
Andrew,

Comments at end with BS




From: Andrew Beekhof and...@beekhof.net
To: Bob Schatz bsch...@yahoo.com
Cc: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Sent: Fri, April 15, 2011 4:28:52 AM
Subject: Re: [Pacemaker] Question regarding starting of master/slave resources 
and ELECTIONs

On Fri, Apr 15, 2011 at 5:58 AM, Bob Schatz bsch...@yahoo.com wrote:
 Andrew,
 Thanks for the help
 Comments inline with BS
 
 From: Andrew Beekhof and...@beekhof.net
 To: Bob Schatz bsch...@yahoo.com
 Cc: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Sent: Thu, April 14, 2011 2:14:40 AM
 Subject: Re: [Pacemaker] Question regarding starting of master/slave
 resources and ELECTIONs

 On Thu, Apr 14, 2011 at 10:49 AM, Andrew Beekhof and...@beekhof.net wrote:

 I noticed that 4 of the master/slave resources will start right away but
 the
 5 master/slave resource seems to take a minute or so and I am only
 running
 with one node.
 Is this expected?

 Probably, if the other 4 take around a minute each to start.
 There is an lrmd config variable that controls how much parallelism it
 allows (but i forget the name).
 Bob It's max-children and I set it to 40 for this test to see if it
 would
 change the behavior.  (/sbin/lrmadmin -p max-children 40)

 Thats surprising.  I'll have a look at the logs.

 Looking at the logs, I see a couple of things:


 This is very bad:
 Apr 12 19:33:42 mgraid-S30311-1 crmd: [17529]: WARN: get_uuid:
 Could not calculate UUID for mgraid-s30311-0
 Apr 12 19:33:42 mgraid-S30311-1 crmd: [17529]: WARN:
 populate_cib_nodes_ha: Node mgraid-s30311-0: no uuid found

 For some reason pacemaker cant get the node's uuid from heartbeat.

 BS I create the uuid when the node comes up.

Heartbeat should have already created it before pacemaker even got
started though.


 So we start a few things:

 Apr 12 19:33:41 mgraid-S30311-1 crmd: [17529]: info:
 do_lrm_rsc_op: Performing
 key=23:3:0:48aac631-8177-4cda-94ea-48dfa9b1a90f
 op=SSS30311:0_start_0 )
 Apr 12 19:33:41 mgraid-S30311-1 crmd: [17529]: info:
 do_lrm_rsc_op: Performing
 key=49:3:0:48aac631-8177-4cda-94ea-48dfa9b1a90f
 op=SSJ30312:0_start_0 )
 Apr 12 19:33:41 mgraid-S30311-1 crmd: [17529]: info:
 do_lrm_rsc_op: Performing
 key=75:3:0:48aac631-8177-4cda-94ea-48dfa9b1a90f
 op=SSJ30313:0_start_0 )
 Apr 12 19:33:41 mgraid-S30311-1 crmd: [17529]: info:
 do_lrm_rsc_op: Performing
 key=101:3:0:48aac631-8177-4cda-94ea-48dfa9b1a90f
 op=SSJ30314:0_start_0 )

 But then another change comes in:

 Apr 12 19:33:41 mgraid-S30311-1 crmd: [17529]: info:
 abort_transition_graph: need_abort:59 - Triggered transition abort
 (complete=0) : Non-status change

 Normally we'd recompute and keep going, but it was a(nother) replace
 operation, so:

 Apr 12 19:33:42 mgraid-S30311-1 crmd: [17529]: info:
 do_state_transition: State transition S_TRANSITION_ENGINE -
 S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL
 origin=do_cib_replaced ]

 All the time goes here:

 Apr 12 19:35:31 mgraid-S30311-1 crmd: [17529]: WARN:
 action_timer_callback: Timer popped (timeout=2,
 abort_level=100, complete=true)
 Apr 12 19:35:31 mgraid-S30311-1 crmd: [17529]: WARN:
 action_timer_callback: Ignoring timeout while not in transition
 Apr 12 19:35:31 mgraid-S30311-1 crmd: [17529]: WARN:
 action_timer_callback: Timer popped (timeout=2,
 abort_level=100, complete=true)
 Apr 12 19:35:31 mgraid-S30311-1 crmd: [17529]: WARN:
 action_timer_callback: Ignoring timeout while not in transition
 Apr 12 19:35:31 mgraid-S30311-1 crmd: [17529]: WARN:
 action_timer_callback: Timer popped (timeout=2,
 abort_level=100, complete=true)
 Apr 12 19:35:31 mgraid-S30311-1 crmd: [17529]: WARN:
 action_timer_callback: Ignoring timeout while not in transition
 Apr 12 19:35:31 mgraid-S30311-1 crmd: [17529]: WARN:
 action_timer_callback: Timer popped (timeout=2,
 abort_level=100, complete=true)
 Apr 12 19:35:31 mgraid-S30311-1 crmd: [17529]: WARN:
 action_timer_callback: Ignoring timeout while not in transition
 Apr 12 19:37:00 mgraid-S30311-1 crmd: [17529]: ERROR:
 crm_timer_popped: Integration Timer (I_INTEGRATED) just popped!

 but it's not at all clear to me why - although certainly avoiding the
 election would help.
 Is there any chance to load all the changes at once?

 BS Yes.  That worked.  I created the configuration in a file and then did
 a crm configure load update filename to avoid the election
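
That batch-load workflow boils down to something like this (a sketch; the
file name is arbitrary):

    # /tmp/ha-config.crm holds the complete configuration (primitives, ms
    # resources, constraints) built offline as the hardware is discovered;
    # apply it as a single CIB update instead of many incremental commits
    crm configure load update /tmp/ha-config.crm
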
 Possibly the delay related to the UUID issue above, possibly it might
 be related to one of these two patches that went in after 1.0.9

 andrew (stable-1.0)High: crmd: Make sure we always poke the FSA after
 a transition to clear any TE_HALT actions CS: 9187c0506fd3 On:
 2010-07-07
 andrew (stable-1.0)High: crmd: Reschedule the PE_START action if its
 not already running when we try

Re: [Pacemaker] Question regarding starting of master/slave resources and ELECTIONs

2011-04-13 Thread Bob Schatz
Andrew,

Thanks for responding.  Comments inline with Bob




From: Andrew Beekhof and...@beekhof.net
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Cc: Bob Schatz bsch...@yahoo.com
Sent: Tue, April 12, 2011 11:23:14 PM
Subject: Re: [Pacemaker] Question regarding starting of master/slave resources 
and ELECTIONs

On Wed, Apr 13, 2011 at 4:54 AM, Bob Schatz bsch...@yahoo.com wrote:
 Hi,
 I am running Pacemaker 1.0.9 with Heartbeat 3.0.3.
 I create 5 master/slave resources in /etc/ha.d/resource.d/startstop during
 post-start.

I had no idea this was possible.  Why would you do this?

Bob  We, and a couple of other companies I know of, bundle Linux-HA/Pacemaker
into an appliance.  For me, when the appliance boots, it creates HA resources
based on the hardware it discovers.   I assumed that once POST-START was called 
in the startstop script and we have a DC then the cluster is up and running.  I 
then use crm commands to create the configuration, etc.  I further assumed 
that since we have one DC in the cluster then all crm commands which modify 
the configuration would be ordered even if the DC fails over to a different 
node.  Is this incorrect?

 I noticed that 4 of the master/slave resources will start right away but the
 5 master/slave resource seems to take a minute or so and I am only running
 with one node.
 Is this expected?

Probably, if the other 4 take around a minute each to start.
There is an lrmd config variable that controls how much parallelism it
allows (but i forget the name).

Bob It's max-children and I set it to 40 for this test to see if it would 
change the behavior.  (/sbin/lrmadmin -p max-children 40)

 My configuration is below and I have also attached ha-debug.
 Also, what triggers a crmd election?

Node up/down events and whenever someone replaces the cib (which the
shell used to do a lot).

Bob For my test, I only started one node so that I could avoid node up/down
events.  I believe the log shows the cib being replaced.  Since I am using crm,
I assume it must be due to crm.  Do the crm_resource, etc. commands also
replace the cib?  Would that avoid the elections caused by the cib being replaced?


Thanks,

Bob

  I seemed to have a lot of elections in
 the attached log.  I was assuming that on a single node I would only run the
 election once in the beginning and then there would not be another one until
 a new node joined.

 Thanks,
 Bob

 My configuration is:
 node $id=856c1f72-7cd1-4906-8183-8be87eef96f2 mgraid-s30311-1
 primitive SSJ30312 ocf:omneon:ss \
 params ss_resource=SSJ30312
 ssconf=/var/omneon/config/config.J30312 \
 op monitor interval=3s role=Master timeout=7s \
 op monitor interval=10s role=Slave timeout=7 \
 op stop interval=0 timeout=20 \
 op start interval=0 timeout=300
 primitive SSJ30313 ocf:omneon:ss \
 params ss_resource=SSJ30313
 ssconf=/var/omneon/config/config.J30313 \
 op monitor interval=3s role=Master timeout=7s \
 op monitor interval=10s role=Slave timeout=7 \
 op stop interval=0 timeout=20 \
 op start interval=0 timeout=300
 primitive SSJ30314 ocf:omneon:ss \
 params ss_resource=SSJ30314
 ssconf=/var/omneon/config/config.J30314 \
 op monitor interval=3s role=Master timeout=7s \
 op monitor interval=10s role=Slave timeout=7 \
 op stop interval=0 timeout=20 \
 op start interval=0 timeout=300
 primitive SSJ30315 ocf:omneon:ss \
 params ss_resource=SSJ30315
 ssconf=/var/omneon/config/config.J30315 \
 op monitor interval=3s role=Master timeout=7s \
 op monitor interval=10s role=Slave timeout=7 \
 op stop interval=0 timeout=20 \
 op start interval=0 timeout=300
 primitive SSS30311 ocf:omneon:ss \
 params ss_resource=SSS30311
 ssconf=/var/omneon/config/config.S30311 \
 op monitor interval=3s role=Master timeout=7s \
 op monitor interval=10s role=Slave timeout=7 \
 op stop interval=0 timeout=20 \
 op start interval=0 timeout=300
 primitive icms lsb:S53icms \
 op monitor interval=5s timeout=7 \
 op start interval=0 timeout=5
 primitive mgraid-stonith stonith:external/mgpstonith \
 params hostlist=mgraid-canister \
 op monitor interval=0 timeout=20s
 primitive omserver lsb:S49omserver \
 op monitor interval=5s timeout=7 \
 op start interval=0 timeout=5
 ms ms-SSJ30312 SSJ30312 \
 meta clone-max=2 notify=true globally-unique=false
 target-role=Started
 ms ms-SSJ30313 SSJ30313 \
 meta clone-max=2 notify=true globally-unique=false
 target-role=Started
 ms ms-SSJ30314 SSJ30314 \
 meta clone-max=2 notify=true globally-unique=false
 target-role=Started
 ms ms-SSJ30315 SSJ30315 \
 meta clone-max=2 notify=true globally-unique=false

[Pacemaker] Clearing a resource which returned not installed from START

2011-03-30 Thread Bob Schatz
I am running Pacemaker 1.0.9 and Heartbeat 3.0.3.

I started a resource and the agent start method returned OCF_ERR_INSTALLED.

I have fixed the problem and I would like to restart the resource and I cannot 
get it to restart.

Any ideas?


Thanks,

Bob


The failcounts are 0 as shown below and with the crm_resource command:

 # crm_mon -1 -f
 
 Last updated: Wed Mar 30 19:55:39 2011
 Stack: Heartbeat
 Current DC: mgraid-sd6661-0 (f4e5e15c-d06b-4e37-89b9-4621af05128f) - 
partition with quorum
 Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
 2 Nodes configured, unknown expected votes
5 Resources configured.


 Online: [ mgraid-sd6661-1 mgraid-sd6661-0 ]

  Clone Set: Fencing
  Started: [ mgraid-sd6661-1 mgraid-sd6661-0 ]
  Clone Set: cloneIcms
  Started: [ mgraid-sd6661-1 mgraid-sd6661-0 ]
  Clone Set: cloneOmserver
  Started: [ mgraid-sd6661-1 mgraid-sd6661-0 ]
  Master/Slave Set: ms-SSSD6661
  Masters: [ mgraid-sd6661-0 ]
  Slaves: [ mgraid-sd6661-1 ]
  Master/Slave Set: ms-SSJD6662
  Masters: [ mgraid-sd6661-0 ]
  Stopped: [ SSJD6662:0 ]

 Migration summary:
 * Node mgraid-sd6661-0:
 * Node mgraid-sd6661-1:

 Failed actions:
SSJD6662:0_start_0 (node=mgraid-sd6661-1, call=27, rc=5, 
status=complete): not installed

I have also tried to cleanup the resource with these commands:

  # crm_resource --resource SSJD6662:0 --cleanup --node mgraid-sd6661-1
  # crm_resource --resource SSJD6662:1 --cleanup --node mgraid-sd6661-1
  # crm_resource --resource SSJD6662:0 --cleanup --node mgraid-sd6661-0
  # crm_resource --resource SSJD6662:1 --cleanup --node mgraid-sd6661-0
  # crm_resource --resource ms-SSJD6662 --cleanup --node mgraid-sd6661-1

  # crm resource start SSJD6662:0

My configuration is:

node $id=856c1f72-7cd1-4906-8183-8be87eef96f2 mgraid-sd6661-1
node $id=f4e5e15c-d06b-4e37-89b9-4621af05128f mgraid-sd6661-0
primitive SSJD6662 ocf:omneon:ss \
params ss_resource=SSJD6662 
ssconf=/var/omneon/config/config.JD6662 \
op monitor interval=3s role=Master timeout=7s \
op monitor interval=10s role=Slave timeout=7 \
op stop interval=0 timeout=20 \
op start interval=0 timeout=300
primitive SSSD6661 ocf:omneon:ss \
params ss_resource=SSSD6661 
ssconf=/var/omneon/config/config.SD6661 \
op monitor interval=3s role=Master timeout=7s \
op monitor interval=10s role=Slave timeout=7 \
op stop interval=0 timeout=20 \
op start interval=0 timeout=300
primitive icms lsb:S53icms \
op monitor interval=5s timeout=7 \
op start interval=0 timeout=5
primitive mgraid-stonith stonith:external/mgpstonith \
params hostlist=mgraid-canister \
op monitor interval=0 timeout=20s
primitive omserver lsb:S49omserver \
op monitor interval=5s timeout=7 \
op start interval=0 timeout=5
ms ms-SSJD6662 SSJD6662 \
meta clone-max=2 notify=true globally-unique=false 
target-role=Started
ms ms-SSSD6661 SSSD6661 \
meta clone-max=2 notify=true globally-unique=false 
target-role=Started
clone Fencing mgraid-stonith
clone cloneIcms icms
clone cloneOmserver omserver
location ms-SSJD6662-master-w1 ms-SSJD6662 \
rule $id=ms-SSJD6662-master-w1-rule $role=master 100: #uname eq 
mgraid-sd6661-1
location ms-SSSD6661-master-w1 ms-SSSD6661 \
rule $id=ms-SSSD6661-master-w1-rule $role=master 100: #uname eq 
mgraid-sd6661-0
order orderms-SSJD6662 0: cloneIcms ms-SSJD6662
order orderms-SSSD6661 0: cloneIcms ms-SSSD6661
property $id=cib-bootstrap-options \
dc-version=1.0.9-89bd754939df5150de7cd76835f98fe90851b677 \
cluster-infrastructure=Heartbeat \
dc-deadtime=5s \
stonith-enabled=true \
last-lrm-refresh=1301536426


  

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] WARN: msg_to_op(1324): failed to get the value of field lrm_opstatus from a ha_msg

2011-03-25 Thread Bob Schatz
A few more thoughts that occurred after I hit return

1.  This problem seems to occur only when /etc/init.d/heartbeat start is
executed on two nodes at the same time.  If I only do one at a time it does not
seem to occur.  (This may be related to the creation of master/slave resources
in /etc/ha.d/resource.d/startstop when heartbeat starts.)
2.  This problem seemed to occur most frequently when I went from 4 master/slave
resources to 6 master/slave resources.

Thanks,

Bob


- Original Message 
From: Bob Schatz bsch...@yahoo.com
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Sent: Fri, March 25, 2011 4:22:39 PM
Subject: Re: [Pacemaker] WARN: msg_to_op(1324): failed to get the value of 
field 
lrm_opstatus from a ha_msg

After reading more threads, I noticed that I needed to include the PE outputs.

Therefore, I have rerun the tests and included the PE outputs, the
configuration file, and the logs for both nodes.

The test was rerun with max-children of 20.

Thanks,

Bob


- Original Message 
From: Bob Schatz bsch...@yahoo.com
To: pacemaker@oss.clusterlabs.org
Sent: Thu, March 24, 2011 7:35:54 PM
Subject: [Pacemaker] WARN: msg_to_op(1324): failed to get the value of field 
lrm_opstatus from a ha_msg

I am getting these messages in the log:

   2011-03-24 18:53:12| warning |crmd: [27913]: WARN: msg_to_op(1324): failed to get the value of field lrm_opstatus from a ha_msg
   2011-03-24 18:53:12| info |crmd: [27913]: info: msg_to_op: Message follows:
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG: Dumping message with 16 
fields
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[0] : [lrm_t=op]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[1] : 
[lrm_rid=SSJE02A2:0]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[2] : [lrm_op=start]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[3] : [lrm_timeout=30]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[4] : [lrm_interval=0]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[5] : [lrm_delay=0]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[6] : [lrm_copyparams=1]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[7] : [lrm_t_run=0]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[8] : [lrm_t_rcchange=0]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[9] : [lrm_exec_time=0]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[10] : [lrm_queue_time=0]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[11] : [lrm_targetrc=-1]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[12] : [lrm_app=crmd]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[13] : 
[lrm_userdata=91:3:0:dc9ad1c7-1d74-4418-a002-34426b34b576]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[14] : 
[(2)lrm_param=0x64c230(938 1098)]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG: Dumping message with 27 
fields
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[0] : [CRM_meta_clone=0]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[1] : 
[CRM_meta_notify_slave_resource= ]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[2] : 
[CRM_meta_notify_active_resource= ]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[3] : 
[CRM_meta_notify_demote_uname= ]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[4] : 
[CRM_meta_notify_inactive_resource=SSJE02A2:0 SSJE02A2:1 ]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[5] : 
[ssconf=/var/omneon/config/config.JE02A2]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[6] : 
[CRM_meta_master_node_max=1]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[7] : 
[CRM_meta_notify_stop_resource= ]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[8] : 
[CRM_meta_notify_master_resource= ]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[9] : 
[CRM_meta_clone_node_max=1]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[10] : 
[CRM_meta_clone_max=2]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[11] : 
[CRM_meta_notify=true]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[12] : 
[CRM_meta_notify_start_resource=SSJE02A2:0 SSJE02A2:1 ]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[13] : 
[CRM_meta_notify_stop_uname= ]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[14] : 
[crm_feature_set=3.0.1]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[15] : 
[CRM_meta_notify_master_uname= ]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[16] : 
[CRM_meta_master_max=1]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[17] : 
[CRM_meta_globally_unique=false]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[18] : 
[CRM_meta_notify_promote_resource=SSJE02A2:0 ]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[19] : 
[CRM_meta_notify_promote_uname=mgraid-se02a1-0 ]
   2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[20] : 
[CRM_meta_notify_active_uname= ]
   2011-03-24 18:53:12| info

Re: [Pacemaker] Return value from promote function

2011-02-16 Thread Bob Schatz
Thanks Andrew!

This works.


- Original Message 
From: Andrew Beekhof and...@beekhof.net
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Sent: Thu, February 10, 2011 1:37:52 AM
Subject: Re: [Pacemaker] Return value from promote function

On Tue, Feb 8, 2011 at 3:42 AM, Bob Schatz bsch...@yahoo.com wrote:
 I am running Pacemaker 1.0.9.1 and Heartbeat 3.0.3.

 I have a master/slave resource with an agent.

 When the resource hangs while doing a promote, the resource returns
 OCF_ERR_GENERIC.

 However, all this does is call demote on the resource, restart the resource on
 the same node and then retry the promote again on the same node.

 Is there anyway I can have the CRM promote the resource on the peer node
 instead?

Have the agent set a different promotion score with crm_master.
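
Inside the agent that could look something like this (a sketch only; the score
value and lifetime are illustrative, not taken from this thread):

    # on a failed promote, lower this node's master preference so the
    # policy engine prefers promoting the peer instead
    crm_master -l reboot -v -100
    # (or drop this node's preference entirely)
    # crm_master -l reboot -D
    # ...and still return $OCF_ERR_GENERIC from the promote action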


 My configuration is:

 node $id=856c1f72-7cd1-4906-8183-8be87eef96f2 mgraid-mkp9010repk-1
 node $id=f4e5e15c-d06b-4e37-89b9-4621af05128f mgraid-mkp9010repk-0
 primitive SSMKP9010REPK ocf:omneon:ss \
params ss_resource=SSMKP9010REPK
 ssconf=/var/omneon/config/config.MKP9010REPK \
op monitor interval=3s role=Master timeout=7s \
op monitor interval=10s role=Slave timeout=7 \
op stop interval=0 timeout=120 \
op start interval=0 timeout=600
 primitive icms lsb:S53icms \
op monitor interval=5s timeout=7 \
op start interval=0 timeout=5
 primitive mgraid-stonith stonith:external/mgpstonith \
params hostlist=mgraid-canister \
op monitor interval=0 timeout=20s
 primitive omserver lsb:S49omserver \
op monitor interval=5s timeout=7 \
op start interval=0 timeout=5
 ms ms-SSMKP9010REPK SSMKP9010REPK \
meta clone-max=2 notify=true globally-unique=false
 target-role=Master
 clone Fencing mgraid-stonith
 clone cloneIcms icms
 clone cloneOmserver omserver
 location ms-SSMKP9010REPK-master-w1 ms-SSMKP9010REPK \
rule $id=ms-SSMKP9010REPK-master-w1-rule $role=master 100: #uname 
eq
 mgraid-mkp9010repk-0
 order orderms-SSMKP9010REPK 0: cloneIcms ms-SSMKP9010REPK
 property $id=cib-bootstrap-options \
dc-version=1.0.9-89bd754939df5150de7cd76835f98fe90851b677 \
cluster-infrastructure=Heartbeat \
dc-deadtime=5s \
stonith-enabled=true



 Thanks,

 Bob




 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



  

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Return value from promote function

2011-02-07 Thread Bob Schatz
I am running Pacemaker 1.0.9.1 and Heartbeat 3.0.3.

I have a master/slave resource with an agent.

When the resource hangs while doing a promote, the resource returns 
OCF_ERR_GENERIC.

However, all this does is call demote on the resource, restart the resource on 
the same node and then retry the promote again on the same node.

Is there anyway I can have the CRM promote the resource on the peer node 
instead?

My configuration is:

node $id=856c1f72-7cd1-4906-8183-8be87eef96f2 mgraid-mkp9010repk-1
node $id=f4e5e15c-d06b-4e37-89b9-4621af05128f mgraid-mkp9010repk-0
primitive SSMKP9010REPK ocf:omneon:ss \
params ss_resource=SSMKP9010REPK 
ssconf=/var/omneon/config/config.MKP9010REPK \
op monitor interval=3s role=Master timeout=7s \
op monitor interval=10s role=Slave timeout=7 \
op stop interval=0 timeout=120 \
op start interval=0 timeout=600
primitive icms lsb:S53icms \
op monitor interval=5s timeout=7 \
op start interval=0 timeout=5
primitive mgraid-stonith stonith:external/mgpstonith \
params hostlist=mgraid-canister \
op monitor interval=0 timeout=20s
primitive omserver lsb:S49omserver \
op monitor interval=5s timeout=7 \
op start interval=0 timeout=5
ms ms-SSMKP9010REPK SSMKP9010REPK \
meta clone-max=2 notify=true globally-unique=false 
target-role=Master
clone Fencing mgraid-stonith
clone cloneIcms icms
clone cloneOmserver omserver
location ms-SSMKP9010REPK-master-w1 ms-SSMKP9010REPK \
rule $id=ms-SSMKP9010REPK-master-w1-rule $role=master 100: #uname 
eq 
mgraid-mkp9010repk-0
order orderms-SSMKP9010REPK 0: cloneIcms ms-SSMKP9010REPK
property $id=cib-bootstrap-options \
dc-version=1.0.9-89bd754939df5150de7cd76835f98fe90851b677 \
cluster-infrastructure=Heartbeat \
dc-deadtime=5s \
stonith-enabled=true



Thanks,

Bob


  

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] OCF RA dev guide: final heads up

2010-12-06 Thread Bob Schatz
Florian,

Comments below with [BS]


Thanks,

Bob


- Original Message 
From: Florian Haas florian.h...@linbit.com
To: pacemaker@oss.clusterlabs.org
Sent: Mon, December 6, 2010 7:25:28 AM
Subject: Re: [Pacemaker] OCF RA dev guide: final heads up

Hello Bob,

On 2010-12-03 20:12, Bob Schatz wrote:
 Florian,
 
 Thanks for writing this!
 
 I already found one or two errors related to return codes in my agent based 
 on 

 your document. :)
 
 I have not read the entire document but I do have these comments:
 
 1. Does this document apply to all versions of the agent framework or only
 certain versions (hopefully all in one place)?  I think the document should
 have a section which specifies which versions are covered.  Also, if certain
 areas only apply to a certain version then a note should be mentioned in the
 section.

 2. In Section 3.8 OCF_NOT_RUNNING, how can a monitor return
 OCF_FAILED_MASTER?  Is there an environment variable passed to the monitor
 action which says I think you are a master - tell me if you are or are not?

No, the very purpose of monitor is to _find out_ the status of the
resource. If the resource can query its own master/slave status, it
should do so, and then if it is both a master and failed, it should
return OCF_FAILED_MASTER.
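
A bare-bones sketch of that monitor shape in a shell RA (the helper functions
here are placeholders, not part of any real agent):

    my_monitor() {
        my_is_running        || return $OCF_NOT_RUNNING
        if my_role_is_master; then
            my_health_check  || return $OCF_FAILED_MASTER
            return $OCF_RUNNING_MASTER
        fi
        return $OCF_SUCCESS
    }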


[BS] Okay.  That makes sense now.

 3. In Section 5.3 monitor action, it would be nice if you showed how an
 OCF_FAILED_MASTER is returned.

Hm. Let me defer that for a little bit.


[BS] Sounds good

 4. Sections 5.8 migrate_to action and 5.9 migrate_from action: do these apply
 to master/slave resources also, or only to primitive resources?

Good question, and indeed I don't know. It's conceivable that a clone
set (remember, m/s are just clones with a little extra) has a clone-max
that is less than the number of nodes in the cluster, and supports
migration, and therefore a clone instance should be able to live-migrate
to a different node. I have no clue whether it's indeed implemented that
way, though.

Andrew, maybe you can shed some extra light on this?

 5. In Section 5.10 notify action, I think you want to add a note/reference to
 the Pacemaker Configuration Explained section 10.3.3.9 Proper Interpretation of
 Notification Environment Variables.  (The section name may be different, as I
 was looking at 1.0 from about a year ago.)

Good idea. I'll put that on my to-do list.

 6. In Section 8.4 Specifying a master preference: starting in at least version
 1.0.9.1 of Pacemaker it is possible to specify a negative master score.  I
 think it would be good to add this to the example, as well as a note about
 which version has this functionality, since it was broken in 1.0.6.

Don't you think this would just royally confuse people?


[BS] You are probably right.  I guess you don't want to document bugs and 
workarounds from past releases in the current manual.   That makes sense.


Florian


  

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] OCF RA dev guide: final heads up

2010-12-03 Thread Bob Schatz
Florian,

Thanks for writing this!

I already found one or two errors related to return codes in my agent based on 
your document. :)

I have not read the entire document but I do have these comments:

1. Does this document apply to all versions of the agent framework or only
certain versions (hopefully all in one place)?  I think the document should have
a section which specifies which versions are covered.  Also, if certain areas
only apply to a certain version then a note should be mentioned in the
section.

2. In Section 3.8 OCF_NOT_RUNNING, how can a monitor return
OCF_FAILED_MASTER?  Is there an environment variable passed to the monitor
action which says I think you are a master - tell me if you are or are not?

3. In Section 5.3 monitor action, it would be nice if you showed how an
OCF_FAILED_MASTER is returned.

4. Sections 5.8 migrate_to action and 5.9 migrate_from action: do these apply
to master/slave resources also, or only to primitive resources?

5. In Section 5.10 notify action, I think you want to add a note/reference to
the Pacemaker Configuration Explained section 10.3.3.9 Proper Interpretation of
Notification Environment Variables.  (The section name may be different, as I
was looking at 1.0 from about a year ago.)

6. In Section 8.4 Specifying a master preference: starting in at least version
1.0.9.1 of Pacemaker it is possible to specify a negative master score.  I think
it would be good to add this to the example, as well as a note about which
version has this functionality, since it was broken in 1.0.6.
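
For what it's worth, the negative preference in point 6 is set the same way a
positive one is (a sketch; the value shown is arbitrary):

    # actively discourage promotion on this node until the agent clears it
    crm_master -l reboot -v -100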


Thanks,

Bob


- Original Message 
From: Florian Haas florian.h...@linbit.com
To: High-Availability Linux Development List linux-ha-...@lists.linux-ha.org; 
The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org; 
cluster-de...@redhat.com
Sent: Fri, December 3, 2010 1:46:29 AM
Subject: [Pacemaker] OCF RA dev guide: final heads up

Folks,

I've heard a few positive and zero negative reviews about the current
OCF resource agent dev guide draft, so I intend to publish a first
released version early next week. It's going to go up on the
linux-ha.org web site initially, and will stay there until it finds a
better home.

If anyone has objections, please let me know.

The current draft is here:

http://people.linbit.com/~florian/ra-dev-guide/ (HTML)
http://people.linbit.com/~florian/ra-dev-guide.pdf (PDF)

Cheers,
Florian


  

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] (no subject)

2010-11-13 Thread Bob Schatz
Lunch this week?

Sent from Yahoo! Mail on Android



  ___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Question about fix for bug 2477

2010-11-09 Thread Bob Schatz
I am using 1.0.9.1 of Pacemaker.

I have applied the fix for bug 2477 and it is not working for me.
I started with this:

# crm_mon -n -1

Last updated: Mon Nov  8 09:49:07 2010
Stack: Heartbeat
Current DC: mgraid-mkp9010repk-0 (f4e5e15c-d06b-4e37-89b9-4621af05128f) - partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, unknown expected votes
4 Resources configured.

Node mgraid-mkp9010repk-0 (f4e5e15c-d06b-4e37-89b9-4621af05128f): online
    SSMKP9010REPK:0 (ocf::omneon:ss) Master
    icms:0  (lsb:S53icms) Started
    mgraid-stonith:0 (stonith:external/mgpstonith) Started
    omserver:0  (lsb:S49omserver) Started
Node mgraid-mkp9010repk-1 (856c1f72-7cd1-4906-8183-8be87eef96f2): online
    omserver:1  (lsb:S49omserver) Started
    SSMKP9010REPK:1 (ocf::omneon:ss) Slave
    icms:1  (lsb:S53icms) Started
    mgraid-stonith:1 (stonith:external/mgpstonith) Started

This is the output I received:

# ./crm_resource -r ms-SSMKP9010REPK -W
resource ms-SSMKP9010REPK is running on: mgraid-mkp9010repk-0
resource ms-SSMKP9010REPK is running on: mgraid-mkp9010repk-1

The bug fix adds this check:

    if ((the_rsc->variant == pe_native) && (the_rsc->role == RSC_ROLE_MASTER)) {
        state = "Master";
    }
    fprintf(stdout, "resource %s is running on: %s %s\n",
            rsc, node->details->uname, state);

When I dump the_rsc with the debugger I see that the_rsc->variant is pe_master
and not pe_native.

Also, the_rsc->role is RSC_ROLE_STOPPED.  This is even if I use the original
crm_resource.c.  The complete dump of the the_rsc structure is:

(gdb) print *the_rsc
$2 = {id = 0x64d260 ms-SSMKP9010REPK, clone_name = 0x0,
  long_name = 0x64d280 ms-SSMKP9010REPK, xml = 0x634ca0, ops_xml = 0x0, 
parent = 0x0,
  variant_opaque = 0x64d6a0, variant = pe_master, fns = 0x7f8496b67f00, cmds = 
0x0,
  recovery_type = recovery_stop_start, restart_type = pe_restart_ignore, 
priority = 0, stickiness = 0,
  sort_index = 0, failure_timeout = 0, effective_priority = 0, 
migration_threshold = 100,
  flags = 262418, rsc_cons_lhs = 0x0, rsc_cons = 0x0, rsc_location = 0x0, 
actions = 0x0,
  allocated_to = 0x0, running_on = 0x658060, known_on = 0x0, allowed_nodes = 
0x60e2c0,
  role = RSC_ROLE_STOPPED, next_role = RSC_ROLE_MASTER, meta = 0x648990, 
parameters = 0x648940,
  children = 0x610280}


Any idea why this can happen?

Is there another fix I need for 1.0.9.1 to make this change work?


Thanks,

Bob


  

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Best way to find master node

2010-08-26 Thread Bob Schatz
Thanks - filed as 2477

Thanks,

Bob


- Original Message 
From: Andrew Beekhof and...@beekhof.net
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Sent: Wed, August 25, 2010 11:22:24 PM
Subject: Re: [Pacemaker] Best way to find master node

On Wed, Aug 25, 2010 at 6:39 PM, Bob Schatz bsch...@yahoo.com wrote:
 Yes it does.

Ok.  Could you create a bugzilla for this please? I'll make sure it gets fixed.

 Here is output from a different cluster which is at the same 1.0.9.1 Pacemaker
 version.

 # crm_mon -n -1
 
 Last updated: Wed Aug 25 09:35:51 2010
 Stack: Heartbeat
 Current DC: mg-wd-wcaw30021216-0 (f4e5e15c-d06b-4e37-89b9-4621af05128f) -
 partition with quorum
 Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
 2 Nodes configured, unknown expected votes
 2 Resources configured.
 

 Node mg-wd-wcaw30021216-0 (f4e5e15c-d06b-4e37-89b9-4621af05128f): online
SSWD-WCAW30021216:0 (ocf::omneon:ss) Master
SSWD-WCAW30021767:0 (ocf::omneon:ss) Slave
 Node mg-wd-wcaw30021216-1 (856c1f72-7cd1-4906-8183-8be87eef96f2): online
SSWD-WCAW30021767:1 (ocf::omneon:ss) Master
SSWD-WCAW30021216:1 (ocf::omneon:ss) Slave
 [r...@mg-wd-wcaw30021216-0 ~]# crm resource show
 INFO: building help index
  Master/Slave Set: ms-SSWD-WCAW30021216
 Masters: [ mg-wd-wcaw30021216-0 ]
 Slaves: [ mg-wd-wcaw30021216-1 ]
  Master/Slave Set: ms-SSWD-WCAW30021767
 Masters: [ mg-wd-wcaw30021216-1 ]
 Slaves: [ mg-wd-wcaw30021216-0 ]
 [r...@mg-wd-wcaw30021216-0 ~]# crm resource status ms-SSWD-WCAW30021216
 resource ms-SSWD-WCAW30021216 is running on: mg-wd-wcaw30021216-0
 resource ms-SSWD-WCAW30021216 is running on: mg-wd-wcaw30021216-1
 [r...@mg-wd-wcaw30021216-0 ~]# crm resource status ms-SSWD-WCAW30021767
 resource ms-SSWD-WCAW30021767 is running on: mg-wd-wcaw30021216-0
 resource ms-SSWD-WCAW30021767 is running on: mg-wd-wcaw30021216-1
 [r...@mg-wd-wcaw30021216-0 ~]#


 Thanks,

 Bob



 - Original Message 
 From: Andrew Beekhof and...@beekhof.net
 To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Sent: Tue, August 24, 2010 11:38:25 PM
 Subject: Re: [Pacemaker] Best way to find master node

 On Tue, Aug 24, 2010 at 12:37 AM, Bob Schatz bsch...@yahoo.com wrote:
 I would like to find the master node for a resource.

 On 1.0.9.1, when I do:

# crm resource status ms-SSWD-WCAW30019072
resource ms-SSWD-WCAW30019072 is  running on: box-0
resource ms-SSWD-WCAW30019072 is  running on: box-1

 This does not tell me if it is master or slave.

 Does crm_mon indicate that either have been promoted to master?


 I found this thread:

http://www.gossamer-threads.com/lists/linuxha/pacemaker/60434?search_string=crm_resource%20master%20;#60434



 but I could not find a bug filed.

 Can I file a bug for this?  Would it be on crm_resource?

 Is there any workaround for this?


 Thanks,

 Bob




 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker





 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



  

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Best way to find master node

2010-08-25 Thread Bob Schatz
Yes it does.

Here is output from a different cluster which is at the same 1.0.9.1 Pacemaker 
version.

# crm_mon -n -1

Last updated: Wed Aug 25 09:35:51 2010
Stack: Heartbeat
Current DC: mg-wd-wcaw30021216-0 (f4e5e15c-d06b-4e37-89b9-4621af05128f) - 
partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
2 Nodes configured, unknown expected votes
2 Resources configured.


Node mg-wd-wcaw30021216-0 (f4e5e15c-d06b-4e37-89b9-4621af05128f): online
SSWD-WCAW30021216:0 (ocf::omneon:ss) Master
SSWD-WCAW30021767:0 (ocf::omneon:ss) Slave
Node mg-wd-wcaw30021216-1 (856c1f72-7cd1-4906-8183-8be87eef96f2): online
SSWD-WCAW30021767:1 (ocf::omneon:ss) Master
SSWD-WCAW30021216:1 (ocf::omneon:ss) Slave
[r...@mg-wd-wcaw30021216-0 ~]# crm resource show
INFO: building help index
 Master/Slave Set: ms-SSWD-WCAW30021216
 Masters: [ mg-wd-wcaw30021216-0 ]
 Slaves: [ mg-wd-wcaw30021216-1 ]
 Master/Slave Set: ms-SSWD-WCAW30021767
 Masters: [ mg-wd-wcaw30021216-1 ]
 Slaves: [ mg-wd-wcaw30021216-0 ]
[r...@mg-wd-wcaw30021216-0 ~]# crm resource status ms-SSWD-WCAW30021216
resource ms-SSWD-WCAW30021216 is running on: mg-wd-wcaw30021216-0
resource ms-SSWD-WCAW30021216 is running on: mg-wd-wcaw30021216-1
[r...@mg-wd-wcaw30021216-0 ~]# crm resource status ms-SSWD-WCAW30021767
resource ms-SSWD-WCAW30021767 is running on: mg-wd-wcaw30021216-0
resource ms-SSWD-WCAW30021767 is running on: mg-wd-wcaw30021216-1
[r...@mg-wd-wcaw30021216-0 ~]#
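
A possible interim workaround (my own sketch, not something proposed in this
thread): since crm_mon -n -1 does report the role, the current master for a
given master/slave resource can be picked out of its output, for example:

    # Print the node(s) whose instance of the resource is reported as Master,
    # assuming the "crm_mon -n -1" output format shown above.
    # RES is a placeholder; substitute the ms resource's base name.
    RES=SSWD-WCAW30021216
    crm_mon -n -1 | awk -v res="$RES" '
        /^Node /                    { node = $2 }    # track the current node block
        $0 ~ res && $NF == "Master" { print node }   # instance of RES in Master role
    '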


Thanks,

Bob



- Original Message 
From: Andrew Beekhof and...@beekhof.net
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Sent: Tue, August 24, 2010 11:38:25 PM
Subject: Re: [Pacemaker] Best way to find master node

On Tue, Aug 24, 2010 at 12:37 AM, Bob Schatz bsch...@yahoo.com wrote:
 I would like to find the master node for a resource.

 On 1.0.9.1, when I do:

# crm resource status ms-SSWD-WCAW30019072
resource ms-SSWD-WCAW30019072 is  running on: box-0
resource ms-SSWD-WCAW30019072 is  running on: box-1

 This does not tell me if it is master or slave.

Does crm_mon indicate that either have been promoted to master?


 I found this thread:

http://www.gossamer-threads.com/lists/linuxha/pacemaker/60434?search_string=crm_resource%20master%20;#60434


 but I could not find a bug filed.

 Can I file a bug for this?  Would it be on crm_resource?

 Is there any workaround for this?


 Thanks,

 Bob




 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



  

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Best way to find master node

2010-08-23 Thread Bob Schatz
I would like to find the master node for a resource.

On 1.0.9.1, when I do:

# crm resource status ms-SSWD-WCAW30019072
resource ms-SSWD-WCAW30019072 is  running on: box-0
resource ms-SSWD-WCAW30019072 is  running on: box-1

This does not tell me if it is master or slave.

I found this thread:

http://www.gossamer-threads.com/lists/linuxha/pacemaker/60434?search_string=crm_resource%20master%20;#60434


but I could not find a bug filed.

Can I file a bug for this?  Would it be on crm_resource?

Is there any workaround for this?


Thanks,

Bob


  

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Pacemaker 1.0.8 and -INFINITY master score

2010-08-13 Thread Bob Schatz
Dejan,

Thanks for the quick response!

Comments below with [BS]

- Original Message 
From: Dejan Muhamedagic deja...@fastmail.fm
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Sent: Fri, August 13, 2010 3:03:49 AM
Subject: Re: [Pacemaker] Pacemaker 1.0.8 and -INFINITY master score

Hi,

On Thu, Aug 12, 2010 at 12:54:10PM -0700, Bob Schatz wrote:
 I upgraded to Pacemaker 1.0.8 since my application consists of Master/Slave 
 resources and I wanted to pick up the fix for setting negative master scores.

Why not to 1.0.9.1?


[BS] Because when I checked the link http://www.clusterlabs.org/wiki/Get_Pacemaker 
it seemed to indicate that 1.0.8 was the last one fully tested with the rest of 
the stack (heartbeat, etc.).  Are there fixes in 1.0.9 in this area?  I am 
generally conservative about upgrading HA software until it has been out a 
while.  :)

 I am now able to set negative master scores when a resource starts and is 
 SLAVE but can't be promoted.  (The reason I want this is the process needs an 
 administrative override and has to be up and running for the administrative 
 override.)
 
 However, when I test this on a one-node cluster I see that the resource loops 
 through the cycle (attempt promote, timeout, stop the resource, start the 
 resource, ...).
 
 I would have thought that a master score of -INFINITY would have prevented 
 the promotion.

Yes, sounds like it. Where is the score set? Wouldn't resource
demote do the right thing?

[BS] I used the ~October 2009 DRBD agent as a reference.  The crm_master -Q -l 
reboot -v -10 call is made at the end of the start() entry point in the agent.
I am not sure what you mean by "resource demote".
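
For context, a minimal sketch of what that start() tail might look like in an
OCF-style agent (the function name and surrounding logic are illustrative
assumptions; only the crm_master call itself is taken from the description
above):

    ss_start() {
        # ... start the managed service itself here ...

        # Run as slave, but refuse promotion until an administrator raises
        # (or clears) the master preference, e.g. with
        #   crm_master -Q -l reboot -v 100    (allow promotion)
        #   crm_master -D -l reboot           (delete the preference)
        crm_master -Q -l reboot -v -10

        return $OCF_SUCCESS   # 0, as defined by ocf-shellfuncs
    }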

 My configuration is:
 
 node $id=f4e5e15c-d06b-4e37-89b9-4621af05128f mgraid-bob1-0
 primitive SSJ5AMKP9010REPK ocf:omneon:ss \
 params ss_resource=SSJ5AMKP9010REPK 
 ssconf=/var/omneon/config/config.J5AMKP9010REPK \
 op monitor interval=3s role=Master timeout=7s \
 op monitor interval=10s role=Slave timeout=7 \
 op stop interval=0 timeout=100 \
 op start interval=0 timeout=120 \
 meta id=SSJ5AMKP9010REPK-meta_attributes
 ms ms-SSJ5AMKP9010REPK SSJ5AMKP9010REPK \
 meta clone-max=2 notify=true globaally-unique=false 

You have a typo here.

[BS] AH!!!  I owe you at least one drink for that!!!
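
For the record (my reading of the exchange, not stated explicitly above): the 
typo is the meta attribute globaally-unique, which should be globally-unique, 
so the corrected definition would read:

    ms ms-SSJ5AMKP9010REPK SSJ5AMKP9010REPK \
        meta clone-max=2 notify=true globally-unique=false target-role=Master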


Thanks,

Bob

Thanks,

Dejan

 target-role=Master
 location ms-SSJ5AMKP9010REPK-master-w1 ms-SSJ5AMKP9010REPK \
 rule $id=ms-SSJ5AMKP9010REPK-master-w1-rule $role=master 100: 
#uname 

 eq mgraid-BOB1-0
 property $id=cib-bootstrap-options \
 dc-version=1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7 \
 cluster-infrastructure=Heartbeat \
 stonith-enabled=false
 
 Is this the expected behavior?
 
 
 Thanks,
 
 Bob
 
 
 
  
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



  


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] crm and primitive meta id - 1.0.8 vs 1.0.9

2010-08-13 Thread Bob Schatz
On 1.0.6 and 1.0.8 I used to do this to create a primitive:

crm configure primitive SS1 ocf:omneon:ss params ss_resource=SS1 \
ssconf=${CONFIG_FILE} op monitor interval=3s role=Master \
timeout=7s op monitor interval=10s role=Slave timeout=7 \
op stop timeout=100 op start timeout=120 \
meta id=SS1-meta_attributes

However, when I do this with 1.0.9, I get this error:

   crm configure primitive SS1 ocf:omneon:ss params ss_resource=SS1 ssconf= \
       op monitor interval=3s role=Master timeout=7s \
       op monitor interval=10s role=Slave timeout=7 \
       op stop timeout=100 op start timeout=120 \
       meta id=SS1-meta_attributes
ERROR: SS1: attribute id does not exist

I have to admit that I do not know what meta id=SS1-meta_attributes does.  I 
assume I read about it somewhere, but I cannot find the document any longer.
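
A likely fix, though this is my assumption rather than something confirmed in
the thread: the 1.0.9 crm shell appears to reject id as a meta attribute, and
the id of the meta_attributes set is generated automatically anyway, so the
same primitive should be creatable by simply dropping the meta clause:

    crm configure primitive SS1 ocf:omneon:ss \
        params ss_resource=SS1 ssconf=${CONFIG_FILE} \
        op monitor interval=3s role=Master timeout=7s \
        op monitor interval=10s role=Slave timeout=7 \
        op stop timeout=100 op start timeout=120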


Thanks,

Bob



  

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Pacemaker 1.0.8 and -INFINITY master score

2010-08-13 Thread Bob Schatz
Dejan,

I tested this with 1.0.9.1.  If a negative master score is set then it does not 
call promote.  Thanks!

However, I still see notification messages as shown below, but it does not 
appear that the actual notification entry point is called in the agent.
If you agree, I will file a bug.

Aug 13 17:02:20 mgraid-MGG90106M6T-0 pengine: [4463]: info: master_color: ms-SSJ5AMGG90106M6T: Promoted 0 instances of a possible 1 to master

Aug 13 17:03:34 mgraid-MGG90106M6T-0 crmd: [1102]: info: te_rsc_command: Initiating action 44: notify SSJ5AMGG90106M6T:0_pre_notify_promote_0 on mgraid-mgg90106m6t-1

Aug 13 17:02:20 mgraid-MGG90106M6T-0 pengine: [4463]: ERROR: create_notification_boundaries: Creating boundaries for ms-SSJ5AMGG90106M6T
Aug 13 17:02:20 mgraid-MGG90106M6T-0 pengine: [4463]: ERROR: create_notification_boundaries: Creating boundaries for ms-SSJ5AMGG90106M6T
Aug 13 17:02:20 mgraid-MGG90106M6T-0 pengine: [4463]: ERROR: create_notification_boundaries: Creating boundaries for ms-SSJ5AMGG90106M6T
Aug 13 17:02:20 mgraid-MGG90106M6T-0 pengine: [4463]: ERROR: create_notification_boundaries: Creating boundaries for ms-SSJ5AMGG90106M6T

Aug 13 17:03:38 mgraid-MGG90106M6T-0 crmd: [1102]: info: te_rsc_command: Initiating action 8: promote SSJ5AMGG90106M6T:0_promote_0 on mgraid-mgg90106m6t-1
Aug 13 17:03:40 mgraid-MGG90106M6T-0 crmd: [1102]: info: match_graph_event: Action SSJ5AMGG90106M6T:0_promote_0 (8) confirmed on mgraid-mgg90106m6t-1 (rc=0)
Aug 13 17:03:41 mgraid-MGG90106M6T-0 crmd: [1102]: info: te_rsc_command: Initiating action 45: notify SSJ5AMGG90106M6T:0_post_notify_promote_0 on mgraid-mgg90106m6t-1
Aug 13 17:03:43 mgraid-MGG90106M6T-0 crmd: [1102]: info: match_graph_event: Action SSJ5AMGG90106M6T:0_post_notify_promote_0 (45) confirmed on mgraid-mgg90106m6t-1 (rc=0)


Thanks,

Bob



- Original Message 
From: Bob Schatz bsch...@yahoo.com
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Sent: Fri, August 13, 2010 9:25:08 AM
Subject: Re: [Pacemaker] Pacemaker 1.0.8 and -INFINITY master score

Dejan,

Thanks for the quick response!

Comments below with [BS]

- Original Message 
From: Dejan Muhamedagic deja...@fastmail.fm
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Sent: Fri, August 13, 2010 3:03:49 AM
Subject: Re: [Pacemaker] Pacemaker 1.0.8 and -INFINITY master score

Hi,

On Thu, Aug 12, 2010 at 12:54:10PM -0700, Bob Schatz wrote:
 I upgraded to Pacemaker 1.0.8 since my application consists of Master/Slave 
 resources and I wanted to pick up the fix for setting negative master scores.

Why not to 1.0.9.1?


[BS] Because when I checked the link http://www.clusterlabs.org/wiki/Get_Pacemaker 
it seemed to indicate that 1.0.8 was the last one fully tested with the rest of 
the stack (heartbeat, etc.).  Are there fixes in 1.0.9 in this area?  I am 
generally conservative about upgrading HA software until it has been out a 
while.  :)

 I am now able to set negative master scores when a resource starts and is 
 SLAVE but can't be promoted.  (The reason I want this is the process needs an 
 administrative override and has to be up and running for the administrative 
 override.)
 
 However, when I test this on a one-node cluster I see that the resource loops 
 through the cycle (attempt promote, timeout, stop the resource, start the 
 resource, ...).
 
 I would have thought that a master score of -INFINITY would have prevented 
 the promotion.

Yes, sounds like it. Where is the score set? Wouldn't resource
demote do the right thing?

[BS] I used the ~October 2009 DRBD agent as a reference.  The crm_master -Q -l 
reboot -v -10 call is made at the end of the start() entry point in the agent.
I am not sure what you mean by "resource demote".

 My configuration is:
 
 node $id=f4e5e15c-d06b-4e37-89b9-4621af05128f mgraid-bob1-0
 primitive SSJ5AMKP9010REPK ocf:omneon:ss \
 params ss_resource=SSJ5AMKP9010REPK 
 ssconf=/var/omneon/config/config.J5AMKP9010REPK \
 op monitor interval=3s role=Master timeout=7s \
 op monitor interval=10s role=Slave timeout=7 \
 op stop interval=0 timeout=100 \
 op start interval=0 timeout=120 \
 meta id=SSJ5AMKP9010REPK-meta_attributes
 ms ms-SSJ5AMKP9010REPK SSJ5AMKP9010REPK \
 meta clone-max=2 notify=true globaally-unique=false 

You have a typo here.

[BS] AH!!!  I owe you at least one drink for that!!!


Thanks,

Bob

Thanks,

Dejan

 target-role=Master
 location ms-SSJ5AMKP9010REPK-master-w1 ms-SSJ5AMKP9010REPK \
 rule $id=ms-SSJ5AMKP9010REPK-master-w1-rule $role=master 100: 
#uname 

 eq mgraid-BOB1-0
 property $id=cib-bootstrap-options \
 dc-version=1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7 \
 cluster-infrastructure=Heartbeat \
 stonith-enabled=false

[Pacemaker] Pacemaker 1.0.8 and -INFINITY master score

2010-08-12 Thread Bob Schatz
I upgraded to Pacemaker 1.0.8 since my application consists of Master/Slave 
resources and I wanted to pick up the fix for setting negative master scores.

I am now able to set negative master scores when a resource starts and is SLAVE 
but can't be promoted.  (The reason I want this is the process needs an 
administrative override and has to be up and running for the administrative 
override.)

However, when I test this on a one-node cluster I see that the resource loops 
through the cycle (attempt promote, timeout, stop the resource, start the 
resource, ...).

I would have thought that a master score of -INFINITY would have prevented the 
promotion.
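
One way to check whether the negative score actually landed on the node (a
sketch based on my assumption that crm_master records it as a transient node
attribute named master-<resource-instance> in the status section):

    # Query the master preference recorded for this instance on this node;
    # -10 should come back if the agent's crm_master call took effect.
    crm_attribute -N mgraid-bob1-0 -l reboot -n master-SSJ5AMKP9010REPK:0 -G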

My configuration is:

node $id=f4e5e15c-d06b-4e37-89b9-4621af05128f mgraid-bob1-0
primitive SSJ5AMKP9010REPK ocf:omneon:ss \
params ss_resource=SSJ5AMKP9010REPK 
ssconf=/var/omneon/config/config.J5AMKP9010REPK \
op monitor interval=3s role=Master timeout=7s \
op monitor interval=10s role=Slave timeout=7 \
op stop interval=0 timeout=100 \
op start interval=0 timeout=120 \
meta id=SSJ5AMKP9010REPK-meta_attributes
ms ms-SSJ5AMKP9010REPK SSJ5AMKP9010REPK \
meta clone-max=2 notify=true globaally-unique=false 
target-role=Master
location ms-SSJ5AMKP9010REPK-master-w1 ms-SSJ5AMKP9010REPK \
rule $id=ms-SSJ5AMKP9010REPK-master-w1-rule $role=master 100: 
#uname 
eq mgraid-BOB1-0
property $id=cib-bootstrap-options \
dc-version=1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7 \
cluster-infrastructure=Heartbeat \
stonith-enabled=false

Is this the expected behavior?


Thanks,

Bob



  


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker