Re: [Pacemaker] Using Pacemaker/Corosync to manage 2 node SHARED-DISK Cluster

2011-04-08 Thread mark - pacemaker list
Hi Phil,

On Fri, Apr 8, 2011 at 11:13 AM, Phil Hunt  wrote:
>
> Hi
>
> I have been playing with DRBD; that's cool.
>
> But I have 2 RHEL Linux VMs.  They each have a boot device (20 GB) and a
> shared 200 GB iSCSI volume.
>
> I've played with ucarp and have the commands to make available/mount the disk 
> and dismount the shared disk using vgchange/mount/umount, etc.
>
> But I decided to use Pacemaker/Heartbeat, since it is more robust.
> Got Corosync running and the VIP running.
>
> But I do not see how to make the Pacemaker configuration mount or dismount a
> standard ext3 filesystem on the shared disk.  I've seen tons of tutorials for
> DRBD and clustered filesystems, but none showing a simple mount of a disk as a
> resource on the node becoming master, or how to dismount it.
>
> Does Pacemaker run a script if requested, or is the mount/dismount all
> hard-coded?
>
> I know I'm missing something simple here.  I built the following, but the
> colocation and order commands error out with a syntax error:
>
> node prodmessage1v
> node prodmessage2v
> primitive p_fs_data ocf:heartbeat:Filesystem \
>        params device="/dev/mapper/vg2-dbdata" directory="/data" fstype="ext3"
> primitive p_ip ocf:heartbeat:IPaddr2 \
>        params ip="10.64.114.80" cidr_netmask="32" \
>        op monitor interval="30s"
> primitive p_ping ocf:pacemaker:ping \
>        params name="p_ping" host_list="10.64.114.47 10.64.114.48 10.64.114.4" 
> \
>        op monitor interval="15s" timeout="30s"
> primitive p_rhap lsb:rhapsody \
>        op monitor interval="60s" timeout="120s"
> group g_cluster_services p_ip p_fs_data p_rhap
> clone c_ping p_ping \
>        meta globally-unique="false"
> location loc_ping g_cluster_services \
>        rule $id="loc_ping-rule" -inf: not_defined p_ping or p_ping lte 0
> colocation colo_mnt_on_master inf: g_cluster_services
> order ord_mount_after_drbd inf: g_cluster_services:start
> property $id="cib-bootstrap-options" \
>        dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>        cluster-infrastructure="openais" \
>        expected-quorum-votes="2" \
>        stonith-enabled="false" \
>        no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
>        resource-stickiness="100"
>


If I'm understanding your goal here, it's to have an iSCSI disk with
LVM volume(s) formatted with ext3 that move around with your
virtual IP and the services it handles.  I'm doing the same thing with
three MySQL instances, and I'll throw my config on here, but I think
all you're missing is ocf:heartbeat:LVM (you only want vg2 active on
whichever node is going to mount the filesystem) and probably
ocf:heartbeat:iscsi.  I suppose you could let all nodes attach to the
iSCSI disk but only activate the volume group on one; I just like the
simplicity of one node, one LUN.
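
As a rough, untested sketch of what that stack looks like in crm syntax (the
portal, target and resource names below are placeholders to adjust for your
setup; the device and VG names are the ones from your config):

primitive p_iscsi ocf:heartbeat:iscsi \
        params portal="192.168.1.10:3260" target="iqn.2011-04.local.example:data" \
        op monitor interval="30s" timeout="120s"
primitive p_lvm ocf:heartbeat:LVM \
        params volgrpname="vg2" exclusive="true" \
        op monitor interval="30s"
primitive p_fs ocf:heartbeat:Filesystem \
        params device="/dev/mapper/vg2-dbdata" directory="/data" fstype="ext3" \
        op monitor interval="30s" timeout="40s"
group g_data p_iscsi p_lvm p_fs

A group both colocates its members and starts/stops them in order (iSCSI login,
then VG activation, then mount), so your existing p_ip and p_rhap primitives
would simply follow p_fs in the same group, with no extra colocation or order
statements needed.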

I'm on the tail-end of beating on this configuration in a test lab
before we install with the real hardware, but it is proving to be very
robust and reliable so far:


Last updated: Fri Apr  8 18:33:43 2011
Stack: Heartbeat
Current DC: cn3.testlab.local (860664d4-6731-4af0-b596-fbeacd5ec300) -
partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
3 Nodes configured, unknown expected votes
4 Resources configured.

Online: [ cn2.testlab.local cn3.testlab.local cn1.testlab.local ]
 Resource Group: MySQL-history
     iscsi_mysql_history (ocf::heartbeat:iscsi): Started cn1.testlab.local
     volgrp_mysql_history (ocf::heartbeat:LVM): Started cn1.testlab.local
     fs_mysql_history (ocf::heartbeat:Filesystem): Started cn1.testlab.local
     ip_mysql_history (ocf::heartbeat:IPaddr2): Started cn1.testlab.local
     mysql_history (ocf::heartbeat:mysql): Started cn1.testlab.local
 Resource Group: MySQL-hsa
     iscsi_mysql_hsa (ocf::heartbeat:iscsi): Started cn2.testlab.local
     volgrp_mysql_hsa (ocf::heartbeat:LVM): Started cn2.testlab.local
     fs_mysql_hsa (ocf::heartbeat:Filesystem): Started cn2.testlab.local
     ip_mysql_hsa (ocf::heartbeat:IPaddr2): Started cn2.testlab.local
     mysql_hsa (ocf::heartbeat:mysql): Started cn2.testlab.local
 Resource Group: MySQL-livedata
     iscsi_mysql_livedata (ocf::heartbeat:iscsi): Started cn3.testlab.local
     volgrp_mysql_livedata (ocf::heartbeat:LVM): Started cn3.testlab.local
     fs_mysql_livedata (ocf::heartbeat:Filesystem): Started cn3.testlab.local
     ip_mysql_livedata (ocf::heartbeat:IPaddr2): Started cn3.testlab.local
     mysql_livedata (ocf::heartbeat:mysql): Started cn3.testlab.local
 stonith_sbd (stonith:external/sbd): Started cn3.testlab.local

You'll see in my config (stuck at the bottom because of length) that I
don't have any colocations; groups have seemed quite sufficient to
make sure everything that needs to be together stays together.  If
you're handing iSCSI over to Pacemaker too, you'll want to make certain
that you disable "iscsi" but leave "iscsid" enabled at boot.  If you
leave the iscsi init script enabled, it logs in to everything it knows
of from past sessions.
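
On RHEL/CentOS that amounts to something like this (a sketch; the init script
names are the ones shipped by iscsi-initiator-utils):

chkconfig iscsi off     # no automatic login to targets at boot; the cluster handles it
chkconfig iscsid on     # the daemon must be available for ocf:heartbeat:iscsi to use
service iscsi stop      # if it is currently holding sessions outside the cluster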

[Pacemaker] Shared failover

2011-04-08 Thread ruslan usifov
Hello

As stated here:
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-intro-redundancy.html
"Pacemaker allowing several active/passive clusters to be combined and share
a common backup node".  But how do I implement such a configuration?  The
Clusters from Scratch manual doesn't hold any information about this type of
cluster :-((
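
(One way such an N+1 layout is usually expressed, sketched here with made-up
node and resource names, is plain location scores: each active/passive pair
prefers its own primary node and falls back to the one shared standby:

location loc_svcA_primary g_serviceA 200: nodeA
location loc_svcA_standby g_serviceA 100: node_standby
location loc_svcB_primary g_serviceB 200: nodeB
location loc_svcB_standby g_serviceB 100: node_standby

Combined with some resource-stickiness, each service stays on its own node and
only a failed one moves to the standby.)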
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Setting the logfile option in corosync.conf to a directory not created causes corosync to fail to start with non-descriptive parse error...

2011-04-08 Thread Colin Hines
Just adding this as an FYI if anyone comes across it...

If the directory for the logfile listed in corosync.conf does not exist,
corosync logs the following errors and fails to start (this is with the
latest RPM-based builds from http://www.clusterlabs.org/rpm/epel-5/):

Apr  8 12:34:24 cvt-db-003 corosync[24350]:   [MAIN  ] Successfully read
main configuration file '/etc/corosync/corosync.conf'.
Apr  8 12:34:24 cvt-db-003 corosync[24350]:   [MAIN  ] parse error in
config: parse error in config: .
Apr  8 12:34:24 cvt-db-003 corosync[24350]:   [MAIN  ] Corosync Cluster
Engine exiting with status 8 at main.c:1397.
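
The workaround is simply to create the directory named in the logging section
before starting corosync; the path below is only an example, use whatever your
corosync.conf points at:

grep logfile /etc/corosync/corosync.conf   # see where logging { logfile: ... } points
mkdir -p /var/log/cluster                  # create that directory
service corosync start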

c
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Following the clusters from scratch v2 document, and coming up with weird (erroneous?) errors...

2011-04-08 Thread Colin Hines
Okey dokey, I've done some further troubleshooting and started again from
scratch on a new node.  I'm performing this setup on a CentOS 5.5 node.
Here's an excerpt from my messages file taken after running "yum -y install
pacemaker corosync":

Apr  8 11:50:19 cvt-db-003 yum: Updated: bzip2-libs-1.0.3-6.el5_5.x86_64
...many package lines omitted...
Apr  8 11:50:34 cvt-db-003 yum: Installed: corosync-1.2.7-1.1.el5.i386
Apr  8 11:50:34 cvt-db-003 yum: Installed: corosynclib-1.2.7-1.1.el5.x86_64
Apr  8 11:50:34 cvt-db-003 yum: Installed:
pacemaker-libs-1.0.10-1.4.el5.x86_64
Apr  8 11:50:34 cvt-db-003 yum: Installed: corosync-1.2.7-1.1.el5.x86_64
Apr  8 11:50:35 cvt-db-003 yum: Installed:
heartbeat-stonith-2.1.4-11.el5.x86_64
Apr  8 11:50:35 cvt-db-003 yum: Installed: pacemaker-1.0.10-1.4.el5.i386
Apr  8 11:50:35 cvt-db-003 yum: Updated: rpm-libs-4.4.2.3-20.el5_5.1.x86_64
Apr  8 11:50:35 cvt-db-003 yum: Updated: rpm-4.4.2.3-20.el5_5.1.x86_64
Apr  8 11:50:35 cvt-db-003 yum: Updated:
rpm-python-4.4.2.3-20.el5_5.1.x86_64
Apr  8 11:50:36 cvt-db-003 yum: Installed: pacemaker-1.0.10-1.4.el5.x86_64
Apr  8 11:50:39 cvt-db-003 cl_status: [18858]: ERROR: Cannot signon with
heartbeat
Apr  8 11:50:39 cvt-db-003 cl_status: [18858]: ERROR: REASON: hb_api_signon:
Can't initiate connection  to heartbeat
Apr  8 11:50:39 cvt-db-003 cl_status: [18859]: ERROR: Cannot signon with
heartbeat
Apr  8 11:50:39 cvt-db-003 cl_status: [18859]: ERROR: REASON: hb_api_signon:
Can't initiate connection  to heartbeat
Apr  8 11:51:39 cvt-db-003 cl_status: [18971]: ERROR: Cannot signon with
heartbeat
...many more follow


What's weird to me is that I hadn't started ANY services or run any commands
by this point; I'm thinking something in the RPM is kicking off that
cl_status command.

I believe I've determined that the errors start occurring when the RPM package
heartbeat-3.0.3-2.3.el5.x86_64.rpm is installed.  It seems to be a required
dependency for the latest pacemaker RPM on http://www.clusterlabs.org/rpm/epel-5/.
I removed the pacemaker and heartbeat packages using yum and then re-added them
via RPMs, but found out that pacemaker requires the heartbeat-libs package or
tools such as crm_verify fail.  After reinstalling heartbeat-libs, pacemaker,
and pacemaker-libs with --nodeps, there are no more erroneous error messages.
I can break/fix the issue by installing and removing
the heartbeat-3.0.3-2.3.el5.x86_64 package.
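
For reference, the break/fix sequence was roughly the following (a sketch; the
exact RPM file names depend on the repo snapshot you downloaded):

yum remove heartbeat pacemaker                   # drop the packages pulled in from the repo
rpm -Uvh --nodeps heartbeat-libs-3.0.3-*.rpm \
                  pacemaker-libs-1.0.10-*.rpm \
                  pacemaker-1.0.10-*.rpm         # re-add only what crm_verify and friends need
# re-installing heartbeat-3.0.3-2.3.el5.x86_64 brings the cl_status errors back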

c


On Fri, Apr 8, 2011 at 9:48 AM, Lars Ellenberg wrote:

> On Fri, Apr 08, 2011 at 09:13:45AM +0200, Andrew Beekhof wrote:
> > On Thu, Apr 7, 2011 at 11:48 PM, Colin Hines 
> wrote:
> > > I've recently followed the clusters from scratch v2 document for RHEL
> and
> > > although my cluster works and fails over correctly using corosync, I
> have
> > > the following error message coming up in my logs consistently, twice a
> > > minute:
> > > Apr  7 17:44:41 cvt-db-005 cl_status: [5901]: ERROR: Cannot signon with
> > > heartbeat
> > > Apr  7 17:44:41 cvt-db-005 cl_status: [5901]: ERROR: REASON:
> hb_api_signon:
> > > Can't initiate connection  to heartbeat
> >
> > Someone/something is running cl_status.
> > Find out who/what and stop them - it has no place in a corosync based
> cluster.
>
> That could be the status action of the SBD stonith plugin,
> between commits
> http://hg.linux-ha.org/glue/rev/faada7f3d069 (Apr 2010)
> http://hg.linux-ha.org/glue/rev/1448deafdf79 (May 2010)
>
> if so, upgrade your "cluster glue".
>
> > > I can send my configs, but they're pretty vanilla, has anyone seen
> anything
> > > like this before.   I did have a heartbeat installation on this host
> before
> > > I followed the CFSv2 document, but heartbeat is stopped and I've
> verified
> > > that cl_status doesn't output those errors if I stop corosync.
> > > c
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
>
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Using Pacemaker/Corosync to manage 2 node SHARED-DISK Cluster

2011-04-08 Thread Phil Hunt

Hi

I have been playing with DRBD; that's cool.

But I have 2 RHEL Linux VMs.  They each have a boot device (20 GB) and a
shared 200 GB iSCSI volume.  I've played with ucarp and have the commands to
make available/mount the disk and dismount the shared disk using
vgchange/mount/umount, etc.

But I decided to use Pacemaker/Heartbeat, since it is more robust.
Got Corosync running and the VIP running.

But I do not see how to make the Pacemaker configuration mount or dismount a
standard ext3 filesystem on the shared disk.  I've seen tons of tutorials for
DRBD and clustered filesystems, but none showing a simple mount of a disk as a
resource on the node becoming master, or how to dismount it.

Does Pacemaker run a script if requested, or is the mount/dismount all
hard-coded?

I know I'm missing something simple here.  I built the following, but the
colocation and order commands error out with a syntax error:

node prodmessage1v
node prodmessage2v
primitive p_fs_data ocf:heartbeat:Filesystem \
        params device="/dev/mapper/vg2-dbdata" directory="/data" fstype="ext3"
primitive p_ip ocf:heartbeat:IPaddr2 \
        params ip="10.64.114.80" cidr_netmask="32" \
        op monitor interval="30s"
primitive p_ping ocf:pacemaker:ping \
        params name="p_ping" host_list="10.64.114.47 10.64.114.48 10.64.114.4" \
        op monitor interval="15s" timeout="30s"
primitive p_rhap lsb:rhapsody \
        op monitor interval="60s" timeout="120s"
group g_cluster_services p_ip p_fs_data p_rhap
clone c_ping p_ping \
        meta globally-unique="false"
location loc_ping g_cluster_services \
        rule $id="loc_ping-rule" -inf: not_defined p_ping or p_ping lte 0
colocation colo_mnt_on_master inf: g_cluster_services
order ord_mount_after_drbd inf: g_cluster_services:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
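
A likely reason the last two constraint lines are rejected: crm colocation and
order statements need at least two resources (or resource sets), i.e.
"colocation <id> <score>: <rsc> <rsc>" and "order <id> <score>: <first> <then>".
With only g_cluster_services listed there is nothing to colocate with or order
against.  If the intent is just to keep the IP, filesystem and rhapsody service
together and started in that sequence, the group already does that; such
constraints only make sense against another resource, for example a
hypothetical DRBD master/slave resource ms_drbd:

colocation colo_mnt_on_master inf: g_cluster_services ms_drbd:Master
order ord_mount_after_drbd inf: ms_drbd:promote g_cluster_services:start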

PHIL HUNT AMS Consultant 
phil.h...@orionhealth.com 
P: +1 857 488 4749 
M: +1 508 654 7371 
S: philhu0724 
www.orionhealth.com 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Help With Cluster Failure

2011-04-08 Thread Andrew Beekhof
Got it. Will have a look on Monday. Happy weekend!

On Fri, Apr 8, 2011 at 2:50 PM,   wrote:
> -Original Message-
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: 08 April 2011 08:15
> To: The Pacemaker cluster resource manager
> Cc: Darren Mansell
> Subject: Re: [Pacemaker] Help With Cluster Failure
>
> On Thu, Apr 7, 2011 at 12:12 PM,   wrote:
>> Hi all.
>>
>>
>>
>> One of my clusters had a STONITH shoot-out last night and then refused
>
>> to do anything but sit there from 0400 until 0735 after I'd been woken
>
>> up to fix it.
>>
>>
>>
>> In the end, just a resource cleanup fixed it, which I don't think
>> should be the case.
>>
>>
>>
>> I have an 8MB hb_report file. Is that too big to attach to send here?
>> Should I upload it somewhere?
>
> Is there somewhere you can put it and send us a URL?
>
>
>
> Absolutely. Thanks Andrew.
>
> www.mysqlsimplecluster.com/HB_report/DM_report_1.tar.bz2
>
> Darren
>

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Following the clusters from scratch v2 document, and coming up with weird (erroneous?) errors...

2011-04-08 Thread Lars Ellenberg
On Fri, Apr 08, 2011 at 09:13:45AM +0200, Andrew Beekhof wrote:
> On Thu, Apr 7, 2011 at 11:48 PM, Colin Hines  wrote:
> > I've recently followed the clusters from scratch v2 document for RHEL and
> > although my cluster works and fails over correctly using corosync, I have
> > the following error message coming up in my logs consistently, twice a
> > minute:
> > Apr  7 17:44:41 cvt-db-005 cl_status: [5901]: ERROR: Cannot signon with
> > heartbeat
> > Apr  7 17:44:41 cvt-db-005 cl_status: [5901]: ERROR: REASON: hb_api_signon:
> > Can't initiate connection  to heartbeat
> 
> Someone/something is running cl_status.
> Find out who/what and stop them - it has no place in a corosync based cluster.

That could be the status action of the SBD stonith plugin,
between commits
http://hg.linux-ha.org/glue/rev/faada7f3d069 (Apr 2010)
http://hg.linux-ha.org/glue/rev/1448deafdf79 (May 2010)

if so, upgrade your "cluster glue".

> > I can send my configs, but they're pretty vanilla, has anyone seen anything
> > like this before.   I did have a heartbeat installation on this host before
> > I followed the CFSv2 document, but heartbeat is stopped and I've verified
> > that cl_status doesn't output those errors if I stop corosync.
> > c

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] [PATCH]Low: lib/common/utils.c: Don't try to print unprintable option values in crm_help

2011-04-08 Thread Holger Teutsch
Hi,
during work on the move-XXX stuff I discovered this.
Regards
Holger

# HG changeset patch
# User Holger Teutsch 
# Date 1302259903 -7200
# Branch mig
# Node ID caed31174dc966450a31da048b640201980870a8
# Parent  9451c288259b7b9fd6f32f5df01d47569e570c58
Low: lib/common/utils.c: Don't try to print unprintable option values in crm_help

diff -r 9451c288259b -r caed31174dc9 lib/common/utils.c
--- a/lib/common/utils.c	Tue Apr 05 13:24:21 2011 +0200
+++ b/lib/common/utils.c	Fri Apr 08 12:51:43 2011 +0200
@@ -2281,7 +2281,13 @@
 		fprintf(stream, "%s\n", crm_long_options[i].desc);
 		
 	} else {
-		fprintf(stream, " -%c, --%s%c%s\t%s\n", crm_long_options[i].val, crm_long_options[i].name,
+		/* is val printable as char ? */
+		if(crm_long_options[i].val <= UCHAR_MAX) {
+			fprintf(stream, " -%c,", crm_long_options[i].val);
+		} else {
+			fputs("", stream);
+		}
+		fprintf(stream, " --%s%c%s\t%s\n", crm_long_options[i].name,
 			crm_long_options[i].has_arg?'=':' ',crm_long_options[i].has_arg?"value":"",
 			crm_long_options[i].desc?crm_long_options[i].desc:"");
 	}
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional "role" parameter

2011-04-08 Thread Holger Teutsch
On Thu, 2011-04-07 at 12:33 +0200, Dejan Muhamedagic wrote:
> > New syntax:
> > ---
> > 
> > crm_resource --move-from --resource myresource --node mynode
> >-> all resource variants: check whether active on mynode, then create 
> > standby constraint
> > 
> > crm_resource --move-from --resource myresource
> >-> primitive/group: set --node `current_node`, then create standby 
> > constraint
> >-> clone/master: refused
> > 
> > crm_resource --move-to --resource myresource --node mynode
> >   -> all resource variants: create prefer constraint
> > 
> > crm_resource --move-to --resource myresource --master --node mynode
> >   -> master: check whether active as slave on mynode, then create prefer 
> > constraint for master role
> >   -> others: refused
> > 
> > crm_resource --move-cleanup --resource myresource
> >   -> zap constraints
> > 
> > As we are already short on meaningful single letter options I vote for long 
> > options only.
> > 
> > Backwards Compatibility:
> > 
> > 
> > crm_resource {-M|--move} --resource myresource
> >   -> output deprecation warning
> >   -> treat as crm_resource --move-from --resource myresource
> > 
> > crm_resource {-M|--move} --resource myresource --node mynode
> >   -> output deprecation warning
> >   -> treat as crm_resource --move-to --resource myresource --node mynode
> > 
> > crm_resource {-U|--unmove} --resource myresource
> >   -> output deprecation warning
> >   -> treat as crm_resource --move-cleanup --resource myresource
> 
> All looks fine to me.
> 
> > For the shell:
> > Should we go for similar commands or keep "migrate-XXX"
> 
> migrate is a bit of a misnomer, could be confused with the
> migrate operation. I'd vote to leave old migrate/unmigrate
> as deprecated and introduce just move-from/to/cleanup variants.
> 

Dejan & Andrew,
please find attached the patches that implement these commands for
review.  They require the patch

Low: lib/common/utils.c: Don't try to print unprintable option values in
crm_help

that I sent separately because it is not directly related to the
movement stuff.

I think that the preceding discussions were very valuable to fully
understand issues and implications and I'm confident that the new
command set is consistent and behaves with predictable outcome.

Regards
Holger


diff -r b4f456380f60 doc/crm_cli.txt
--- a/doc/crm_cli.txt	Thu Mar 17 09:41:25 2011 +0100
+++ b/doc/crm_cli.txt	Fri Apr 08 14:23:59 2011 +0200
@@ -810,28 +810,44 @@
 unmanage <rsc>
 ...
 
-[[cmdhelp_resource_migrate,migrate a resource to another node]]
- `migrate` (`move`)
-
-Migrate a resource to a different node. If node is left out, the
-resource is migrated by creating a constraint which prevents it from
-running on the current node. Additionally, you may specify a
+[[cmdhelp_resource_move-to,move a resource to another node]]
+ `move-to`
+
+Move a resource to a different node. The resource is moved by creating
+a constraint which forces it to run on the specified node.
+Additionally, you may specify a lifetime for the constraint---once it
+expires, the location constraint will no longer be active.
+For a master resource specify :master to move the master role.
+
+Usage:
+...
+move-to <rsc>[:master] <node> [<lifetime>] [force]
+...
+
+[[cmdhelp_resource_move-from,move a resource away from the specified node]]
+ `move-from`
+
+Move a resource away from the specified node.
+If node is left out, the node where the resource is currently active
+is used.
+The resource is moved by creating a constraint which prevents it from
+running on the specified node. Additionally, you may specify a
 lifetime for the constraint---once it expires, the location
 constraint will no longer be active.
 
 Usage:
 ...
-migrate <rsc> [<node>] [<lifetime>] [force]
+move-from <rsc> [<node>] [<lifetime>] [force]
 ...
 
-[[cmdhelp_resource_unmigrate,unmigrate a resource to another node]]
- `unmigrate` (`unmove`)
-
-Remove the constraint generated by the previous migrate command.
+[[cmdhelp_resource_move-cleanup,Cleanup previously created move constraint]]
+ `move-cleanup`
+
+Remove the constraint generated by the previous move-to/move-from command.
 
 Usage:
 ...
-unmigrate <rsc>
+move-cleanup <rsc>
 ...
 
 [[cmdhelp_resource_param,manage a parameter of a resource]]
diff -r b4f456380f60 tools/crm_resource.c
--- a/tools/crm_resource.c	Thu Mar 17 09:41:25 2011 +0100
+++ b/tools/crm_resource.c	Fri Apr 08 15:02:39 2011 +0200
@@ -52,7 +52,8 @@
 const char *prop_id = NULL;
 const char *prop_set = NULL;
 char *move_lifetime = NULL;
-char rsc_cmd = 'L';
+int move_master = 0;
+int rsc_cmd = 'L';
 char *our_pid = NULL;
 IPC_Channel *crmd_channel = NULL;
 char *xml_file = NULL;
@@ -192,6 +193,33 @@
 return 0;
 }
 
+/* return role of resource on node */
+static int
+role_on_node(resource_t *rsc, const char *node_uname)
+{
+GListPtr lpc = NULL;
+
+if(rsc->variant > pe_nat

Re: [Pacemaker] Help With Cluster Failure

2011-04-08 Thread Darren.Mansell
-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net] 
Sent: 08 April 2011 08:15
To: The Pacemaker cluster resource manager
Cc: Darren Mansell
Subject: Re: [Pacemaker] Help With Cluster Failure

On Thu, Apr 7, 2011 at 12:12 PM,   wrote:
> Hi all.
>
>
>
> One of my clusters had a STONITH shoot-out last night and then refused

> to do anything but sit there from 0400 until 0735 after I'd been woken

> up to fix it.
>
>
>
> In the end, just a resource cleanup fixed it, which I don't think 
> should be the case.
>
>
>
> I have an 8MB hb_report file. Is that too big to attach to send here? 
> Should I upload it somewhere?

Is there somewhere you can put it and send us a URL?



Absolutely. Thanks Andrew.

www.mysqlsimplecluster.com/HB_report/DM_report_1.tar.bz2 

Darren

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Help With Cluster Failure

2011-04-08 Thread Andrew Beekhof
On Thu, Apr 7, 2011 at 12:12 PM,   wrote:
> Hi all.
>
>
>
> One of my clusters had a STONITH shoot-out last night and then refused to do
> anything but sit there from 0400 until 0735 after I’d been woken up to fix
> it.
>
>
>
> In the end, just a resource cleanup fixed it, which I don’t think should be
> the case.
>
>
>
> I have an 8MB hb_report file. Is that too big to attach to send here? Should
> I upload it somewhere?

Is there somewhere you can put it and send us a URL?

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Following the clusters from scratch v2 document, and coming up with weird (erroneous?) errors...

2011-04-08 Thread Andrew Beekhof
On Thu, Apr 7, 2011 at 11:48 PM, Colin Hines  wrote:
> I've recently followed the clusters from scratch v2 document for RHEL and
> although my cluster works and fails over correctly using corosync, I have
> the following error message coming up in my logs consistently, twice a
> minute:
> Apr  7 17:44:41 cvt-db-005 cl_status: [5901]: ERROR: Cannot signon with
> heartbeat
> Apr  7 17:44:41 cvt-db-005 cl_status: [5901]: ERROR: REASON: hb_api_signon:
> Can't initiate connection  to heartbeat

Someone/something is running cl_status.
Find out who/what and stop them - it has no place in a corosync based cluster.
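
One crude way to catch the culprit (not from this thread, just a sketch): poll
the process table until the short-lived cl_status shows up and note its parent
PID, or put an audit watch on the binary (the path may differ on your build):

while ! ps -eo pid,ppid,args | grep '[c]l_status'; do sleep 0.2; done
# look up the PPID printed above to see what launched it
# or, more reliably:
auditctl -w /usr/bin/cl_status -p x
ausearch -f cl_status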

> I can send my configs, but they're pretty vanilla, has anyone seen anything
> like this before.   I did have a heartbeat installation on this host before
> I followed the CFSv2 document, but heartbeat is stopped and I've verified
> that cl_status doesn't output those errors if I stop corosync.
> c
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
>

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker