Re: [Pacemaker] Batch pacemaker configuration

2012-06-08 Thread Matteo Bignotti

No, the nodes are fine; it shouldn't delete them.

But to use crm configure erase, I would need to stop any resources
that are running, correct?

Which is better? Is cibadmin too invasive?

On 06/08/2012 03:05 AM, Dejan Muhamedagic wrote:

Hi,

On Thu, Jun 07, 2012 at 12:01:16PM -0700, Matteo Bignotti wrote:

Hi guys,

I'm trying to configure Pacemaker in batch. So far, this is what I am
doing, in order:

# Flushing the old configuration
cibadmin -E --force

You should use crm configure erase for this. Note that it
doesn't delete nodes (nodes are somewhat special).


# Reloading the new one
crm configure load replace [filename]

and then restarting the machine.

Now, I find it kind of hard to believe that the only batch command to
refresh the configuration needs to be forced.

There should be a reason. Doesn't it mention why?


Also, is there any other way to load a (non-XML) configuration file?
crm configure sometimes prompts me with a question, to which the answer
would always be "Y", but I can't find anywhere in the documentation how
to set an automatic "yes" on the command line.

Just use -F (force).
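A minimal sketch of scripting the two steps described above, assuming crm (the crm shell) is on PATH and that its global -F/--force option suppresses the confirmation prompts, as Dejan says. The helper names and the file path are hypothetical; with dry_run left on, it only prints the commands it would run.

```python
import shlex
import subprocess
from typing import List


def batch_reload_commands(config_file: str, force: bool = True) -> List[List[str]]:
    """Build the crm invocations for erasing and then reloading the configuration."""
    # -F (--force) tells crm not to ask for confirmation, which a script needs.
    base = ["crm", "-F"] if force else ["crm"]
    return [
        base + ["configure", "erase"],  # per Dejan, erase keeps the nodes
        base + ["configure", "load", "replace", config_file],
    ]


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True aborts the sequence as soon as one step fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical path to a configuration file in crm shell syntax.
    run_commands(batch_reload_commands("/root/cluster.crm"))
```

Whether running resources must be stopped before erasing is not settled in this thread, so check the crm documentation for your version before dropping dry_run.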

Thanks,

Dejan


Thank you, guys.

Matteo

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



Re: [Pacemaker] Configuring a cluster for asymmetric operation

2012-06-08 Thread David Vossel
- Original Message -
> From: "Brad Jones" 
> To: "David Coulson" 
> Cc: "The Pacemaker cluster resource manager" 
> Sent: Friday, June 8, 2012 10:21:51 AM
> Subject: Re: [Pacemaker] Configuring a cluster for asymmetric operation
> 
> David - thanks much.  I installed drbd, kept the skeleton (basically
> empty) config file in place on server C, and it has stopped trying to
> monitor.  I still don't quite understand how a server not running a
> particular RA can monitor a service anyway?

The server not running the RA is not monitoring the active resource on another
server. The monitor action is being used to verify that the resource is _NOT_
running on a server where it shouldn't be running. This is done when Pacemaker
starts up, so it can be certain it knows the state of the cluster.

For example, say a daemon that Pacemaker is supposed to control was started
before Pacemaker. Pacemaker uses the monitor operation to detect that the
daemon is already running; otherwise, Pacemaker wouldn't know it was up and
might try to start it somewhere else.

-- Vossel
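To make the probe idea concrete, here is a simplified model (not Pacemaker's actual code) of how the result of a one-off probe could be classified. The exit code values are the standard OCF resource agent return codes; the classification labels are made up for illustration.

```python
# Standard OCF resource agent exit codes.
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5
OCF_NOT_RUNNING = 7
OCF_RUNNING_MASTER = 8
OCF_FAILED_MASTER = 9


def interpret_probe(rc: int) -> str:
    """Classify the result of a probe (a monitor run once, with interval 0)."""
    if rc == OCF_NOT_RUNNING:
        # The expected answer on nodes where the resource should not be active.
        return "inactive"
    if rc in (OCF_SUCCESS, OCF_RUNNING_MASTER):
        # Already running, e.g. started before Pacemaker; the cluster must
        # account for it instead of starting a second copy elsewhere.
        return "active"
    # Everything else (OCF_ERR_INSTALLED, OCF_FAILED_MASTER, ...) is a result
    # the cluster cannot treat as "safely stopped".
    return "failed"
```

In this model a node without the software answers OCF_ERR_INSTALLED and lands in the "failed" branch, which matches the growing failcount Brad saw on node C.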


> 
> But my immediate problem is indeed solved and I can keep symmetric "on."
> --
> Brad Jones
> b...@jones.name
> Mobile: 303-219-0795
> 
> 
> On Fri, Jun 8, 2012 at 4:45 AM, David Coulson
>  wrote:
> > Pacemaker needs to be able to monitor on all nodes. Maybe if you install
> > drbd on the third node but don't configure anything monitor will correctly
> > report it is not running over there, and your location rules will stop it
> > from even trying.
> >
> > Or just change the RA for DRBD to report not running instead of not
> > installed - I had to do that in a few cases where the RA needed the config
> > file to exist, but the file was on a filesystem managed by pacemaker. Seems
> > to work ok, and isn't impacting the cluster at all.
> >
> >
> > On 6/8/12 3:36 AM, Brad Jones wrote:
> >>
> >> I have a cluster with three nodes, A B and C.  A and B can basically
> >> stand in for one another; they run a DRBD master/slave set, and
> >> failure of A where most things run normally fires up services on B.
> >>
> >> C helps arbitrate quorum and runs Stonith plugins.  I've written
> >> location rules to prohibit most of the production services from
> >> running on node C.
> >>
> >> Problem is, the DRBD RA tries to monitor on node C, even though DRBD
> >> isn't even installed.  (It fails, helpfully, "not installed.")  I'm
> >> stuck with a slowly-incrementing failcount on C.
> >>
> >> When I give all services a default location rule and set
> >> symmetric-cluster="false", some of the resources stay running on A but
> >> most of the important ones stop.
> >>
> >> This was raised back in 2010 here:
> >> 
> >> but I can't find much more about this issue on the web and there's
> >> only passing mention of it in the Pacemaker documentation.
> >>
> >> Bottom line: What's required to correctly configure a cluster with
> >> symmetric-cluster set to false?
> >>
> >> FYI I'm on pacemaker 1.1.6, heartbeat 3.0.5, on Ubuntu 10.04.
> >>
> >> Thanks!  Brad
> >> --
> >> Brad Jones
> >> b...@jones.name
> >> Mobile: 303-219-0795
> >>


Re: [Pacemaker] Configuring a cluster for asymmetric operation

2012-06-08 Thread David Vossel
- Original Message -
> From: "Jake Smith" 
> To: "The Pacemaker cluster resource manager" 
> Sent: Friday, June 8, 2012 10:45:59 AM
> Subject: Re: [Pacemaker] Configuring a cluster for asymmetric operation
>
> David,
>
> Just so I'm clearer (when I run into this issue...)
>
> You have to have all of the supporting software for each resource
> installed (i.e. drbd, nfs, mysql, or whatever) not just the RA's?
>
> If you have the RA's installed you'd get "not installed" but that is
> not sufficient for Pacemaker to assume it's not running? I would
> think that not installed would be can't be running but... ;-)
>

My understanding is this. If you have symmetric-cluster == true, both the RA 
scripts and all the software the RA scripts use should be installed on every 
node in the cluster.  An RA returning 'not installed' will cause failures if 
symmetric-cluster == true.  This is why Brad had to install the drbd software 
on server C with a skeleton config even though the software will never run.

If symmetric-cluster == false, it appears that it is sufficient to just
have the RA scripts installed.  In that case 'not installed' is treated the
same as if the resource isn't running.  This bit is based on my analysis of the
code.  I have not actually tested this scenario, but this is what I would
expect to happen.

-- Vossel
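David's description can be written down as a small decision function. It only encodes his reading of the code, which he notes is untested; it is a hypothetical model rather than Pacemaker's implementation, and it ignores every return code other than the three shown.

```python
OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5
OCF_NOT_RUNNING = 7


def probe_outcome(rc: int, symmetric_cluster: bool) -> str:
    """Model of how a probe's 'not installed' result is handled, per this thread."""
    if rc == OCF_NOT_RUNNING:
        return "inactive"
    if rc == OCF_SUCCESS:
        return "active"
    if rc == OCF_ERR_INSTALLED and not symmetric_cluster:
        # Untested reading of the code: with symmetric-cluster=false,
        # 'not installed' is treated the same as 'not running'.
        return "inactive"
    # With symmetric-cluster=true, 'not installed' (like anything else) fails.
    return "failed"
```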

>
> Thanks!
>
> Jake
>



Re: [Pacemaker] Announce: pcs / pcs-gui (Pacemaker/Corosync Configuration System)

2012-06-08 Thread Lars Ellenberg
On Wed, Jun 06, 2012 at 07:22:47PM +0200, Rasto Levrinc wrote:
> On Wed, Jun 6, 2012 at 4:45 PM, Lars Ellenberg
>  wrote:
> > On Tue, Jun 05, 2012 at 05:15:04PM +0200, Rasto Levrinc wrote:
> >> On Tue, Jun 5, 2012 at 1:27 PM, Lars Marowsky-Bree  wrote:
> >> > On 2012-06-05T09:43:09, Andrew Beekhof  wrote:
> >> >
> >> >> Every argument made so far applies equally to HAWK and the Linbit GUI,
> >> >> yet there was no outcry when they were announced.
> >> >
> >> > No, like I said above, that did suck - but the architecture truly is
> >> > different and drbd-mc just wasn't the right answer for customers who
> >> > wanted a HTML-only frontend. Besides, this is not an outcry. An outcry
> >> > is revoking people's mailing list privileges and posting angry blogs.
> >> > ;-)
> >>
> >> Ok, I see the point of both sides, so I will not join the outcry. :)
> >>
> >> Just for the record, the drbd mc / lcmc as an applet and a little bit
> >> backend could look like a web application, only better.
> >
> > ... once it is cleaned up to not try to use up a couple GB of RAM and
> > loop in the GC, while the typical default browser plugin JVM settings
> > allow for a handful of MB, max ...  that cleanup may be useful anyways.
> 
> I haven't seen such behavior and I don't know your configuration, so
> thanks for the bug-report, I guess. :)

To be fair, that was on a slow 32-bit Windows XP box in an old IE with
probably old-ish Java [*], and a default plugin JVM memory setting of
(I think) 64M. The config was very simple at that point: two nodes,
two DRBD resources, and one iSCSI target, LUN, and IP each, done from
the crm shell. After some time things became visible, but once you
started to do something, it would start garbage collecting and never
become responsive again.

Once we started a standalone Java and adjusted the memory parameters
to allow for 500 or 800 MB or so, it became usable.

[*] So it may even have been only the old Java. Who knows.

I have not tried to reproduce it yet. But still, even on very
simple configurations, the memory consumption of LCMC can be excessive,
for whatever reason.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



Re: [Pacemaker] Finer control over when email is sent?

2012-06-08 Thread Lars Ellenberg
On Tue, Jun 05, 2012 at 04:08:02PM -0700, Peter Skirko wrote:
> Looking at the source, it seems like the assumption is to just wire up an
> external program to do the notifications for you, that way you can send
> whatever you do or don't want.
> 
> Thanks,
> -Peter
> 
> On Tue, Jun 5, 2012 at 3:37 PM, Peter Skirko  wrote:
> 
> > Hi,
> >
> > We are using pacemaker 1.0.8 and heartbeat 3.0.3 on ubuntu 10.04. We are

Just for information:
https://launchpad.net/~ubuntu-ha-maintainers/+archive/ppa
deb http://ppa.launchpad.net/ubuntu-ha-maintainers/ppa/ubuntu lucid main 

> > currently sending mail from crm_mon as follows:
> >
> > crm_mon --daemonize --mail-to f...@mixpanel.com --mail-host localhost:25
> >
> > My question is: is it possible to exert finer control over which emails
> > are actually sent?
> >
> > For example, we have ping resources that are checking the health of
> > various network interfaces. Right now, we are receiving emails for start,
> > monitor, and stop events for these resources, but we don't want these
> > emails. We just want emails relating to failures, and just to start and
> > stop events on our IP addresses?
> >
> > I checked the documentation and man pages and didn't see anything
> > immediate, but I wanted to make sure I hadn't overlooked any options.
> >
> > Thanks,
> > -Peter
> >
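Along the lines Peter suggests, a small filter can sit between crm_mon and the mail server. This sketch assumes a crm_mon build that supports --external-agent and passes event details in CRM_notify_* environment variables, as Pacemaker 1.1's crm_mon does; check crm_mon --help, since 1.0.8 may not offer it. The resource names and the sender address are hypothetical.

```python
import os
import smtplib
from email.message import EmailMessage
from typing import Mapping, Set

# Hypothetical names of the IP address resources whose start/stop matter.
WATCHED_RESOURCES: Set[str] = {"ip_public", "ip_internal"}


def should_notify(env: Mapping[str, str], watched: Set[str] = WATCHED_RESOURCES) -> bool:
    """Decide whether one crm_mon event is worth an email."""
    rc = env.get("CRM_notify_rc", "")
    target_rc = env.get("CRM_notify_target_rc", "")
    task = env.get("CRM_notify_task", "")
    rsc = env.get("CRM_notify_rsc", "")
    if rc != target_rc:
        # The operation did not return what the cluster expected: a failure.
        return True
    # Successful operations: only start/stop on the watched resources.
    return task in ("start", "stop") and rsc in watched


def send_mail(env: Mapping[str, str], host: str = "localhost", port: int = 25) -> None:
    msg = EmailMessage()
    msg["Subject"] = "{}: {} {} rc={}".format(
        env.get("CRM_notify_node", "?"), env.get("CRM_notify_rsc", "?"),
        env.get("CRM_notify_task", "?"), env.get("CRM_notify_rc", "?"))
    msg["From"] = "pacemaker@localhost"  # hypothetical sender
    msg["To"] = env.get("CRM_notify_recipient") or "root@localhost"
    msg.set_content(env.get("CRM_notify_desc", ""))
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    if should_notify(os.environ):
        send_mail(os.environ)
```

It would then be wired up with something like crm_mon --daemonize --external-agent /usr/local/bin/crm_notify_filter.py --external-recipient ops@example.com (path and address hypothetical). Routine ping monitor results match their expected return code and are dropped.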


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)

2012-06-08 Thread Lars Ellenberg
On Mon, Jun 04, 2012 at 11:33:45AM +1000, Andrew Beekhof wrote:
> On Mon, Jun 4, 2012 at 11:28 AM, Andrew Beekhof  wrote:
> > On Fri, May 25, 2012 at 7:48 PM, Florian Haas  wrote:
> >> On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg
> >>  wrote:
> >>> On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
>  On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg
>   wrote:
>  > Sorry, sent to early.
>  >
>  > That would not catch the case of cluster partitions joining,
>  > only the pacemaker startup with fully connected cluster communication
>  > already up.
>  >
>  > I thought about a dc-priority default of 100,
>  > and only triggering a re-election if I am DC,
>  > my dc-priority is < 50, and I see a node joining.
> 
>  Hardcoded arbitrary defaults aren't that much fun. "You can use any
>  number, but 100 is the magic threshold" is something I wouldn't want
>  to explain to people over and over again.
> >>>
> >>> Then don't ;-)
> >>>
> >>> Not helping, and irrelevant to this case.
> >>>
> >>> Besides that was an example.
> >>> Easily possible: move the "I want to lose" vs "I want to win"
> >>> magic number to be 0, and allow both positive and negative priorities.
> >>> You get to decide whether positive or negative is the "I'd rather lose"
> >>> side. Want to make that configurable as well? Right.
> >>
> >> Nope, 0 is used as a threshold value in Pacemaker all over the place.
> >> So allowing both positive and negative priorities and making 0 the
> >> default sounds perfectly sane to me.
> >>
> >>> I don't think this can be made part of the cib configuration,
> >>> DC election takes place before cibs are resynced, so if you have
> >>> diverging cibs, you possibly end up with a never ending election?
> >>>
> >>> Then maybe the election is stable enough,
> >>> even after this change to the algorithm.
> >>
> >> Andrew?
> >
> > Probably.  The preferences are not going to be rapidly changing, so
> > there is no reason to suspect it would destabilise things.
> 
> Oh, you mean if the values are stored in the CIB?
> Yeah, I guess you could have issues if you changed the CIB during a
> cluster partition... dont do that?

Right. That was my concern.
So I'd rather not add them to the CIB,
but get them from environment variables instead.
That means I would need to restart the local stack if I wanted
to change the preference. Good enough.

> Honestly though, given the number (1? 2? 0?) of sites in the world
> that actually need this, my main criteria for a successful patch is
> "not screwing it up for everyone else".
> Which certainly rules out starting elections just because someone
> joined.  Although "i've just started and have a non-zero preference so
> I'm going to force an election" would be fine.

Thanks.
I'll see what the current status of that patch is, and if we can prepare
a patch to be considered for upstream inclusion.
May take a while though, due to round trip times ;-)
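A rough sketch of the rules converging in this exchange: a per-node preference read from the environment rather than the CIB, defaulting to 0, with an election forced only by a freshly started node whose preference is non-zero. The variable name HA_DC_PRIORITY, the sign convention (higher wins), and the tie-breakers are all hypothetical; nothing here reflects the actual patch.

```python
import os
from dataclasses import dataclass
from typing import Iterable, Mapping

# Hypothetical environment variable; the real patch may use another name.
DC_PRIORITY_ENV = "HA_DC_PRIORITY"


def read_dc_priority(env: Mapping[str, str] = os.environ) -> int:
    """Local DC preference; 0 (no preference) if unset or malformed."""
    try:
        return int(env.get(DC_PRIORITY_ENV, "0"))
    except ValueError:
        return 0


@dataclass(frozen=True)
class Candidate:
    name: str
    dc_priority: int
    uptime: float  # stand-in for the election's existing tie-breakers


def election_winner(candidates: Iterable[Candidate]) -> Candidate:
    # Higher preference wins; ties fall back to uptime, then node name,
    # so the result is deterministic.
    return max(candidates, key=lambda c: (c.dc_priority, c.uptime, c.name))


def force_election_on_startup(local_priority: int) -> bool:
    # Only a node that has just started with a non-zero preference forces
    # an election; another node merely joining never triggers one.
    return local_priority != 0
```

Because the value comes from the environment, changing it means restarting the local stack, which is the trade-off accepted above.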


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [Pacemaker] Removed nodes showing back in status

2012-06-08 Thread David Vossel
- Original Message -
> From: "Larry Brigman" 
> To: "The Pacemaker cluster resource manager" 
> Sent: Friday, June 8, 2012 3:46:52 PM
> Subject: Re: [Pacemaker] Removed nodes showing back in status
> 
> ping.  What can I do to assist in moving this bug forward to be fix?

Submitting a patch will move it forward quickly.  It is in my queue to look at, 
but I'm not sure when I'll have a dedicated block of time to fix it myself.  It 
will likely get fixed before the next release.

-- Vossel
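Further down in this thread, Larry suspects a race between the service scripts returning and the daemons actually exiting. One way to rule that out in a removal script is to poll until the processes are gone before removing the node. This sketch assumes pgrep is available on the node; the function names are hypothetical, and the process names to pass in depend on your stack.

```python
import subprocess
import time
from typing import Callable, Iterable


def process_running(name: str) -> bool:
    """True if a process with exactly this name exists (relies on pgrep -x)."""
    result = subprocess.run(["pgrep", "-x", name],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def wait_until_stopped(names: Iterable[str],
                       is_running: Callable[[str], bool] = process_running,
                       timeout: float = 60.0,
                       interval: float = 1.0) -> bool:
    """Poll until none of the named processes exist; False if the timeout expires."""
    names = list(names)
    deadline = time.monotonic() + timeout
    while True:
        if not any(is_running(n) for n in names):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, on the node being removed, after service pacemaker stop and service corosync stop, wait_until_stopped(["pacemakerd", "crmd", "corosync"]) should return True before the node is removed from one of the remaining nodes (pacemakerd applies when Pacemaker runs as its own process, as in Larry's setup).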



Re: [Pacemaker] Removed nodes showing back in status

2012-06-08 Thread Larry Brigman
Ping. What can I do to help move this bug forward and get it fixed?

On Wed, May 30, 2012 at 10:42 AM, Larry Brigman  wrote:
> On Tue, May 29, 2012 at 3:08 PM, Larry Brigman  
> wrote:
>> On Fri, May 25, 2012 at 3:40 PM, David Vossel  wrote:
>>> - Original Message -
 From: "Larry Brigman" 
 To: "The Pacemaker cluster resource manager" 
 
 Sent: Friday, May 25, 2012 5:27:21 PM
 Subject: Re: [Pacemaker] Removed nodes showing back in status

 On Fri, May 25, 2012 at 9:59 AM, Larry Brigman
  wrote:
 > On Wed, May 16, 2012 at 1:53 PM, David Vossel 
 > wrote:
 >> - Original Message -
 >>> From: "Larry Brigman" 
 >>> To: "The Pacemaker cluster resource manager"
 >>> 
 >>> Sent: Monday, May 14, 2012 4:59:55 PM
 >>> Subject: Re: [Pacemaker] Removed nodes showing back in status
 >>>
 >>> On Mon, May 14, 2012 at 2:13 PM, David Vossel
 >>> 
 >>> wrote:
 >>> > - Original Message -
 >>> >> From: "Larry Brigman" 
 >>> >> To: "The Pacemaker cluster resource manager"
 >>> >> 
 >>> >> Sent: Monday, May 14, 2012 1:30:22 PM
 >>> >> Subject: Re: [Pacemaker] Removed nodes showing back in status
 >>> >>
 >>> >> On Mon, May 14, 2012 at 9:54 AM, Larry Brigman
 >>> >>  wrote:
 >>> >> > I have a 5 node cluster (but it could be any number of
 >>> >> > nodes, 3 or larger).
 >>> >> > I am testing some scripts for node removal.
 >>> >> > I remove a node from the cluster and everything looks
 >>> >> > correct from crm status standpoint.
 >>> >> > When I remove a second node, the first node that was removed
 >>> >> > now shows back in the crm status as off-line.  I'm following
 >>> >> > the guidelines provided in Pacemaker Explained docs.
 >>> >> > http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-node-delete.html
 >>> >> >
 >>> >> > I believe this is a bug but want to put it out to the list
 >>> >> > to be sure.
 >>> >> > Versions.
 >>> >> > RHEL5.7 x86_64
 >>> >> > corosync-1.4.2
 >>> >> > openais-1.1.3
 >>> >> > pacemaker-1.1.5
 >>> >> >
 >>> >> > Status after first node removed
 >>> >> > [root@portland-3 ~]# crm status
 >>> >> > 
 >>> >> > Last updated: Mon May 14 08:42:04 2012
 >>> >> > Stack: openais
 >>> >> > Current DC: portland-1 - partition with quorum
 >>> >> > Version:
 >>> >> > 1.1.5-1.3.sme-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
 >>> >> > 4 Nodes configured, 4 expected votes
 >>> >> > 0 Resources configured.
 >>> >> > 
 >>> >> >
 >>> >> > Online: [ portland-1 portland-2 portland-3 portland-4 ]
 >>> >> >
 >>> >> > Status after second node removed.
 >>> >> > [root@portland-3 ~]# crm status
 >>> >> > 
 >>> >> > Last updated: Mon May 14 08:42:45 2012
 >>> >> > Stack: openais
 >>> >> > Current DC: portland-1 - partition with quorum
 >>> >> > Version:
 >>> >> > 1.1.5-1.3.sme-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
 >>> >> > 4 Nodes configured, 3 expected votes
 >>> >> > 0 Resources configured.
 >>> >> > 
 >>> >> >
 >>> >> > Online: [ portland-1 portland-3 portland-4 ]
 >>> >> > OFFLINE: [ portland-5 ]
 >>> >> >
 >>> >> > Both nodes were removed from the cluster from node 1.
 >>> >>
 >>> >> When I added a node back into the cluster the second node
 >>> >> that was removed now shows as offline.
 >>> >
 >>> > The only time I've seen this sort of behavior is when I don't
 >>> > completely shutdown corosync and pacemaker on the node I'm
 >>> > removing before I delete it's configuration from the cib.  Are you
 >>> > sure corosync and pacemaker are gone before you delete the node
 >>> > from the cluster config?
 >>>
 >>> Well, I run service pacemaker stop and service corosync stop prior to
 >>> doing the remove.  Since I am doing it all in a script it's possible
 >>> that there is a race condition that I have just expose or the services
 >>> are not fully down when the service script exits.
 >>
 >> Yep, If you are waiting for the service scripts to return I would
 >> expect it to be safe to remove the nodes at that point.
 >>
 >>> BTW, I'm running pacemaker as it's own process instead of being a
 >>> child of corosync (if that makes a difference).
 >>>
 >>
 >> This shouldn't matter.
 >>
 >> An hb_report of this will help us distinguish if this is a bug or
 >> not.
 > Bug opened with the hb and crm reports.
 > https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2648
 >

 I just tried something that seem to poi

Re: [Pacemaker] MySQL not starting

2012-06-08 Thread Yves Trudeau

Hi,
   please use the latest version of the agent and look here for 
documentation:


https://github.com/jayjanssen/Percona-Pacemaker-Resource-Agents/blob/master/doc/PRM-setup-guide.rst

Regards,

Yves

Le 2012-05-16 08:29, Stallmann, Andreas a écrit :

Hi!

I'm trying to get a MySQL master/slave setup running, following
http://www.e-rave.nl/prm-mysql-ha-and-pacemaker-in-the-mix. I had this
setup running already, but now that we are switching from SLES to Ubuntu,
I have to set it up anew.

Here’s the primitive for mysql:

primitive p_mysql ocf:heartbeat:mysql \
    params config="/etc/mysql/my.cnf" log="/var/log/mysql.err" \
    additional_parameters="--debug" pid="/var/run/mysqld/mysqld.pid" \
    socket="/var/run/mysqld/mysqld.sock" replication_user="repl" \
    replication_passwd="***" max_slave_lag="15" \
    evict_outdated_slaves="false" binary="/usr/sbin/mysqld" test_user="root" \
    test_passwd="***" \
    op monitor interval="20s" role="Master" OCF_CHECK_LEVEL="1" \
    op monitor interval="30s" role="Slave" timeout="30s" OCF_CHECK_LEVEL="1" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="120s"

And here’s the master/slave resource:

ms ms_MySQL p_mysql \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" \
    notify="true" globally-unique="false"

Some part of it works, as the replication info is available:

property $id="mysql_replication" \
replication_info="10.10.0.101|mysql-bin.05|107"
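The replication_info value appears to pack the master host, the binary log file, and the log position into one string separated by |. That format is inferred from this single example, not from the agent's documentation, so the parser below (names hypothetical) rests on that assumption.

```python
from typing import NamedTuple


class ReplicationInfo(NamedTuple):
    master_host: str
    log_file: str
    log_pos: int


def parse_replication_info(value: str) -> ReplicationInfo:
    """Split 'host|log_file|position' (format assumed from the example above)."""
    host, log_file, pos = value.split("|")
    return ReplicationInfo(host, log_file, int(pos))
```

An empty first field yields an empty host, which, according to the follow-up later in this thread, newer MySQL versions reject in CHANGE MASTER TO.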

Still, the agent fails, as crm_mon reports:

Failed actions:
    p_mysql:0_start_0 (node=proto-cms-appl01, call=22, rc=-2, status=Timed Out): unknown exec error
    p_mysql:1_start_0 (node=proto-cms-appl02, call=21, rc=-2, status=Timed Out): unknown exec error

Starting mysql by hand with "service mysql start" works nicely, while
ocf-tester -v -n p_mysql -o … /usr/lib…/mysql keeps on reporting

mysql[20688]: ERROR: MySQL is not running

and never (!) ends unless it is killed with Ctrl-C.

I find nothing of any (obvious) relevance in /var/log/syslog besides

May 16 13:27:20 proto-cms-appl01 mysql[1300]: ERROR: ERROR 1210 (HY000)
at line 1: Incorrect arguments to MASTER_HOST

May 16 13:27:20 proto-cms-appl01 mysql[1300]: ERROR: Failed to set master

which is from one hour ago.

mysqld.log, mysql.err, and mysql.log under /var/log/ stay empty, as does
the directory /var/log/mysql.

Any ideas?

Cheers,

Andreas

BTW: Does anyone else feel that “unknown exec error” is not the most
informative way to report an error? It would be nice if the “verbosity”
could somehow be increased. What do you think?

PS: Yves, if you happen to read this: Are there any updates due for your
mysql resource agent? When will it be included in the regular
pacemaker-resource-agents package?

--
CONET Solutions GmbH
Andreas Stallmann,
Theodor-Heuss-Allee 19, 53773 Hennef
Tel.: +49 2242 939-677, Fax: +49 2242 939-393
Mobil: +49 172 2455051
Internet: http://www.conet.de, mailto: astallm...@conet.de



CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef.
Registergericht/Registration Court: Amtsgericht Siegburg (HRB Nr. 9136)
Geschäftsführer/Managing Director: Anke Höfer







Re: [Pacemaker] MySQL not starting

2012-06-08 Thread Yves Trudeau

Hi,
   the agent does a RESET SLAVE now and check for master log file 
afterward.  I think this is handled correctly.


Regards,

Yves

Le 2012-05-29 10:19, Stallmann, Andreas a écrit :

Hi!


MySQL has changed CHANGE MASTER TO syntax in 5.1 (IIRC), and it won't accept
an empty host argument anymore. I had to manually patch the RA to use '-' as 
the host argument if it's empty.


Did you send your patch to Yves? Yves, did you include this patch in the newest 
resource agent?
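The workaround quoted above can be sketched as follows. The real agent is a shell script and, per Yves, now issues RESET SLAVE instead, so this Python function only illustrates the '-' placeholder idea; its name and parameters are hypothetical, and the quoting is minimal rather than a general SQL escaper.

```python
from typing import Optional


def sql_string(value: str) -> str:
    """Minimal single-quoted MySQL string literal (escapes backslash and quote)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def change_master_sql(host: str, user: str, password: str,
                      log_file: Optional[str] = None,
                      log_pos: Optional[int] = None) -> str:
    # Newer MySQL rejects an empty MASTER_HOST (the ERROR 1210 in Andreas's
    # log), so fall back to the '-' placeholder described above.
    parts = [
        "MASTER_HOST=" + sql_string(host if host else "-"),
        "MASTER_USER=" + sql_string(user),
        "MASTER_PASSWORD=" + sql_string(password),
    ]
    if log_file is not None and log_pos is not None:
        parts.append("MASTER_LOG_FILE=" + sql_string(log_file))
        parts.append("MASTER_LOG_POS=" + str(int(log_pos)))
    return "CHANGE MASTER TO " + ", ".join(parts)
```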

Cheers,

Andreas
PS: Yves, would you consider your mysql resource agent to be "production
stable"? We're currently in the process of switching from mysql on drbd to
mysql replication, and I'm asking myself whether this is a wise idea yet. No
offence meant; I'm just curious what your opinion is.




Re: [Pacemaker] Configuring a cluster for asymmetric operation

2012-06-08 Thread Jake Smith

David,

Just so I'm clearer (when I run into this issue...)

You have to have all of the supporting software for each resource installed
(e.g. drbd, nfs, mysql, or whatever), not just the RAs?

If you have only the RAs installed, you'd get "not installed", but that is
not sufficient for Pacemaker to assume the resource isn't running? I would
think that "not installed" would mean "can't be running", but... ;-)

Thanks!



Jake


 
- Original Message -

From: "David Vossel" 
To: "The Pacemaker cluster resource manager" 
Sent: Friday, June 8, 2012 11:14:18 AM
Subject: Re: [Pacemaker] Configuring a cluster for asymmetric operation

- Original Message -
> From: "Matthew O'Connor" 
> To: "The Pacemaker cluster resource manager" 
> Sent: Friday, June 8, 2012 8:36:13 AM
> Subject: Re: [Pacemaker] Configuring a cluster for asymmetric operation
>
> I ran across the same issue when putting together an asymmetric
> cluster.  Generally, I've found that it's necessary to make sure all
> nodes have the same resource-related backing software installed.  For
> instance, I too have had to install DRBD on all nodes, even those not
> running DRBD.  iSCSITarget is another one - even with the RA available,
> not having the proper service available for the monitor to query can
> cause a monitor error - which tends to nuke the resource on all nodes,
> even the properly configured ones.  As it was explained to me, Pacemaker
> needs to be able to know that a resource isn't (or is) running on a node
> that it isn't (or is) supposed to be running on.  Lack of resource agent
> (i.e. lack of ability to monitor) is not taken as an indication that
> such a resource isn't running there.

Yes, this is true. There is no way around this: Pacemaker will probe every
known resource on every node each time it starts up.  Without the resource
agents installed on all nodes, Pacemaker cannot verify whether a resource is
running, which will cause problems.  Even if the configuration prevents a
resource from ever being started by Pacemaker on a node, Pacemaker still needs
a way to verify that the resource isn't already running when it starts up.

-- Vossel
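The point that location constraints restrict placement but not probing can be illustrated with a toy model. The function below is hypothetical and leaves out everything else the policy engine does; it only shows that a ban removes a node from the allowed placements while the probe on that node remains.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (resource, node)


def startup_actions(nodes: Iterable[str], resources: Iterable[str],
                    banned: Set[Pair]) -> Tuple[List[Pair], List[Pair]]:
    """Return (probes, allowed placements) for a freshly started cluster."""
    nodes, resources = list(nodes), list(resources)
    # Every known resource is probed on every node, bans or not.
    probes = [(r, n) for r in resources for n in nodes]
    # Bans (e.g. -INFINITY location constraints) only limit where it may run.
    allowed = [(r, n) for r in resources for n in nodes if (r, n) not in banned]
    return probes, allowed
```

With Brad's layout, startup_actions(["A", "B", "C"], ["drbd"], {("drbd", "C")}) still probes drbd on C even though C is never an allowed placement.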

> On 6/8/2012 3:36 AM, Brad Jones wrote:
> > I have a cluster with three nodes, A B and C.  A and B can basically
> > stand in for one another; they run a DRBD master/slave set, and
> > failure of A where most things run normally fires up services on B.
> >
> > C helps arbitrate quorum and runs Stonith plugins.  I've written
> > location rules to prohibit most of the production services from
> > running on node C.
> >
> > Problem is, the DRBD RA tries to monitor on node C, even though DRBD
> > isn't even installed.  (It fails, helpfully, "not installed.")  I'm
> > stuck with a slowly-incrementing failcount on C.
> >
> > When I give all services a default location rule and set
> > symmetric-cluster="false", some of the resources stay running on A but
> > most of the important ones stop.
> >
> > This was raised back in 2010 here:
> > 
> > but I can't find much more about this issue on the web and there's
> > only passing mention of it in the Pacemaker documentation.
> >
> > Bottom line: What's required to correctly configure a cluster with
> > symmetric-cluster set to false?
> >
> > FYI I'm on pacemaker 1.1.6, heartbeat 3.0.5, on Ubuntu 10.04.
> >
> > Thanks!  Brad
> > --
> > Brad Jones
> > b...@jones.name
> > Mobile: 303-219-0795
> >
>
> --
>
> Sincerely,
>   Matthew O'Connor
>
> -
> Sr. Software Engineer
> PGP/GPG Key: 0x55F981C4
> Fingerprint: E5DC A0F8 5A40 E4DA 2CE6 B5A2 014C 2CBF 55F9 81C4
>
> Engineering and Computer Simulations, Inc.
> 11825 High Tech Ave Suite 250
> Orlando, FL 32817
>
> Tel:   407-823-9991 x315
> Fax:   407-823-8299
> Email: m...@ecsorl.com
> Web:   www.ecsorl.com
> -

Re: [Pacemaker] Configuring a cluster for asymmetric operation

2012-06-08 Thread Brad Jones
David - thanks much.  I installed drbd, kept the skeleton (basically
empty) config file in place on server C, and it has stopped trying to
monitor.  I still don't quite understand how a server not running a
particular RA can monitor a service anyway?

But my immediate problem is indeed solved and I can keep symmetric "on."
--
Brad Jones
b...@jones.name
Mobile: 303-219-0795
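David Coulson's alternative, quoted below, is to make the agent report "not running" instead of "not installed". Real agents are usually shell scripts, and the ocf-shellfuncs library from resource-agents provides an ocf_is_probe helper for this kind of check; the Python below only illustrates the logic. It assumes Pacemaker passes the operation interval as OCF_RESKEY_CRM_meta_interval ("0" for a probe); the binary name and the is_active callback are hypothetical.

```python
import shutil
from typing import Callable, Mapping

OCF_SUCCESS = 0
OCF_ERR_INSTALLED = 5
OCF_NOT_RUNNING = 7


def is_probe(env: Mapping[str, str]) -> bool:
    # Called from the monitor action: interval 0 means a one-off probe.
    return env.get("OCF_RESKEY_CRM_meta_interval", "") == "0"


def monitor(env: Mapping[str, str], binary: str,
            is_active: Callable[[], bool]) -> int:
    """Monitor action for a hypothetical agent that needs `binary` installed."""
    if shutil.which(binary) is None:
        if is_probe(env):
            # Software absent on this node, so the resource cannot be running here.
            return OCF_NOT_RUNNING
        # A recurring monitor where the resource should run: report the real problem.
        return OCF_ERR_INSTALLED
    return OCF_SUCCESS if is_active() else OCF_NOT_RUNNING
```

As described, David's change reported "not running" unconditionally; limiting it to probes keeps a genuine "not installed" visible on nodes that are supposed to run the resource.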


On Fri, Jun 8, 2012 at 4:45 AM, David Coulson  wrote:
> Pacemaker needs to be able to monitor on all nodes. Maybe if you install
> DRBD on the third node but don't configure anything, the monitor will
> correctly report that it is not running there, and your location rules
> will stop it from even trying.
>
> Or just change the DRBD RA to report "not running" instead of "not
> installed". I had to do that in a few cases where the RA needed its config
> file to exist, but the file was on a filesystem managed by Pacemaker. It
> seems to work OK and isn't impacting the cluster at all.
>
>
> On 6/8/12 3:36 AM, Brad Jones wrote:
>>
>> I have a cluster with three nodes: A, B and C.  A and B can basically
>> stand in for one another; they run a DRBD master/slave set, and a
>> failure of A, where most things normally run, fires up services on B.
>>
>> C helps arbitrate quorum and runs STONITH plugins.  I've written
>> location rules to prohibit most of the production services from
>> running on node C.
>>
>> Problem is, the DRBD RA tries to monitor on node C, even though DRBD
>> isn't even installed.  (It fails, helpfully, "not installed.")  I'm
>> stuck with a slowly-incrementing failcount on C.
>>
>> When I give all services a default location rule and set
>> symmetric-cluster="false", some of the resources stay running on A but
>> most of the important ones stop.
>>
>> This was raised back in 2010 here:
>> 
>> but I can't find much more about this issue on the web and there's
>> only passing mention of it in the Pacemaker documentation.
>>
>> Bottom line: What's required to correctly configure a cluster with
>> symmetric-cluster set to false?
>>
>> FYI I'm on pacemaker 1.1.6, heartbeat 3.0.5, on Ubuntu 10.04.
>>
>> Thanks!  Brad
>> --
>> Brad Jones
>> b...@jones.name
>> Mobile: 303-219-0795


Re: [Pacemaker] Configuring a cluster for asymmetric operation

2012-06-08 Thread David Vossel
- Original Message -
> From: "Matthew O'Connor" 
> To: "The Pacemaker cluster resource manager" 
> Sent: Friday, June 8, 2012 8:36:13 AM
> Subject: Re: [Pacemaker] Configuring a cluster for asymmetric operation
> 
> I ran across the same issue when putting together an asymmetric
> cluster.  Generally, I've found that it's necessary to make sure all
> nodes have the same resource-related backing software installed.  For
> instance, I too have had to install DRBD on all nodes, even those not
> running DRBD.  iSCSITarget is another one - even with the RA available,
> not having the proper service available for the monitor to query can
> cause a monitor error - which tends to nuke the resource on all nodes,
> even the properly configured ones.  As it was explained to me, Pacemaker
> needs to be able to know that a resource isn't (or is) running on a node
> that it isn't (or is) supposed to be running on.  Lack of resource agent
> (i.e. lack of ability to monitor) is not taken as an indication that
> such a resource isn't running there.

Yes, this is true. There is no way around it: Pacemaker will probe every
known resource on every node each time it starts up.  Without the resource
agents installed on all nodes, Pacemaker cannot verify whether a resource is
running, which will cause problems.  Even if the configuration prevents a
resource from ever being started by Pacemaker on a node, Pacemaker still
needs a way to verify that the resource isn't already running there when it
starts up.
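On the configuration side of Brad's question: with symmetric-cluster="false" the cluster is opt-in, so a resource may run only on nodes that a positive-score location constraint allows, and every resource needs such a constraint for each node it may use. The sketch below only prints crm shell syntax for that; the resource ID ms_drbd, the score 100 and the node names are hypothetical. As explained above, this does not stop the probes, so the agents still have to be installed on node C.

```sh
#!/bin/sh
# Sketch: print crm shell constraints for an opt-in (asymmetric) cluster.
# Resource IDs, scores and node names used below are hypothetical examples.

# $1 = resource id, $2 = score, remaining arguments = nodes it may run on
optin_constraints() {
    rsc="$1"
    score="$2"
    shift 2
    for node in "$@"; do
        echo "location loc-${rsc}-${node} ${rsc} ${score}: ${node}"
    done
}

# Build a fragment: cluster-wide opt-in, DRBD allowed on A and B only.
{
    echo 'property symmetric-cluster="false"'
    optin_constraints ms_drbd 100 nodeA nodeB
}
# Save the output to a file and apply it with, e.g.:
#   crm configure load update optin.crm
```

Any resource that gets no constraint at all simply has nowhere to run once the property is set, which is worth checking first if services stop after the switch.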

-- Vossel

> On 6/8/2012 3:36 AM, Brad Jones wrote:
> > I have a cluster with three nodes: A, B and C.  A and B can basically
> > stand in for one another; they run a DRBD master/slave set, and a
> > failure of A, where most things normally run, fires up services on B.
> >
> > C helps arbitrate quorum and runs STONITH plugins.  I've written
> > location rules to prohibit most of the production services from
> > running on node C.
> >
> > Problem is, the DRBD RA tries to monitor on node C, even though DRBD
> > isn't even installed.  (It fails, helpfully, "not installed.")  I'm
> > stuck with a slowly-incrementing failcount on C.
> >
> > When I give all services a default location rule and set
> > symmetric-cluster="false", some of the resources stay running on A but
> > most of the important ones stop.
> >
> > This was raised back in 2010 here:
> > 
> > but I can't find much more about this issue on the web and there's
> > only passing mention of it in the Pacemaker documentation.
> >
> > Bottom line: What's required to correctly configure a cluster with
> > symmetric-cluster set to false?
> >
> > FYI I'm on pacemaker 1.1.6, heartbeat 3.0.5, on Ubuntu 10.04.
> >
> > Thanks!  Brad
> > --
> > Brad Jones
> > b...@jones.name
> > Mobile: 303-219-0795
> 
> --
> 
> Sincerely,
>   Matthew O'Connor


Re: [Pacemaker] Problem with state: UNCLEAN (OFFLINE)

2012-06-08 Thread Digimer
I've seen the cable coming from the redundant PSU's backplane short
against the chassis. I've seen the RJ45 connector for IPMI/iLO go bad.
Of course, a switch port could go bad or a network cable could come out.
There are many ways that IPMI/iLO could fail independently of the incoming
power.


For this reason, if you can test the latest version of Pacemaker (I think
you are not in production yet), set up a switched PDU as a backup fence
device. This is what I always do in RHCS. This way, if the IPMI does fail
for whatever reason, you can reach out to the PDU(s) and cut off the power
to both sides of the PSU.


For example (using pseudo cluster.conf terms):

Node1
  fence method1
    IPMI_node1
  fence method2
    pdu1 - outlet 1
    pdu2 - outlet 1
Node2
  fence method1
    IPMI_node2
  fence method2
    pdu1 - outlet 2
    pdu2 - outlet 2

Note that both PDUs will have to return success for the method itself to 
be considered a success.
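In Pacemaker terms, the same two-level layout can be expressed as a fencing topology in the CIB: one fencing-level element per node and level, where a comma-separated devices list means every device in that level must succeed, matching the note about both PDUs. This is a sketch under assumptions: it needs a Pacemaker release that supports fencing-topology (1.1.6 may not, so check your version), and the stonith resource IDs (IPMI_node1, pdu1_node1, and so on) are hypothetical names that would have to exist as configured stonith resources. The script only prints the XML.

```sh
#!/bin/sh
# Sketch: print a <fencing-topology> section with two levels per node.
# Level 1: the node's IPMI device. Level 2: both PDU devices; listing them
# comma-separated in one level means all of them must succeed.
# Device IDs are hypothetical stonith resource names.

# $1 = target node, $2 = level index, $3 = comma-separated device IDs
fencing_level() {
    printf '  <fencing-level id="fl-%s-%s" target="%s" index="%s" devices="%s"/>\n' \
        "$1" "$2" "$1" "$2" "$3"
}

# Arguments: node names; prints the whole section
fencing_topology() {
    echo '<fencing-topology>'
    for node in "$@"; do
        fencing_level "$node" 1 "IPMI_${node}"
        fencing_level "$node" 2 "pdu1_${node},pdu2_${node}"
    done
    echo '</fencing-topology>'
}

fencing_topology node1 node2
```

The printed section belongs inside the CIB's configuration element; levels are tried in index order, so the PDUs are only used when IPMI fencing fails.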


On 06/08/2012 09:11 AM, Juan M. Sierra wrote:

Hello,

First of all, thank you very much for your quick reply.

Your advice has made me think about the power problem and its
relation to STONITH. In my case, I use two machines with an iLO-like
management system (as on HP servers) and two power supplies.

Realistically, it would be a very rare event for both power supplies to
fail together. The other case would be the motherboard getting seriously
damaged.

In any case, I understand I'll need a third element (independent of both
machines) to ensure that STONITH works reliably. Maybe something like a UPS
or an advanced power supply line.

I'll try to investigate this a little more. Again, thank you very much
for your help.

Cheers,

On 08/06/12 13:45, Florian Haas wrote:

On Fri, Jun 8, 2012 at 1:01 PM, Juan M. Sierra wrote:

Problem with state: UNCLEAN (OFFLINE)

Hello,

I'm trying to set up an ldirectord service with Pacemaker.

But I ran into a problem with the UNCLEAN (offline) state. The initial
state of my cluster was:

Online: [ node2 node1 ]

node1-STONITH (stonith:external/ipmi): Started node2
node2-STONITH (stonith:external/ipmi): Started node1
Clone Set: Connected
Started: [ node2 node1 ]
Clone Set: ldirector-activo-activo
Started: [ node2 node1 ]
ftp-vip (ocf::heartbeat:IPaddr): Started node1
web-vip (ocf::heartbeat:IPaddr): Started node2

Migration summary:
* Node node1: pingd=2000
* Node node2: pingd=2000
node2-STONITH: migration-threshold=100 fail-count=100

Then I disconnected node1's power, and the state became:

Node node1 (8b2aede9-61bb-4a5a-aef6-25fbdefdddfd): UNCLEAN (offline)
Online: [ node2 ]

node1-STONITH (stonith:external/ipmi): Started node2 FAILED
Clone Set: Connected
Started: [ node2 ]
Stopped: [ ping:1 ]
Clone Set: ldirector-activo-activo
Started: [ node2 ]
Stopped: [ ldirectord:1 ]
web-vip (ocf::heartbeat:IPaddr): Started node2

Migration summary:
* Node node2: pingd=2000
node2-STONITH: migration-threshold=100 fail-count=100
node1-STONITH: migration-threshold=100 fail-count=100

Failed actions:
node2-STONITH_start_0 (node=node2, call=22, rc=2, status=complete):
invalid parameter
node1-STONITH_monitor_6 (node=node2, call=11, rc=14,
status=complete): status: unknown
node1-STONITH_start_0 (node=node2, call=34, rc=1, status=complete):
unknown error

I expected node2 to take over the ftp-vip resource, but that did not
happen: node1 remained in an UNCLEAN state and node2 did not take over
node1's resources. Only after I reconnected node1's power and it recovered
did node2 take over the ftp-vip resource.

I've seen some similar conversations here. Could you give me some ideas
on this subject, or point me to a thread where it is discussed?

Well your healthy node failed to fence your offending node. So fix
your STONITH device configuration and as soon as that is able to
fence, your failover should work fine.

Of course, if your IPMI BMC fails immediately after you remove power
from the machine (i.e. it has no backup battery so it can at least
report the power status), then you might have to fix your issue by
switching to a different STONITH device altogether.

Cheers,
Florian






--
Digimer
Papers and Projects: https://alteeve.com



Re: [Pacemaker] Configuring a cluster for asymmetric operation

2012-06-08 Thread Matthew O'Connor
I ran across the same issue when putting together an asymmetric
cluster.  Generally, I've found that it's necessary to make sure all
nodes have the same resource-related backing software installed.  For
instance, I too have had to install DRBD on all nodes, even those not
running DRBD.  iSCSITarget is another one - even with the RA available,
not having the proper service available for the monitor to query can
cause a monitor error - which tends to nuke the resource on all nodes,
even the properly configured ones.  As it was explained to me, Pacemaker
needs to be able to know that a resource isn't (or is) running on a node
that it isn't (or is) supposed to be running on.  Lack of resource agent
(i.e. lack of ability to monitor) is not taken as an indication that
such a resource isn't running there.
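A minimal per-node check along these lines, assuming the standard OCF layout ($OCF_ROOT, usually /usr/lib/ocf, followed by resource.d/<provider>/<agent>). The example arguments "linbit drbd drbdadm" are assumptions about which agent and tool a DRBD resource uses; substitute whatever your resources reference. The idea is to run it on every node, including those that should never host the resource, before adding the resource to the cluster.

```sh
#!/bin/sh
# Sketch: check that this node has an OCF resource agent and, optionally,
# the command-line tool that agent drives. Assumes the usual OCF layout:
#   $OCF_ROOT/resource.d/<provider>/<agent>

# $1 = provider, $2 = agent, $3 = optional backing command
check_ocf_agent() {
    root="${OCF_ROOT:-/usr/lib/ocf}"
    path="$root/resource.d/$1/$2"
    if [ ! -x "$path" ]; then
        echo "missing agent: $path"
        return 1
    fi
    if [ -n "$3" ] && ! command -v "$3" >/dev/null 2>&1; then
        echo "missing command: $3"
        return 1
    fi
    echo "ok: $1:$2"
}

# Example for a DRBD resource (provider/agent/tool names are assumptions):
check_ocf_agent linbit drbd drbdadm || echo "install the DRBD packages on this node"
```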

On 6/8/2012 3:36 AM, Brad Jones wrote:
> I have a cluster with three nodes: A, B and C.  A and B can basically
> stand in for one another; they run a DRBD master/slave set, and a
> failure of A, where most things normally run, fires up services on B.
>
> C helps arbitrate quorum and runs STONITH plugins.  I've written
> location rules to prohibit most of the production services from
> running on node C.
>
> Problem is, the DRBD RA tries to monitor on node C, even though DRBD
> isn't even installed.  (It fails, helpfully, "not installed.")  I'm
> stuck with a slowly-incrementing failcount on C.
>
> When I give all services a default location rule and set
> symmetric-cluster="false", some of the resources stay running on A but
> most of the important ones stop.
>
> This was raised back in 2010 here:
> 
> but I can't find much more about this issue on the web and there's
> only passing mention of it in the Pacemaker documentation.
>
> Bottom line: What's required to correctly configure a cluster with
> symmetric-cluster set to false?
>
> FYI I'm on pacemaker 1.1.6, heartbeat 3.0.5, on Ubuntu 10.04.
>
> Thanks!  Brad
> --
> Brad Jones
> b...@jones.name
> Mobile: 303-219-0795

-- 

Sincerely,
  Matthew O'Connor



Re: [Pacemaker] Problem with state: UNCLEAN (OFFLINE)

2012-06-08 Thread Juan M. Sierra

Hello,

Thanks a lot. That thread is relevant to my problem; I'll look into it
further.


Regards,

On 08/06/12 13:51, Florian Crouzat wrote:

On 08/06/2012 13:01, Juan M. Sierra wrote:

Problem with state: UNCLEAN (OFFLINE)

Hello,

I'm trying to set up an ldirectord service with Pacemaker.

But I ran into a problem with the UNCLEAN (offline) state. The initial
state of my cluster was:

Online: [ node2 node1 ]

node1-STONITH (stonith:external/ipmi): Started node2
node2-STONITH (stonith:external/ipmi): Started node1
Clone Set: Connected
Started: [ node2 node1 ]
Clone Set: ldirector-activo-activo
Started: [ node2 node1 ]
ftp-vip (ocf::heartbeat:IPaddr): Started node1
web-vip (ocf::heartbeat:IPaddr): Started node2

Migration summary:
* Node node1: pingd=2000
* Node node2: pingd=2000
node2-STONITH: migration-threshold=100 fail-count=100

Then I disconnected node1's power, and the state became:

Node node1 (8b2aede9-61bb-4a5a-aef6-25fbdefdddfd): UNCLEAN (offline)
Online: [ node2 ]

node1-STONITH (stonith:external/ipmi): Started node2 FAILED
Clone Set: Connected
Started: [ node2 ]
Stopped: [ ping:1 ]
Clone Set: ldirector-activo-activo
Started: [ node2 ]
Stopped: [ ldirectord:1 ]
web-vip (ocf::heartbeat:IPaddr): Started node2

Migration summary:
* Node node2: pingd=2000
node2-STONITH: migration-threshold=100 fail-count=100
node1-STONITH: migration-threshold=100 fail-count=100

Failed actions:
node2-STONITH_start_0 (node=node2, call=22, rc=2, status=complete):
invalid parameter
node1-STONITH_monitor_6 (node=node2, call=11, rc=14,
status=complete): status: unknown
node1-STONITH_start_0 (node=node2, call=34, rc=1, status=complete):
unknown error

I expected node2 to take over the ftp-vip resource, but that did not
happen: node1 remained in an UNCLEAN state and node2 did not take over
node1's resources. Only after I reconnected node1's power and it recovered
did node2 take over the ftp-vip resource.

I've seen some similar conversations here. Could you give me some ideas
on this subject, or point me to a thread where it is discussed?

Thanks a lot!

Regards,



It has been discussed for resource failover, but I guess it's the same:
http://oss.clusterlabs.org/pipermail/pacemaker/2012-May/014260.html


The motto here (I discovered it a couple of days ago) is "better a hung
cluster than a corrupted one, especially with shared filesystems/resources."
So node1 failed, but node2 apparently could not confirm its death because
STONITH failed; the design choice is therefore for the cluster to hang
while waiting for a way to learn the real state of node1 (in this case,
when it rebooted).





--
Juan Manuel Sierra Prieto
Systems Administration
Centro Informatico Cientifico de Andalucia (CICA)
Avda. Reina Mercedes s/n - 41012 - Sevilla (Spain)
Tel.: +34 955 056 600 / Fax: +34 955 056 650
Consejería de Economía, Innovación y Ciencia
Junta de Andalucía




Re: [Pacemaker] Problem with state: UNCLEAN (OFFLINE)

2012-06-08 Thread Juan M. Sierra

Hello,

First of all, thank you very much for your quick reply.

Your advice has made me think about the power problem and its
relation to STONITH. In my case, I use two machines with an iLO-like
management system (as on HP servers) and two power supplies.

Realistically, it would be a very rare event for both power supplies to
fail together. The other case would be the motherboard getting seriously
damaged.

In any case, I understand I'll need a third element (independent of both
machines) to ensure that STONITH works reliably. Maybe something like a UPS
or an advanced power supply line.

I'll try to investigate this a little more. Again, thank you very much
for your help.


Cheers,

On 08/06/12 13:45, Florian Haas wrote:

On Fri, Jun 8, 2012 at 1:01 PM, Juan M. Sierra  wrote:

Problem with state: UNCLEAN (OFFLINE)

Hello,

I'm trying to set up an ldirectord service with Pacemaker.

But I ran into a problem with the UNCLEAN (offline) state. The initial
state of my cluster was:

Online: [ node2 node1 ]

node1-STONITH (stonith:external/ipmi): Started node2
node2-STONITH (stonith:external/ipmi): Started node1
  Clone Set: Connected
  Started: [ node2 node1 ]
  Clone Set: ldirector-activo-activo
  Started: [ node2 node1 ]
ftp-vip (ocf::heartbeat:IPaddr): Started node1
web-vip (ocf::heartbeat:IPaddr): Started node2

Migration summary:
* Node node1:  pingd=2000
* Node node2:  pingd=2000
node2-STONITH: migration-threshold=100 fail-count=100

Then I disconnected node1's power, and the state became:

Node node1 (8b2aede9-61bb-4a5a-aef6-25fbdefdddfd): UNCLEAN (offline)
Online: [ node2 ]

node1-STONITH (stonith:external/ipmi): Started node2 FAILED
  Clone Set: Connected
  Started: [ node2 ]
  Stopped: [ ping:1 ]
  Clone Set: ldirector-activo-activo
  Started: [ node2 ]
  Stopped: [ ldirectord:1 ]
web-vip (ocf::heartbeat:IPaddr): Started node2

Migration summary:
* Node node2:  pingd=2000
node2-STONITH: migration-threshold=100 fail-count=100
node1-STONITH: migration-threshold=100 fail-count=100

Failed actions:
 node2-STONITH_start_0 (node=node2, call=22, rc=2, status=complete):
invalid parameter
 node1-STONITH_monitor_6 (node=node2, call=11, rc=14,
status=complete): status: unknown
 node1-STONITH_start_0 (node=node2, call=34, rc=1, status=complete):
unknown error

I expected node2 to take over the ftp-vip resource, but that did not
happen: node1 remained in an UNCLEAN state and node2 did not take over
node1's resources. Only after I reconnected node1's power and it recovered
did node2 take over the ftp-vip resource.

I've seen some similar conversations here. Could you give me some ideas
on this subject, or point me to a thread where it is discussed?

Well your healthy node failed to fence your offending node. So fix
your STONITH device configuration and as soon as that is able to
fence, your failover should work fine.

Of course, if your IPMI BMC fails immediately after you remove power
from the machine (i.e. it has no backup battery so it can at least
report the power status), then you might have to fix your issue by
switching to a different STONITH device altogether.

Cheers,
Florian



--
Juan Manuel Sierra Prieto


Re: [Pacemaker] Problem with state: UNCLEAN (OFFLINE)

2012-06-08 Thread Florian Haas
On Fri, Jun 8, 2012 at 1:01 PM, Juan M. Sierra  wrote:
> Problem with state: UNCLEAN (OFFLINE)
>
> Hello,
>
> I'm trying to set up an ldirectord service with Pacemaker.
>
> But I ran into a problem with the UNCLEAN (offline) state. The initial
> state of my cluster was:
>
> Online: [ node2 node1 ]
>
> node1-STONITH    (stonith:external/ipmi):    Started node2
> node2-STONITH    (stonith:external/ipmi):    Started node1
>  Clone Set: Connected
>  Started: [ node2 node1 ]
>  Clone Set: ldirector-activo-activo
>  Started: [ node2 node1 ]
> ftp-vip (ocf::heartbeat:IPaddr):    Started node1
> web-vip (ocf::heartbeat:IPaddr):    Started node2
>
> Migration summary:
> * Node node1:  pingd=2000
> * Node node2:  pingd=2000
>    node2-STONITH: migration-threshold=100 fail-count=100
>
> Then I disconnected node1's power, and the state became:
>
> Node node1 (8b2aede9-61bb-4a5a-aef6-25fbdefdddfd): UNCLEAN (offline)
> Online: [ node2 ]
>
> node1-STONITH    (stonith:external/ipmi):    Started node2 FAILED
>  Clone Set: Connected
>  Started: [ node2 ]
>  Stopped: [ ping:1 ]
>  Clone Set: ldirector-activo-activo
>  Started: [ node2 ]
>  Stopped: [ ldirectord:1 ]
> web-vip (ocf::heartbeat:IPaddr):    Started node2
>
> Migration summary:
> * Node node2:  pingd=2000
>    node2-STONITH: migration-threshold=100 fail-count=100
>    node1-STONITH: migration-threshold=100 fail-count=100
>
> Failed actions:
>     node2-STONITH_start_0 (node=node2, call=22, rc=2, status=complete):
> invalid parameter
>     node1-STONITH_monitor_6 (node=node2, call=11, rc=14,
> status=complete): status: unknown
>     node1-STONITH_start_0 (node=node2, call=34, rc=1, status=complete):
> unknown error
>
> I expected node2 to take over the ftp-vip resource, but that did not
> happen: node1 remained in an UNCLEAN state and node2 did not take over
> node1's resources. Only after I reconnected node1's power and it recovered
> did node2 take over the ftp-vip resource.
>
> I've seen some similar conversations here. Could you give me some ideas
> on this subject, or point me to a thread where it is discussed?

Well your healthy node failed to fence your offending node. So fix
your STONITH device configuration and as soon as that is able to
fence, your failover should work fine.

Of course, if your IPMI BMC fails immediately after you remove power
from the machine (i.e. it has no backup battery so it can at least
report the power status), then you might have to fix your issue by
switching to a different STONITH device altogether.
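To illustrate why a BMC that dies together with its host leaves the node UNCLEAN, here is a simplified, hedged model of interpreting a power-status query: only an explicit "off" lets a fence action be treated as confirmed, while anything else (for example a timeout from a BMC that lost power) leaves the node's state unknown. The two strings matched are assumed to be what ipmitool prints for "chassis power status"; verify them against your ipmitool version. The host name and credentials in the comment are hypothetical.

```sh
#!/bin/sh
# Sketch: map a BMC power-status answer to on/off/unknown.
# A fence action can only be treated as confirmed when the answer is "off";
# "unknown" (e.g. a BMC that lost power along with the host) cannot be.

classify_power_status() {
    case "$1" in
        "Chassis Power is on")  echo on ;;
        "Chassis Power is off") echo off ;;
        *)                      echo unknown ;;
    esac
}

# Example query (hypothetical BMC address and credentials):
#   answer=$(ipmitool -I lanplus -H node1-bmc -U admin -P secret chassis power status 2>&1)
#   classify_power_status "$answer"
```

With a BMC that has no power of its own, pulling the plug turns every query into "unknown", which is exactly the case where a second, independent fence device such as a switched PDU helps.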

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now



Re: [Pacemaker] Problem with state: UNCLEAN (OFFLINE)

2012-06-08 Thread Florian Crouzat

On 08/06/2012 13:01, Juan M. Sierra wrote:

Problem with state: UNCLEAN (OFFLINE)

Hello,

I'm trying to set up an ldirectord service with Pacemaker.

But I ran into a problem with the UNCLEAN (offline) state. The initial
state of my cluster was:

Online: [ node2 node1 ]

node1-STONITH (stonith:external/ipmi): Started node2
node2-STONITH (stonith:external/ipmi): Started node1
Clone Set: Connected
Started: [ node2 node1 ]
Clone Set: ldirector-activo-activo
Started: [ node2 node1 ]
ftp-vip (ocf::heartbeat:IPaddr): Started node1
web-vip (ocf::heartbeat:IPaddr): Started node2

Migration summary:
* Node node1: pingd=2000
* Node node2: pingd=2000
node2-STONITH: migration-threshold=100 fail-count=100

Then I disconnected node1's power, and the state became:

Node node1 (8b2aede9-61bb-4a5a-aef6-25fbdefdddfd): UNCLEAN (offline)
Online: [ node2 ]

node1-STONITH (stonith:external/ipmi): Started node2 FAILED
Clone Set: Connected
Started: [ node2 ]
Stopped: [ ping:1 ]
Clone Set: ldirector-activo-activo
Started: [ node2 ]
Stopped: [ ldirectord:1 ]
web-vip (ocf::heartbeat:IPaddr): Started node2

Migration summary:
* Node node2: pingd=2000
node2-STONITH: migration-threshold=100 fail-count=100
node1-STONITH: migration-threshold=100 fail-count=100

Failed actions:
node2-STONITH_start_0 (node=node2, call=22, rc=2, status=complete):
invalid parameter
node1-STONITH_monitor_6 (node=node2, call=11, rc=14,
status=complete): status: unknown
node1-STONITH_start_0 (node=node2, call=34, rc=1, status=complete):
unknown error

I expected node2 to take over the ftp-vip resource, but that did not
happen: node1 remained in an UNCLEAN state and node2 did not take over
node1's resources. Only after I reconnected node1's power and it recovered
did node2 take over the ftp-vip resource.

I've seen some similar conversations here. Could you give me some ideas
on this subject, or point me to a thread where it is discussed?

Thanks a lot!

Regards,



It has been discussed for resource failover, but I guess it's the same:
http://oss.clusterlabs.org/pipermail/pacemaker/2012-May/014260.html


The motto here (I discovered it a couple of days ago) is "better a hung
cluster than a corrupted one, especially with shared filesystems/resources."
So node1 failed, but node2 apparently could not confirm its death because
STONITH failed; the design choice is therefore for the cluster to hang
while waiting for a way to learn the real state of node1 (in this case,
when it rebooted).



--
Cheers,
Florian Crouzat



[Pacemaker] Problem with state: UNCLEAN (OFFLINE)

2012-06-08 Thread Juan M. Sierra

Problem with state: UNCLEAN (OFFLINE)

Hello,

I'm trying to set up an ldirectord service with Pacemaker.

But I ran into a problem with the UNCLEAN (offline) state. The initial
state of my cluster was:


   Online: [ node2 node1 ]

   node1-STONITH (stonith:external/ipmi): Started node2
   node2-STONITH (stonith:external/ipmi): Started node1
   Clone Set: Connected
       Started: [ node2 node1 ]
   Clone Set: ldirector-activo-activo
       Started: [ node2 node1 ]
   ftp-vip (ocf::heartbeat:IPaddr): Started node1
   web-vip (ocf::heartbeat:IPaddr): Started node2

   Migration summary:
   * Node node1:  pingd=2000
   * Node node2:  pingd=2000
      node2-STONITH: migration-threshold=100 fail-count=100

Then I disconnected node1's power, and the state became:


   Node node1 (8b2aede9-61bb-4a5a-aef6-25fbdefdddfd): UNCLEAN (offline)
   Online: [ node2 ]

   node1-STONITH (stonith:external/ipmi): Started node2 FAILED
   Clone Set: Connected
       Started: [ node2 ]
       Stopped: [ ping:1 ]
   Clone Set: ldirector-activo-activo
       Started: [ node2 ]
       Stopped: [ ldirectord:1 ]
   web-vip (ocf::heartbeat:IPaddr): Started node2

   Migration summary:
   * Node node2:  pingd=2000
      node2-STONITH: migration-threshold=100 fail-count=100
      node1-STONITH: migration-threshold=100 fail-count=100

   Failed actions:
       node2-STONITH_start_0 (node=node2, call=22, rc=2, status=complete): invalid parameter
       node1-STONITH_monitor_6 (node=node2, call=11, rc=14, status=complete): status: unknown
       node1-STONITH_start_0 (node=node2, call=34, rc=1, status=complete): unknown error

I expected node2 to take over the ftp-vip resource, but that did not
happen: node1 remained in an UNCLEAN state and node2 did not take over
node1's resources. Only after I reconnected node1's power and it recovered
did node2 take over the ftp-vip resource.


I've seen some similar conversations here. Could you give me some ideas
on this subject, or point me to a thread where it is discussed?


Thanks a lot!

Regards,

--
Juan Manuel Sierra Prieto


Re: [Pacemaker] Configuring a cluster for asymmetric operation

2012-06-08 Thread David Coulson
Pacemaker needs to be able to monitor on all nodes. Maybe if you install
DRBD on the third node but don't configure anything, the monitor will
correctly report that it is not running there, and your location rules
will stop it from even trying.


Or just change the DRBD RA to report "not running" instead of "not
installed". I had to do that in a few cases where the RA needed its
config file to exist, but the file was on a filesystem managed by
Pacemaker. It seems to work OK and isn't impacting the cluster at all.
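The workaround above can be sketched as a guard at the start of an agent's monitor action: if the config file or the tool the agent drives is missing, answer with OCF_NOT_RUNNING (7) rather than OCF_ERR_INSTALLED (5). The exit codes are the standard OCF ones; the function name and arguments are hypothetical, and this is not the actual code of any shipped DRBD agent.

```sh
#!/bin/sh
# Sketch of the workaround above: when the agent's prerequisites are missing,
# answer the probe with "not running" instead of "not installed".
# Exit codes are the standard OCF ones; the function name is hypothetical.

OCF_SUCCESS=0
OCF_ERR_INSTALLED=5   # what a stock agent would typically return here
OCF_NOT_RUNNING=7

# $1 = config file the agent needs, $2 = command the agent drives.
# Returns OCF_NOT_RUNNING if either is missing, otherwise OCF_SUCCESS,
# meaning "prerequisites present, continue with the normal status check".
monitor_prereq_guard() {
    if [ ! -f "$1" ] || ! command -v "$2" >/dev/null 2>&1; then
        return $OCF_NOT_RUNNING
    fi
    return $OCF_SUCCESS
}
```

The trade-off: on a node that is supposed to run the resource, a missing package or config now looks like "stopped" instead of an installation error, so this only makes sense where location rules already keep the resource off that node.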


On 6/8/12 3:36 AM, Brad Jones wrote:

I have a cluster with three nodes: A, B and C.  A and B can basically
stand in for one another; they run a DRBD master/slave set, and a
failure of A, where most things normally run, fires up services on B.

C helps arbitrate quorum and runs STONITH plugins.  I've written
location rules to prohibit most of the production services from
running on node C.

Problem is, the DRBD RA tries to monitor on node C, even though DRBD
isn't even installed.  (It fails, helpfully, "not installed.")  I'm
stuck with a slowly-incrementing failcount on C.

When I give all services a default location rule and set
symmetric-cluster="false", some of the resources stay running on A but
most of the important ones stop.

This was raised back in 2010 here:

but I can't find much more about this issue on the web and there's
only passing mention of it in the Pacemaker documentation.

Bottom line: What's required to correctly configure a cluster with
symmetric-cluster set to false?

FYI I'm on pacemaker 1.1.6, heartbeat 3.0.5, on Ubuntu 10.04.

Thanks!  Brad
--
Brad Jones
b...@jones.name
Mobile: 303-219-0795



Re: [Pacemaker] Batch pacemaker configuration

2012-06-08 Thread Dejan Muhamedagic
Hi,

On Thu, Jun 07, 2012 at 12:01:16PM -0700, Matteo Bignotti wrote:
> Hi guys,
> 
> I'm trying to configure Pacemaker in batch mode; so far, what I am
> doing, in order, is
> 
> # Flushing the old configuration
> cibadmin -E --force

You should use crm configure erase for this. Note that it
doesn't delete nodes (nodes are somewhat special).

> # Reloading the new one
> crm configure load replace [filename]
> 
> and then restarting the machine
> 
> Now, I find it kinda hard to believe that the only batch command to
> refresh the configuration needs to be forced.

There should be a reason; doesn't it mention why?

> Also, is there any other way to load a (non-XML) configuration file?
> crm configure sometimes prompts me with a question, to which the
> answer would always be "Y", but I can't find anywhere in the
> documentation how to set an automatic "yes" on the command line.

Just use -F (force). 
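Combining the two suggestions above, a non-interactive reload could be wrapped in a small shell function. This is a sketch assuming crmsh is installed and that the file uses crm configure syntax; `reload_config` is a hypothetical name. `-F` is crm's force option, which answers the confirmation prompts automatically.

```shell
# reload_config: hypothetical wrapper around the two crm calls discussed
# in this thread. Usage: reload_config FILE
reload_config() {
    config_file=$1
    if [ ! -r "$config_file" ]; then
        echo "cannot read $config_file" >&2
        return 1
    fi
    # Clear the existing configuration (node entries are kept);
    # -F suppresses the interactive confirmation prompt.
    crm -F configure erase || return 1
    # Load the new configuration from the crm-syntax (non-XML) file.
    crm -F configure load replace "$config_file" || return 1
}
```

Since `load replace` already replaces the existing configuration, whether the separate `erase` step is needed at all is worth checking against the crm documentation for your version.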

Thanks,

Dejan

> thank you guys
> 
> Matteo


Re: [Pacemaker] Behavior of booth arbitrator when used pacemaker

2012-06-08 Thread Yuichi SEINO
Hi Jiaju,

I understand your answer now. I thought this behavior was strange
when watching crm_mon, but now I see that it is correct.

Thank you.

Sincerely,
Yuichi

2012/6/7 Jiaju Zhang :
> On Thu, 2012-06-07 at 18:22 +0900, Yuichi SEINO wrote:
>> Hi Jiaju,
>>
>> I have a question about the booth arbitrator.
>> I use Pacemaker to start up the booth arbitrator.
>> When the booth arbitrator starts up, no ticket information has been
>> written to the CIB yet.
>> (This is the CIB on the node running the arbitrator.)
>> Ticket information is written to this CIB only after a ticket has
>> been granted to a site.
>> Once a ticket has been granted, the CIB always contains ticket information.
>> I guess that crm_ticket writes the ticket information.
>
> Yes, currently the behavior is like this.
>
>>
>> Do you think this behavior is a bug?
>
> I think it is a usability issue rather than a bug ;) since what we
> care about is the granted ticket information. If a ticket has never
> been granted, it doesn't matter.
>
>>
>> I think the CIB should contain ticket information as soon as the
>> booth arbitrator is started.
>
> So we may need to add logic that initializes the ticket information
> in the CIB at startup.
>
> Thanks,
> Jiaju
>



-- 
Yuichi SEINO
METROSYSTEMS CORPORATION
E-mail:seino.clust...@gmail.com



Re: [Pacemaker] ERROR: create_notification_boundaries

2012-06-08 Thread Florian Haas
On Fri, Jun 8, 2012 at 8:49 AM, Sébastien Riccio  wrote:
> Hi,
>
> While reading the Corosync log file, I'm seeing a lot of these entries:
>
> Jun 08 04:11:43 filer-01-b pengine: [13718]: ERROR: create_notification_boundaries: Creating boundaries for ms_DATA1
> Jun 08 04:11:43 filer-01-b pengine: [13718]: ERROR: create_notification_boundaries: Creating boundaries for ms_DATA1
> Jun 08 04:11:43 filer-01-b pengine: [13718]: ERROR: create_notification_boundaries: Creating boundaries for ms_DATA1
> Jun 08 04:11:43 filer-01-b pengine: [13718]: ERROR: create_notification_boundaries: Creating boundaries for ms_DATA1
>
> I found this thread talking about it:
> http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg05414.html

That's actually a Pacemaker issue, not a Corosync one.

> But since I'm running the Debian squeeze version of Corosync, I would like
> to know whether these messages are still only for debugging or whether I
> should worry about them. If they are harmless, is there a way to disable
> such debug messages in the logs?
>
> root# corosync -v
> Corosync Cluster Engine, version '1.2.1' SVN revision '2723:2724'
> Copyright (c) 2006-2009 Red Hat, Inc.
>
> root# dpkg -l | grep corosync
> ii  corosync                             1.2.1-4
>  Standards-based cluster framework (daemon and modules)
> ii  libcorosync4                         1.2.1-4
>  Standards-based cluster framework (libraries)
>
> Thanks a lot.

aptitude install -t squeeze-backports pacemaker corosync

The Corosync and Pacemaker versions in stock Debian squeeze are
sufficiently ancient to warrant an upgrade.

Cheers,
Florian

-- 
Need help with High Availability?
http://www.hastexo.com/now



[Pacemaker] Configuring a cluster for asymmetric operation

2012-06-08 Thread Brad Jones
I have a cluster with three nodes, A, B, and C.  A and B can basically
stand in for one another; they run a DRBD master/slave set, and a
failure of A, where most things normally run, fires up services on B.

C helps arbitrate quorum and runs STONITH plugins.  I've written
location rules to prohibit most of the production services from
running on node C.

The problem is that the DRBD RA tries to monitor on node C, even though
DRBD isn't even installed there.  (It fails, helpfully, with "not
installed.")  I'm stuck with a slowly incrementing failcount on C.

When I give all services a default location rule and set
symmetric-cluster="false", some of the resources stay running on A,
but most of the important ones stop.

This was raised back in 2010 here:

but I can't find much more about this issue on the web, and there's
only a passing mention of it in the Pacemaker documentation.

Bottom line: What's required to correctly configure a cluster with
symmetric-cluster set to false?

FYI, I'm on Pacemaker 1.1.6 and Heartbeat 3.0.5, on Ubuntu 10.04.

Thanks!  Brad
--
Brad Jones
b...@jones.name
Mobile: 303-219-0795
