[Pacemaker] Call for review of undocumented parameters in resource agent meta data

2015-02-11 Thread Lars Ellenberg
On Fri, Jan 30, 2015 at 09:52:49PM +0100, Dejan Muhamedagic wrote:
 Hello,
 
 We've tagged today (Jan 30) a new stable resource-agents release
 (3.9.6) in the upstream repository.
 
 Big thanks go to all contributors! Needless to say, without you
 this release would not be possible.

Big thanks to Dejan.
Who once again finally did,
what I meant to do in late 2013 already, but simply pushed off
for over a year (and no-one else stepped up, either...)

So: Thank You.

I just today noticed that apparently some resource agents
accept and use parameters that are not documented in their meta data.

I now came up with a bash two-liner,
which likely still produces a lot of noise,
because it does not take into account that some agents
source additional helper files.

But here is the list:

--- used, but not described (the "-" lines below)
+++ described, but apparently not used (the "+" lines below).

EvmsSCC   +OCF_RESKEY_ignore_deprecation
Evmsd +OCF_RESKEY_ignore_deprecation

?? intentionally undocumented ??

IPaddr+OCF_RESKEY_iflabel
IPaddr-OCF_RESKEY_netmask

Not sure.


IPaddr2   -OCF_RESKEY_netmask

intentional, backward compat, quoting the agent:
# Note: We had a version out there for a while which used
# netmask instead of cidr_netmask. Don't remove this aliasing code!


Please help review these:

IPsrcaddr -OCF_RESKEY_ip
IPsrcaddr +OCF_RESKEY_cidr_netmask
IPv6addr.c-OCF_RESKEY_cidr_netmask
IPv6addr.c-OCF_RESKEY_ipv6addr
IPv6addr.c-OCF_RESKEY_nic
LinuxSCSI +OCF_RESKEY_ignore_deprecation
Squid -OCF_RESKEY_squid_confirm_trialcount
Squid -OCF_RESKEY_squid_opts
Squid -OCF_RESKEY_squid_suspend_trialcount
SysInfo   -OCF_RESKEY_clone
WAS6  -OCF_RESKEY_profileName
apache+OCF_RESKEY_use_ipv6
conntrackd-OCF_RESKEY_conntrackd
dnsupdate -OCF_RESKEY_opts
dnsupdate +OCF_RESKEY_nsupdate_opts
docker-OCF_RESKEY_container
ethmonitor-OCF_RESKEY_check_level
ethmonitor-OCF_RESKEY_multiplicator

galera+OCF_RESKEY_additional_parameters
galera+OCF_RESKEY_binary
galera+OCF_RESKEY_client_binary
galera+OCF_RESKEY_config
galera+OCF_RESKEY_datadir
galera+OCF_RESKEY_enable_creation
galera+OCF_RESKEY_group
galera+OCF_RESKEY_log
galera+OCF_RESKEY_pid
galera+OCF_RESKEY_socket
galera+OCF_RESKEY_user

Probably all bogus; it sources mysql-common.sh.
Someone please have a more detailed look.


iSCSILogicalUnit  +OCF_RESKEY_product_id
iSCSILogicalUnit  +OCF_RESKEY_vendor_id

false positive

surprise: florian learned some wizardry back then ;-)
for var in scsi_id scsi_sn vendor_id product_id; do
	envar="OCF_RESKEY_${var}"
	if [ -n "${!envar}" ]; then
		params="${params} ${var}=${!envar}"
	fi
done

If such magic is used elsewhere,
that could mask "used, but not documented" cases.
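
A quick way to spot candidates for that kind of indirection
(untested sketch, run from the top of the resource-agents checkout;
the patterns are guesses):

# find agents that build OCF_RESKEY_ names dynamically (indirect expansion),
# which a literal grep for OCF_RESKEY_<name> cannot see
cd heartbeat
grep -nsE 'OCF_RESKEY_\$|\$\{![A-Za-z_]+\}' *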


iface-bridge  -OCF_RESKEY_multicast_querier

!!  Yep, that needs to be documented!

mysql-proxy   -OCF_RESKEY_group
mysql-proxy   -OCF_RESKEY_user

Oops, apparently my magic scriptlet below needs to learn to
ignore script comments...

named -OCF_RESKEY_rootdir

!!  Probably a bug:
named_rootdir is documented.


nfsserver -OCF_RESKEY_nfs_notify_cmd

!!  Yep, that needs to be documented!


nginx -OCF_RESKEY_client
nginx +OCF_RESKEY_testclient
!!  client is used, but not documented,
!!  testclient is documented, but unused...
Bug?

nginx -OCF_RESKEY_nginx

Bogus. Needs to be dropped from leading comment block.

oracle-OCF_RESKEY_tns_admin

!!  Yep, that needs to be documented!

pingd +OCF_RESKEY_ignore_deprecation

?? intentionally undocumented ??

pingd -OCF_RESKEY_update

!!  Yep, is undocumented.

sg_persist+OCF_RESKEY_binary
sg_persist-OCF_RESKEY_sg_persist_binary

!!  BUG? binary vs sg_persist_binary

varnish   -OCF_RESKEY_binary

!!  Yep, is undocumented.


Please someone find the time to prepare pull requests
to fix these...

Thanks,

Lars

-
The list was generated by the scriptlet below, which can be improved.
The improved version should probably be part of a unit test / check
when building resource-agents.

# In the git checkout of the resource agents,
# get a list of files that look like actual agent scripts.
cd heartbeat
A=$(git ls-files | xargs grep -s -l 'resource-agent ')

# and for each of these files,
# diff the list of OCF_RESKEY_* occurrences
# with the list of <parameter name="*"> ones.
for a in $A; do
diff -U0 \
<(  grep -h -o 
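
For reference, a somewhat more complete (but still untested) sketch of the
same idea; the grep/sed patterns are guesses, and agents that source helper
files will still produce noise:

# for every agent, diff the parameters it references (OCF_RESKEY_*)
# against the parameters its meta data describes (<parameter name="...">)
cd heartbeat
for a in $(git ls-files | xargs grep -s -l 'resource-agent '); do
	diff -U0 --label "$a: used" --label "$a: described" \
		<(grep -h -o 'OCF_RESKEY_[A-Za-z0-9_]*' "$a" | sort -u) \
		<(grep -h -o '<parameter name="[A-Za-z0-9_]*"' "$a" |
			sed -e 's/<parameter name="/OCF_RESKEY_/' -e 's/"$//' | sort -u) |
		grep -Ev '^@@'
done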

[Pacemaker] Announcing the Heartbeat 3.0.6 Release

2015-02-10 Thread Lars Ellenberg

TL;DR:

  If you intend to set up a new High Availability cluster
  using the Pacemaker cluster manager,
  you typically should not care for Heartbeat,
  but use recent releases (2.3.x) of Corosync.

  If you don't care for Heartbeat, don't read further.

Unless you are beekhof... there's a question below ;-)



After 3½ years since the last officially tagged release of Heartbeat,
I have seen the need to do a new maintenance release.

  The Heartbeat 3.0.6 release tag: 3d59540cf28d
  and the change set it points to: cceeb47a7d8f

The main reason for this was that pacemaker more recent than
somewhere between 1.1.6 and 1.1.7 would no longer work properly
on the Heartbeat cluster stack.

Because some of the daemons have moved from glue to pacemaker proper,
and changed their paths. This has been fixed in Heartbeat.

And because during that time, stonith-ng was refactored, and would still
reliably fence, but not understand its own confirmation message, so it
was effectively broken. This I fixed in pacemaker.



If you choose to run new Pacemaker with the Heartbeat communication stack,
it should be at least 1.1.12 with a few patches,
see my December 2014 commits at the top of
https://github.com/lge/pacemaker/commits/linbit-cluster-stack-pcmk-1.1.12
I'm not sure if they got into pacemaker upstream yet.

beekhof?
Do I need to rebase?
Or did I miss you merging these?

---

If you have those patches,
consider setting this new ha.cf configuration parameter:

# If pacemaker crmd spawns the pengine itself,
# it sometimes forgets to kill the pengine on shutdown,
# which later may confuse the system after cluster restart.
# Tell the system that Heartbeat is supposed to
# control the pengine directly.
crmd_spawns_pengine off



Here is the shortened Heartbeat changelog,
the longer version is available in mercurial:
http://hg.linux-ha.org/heartbeat-STABLE_3_0/shortlog

- fix emergency shutdown due to broken update_ackseq
- fix node dead detection problems
- fix converging of membership (ccm)
- fix init script startup glitch (caused by changes in glue/resource-agents)
- heartbeat.service file for systemd platforms
- new ucast6 UDP IPv6 communication plugin
- package ha_api.py in standard package
- update some man pages, specifically the example ha.cf
- also report ccm membership status for cl_status hbstatus -v
- updated some log messages, or their log levels
- reduce max_delay in broadcast client_status query to one second
- apply various (mostly cosmetic) patches from Debian
- drop HBcompress compression plugins: they are part of cluster glue
- drop openais HBcomm plugin
- better support for current pacemaker versions
- try to not miss a SIGTERM (fix problem with very fast respawn/stop cycle)
- dopd: ignore dead ping nodes
- cl_status improvements
- api internals: reduce IPC round-trips to get at status information
- uid=root is sufficient to use heartbeat api (gid=haclient remains sufficient)
- fix /dev/null as log- or debugfile setting
- move daemon binaries into libexecdir
- document movement of compression plugins into cluster-glue
- fix usage of SO_REUSEPORT in ucast sockets
- fix compile issues with recent gcc and -Werror

Note that a number of the mentioned fixes were created two years
ago already, and may have been released in packages for a long time,
where vendors have chosen to package them.



As to future plans for Heartbeat:

Heartbeat is still useful for non-pacemaker, haresources-mode clusters.

We (Linbit) will maintain Heartbeat for the foreseeable future.
That should not be too much of a burden, as it is stable,
and due to long years of field exposure, all bugs are known ;-)

The most notable shortcoming when using Heartbeat with Pacemaker
clusters would be the limited message size.
There are currently no plans to remove that limitation.

With its wide choice of communications paths, even exotic
communication plugins, and the ability to run arbitrarily many
paths, some deployments may even favor it over Corosync still.

But typically, for new deployments involving Pacemaker,
in most cases you should choose Corosync 2.3.x
as your membership and communication layer.

For existing deployments using Heartbeat,
upgrading to this Heartbeat version is strongly recommended.

Thanks,

Lars Ellenberg





Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-02-09 Thread Lars Ellenberg
On Fri, Feb 06, 2015 at 04:15:44PM +0100, Dejan Muhamedagic wrote:
 Hi,
 
 On Thu, Feb 05, 2015 at 09:18:50AM +0100, Digimer wrote:
  That is the problem that makes geo-clustering very hard to nearly
  impossible. You can look at the Booth option for pacemaker, but that
  requires two (or more) full clusters, plus an arbitrator 3rd
 
 A full cluster can consist of one node only. Hence, it is
 possible to have a kind of stretch two-node [multi-site] cluster
 based on tickets and managed by booth.

In theory.

In practice, we rely on proper behaviour of the other site,
in case a ticket is revoked, or cannot be renewed.

Relying on a single node for proper behaviour does not inspire
as much confidence as relying on a multi-node HA-cluster at each site,
which we can expect to ensure internal fencing.

With reliable hardware watchdogs, it still should be ok to do
stretched two node HA clusters in a reliable way.

Be generous with timeouts.

And document which failure modes you expect to handle,
and how to deal with the worst-case scenarios if you end up with some
failure case that you are not equipped to handle properly.

There are deployments which favor
"rather online with _potential_ split brain" over
"rather offline, just in case".

Document this, print it out on paper,

   I am aware that this may lead to lost transactions,
   data divergence, data corruption, or data loss.
   I am personally willing to take the blame,
   and live with the consequences.

Have some boss sign that ^^^
in the real world using a real pen.

Lars

-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA  and  Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



[Pacemaker] Patches: RFC before pull request

2014-12-09 Thread Lars Ellenberg

Andrew,
All,

Please have a look at the patches I queued up here:
https://github.com/lge/pacemaker/commits/for-beekhof

Most (not all) are specific to the heartbeat cluster stack.

Thanks,
Lars

A few comments here:

-

This effectively changes crm_mon output,
but also changes logging where this method is invoked:

Low: native_print: report target-role as well

This is for the "Why does my resource not start?" guys who
forgot to remove the limiting target-role setting.

Report target role (unless Started, which is the default anyways),
if it limits our abilities (Slave, Stopped),
or if it differs from the current status.

-

Heartbeat specific:

Low: allow heartbeat to spawn the pengine itself, and tell crmd about it

Heartbeat 3.0.6 now may spawn the pengine directly, and will announce
this in the environment -- I introduced the setting crmd_spawns_pengine.

This improves shutdown behavior.  Otherwise I regularly find an orphaned
pengine process after pacemaker shutdown.

-

Heartbeat specific, as a consequence of the fix below:

Low: add debugging aid to help spot missing set_msg_callback()s on heartbeat

In ha_msg_dispatch(), change from rcvmsg() to readmsg().
rcvmsg() is internally simply a wrapper around readmsg(),
which silently deletes messages without matching callback.

Use readmsg() directly here. It will only return unprocessed (by
callbacks) messages, so log a warning, notice or debug message
depending on message header information, and ha_msg_del() it ourselves.

-

Heartbeat specific bug fix:

High: fix stonith ignoring its own messages on heartbeat

Since the introduction of the additional F_TYPE messages
T_STONITH_NOTIFY and T_STONITH_TIMEOUT_VALUE, and their use as message
types in global heartbeat cluster messages, stonith-ng was broken on the
heartbeat cluster stack.

When delegation was made the default, and the result could only be
reaped by listening for the T_STONITH_NOTIFY message, no-one (but
stonithd itself) would ever notice successful completion,
and stonith would be re-issued forever.

Registering callbacks for these F_TYPE message types fixes these hung
stonith and stonith_admin operations on the heartbeat cluster stack.

-

Heartbeat specific:

Medium: fix tracking of peer client process status on heartbeat

Don't optimistically assume that peer client processes are alive,
or that a node that can talk to us is in fact member of the same
ccm partition.

Whenever ccm tells us about a new membership, *ask* for peer client
process status.

-

This oneliner may well be relevant for corosync CPG as well,
possibly one of the reasons the pcmk_cpg_membership() has this funny
"appears to be online even though we think it is dead" block?

fix crm_update_peer_proc to NOT ignore flags if partially set

The set_bit() function used here actually deals with masks, not bit numbers.
The flag argument should in fact be plural: flags.

These proc flag bits are not always set one at a time,
but for example as crm_proc_crmd | crm_proc_cpg,
and not necessarily cleared with the same combination.

Ignoring to-be-set flags just because *some* of the flag bits are
already set is clearly a bug, and may be the reason for stale process
cache information.

-

Heartbeat specific:

Medium: map heartbeat JOIN/LEAVE status to ONLINE/OFFLINE

The rest of the code deals in online and offline,
not join and leave. Need to map these states,
or the rest of the code won't work properly.

-

Generic: if shutdown is requested before the stonith connection was ever
established (due to other problems), insisting on re-trying the stonith
connection confused the shutdown.

Medium: don't trigger a stonith_reconnect if no longer required

Get rid of some spurious error messages, and speed up shutdown,
even if the connection to the stonith daemon failed.

-

Non-functional change, just for readability:

Low: use CRM_NODE_MEMBER, not CRM_NODE_ACTIVE

ACTIVE is defined to be MEMBER anyways:
include/crm/cluster.h:#define CRM_NODE_ACTIVE CRM_NODE_MEMBER

Don't confuse the reader of the code
by implying it was something different.

-

Heartbeat specific, packaging only:

Low: heartbeat 3.0.6 knows how to find the daemons; drop compat symlinks




Re: [Pacemaker] How to avoid CRM sending stop when ha.cf gets 2nd node configured

2014-11-10 Thread Lars Ellenberg
On Sat, Nov 08, 2014 at 12:58:36AM +, aridh bose wrote:
 Hi,
 While using heartbeat and pacemaker, is it possible to bringup first
 node which can go as Master, followed by second node which should go
 as Slave without causing any issues to the first node? Currently, I
 see a couple of problems in achieving this:
 1. Assuming I am not using
 mcast communication, heartbeat is mandating me to configure second
 node info either in ha.cf or in /etc/hosts file with associated IP
 address. Why can't it come up by itself as Master to start with?

 2. If I update ha.cf with the 2nd node info and use 'heartbeat -r' CRM
 first sends stop on the Master before sending start.
 Appreciate any help or pointers.


Regardless of what you do there, or why,
or on which communication stack:

how about you first put pacemaker into maintenance-mode,
then you do your re-architecting of your cluster,
and once you are satisfied with the new cluster,
you take it out of maintenance mode again?

At least that is one of the intended use cases
for maintenance mode.
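
For example, with the crm shell (just a sketch; pcs and cibadmin can set the
same cluster property):

# keep resources running, but stop pacemaker from starting/stopping/monitoring
# them while you rework the cluster
crm configure property maintenance-mode=true

# ... change ha.cf, restart heartbeat, rearrange resources as needed ...

# check that the result looks sane, then hand control back to pacemaker
crm_mon -1
crm configure property maintenance-mode=false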

-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA  and  Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



Re: [Pacemaker] [ha-wg-technical] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-05 Thread Lars Ellenberg
On Sat, Nov 01, 2014 at 01:19:35AM -0400, Digimer wrote:
 All the cool kids will be there.
 
 You want to be a cool kid, right?

Well, no. ;-)

But I'll still be there,
and a few other Linbit'ers as well.

Fabio, let us know what we could do to help make it happen.

Lars

 On 01/11/14 01:06 AM, Fabio M. Di Nitto wrote:
  just a kind reminder.
 
 On 9/8/2014 12:30 PM, Fabio M. Di Nitto wrote:
  All,
  
  it's been almost 6 years since we had a face to face meeting for all
  developers and vendors involved in Linux HA.
  
  I'd like to try and organize a new event and piggy-back with DevConf in
  Brno [1].
  
  DevConf will start Friday the 6th of Feb 2015 in Red Hat Brno offices.
  
  My suggestion would be to have a 2 days dedicated HA summit the 4th and
  the 5th of February.
  
  The goal for this meeting is to, beside to get to know each other and
  all social aspect of those events, tune the directions of the various HA
  projects and explore common areas of improvements.
  
  I am also very open to the idea of extending to 3 days, 1 one dedicated
  to customers/users and 2 dedicated to developers, by starting the 3rd.
  
  Thoughts?
  
  Fabio
  
  PS Please hit reply all or include me in CC just to make sure I'll see
  an answer :)
  
  [1] http://devconf.cz/
 
  Could you please let me know by end of Nov if you are interested or not?
 
  I have heard only from few people so far.
 
  Cheers
  Fabio



Re: [Pacemaker] can we update an attribute with cmpxchg atomic compare and exchange semantics?

2014-09-30 Thread Lars Ellenberg
On Tue, Sep 30, 2014 at 01:51:21PM +1000, Andrew Beekhof wrote:
 
 On 30 Sep 2014, at 6:22 am, Lars Ellenberg lars.ellenb...@linbit.com wrote:
 
  On Wed, Sep 10, 2014 at 11:50:58AM +0200, Lars Ellenberg wrote:
  
  Hi Andrew (and others).
  
  For a certain use case (yes, I'm talking about DRBD peer-fencing on
  loss of replication link), it would be nice to be able to say:
  
   update some_attribute=some_attribute+1 where some_attribute >= 0
  
   delete some_attribute where some_attribute=0
  
  Ok, that's not the classic cmpxchg(), more of an atomic_add();
  or similar enough. With hopefully just a single cib roundrip.
  
  
  Let me rephrase:
  Update attribute this_is_pink (for node-X with ID attr-ID):
  
   fail if said attr-ID exists elsewhere (not as the intended attribute
   at the intended place in the xml tree)
 (this comes for free already, I think)
 
   if it does not exist at all, assume it was present with current value 0
  
   if the current (or assumed current) value is >= 0, add 1
  
   if the current value is < 0, fail
  
   (optionally: return new value? old value?)
  
  Did anyone read this?
 
 Yep, but it requires a non-trivial answer so it got deferred :)
 
 It's a reasonable request, we've spoken about something similar in the past
 and it's clear that at some point attrd needs to grow some extra capabilities.
 Exactly when it will bubble up to the top of the todo list is less certain, 
 though I would happily coach someone with the necessary motivation.
 
 The other thing to mention is that currently the only part that won't work is
 "if the current value is < 0, fail".
 Setting value=value++ will do the rest.

Nice.

 So my question would be... how important is the 'lt 0' case?
 
 Actually, come to think of it, it's not a bad default behaviour.  
 Certainly failing value++ if value=-INFINITY would be logically consistent 
 with the existing code.
 Would that be sufficient?
 

I need to think about that some more.
I may need to actually try this out and try to implement my scenario.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



Re: [Pacemaker] can we update an attribute with cmpxchg atomic compare and exchange semantics?

2014-09-29 Thread Lars Ellenberg
On Wed, Sep 10, 2014 at 11:50:58AM +0200, Lars Ellenberg wrote:
 
 Hi Andrew (and others).
 
 For a certain use case (yes, I'm talking about DRBD peer-fencing on
 loss of replication link), it would be nice to be able to say:
 
    update some_attribute=some_attribute+1 where some_attribute >= 0
 
   delete some_attribute where some_attribute=0
 
 Ok, that's not the classic cmpxchg(), more of an atomic_add();
 or similar enough. With hopefully just a single cib roundrip.
 
 
 Let me rephrase:
 Update attribute this_is_pink (for node-X with ID attr-ID):
 
   fail if said attr-ID exists elsewhere (not as the intended attribute
   at the intended place in the xml tree)
   (this comes for free already, I think)
   
   if it does not exist at all, assume it was present with current value 0
 
    if the current (or assumed current) value is >= 0, add 1
 
    if the current value is < 0, fail
 
   (optionally: return new value? old value?)

Did anyone read this?

 My intended use case scenario is this:
 
   Two DRBD nodes, several DRBD resources,
   at least a few of them in dual-primary.
 
   Replication link breaks.
 
   Fence-peer handlers are triggered individually for each resource on
   both nodes, and try to concurrently modify the cib (place fencing
   constraints).
 
 With the current implementation of crm-fence-peer.sh, it is likely that
 some DRBD resources win on one node, some win on the other node.
 The respective losers will have their IO blocked.
 
 Which means that most likely on both nodes some DRBD will stay blocked,
 some monitor operation will soon fail, some stop operation (to recover
 from the monitor fail) will soon fail, and the recovery of that will be
 node-level fencing of the affected node.
 
 In short: both nodes will be hard-reset
 because of a replication link failure.
 
 
 
 If I would instead use a single attribute (with a pre-determined ID) for all
  instances of the fence-peer handler, the first to come would choose the
 victim node, all others would just add their count.
 There will be only one loser, and more importantly: one survivor.
 
 Once the replication link is re-established,
 DRBD resynchronization will bring the former loser up-to-date,
 and the respective after-resync handlers will decrease that breakage
 count. Once the breakage count hits zero, it can and should be deleted.
 
  Presence of the breakage count attribute with value > 0 would mean
 this node must not be promoted, which would be a static constraint
 to be added to all DRBD resources.
 
 Does that make sense?
 
 (I have more insane proposals, in case we have multiple (more than 2)
  Primaries during normal operation, but I'm not yet able to write them
  down without being seriously confused by myself...)
 
 
 I could open-code it with shell and cibadmin, btw.
 I did a proof-of-concept once that does
   a. cibadmin -Q
   b. some calculations,
  then prepares the update statement xml based on cib content seen,
  *including* the cib generation counters
   c. cibadmin -R (or -C, -M, -D, as appropriate)
  this will fail if the cib was modified in a relevant way since a,
  because of the included generation counters
   d. repeat as necessary
 
  
 But that is beyond ugly.
 And probably fragile.
 And would often fail for all the wrong reasons, just because some status
 code has changed and bumped the cib generation counters.
 
 What would be needed to add such functionality?
 Where would it go?
 cibadmin? cib? crm_attribute? possibly also attrd?
 
 Thanks,
   Lars
 
 

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



[Pacemaker] can we update an attribute with cmpxchg atomic compare and exchange semantics?

2014-09-10 Thread Lars Ellenberg

Hi Andrew (and others).

For a certain use case (yes, I'm talking about DRBD peer-fencing on
loss of replication link), it would be nice to be able to say:

  update some_attribute=some_attribute+1 where some_attribute >= 0

  delete some_attribute where some_attribute=0

Ok, that's not the classic cmpxchg(), more of an atomic_add();
or similar enough. With hopefully just a single cib roundrip.


Let me rephrase:
Update attribute this_is_pink (for node-X with ID attr-ID):

  fail if said attr-ID exists elsewhere (not as the intended attribute
  at the intended place in the xml tree)
(this comes for free already, I think)

  if it does not exist at all, assume it was present with current value 0

  if the current (or assumed current) value is >= 0, add 1

  if the current value is < 0, fail

  (optionally: return new value? old value?)




My intended use case scenario is this:

  Two DRBD nodes, several DRBD resources,
  at least a few of them in dual-primary.

  Replication link breaks.

  Fence-peer handlers are triggered individually for each resource on
  both nodes, and try to concurrently modify the cib (place fencing
  constraints).

With the current implementation of crm-fence-peer.sh, it is likely that
some DRBD resources win on one node, some win on the other node.
The respective losers will have their IO blocked.

Which means that most likely on both nodes some DRBD will stay blocked,
some monitor operation will soon fail, some stop operation (to recover
from the monitor fail) will soon fail, and the recovery of that will be
node-level fencing of the affected node.

In short: both nodes will be hard-reset
because of a replication link failure.



If I would instead use a single attribute (with a pre-determined ID) for all
instances of the fence-peer handler, the first to come would choose the
victim node, all others would just add their count.
There will be only one loser, and more importantly: one survivor.

Once the replication link is re-established,
DRBD resynchronization will bring the former loser up-to-date,
and the respective after-resync handlers will decrease that breakage
count. Once the breakage count hits zero, it can and should be deleted.

Presence of the breakage count attribute with value > 0 would mean
this node must not be promoted, which would be a static constraint
to be added to all DRBD resources.
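
In crm shell configuration syntax, such a constraint could look roughly like
this (untested sketch; the attribute name "breakage-count" and the resource
name are made up):

# forbid the Master role wherever the breakage count is still non-zero;
# while the attribute is absent, the rule simply does not match
location no-promote-r0-while-broken ms-drbd-r0 \
	rule $role=Master -inf: breakage-count gt 0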

Does that make sense?

(I have more insane proposals, in case we have multiple (more than 2)
 Primaries during normal operation, but I'm not yet able to write them
 down without being seriously confused by myself...)


I could open-code it with shell and cibadmin, btw.
I did a proof-of-concept once that does
  a. cibadmin -Q
  b. some calculations,
 then prepares the update statement xml based on cib content seen,
 *including* the cib generation counters
  c. cibadmin -R (or -C, -M, -D, as appropriate)
 this will fail if the cib was modified in a relevant way since a,
 because of the included generation counters
  d. repeat as necessary
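
Roughly like this (a much simplified sketch of that proof-of-concept; the
attribute and node names are made up, and the actual XML editing in step b
is left out):

# a. snapshot the cib, *including* its generation counters
cibadmin -Q > /tmp/cib.xml

# b. compute the new value from what we just saw, and edit the snapshot
#    accordingly (here: bump breakage-count for node-X)
old=$(crm_attribute -t nodes -N node-X -n breakage-count -G -q 2>/dev/null || echo 0)
new=$(( old + 1 ))
# ... edit /tmp/cib.xml so that breakage-count becomes $new ...

# c. try to replace the cib with the edited snapshot; the cib refuses it
#    if someone else modified (and thereby bumped) it in the meantime
if cibadmin -R --xml-file /tmp/cib.xml; then
	echo "breakage-count for node-X is now $new"
else
	echo "cib changed under us"   # d. repeat as necessary
fi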

 
But that is beyond ugly.
And probably fragile.
And would often fail for all the wrong reasons, just because some status
code has changed and bumped the cib generation counters.

What would be needed to add such functionality?
Where would it go?
cibadmin? cib? crm_attribute? possibly also attrd?

Thanks,
Lars




Re: [Pacemaker] Configuration recommandations for (very?) large cluster

2014-08-13 Thread Lars Ellenberg
, crm_msg_crmd, novote, TRUE);
 free_xml(novote);
--- include/crm/msg_xml.h.orig  2011-11-28 16:41:47.309414327 +0100
+++ include/crm/msg_xml.h   2011-11-28 16:42:23.921417584 +0100
@@ -33,6 +33,7 @@
 #  define F_CRM_USER   "crm_user"
 #  define F_CRM_JOIN_ID    "join_id"
 #  define F_CRM_ELECTION_ID    "election-id"
+#  define F_CRM_DC_PRIO    "dc-prio"
 #  define F_CRM_ELECTION_AGE_S "election-age-sec"
 #  define F_CRM_ELECTION_AGE_US    "election-age-nano-sec"
 #  define F_CRM_ELECTION_OWNER "election-owner"
--- lib/ais/plugin.c.orig   2011-11-28 16:42:57.002411543 +0100
+++ lib/ais/plugin.c2011-11-28 16:44:22.160413844 +0100
@@ -409,6 +409,9 @@
 get_config_opt(pcmk_api, local_handle, "use_logd", value, "no");
 pcmk_env.use_logd = value;
 
+get_config_opt(pcmk_api, local_handle, "dc_prio", value, "1");
+pcmk_env.dc_prio = value;
+
 get_config_opt(pcmk_api, local_handle, "use_mgmtd", value, "no");
 if (ais_get_boolean(value) == FALSE) {
 int lpc = 0;
@@ -599,6 +602,7 @@
 pcmk_env.logfile = NULL;
 pcmk_env.use_logd = "false";
 pcmk_env.syslog = "daemon";
+pcmk_env.dc_prio = "1";
 
 if (cs_uid != root_uid) {
 ais_err(Corosync must be configured to start as 'root',
--- lib/ais/utils.c.orig2011-11-28 16:45:01.940415754 +0100
+++ lib/ais/utils.c 2011-11-28 16:45:33.018412117 +0100
@@ -237,6 +237,7 @@
 setenv("HA_logfacility", pcmk_env.syslog,   1);
 setenv("HA_LOGFACILITY", pcmk_env.syslog,   1);
 setenv("HA_use_logd",    pcmk_env.use_logd, 1);
+setenv("HA_dc_prio",     pcmk_env.dc_prio,  1);
 setenv("HA_quorum_type", pcmk_env.quorum,   1);
 /* *INDENT-ON* */
 
--- lib/ais/utils.h.orig2011-11-28 16:45:45.143412597 +0100
+++ lib/ais/utils.h 2011-11-28 16:46:37.026410208 +0100
@@ -238,6 +238,7 @@
 const char *syslog;
 const char *logfile;
 const char *use_logd;
+const char *dc_prio;
 const char *quorum;
 };
 
--- crmd/messages.c.orig2012-05-25 16:23:22.913106180 +0200
+++ crmd/messages.c 2012-05-25 16:28:30.330263392 +0200
@@ -36,6 +36,8 @@
 #include crmd_messages.h
 #include crmd_lrm.h
 
+static int our_dc_prio = INT_MIN;
+
 GListPtr fsa_message_queue = NULL;
 extern void crm_shutdown(int nsig);
 
@@ -693,7 +695,19 @@
 /*== DC-Only Actions ==*/
 if (AM_I_DC) {
 if (strcmp(op, CRM_OP_JOIN_ANNOUNCE) == 0) {
-return I_NODE_JOIN;
+   if (our_dc_prio == INT_MIN) {
+   char * dc_prio_str = getenv("HA_dc_prio");
+
+   if (dc_prio_str == NULL) {
+   our_dc_prio = 1;
+   } else {
+   our_dc_prio = atoi(dc_prio_str);
+   }
+   }   
+   if (our_dc_prio == 0)
+   return I_ELECTION;  
+else 
+   return I_NODE_JOIN;
 
 } else if (strcmp(op, CRM_OP_JOIN_REQUEST) == 0) {
 return I_JOIN_REQUEST;


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



Re: [Pacemaker] Multiple node loadbalancing

2014-07-30 Thread Lars Ellenberg
On Wed, Jul 30, 2014 at 02:59:31PM +0200, Machiel wrote:
 Hi Guys
 
 We are trying to setup the following, however we can not seem to
 find any references on the internet which will explain on how to
 configure it.
 
 We have 3 machines, and we need to setup load balancing on the
 machines as follows:
 
 - Load balancer and apps running on all 3 machines
 - 1 machine should be the load balancer (master) which
 will balance traffic over all 3 machines including itself.
 - should this node fail, the second node should take
 over the task, and if the second node should fail, then the 3rd node
 should take over as standalone until the other nodes are restored.
 
 We are only able to find configuration instructions on how
 to setup load balancing for 2 nodes which we have done several
 times, however no info for 3 nodes.
 
 
 We are currently using ldirectord and heartbeat, however in
 this setup, if the first node fails , then both the 2nd and 3rd
 nodes try to take over. (this was configured very long ago though).

While the communication and membership layer of heartbeat always
supported many nodes, the resource manager part of heartbeat
(haresources mode) is a very basic shell script, and only supports
two-node clusters.

With haresources mode of heartbeat,
you can only do two-node clusters
(if you intend to keep your sanity).

 I would really appreciate any suggestions on this or even
 links where I can find the information would be appreciated.

Use pacemaker.

Whether you want heartbeat or corosync
as the communication and membership layer is up to you.

For new installations and recent OS releases,
pacemaker + corosync is generally the recommended way.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



Re: [Pacemaker] [DRBD-user] DRBD active/passive on Pacemaker+CMAN cluster unexpectedly performs STONITH when promoting

2014-07-07 Thread Lars Ellenberg
On Fri, Jul 04, 2014 at 06:04:12PM +0200, Giuseppe Ragusa wrote:
   The setup almost works (all seems ok with: pcs status, crm_mon
   -Arf1, corosync-cfgtool -s, corosync-objctl | grep member) , but
   every time it needs a resource promotion (to Master, i.e. becoming
   primary) it either fails or fences the other node (the one supposed to
   become Slave i.e. secondary) and only then succeeds.
  
   It happens, for example both on initial resource definition (when
   attempting first start) and on node entering standby (when trying to
   automatically move the resources by stopping then starting them).
   
   I collected a full pcs cluster report and I can provide a CIB dump,
   but I will initially paste here an excerpt from my configuration just
   in case it happens to be a simple configuration error that someone can
   spot on the fly ; (hoping...)
   
   Keep in mind that the setup has separated redundant network
   connections for LAN (1 Gib/s LACP to switches), Corosync (1 Gib/s
   roundrobin back-to-back) and DRBD (10 Gib/s roundrobin back-to-back)
   and that FQDNs are correctly resolved through /etc/hosts
  
  Make sure your DRBD are Connected UpToDate/UpToDate
  before you let the cluster take over control of who is master.
 
 Thanks for your important reminder.
 
 Actually they had been Connected UpToDate/UpToDate, and I subsequently had 
 all manually demoted to secondary
 then down-ed before eventually stopping the (manually started) DRBD service.
 
 Only at the end did I start/configure the cluster.
 
 The problem is now resolved and it seems that my improper use of
 rhcs_fence as fence-peer was the culprit (now switched to
 crm-fence-peer.sh), but I still do not understand why rhcs_fence was
 called at all in the beginning (once called, it may have caused
 unforeseen consequences, I admit) since DRBD docs clearly state that
 communication disruption must be involved in order to call fence-peer
 into action.

You likely managed to have data divergence
between your instances of DRBD,
likely caused by a cluster split-brain.

So DRBD would refuse to connect,
and thus would be not connected when promoted.

Just because you can shoot someone
does not make your data any better,
nor does it tell the victim node that its data is bad
(from the shooting node's point of view)
so they would just keep killing each other then.

Don't do that.

But tell the cluster to not even attempt to promote,
unless the local data is known to be UpToDate *and*
the remote data is either known (DRBD is connected)
or the remote data is known to be bad (Outdated or worse).

the ocf:linbit:drbd agent has an adjust master scores
parameter for that. See there.

Lars

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed



Re: [Pacemaker] [DRBD-user] DRBD active/passive on Pacemaker+CMAN cluster unexpectedly performs STONITH when promoting

2014-07-04 Thread Lars Ellenberg
   <device name="pcmk" port="cluster2.verolengo.privatelan"/>
 </method>
   </fence>
 </clusternode>
   </clusternodes>
   <fencedevices>
 <fencedevice name="pcmk" agent="fence_pcmk"/>
   </fencedevices>
   <fence_daemon clean_start="0" post_fail_delay="30" post_join_delay="30"/>
   <logging debug="on"/>
   <rm disabled="1">
 <failoverdomains/>
 <resources/>
   </rm>
 </cluster>
 
 --
 
 Pacemaker:
 
 PROPERTIES:
 
 pcs property set default-resource-stickiness=100
 pcs property set no-quorum-policy=ignore
 
 STONITH:
 
 pcs stonith create ilocluster1 fence_ilo2 action=off delay=10 \
 ipaddr=ilocluster1.verolengo.privatelan login=cluster2 passwd=test 
 power_wait=4 \
 pcmk_host_check=static-list 
 pcmk_host_list=cluster1.verolengo.privatelan op monitor interval=60s
 pcs stonith create ilocluster2 fence_ilo2 action=off \
 ipaddr=ilocluster2.verolengo.privatelan login=cluster1 passwd=test 
 power_wait=4 \
 pcmk_host_check=static-list 
 pcmk_host_list=cluster2.verolengo.privatelan op monitor interval=60s
 pcs stonith create pdu1 fence_apc action=off \
 ipaddr=pdu1.verolengo.privatelan login=cluster passwd=test \
  
 pcmk_host_map=cluster1.verolengo.privatelan:3,cluster1.verolengo.privatelan:4,cluster2.verolengo.privatelan:6,cluster2.verolengo.privatelan:7
  \
 pcmk_host_check=static-list 
 pcmk_host_list=cluster1.verolengo.privatelan,cluster2.verolengo.privatelan 
 op monitor interval=60s
 
 pcs stonith level add 1 cluster1.verolengo.privatelan ilocluster1
 pcs stonith level add 2 cluster1.verolengo.privatelan pdu1
 pcs stonith level add 1 cluster2.verolengo.privatelan ilocluster2
 pcs stonith level add 2 cluster2.verolengo.privatelan pdu1
 
 pcs property set stonith-enabled=true
 pcs property set stonith-action=off
 
 SAMPLE RESOURCE:
 
 pcs cluster cib dc_cfg
 pcs -f dc_cfg resource create DCVMDisk ocf:linbit:drbd \
 drbd_resource=dc_vm op monitor interval=31s role=Master \
 op monitor interval=29s role=Slave \
 op start interval=0 timeout=120s \
 op stop interval=0 timeout=180s
 pcs -f dc_cfg resource master DCVMDiskClone DCVMDisk \
 master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
 notify=true target-role=Started is-managed=true
 pcs -f dc_cfg resource create DCVM ocf:heartbeat:VirtualDomain \
 config=/etc/libvirt/qemu/dc.xml migration_transport=tcp 
 migration_network_suffix=-10g \
 hypervisor=qemu:///system meta allow-migrate=false target-role=Started 
 is-managed=true \
 op start interval=0 timeout=120s \
 op stop interval=0 timeout=120s \
 op monitor interval=60s timeout=120s
 pcs -f dc_cfg constraint colocation add DCVM DCVMDiskClone INFINITY 
 with-rsc-role=Master
 pcs -f dc_cfg constraint order promote DCVMDiskClone then start DCVM
 pcs -f dc_cfg constraint location DCVM prefers 
 cluster2.verolengo.privatelan=50
 pcs cluster cib-push firewall_cfg
 
 Since I know that pcs still has some rough edges, I installed crmsh too, but 
 never actually used it.
 
 Many thanks in advance for your attention.
 
 Kind regards,
 Giuseppe Ragusa
 
 



-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed



Re: [Pacemaker] Not unmoving colocated resources can provoke DRBD split-brain

2014-06-12 Thread Lars Ellenberg
On Thu, Jun 12, 2014 at 10:10:55AM +1000, Andrew Beekhof wrote:
 Referring to the king of drbd... 
 Lars, question for you inline.

  ===
  primitive DRBD-ffm ocf:linbit:drbd params drbd_resource=ffm \
  op start interval=0 timeout=240 \
  op promote interval=0 timeout=90 \
  op demote interval=0 timeout=90 \
  op notify interval=0 timeout=90 \
  op stop interval=0 timeout=100 \
  op monitor role=Slave timeout=20 interval=20 \
  op monitor role=Master timeout=20 interval=10
  ms ms-DRBD-ffm DRBD-ffm meta master-max=1 master-node-max=1 \
  clone-max=2 clone-node-max=1 notify=true
  colocation coloc-ms-DRBD-ffm-follows-ALL-ffm inf: \
  ms-DRBD-ffm:Master ALL-ffm
  order ord-ALL-ffm-before-DRBD-ffm inf: ALL-ffm ms-DRBD-ffm:promote
  location loc-ms-DRBD-ffm-korfwm01 ms-DRBD-ffm -inf: korfwm01
  location loc-ms-DRBD-ffm-korfwm02 ms-DRBD-ffm -inf: korfwm02
  ===
  
  # crm node standby korfwf01 ; sleep 10
  # crm node online korfwf01 ; sleep 10
  # crm resource move ALL-ffm korfwf01 ; sleep 10
  # crm node standby korfwf01 ; sleep 10
  # crm node online korfwf01 ; sleep 10
  *bang* split-brain.
  
  This is because with the last command online korfwf01 pacemaker starts
  and the immediately promotes ms-DRBD-ffm without giving any time for
  drbd to sync with the peer.
 
 Have you seen anything like this before?
 I don't know we have any capacity to delay the promotion in the PE... 
 perhaps the agent needs to delay setting a master score if its out of date?
 or maybe loop in the promote action and set a really long timeout

You want to configure DRBD for "fencing resource-and-stonith",
and use the fence-peer handler crm-fence-peer.sh
(and the corresponding crm-unfence-peer.sh in the after-resync handler).

Done.

What does that do?

If a fencing policy != dont-care is configured,
DRBD, if gracefully disconnected (stop), will outdate a secondary.
Outdated secondaries refuse to be promoted.

On non-graceful disconnect, a Primary will freeze IO,
call the fence-peer handler, which places a constraint pinning the
primary role to where it currently is, and on success resume IO.

Also, DRBD will not consider itself as UpToDate immediately after
start, but as Consistent at best, which will use a minimal
master_score (or none at all, see adjust-master-scores).

Due to this constraint, pacemaker will not attempt promotion
on the node that was fenced (in this case only fenced from becoming
Primary, not necessarily shot... it really only places a constraint)
until that node is unfenced (the constraint is removed),
which will happen in the after-resync-target handler (crm-unfence-peer.sh).

If you don't like the freeze IO part above,
you can use the resource-only fencing policy.
The and-stonith part is really only about the freeze-io.
The crm-fence-peer.sh does NOT (usually) trigger stonith itself.
It may wait for a successful stonith though, if it thinks one is pending.

The only reliable (as can be) way to avoid data divergence with DRBD and
pacemaker is to use redundant cluster communications,
use working and tested node level fencing on the pacemaker level,
*and* use fencing resource-and-stonith + crm-fence-peer.sh on the DRBD level.

You may want to use the adjust-master-score parameter of the DRBD
resource agent as well, to avoid pacemaker attempting to promote an
only Consistent DRBD, which will usually fail anyways.
See description there.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



Re: [Pacemaker] DRBD primary/primary + Pacemaker goes into split brain after crm node standby/online

2014-06-12 Thread Lars Ellenberg
On Mon, Jun 09, 2014 at 08:07:51PM +0200, Alexis de BRUYN wrote:
 Hi Everybody,
 
 I have an issue with a 2-node Debian Wheezy primary/primary DRBD
 Pacemaker/Corosync configuration.
 
 After a 'crm node standby' then a 'crm node online', the DRBD volume
 stays in a 'split brain state' (cs:StandAlone ro:Primary/Unknown).
 
 A soft or hard reboot of one node gets rid of the split brain and/or
 doesn't create one.
 
 I have followed http://www.drbd.org/users-guide-8.3/ and keep my tests
 as simple as possible (no activity and no filesystem on the DRBD volume).
 
 I don't see what I am doing wrong. Could anybody help me with this please.

Use fencing, both node-level fencing on the Pacemaker level,
*and* constraint fencing on the DRBD level:

 # cat /etc/drbd.d/sda4.res
 resource sda4 {
  device /dev/drbd0;
  disk /dev/sda4;
  meta-disk internal;
 
   startup {
 become-primary-on both;
   }
 
   handlers {
  split-brain "/usr/lib/drbd/notify-split-brain.sh root";

  fence-peer "crm-fence-peer.sh";
  after-resync-target "crm-unfence-peer.sh";

   }

disk {
 fencing resource-and-stonith;
 }

 
   net {
 allow-two-primaries;
 after-sb-0pri discard-zero-changes;
 after-sb-1pri discard-secondary;
 after-sb-2pri disconnect;
   }
  on testvm1 {
   address 192.168.1.201:7788;
  }
  on testvm2 {
   address 192.168.1.202:7788;
  }
 
  syncer {
   rate 100M;
   al-extents 3389;
  }
 }
-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



Re: [Pacemaker] drbd + lvm

2014-06-12 Thread Lars Ellenberg
On Thu, Mar 13, 2014 at 03:57:28PM -0400, David Vossel wrote:
 
 
 
 
 - Original Message -
  From: Infoomatic infooma...@gmx.at
  To: pacemaker@oss.clusterlabs.org
  Sent: Thursday, March 13, 2014 2:26:00 PM
  Subject: [Pacemaker] drbd + lvm
  
  Hi list,
  
  I am having troubles with pacemaker and lvm and stacked drbd resources.
  The system consists of 2 Ubuntu 12 LTS servers, each having two partitions 
  of
  an underlying raid 1+0 as volume group with one LV each as a drbd backing
  device. The purpose is for usage with VMs and adjusting needed disk space
  flexible, so on top of the drbd resources there are LVs for each VM.
  I created a stack with LCMC, which is like:
  
  DRBD-LV-libvirt and
  DRBD-LV-Filesystem-lxc
  
  The problem now: the system has hickups - when VM01 runs on HOST01 (being
  primary DRBD) and HOST02 is restarting, lvm is reloaded (at boot time) and
  the LVs are being activated. This of course results in an error, the log
  entry:
  
  Mar 13 17:58:42 host01 pengine: [27563]: ERROR: native_create_actions:
  Resource res_LVM_1 (ocf::LVM) is active on 2 nodes attempting recovery
  
  Therefore, as configured, the resource is stopped and started again (on only
  one node). Thus, all VMs and containers relying on this are also restared.
  
  When I disable the LVs that use the DRBD resource at boot (lvm.conf:
  volume_list only containing the VG from the partitions of the raidsystem) a
  reboot of the secondary does not restart the VMs running on the primary.
  However, if the primary goes down (e.g. power interruption), the secondary
  cannot activate the LVs of the VMs because they are not in the list of
  lvm.conf to be activated.
  
  Has anyone had this issue and resolved it? Any ideas? Thanks in advance!
 
 Yep, i've hit this as well. Use the latest LVM agent. I already fixed all of 
 this.

If you exclude the DRBD lower level devices in your lvm.conf filter
(and update your initramfs to have a proper copy of that lvm.conf),
and only allow them to be accessed via DRBD,
LVM cannot possibly activate them on boot.
But only after DRBD was promoted.
Which supposedly happens via pacemaker only.
And unless some udev rule auto-activates any VG found immediately,
it should only be activated via pacemaker as well.

So something like that should be in your lvm.conf:
  filter = [ "a|^/dev/your/system/PVs|", "a|^/dev/drbd|", "r|.|" ]

 https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/LVM
 
 Keep your volume_list the way it is and use the 'exclusive=true' LVM
 option.   This will allow the LVM agent to activate volumes that don't
 exist in the volume_list.

That is a nice feature, but if I'm correct, it is unrelated here.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



Re: [Pacemaker] no-quorum-policy = demote?

2014-04-11 Thread Lars Ellenberg
On Fri, Apr 11, 2014 at 10:02:59AM +0200, Christian Ciach wrote:
 Thank you for pointing me to the environment variables. Unfortunately, none
 of these work in this case. For example: Assume one node is currently the
 master. Then, because of a network failure, this node loses quorum. Because
 no-quorum-policy is set to ignore, this node will keep being a master.
 In this case there is no change of state, thus the notify-function of the
 OCF-agent does not get called by pacemaker. I've already tried this, so I
 am quite sure about that.


Very very hackish idea:

  set monitor interval of the Master role to T seconds
  and fail (+demote) if no quorum.

  (or use a dummy resource agent similar to the ping RA,
  and update some node attribute from there...
  then have a constraint for the Master role on that node attribute)

  in your promote action,
refuse to promote if no quorum
sleep 3*T (+ time to demote)
only then actually promote.

That way, you are reasonably sure that,
before you actually promote,
the former master had a chance to notice quorum loss and demote.
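
Something like this in the agent's promote path (hypothetical sketch; T is
the Master monitor interval from above, do_the_actual_promote is a
placeholder):

T=10

promote() {
	# refuse to promote while this partition has no quorum
	if [ "$(crm_node -q)" != "1" ]; then
		echo "no quorum, refusing to promote" >&2
		return 1   # OCF_ERR_GENERIC
	fi
	# give a former master time to notice its own quorum loss and demote
	sleep $(( 3 * T ))
	do_the_actual_promote
}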

But you really should look into booth, or proper fencing.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



Re: [Pacemaker] Colocation constraint to External Managed Resource

2013-10-11 Thread Lars Ellenberg
On Thu, Oct 10, 2013 at 06:20:54PM +0200, Robert H. wrote:
 Hello,
 
 Am 10.10.2013 16:18, schrieb Andreas Kurz:
 
 You configured a monitor operation for this unmanaged resource?
 
 Yes, and some parts work as expected, however some behaviour is
 strange.
 
 Config (relevant part only):
 
 
 primitive mysql-percona lsb:mysql \
 op start enabled=false interval=0 \
 op stop enabled=false interval=0 \
 op monitor enabled=true timeout=20s interval=10s \

You probably also want to monitor even if pacemaker thinks this is
supposed to be stopped.

op monitor interval=11s timeout=20s role=Stopped


 meta migration-threshold=2 failure-timeout=30s
 is-managed=false
 clone CLONE-percona mysql-percona \
 meta clone-max=2 clone-node-max=1 is-managed=false
 location clone-percona-placement CLONE-percona \
 rule $id=clone-percona-placement-rule -inf: #uname ne
 NODE1 and #uname ne NODE2
 colocation APP-dev2-private-percona-withip inf: IP CLONE-percona
 
 
 Test:
 
 
 I start by both Percona XtraDB machines running:
 
  IP-dev2-privatevip1(ocf::heartbeat:IPaddr2):   Started
 NODE2
  Clone Set: CLONE-percona [mysql-percona] (unmanaged)
  mysql-percona:0(lsb:mysql):Started NODE1 (unmanaged)
  mysql-percona:1(lsb:mysql):Started NODE2 (unmanaged)
 
 shell# /etc/init.d/mysql stop on NODE2
 
 ... Pacemaker reacts as expected 
 
  IP-dev2-privatevip1(ocf::heartbeat:IPaddr2):   Started
 NODE1
  Clone Set: CLONE-percona [mysql-percona] (unmanaged)
  mysql-percona:0(lsb:mysql):Started NODE1 (unmanaged)
  mysql-percona:1(lsb:mysql):Started NODE2 (unmanaged)
 FAILED
 
.. then I wait 
.. after some time (1 min), the resource is shown as running ...
 
  IP-dev2-privatevip1(ocf::heartbeat:IPaddr2):   Started
 NODE1
  Clone Set: CLONE-percona [mysql-percona] (unmanaged)
  mysql-percona:0(lsb:mysql):Started NODE1 (unmanaged)
  mysql-percona:1(lsb:mysql):Started NODE2 (unmanaged)
 
 But it is definitly not running:
 
 shell# /etc/init.d/mysql status
 MySQL (Percona XtraDB Cluster) is not running
 [FEHLGESCHLAGEN]
 
  When I run a reprobe (crm resource reprobe) it switches to:
 
  IP-dev2-privatevip1(ocf::heartbeat:IPaddr2):   Started
 NODE1
  Clone Set: CLONE-percona [mysql-percona] (unmanaged)
  mysql-percona:0(lsb:mysql):Started NODE1 (unmanaged)
  Stopped: [ mysql-percona:1 ]
 
 Then when I start it again:
 
 /etc/init.d/mysql start on NODE2
 
 It stays this way:
 
  IP-dev2-privatevip1(ocf::heartbeat:IPaddr2):   Started
 NODE1
  Clone Set: CLONE-percona [mysql-percona] (unmanaged)
  mysql-percona:0(lsb:mysql):Started NODE1 (unmanaged)
  Stopped: [ mysql-percona:1 ]
 
 Only a manual reprobe helps:
 
  IP-dev2-privatevip1(ocf::heartbeat:IPaddr2):   Started
 NODE1
  Clone Set: CLONE-percona [mysql-percona] (unmanaged)
  mysql-percona:0(lsb:mysql):Started NODE1 (unmanaged)
  mysql-percona:1(lsb:mysql):Started NODE2 (unmanaged)
 
 Same thing happens when I reboot NODE2 (or other way around).
 
 ---
 
 I would expect that crm_mon ALWAYS reflects the local state, however
 it looks like a bug for me.

crm_mon reflects what is in the cib.  If no-one re-populates the cib
with the current state of the world, what it shows will be stale.
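
In other words: after poking at services behind pacemaker's back, something
has to trigger fresh probes, e.g. (as you already did):

# re-run the one-shot monitor (probe) operations,
# which re-populates the status section of the cib
crm resource reprobe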

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

2013-09-10 Thread Lars Ellenberg
On Mon, Sep 09, 2013 at 01:41:17PM +0200, Andreas Mock wrote:
 Hi Lars,
 
 here also my official "Thank you very much" for looking
 at the problem.

 I've been looking forward to the official release
 of drbd 8.4.4.
 
 Or do you need disoriented rc testers like me? ;-)

Why not?
That's what release candidates are intended for.
You'd only have to confirm that it works for you now.

Or, respectively, confirm that it still does not,
in which case you'd better report that now
rather than after the release, right?


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

2013-09-09 Thread Lars Ellenberg
On Mon, Sep 09, 2013 at 02:42:45PM +1000, Andrew Beekhof wrote:
 
 On 06/09/2013, at 5:51 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote:
 
  On Tue, Aug 27, 2013 at 06:51:45AM +0200, Andreas Mock wrote:
  Hi Andrew,
  
  as this is a real showstopper at the moment I invested some other
  hours to be sure (as far as possible) not having made an error.
  
  Some additions:
  1) I mirrored the whole mini drbd config to another pacemaker cluster.
  Same result: pacemaker 1.1.8 works, pacemaker 1.1.10 not 
  2) When I remove the target role Stopped from the drbd ms resource
  and insert the config snippet related to the drbd device via crm -f file
  to a lean running pacemaker config (pacemaker cluster options, stonith
  resources),
  it seems to work. That means one of the nodes gets promoted.
  
  Then after stopping 'crm resource stop ms_drbd_xxx' and starting again
  I see the same promotion error as described.
  
  The drbd resource agent is using /usr/sbin/crm_master.
  Is there a possibility that feedback given through this client tool
  is changing the timing behaviour of pacemaker? Or the way
  transitions are scheduled?
  Any idea that may be related to a change in pacemaker?
  
  I think that recent pacemaker allows for start and promote in the
  same transition.
 
 At least in the one case I saw logs of, this wasn't the case.
 The PE computed:
 
 Current cluster status:
 Online: [ db05 db06 ]
 
 r_stonith-db05(stonith:fence_imm):Started db06 
 r_stonith-db06(stonith:fence_imm):Started db05 
 Master/Slave Set: ms_drbd_fodb [r_drbd_fodb]
 Slaves: [ db05 db06 ]
 Master/Slave Set: ms_drbd_fodblog [r_drbd_fodblog]
 Slaves: [ db05 db06 ]
 
 Transition Summary:
 * Promote r_drbd_fodb:0   (Slave -> Master db05)
 * Promote r_drbd_fodblog:0 (Slave -> Master db05)
 
 and it was the promotion of r_drbd_fodb:0 that failed.

Right.

Off-list communication revealed that
DRBD came up as Consistent only,
which is a normal and expected state,
when using resource level fencing.

The promotion attempt then raced with the connection handshake.
The DRBD fence-peer handler is run (because it's only Consistent,
not UpToDate) and returns successfully, but due to that race,
this result is ignored, DRBD stays only Consistent, which
is not good enough to be promoted (need access to UpToDate data).

Once the handshake is done, that also results in access to good data,
which is why the next promotion attempt succeeds.


Something in the timing of pacemaker actions has changed
between the affected and unaffected versions.
Apparently before there was enough time to do the connection handshake
before the promote request was made.


This race is fixed with DRBD 8.3.16 and 8.4.4 (currently rc1)

You can avoid that race by not allowing Pacemaker to promote
if DRBD is only Consistent.

Pacemaker will only attempt promotion
if there is a positive master score for the resource.

The ocf:linbit:drbd RA hardcodes the master score for
"only Consistent" to 5.
So you may edit the RA and instead remove (or zero) the master score
for that "only Consistent" case.

(The above mentioned fixed DRBD versions also introduce a new
adjust_master_score parameter, so this becomes configurable.)
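A sketch of how that could look once you run a fixed version (the
parameter name is taken from the agent, but the exact value format here
is an assumption; check "crm ra info ocf:linbit:drbd" for the
authoritative description. The idea is to set the score for the
"only Consistent" case to 0):

    primitive p_drbd_XY ocf:linbit:drbd \
        params drbd_resource="XY" \
               adjust_master_score="0 10 1000 10000" \
        op monitor interval=29s role=Master \
        op monitor interval=31s role=Slave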

Or you can add a location constraint like this:
 location no-master-if-only-consistent ms_drbd_XY \
rule $role=Master -10: defined #uname

where "defined #uname" is a funny way to express "always true",
as in: this constraint reduces the resulting master score by 10,
always, anywhere.

If you have other $role=Master constraints, you may need to play with
the scores to achieve the desired outcome.


  I suspect you would not be able to reproduce by:
   crm resource stop ms_drbd
   crm resource demote ms_drbd (will only make drbd Secondary stuff)
 ... meanwhile, DRBD will establish the connection ...
   crm resource promote ms_drbd (will then promote one node)

By first allowing DRBD to do the handshake in Secondary/Secondary,
and only later allowing it to promote,
this sequence also avoids the race.

Cheers,
Lars

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10

2013-09-06 Thread Lars Ellenberg

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] ha_logd and logfile rotation

2013-08-07 Thread Lars Ellenberg
On Tue, Aug 06, 2013 at 12:26:02PM -0600, Dan Urist wrote:
 I'm trying to use heartbeat with ha_logd, but I can't find any
 documentation for the proper way to handle log file rotation when using
 ha_logd. The docs at http://linux-ha.org/wiki/Ha.cf state:
 
  If the logging daemon is used, all log messages will be sent through
  IPC to the logging daemon, which then writes them into log files. In
  case the logging daemon dies (for whatever reason), a warning message
  will be logged and all messages will be written to log files directly.
 
 So it's not possible to stop ha_logd, rotate the log files and then
 restart it. How can I rotate log files without restarting heartbeat? 

If you logrotate with 'delaycompress' (or whatever that option is called),
logd should just notice by itself and re-open.

Also, logd is supposed to handle SIGHUP by re-opening the log files.
If it does not do that for you, upgrade.
If it still does not do that, complain again ;-)
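An untested sketch of such a logrotate snippet (the log file names and
the daemon's process name are assumptions, adjust them to your setup):

    /var/log/ha-log /var/log/ha-debug {
        weekly
        rotate 4
        missingok
        compress
        delaycompress
        postrotate
            # logd re-opens its log files on SIGHUP
            killall -HUP ha_logd 2>/dev/null || true
        endscript
    }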

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Multi-node resource dependency

2013-07-19 Thread Lars Ellenberg
On Fri, Jul 19, 2013 at 04:49:21PM +0200, Tomcsányi, Domonkos wrote:
 Hello everyone,
 
 I have been struggling with this issue for quite some time so I
 decided to ask you to see if maybe you can shed some light on this
 problem.
 So here is the deal:
 I have a 4-node cluster, from which 2-2 nodes belong together.
 In ASCII art it would look like this:
 
 ----------        ----------
 | NODE 1 |--------| NODE 2 |
 ----------        ----------
     |                 |
     |                 |
     |                 |
 ----------        ----------
 | NODE 3 |--------| NODE 4 |
 ----------        ----------
 
 Now the behaviour I would like to achieve:
 If NODE 1 goes offline its services should get migrated to NODE 2
 AND NODE 3's services should get migrated to NODE 4.
 If NODE 3 goes offline its services should get migrated to NODE4 AND
 NODE1's services should get migrated to NODE 2.
 Of course the same should happen vice versa with NODE 2 and NODE 4.
 
 The services NODE1 and 2 are the same naturally, but they differ
 from NODE 3's and 4's services. So I added some 'location'
 directives to the config so the services can only be started on the
 right nodes.
 I tried 'colocation' which is great, but not for this kind of
 behaviour: if I colocate both resource groups of NODE 1 and 3 only
 one of them starts (of course, because colocation means the
 resource/resource group(s) should be running on the same NODE, so my
 location directives kick in and prevent for example NODE 3's
 services from starting on NODE 1).
 
 So my question is: is it possible to define such behaviour I
 described above in Pacemaker? If yes, how?

You may use node attributes in colocation constraints.

So you would give your nodes attributes, first:

crm node
attribute NODE1 set color pink
attribute NODE3 set color pink

attribute NODE2 set color slime
attribute NODE4 set color slime

crm configure
colocation c-by-color inf: rsc_a rsc_b rsc_c node-attribute=color

The implicit default node-attribute is #uname ...
so using "color" instead, the resources only need to run on nodes with the same
value for the node-attribute "color".

Lars

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] crond on both nodes (active/passive) but some jobs on active only

2013-07-05 Thread Lars Ellenberg
On Fri, Jul 05, 2013 at 04:52:35PM +0200, andreas graeper wrote:
 when i wrote a script handled by ocf:heartbeat:anything i.e. that is
 signalling the cron-daemon to reload crontabs
 when crontab file is enabled by symlink:start and disabled by symlink:stop
 
 how can i achieve that the script runs after symlink :start and :stop ?
 when i define an order-constraint R1 then R2, does this implicitly mean R1:start,
 R2:start and R2:stop, R1:stop ?
 

Not an answer to that specific question,
rather a why even bother suggestion:

You say:
  two nodes active/passive and fetchmail as cronjob shall run on active only.

How do you know the node is active?
Maybe some specific file system is mounted?
Great.  You have files and directories
which are only visible on an active node.

Why not prefix your cron job lines with
test -e /this/file/only/visible/on/active || exit 0; real cron command follows
 or
cd /some/dir/only/on/active || exit 0; real cron command

 or a wrapper, if that looks too ugly
only-on-active real cron command

/bin/only-on-active:
#!/bin/sh
# same active-node test as above, e.g.:
test -e /this/file/only/visible/on/active || exit 0
exec "$@"   # run the real cron command
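And then in the crontab, something like (schedule and user are made up,
fetchmail as in your original mail):

    */5 * * * *  someuser  /bin/only-on-active fetchmail --silent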

Lars

 2013/7/5 andreas graeper agrae...@googlemail.com
 
  hi,
  two nodes active/passive and fetchmail as cronjob shall run on active only.
 
  i use ocf:heartbeat:symlink to move / rename
 
  /etc/cron.d/jobs  /etc/cron.d/jobs.disable
 
  i read anywhere crond ignores files with dot.
 
  but new experience: crond needs to restarted or signalled.
 
  how this is done best within pacemaker ?
  is clone for me ?
 
 
  thanks in advance
  andreas


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] drbd on passive node not started

2013-07-03 Thread Lars Ellenberg
On Fri, Jun 21, 2013 at 03:36:39PM +0200, andreas graeper wrote:
 hi,
 n1 active node is started and everything works fine, but after reboot n2
 drbd is not started by pacemaker. when i start drbd manually, crm_mon shows
 it as slave ( as if there were no problems).
 
 maybe someone experienced can have a look into logs ?

The logs you provide clearly show that pacemaker *did* start DRBD,
and successfully.

Wrong timeframe?

Lars

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Monitor and standby

2013-07-03 Thread Lars Ellenberg
On Wed, Jul 03, 2013 at 12:21:37PM +0200, Denis Witt wrote:
 Hi List,
 
 we have a two node cluster (test1-node1, test1-node2) with an additional
 quorum node (test1). On all nodes MySQL is running. test1-node1 and
 test1-node2 sharing the MySQL-Database via DRBD, so only one Node
 should run MySQL. On test1 there is a MySQL-Slave connected to
 test1-node1/test1-node2. test1 is always in Standby-Mode.
 
 The problem is now that the MySQL-Slave on test1 is shut down by crmd:
 
 Jul  3 12:05:12 test2 crmd: [5945]: info: te_rsc_command: Initiating action 
 22: monitor p_mysql_monitor_0 on test2 (local)
 Jul  3 12:05:14 test2 pengine: [5944]: ERROR: native_create_actions: Resource 
 p_mysql (lsb::mysql) is active on 2 nodes attempting recovery


There.
The init script's status action told pacemaker that mysql was running on both
nodes, while pacemaker was told it should run only once.
So pacemaker recovers by stopping both and starting one.

 Jul  3 12:05:14 test2 pengine: [5944]: notice: LogActions: Restart 
 p_mysql#011(Started test2-node1)
 Jul  3 12:05:15 test2 crmd: [5945]: info: te_rsc_command: Initiating action 
 54: stop p_mysql_stop_0 on test2 (local)
 
 From my understanding this shouldn't happen as test1 was set to standby
 before:
 
 Jul  3 12:04:48 test2 cib: [5940]: info: cib:diff: +   nvpair 
 id=nodes-test2-standby name=standby value=on /
 
 How could we solve this?

use the mysql RA with proper parameters, so it won't get confused by a
different instance of mysql.
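For example, something along these lines (a sketch only; the parameter
names are those of ocf:heartbeat:mysql, the instance-specific paths are
made up and must match your actual setup):

    primitive p_mysql ocf:heartbeat:mysql \
        params config="/etc/mysql/cluster-instance.cnf" \
               datadir="/drbd/mysql/data" \
               pid="/var/run/mysqld/cluster.pid" \
               socket="/var/run/mysqld/cluster.sock" \
        op monitor interval=30s timeout=30s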

Or fix the init script status action to be able to distinguish between
the cluster mysql instance, and your other mysql instance.

Note that a pacemaker node in standby is supposed to not run any
resources, so if it notices that DRBD is running there (in Secondary),
it will stop it, too.

Maybe you and pacemaker disagree about the meaning of standby?


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Monitor and standby

2013-07-03 Thread Lars Ellenberg
On Wed, Jul 03, 2013 at 12:56:35PM +0200, Denis Witt wrote:
 On Wed, 3 Jul 2013 12:35:34 +0200
 Lars Ellenberg lars.ellenb...@linbit.com wrote:
 
  Maybe you and pacemaker disagree about the meaning of standby?
 
 Hi Lars,
 
 obviously, yes. My understanding was that a standby node just adds its
 vote for quorum but isn't monitored at all. Thanks for clarifying this.

 We solved it by renaming the init script from mysql to mysqlslave on
 this node. Now the monitor complains that mysql isn't installed, but
 we can live with that.

What purpose, exactly, is pacemaker supposed to serve in your setup?

Why are you using pacemaker at all,
if you intend to do everything manually anyways?

Lars

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs

2013-06-29 Thread Lars Ellenberg
On Fri, Jun 28, 2013 at 07:27:19PM -0400, Digimer wrote:
 On 06/28/2013 07:22 PM, Andrew Beekhof wrote:
  
  On 29/06/2013, at 12:22 AM, Digimer li...@alteeve.ca wrote:
  
  On 06/28/2013 06:21 AM, Andrew Beekhof wrote:
 
  On 28/06/2013, at 5:22 PM, Lars Marowsky-Bree l...@suse.com wrote:
 
  On 2013-06-27T12:53:01, Digimer li...@alteeve.ca wrote:
 
  primitive fence_n01_psu1_off stonith:fence_apc_snmp \
params ipaddr=an-p01 pcmk_reboot_action=off port=1
  pcmk_host_list=an-c03n01.alteeve.ca
  primitive fence_n01_psu1_on stonith:fence_apc_snmp \
params ipaddr=an-p01 pcmk_reboot_action=on port=1
  pcmk_host_list=an-c03n01.alteeve.ca
 
  So every device twice, including location constraints? I see potential
  for optimization by improving how the fence code handles this ... That's
  abhorrently complex. (And I'm not sure the 'action' parameter ought to
  be overwritten.)
 
  I'm not crazy about it either because it means the device is tied to a 
  specific command.
  But it seems to be something all the RHCS people try to do...
 
  Maybe something in the rhcs water cooler made us all mad... ;)
 
  Glad you got it working, though.
 
  location loc_fence_n01_ipmi fence_n01_ipmi -inf: an-c03n01.alteeve.ca
  [...]
 
  I'm not sure you need any of these location constraints, by the way. Did
  you test if it works without them?
 
  Again, this is after just one test. I will want to test it several more
  times before I consider it reliable. Ideally, I would love to hear
  Andrew or others confirm this looks sane/correct.
 
  It looks correct, but not quite sane. ;-) That seems not to be
  something you can address, though. I'm thinking that fencing topology
  should be smart enough to, if multiple fencing devices are specified, to
  know how to expand them to first all off (if off fails anywhere, it's a
  failure), then all on (if on fails, it is not a failure). That'd
  greatly simplify the syntax.
 
  The RH agents have apparently already been updated to support multiple 
  ports.
  I'm really not keen on having the stonith-ng doing this.
 
  This doesn't help people who have dual power rails/PDUs for power
  redundancy.
  
  I'm yet to be convinced that having two PDUs is helping those people in the 
  first place.
  If it were actually useful, I suspect more than two/three people would have 
  asked for it in the last decade.
 
 Step 1. Use one PDU
 Step 2. Kill PDU
 
 Your node is dead and can not be fenced.

I have multiple independent cluster communication channels.
I don't see the node on any of them,
I cannot reach its IPMI or equivalent,
I cannot reach its PDU.

I'd argue that a failure mode where all of the above is true,
and that node would still be alive, is sufficiently unlikely
to just conclude that it is in fact dead.

Rather that than a fencing method that returns "yes, I rebooted that
node" when in fact that node did not even notice...

 Using two separate UPSes and two separate PDUs to feed either PSU in
 each node (and either switch in a two-switch configuration with bonded
 network links) means that you can lose a power rail and not have an
 interruption.

 I can't say why it's not a more common configuration, but I can say that
 I do not see another way to provide redundant power. For me, an HA
 cluster is not truly HA until all single points of failure have been
 removed.

If I do have two independent UPSes and PDUs and PSUs,
(yes, that is a common setup)
and I want a second fencing method as a fallback from IPMI, then yes,
it would be nice to have some clean and easy way
to tell pacemaker to do that.
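With what is there today, such a fallback looks roughly like this in
crmsh syntax (sketch only; the device names are the ones from the quoted
config above, and the *_psu2_* ones are my extrapolation for the second
PDU):

    fencing_topology \
        an-c03n01.alteeve.ca: fence_n01_ipmi \
            fence_n01_psu1_off,fence_n01_psu2_off,fence_n01_psu1_on,fence_n01_psu2_on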

But not having that fallback fencing method does not introduce a SPOF.
Both the mainboard (or kernel, or a resource stop failure, or whatever)
and the BMC would have to fail at the same time for the cluster to block...

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] owership of created symlink

2013-06-04 Thread Lars Ellenberg
On Tue, Jun 04, 2013 at 07:15:11PM +0200, andreas graeper wrote:
 hi,
 i tried, before starting dovecot+exim+fetchmail
 
 to create a symlink
 /var/mail -> /mnt/mirror/var/mail
 with ra ocf:heartbeat:symlink
 
 i changed target :
   chmod 0775
   chown root.mail
 
 but i need write permission to /var/mail
 cause exim wants to create a lock file
 
 i tried to manually
  chown -h root.mail /var/mail
 and link is now 777 root.mail

Ownership and permissions of the link do not matter at all.
The same goes for the mount point.

What matters is the ownership and permissions of the directory itself.

Once mounted, do the chown / chmod on /mnt/mirror/var/mail/.

Also make sure the uid/gid is the same on all nodes.
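That is, using the ownership and permissions from your own mail, on the
node that currently has the mirror mounted:

    chown root:mail /mnt/mirror/var/mail/.
    chmod 0775 /mnt/mirror/var/mail/.
    # and verify that "mail" maps to the same gid on every node:
    getent group mail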

 but old problem euid=5xx egid=8 (mail) can not create lock file
 /var/mail/.lock
 
 please help.
 andreas


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] crm subshell 1.2.4 incompatible to pacemaker 1.1.9?

2013-05-15 Thread Lars Ellenberg
On Wed, May 15, 2013 at 03:34:14PM +0200, Dejan Muhamedagic wrote:
 On Tue, May 14, 2013 at 10:03:59PM +0200, Lars Ellenberg wrote:
  On Tue, May 14, 2013 at 09:59:50PM +0200, Lars Ellenberg wrote:
   On Mon, May 13, 2013 at 01:53:11PM +0200, Michael Schwartzkopff wrote:
Hi,

crm tells me it is version 1.2.4
pacemaker tell me it is verison 1.1.9

So it should work since incompatibilities are resolved in crm higher 
that 
version 1.2.1. Anywas crm tells me nonsense:

# crm
crm(live)# node
crm(live)node# standby node1
ERROR: bad lifetime: node1
   
   Your node is not named node1.
   check: crm node list
   
   Maybe a typo, maybe some case-is-significant nonsense,
   maybe you just forgot to use the fqdn.
   maybe the check for is this a known node name is (now) broken?
   
   
   standby with just one argument checks if that argument
   happens to be a known node name,
   and assumes that if it is not,
   it has to be a lifetime,
   and the current node is used as node name...
   
   Maybe we should invert that logic, and instead compare the single
   argument against allowed lifetime values (reboot, forever), and assume
   it is supposed to be a node name otherwise?
   
   Then the error would become
   ERROR: unknown node name: node1
   
   Which is probably more useful most of the time.
   
   Dejan?
  
  Something like this maybe:
  
  diff --git a/modules/ui.py.in b/modules/ui.py.in
  --- a/modules/ui.py.in
  +++ b/modules/ui.py.in
  @@ -1185,7 +1185,7 @@ class NodeMgmt(UserInterface):
   if not args:
   node = vars.this_node
   if len(args) == 1:
  -if not args[0] in listnodes():
  +if args[0] in ("reboot", "forever"):
 
 Yes, I wanted to look at it again. Another complication is that
 the lifetime can be just about anything in that date ISO format.

That may well be, but right now those would be rejected by crmsh
anyways:

if lifetime not in (None, "reboot", "forever"):
    common_err("bad lifetime: %s" % lifetime)
    return False

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] crm subshell 1.2.4 incompatible to pacemaker 1.1.9?

2013-05-14 Thread Lars Ellenberg
On Mon, May 13, 2013 at 01:53:11PM +0200, Michael Schwartzkopff wrote:
 Hi,
 
 crm tells me it is version 1.2.4
 pacemaker tells me it is version 1.1.9
 
 So it should work, since incompatibilities are resolved in crm higher than 
 version 1.2.1. Anyway, crm tells me nonsense:
 
 # crm
 crm(live)# node
 crm(live)node# standby node1
 ERROR: bad lifetime: node1

Your node is not named node1.
check: crm node list

Maybe a typo, maybe some case-is-significant nonsense,
maybe you just forgot to use the fqdn,
or maybe the check for "is this a known node name" is (now) broken?


standby with just one argument checks if that argument
happens to be a known node name,
and assumes that if it is not,
it has to be a lifetime,
and the current node is used as node name...

Maybe we should invert that logic, and instead compare the single
argument against allowed lifetime values (reboot, forever), and assume
it is supposed to be a node name otherwise?

Then the error would become
ERROR: unknown node name: node1

Which is probably more useful most of the time.

Dejan?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] crm subshell 1.2.4 incompatible to pacemaker 1.1.9?

2013-05-14 Thread Lars Ellenberg
On Tue, May 14, 2013 at 09:59:50PM +0200, Lars Ellenberg wrote:
 On Mon, May 13, 2013 at 01:53:11PM +0200, Michael Schwartzkopff wrote:
  Hi,
  
  crm tells me it is version 1.2.4
  pacemaker tell me it is verison 1.1.9
  
  So it should work since incompatibilities are resolved in crm higher that 
  version 1.2.1. Anywas crm tells me nonsense:
  
  # crm
  crm(live)# node
  crm(live)node# standby node1
  ERROR: bad lifetime: node1
 
 Your node is not named node1.
 check: crm node list
 
 Maybe a typo, maybe some case-is-significant nonsense,
 maybe you just forgot to use the fqdn.
 maybe the check for is this a known node name is (now) broken?
 
 
 standby with just one argument checks if that argument
 happens to be a known node name,
 and assumes that if it is not,
 it has to be a lifetime,
 and the current node is used as node name...
 
 Maybe we should invert that logic, and instead compare the single
 argument against allowed lifetime values (reboot, forever), and assume
 it is supposed to be a node name otherwise?
 
 Then the error would become
 ERROR: unknown node name: node1
 
 Which is probably more useful most of the time.
 
 Dejan?

Something like this maybe:

diff --git a/modules/ui.py.in b/modules/ui.py.in
--- a/modules/ui.py.in
+++ b/modules/ui.py.in
@@ -1185,7 +1185,7 @@ class NodeMgmt(UserInterface):
 if not args:
 node = vars.this_node
 if len(args) == 1:
-if not args[0] in listnodes():
+if args[0] in ("reboot", "forever"):
 node = vars.this_node
 lifetime = args[0]
 else:

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Exchanging data between resource agent instances

2013-03-19 Thread Lars Ellenberg
On Tue, Mar 19, 2013 at 08:22:39AM +0100, Riccardo Bicelli wrote:
 Because I'm trying to set up an active/standby scsi cluster using alua. I
 need to create a dummy device in the same size of the real device.

Is that so.  What for?
Can you explain in more detail?

 For getting dev size I use blockdev --getsize64 device_name
 The problem is, when I'm using DRBD, that blockdev fails on slave device.

Well, then use awk '/ drbd0$/ { print $3 * 1024 }' /proc/partitions
No?


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Exchanging data between resource agent instances

2013-03-18 Thread Lars Ellenberg
On Mon, Mar 18, 2013 at 08:49:41PM +0100, Riccardo Bicelli wrote:
 Hello,
 anyone knows if is it possible to exchange data between two instances of a
 resource agent?
 
 I have a Master/Slave resource agent that, when slave,  has to create a
 dummy device in same size of a given block device (DRBD) running on Master.

Why?
What do you want to achieve?

 Since the  block device is not accessible when the resource is slave, I was
 wondering if master could read size of device and report it to the slave.

does cat /proc/partitions help?

 I don't like the idea of putting that size in the cib.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Pacemaker DRBD as Physical Volume on Encrypted RAID1

2013-03-06 Thread Lars Ellenberg
On Mon, Mar 04, 2013 at 04:27:24PM -0500, senrab...@aol.com wrote:
 Hi All:
 
 We're new to pacemaker (just got some great help from this forum
 getting it working with LVM as backing device), and would like to
 explore the Physical Volume option. We're trying configure on top of
 an existing Encrypted RAID1 set up and employ LVM.
 
 NOTE:  our goal is to run many virtual servers, each in its own
 logical volume and it looks like putting LVM on top of the DRBD would
 allow us to add logical volumes on the fly, but also have a
 simpler setup with one drbd device for all the logical volumes and
 one related pacemaker config.  Hence, exploring DRBD as a physical
 volume.


A single DRBD has a single activity log;
running many virtual servers from there will very likely cause
the worst possible workload (many totally random writes).

You really want to use DRBD 8.4.3,
see https://blogs.linbit.com/p/469/843-random-writes-faster/
for why.


 Q:  For pacemaker to work, how do we do the DRBD disk/device mapping
 in the drbd.conf file?  And should we set things up and encrypt last,
 or can we apply DRBD and Pacemaker to an existing Encypted RAID1
 setup?


Neither Pacemaker nor DRBD do particularly care.

If you want to stack the encryption layer on top of DRBD, fine.
(you'd probably need to teach some pacemaker resource agent to start
the encryption layer).

If you want to stack DRBD on top of the encryption layer, just as fine.

Unless you provide the decryption key in plaintext somewhere, failover
will likely be easier to automate if you have DRBD on top of encryption,
so if you want the real device encrypted, I'd recommend to put
encryption below DRBD.

Obviously, the DRBD replication traffic will still be plaintext in
that case.

 The examples we've seen show mapping between the drbd device and a
 physical disk (e.g., sdb) in the drbd.conf, and then  pvcreate
 /dev/drbdnum and creating a volume group and logical volume on the
 drbd device.
 
 So for this type of set up, drbd.conf might look like:
 
 device    /dev/drbd1;
 disk  /dev/sdb;
 address xx.xx.xx.xx:7789;
 meta-disk internal;
 
 In our case, because we have an existing RAID1 (md2) and it's
 encrypted (md2_crypt or /dev/dm-7 ...  we're unsure which partition
 actually has the data), any thoughts on how to do the DRBD mapping?
 E.g., 
 
 device /dev/drbd1 minor 1;
 disk /dev/???;
 address xx.xx.xx.xx:7789; 
 meta-disk internal;
 
 I.e., what goes in the disk /dev/?;?  Would it be disk 
 /dev/md2_crypt;?

Yes.
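I.e. your second snippet, with the disk line filled in, would look
something like this (whether the opened mapping shows up as
/dev/md2_crypt or /dev/mapper/md2_crypt depends on how it was created;
use whatever device node actually exists on your system):

    device    /dev/drbd1 minor 1;
    disk      /dev/mapper/md2_crypt;
    address   xx.xx.xx.xx:7789;
    meta-disk internal;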

 And can we do our setup on an existing Encrypted RAID1 setup

Yes.

 (if we do pvcreate on drbd1, we get errors)?

Huh?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Trouble with ocf:Squid resource agent

2013-02-08 Thread Lars Ellenberg
On Mon, Aug 13, 2012 at 02:07:46PM +0200, Dejan Muhamedagic wrote:
 Hi,
 
 On Mon, Jul 30, 2012 at 12:09:10PM -0400, Jake Smith wrote:
  
  - Original Message -
   From: Julien Cornuwel cornu...@gmail.com
   To: pacemaker@oss.clusterlabs.org
   Sent: Wednesday, July 25, 2012 5:51:28 AM
   Subject: Re: [Pacemaker] Trouble with ocf:Squid resource agent
   
   Oops! Spoke too fast. The fix below allows squid to start. But the
   script also has problems in the 'stop' part. It is stuck in an
   infinite loop and here are the logs (repeats every second) :
   
   Jul 25 11:38:47 corsen-a lrmd: [24099]: info: RA output:
   (Proxy:stop:stderr) /usr/lib/ocf/resource.d//heartbeat/Squid: line
   320: kill: -: arguments must be process or job IDs
   Jul 25 11:38:47 corsen-a lrmd: [24099]: info: RA output:
   (Proxy:stop:stderr) /usr/lib/ocf/resource.d//heartbeat/Squid: line
   320: kill: -: arguments must be process or job IDs
   Jul 25 11:38:48 corsen-a Squid(Proxy)[24659]: [25682]: INFO:
   squid:stop_squid:318:  try to stop by SIGKILL: -
   Jul 25 11:38:48 corsen-a Squid(Proxy)[24659]: [25682]: INFO:
   squid:stop_squid:318:  try to stop by SIGKILL: -
   
   Being on a deadline, I'll use the lsb script for the moment. If
   someone figures out how to use this ocf script, I'm very interrested.
   

Did you try to use the current version of the script?

It very much looks like you miss out on this fix:

commit cbf70945f162aa296dacfc07817f1764a76e412e
Author: Dejan Muhamedagic de...@suse.de
Date:   Mon Oct 1 12:43:29 2012 +0200

Medium: Squid: fix getting PIDs of squid processes (lf#2653)

See
https://github.com/ClusterLabs/resource-agents/commit/cbf70945f162aa296dacfc07817f1764a76e412e

(and some other fixes that come later!)

Fixed! The problem comes from the squid ocf script
(/usr/lib/ocf/resource.d/heartbeat/Squid) that doesn't handle IPv6
addresses correctly.
All you have to do is modify the line 198 as such :
awk '/(tcp.*[0-9]+\.[0-9]+\.+[0-9]+\.[0-9]+:'$SQUID_PORT'
|tcp.*:::'$SQUID_PORT' )/{

This is supposed to be fixed as well
in the current version of that script...

 Yes. If somebody opens a bugzilla at LF
 (https://developerbugs.linuxfoundation.org/) or an issue at
 https://github.com/ClusterLabs/resource-agents somebody
 (hopefully the author) will take care of it.

As I wrote, I think both of these are already fixed.

Please use resource-agents v3.9.5.

Lars

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Trouble with ocf:Squid resource agent

2013-02-08 Thread Lars Ellenberg
On Fri, Feb 08, 2013 at 11:21:15AM +0100, Lars Ellenberg wrote:
 On Mon, Aug 13, 2012 at 02:07:46PM +0200, Dejan Muhamedagic wrote:

Apologies, I did not look at the date of the post.
For some reason it appeared as first unread, and I assumed it was
recent. D'oh.

 :-)

 Please use resource-agents v3.9.5.

Lars


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] unable to load drbd module

2012-09-17 Thread Lars Ellenberg
On Thu, Aug 30, 2012 at 10:09:09AM -0700, ROBERTO GUERERO wrote:
 hi guys,
 I was following the pacemaker 1.1 pdf to set up HA on my CentOS 6.3. All
 went ok until i reached storage with drbd: after following the
 instructions and trying to modprobe drbd, an error showed up saying
 "cannot allocate memory".

 Kindly advise on how to fix this issue.


You likely are using a drbd module compiled against RHEL 6.2 kernel
headers on a RHEL 6.3 kernel, and you are 32bit.

Does not work, but be happy that you did not try it the other way
around: trying to modprobe a drbd compiled against 6.3  headers
on a 32bit 6.2 kernel will panic the box...

They pretend to have a stable kABI,
but still they break occasionally.

At least they try harder to keep that kABI stable within a sub-release.


Sorry, but there is not much we can do.

Please install a kmod-drbd that matches your kernel,
or compile it yourself against matching kernel headers.
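To see what you actually have (sketch; the package name of a matching
module differs per repository):

    uname -r                      # the kernel you are running
    modinfo drbd | grep vermagic  # the kernel the module was built for
    # then install or rebuild a drbd module built for exactly that kernel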


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] drbd under pacemaker - always get split brain

2012-07-12 Thread Lars Ellenberg
On Wed, Jul 11, 2012 at 11:38:52AM +0200, Nikola Ciprich wrote:
  Well, I'd expect that to be safer as your current configuration ...
  discard-zero-changes will never overwrite data automatically  have
  you tried adding the start-delay to DRBD start operation? I'm curious if
  that is already sufficient for your problem.
 Hi,
 
 tried 
 <op id="drbd-sas0-start-0" interval="0" name="start" start-delay="10s" 
 timeout="240s"/>
 (I hope it's the setting You've meant, although I'm not sure, I haven't found 
 any documentation
 on start-delay option)
 
 but didn't help..

Of course not.


Your Problem is this:

DRBD config:
   allow-two-primaries,
   but *NO* fencing policy,
   and *NO* fencing handler.

And, as if that was not bad enough already,
Pacemaker config:
no-quorum-policy=ignore \
stonith-enabled=false

D'oh.

And then, well,
your nodes come up some minute+ after each other,
and Pacemaker and DRBD behave exactly as configured:


Jul 10 06:00:12 vmnci20 crmd: [3569]: info: do_state_transition: All 1 cluster 
nodes are eligible to run resources.


Note the *1* ...

So it starts:
Jul 10 06:00:12 vmnci20 pengine: [3568]: notice: LogActions: Start   
drbd-sas0:0(vmnci20)

But leaves:
Jul 10 06:00:12 vmnci20 pengine: [3568]: notice: LogActions: Leave   
drbd-sas0:1(Stopped)
as there is no peer node yet.


And on the next iteration, we still have only one node:
Jul 10 06:00:15 vmnci20 crmd: [3569]: info: do_state_transition: All 1 cluster 
nodes are eligible to run resources.

So we promote:
Jul 10 06:00:15 vmnci20 pengine: [3568]: notice: LogActions: Promote 
drbd-sas0:0    (Slave -> Master vmnci20)


And only some minute later, the peer node joins:
Jul 10 06:01:33 vmnci20 crmd: [3569]: info: do_state_transition: State 
transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED 
cause=C_FSA_INTERNAL origin=check_join_state ]
Jul 10 06:01:33 vmnci20 crmd: [3569]: info: do_state_transition: All 2 cluster 
nodes responded to the join offer.

So now we can start the peer:

Jul 10 06:01:33 vmnci20 pengine: [3568]: notice: LogActions: Leave   
drbd-sas0:0(Master vmnci20)
Jul 10 06:01:33 vmnci20 pengine: [3568]: notice: LogActions: Start   
drbd-sas0:1(vmnci21)


And it even is promoted right away:
Jul 10 06:01:36 vmnci20 pengine: [3568]: notice: LogActions: Promote 
drbd-sas0:1    (Slave -> Master vmnci21)

And within those 3 seconds, DRBD was not able to establish the connection yet.


You configured DRBD and Pacemaker to produce data divergence.
Not surprisingly, that is exactly what you get.



Fix your Problem.
See above; hint: fencing resource-and-stonith,
crm-fence-peer.sh + stonith_admin,
add stonith, maybe add a third node so you don't need to ignore quorum,
...

And all will be well.
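For reference, the DRBD side of that hint usually looks like this
(handler paths as shipped with the drbd packages; double check section
placement against the drbd.conf man page of your DRBD version):

    resource <resource-name> {
        disk {
            fencing resource-and-stonith;
        }
        handlers {
            fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
        # ... the rest of your existing resource definition ...
    }

plus stonith actually enabled, configured and tested on the Pacemaker side.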



-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] drbd under pacemaker - always get split brain

2012-07-12 Thread Lars Ellenberg
On Thu, Jul 12, 2012 at 04:23:51PM +0200, Nikola Ciprich wrote:
 Hello Lars,
 
 thanks for Your reply..
 
  You Problem is this:
  
  DRBD config:
 allow-two-primaries,
 but *NO* fencing policy,
 and *NO* fencing handler.
  
  And, as if that was not bad enough already,
  Pacemaker config:
  no-quorum-policy=ignore \
  stonith-enabled=false
 
 yes, I've written it's just test cluster on virtual machines. therefore no 
 fencing devices.
 
 however I don't think it's the whole problem source, I've tried starting 
 node2 much later
 after node1 (actually node1 has been running for about 1 day), and got right 
 into same situation..
 pacemaker just doesn't wait long enough before the drbds can connect at all 
 and seems to promote them both.
 it really seems to be regression to me, as this was always working well...

It is not.

Pacemaker may just be quicker to promote now,
or in your setup other things may have changed
which also changed the timing behaviour.

But what you are trying to do has always been broken,
and will always be broken.

 even though I've set no-quorum-policy to freeze, the problem returns as soon 
 as cluster becomes quorate..
 I have all split-brain and fencing scripts in drbd disabled intentionaly so I 
 had chance to investigate, otherwise
 one of the nodes always commited suicide but there should be no reason for 
 split brain..

Right.

That's why shooting, as in stonith, is not good enough a fencing
mechanism in a drbd dual-Primary cluster. You also need to tell the peer
that it is outdated, or rather that it must not become Primary or Master
until it has synced up (or at least *starts* to sync up).

You can do that using crm-fence-peer.sh (it does not actually tell
DRBD that it is outdated, but it tells Pacemaker to not promote that
other node, which is even better, if the rest of the system is properly set up).

crm-fence-peer.sh alone is also not good enough in certain situations.
That's why you need both, the drbd fence-peer mechanism *and* stonith.

 
 cheers!
 
 nik
 
 
 
 
  D'oh.
  
  And then, well,
  your nodes come up some minute+ after each other,
  and Pacemaker and DRBD behave exactly as configured:
  
  
  Jul 10 06:00:12 vmnci20 crmd: [3569]: info: do_state_transition: All 1 
  cluster nodes are eligible to run resources.
  
  
  Note the *1* ...
  
  So it starts:
  Jul 10 06:00:12 vmnci20 pengine: [3568]: notice: LogActions: Start   
  drbd-sas0:0(vmnci20)
  
  But leaves:
  Jul 10 06:00:12 vmnci20 pengine: [3568]: notice: LogActions: Leave   
  drbd-sas0:1(Stopped)
  as there is no peer node yet.
  
  
  And on the next iteration, we still have only one node:
  Jul 10 06:00:15 vmnci20 crmd: [3569]: info: do_state_transition: All 1 
  cluster nodes are eligible to run resources.
  
  So we promote:
  Jul 10 06:00:15 vmnci20 pengine: [3568]: notice: LogActions: Promote 
  drbd-sas0:0(Slave - Master vmnci20)
  
  
  And only some minute later, the peer node joins:
  Jul 10 06:01:33 vmnci20 crmd: [3569]: info: do_state_transition: State 
  transition S_INTEGRATION - S_FINALIZE_JOIN [ input=I_INTEGRATED 
  cause=C_FSA_INTERNAL origin=check_join_state ]
  Jul 10 06:01:33 vmnci20 crmd: [3569]: info: do_state_transition: All 2 
  cluster nodes responded to the join offer.
  
  So now we can start the peer:
  
  Jul 10 06:01:33 vmnci20 pengine: [3568]: notice: LogActions: Leave   
  drbd-sas0:0(Master vmnci20)
  Jul 10 06:01:33 vmnci20 pengine: [3568]: notice: LogActions: Start   
  drbd-sas0:1(vmnci21)
  
  
  And it even is promoted right away:
  Jul 10 06:01:36 vmnci20 pengine: [3568]: notice: LogActions: Promote 
  drbd-sas0:1(Slave - Master vmnci21)
  
  And within those 3 seconds, DRBD was not able to establish the connection 
  yet.
  
  
  You configured DRBD and Pacemaker to produce data divergence.
  Not suprisingly, that is exactly what you get.
  
  
  
  Fix your Problem.
  See above; hint: fencing resource-and-stonith,
  crm-fence-peer.sh + stonith_admin,
  add stonith, maybe add a third node so you don't need to ignore quorum,
  ...
  
  And all will be well.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Two slave nodes, neither will promote to Master

2012-07-03 Thread Lars Ellenberg
On Mon, Jun 25, 2012 at 04:48:50PM +0100, Regendoerp, Achim wrote:
 Hi,
 
 I'm currently looking at two VMs which are supposed to mount a drive in
 a given directory, depending on who's the master. This was decided above
 me, therefore no DRBD stuff (which would've made things easier), but
 still using corosync/pacemaker to do the cluster work.
 
 As it is currently, both nodes are online and configured, but none are
 switching to Master. In lack of a DRBD resource, I tried using the Dummy
 Pacemaker. If that's not the correct RA, please enlighten me on this too.


As has been stated already, to simulate a Stateful resource use the
ocf:pacemaker:Stateful agent.

But... iiuc, you are using a shared disk.

Why would you want that dummy resource at all?
Why not simply this:

 Below's the current config:
 
 node NODE01 \
 attributes standby=off
 node NODE02 \
 attributes standby=off
 primitive clusterIP ocf:heartbeat:IPaddr2 \
 params ip=10.64.96.31 nic=eth1:1 \
 op monitor on-fail=restart interval=5s
 primitive clusterIParp ocf:heartbeat:SendArp \
 params ip=10.64.96.31 nic=eth1:1
 primitive fs_nfs ocf:heartbeat:Filesystem \
 params device=/dev/vg_shared/lv_nfs_01 directory=/shared
 fstype=ext4 \
 op start interval=0 timeout=240 \
 op stop interval=0 timeout=240 on-fail=restart

delete that:
- primitive ms_dummy ocf:pacemaker:Dummy \
- op start interval=0 timeout=240 \
- op stop interval=0 timeout=240 \
- op monitor interval=15 role=Master timeout=240 \
- op monitor interval=30 role=Slave on-fail=restart timeout-240

 primitive nfs_share ocf:heartbeat:nfsserver \
 params nfs_ip=10.64.96.31 nfs_init_script=/etc/init.d/nfs
 nfs_shared_infodir=/shared/nfs nfs_notify_cmd=/sbin/rpc.statd \
 op start interval=0 timeout=240 \
 op stop interval=0 timeout=240 on-fail=restart
 group Services clusterIP clusterIParp fs_nfs nfs_share \
 meta target-role=Started is-managed=true
 multiple-active=stop_start

and that:
- ms ms_nfs ms_dummy \
- meta target-role=Master master-max=1 master-node=1 
clone-max=2 clone-node-max=1 notify=true

and that:
- colocation services_on_master inf: Services ms_nfs:Master
- order fs_before_services inf: ms_nfs:promote Services:start

 property $id=cib-bootstrap-options \
 dc-version=1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558 \
 cluster-infrastructure=openais \
 expected-quorum-votes=2 \
 no-quorum-policy=ignore \
 stonith-enabled=false
 rsc_defaults $id=rsc-options \
 resource-stickiness=200

That's all you need for a shared disk cluster.

Well. Almost.
Of course you have to configure, enable, test and use stonith.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Why Did Pacemaker Restart this VirtualDomain Resource?

2012-06-19 Thread Lars Ellenberg
On Tue, Jun 19, 2012 at 11:12:46AM -0500, Andrew Martin wrote:
 Hi Emmanuel, 
 
 
 Thanks for the idea. I looked through the rest of the log and these
 return code 8 errors on the ocf:linbit:drbd resources are occurring
 at other intervals (e.g. today) when the VirtualDomain resource is
 unaffected. This seems to indicate that these soft errors do not

No soft error here.
Monitor exit code 8 is OCF_RUNNING_MASTER:
expected, and healthy.

Lars

 trigger a restart of the VirtualDomain resource. Is there anything
 else in the log that could indicate what caused this, or is there
 somewhere else I can look? 
 
 
 Thanks, 
 
 
 Andrew 
 
 - Original Message -
 
 From: emmanuel segura  emi2f...@gmail.com  
 To: The Pacemaker cluster resource manager  pacemaker@oss.clusterlabs.org 
  
 Sent: Tuesday, June 19, 2012 9:57:19 AM 
 Subject: Re: [Pacemaker] Why Did Pacemaker Restart this VirtualDomain 
 Resource? 
 
 I didn't see any error in your config, the only thing i seen it's this 
 == 
 Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: rsc:p_drbd_vmstore:0 
 monitor[55] (pid 12323) 
 Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount2:0 monitor[53] 
 (pid 12324) 
 Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: operation monitor[55] on 
 p_drbd_vmstore:0 for client 3856: pid 12323 exited with return code 8 
 Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: operation monitor[53] on 
 p_drbd_mount2:0 for client 3856: pid 12324 exited with return code 8 
 Jun 14 15:35:31 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount1:0 monitor[54] 
 (pid 12396) 
 = 
 it can be a drbd problem, but i tell you the true i'm not sure 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Why Did Pacemaker Restart this VirtualDomain Resource?

2012-06-19 Thread Lars Ellenberg
On Tue, Jun 19, 2012 at 09:38:50AM -0500, Andrew Martin wrote:
 Hello, 
 
 
 I have a 3 node Pacemaker+Heartbeat cluster (two real nodes and one standby 
 quorum node) with Ubuntu 10.04 LTS on the nodes and using the 
 Pacemaker+Heartbeat packages from the Ubuntu HA Team PPA ( 
 https://launchpad.net/~ubuntu-ha-maintainers/+archive/ppa ). I have 
 configured 3 DRBD resources, a filesystem mount, and a KVM-based virtual 
 machine (using the VirtualDomain resource). I have constraints in place so 
 that the DRBD devices must become primary and the filesystem must be mounted 
 before the VM can start: 

 location loc_run_on_most_connected g_vm \ 
 rule $id=loc_run_on_most_connected-rule p_ping: defined p_ping 

This is the rule

 This has been working well, however last week Pacemaker all of a
 sudden stopped the p_vm_myvm resource and then started it up again. I
 have attached the relevant section of /var/log/daemon.log - I am
 unable to determine what caused Pacemaker to restart this resource.
 Based on the log, could you tell me what event triggered this? 
 
 
 Thanks, 
 
 
 Andrew 

 Jun 14 15:25:00 vmhost1 lrmd: [3853]: info: rsc:p_sysadmin_notify:0 
 monitor[18] (pid 3661)
 Jun 14 15:25:00 vmhost1 lrmd: [3853]: info: operation monitor[18] on 
 p_sysadmin_notify:0 for client 3856: pid 3661 exited with return code 0
 Jun 14 15:26:42 vmhost1 cib: [3852]: info: cib_stats: Processed 219 
 operations (182.00us average, 0% utilization) in the last 10min
 Jun 14 15:32:43 vmhost1 lrmd: [3853]: info: operation monitor[22] on p_ping:0 
 for client 3856: pid 10059 exited with return code 0
 Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: rsc:p_drbd_vmstore:0 monitor[55] 
 (pid 12323)
 Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount2:0 monitor[53] 
 (pid 12324)
 Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: operation monitor[55] on 
 p_drbd_vmstore:0 for client 3856: pid 12323 exited with return code 8
 Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: operation monitor[53] on 
 p_drbd_mount2:0 for client 3856: pid 12324 exited with return code 8
 Jun 14 15:35:31 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount1:0 monitor[54] 
 (pid 12396)
 Jun 14 15:35:31 vmhost1 lrmd: [3853]: info: operation monitor[54] on 
 p_drbd_mount1:0 for client 3856: pid 12396 exited with return code 8
 Jun 14 15:36:42 vmhost1 cib: [3852]: info: cib_stats: Processed 220 
 operations (272.00us average, 0% utilization) in the last 10min
 Jun 14 15:37:34 vmhost1 lrmd: [3853]: info: rsc:p_vm_myvm monitor[57] (pid 
 14061)
 Jun 14 15:37:34 vmhost1 lrmd: [3853]: info: operation monitor[57] on 
 p_vm_myvm for client 3856: pid 14061 exited with return code 0

 Jun 14 15:42:35 vmhost1 attrd: [3855]: notice: attrd_trigger_update: Sending 
 flush op to all hosts for: p_ping (1000)
 Jun 14 15:42:35 vmhost1 attrd: [3855]: notice: attrd_perform_update: Sent 
 update 163: p_ping=1000

And here the score on the location constraint changes for this node.

You asked for "run on most connected", and your pingd resource
determined that the other node was better connected.


 Jun 14 15:42:36 vmhost1 crmd: [3856]: info: do_lrm_rsc_op: Performing 
 key=136:2351:0:7f6d66f7-cfe5-4820-8289-0e47d8c9102b op=p_vm_myvm_stop_0 )
 Jun 14 15:42:36 vmhost1 lrmd: [3853]: info: rsc:p_vm_myvm stop[58] (pid 18174)

...

 Jun 14 15:43:32 vmhost1 attrd: [3855]: notice: attrd_trigger_update: Sending 
 flush op to all hosts for: p_ping (2000)
 Jun 14 15:43:32 vmhost1 attrd: [3855]: notice: attrd_perform_update: Sent 
 update 165: p_ping=2000

And there it is back on 2000 again ...

Lars

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Changing name/location of resource script

2012-06-12 Thread Lars Ellenberg
On Tue, Jun 12, 2012 at 08:52:27AM -0700, Walter Feddern wrote:
 I have a 4 node cluster running about 120 tomcat resources. Currently they 
 are using the stock tomcat resource script ( ocf:heartbeat:tomcat )
 
 As I may need to make some adjustments to the script for our environment, I 
 would like to move it out of the heartbeat directory. I have created a 
 directory 'custom', and can edit the resource manually using:
 
 crm configure edit tomcat_rsc1
 
 then making the change using 'vi'
 
 As I have to make the change to 120 resources, I would like find a way
 to automate it a bit more, but have not been able to find an easy way
 to make the change on the command line.

crm configure edit, then :%s///
... but wait ...
crm configure help filter

careful, that one is a bit tricky to get right.
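Untested sketch of both variants (assuming you installed your copies as
/usr/lib/ocf/resource.d/custom/tomcat, so that they are addressed as
ocf:custom:tomcat):

    # interactively, inside the editor started by "crm configure edit":
    :%s/ocf:heartbeat:tomcat/ocf:custom:tomcat/g

    # or non-interactively, via the filter command:
    crm configure filter "sed 's/ocf:heartbeat:tomcat/ocf:custom:tomcat/'"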

 
 Any suggestions?
 
 Thanks,
 Walter.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Announce: pcs / pcs-gui (Pacemaker/Corosync Configuration System)

2012-06-08 Thread Lars Ellenberg
On Wed, Jun 06, 2012 at 07:22:47PM +0200, Rasto Levrinc wrote:
 On Wed, Jun 6, 2012 at 4:45 PM, Lars Ellenberg
 lars.ellenb...@linbit.com wrote:
  On Tue, Jun 05, 2012 at 05:15:04PM +0200, Rasto Levrinc wrote:
  On Tue, Jun 5, 2012 at 1:27 PM, Lars Marowsky-Bree l...@suse.com wrote:
   On 2012-06-05T09:43:09, Andrew Beekhof and...@beekhof.net wrote:
  
   Every argument made so far applies equally to HAWK and the Linbit GUI,
   yet there was no outcry when they were announced.
  
   No, like I said above, that did suck - but the architecture truly is
   different and drbd-mc just wasn't the right answer for customers who
   wanted a HTML-only frontend. Besides, this is not an outcry. An outcry
   is revoking people's mailing list privileges and posting angry blogs.
   ;-)
 
  Ok, I see the point of both sides, so I will not join the outcry. :)
 
  Just for the record, the drbd mc / lcmc as an applet and a little bit
  backend could look like a web application, only better.
 
  ... once it is cleaned up to not try to use up a couple GB of RAM and
  loop in the GC, while the typical default browser plugin JVM settings
  allow for a handful of MB, max ...  that cleanup may be useful anyways.
 
 I haven't seen such behavior and I don't know your configuration, so
 thanks for the bug-report, I guess. :)

To be fair, that was on a slow 32bit windows xp in an old IE with
probably old-ish java [*], and a default memory setting for the plugin JVM
of (I think) 64M. The config was very simple at that point: two nodes,
two DRBD resources, one iSCSI target with a LUN and an IP each, done from
the crm shell. After some time things became visible,
but once you started to do something, it would start garbage collecting
and never become responsive again.

Once we started a standalone java, and adjusted the memory parameters
to allow for 500 or 800 or so MB, it became usable.

[*] so it may have been only the old java, even. who knows.

I did not try to reproduce it yet in any way.  But still, even on very
simple configurations, the memory consumption of LCMC can be excessive,
for whatever reason.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Announce: pcs / pcs-gui (Pacemaker/Corosync Configuration System)

2012-06-06 Thread Lars Ellenberg
On Tue, Jun 05, 2012 at 05:15:04PM +0200, Rasto Levrinc wrote:
 On Tue, Jun 5, 2012 at 1:27 PM, Lars Marowsky-Bree l...@suse.com wrote:
  On 2012-06-05T09:43:09, Andrew Beekhof and...@beekhof.net wrote:
 
  Every argument made so far applies equally to HAWK and the Linbit GUI,
  yet there was no outcry when they were announced.
 
  No, like I said above, that did suck - but the architecture truly is
  different and drbd-mc just wasn't the right answer for customers who
  wanted a HTML-only frontend. Besides, this is not an outcry. An outcry
  is revoking people's mailing list privileges and posting angry blogs.
  ;-)
 
 Ok, I see the point of both sides, so I will not join the outcry. :)
 
 Just for the record, the drbd mc / lcmc as an applet and a little bit
 backend could look like a web application, only better.

... once it is cleaned up to not try to use up a couple GB of RAM and
loop in the GC, while the typical default browser plugin JVM settings
allow for a handful of MB, max ...  that cleanup may be useful anyways.

 ;)

I still like LCMC.

Lars

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)

2012-05-25 Thread Lars Ellenberg
On Fri, May 25, 2012 at 10:50:25AM +1000, Andrew Beekhof wrote:
 On Fri, May 25, 2012 at 10:04 AM, Lars Ellenberg
 lars.ellenb...@linbit.com wrote:
  On Sun, May 06, 2012 at 09:45:09PM +1000, Andrew Beekhof wrote:
  On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg
  lars.ellenb...@linbit.com wrote:
  
   People sometimes think they have a use case
   for influencing which node will be the DC.
 
  Agreed :-)
 
  
   Sometimes it is latency (certain cli commands work faster
   when done on the DC),
 
  Config changes can be run against any node, there is no reason to go
  to the one on the DC.
 
   sometimes they add a mostly quorum
   node which may be not quite up to the task of being DC.
 
  I'm not sure I buy that.  Most of the load would comes from the
  resources themselves.
 
   Prohibiting a node from becoming DC completely would
   mean it can not even be cleanly shutdown (with 1.0.x, no MCP),
   or act on its own resources for certain no-quorum policies.
  
   So here is a patch I have been asked to present for discussion,
 
  May one ask where it originated?
 
   against Pacemaker 1.0, that introduces a dc-prio configuration
   parameter, which will add some skew to the election algorithm.
  
  
   Open questions:
    * does it make sense at all?
 
  Doubtful :-)
 
  
    * election algorithm compatibility, stability:
     will the election be correct if some nodes have this patch,
     and some don't ?
 
  Unlikely, but you could easily make it so by placing it after the
  version check (and bumping said version in the patch)
 
    * How can it be improved so that a node with dc-prio=0 will
     give up its DC-role as soon as there is at least one other node
     with dc-prio > 0?
 
  Short of causing an election every time a node joins... I doubt it.
 
  Where would be a suitable place in the code/fsa to do so?
 
 Just after the call to exit(0) :)

Just what I thought ;-)

 I'd do it at the end of do_started() but only if dc-priority* > 0.
 That way you only cause an election if someone who is likely to win it starts.
 And people that don't enable this feature are unaffected.

 * Not "dc-prio", it's 2012, there's no need to save the extra 4 chars :-)

Thanks,

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Debug message granularity

2012-05-25 Thread Lars Ellenberg
On Wed, May 23, 2012 at 08:37:44AM +1000, Andrew Beekhof wrote:
 On Tue, May 22, 2012 at 9:51 PM, Ron Kerry rke...@sgi.com wrote:
  On 5/22/12 3:33 AM, Andrew Beekhof wrote:
 
  and I see nothing in
    pacemaker itself that gives me any separate controls over its logging
    verbosity.
 
  Which is why I mentioned:
 
  
    You should be able to define
    PCMK_trace_functions=nction1,function2,... as an environment
 
  There is also PCMK_trace_files.
  Depending on your version you may also be able to set
  PCMK_debug=crmd,pengine,... or send SIGUSR1 to the process to increase
  the log level
 
  
    variable to get additional information from just those functions.
    It might take a bit of searching through source code to find the
   ones
    you care about, but it is possible.
 
 
  Thanks! I actually have a couple of different versions I am dealing with. I
  will poke through the source for the newest one (SLES11 SP2 ... pacemaker
  1.1.6) I have and see what I can do. I actually do not have a specific
  problem I am tracking right now. I am just trying to develop a tool kit of
  things to do when one of our customers runs into resource issues.
 
 Makes sense.
 FYI: In future versions (1.1.8 onwards) sending SIGUSR1 to a process
 (or setting PCMK_blackbox) will enable a logging blackbox.
 This is a rolling buffer of all possible log messages (including debug
 and optionally traces) that can be dumped to a separate file by
 sending SIGTRAP.
 If enabled, we also dump it to a file when asserts are triggered.
 
 This provides easy access to copious amounts of debug for resolving
 issues without requiring rebuilds, restarts or needlessly spamming
 syslog.

/me dances a jig and a reel

Lars

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)

2012-05-25 Thread Lars Ellenberg
On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
 On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg
 lars.ellenb...@linbit.com wrote:
  Sorry, sent too early.
 
  That would not catch the case of cluster partitions joining,
  only the pacemaker startup with fully connected cluster communication
  already up.
 
  I thought about a dc-priority default of 100,
  and only triggering a re-election if I am DC,
  my dc-priority is < 50, and I see a node joining.
 
 Hardcoded arbitrary defaults aren't that much fun. You can use any
 number, but "100 is the magic threshold" is something I wouldn't want
 to explain to people over and over again.

Then don't ;-)

Not helping, and irrelevant to this case.

Besides that was an example.
Easily possible: move the "I want to lose" vs "I want to win"
magic number to be 0, and allow both positive and negative priorities.
You get to decide whether positive or negative is the "I'd rather lose"
side. Want to make that configurable as well? Right.

I don't think this can be made part of the cib configuration,
DC election takes place before cibs are resynced, so if you have
diverging cibs, you possibly end up with a never ending election?

Then maybe the election is stable enough,
even after this change to the algorithm.

But you'd need to add another trigger on "dc-priority in configuration
changed", complicating this stuff for no reason.

 We actually discussed node defaults a while back. Those would be
 similar to resource and op defaults which Pacemaker already has, and
 set defaults for node attributes for newly joined nodes. At the time
 the idea was to support putting new joiners in standby mode by
 default, so when you added a node in a symmetric cluster, you wouldn't
 need to be afraid that Pacemaker would shuffle resources around.[1]
 This dc-priority would be another possibly useful use case for this.

Not so sure about that.

 [1] Yes, semi-doable with putting the cluster into maintenance mode
 before firing up the new node, setting that node into standby, and
 then unsetting maintenance mode. But that's just an additional step
 that users can easily forget about.

Why not simply add the node to the cib, and set it to standby,
before it even joins for the first time.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)

2012-05-25 Thread Lars Ellenberg
On Fri, May 25, 2012 at 09:05:54PM +1000, Andrew Beekhof wrote:
 On Fri, May 25, 2012 at 7:48 PM, Florian Haas flor...@hastexo.com wrote:
  On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg
  lars.ellenb...@linbit.com wrote:
  On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
  On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg
  lars.ellenb...@linbit.com wrote:
    Sorry, sent too early.
  
   That would not catch the case of cluster partitions joining,
   only the pacemaker startup with fully connected cluster communication
   already up.
  
   I thought about a dc-priority default of 100,
   and only triggering a re-election if I am DC,
    my dc-priority is < 50, and I see a node joining.
 
  Hardcoded arbitrary defaults aren't that much fun. You can use any
   number, but "100 is the magic threshold" is something I wouldn't want
  to explain to people over and over again.
 
  Then don't ;-)
 
  Not helping, and irrelevant to this case.
 
  Besides that was an example.
   Easily possible: move the "I want to lose" vs "I want to win"
   magic number to be 0, and allow both positive and negative priorities.
   You get to decide whether positive or negative is the "I'd rather lose"
   side. Want to make that configurable as well? Right.
 
  Nope, 0 is used as a threshold value in Pacemaker all over the place.
  So allowing both positive and negative priorities and making 0 the
  default sounds perfectly sane to me.
 
  I don't think this can be made part of the cib configuration,
  DC election takes place before cibs are resynced, so if you have
  diverging cibs, you possibly end up with a never ending election?
 
  Then maybe the election is stable enough,
  even after this change to the algorithm.
 
  Andrew?
 
 This whole thread makes me want to hurt kittens.

Yep...

Sorry for that :(

Lars

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Adding a new node in standby.

2012-05-25 Thread Lars Ellenberg
On Fri, May 25, 2012 at 11:48:29AM +0200, Florian Haas wrote:
  We actually discussed node defaults a while back. Those would be
  similar to resource and op defaults which Pacemaker already has, and
  set defaults for node attributes for newly joined nodes. At the time
  the idea was to support putting new joiners in standby mode by
  default, so when you added a node in a symmetric cluster, you wouldn't
  need to be afraid that Pacemaker would shuffle resources around.[1]

  [1] Yes, semi-doable with putting the cluster into maintenance mode
  before firing up the new node, setting that node into standby, and
  then unsetting maintenance mode. But that's just an additional step
  that users can easily forget about.
 
  Why not simply add the node to the cib, and set it to standby,
  before it even joins for the first time.
 
 Haha, good one.
 
 Wait, you weren't joking?

Nope.  Works for me.
Not that I do that very often, but I did,
and it worked.
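
With a recent-ish crm shell that boils down to something like this
(sketch only, node name made up):

    crm configure node newnode attributes standby=on

That creates the node entry in the CIB with the standby attribute already
set, before the node ever starts its cluster stack.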

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] DRBD LVM EXT4 NFS performance

2012-05-24 Thread Lars Ellenberg
On Thu, May 24, 2012 at 03:34:51PM +0300, Dan Frincu wrote:
 Hi,
 
 On Mon, May 21, 2012 at 4:24 PM, Christoph Bartoschek bartosc...@gmx.de 
 wrote:
  Florian Haas wrote:
 
  Thus I would expect to have a write performance of about 100 MByte/s. But
  dd gives me only 20 MByte/s.
 
  dd if=/dev/zero of=bigfile.10G bs=8192  count=1310720
  1310720+0 records in
  1310720+0 records out
  10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s
 
  If you used that same dd invocation for your local test that allegedly
  produced 450 MB/s, you've probably been testing only your page cache.
  Add oflag=dsync or oflag=direct (the latter will only work locally, as
  NFS doesn't support O_DIRECT).
 
  If your RAID is one of reasonably contemporary SAS or SATA drives,
  then a sustained to-disk throughput of 450 MB/s would require about
  7-9 stripes in a RAID-0 or RAID-10 configuration. Is that what you've
  got? Or are you writing to SSDs?
 
  I used the same invocation with different filenames each time. To which page
  cache do you refer? To the one on the client or on the server side?
 
  We are using RAID-1 with 6 x 2 disks. I have repeated the local test 10
  times with different files in a row:
 
  for i in `seq 10`; do time dd if=/dev/zero of=bigfile.10G.$i bs=8192
  count=1310720; done
 
  The resulting values on a system that is also used by other programs as
  reported by dd are:
 
  515 MB/s, 480 MB/s, 340 MB/s, 338 MB/s, 360 MB/s, 284 MB/s, 311 MB/s, 320
  MB/s, 242 MB/s,  289 MB/s
 
  So I think that the system is capable of more than 200 MB/s which is way
  more than what can arrive over the network.
 
 A bit off-topic maybe.
 
 Whenever you do these kinds of tests regarding performance on disk
 (locally) to test actual speed and not some caching, as Florian said,
 you should use the oflag=direct option to dd and also echo 3 >
 /proc/sys/vm/drop_caches and sync.
 

You should sync before you drop caches,
or you won't drop those caches that have been dirty at that time.
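
I.e. rather something like:

    sync && echo 3 > /proc/sys/vm/drop_caches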

 I usually use echo 3 > /proc/sys/vm/drop_caches && sync && date &&
 time dd if=/dev/zero of=whatever bs=1G count=x oflag=direct && sync &&
 date
 
 You can assess if there is data being flushed if the results given by
 dd differ from those obtained by calculating the amount of data
 written between the two date calls. It also helps to push more data
 than the controller can store.

Also, dd is doing one bs-sized chunk at a time.

fio with appropriate options can be more useful,
once you have learned all those options, and how to interpret the results...
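
For example, a sequential direct-IO write test could look roughly like this
(all parameters are placeholders, tune them to your setup):

    fio --name=seqwrite --filename=/mnt/test/fio.tmp --rw=write \
        --bs=1M --size=10G --direct=1 --ioengine=libaio --iodepth=16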

 Regards,
 Dan
 
 
  I've done the measurements on the filesystem that sits on top of LVM and
  DRBD. Thus I think that DRBD is not a problem.
 
  However the strange thing is that I get 108 MB/s on the clients as soon as I
  disable the secondary node for DRBD. Maybe there is strange interaction
  between DRBD and NFS.

Dedicated replication link?

Maybe the additional latency is all that kills you.
Do you have non-volatile write cache on your IO backend?
Did you post your drbd configuration settings already?

 
  After reenabling the secondary node the DRBD synchronization is quite slow.
 
 
 
  Has anyone an idea what could cause such problems? I have no idea for
  further analysis.
 
  As a knee-jerk response, that might be the classic issue of NFS
  filling up the page cache until it hits the vm.dirty_ratio and then
  having a ton of stuff to write to disk, which the local I/O subsystem
  can't cope with.
 
  Sounds reasonable but shouldn't the I/O subsystem be capable to write
  anything away that arrives?
 
  Christoph

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)

2012-05-24 Thread Lars Ellenberg
On Sun, May 06, 2012 at 09:45:09PM +1000, Andrew Beekhof wrote:
 On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg
 lars.ellenb...@linbit.com wrote:
 
  People sometimes think they have a use case
  for influencing which node will be the DC.
 
 Agreed :-)
 
 
  Sometimes it is latency (certain cli commands work faster
  when done on the DC),
 
 Config changes can be run against any node, there is no reason to go
 to the one on the DC.
 
  sometimes they add a mostly quorum
  node which may be not quite up to the task of being DC.
 
 I'm not sure I buy that.  Most of the load would comes from the
 resources themselves.
 
  Prohibiting a node from becoming DC completely would
  mean it can not even be cleanly shutdown (with 1.0.x, no MCP),
  or act on its own resources for certain no-quorum policies.
 
  So here is a patch I have been asked to present for discussion,
 
 May one ask where it originated?
 
  against Pacemaker 1.0, that introduces a dc-prio configuration
  parameter, which will add some skew to the election algorithm.
 
 
  Open questions:
   * does it make sense at all?
 
 Doubtful :-)
 
 
   * election algorithm compatibility, stability:
    will the election be correct if some nodes have this patch,
    and some don't ?
 
 Unlikely, but you could easily make it so by placing it after the
 version check (and bumping said version in the patch)
 
   * How can it be improved so that a node with dc-prio=0 will
    give up its DC-role as soon as there is at least one other node
     with dc-prio > 0?
 
 Short of causing an election every time a node joins... I doubt it.

Where would be a suitable place in the code/fsa to do so?

Thanks,

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)

2012-05-03 Thread Lars Ellenberg
, local_handle, use_mgmtd, value, no);
 if(ais_get_boolean(value) == FALSE) {
int lpc = 0;
@@ -584,6 +587,7 @@
 pcmk_env.logfile  = NULL;
 pcmk_env.use_logd = "false";
 pcmk_env.syslog   = "daemon";
+pcmk_env.dc_prio  = "1";
 
 if(cs_uid != root_uid) {
ais_err(Corosync must be configured to start as 'root',
--- ./lib/ais/utils.c.orig  2011-05-11 11:27:08.460183200 +0200
+++ ./lib/ais/utils.c   2011-05-11 17:29:09.182064800 +0200
@@ -171,6 +171,7 @@
setenv("HA_logfacility", pcmk_env.syslog,   1);
setenv("HA_LOGFACILITY", pcmk_env.syslog,   1);
setenv("HA_use_logd",    pcmk_env.use_logd, 1);
+   setenv("HA_dc_prio",     pcmk_env.dc_prio,  1);
if(pcmk_env.logfile) {
    setenv("HA_debugfile", pcmk_env.logfile, 1);
}
--- ./lib/ais/utils.h.orig  2011-05-11 11:26:12.757414700 +0200
+++ ./lib/ais/utils.h   2011-05-11 17:36:34.194841700 +0200
@@ -226,6 +226,7 @@
const char *syslog;
const char *logfile;
const char *use_logd;
+   const char *dc_prio;
 };
 
 extern struct pcmk_env_s pcmk_env;



-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] ERROR: te_graph_trigger: Transition failed: terminated pacemaker's problem or mine?

2012-04-30 Thread Lars Ellenberg
On Mon, Apr 30, 2012 at 01:00:11PM +1000, Andrew Beekhof wrote:
 On Sat, Apr 28, 2012 at 5:40 AM, Lars Ellenberg
 lars.ellenb...@linbit.com wrote:
  On Fri, Apr 27, 2012 at 11:31:23AM +0100, Tim Small wrote:
  Hi,
 
  I'm trying to get to the bottom of a problem I'm seeing with a cluster.
  At this stage I'm unclear as to whether the issue is with the config or
  not - the generated error messages seem unclear.  So I'm not sure
  whether I should be staring at the config or the source code at this
  point, and would appreciate a clue!
 
  I'm running with some of the (live) resources in an unmanaged state
  whilst testing fail-over with other (non-dependant) resources.
 
  The managed resources are a number of OpenVZ virtual machines (each
  comprising 3 primitives - file-system + OpenVZ VE + SendArp).  The
  filesystems are on LVM volume groups, and the single LVM PV for each
  volume group resides on a DRBD volume.  There are n virtual machines per
  DRBD volume.
 
  I'm running pacemaker 1.0.9.1+hg15626-1 on Debian 6.0.  Here are some of
  the messages (configuration follows at the end of the email):
 
  Upgrading to 1.0.12, or 1.1.7, may get you a little further.
  It would not solve the "I need to stop that resource first, but I can
  not as it is unmanaged" dependency problem you apparently have here.
 
 There's really not a lot the cluster can do in this situation, there's
 a 50% chance of getting it wrong no matter what we do.
 In the most recent versions we now log as loudly as possible
 (LOG_CRIT) that we cant shutdown because something depends on an
 unmanaged resource.

That's in fact what I meant ;-)

Not only the cryptic "ERROR: te_graph_trigger: Transition failed: terminated",
but "Hey you fool, I cannot do that because you told me not to manage
that resource, but the other ones depend on it."

Though, you still have to spot that line in the flood...

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] ERROR: te_graph_trigger: Transition failed: terminated pacemaker's problem or mine?

2012-04-27 Thread Lars Ellenberg
-SendArp-with-athena-VE inf: athena-SendArp athena-VE
 colocation athena-VE-with-athena-FS inf: athena-VE athena-FS

 colocation calypso-FS-on-essex03-LVM inf: calypso-FS essex03-LVM

Ok, colo calypso with essex03... but then, why ...

 colocation calypso-SendArp-with-calypso-VE inf: calypso-SendArp calypso-VE
 colocation calypso-VE-with-calypso-FS inf: calypso-VE calypso-FS
 colocation epione-FS-on-essex02-LVM inf: epione-FS essex02-LVM
 colocation epione-FS-with-essex02-LVM inf: epione-FS essex02-LVM
 colocation epione-SendArp-with-epione-VE inf: epione-SendArp epione-VE
 colocation epione-VE-with-epione-FS inf: epione-VE epione-FS
 colocation essex02-LVM-with-essex02-DRBD-Master inf: essex02-LVM
 ms-drbd-essex02:Master
 colocation essex03LVM-on-ms-drbd-essex03 inf: essex03-LVM
 ms-drbd-essex03:Master
 colocation essextest-FS-with-essex02-LVM inf: essextest-FS essex02-LVM
 colocation essextest-SendArp-with-essextest-VE inf: essextest-SendArp
 essextest-VE
 colocation essextest-VE-with-essextest-FS inf: essextest-VE essextest-FS
 order artemis-FS-before-artemis-VE inf: artemis-FS artemis-VE
 order artemis-VE-before-artemis-SendArp inf: artemis-VE artemis-SendArp
 order athena-FS-before-athena-VE inf: athena-FS athena-VE
 order athena-VE-before-athena-SendArp inf: athena-VE athena-SendArp
 order calypso-FS-before-calypso-VE inf: calypso-FS calypso-VE
 order calypso-VE-before-calypso-SendArp inf: calypso-VE calypso-SendArp
 order epione-FS-before-epione-VE inf: epione-FS epione-VE
 order epione-VE-before-epione-SendArp inf: epione-VE epione-SendArp
 order essex02-lvm-before-artemis-FS inf: essex02-LVM artemis-FS
 order essex02-lvm-before-athena-FS inf: essex02-LVM athena-FS

 order essex02-lvm-before-calypso-FS inf: essex02-LVM calypso-FS

Order essex02 with calypso? Typo? Is this supposed to be essex03?

 order essex02-lvm-before-epione-FS inf: essex02-LVM epione-FS
 order essex02-lvm-before-essextest-FS inf: essex02-LVM essextest-FS
 order essextest-FS-before-essextest-VE inf: essextest-FS essextest-VE
 order essextest-VE-before-essextest-SendArp inf: essextest-VE
 essextest-SendArp
 order ms-drbd-essex02-before-lvm inf: ms-drbd-essex02:promote
 essex02-LVM:start
 order ms-drbd-essex03-before-lvm inf: ms-drbd-essex03:promote
 essex03-LVM:start
 property $id=cib-bootstrap-options \
 dc-version=1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b \
 cluster-infrastructure=openais \
 expected-quorum-votes=2 \
 no-quorum-policy=ignore \
 stonith-enabled=false \
 last-lrm-refresh=1335487560
 
 
 # crm configure verify
 WARNING: artemis-FS: default timeout 20s for stop is smaller than the
 advised 60
 WARNING: artemis-VE: default timeout 20s for stop is smaller than the
 advised 75
 WARNING: athena-FS: default timeout 20s for stop is smaller than the
 advised 60
 WARNING: athena-VE: default timeout 20s for stop is smaller than the
 advised 75
 WARNING: calypso-FS: default timeout 20s for stop is smaller than the
 advised 60
 WARNING: calypso-VE: default timeout 20s for stop is smaller than the
 advised 75
 WARNING: epione-FS: default timeout 20s for stop is smaller than the
 advised 60
 WARNING: epione-VE: default timeout 20s for stop is smaller than the
 advised 75
 WARNING: essex02-LVM: default timeout 20s for stop is smaller than the
 advised 30
 WARNING: essex03-LVM: default timeout 20s for stop is smaller than the
 advised 30
 WARNING: essextest-FS: default timeout 20s for stop is smaller than the
 advised 60
 WARNING: essextest-VE: default timeout 20s for stop is smaller than the
 advised 75
 WARNING: essex02-DRBD: specified timeout 100s for start is smaller than
 the advised 240
 WARNING: essex02-DRBD: default timeout 20s for stop is smaller than the
 advised 100
 WARNING: essex03-DRBD: default timeout 20s for stop is smaller than the
 advised 100

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Sporadic problems of rejoin after split brain situation

2012-03-20 Thread Lars Ellenberg
: do_te_invoke: Processing graph 16 
 (ref=pe_calc-dc-1331858128-123) derived from /var/lib/pengine/pe-input-201.bz2
 Mar 16 01:35:28 oan1 pengine: [17673]: notice: process_pe_message: Transition 
 16: PEngine Input stored in: /var/lib/pengine/pe-input-201.bz2
 Mar 16 01:35:28 oan1 crmd: [7601]: info: run_graph: 
 
 Mar 16 01:35:28 oan1 crmd: [7601]: notice: run_graph: Transition 16 
 (Complete=0, 
 Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-
 201.bz2): Complete
 Mar 16 01:35:28 oan1 crmd: [7601]: info: te_graph_trigger: Transition 16 is 
 now 
 complete
 Mar 16 01:35:28 oan1 crmd: [7601]: info: notify_crmd: Transition 16 status: 
 done 
 - null
 Mar 16 01:35:28 oan1 crmd: [7601]: info: do_state_transition: State 
 transition 
 S_TRANSITION_ENGINE - S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL 
 origin=notify_crmd ]
 Mar 16 01:35:28 oan1 crmd: [7601]: info: do_state_transition: Starting 
 PEngine 
 Recheck Timer
 Mar 16 01:35:30 oan1 cib: [7597]: info: cib_process_diff: Diff 0.210.56 - 
 0.210.57 not applied to 0.210.50: current num_updates is less than required
 Mar 16 01:35:30 oan1 cib: [7597]: WARN: cib_server_process_diff: Not 
 requesting 
 full refresh in R/W mode
 Mar 16 01:35:30 oan1 ccm: [7596]: info: Break tie for 2 nodes cluster
 Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: Got an event 
 OC_EV_MS_INVALID from ccm
 Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: no mbr_track info
 Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: Got an event 
 OC_EV_MS_NEW_MEMBERSHIP from ccm
 Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: instance=14, 
 nodes=1, 
 new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
 Mar 16 01:35:30 oan1 cib: [7597]: info: cib_ccm_msg_callback: Processing CCM 
 event=NEW MEMBERSHIP (id=14)
 Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: Got an event 
 OC_EV_MS_INVALID from ccm
 Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: no mbr_track info
 Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: Got an event 
 OC_EV_MS_NEW_MEMBERSHIP from ccm
 Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: instance=14, 
 nodes=1, 
 new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
 Mar 16 01:35:30 oan1 crmd: [7601]: info: crmd_ccm_msg_callback: Quorum 
 (re)attained after event=NEW MEMBERSHIP (id=14)
 Mar 16 01:35:30 oan1 crmd: [7601]: info: ccm_event_detail: NEW MEMBERSHIP: 
 trans=14, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
 Mar 16 01:35:30 oan1 crmd: [7601]: info: ccm_event_detail:CURRENT: oan1 
 [nodeid=0, born=14]
 Mar 16 01:35:30 oan1 crmd: [7601]: info: populate_cib_nodes_ha: Requesting 
 the 
 list of configured nodes
 Mar 16 01:35:31 oan1 ccm: [7596]: info: Break tie for 2 nodes cluster
 Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: Got an event 
 OC_EV_MS_INVALID from ccm
 Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: no mbr_track info
 Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: Got an event 
 OC_EV_MS_NEW_MEMBERSHIP from ccm
 Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: instance=15, 
 nodes=1, 
 new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
 Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: Got an event 
 OC_EV_MS_INVALID from ccm
 Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: no mbr_track info
 Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: Got an event 
 OC_EV_MS_NEW_MEMBERSHIP from ccm
 Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: instance=15, 
 nodes=1, 
 new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
 Mar 16 01:35:31 oan1 cib: [7597]: info: cib_ccm_msg_callback: Processing CCM 
 event=NEW MEMBERSHIP (id=15)
 Mar 16 01:35:31 oan1 crmd: [7601]: info: crmd_ccm_msg_callback: Quorum 
 (re)attained after event=NEW MEMBERSHIP (id=15)
 Mar 16 01:35:31 oan1 crmd: [7601]: info: ccm_event_detail: NEW MEMBERSHIP: 
 trans=15, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
 Mar 16 01:35:31 oan1 crmd: [7601]: info: ccm_event_detail:CURRENT: oan1 
 [nodeid=0, born=15]
 Mar 16 01:35:31 oan1 cib: [7597]: info: cib_process_request: Operation 
 complete: 
 op cib_modify for section nodes (origin=local/crmd/205, version=0.210.51): ok 
 (rc=0)
 Mar 16 01:35:31 oan1 crmd: [7601]: info: populate_cib_nodes_ha: Requesting 
 the 
 list of configured nodes
 
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list

Re: [Pacemaker] Can Master/Slave resource transit from Master to Stopped directly ?

2012-03-07 Thread Lars Ellenberg
On Tue, Mar 06, 2012 at 08:18:28PM +0900, Takatoshi MATSUO wrote:
 Hi Dejan
 
 2012/3/6 Dejan Muhamedagic deja...@fastmail.fm:
  Hi,
 
  On Tue, Mar 06, 2012 at 01:15:45PM +0900, Takatoshi MATSUO wrote:
  Hi
 
  I want Pacemaker to transit from Master to Stopped directly on demote
  without failcount
  for managing PostgreSQL streaming replication.
  Can Pacemaker do this ?
 
  What the RA should do on demote is, well, demote an instance to
  slave. Why would you want to stop it?
 
 Because PostgreSQL cannot transit from Master to Slave.
 
  Of course, nothing's stopping you to that and I guess that pacemaker would 
  be able to
  deal with it eventually. But note that it'll expect the resource
  to be in the Started state after demote.
 
 It causes failing of monitor in spite of success of demote.
 
 
  I returned $OCF_NOT_RUNNING on demote as a trial,
  but it incremented a failcount.
 
  $OCF_NOT_RUNNING should be used only by the monitor operation.
  It'll count as error with other operations.
 
 get it.

Actually, Andrew told me on IRC about plans to support this:
<beekhof> oh, and start ops will be able to tell us a resource is master and
demote that its stopped
<beekhof> if thats something you feel inclined to take advantage

So, a start could then return $OCF_RUNNING_MASTER to indicate that it
went straight into Master mode, and a demote would be able to indicate
it went straight into Stopped state by returning $OCF_NOT_RUNNING.

No idea when that will be available or in which release.
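
If/when that lands, the demote of such an RA could become as simple as this
purely hypothetical sketch (function name made up, not from the real pgsql RA):

    demote() {
        # PostgreSQL cannot fall back to slave, so stop it completely ...
        pgsql_real_stop || return $OCF_ERR_GENERIC
        # ... and report that we went straight to Stopped
        # (today this would still be counted as a failed demote)
        return $OCF_NOT_RUNNING
    }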

Lars

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Surprisingly fast start of resources on cluster failover.

2012-03-07 Thread Lars Ellenberg
On Tue, Mar 06, 2012 at 01:49:11PM +0100, Florian Crouzat wrote:
 Hi,
 
 On a two nodes active/passive cluster, I placed a location
 constraint of 50 for #uname node1. As soon as applied, things moved
 from node2 to node1: right.
 I have a lsb init script defined as a resource:
 
 $ crm configure show firewall
 primitive firewall lsb:firewall\
 op monitor on-fail=restart interval=10s \
 op start interval=0 timeout=3min \
 op stop interval=0 timeout=1min \
 meta target-role=Started
 
 This lsb takes a long time to start, at least 55 seconds when fired
 from my shell over ssh.
 It logs a couple things to std{out,err}.

If "a couple things" actually happen to be a lot,
then having stdout/err on tty via ssh in xterm ...
can slow things down.

Did you also time it as
  time /etc/init.d/firewall >out.txt 2>err.txt

 So, while node1 was taking-over, I noticed in
 /var/log/pacemaker/lrmd.log that it only took 24 seconds to start
 that resource.

 My question: how comes pacemaker starts a resources twice as fast
 than I do from CLI ?

Other than above suggestion,
did you verify that it ends up doing the same thing
when started from pacemaker,
compared to when started by you from commandline?
Did you compare the results?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Failing to move around IPaddr2 resource

2012-02-24 Thread Lars Ellenberg
 configuration).

If you remove the primary address,
all secondary addresses are removed as well.

If you get concurrent stop of several such IPs, those that expect
the IP to be still there may fail. Basically that's a race condition.

If that is indeed your issue,
you can either assign one static IP on that nic, or
 sysctl -w net.ipv4.conf.all.promote_secondaries=1
(or per device).
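
To make that persist across reboots, something like the following should do
(path and mechanism differ per distro):

    echo "net.ipv4.conf.all.promote_secondaries = 1" >> /etc/sysctl.conf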

  When I manually go and cleanup the failed nodes, they get properly assigned
  to the nodes that aren't down, so if we can't resolve the underlying issue,
  is there a way to automatically attempt to cleanup failed resources a
  limited number of times?
 
 I don't think you want to start the IP somewhere else if its still
 active on the original node.
 
 
  My configuration is here, in case there's anything wrong with it.
 
 Looks like you forgot to attach it.
 
 
  Anlu
 
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Where is MAXMSG defined?

2012-02-07 Thread Lars Ellenberg
On Tue, Feb 07, 2012 at 01:13:19PM +0200, Adrian Fita wrote:
 Hi.
 
 I can't find any trace of define MAXMSG in either pacemaker,
 corosync, heartbeat's source code. I tried with grep -R 'MAXMSG' *
 and nothing. Where is it defined?!

If you are asking about what I think you do,
then that would be in glue,
include/clplumbing/ipc.h

But be careful, when fiddling with it.

What are you trying to solve, btw?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Proper way to migrate multistate resource?

2012-02-07 Thread Lars Ellenberg
On Mon, Feb 06, 2012 at 04:48:26PM -0800, Chet Burgess wrote:
 Greetings,
 
 I'm some what new to pacemaker and have been playing around with a
 number of configurations in a lab. Most recently I've been testing a
 multistate resource using the ofc:pacemaker:Stateful example RA.
 
 While I've gotten the agent to work and notice that if I shutdown or
 kill a node the resources migrate I can't seem to figure out the
 proper way to migrate the resource between nodes when they are both
 up. 
 
 For regular resources I've used crm resource migrate rsc without
 issue. However when I try this with a multistate resource it doesn't
 seem to work. When I run the command it just puts the slave node into
 a stopped state. If I try and tell it to migrate specifically to the
 slave node it claims to already be running their (which I suppose in a
 sense it is).

the crm shell does not support roles for the move or migrate command
(yet; maybe in newer versions. Dejan?).

What you need to do is set a location constraint on the role.
 * force master role off from one node:

location you-name-it resource-id \
rule $role=Master -inf: \
#uname eq node-where-it-should-be-slave

 * or force master role off from all but one node,
   note the double negation in this one:

location you-name-it resource-id \
rule $role=Master -inf: \
#uname ne node-where-it-should-be-master

Cheers,

Lars

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

 The only method I've found to safely and reliably migrate a multistate
 resource from one node to another is I think it has something to do
 with the resource constraints I used to prefer a particular node, but
 I'm not entirely sure how the constraints and the master/slave state
 updating stuff works.
 
 Am I using the wrong tool to migrate a multistate resource or is my
 configuration wrong in some way?  Any input greatly appreciated. 
 Thank you.
 
 
 Configuration:
 r...@tst3.local1.mc:/home/cfb$ crm configure show
 node tst3.local1.mc.metacloud.com
 node tst4.local1.mc.metacloud.com
 primitive stateful-test ocf:pacemaker:Stateful \
   op monitor interval=30s role=Slave \
   op monitor interval=31s role=Master
 ms ms-test stateful-test \
   meta clone-node-max=1 notify=false master-max=1 
 master-node-max=1 target-role=Master
 location ms-test_constraint_1 ms-test 25: tst3.local1.mc.metacloud.com
 location ms-test_constraint_2 ms-test 20: tst4.local1.mc.metacloud.com
 property $id=cib-bootstrap-options \
   cluster-infrastructure=openais \
   dc-version=1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f \
   last-lrm-refresh=1325273678 \
   expected-quorum-votes=2 \
   no-quorum-policy=ignore \
   stonith-enabled=false
 rsc_defaults $id=rsc-options \
   resource-stickiness=100
 
 --
 Chet Burgess
 c...@liquidreality.org
 
 
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Proper way to migrate multistate resource?

2012-02-07 Thread Lars Ellenberg
On Tue, Feb 07, 2012 at 02:03:32PM +0100, Michael Schwartzkopff wrote:
  On Mon, Feb 06, 2012 at 04:48:26PM -0800, Chet Burgess wrote:
   Greetings,
   
   I'm somewhat new to pacemaker and have been playing around with a
   number of configurations in a lab. Most recently I've been testing a
   multistate resource using the ocf:pacemaker:Stateful example RA.
   
   While I've gotten the agent to work and notice that if I shutdown or
   kill a node the resources migrate I can't seem to figure out the
   proper way to migrate the resource between nodes when they are both
   up.
   
   For regular resources I've used crm resource migrate rsc without
   issue. However when I try this with a multistate resource it doesn't
   seem to work. When I run the command it just puts the slave node into
   a stopped state. If I try and tell it to migrate specifically to the
   slave node it claims to already be running their (which I suppose in a
   sense it is).
  
  the crm shell does not support roles for the move or migrate command
  (yet; maybe in newer versions. Dejan?).
  
  What you need to do is set a location constraint on the role.
   * force master role off from one node:
  
  location you-name-it resource-id \
  rule $role=Master -inf: \
  #uname eq node-where-it-should-be-slave
  
   * or force master role off from all but one node,
 note the double negation in this one:
  
  location you-name-it resource-id \
  rule $role=Master -inf: \
  #uname ne node-where-it-should-be-master
 
 These constraints would prevent the MS resource to run in Master state even 
 on 
 that node. Even in case the preferred node is not available any more. This 
 might be not what Chet wanted.

Well, it is just what crm resource migrate does, otherwise.

After migration, you obviously need to unmigrate,
i.e. delete that constraint again.
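
By hand the whole dance would look roughly like this (sketch only, constraint
id and node name made up; the quotes keep the shell from eating '#' and '$'):

    crm configure location tmp-ban-master ms-test \
        rule '$role=Master' -inf: '#uname' eq node1
    # once the master has moved, drop the constraint again:
    crm configure delete tmp-ban-master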


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How to start resources in a Resource Group in parallel

2012-02-02 Thread Lars Ellenberg
On Thu, Feb 02, 2012 at 08:28:16PM +1100, Andrew Beekhof wrote:
 On Tue, Jan 31, 2012 at 9:52 PM, Dejan Muhamedagic deja...@fastmail.fm 
 wrote:
  Hi,
 
  On Tue, Jan 31, 2012 at 10:29:14AM +, Kashif Jawed Siddiqui wrote:
  Hi Andrew,
 
            It is the LRMD_MAX_CHILDREN limit which by default is 4.
 
            I see in forums that this parameter is tunable by adding 
  /etc/sysconfig/pacemaker
  with the following line as content
     LRMD_MAX_CHILDREN=8
 
            But the above works only for Heartbeat. How do we do it for 
  Corosync?
 
            can you suggest?
 
  It is not heartbeat or corosync specific, but depends on support
  in the init script (/etc/init.d/corosync). The init script should
  read the sysconfig file and then invoke lrmadmin to set the max
  children parameter.
 
 Just a reminder, but systemd unit files cannot do this.
 SLES wont be affected for a while, but openSUSE users will presumably
 start complaining soon.
 
 I recommend:
 
 diff -r 0285b706fcde lrm/lrmd/lrmd.c
 --- a/lrm/lrmd/lrmd.c Tue Sep 28 19:10:38 2010 +0200
 +++ b/lrm/lrmd/lrmd.c Thu Feb 02 20:27:33 2012 +1100
 @@ -832,6 +832,13 @@ main(int argc, char ** argv)
   init_stop(PID_FILE);
   }
 
 +    if(getenv("LRMD_MAX_CHILDREN")) {
 +        int tmp = atoi(getenv("LRMD_MAX_CHILDREN"));
 +        if(tmp > 4) {
 +            max_child_count = tmp;
 +        }
 +    }
 +
   return init_start();
  }

Yes, please...

and of course we have to remember to not only set, but also export
LRMD_MAX_CHILDREN from wherever lrmd will be started from.
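
E.g. in whatever sysconfig/default file that start script sources
(path differs per distro, sketch only):

    # /etc/sysconfig/pacemaker (or /etc/default/...)
    LRMD_MAX_CHILDREN=8
    export LRMD_MAX_CHILDREN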

Lars

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] don't want to restart clone resource

2012-02-01 Thread Lars Ellenberg
On Wed, Feb 01, 2012 at 03:43:55PM +0100, Andreas Kurz wrote:
 Hello,
 
 On 02/01/2012 10:39 AM, Fanghao Sha wrote:
  Hi Lars,
  
  Yes, you are right. But how to prevent the orphaned resources from
  stopping by default, please?
 
 crm configure property stop-orphan-resources=false

Well, sure. But for normal ophans,
you actually want them to be stopped.

No, pacemaker needs some additional smarts to recognize
that there actually are no orphans, maybe by first relabling,
and only then checking for instance label  clone-max.

Did you file a bugzilla?
Has that made progress?


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How to start resources in a Resource Group in parallel

2012-01-31 Thread Lars Ellenberg
On Tue, Jan 31, 2012 at 11:38:17AM +, Kashif Jawed Siddiqui wrote:
 Hi,
Yes it has to be provided in init scripts of corosync or heartbeat. 
 But for corosync 1.4.2 for SLES, it is not provided.
 
Can you help me update the corosync init script to include the same?
 
A sample script will definitely help.

Well, just look at what the heartbeat start script does:

http://hg.linux-ha.org/heartbeat-STABLE_3_0/file/1f282434405b/heartbeat/init.d/heartbeat.in#l262

The relevant commit adding this was
http://hg.linux-ha.org/heartbeat-STABLE_3_0/rev/f61f00ab4fab
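
If memory serves, it boils down to something like this in the start path
(simplified sketch, see the linked script for the real thing):

    # read the sysconfig file, then tell the running lrmd about the new limit
    [ -f /etc/sysconfig/heartbeat ] && . /etc/sysconfig/heartbeat
    if [ -n "$LRMD_MAX_CHILDREN" ]; then
        lrmadmin -p max-children "$LRMD_MAX_CHILDREN"
    fi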

But since you are using SLES, why not complain there,
and have them add it for you?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] How to flush the arp cache of a router?

2012-01-31 Thread Lars Ellenberg
On Thu, Jan 26, 2012 at 01:05:07AM +0100, ge...@riseup.net wrote:
 He all,
 
 I'm using Debian Stable and corosync/pacemaker/DRBD with Asterisk in a
 master/slave-setup.
 I get calls routed from my carrier to an ip in a private net. I'm using a
 Cisco 876 as the router.
 
 As the ressource agent for managing a virtual ip I'm using IpAddr2, which
 should do arp broadcast when bringing up the ip (as far as I read).
 However, in my case this doesn't work.
 I then had a look at SendArp, but read, that one shouldn't use this in
 conjunction with IpAddr2. Anyway, this didn't work also.
 
 In the end I tried to use arping, which works great, but I found no way to
 execute it from the cluster automatically. I tried to put it into a file
 and made this executable, and used lsb: to call it (which didn't work).
 Then I googled for hours to find out, how to call scripts from within crm,
 but had no success...
 
 Could someone point me into the right direction?

Did you tcpdump?
Does IPaddr2 send_arp actually work and send out the unsolicited arps it
is supposed to send?

Do you have any "IPaddr2.*: ERROR: Could not send gratuitous arps" in
your logs?

Maybe replacing the call to send_arp with calls to arping will do,
as I described in this thread:
http://www.gossamer-threads.com/lists/linuxha/pacemaker/58444
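
With iputils arping that is along the lines of (interface and IP made up):

    # send a few unsolicited/gratuitous ARPs for the virtual IP
    arping -U -c 5 -I eth0 192.168.1.10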

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2012-01-16 Thread Lars Ellenberg
On Mon, Jan 16, 2012 at 04:46:58PM +1100, Andrew Beekhof wrote:
  Now we proceed to the next mainloop poll:
 
  poll([{fd=7, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI}, {fd=5, 
  events=POLLIN|POLLPRI}], 3, -1
 
  Note the -1 (infinity timeout!)
 
  So even though the trigger was (presumably) set,
  and the -prepare() should have returned true,
  the mainloop waits forever for something to happen on those file 
  descriptors.
 
 
  I suggest this:
 
  crm_trigger_prepare should set *timeout = 0, if trigger is set.
 
  Also think about this race: crm_trigger_prepare was already
  called, only then the signal came in...
 
  diff --git a/lib/common/mainloop.c b/lib/common/mainloop.c
  index 2e8b1d0..fd17b87 100644
  --- a/lib/common/mainloop.c
  +++ b/lib/common/mainloop.c
  @@ -33,6 +33,13 @@ static gboolean
   crm_trigger_prepare(GSource * source, gint * timeout)
   {
      crm_trigger_t *trig = (crm_trigger_t *) source;
  +    /* Do not delay signal processing by the mainloop poll stage */
  +    if (trig->trigger)
  +           *timeout = 0;
  +    /* To avoid races between signal delivery and the mainloop poll stage,
  +     * make sure we always have a finite timeout. Unit: milliseconds. */
  +    else
  +           *timeout = 5000; /* arbitrary */
 
      return trig->trigger;
   }
 
 
  This scenario does not let the blocked IPC off the hook, though.
  That is still possible, both for blocking send and blocking receive,
  so that should probably be fixed as well, somehow.
  I'm not sure how likely this "stuck in blocking IPC" is, though.
 
 Interesting, are you sure you're in the right function though?
 trigger and signal events don't have a file descriptor... wouldn't
 these polls be for the IPC related sources and wouldn't they be
 setting their own timeout?

http://developer.gnome.org/glib/2.30/glib-The-Main-Event-Loop.html#GSourceFuncs

iiuc, mainloop does something similar to (oversimplified):
    timeout = -1; /* infinity */
    for s in all GSource
        tmp_timeout = -1;
        s->prepare(s, &tmp_timeout)
        if (tmp_timeout >= 0 && tmp_timeout < timeout)
            timeout = tmp_timeout;

    poll(GSource fd set, n, timeout);

    for s in all GSource
        if s->check(s)
            s->dispatch(s, ...)

And at some stage it also orders by priority, of course.

Also compare with the comment above /* Sigh... */ in glue G_SIG_prepare().

BTW, the mentioned race between signal delivery and mainloop already
doing the poll stage could potentially be solved by using
cl_signal_set_interrupt(SIGTERM, 1),
which would mean we can condense the prepare to
    if (trig->trigger)
        *timeout = 0;
    return trig->trigger;

Glue (and heartbeat) code base is not that, let's say, involved,
because someone had been paranoid.
But because someone had been paranoid for a reason ;-)

Cheers,

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2012-01-16 Thread Lars Ellenberg
On Mon, Jan 16, 2012 at 11:42:32PM +1100, Andrew Beekhof wrote:
  http://developer.gnome.org/glib/2.30/glib-The-Main-Event-Loop.html#GSourceFuncs
 
  iiuc, mainloop does something similar to (oversimplified):
         timeout = -1; /* infinity */
         for s in all GSource
                 tmp_timeout = -1;
                 s->prepare(s, &tmp_timeout)
                 if (tmp_timeout >= 0 && tmp_timeout < timeout)
                         timeout = tmp_timeout;
 
         poll(GSource fd set, n, timeout);
 
  I'm looking at the glib code again now, and it still looks to me like
  the trigger and signal sources do not appear in this fd set.
  Their setup functions would have to have called g_source_add_poll()
  somewhere, which they don't.
 
  So I'm still not seeing why it's a trigger or signal sources' fault
  that glib is doing a never ending call to poll().
  poll() is going to get called regardless of whether our prepare
  function returns true or not.
 
  Looking closer, crm_trigger_prepare() returning TRUE results in:
                   ready_source->flags |= G_SOURCE_READY;
 
  which in turn causes:
           context->timeout = 0;
 
  which is essentially what adding
        if (trig->trigger)
                *timeout = 0;
 
  to crm_trigger_prepare() was intended to achieve.
 
  Shouldn't the fd, ipc or wait sources (who do call g_source_add_poll()
  and could therefore cause poll() to block forever) have a sane timeout
  in their prepare functions?

Probably they should, but they usually do not.
The reasoning probably is, each GSource is responsible for *itself* only.

That is why first all sources are prepared.

If no non-fd, non-pollable source feels the need to reduce the
*timeout to something finite in its prepare(), so be it.

Besides, what is sane? 1 second? 5? 120? 240?

That's why G_CH_prepare_int() sets the *timeout to 1000,
and why I suggest setting it to 0 if prepare already knows that the
trigger is set, and to some finite amount otherwise, to avoid getting stuck
in poll in case no timeout source or other source is active which would
also set some finite timeout.

BTW, if you have *idle* sources, their prepare should set the timeout to 0.

For those interested, this is all described at
http://developer.gnome.org/glib/2.30/glib-The-Main-Event-Loop.html#GSourceFuncs

For idle sources, the prepare and check functions always return TRUE to
indicate that the source is always ready to be processed. The prepare
function also returns a timeout value of 0 to ensure that the poll()
call doesn't block (since that would be time wasted which could have
been spent running the idle function).

... timeout sources ... returns a timeout value to ensure that the
poll() call doesn't block too long ...

... file descriptor sources ... timeout to -1 to indicate that it does
not mind how long the poll() call blocks ...

  Or is it because the signal itself is interrupting some essential part
  of G_CH_prepare_int() and friends?

In the provided strace, it looks like the SIGTERM
is delivered while calling some G_CH_prepare_int,
the ->prepare() used by G_main_add_IPC_Channel.

Since the signal sources are of higher priority,
we are probably already past those in this iteration,
so we will only notice the trigger in the next check(),
after the poll.

So it is vital for any non-pollable source, such as a signal source,
to set a finite timeout in its prepare(),
even if we also mark that signal with siginterrupt().
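
For illustration only, here is a minimal, self-contained sketch of such a
trigger-style source against the documented GSourceFuncs API (this is not
the pacemaker code; all names are made up for the example):

  /* build with: gcc demo.c $(pkg-config --cflags --libs glib-2.0) */
  #include <glib.h>

  typedef struct {
          GSource  source;
          gboolean trigger;  /* a real signal handler would use volatile sig_atomic_t */
  } trigger_source_t;

  static gboolean trigger_prepare(GSource *source, gint *timeout)
  {
          trigger_source_t *t = (trigger_source_t *) source;

          if (t->trigger) {
                  *timeout = 0;   /* ready: do not let poll() block at all */
                  return TRUE;
          }
          *timeout = 5000;        /* not ready: still cap poll(); ms, arbitrary */
          return FALSE;
  }

  static gboolean trigger_check(GSource *source)
  {
          return ((trigger_source_t *) source)->trigger;
  }

  static gboolean trigger_dispatch(GSource *source, GSourceFunc cb, gpointer data)
  {
          ((trigger_source_t *) source)->trigger = FALSE;
          return cb ? cb(data) : TRUE;
  }

  static GSourceFuncs trigger_funcs = {
          trigger_prepare, trigger_check, trigger_dispatch, NULL
  };

  static gboolean on_trigger(gpointer data)
  {
          g_print("trigger dispatched\n");
          g_main_loop_quit((GMainLoop *) data);
          return FALSE;           /* remove the source */
  }

  static gboolean set_trigger(gpointer data)
  {
          ((trigger_source_t *) data)->trigger = TRUE;  /* simulate the async event */
          return FALSE;           /* one-shot timeout */
  }

  int main(void)
  {
          GMainLoop *loop = g_main_loop_new(NULL, FALSE);
          GSource *s = g_source_new(&trigger_funcs, sizeof(trigger_source_t));

          g_source_set_callback(s, on_trigger, loop, NULL);
          g_source_attach(s, NULL);
          g_timeout_add(1000, set_trigger, s);

          g_main_loop_run(loop);
          g_source_unref(s);
          g_main_loop_unref(loop);
          return 0;
  }

In this toy example the trigger is set from within the mainloop itself, so
it would be picked up on the next prepare() anyway; the finite timeout in
the FALSE branch only matters when the trigger is set asynchronously
(signal handler, other thread) after prepare() has already run.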

         for s in all GSource
                 if s->check(s)
                         s->dispatch(s, ...)
 
  And at some stage it also orders by priority, of course.
 
  Also compare with the comment above /* Sigh... */ in glue G_SIG_prepare().
 
  BTW, the mentioned race between signal delivery and mainloop already
  doing the poll stage could potentially be solved by using
  cl_signal_set_interrupt(SIGTERM, 1),

As I just wrote above, that race is not solved at all.
Only the (necessarily set) finite timeout of the poll
would be shortened in that case.

  But I can't escape the feeling that calling this just masks the
  underlying "why is there a never-ending call to poll() in the first
  place" issue.
  G_CH_prepare_int() and friends /should/ be setting timeouts so that
  poll() can return and any sources created by g_idle_source_new() can
  execute.
 
  Actually, thinking further, I'm pretty convinced that poll() with an
  infinite timeout is the default mode of operation for mainloops with
  cluster-glue's IPC and FD sources.
  And that this is not a good thing :)

Well, if there are *only* pollable sources, it is.
If there are any other sources, they should have set
their limit on what they think is an acceptable timeout
in their prepare().

 Far too late, brain shutting down.

 ;-)

 ...not a good thing, because it breaks the idle stuff,

see above, the explanation on developer.gnome.org:
idle stuff is expected to set a timeout of 0 (or just a few ms).

 but most of all because it requires /all/ external events to come out
 of that poll() call.

If you 

Re: [Pacemaker] [Question] About the rotation of the pe-file.

2012-01-14 Thread Lars Ellenberg
On Fri, Jan 06, 2012 at 10:12:06AM +0900, renayama19661...@ybb.ne.jp wrote:
 Hi Andrew,
 
 Thank you for comments.
 
  Could you try with:
  
  while(max >= 0 && sequence > max) {
  
 
 The problem is not settled by this correction.
 The rotation is carried out with a value except 0.

If you want it to be between [0, max-1],
obviously that should be
        while(max > 0 && sequence >= max) {
                sequence -= max;
        }

Though I wonder why not simply:
	if (max == 0)
		return;
	if (sequence >= max)
		sequence = 0;
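
For what it's worth, a tiny standalone sketch of that wrap-around logic
(hypothetical helper name, not the actual pengine code):

  #include <stdio.h>

  /* keep a pe-file sequence number within [0, max-1]; max == 0 means "no limit" */
  static int wrap_sequence(int sequence, int max)
  {
          if (max <= 0)
                  return sequence;        /* unlimited: leave it alone */
          if (sequence >= max)
                  sequence = 0;           /* or: sequence %= max; */
          return sequence;
  }

  int main(void)
  {
          printf("%d %d %d\n",
                 wrap_sequence(7, 5),     /* -> 0 */
                 wrap_sequence(3, 5),     /* -> 3 */
                 wrap_sequence(42, 0));   /* -> 42, no limit */
          return 0;
  }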


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2012-01-14 Thread Lars Ellenberg
On Tue, Jan 10, 2012 at 04:43:51PM +0900, renayama19661...@ybb.ne.jp wrote:
 Hi Lars,
 
 I attach strace file when a problem reappeared at the end of last year.
 I used glue which applied your patch for confirmation.
 
 It is the file which I picked with attrd by strace -p command right before I 
 stop Heartbeat.
 
 Finally SIGTERM caught it, but attrd did not stop.
 The attrd stopped afterwards when I sent SIGKILL.

The strace reveals something interesting:

This poll looks like the mainloop poll,
but some ->prepare() has modified the timeout to be 0,
so we proceed directly to ->check() and then ->dispatch().

 poll([{fd=7, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI}, {fd=8, 
 events=POLLIN|POLLPRI}], 3, 0) = 1 ([{fd=8, revents=POLLIN|POLLHUP}])

 times({tms_utime=2, tms_stime=3, tms_cutime=0, tms_cstime=0}) = 433738632
 recv(4, 0x95af308, 576, MSG_DONTWAIT)   = -1 EAGAIN (Resource temporarily 
 unavailable)
...
 recv(7, 0x95b1657, 3513, MSG_DONTWAIT)  = -1 EAGAIN (Resource temporarily 
 unavailable)
 poll([{fd=7, events=0}], 1, 0)  = ? ERESTART_RESTARTBLOCK (To be 
 restarted)
 --- SIGTERM (Terminated) @ 0 (0) ---
 sigreturn() = ? (mask now [])

Ok. signal received, trigger set.
Still finishing this mainloop iteration, though.

These recv(),poll() look like invocations of G_CH_prepare_int().
Does not matter much, though.

 recv(7, 0x95b1657, 3513, MSG_DONTWAIT)  = -1 EAGAIN (Resource temporarily 
 unavailable)
 poll([{fd=7, events=0}], 1, 0)  = 0 (Timeout)
 recv(7, 0x95b1657, 3513, MSG_DONTWAIT)  = -1 EAGAIN (Resource temporarily 
 unavailable)
 poll([{fd=7, events=0}], 1, 0)  = 0 (Timeout)

 times({tms_utime=2, tms_stime=3, tms_cutime=0, tms_cstime=0}) = 433738634

Now we proceed to the next mainloop poll:

 poll([{fd=7, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI}, {fd=5, 
 events=POLLIN|POLLPRI}], 3, -1

Note the -1 (infinity timeout!)

So even though the trigger was (presumably) set,
and the ->prepare() should have returned true,
the mainloop waits forever for something to happen on those file descriptors.


I suggest this:

crm_trigger_prepare should set *timeout = 0, if trigger is set.

Also think about this race: crm_trigger_prepare was already
called, only then the signal came in...

diff --git a/lib/common/mainloop.c b/lib/common/mainloop.c
index 2e8b1d0..fd17b87 100644
--- a/lib/common/mainloop.c
+++ b/lib/common/mainloop.c
@@ -33,6 +33,13 @@ static gboolean
 crm_trigger_prepare(GSource * source, gint * timeout)
 {
     crm_trigger_t *trig = (crm_trigger_t *) source;
+    /* Do not delay signal processing by the mainloop poll stage */
+    if (trig->trigger)
+        *timeout = 0;
+    /* To avoid races between signal delivery and the mainloop poll stage,
+     * make sure we always have a finite timeout. Unit: milliseconds. */
+    else
+        *timeout = 5000; /* arbitrary */
 
     return trig->trigger;
 }


This scenario does not let the blocked IPC off the hook, though.
That is still possible, both for blocking send and blocking receive,
so that should probably be fixed as well, somehow.
I'm not sure how likely this "stuck in blocking IPC" scenario is, though.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2011-12-29 Thread Lars Ellenberg
On Thu, Dec 22, 2011 at 09:54:47AM +0900, renayama19661...@ybb.ne.jp wrote:
 Hi Dejan,
 Hi Lars,
 
 In our environment, the problem recurred with the patch of Mr. Lars.
 After a problem occurred, I sent TERM signal, but attrd does not seem to
 receive TERM at all.

If you are able to reproduce,
you could try to find out what exactly attrd is doing.

Various ways to try to do that:
cat /proc/pid-of-attrd/stack   # if your platform supports that
strace it,
ltrace it,
attach with gdb and provide a stack trace, or even start to single step it,
cause attrd to core dump, and analyse the core.

 The reconsideration of the patch is necessary for the solution to problem.
 
 
 Best Regards,
 Hideo Yamauchi.
 
 
 --- On Tue, 2011/11/15, renayama19661...@ybb.ne.jp 
 renayama19661...@ybb.ne.jp wrote:
 
  Hi Dejan,
  Hi Lars,
  
  I understood it.
  I try the operation of the patch in our environment.
  
  To Alan: Will you try a patch?
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Tue, 2011/11/15, Dejan Muhamedagic deja...@fastmail.fm wrote:
  
   Hi,
   
   On Mon, Nov 14, 2011 at 01:17:37PM +0100, Lars Ellenberg wrote:
On Mon, Nov 14, 2011 at 11:58:09AM +1100, Andrew Beekhof wrote:
 On Mon, Nov 7, 2011 at 8:39 AM, Lars Ellenberg
 lars.ellenb...@linbit.com wrote:
  On Thu, Nov 03, 2011 at 01:49:46AM +1100, Andrew Beekhof wrote:
  On Tue, Oct 18, 2011 at 12:19 PM,  renayama19661...@ybb.ne.jp 
  wrote:
   Hi,
  
   We sometimes fail in a stop of attrd.
  
   Step1. start a cluster in 2 nodes
   Step2. stop the first node.(/etc/init.d/heartbeat stop.)
   Step3. stop the second node after time passed a 
   little.(/etc/init.d/heartbeat
   stop.)
  
   The attrd catches the TERM signal, but does not stop.
 
  There's no evidence that it actually catches it, only that it is 
  sent.
  I've seen it before but never figured out why it occurs.
 
  I had it once tracked down almost to where it occurs, but then got 
  distracted.
  Yes the signal was delivered.
 
  I *think* it had to do with attrd doing a blocking read,
  or looping in some internal message delivery function too often.
 
  I had a quick look at the code again now, to try and remember,
  but I'm not sure.
 
   It *may* be that, because
   xmlfromIPC(IPC_Channel * ch, int timeout) calls
      msg = msgfromIPC_timeout(ch, MSG_ALLOWINTR, timeout, &ipc_rc);
 
  And MSG_ALLOWINTR will cause msgfromIPC_ll() to
         IPC_INTR:
                 if ( allow_intr){
                         goto startwait;
 
   Depending on the frequency of delivered signals, it may cause this
   "goto startwait" loop to never exit, because the timeout always starts
   again from the full passed-in timeout.
 
   If only one signal is delivered, it may still take 120 seconds
  (MAX_IPC_DELAY from crm.h) to be actually processed, as the signal
  handler only raises a flag for the next mainloop iteration.
 
  If a (non-fatal) signal is delivered every few seconds,
  then the goto loop will never timeout.
 
  Please someone check this for plausibility ;-)
 
 Most plausible explanation I've heard so far... still odd that only
 attrd is affected.
 So what do we do about it?

Reproduce, and confirm that this is what people are seeing.

Make attrd non-blocking?

Fix the ipc layer to not restart the full timeout,
but only the remaining partial time?
   
   Lars and I made a quick patch for cluster-glue (attached).
   Hideo-san, is there a way for you to verify if it helps? The
   patch is not perfect and under unfavourable circumstances it may
   still take a long time for the caller to exit, but it'd be good
   to know if this is the right spot.
   
   Cheers,
   
   Dejan
   
-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: 
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
  
  
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker

Re: [Pacemaker] Remote CRM shell from LCMC

2011-12-28 Thread Lars Ellenberg
On Wed, Dec 28, 2011 at 12:57:33AM +0100, Rasto Levrinc wrote:
 Hi,
 
 this being a slow news day, there is this great new feature in LCMC, but
 probably completely useless. :) The LCMC used to show for testing purposes
 the CRM shell configuration, but people started to use it, so I left it
 there, made it now editable and added a commit button, that commits the
 changes. You can see it as a hole in the bottom of the car, if you are stuck
 you can still power the car by your feet.
 
 There are also some unexpected advantages over crm configure edit, see
 the video.
 
 http://youtu.be/X75wzUTRmjU?hd=1

Nice.

Sound is missing for me from 3:00 onwards.
Just in case that was not intentional...

Lars

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] don't want to restart clone resource

2011-12-16 Thread Lars Ellenberg
On Fri, Dec 09, 2011 at 08:23:36AM +1100, Andrew Beekhof wrote:
 Can you file a bug and attach a crm_report to it please?
 Unfortunately there's not enough information here to figure out the
 cause (although it does look like a bug)


Node count drops from three to two,
 rsc:2 becomes the label of orphaned resources,
 orphans are to be stopped by default?

Something like that?

 
 2011/12/1 Sha Fanghao shafang...@gmail.com:
  Hi,
 
 
 
  I have a cluster 3 nodes (CentOS 5.2) using pacemaker-1.0.11(also 1.0.12),
  with heartbeat-3.0.3.
 
  You can see the configuration:
 
 
 
  #crm configure show:
 
  node $id=85e0ca02-7aa4-45c8-9911-4035e1e6ee15 node-2
 
  node $id=a046bd1e-6267-49e5-902d-c87b6ed1dcb9 node-0
 
  node $id=d0f0b2ab-f243-4f78-b541-314fa7d6b346 node-1
 
  primitive failover-ip ocf:heartbeat:IPaddr2 \
 
      params ip=10.10.5.83 \
 
      op monitor interval=5s
 
  primitive master-app-rsc lsb:cluster-master \
 
      op monitor interval=5s
 
  primitive node-app-rsc lsb:cluster-node \
 
      op monitor interval=5s
 
  group group-dc failover-ip master-app-rsc
 
  clone clone-node-app-rsc node-app-rsc
 
  location rule-group-dc group-dc \
 
      rule $id=rule-group-dc-rule -inf: #is_dc eq false
 
  property $id=cib-bootstrap-options \
 
      start-failure-is-fatal=false \
 
      no-quorum-policy=ignore \
 
      symmetric-cluster=true \
 
      stonith-enabled=false \
 
      dc-version=1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87 \
 
      cluster-infrastructure=Heartbeat
 
 
 
  #crm_mon -n -1:
 
  
 
  Last updated: Sat Oct 29 08:44:14 2011
 
  Stack: Heartbeat
 
  Current DC: node-0 (a046bd1e-6267-49e5-902d-c87b6ed1dcb9) - partition with
  quorum
 
  Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
 
  3 Nodes configured, unknown expected votes
 
  2 Resources configured.
 
  
 
 
 
  Node node-0 (a046bd1e-6267-49e5-902d-c87b6ed1dcb9): online
 
      master-app-rsc  (lsb:cluster-master) Started
 
      failover-ip (ocf::heartbeat:IPaddr2) Started
 
      node-app-rsc:0  (lsb:cluster-node) Started
 
  Node node-1 (d0f0b2ab-f243-4f78-b541-314fa7d6b346): online
 
      node-app-rsc:1  (lsb:cluster-node) Started
 
  Node node-2 (85e0ca02-7aa4-45c8-9911-4035e1e6ee15): online
 
      node-app-rsc:2  (lsb:cluster-node) Started
 
 
 
 
 
  The problem:
 
  After stopping heartbeat service on node-1, if I remove node-1 with command
  "hb_delnode node-1 && crm node delete node-1", then
 
  the clone resource(node-app-rsc:2) running on the node-2 will restart and
  change to node-app-rsc:1.
 
  You know, the node-app-rsc is my application, and I don't want it to
  restart.
 
  How could I do, Please?
 
 
 
  Any help will be very appreciated. :)
 
 
 
 
 
  Best Regards,
 
   Fanghao Sha
 
 
 
 
 
 
 
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: http://bugs.clusterlabs.org
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Linux-HA] Antw: Re: Q: unmanaged MD-RAID auto-recovery

2011-11-25 Thread Lars Ellenberg
On Fri, Nov 25, 2011 at 01:54:33PM +0100, Florian Haas wrote:
 On 11/25/11 13:29, Lars Ellenberg wrote:
  From the log snippet it's
  not entirely clear whether that's a recurring monitor (interval ==
  whatever you configured, or 20 if default), or a probe (interval == 0).
 
  A recurring monitor clearly should not happen at all when unmanaged.
  
  That is incorrect.
  
  is-managed=false does still monitor the resource.  It only prevents
  pacemaker from sending start/stop etc commands to that resource.
 
 My understanding was that only probes would still occur (on
 cluster-recheck-interval, or when new nodes joined the cluster). And I
 maintain that that would be the intuitively correct behavior for
 unmanaged resources. Andrew?

Well, your understanding or intuition seem to misguide you this time.
But if you think I make shit up ;-)
http://www.gossamer-threads.com/lists/linuxha/pacemaker/70606#70606

  If the implementation of the monitor action in the RA does trigger
  auto-recovery or other things, well, then it does.
 
 Which seems to operate on the same assumption, really, that an unmanaged
 resource never has its monitor action executed.
 
 I still think that this attempt to auto-recover from _within_ the
 monitor action is a bit insane, but maybe lmb (who implemented that
 part, as per git blame) would be able to share his thoughts as to why he
 did it that way.

Well, that's the only place where an auto-recovery of a degraded
(not yet failed!) md array can be triggered from pacemaker.

There is no $OCF_DEGRADED status code,
and no try-resource-internal-recovery action.
And if there were, what else could it do?

If you'd rather have some external monitoring page an operator,
who then logs in and does the same actions...

If you do md over long distance iSCSI (e.g.),
and you lose one of the links, md will detach that leg.
If the link comes back, this is where it then could recover,
and start to resync.

Besides, you explicitly have to request this behaviour of the RA.

I think that approach is perfectly sane.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Syntax highlighting in vim for crm configure edit

2011-11-22 Thread Lars Ellenberg
On Fri, Aug 19, 2011 at 05:28:09PM +0300, Dan Frincu wrote:
 Hi,
 
 On Thu, Aug 18, 2011 at 5:53 PM, Digimer li...@alteeve.com wrote:
  On 08/18/2011 10:39 AM, Trevor Hemsley wrote:
  Hi all
 
  I have attached a first stab at a vim syntax highlighting file for 'crm
  configure edit'
 
  To activate this, I have added 'filetype plugin on' to my /root/.vimrc
  then created /root/.vim/{ftdetect,ftplugin}/pcmk.vim
 
  In /root/.vim/ftdetect/pcmk.vim I have the following content
 
  au BufNewFile,BufRead /tmp/tmp* set filetype=pcmk
 
  but there may be a better way to make this happen. /root/.vim/pcmk.vim
  is the attached file.
 
  Comments (not too nasty please!) welcome.
 
 I've added a couple of extra keywords to the file, to cover a couple
 more use cases. Other than that, great job.
 
 Regards,
 Dan
 
 
  I would love to see proper support added for CRM syntax highlighting
  added to vim. I will give this is a test and write back in a bit.
 
  --
  Digimer

Cool.

I remember that I made some initial attempt at writing a vim syntax file
myself about a year ago; I attach it as pacemaker-crm.vim
(took me some minutes to dig it up again).

I did not really look at the current pcmk.vim, just tried it, and
apparently it does not attempt to give the user hints for common errors,
or at least not for those I make most commonly.

If you use the pacemaker-crm.vim (which I attached),
it would highlight a few things as errors, like a spurious space after a
backslash, a spurious backslash before a new primitive definition,
forgetting the colon after an order or colocation score,
all these things.

It is incomplete, and I don't even know anymore what I thought when I
wrote it; it was never in active use, and I won't have time to actually
work on this myself. I may or may not be able to answer questions ;-)

Not perfect, either.
Probably detects many more errors than necessary,
and does not detect some that would be nice to have detected.
(brace errors, quotation errors ...)

But if there should be some vim syntax wizard out there,
maybe our two attempts on doing it can somehow be merged.

I'll just throw it at you, feel free to ignore, or reuse (parts) of it.

Cheers,
Lars

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
" Vim syntax file
" Language:     pacemaker-crm configuration style (http://www.clusterlabs.org/doc/crm_cli.html)
" Filename:     pacemaker-crm.vim
" Language:     pacemaker crm configuration text
" Maintainer:   Lars Ellenberg <l...@linbit.com>
" Last Change:  Thu, 18 Feb 2010 16:04:36 +0100
"
" What to do to install this file:
"  $ mkdir -p ~/.vim/syntax
"  $ cp pacemaker-crm.vim ~/.vim/syntax
"  to set the filetype manually, just do :setf pacemaker-crm
"  TODO: autodetection logic, maybe
"  augroup filetypedetect
"  au BufNewFile,BufRead *.pacemaker-crm setf pacemaker-crm
"  augroup END
"
" If you do not already have a .vimrc with "syntax on", then do this:
"  $ echo "syntax on" >> ~/.vimrc
"
" Now every file with a filename matching *.pacemaker-crm will be edited
" using these definitions for syntax highlighting.
"
" TODO: maybe add some indentation rules as well?


" For version 5.x: Clear all syntax items
" For version 6.x: Quit when a syntax file was already loaded
if version < 600
  syntax clear
elseif exists("b:current_syntax")
  finish
endif
syn clear

syn sync lines=30
syn case ignore

syn match   crm_unexpected  /[^ ]\+/

syn match   crm_lspace  transparent /^[ \t]*/ 
nextgroup=crm_node,crm_container,crm_head
syn match   crm_tspace_err  /\\[ \t]\+/
syn match   crm_tspace_err  
/\\\n\(primitive\|node\|group\|ms\|order\|location\|colocation\|property\).*/
syn match   crm_nodetransparent /\node \$id=[^ ]\+ 
\([a-z0-9.-]\+\)\?/
\   contains=crm_head,crm_assign,crm_nodename
\   nextgroup=crm_block

syn region  crm_block   transparent keepend contained start=/[ \t]/ 
skip=/\\$/ end=/$/
\   
contains=crm_assign,crm_key,crm_meta,crm_tspace_err,crm_ops
syn region  crm_order_block transparent keepend contained start=/[ \t]/ 
skip=/\\$/ end=/$/
\   contains=crm_order_ref
syn region  crm_colo_block  transparent keepend contained start=/[ \t]/ 
skip=/\\$/ end=/$/
\   contains=crm_colo_ref
syn region  crm_metatransparent keepend contained start=/[ 
\t]meta\/ skip=/\\$/ end=/$/ end=/[ \t]\(params\|op\)[ \t]/
\   contains=crm_key,crm_meta_assign

syn keyword crm_container   contained group clone ms nextgroup=crm_id
syn keyword crm_headcontained node
syn keyword crm_headcontained property nextgroup=crm_block
syn keyword crm_headcontained primitive nextgroup=crm_res_id
syn keyword crm_headcontained location nextgroup=crm_id
syn match   crm_id  contained nextgroup=crm_ref,crm_block /[ 
\t]\+\[a-z0-9_-]\+\/

syn

Re: [Pacemaker] IPv6addr failure loopback interface

2011-11-17 Thread Lars Ellenberg
,
then ifconfig | grep not seeing the address?

I think that's not necessary.

  then
 ocf_log info $process: Started successfully.
 return $OCF_SUCCESS
  else
 ocf_log err $process: Could not be started: ipv6addr[\$ipv6addr\]
 cidr_netmask[\$cidr_netmask\].
  return $OCF_ERR_GENERIC
 fi
 else
  # If already running, consider start successful
 ocf_log debug $process: is already running
  return $OCF_SUCCESS
 fi
 }
 
 IPv6addrLO_stop() {
 
 ocf_log debug $process: Running STOP function.
 
 if [ -n $OCF_RESKEY_stop_timeout ]
 then
 stop_timeout=$OCF_RESKEY_stop_timeout
 elif [ -n $OCF_RESKEY_CRM_meta_timeout ]; then
 # Allow 2/3 of the action timeout for the orderly shutdown
 # (The origin unit is ms, hence the conversion)
 stop_timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
 else
 stop_timeout=10
 fi

And suddenly, completely different (and much more readable) indentation.
Thanks.

Still I think this is not necessary.
Or at least, I don't understand what you are trying to protect against:

Why would ifconfig del fail, and a few seconds later succeed?

If you really want to retry, this whole function should become
	while iface_has_ipv6 && ! ifconfig del; do sleep 1; done
	return $OCF_SUCCESS
and the crmd/lrmd will enforce the timeout on you.

No need to go fancy and simulate a shutdown escalation as if an IP address
were a database or something.

 if IPv6addrLO_status
 then
 $IFCONFIG_BIN $IFACE del `cat $pidfile`
 i=0
 while [ $i -lt $stop_timeout ]
 do
 if ! IPv6addrLO_status
 then
 rm -f $pidfile
 return $OCF_SUCCESS
 fi
 sleep 1
 i=`expr $i + 1`
 done
 ocf_log warn Stop failed. Trying again.
 $IFCONFIG_BIN $IFACE del `cat $pidfile`
 rm -f $pidfile
 if ! IPv6addrLO_status
 then
 ocf_log warn Stop success.
 return $OCF_SUCCESS
 else
 ocf_log err Failed to stop.
 return $OCF_ERR_GENERIC
 fi
 else
 # was not running, so stop can be considered successful
  $ICONFIG_BIN $IFACE del `cat $pidfile`
 rm -f $pidfile
 return $OCF_SUCCESS
  fi
 }
 
 IPv6addrLO_monitor() {
 IPv6addrLO_status
  ret=$?
 if [ $ret -eq $OCF_SUCCESS ]
 then
  if [ -n $OCF_RESKEY_monitor_hook ]; then
 eval $OCF_RESKEY_monitor_hook
 if [ $? -ne $OCF_SUCCESS ]; then
 return ${OCF_ERR_GENERIC}
 fi
 return $OCF_SUCCESS
  else
 true
 fi
  else
 return $ret
 fi
 }
 
 
 IPv6addrLO_validate() {
 
 ocf_log debug IPv6addrLO validating: args:[\$*\]
 
 if [ -x $IFCONFIG_BIN ]
 then
 ocf_log debug Binary \$IFCONFIG_BIN\ exist and is executable.
  return $OCF_SUCCESS
 else
 ocf_log err Binary \$IFCONFIG_BIN\ does not exist or isn't executable.
  return $OCF_ERR_INSTALLED
 fi
 ocf_log err Error while validating.
  return $OCF_ERR_GENERIC
 }
 
 IPv6addrLO_meta(){
 cat END
 ?xml version=1.0?
 !DOCTYPE resource-agent SYSTEM ra-api-1.dtd
 resource-agent name=IPv6addrLO
 version0.1/version
 longdesc lang=en
 OCF RA to manage IPv6addr on loopback interface Linux
 /longdesc
 shortdesc lang=enIPv6 addr on loopback linux/shortdesc
 
 parameters
 parameter name=ipv6addr required=1
 longdesc lang=en
 The ipv6 addr to asign to the loopback interface.
 /longdesc
 shortdesc lang=enIpv6 addr to the loopback interface./shortdesc
 content type=string default=/
 /parameter
 parameter name=cidr_netmask required=1
 longdesc lang=en
 The cidr netmask of the ipv6 addr.
 /longdesc
 shortdesc lang=ennetmask of the ipv6 addr./shortdesc
 content type=string default=128/
 /parameter
 parameter name=logfile required=0
 longdesc lang=en
 File to write STDOUT to
 /longdesc
 shortdesc lang=enFile to write STDOUT to/shortdesc
 content type=string /
 /parameter
 parameter name=errlogfile required=0
 longdesc lang=en
 File to write STDERR to
 /longdesc
 shortdesc lang=enFile to write STDERR to/shortdesc
 content type=string /
 /parameter
 /parameters
 actions
 action name=start   timeout=20s /
 action name=stoptimeout=20s /
 action name=monitor depth=0  timeout=20s interval=10 /
 action name=meta-data  timeout=5 /
 action name=validate-all  timeout=5 /
 /actions
 /resource-agent
 END
 exit 0
 }
 
 case $1 in
 meta-data|metadata|meta_data|meta)
 IPv6addrLO_meta
  ;;
 start)
 IPv6addrLO_start
  ;;
 stop)
 IPv6addrLO_stop
  ;;
 monitor)
 IPv6addrLO_monitor
  ;;
 validate-all)
 IPv6addrLO_validate
  ;;
 *)
 ocf_log err $0 was called with unsupported arguments:
  exit $OCF_ERR_UNIMPLEMENTED
 ;;
 esac


Cheers,

-- 
: Lars Ellenberg
: LINBIT | Your

Re: [Pacemaker] [Drbd-dev] crm_attribute --quiet (was Fwd: [Linux-HA] Should This Worry Me?)

2011-11-14 Thread Lars Ellenberg
On Mon, Nov 14, 2011 at 09:51:46AM +1100, Andrew Beekhof wrote:
  confused as to what the correct flag actually is. ocf:linbit:drbd (in
  both 8.3 and 8.4) uses -Q whereas Pacemaker expects -q as of this
  commit:
 
  commit c11ce5e9b0b13ead02b5fc4add928d7e7f95092e
  Author: Andrew Beekhof and...@beekhof.net
  Date:   Tue Sep 22 17:29:38 2009 +0200
 
     Medium: Tools: Use -q as the short form for --quiet (for consistency)
 
     Mercurial revision: 7289e661e4923beee4b7b45bc85592564ccdc438
 
  Should ocf:linbit:drbd be using -q?
 
 Correct.  Sorry about that.


-Q is still accepted, though.
As it is accepted for a larger range of crm_attribute versions,
I'll keep it for now.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Newcomer's question - API?

2011-11-07 Thread Lars Ellenberg
On Tue, Nov 01, 2011 at 04:52:42PM -, Tim Ward wrote:
  You can try tooking at LCMC as that is a Java-based GUI that 
  should at least get you going.
 
 I did find some Java code but we can't use it because it's GPL, and I
 didn't want to study it in case I accidentally copied some of it in
 recreating it.

You know, there are effectively no more than two entities you need to
talk to, if you wanted the LCMC under some non-GPL licence.
Which is Rasto, and LINBIT.

Just a thought...

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [Problem] The attrd does not sometimes stop.

2011-11-06 Thread Lars Ellenberg
On Thu, Nov 03, 2011 at 01:49:46AM +1100, Andrew Beekhof wrote:
 On Tue, Oct 18, 2011 at 12:19 PM,  renayama19661...@ybb.ne.jp wrote:
  Hi,
 
  We sometimes fail in a stop of attrd.
 
  Step1. start a cluster in 2 nodes
  Step2. stop the first node.(/etc/init.d/heartbeat stop.)
  Step3. stop the second node after time passed a 
  little.(/etc/init.d/heartbeat
  stop.)
 
  The attrd catches the TERM signal, but does not stop.
 
 There's no evidence that it actually catches it, only that it is sent.
 I've seen it before but never figured out why it occurs.

I had it once tracked down almost to where it occurs, but then got distracted.
Yes the signal was delivered.

I *think* it had to do with attrd doing a blocking read,
or looping in some internal message delivery function too often.

I had a quick look at the code again now, to try and remember,
but I'm not sure.

It *may* be that, because
xmlfromIPC(IPC_Channel * ch, int timeout) calls
	msg = msgfromIPC_timeout(ch, MSG_ALLOWINTR, timeout, &ipc_rc);

And MSG_ALLOWINTR will cause msgfromIPC_ll() to
	IPC_INTR:
		if ( allow_intr){
			goto startwait;

Depending on the frequency of delivered signals, it may cause this
"goto startwait" loop to never exit, because the timeout always starts again
from the full passed-in timeout.

If only one signal is delivered, it may still take 120 seconds
(MAX_IPC_DELAY from crm.h) to be actually processed, as the signal
handler only raises a flag for the next mainloop iteration.

If a (non-fatal) signal is delivered every few seconds,
then the goto loop will never timeout.

Please someone check this for plausibility ;-)
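
For illustration only, a rough standalone sketch of how such a retry loop
could measure against one fixed deadline instead of restarting the full
timeout (hypothetical helper, not the actual glue code):

  #include <errno.h>
  #include <poll.h>
  #include <time.h>

  /* Wait for fd to become readable for at most timeout_ms, measured against
   * one fixed deadline, so repeated EINTR wakeups cannot restart the clock. */
  static int wait_readable_deadline(int fd, long timeout_ms)
  {
          struct timespec now, deadline;
          struct pollfd pfd = { .fd = fd, .events = POLLIN };

          clock_gettime(CLOCK_MONOTONIC, &deadline);
          deadline.tv_sec  += timeout_ms / 1000;
          deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
          if (deadline.tv_nsec >= 1000000000L) {
                  deadline.tv_sec++;
                  deadline.tv_nsec -= 1000000000L;
          }

          for (;;) {
                  long remaining;
                  int rc;

                  clock_gettime(CLOCK_MONOTONIC, &now);
                  remaining = (deadline.tv_sec - now.tv_sec) * 1000L
                            + (deadline.tv_nsec - now.tv_nsec) / 1000000L;
                  if (remaining <= 0)
                          return 0;       /* timed out for real */

                  rc = poll(&pfd, 1, (int) remaining);
                  if (rc > 0)
                          return 1;       /* readable */
                  if (rc == 0)
                          return 0;       /* timed out */
                  if (errno != EINTR)
                          return -1;      /* real error */
                  /* EINTR: loop again, but only for the *remaining* time */
          }
  }

The same idea would apply to the IPC_INTR path in msgfromIPC_ll(): compute
the deadline once before startwait and only wait for what is left.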

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] location setting with parenthesis

2011-11-03 Thread Lars Ellenberg
On Thu, Nov 03, 2011 at 07:23:01PM +0900, 池田 淳子 wrote:
 Hi,
 
  location rsc_location-1 msDRBD \
  rule role=master -inf: \
  (defined master-prmMySQL:0 and master-prmMySQL:0 gt 0) or \
  (defined master-prmMySQL:1 and master-prmMySQL:1 gt 0)
  
  Why not using two rules for this location constraint? I expect that to
  work the same way you want to express in your rule above.
 
 Do you mean the following rules?
 
 location rsc_location-1 msDRBD \
 rule role=master -inf: defined master-prmMySQL:0 and master-prmMySQL:0 gt 0 \
 rule role=master -inf: defined master-prmMySQL:1 and master-prmMySQL:1 gt 0

I may be missing something obvious, but why not a colocation constraint
between msDRBD and prmMySQL?

something like
colocation asdf -inf: msDRBD:Master prmMySQL:Master



-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] location setting with parenthesis

2011-11-03 Thread Lars Ellenberg
On Thu, Nov 03, 2011 at 09:30:45PM +0100, Andreas Kurz wrote:
 On 11/03/2011 12:38 PM, Lars Ellenberg wrote:
  On Thu, Nov 03, 2011 at 07:23:01PM +0900, 池田 淳子 wrote:
  Hi,
 
  location rsc_location-1 msDRBD \
  rule role=master -inf: \
  (defined master-prmMySQL:0 and master-prmMySQL:0 gt 0) or \
  (defined master-prmMySQL:1 and master-prmMySQL:1 gt 0)
 
  Why not using two rules for this location constraint? I expect that to
  work the same way you want to express in your rule above.
 
  Do you mean the following rules?
 
  location rsc_location-1 msDRBD \
  rule role=master -inf: defined master-prmMySQL:0 and master-prmMySQL:0 gt 
  0 \
  rule role=master -inf: defined master-prmMySQL:1 and master-prmMySQL:1 gt 0
  
  I may be missing something obvious, but why not a colocation constraint
  between msDRBD and prmMySQL?
  
  something like
  colocation asdf -inf: msDRBD:Master prmMySQL:Master
  
  
  
 
 I don't think you miss something obvious, lars ;-)
 
 yes, that constraint you recommend would be the way to go ... I was only
 commenting on the parenthesis not on the quality of the rules ;-)

Well, actually probably 

colocation asdf -inf: msDRBD:Master msMySQL:Master

assuming prmMySQL was the primitive and msMySQL the ms resource.

Anyways, variations of that theme should do fine.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] crm_master triggering assert section != NULL

2011-10-13 Thread Lars Ellenberg
On Wed, Oct 12, 2011 at 08:08:21PM -0400, Yves Trudeau wrote:
 What about referring to the git repository here:
 
 http://www.clusterlabs.org/wiki/Get_Pacemaker#Building_from_Source

http://www.clusterlabs.org/mwiki/index.php?title=Install&diff=1287&oldid=1282

Lars

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Postgres RA won't start

2011-10-13 Thread Lars Ellenberg
On Wed, Oct 12, 2011 at 07:41:20PM -0600, Serge Dubrouski wrote:
 On Wed, Oct 12, 2011 at 9:20 AM, Amar Prasovic a...@linux.org.ba wrote:
 
  Thank you all for tips and suggestions. I managed to configure postgres so
  it actually starts.
 
  First, I updated resource-agents (Florian thanks for the tip, still don't
  know how did I manage to miss that :) )
  Second, I deleted postgres primitive, cleared all failcounts and configure
  it again like this:
 
  primitive postgres_res ocf:heartbeat:pgsql \
  params pgctl=/usr/lib/postgresql/8.4/bin/pg_ctl
  psql=/usr/bin/psql start_opt= pgdata=/var/lib/postgresql/8.4/main
  config=/etc/postgresql/8.4/main/postgresql.conf pgdba=postgres \
 
  op start interval=0 timeout=120s \
  op stop interval=0 timeout=120s \
  op monitor interval=30s timeout=30s depth=0
 
  After that, it all worked like a charm.
 
  However, I noticed some strange output in the log file, it wasn't there
  before I updated the resource-agents.
 
  Here is the extract from the syslog:
 
  http://pastebin.com/ybPi0VMp
 
  (postgres_res:monitor:stderr) [: 647: monitor: unexpected operator
 
  This error is actually reported with any operator. I tried to start the
  script from CLI, I got the same thing with ./pgsql start, ./pgsql status,
  ./pgsql stop
 
 
 Weird. I don't know what to tell. The RA is basically all right, it just
 misses one nor very important fix. On my system CentOS 5. PosgreSQL 8.4 or
 9.0 it doesn't produce any errors. If understand you log right the problem
 is in line 647 of the RA which is:
 
 [ $1 == validate-all ] && exit $rc

 "==" != "="

Make that [ $1 = validate-all ] && exit $rc


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Postgres RA won't start

2011-10-13 Thread Lars Ellenberg
On Thu, Oct 13, 2011 at 06:35:27AM -0600, Serge Dubrouski wrote:
 On Thu, Oct 13, 2011 at 4:29 AM, Lars Ellenberg
 lars.ellenb...@linbit.comwrote:
 
  On Wed, Oct 12, 2011 at 07:41:20PM -0600, Serge Dubrouski wrote:
   On Wed, Oct 12, 2011 at 9:20 AM, Amar Prasovic a...@linux.org.ba
  wrote:
  
Thank you all for tips and suggestions. I managed to configure postgres
  so
it actually starts.
   
First, I updated resource-agents (Florian thanks for the tip, still
  don't
know how did I manage to miss that :) )
Second, I deleted postgres primitive, cleared all failcounts and
  configure
it again like this:
   
primitive postgres_res ocf:heartbeat:pgsql \
params pgctl=/usr/lib/postgresql/8.4/bin/pg_ctl
psql=/usr/bin/psql start_opt= pgdata=/var/lib/postgresql/8.4/main
config=/etc/postgresql/8.4/main/postgresql.conf pgdba=postgres \
   
op start interval=0 timeout=120s \
op stop interval=0 timeout=120s \
op monitor interval=30s timeout=30s depth=0
   
After that, it all worked like a charm.
   
However, I noticed some strange output in the log file, it wasn't there
before I updated the resource-agents.
   
Here is the extract from the syslog:
   
http://pastebin.com/ybPi0VMp
   
(postgres_res:monitor:stderr) [: 647: monitor: unexpected operator
   
This error is actually reported with any operator. I tried to start the
script from CLI, I got the same thing with ./pgsql start, ./pgsql
  status,
./pgsql stop
   
  
   Weird. I don't know what to tell. The RA is basically all right, it just
   misses one nor very important fix. On my system CentOS 5. PosgreSQL 8.4
  or
   9.0 it doesn't produce any errors. If understand you log right the
  problem
   is in line 647 of the RA which is:
  
    [ $1 == validate-all ] && exit $rc
 
  "==" != "="
 
 
 Theoretically yes = is for strings and == is for numbers. But why it
 would create a problem on Debian and not on CentOS and why nobody else
 reported this issue so far?
 
 BTW, other RAs use  == operator as well: apache, LVM, portblock,

As you found out by now, if they are bash, that's ok.
If they are /bin/sh, then that's a bug.
dash, for example, does not like ==.

And no, apache and portblock use these in some embedded awk script.

LVM I fixed as well.

  Make that [ $1 = validate-all ] && exit $rc

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] nginx OCF script - strange syslog output

2011-10-12 Thread Lars Ellenberg
On Wed, Oct 12, 2011 at 09:23:15PM +0200, Dejan Muhamedagic wrote:
 Hi,
 
 On Wed, Oct 12, 2011 at 05:28:47PM +0200, Amar Prasovic wrote:
  Hello everyone,
  
  I've found one nginx OCF script online and decided to use it since no
  default script is provided.
  Here is the script I am using:
  
  http://pastebin.com/CCApckew
 
 You can always get the latest nginx release from our repository
 https://github.com/ClusterLabs/resource-agents
 
  The good news is, the script is functional, I get nginx running.
  The sort of a bad news is, every ten seconds I got some strange log output.
  
  Here is the extract from my syslog:
  
  http://pastebin.com/ybPi0VMp
  
  I suppose the problem is somewhere with monitor operator but I cannot figure
  out where.

Parsing of the nginx configuration file is done on each invocation,
which is a design bug^W choice of that resource agent,
so it is done on every monitor action.

Parsing is rudimentary at best.
Things get read by awk, passed to shell commands, mangled again through
sed and awk, the result being finally eval'ed...

A lot of stuff that can go wrong there.

All of that just to guess the root, pid,
and listen directive from the config file.

  I used this script with Debian 5 some half a year ago and I didn't have this
  output. It appeared on Debian 6.0.3

Compare the config files (nginx.conf and its includes).
Avoid more than one statement on one line.

Especially include statements.
My guess is that parsing those is partially broken,
possibly only for relative paths.

 No idea what's going on. But it doesn't look good. In particular
 as it looks as if it's trying to execute something it
 shouldn't. You can add 'set -x' at the top of the RA in between
 monitors, then take a look at the logs. Beware: you should probably
 disable monitor while editing the RA. Or best to try it out on a
 test cluster.
 
 Thanks,
 
 Dejan
 
  Now, this is not some essential problem since logrotate is in place and the
  file is not getting that big, but still it kind of makes reading the file
  difficult since I have to scroll through thousands of unnecessary lines.
  
  -- 
  Amar Prasovic
  Gaißacher Straße 17
  D - 81371 München
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: 
  http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] crm_master triggering assert section != NULL

2011-10-12 Thread Lars Ellenberg
 a protection against flapping in case a slave
 hovers around the replication lag threshold
 You should get plenty of inspiration there from how the dampen parameter
 is used in ocf:pacemaker:ping.
 
 ok, I'll check
 The current RA does implement that but it is not required giving the
 context.  The new RA does implement flapping protection.
 
 - upon demote of a master, the RA _must_ attempt to kill all user
 (non-system) connections
 
 The current RA does not do that but it is easy to implement
 Yeah, as I assume it would be in the other one.
 
 - Slaves must be read-only
 
 That's fine, handled by the current RA.
 Correct.
 
 - Monitor should test MySQL and replication.  If either is bad, vips
 should be moved away.  Common errors should not trigger actions.
 Like I said, should be feasible with the node attribute approach
 outlined above. No reason to muck around with the resources directly.
 
 That's handled by the current RA for most of if.  The error handling
 could be added.
 
 - Slaves should update their master score according to the state of
 their replication.
 
 Handled by both RA
 Right.
 
 So, at the minimum, the RA needs to be able to store the master
 coordinate information, either in the resource parameters or in
 transient attributes and must be able to modify resources location
 scores.  The script _was_ working before I got the cib issue, maybe it
 was purely accidental but it proves the concept.  I was actually
 implement/testing the relay_log completion stuff.  I chose not to use
 the current agent because I didn't want to manage MySQL itself, just
 replication.
 
 I am wide open to argue any Pacemaker or RA architecture/design part but
 I don't want to argue the replication requirements, they are fundamental
 in my mind.
 Yup, and I still believe that ocf:heartbeat:mysql either already
 addresses those, or they could be addressed in a much cleaner fashion
 than writing a new RA.
 
 Now, if the only remaining point is but I want to write an agent that
 can do _less_ than an existing one (namely, manage only replication,
 not the underlying daemon), then I guess I can't argue with that, but
 I'd still believe that would be a suboptimal approach.
 Ohh...  don't get me wrong, I am not the kind of guy that takes
 pride in having re-invented the flat tire.  I want an opensource
 _solution_ I can offer to my customers.  I think part of the problem
 here is that we are not talking about the same ocf:heartbeat:mysql
 RA.  What is mainstream is what you can get with apt-get install
 pacemaker on 10.04 LTS for example.  This is 1.0.8.  I also tried
 1.0.11 and still it is obviously not the same version.  I got my
 latest agent version as explained in the clusterlabs FAQ page
 from:
 
 wget -O resource-agents.tar.bz2
 http://hg.linux-ha.org/agents/archive/tip.tar.bz2
 
 Where can I get the version you are using :)
 
 Regards,
 
 Yves
 
 Cheers,
 Florian
 
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] crm_master triggering assert section != NULL

2011-10-12 Thread Lars Ellenberg
On Thu, Oct 13, 2011 at 01:21:46AM +0200, Lars Ellenberg wrote:
 On Wed, Oct 12, 2011 at 05:09:45PM -0400, Yves Trudeau wrote:
  Hi Florian,
  
  On 11-10-12 04:09 PM, Florian Haas wrote:
  On 2011-10-12 21:46, Yves Trudeau wrote:
  Hi Florian,
 sure, let me state the requirements.  If those requirements can be
  met, pacemaker will be much more used to manage MySQL replication.
  Right now, although at Percona I deal with many large MySQL deployments,
  none are using the current agent.   Another tool, MMM is currently used
  but it is currently orphan and suffers from many pretty fundamental
  flaws (while implement about the same logic as below).
  
  Consider a pool of N identical MySQL servers.  In that case we need:
  - N replication resources (it could be the MySQL RA)
  - N Reader_vip
  - 1 Writer_vip
  
  Reader vips are used by the application to run queries that do not
  modify data, usually accessed is round-robin fashion.  When the
  application needs to write something, it uses the writer_vip.  That's
  how read/write splitting is implement in many many places.
  
  So, for the agent, here are the requirements:
  
  - No need to manage MySQL itself
  
  The resource we are interested in is replication, MySQL itself is at
  another level.  If the RA is to manage MySQL, it must not interfere.
  
  - the writer_vip must be assigned only to the master, after it is promoted
  
  This, is easy with colocation
  Agreed.
  
  - After the promotion of a new master, all slaves should be allowed to
  complete the application of their relay logs prior to any change master
  
  The current RA does not do that but it should be fairly easy to implement.
  That's a use case for a pre-promote and post-promote notification. Like
  the mysql RA currently does.
  
  - After its promotion and before allowing writes to it, a master should
  publish its current master file and position.   I am using resource
  parameters in the CIB for these (I am wondering if transient attributes
  could be used instead)
  They could, and you should. Like the mysql RA currently does.
  
  
  The RA I downloaded following instruction of the wiki stating it is
  the latest sources:
  
  wget -O resource-agents.tar.bz2
  http://hg.linux-ha.org/agents/archive/tip.tar.bz2
 
 Has moved to github.
 I'll try to make that more obvious at the website,

Hm. That I had already done,
not sure what else I could do there.

 but that won't help for direct download hg archive links.

Now, those I simply disabled,
so people will notice ;-)

 http://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/mysql
 
 raw download:
 http://raw.github.com/ClusterLabs/resource-agents/master/heartbeat/mysql
 
 Also see this pull request:
 https://github.com/ClusterLabs/resource-agents/pull/28


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] primary does not run alone

2011-10-11 Thread Lars Ellenberg
On Tue, Oct 11, 2011 at 09:09:52AM +0900, H.Nakai wrote:
 Hi, Andreas, Lars, and everybody
 
 I will try newer version.
 
 But, I want below.

DRBD has fencing policies (fencing resource-and-stonith, for example),
which, if configured, cause it to call fencing handlers (handler { fence-peer 
 })
when appropriate.

There are various fence-peer handlers.
 One is the drbd-peer-outdater,
which needs dopd, which at this point depends on the heartbeat
communication layer.

Then there is the crm-fence-peer.sh script,
which works by setting a pacemaker location constraint instead of
actually setting the peer outdated.

See if that works like you think it should.

 Primary
   demote
   wait 5-10 seconds
   check Secondary is promoted or
 still secondary or disconnected
   if Secondary is promoted and still primary,
set local outdate
   (This means shutdown only Primary)
   if Secondary is still secondary or disconnected,
 not set local outdate
   (This means shutdown both of Primary and Secondary)
   disconnect
   shutdown
 Seconday
   check Primary
   if Primary is primary, set local outdate
   if Primary is demoted(secondary), not set outdate
   disconnect
   shutdown
 
 (2011/10/08 7:14), Lars Ellenberg wrote:
  On Fri, Oct 07, 2011 at 11:29:57PM +0200, Andreas Kurz wrote:
  Hello,
  
  On 10/07/2011 04:51 AM, H.Nakai wrote:
   Hi, I'm from Japan, in trouble.
   In the case blow, server which was primary
   sometimes do not run drbd/heartbeat.
   
   Server A(primary), Server B(secondary) is running.
   Shutdown A and immediately Shutdown B.
   Switch on only A, it dose not run drbd/heartbeat.
   
   It may happen when one server was broken.
   
   I'm using,
   drbd83-8.3.8-1.el5
   heartbeat-3.0.5-1.1.el5
   pacemaker-1.0.11-1.2.el5
   resource-agents-3.9.2-1.1.el5
   centos5.6
   Servers are using two LANs(eth0, eth1) and not using serial cable.
   
   I checked /usr/lib/ocf/resource.d/linbit/drbd,
   and insert some debug codes.
   At drbd_stop(), in while loop,
   only when Unconfigured, break and call maybe_outdate_self().
   But sometimes, $OCF_RESKEY_CRM_meta_notify_master_uname or
   $OCF_RESKEY_CRM_meta_notify_promote_uname are not null.
   So, at maybe_outdate_self(), it is going to set outdate.
   And, it always show warning messages below. But, outdated flag is set.
   State change failed: Disk state is lower than outdated
state = { cs:StandAlone ro:Secondary/Unknown ds:Diskless/DUnknown r--- 
   }
   wanted = { cs:StandAlone ro:Secondary/Unknown ds:Outdated/DUnknown r--- 
   }
  
  those are expected and harmless, even though I admit they are annoying.
  
   I do not want to be set outdated flag, when shutdown both of them.
   I want to know what program set $OCF_RESKEY_CRM_* variables,
   with what condition set these variables,
   and when these variables are set.
  
  you need a newer OCF resource agent, at least from DRBD 8.3.9. There was
  the new parameter stop_outdates_secondary (defaults to true)
  introduced ... set this to false to change the behavior of your setup
   and be warned: this increases the chance to come up with old (outdated)
  data.
  
  BTW, that default has changed to false,
  because of a bug in some version of pacemaker,
  which got the environment for stop operations wrong.
  pacemaker 1.0.11 is ok again, iirc.
  
  Anyways, if you simply go to DRBD 8.3.11, you should be good.
  If you want only the agent script, grab it there:
  http://git.drbd.org/drbd-8.3.git/?a=blob_plain;f=scripts/drbd.ocf
  
 
 Thanks,
 
 Nickey
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] primary does not run alone

2011-10-07 Thread Lars Ellenberg
On Fri, Oct 07, 2011 at 11:29:57PM +0200, Andreas Kurz wrote:
 Hello,
 
 On 10/07/2011 04:51 AM, H.Nakai wrote:
  Hi, I'm from Japan, in trouble.
  In the case below, the server which was primary
  sometimes does not run drbd/heartbeat.
  
  Server A(primary), Server B(secondary) is running.
  Shutdown A and immediately Shutdown B.
  Switch on only A, it does not run drbd/heartbeat.
  
  It may happen when one server was broken.
  
  I'm using,
  drbd83-8.3.8-1.el5
  heartbeat-3.0.5-1.1.el5
  pacemaker-1.0.11-1.2.el5
  resource-agents-3.9.2-1.1.el5
  centos5.6
  Servers are using two LANs(eth0, eth1) and not using serial cable.
  
  I checked /usr/lib/ocf/resource.d/linbit/drbd,
  and insert some debug codes.
  At drbd_stop(), in while loop,
  only when Unconfigured, break and call maybe_outdate_self().
  But sometimes, $OCF_RESKEY_CRM_meta_notify_master_uname or
  $OCF_RESKEY_CRM_meta_notify_promote_uname are not null.
  So, at maybe_outdate_self(), it is going to set outdate.
  And, it always shows the warning messages below. But, the outdated flag is set.
  State change failed: Disk state is lower than outdated
   state = { cs:StandAlone ro:Secondary/Unknown ds:Diskless/DUnknown r--- }
  wanted = { cs:StandAlone ro:Secondary/Unknown ds:Outdated/DUnknown r--- }

those are expected and harmless, even though I admit they are annoying.

  I do not want to be set outdated flag, when shutdown both of them.
  I want to know what program set $OCF_RESKEY_CRM_* variables,
  with what condition set these variables,
  and when these variables are set.
 
 you need a newer OCF resource agent, at least from DRBD 8.3.9. There was
 the new parameter stop_outdates_secondary (defaults to true)
 introduced ... set this to false to change the behavior of your setup
 and be warned: this increases the chance to come up with old (outdated)
 data.

BTW, that default has changed to false,
because of a bug in some version of pacemaker,
which got the environment for stop operations wrong.
pacemaker 1.0.11 is ok again, iirc.

Anyways, if you simply go to DRBD 8.3.11, you should be good.
If you want only the agent script, grab it there:
http://git.drbd.org/drbd-8.3.git/?a=blob_plain;f=scripts/drbd.ocf
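
If you only want to swap in the newer agent, something like this should
do (the install path is the one mentioned earlier in this thread; keep
a copy of the packaged script):

  cd /usr/lib/ocf/resource.d/linbit
  cp drbd drbd.packaged
  wget -O drbd 'http://git.drbd.org/drbd-8.3.git/?a=blob_plain;f=scripts/drbd.ocf'
  chmod +x drbd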

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] concurrent uses of cibadmin: Signon to CIB failed: connection failed

2011-10-03 Thread Lars Ellenberg
On Thu, Sep 29, 2011 at 03:45:32PM -0400, Brian J. Murrell wrote:
 So, in another thread there was a discussion of using cibadmin to
 mitigate possible concurrency issue of crm shell.  I have written a test
 program to test that theory and unfortunately cibadmin falls down in the
 face of heavy concurrency also with errors such as:
 
 Signon to CIB failed: connection failed
 Init failed, could not perform requested operations
 Signon to CIB failed: connection failed
 Init failed, could not perform requested operations
 Signon to CIB failed: connection failed
 Init failed, could not perform requested operations

The cib does a listen(sock_fd, 10),
implicitly, via glue's clplumbing (ipcsocket.c, socket_wait_conn_new()).

That gives you a connection request backlog of 10, which is usually
enough for the server to accept pending connections in time.
If you concurrently create many new client sessions,
some client connect() may fail.

Those would then need to be retried.

My feeling is, any retry logic for concurrency issues should go in some
shell wrapper, though. If you really expect to run into too many
connect attempts to the cib at the same time regularly,
you are doing it wrong ;-)

cibadmin seems to have consistent error codes;
this particular problem should show up as exit code 10.
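
If you do want a retry from a wrapper script, a rough bash sketch (the
function name is made up, and the exit-code-10 check is based on the
reading above, so verify it against your cibadmin version):

  cibadmin_retry() {
      local tries=5 delay=1 rc
      while :; do
          cibadmin "$@"; rc=$?
          [ $rc -ne 10 ] && return $rc    # only retry signon/connect failures
          tries=$((tries - 1))
          [ $tries -le 0 ] && return $rc
          sleep $delay
          delay=$((delay * 2))            # back off a bit between attempts
      done
  }
  # e.g.: cibadmin_retry -o resources -C -x resource-1.xml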


 Effectively my test runs:
 
 for x in $(seq 1 50); do
 cibadmin -o resources -C -x resource-$x.xml 
 done
 
 My complete test program is attached for review/experimentation if you wish.
 
 Am I doing something wrong or is this a bug?  I'm using pacemaker
 1.0.10-1.4.el5 for what it's worth.
 
 Cheers,
 b.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Trouble with ordering

2011-09-30 Thread Lars Ellenberg
On Fri, Sep 30, 2011 at 10:06:51AM +0200, Gerald Vogt wrote:
 Hi!
 
 I am running a cluster with 3 nodes. These nodes provide dns service.
 The purpose of the cluster is to have our two dns service ip addresses
 online at all times. I use IPaddr2 and that part works.
 
 Now I try to extend our setup to check the dns service itself. So far,
 if a dns server on any node stops or hangs the cluster won't notice.
 Thus, I wrote a custom ocf script to check whether the dns service on
 a node is operational (i.e. if the dns server is listening on the ip
 address and whether it responds to a dns request).
 
 All cluster nodes are slave dns servers, therefore the dns server
 process is running at all times to get zone transfers from the dns
 master.
 
 Obviously, the dns service resource must be colocated with the IP
 address resource. However, as the dns server is running at all times,
 the dns service resource must be started or stopped after the ip
 address. This leads me to something like this:
 
 primitive ns1-ip ocf:heartbeat:IPaddr2 ...
 primitive ns1-dns ocf:custom:dns op monitor interval=30s
 
 colocation dns-ip1 inf: ns1-dns ns1-ip
 order ns1-ip-dns inf: ns1-ip ns1-dns symmetrical=false

maybe, if this is what you mean, add:
order ns1-ip-dns inf: ns1-ip:stop ns1-dns:stop symmetrical=false
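
Spelled out, the whole set could look roughly like this (the id of the
extra stop-order constraint is made up, name it as you like):

  colocation dns-ip1 inf: ns1-dns ns1-ip
  # start the dns check only after the ip is up ...
  order ns1-ip-dns inf: ns1-ip ns1-dns symmetrical=false
  # ... and stop the ip before stopping the dns check
  order ns1-ip-dns-stop inf: ns1-ip:stop ns1-dns:stop symmetrical=false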

 
 Problem 1: it seems as if the order constraint does not wait for an
 operation on the first resource to finish before it starts the
 operation on the second. When I migrate an IP address to another node
 the stop operation on ns1-dns will fail because the ip address is
 still active on the network interface. I have worked around this by
 checking for the IP address on the interface in the stop part of my
 dns script and sleeping 5 seconds if it is still there before checking
 again and continuing.
 
 Shouldn't the stop on ns1-ip first finish before the node initiates
 the stop on ns1-dns?
 
 Problem 2: if the dns service fails, e.g. hangs, the monitor operation
 fails. Thus, the cluster wants to migrate the ip address and service
 to another node. However, it first initiates a stop on ns1-dns and
 then on ns1-ip.
 
 What I need is ns1-ip to stop before ns1-dns. But this seems
 impossible to configure. The order constraint only says what operation
 is executed on ns1-dns depending on the status of ns1-ip. It says what
 happens after something. It cannot say what happens before something.
 Is that correct? Or am I missing a configuration option?
 
 Thanks,
 
 Gerald

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] OCF exit code 8 triggers WARN message

2011-09-16 Thread Lars Ellenberg
On Fri, Sep 16, 2011 at 05:02:52PM +0200, Dejan Muhamedagic wrote:
 Hi Thilo,
 
 On Fri, Sep 16, 2011 at 04:41:59PM +0200, Thilo Uttendorfer wrote:
  Hi,
  
  I experience a lot of WARN log entries in several pacemaker cluster 
  setups:
  
   Sep 16 11:53:21 server01 lrmd: [23946]: WARN: Managed res1:0:monitor process 26489 exited with return code 8.
  
   That's because multi-state resources like DRBD have some special return
   codes. 8 means OCF_RUNNING_MASTER, which should not trigger a warning. The
   following patch in cluster-glue solved this issue:
  
  -
  diff -u  lib/clplumbing/proctrack.c lib/clplumbing/proctrack.c.patched
  
  --- lib/clplumbing/proctrack.c  2011-09-16 15:48:25.0 +0200
  +++ lib/clplumbing/proctrack.c.patched  2011-09-16 15:51:43.0 +0200
  @@ -271,7 +271,7 @@
   
  if (doreport) {
  if (deathbyexit) {
   -   cl_log((exitcode == 0 ? LOG_INFO : LOG_WARNING)
   +   cl_log(((exitcode == 0 || exitcode == 8) ? LOG_INFO : LOG_WARNING)
   ,   Managed %s process %d exited with return code %d.
   ,   type, pid, exitcode);
  }else if (deathbysig) {
  -
 
 I did consider this before but was worried that a process
 different from OCF RA instance could exit with such a code. Code
 7 (not running) also belongs to this category. Anyway, we should
 probably add this patch.

Hm...
As lrmd is not the sole user of that proctrack interface,
and not everything lrmd does is a monitor operation,

can we add another loglevel flag there, e.g. PT_LOG_OCF_MONITOR,
and base the downgrading of the log level for expected exit codes on that?


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Strange DRBD error in cluster operation

2011-09-05 Thread Lars Ellenberg
On Thu, Sep 01, 2011 at 02:59:56PM +0200, Michael Schwartzkopff wrote:
 Hi,
 
 from time to time we see the DRBD M/S resource failing on one of our clusters.
 
 From the logs we see that the monitoring fail with rc=5 (not_installed) and 
 the log entry:
 
 lrmd: [2454]: info: RA output: (resDRBD:1:monitor:stderr) /etc/drbd.conf:3: 
 Failed to open include file 'drbd.d/global_common.conf'.
 
 This happens about once per week and causes constant trouble.
 
 Any ideas what might be the reason for this behavior?

You periodically re-create that file from some recipe,
and it so happens that at the time of the monitor,
it is not there?
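
If it is not obvious what touches that file, an audit watch may help to
catch the culprit (assuming auditd is available on those nodes):

  auditctl -w /etc/drbd.d/global_common.conf -p wa -k drbd-conf
  # later: ausearch -k drbd-conf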


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Unable to execute crm(heartbeat/pacemaker) commands

2011-08-25 Thread Lars Ellenberg
On Thu, Aug 25, 2011 at 10:34:07AM +0200, Dejan Muhamedagic wrote:
 Hi,
 
 On Tue, Aug 23, 2011 at 11:15:09AM +0530, rakesh k wrote:
  Hi
  
  I am using Heartbeat(3.0.3) and pacemaker (1.0.9).
  
  We are facing the following issue. Please find the details.
  
   we had installed heartbeat and pacemaker on a Linux box (CentOS operating
   system).
  
  we had created a ssh user and provided it to one of the developers.
  please find the directory structure and the bash profile for that ssh user.
  
  bash-3.2# cat .bash_profile
  # .bash_profile
  # User specific environment and startup programs
  PATH=$PATH:/usr/sbin
  export PATH
  bash-3.2#
  but when one of the developer logs in to the box where heartbeat/pacemaker
  is
  installed through ssh .
  he is unable to execute crm configuration commands.
  say for example. while we are executing the following crm configuration
  commands .
  we are unable to execute crm configuration commands and the system is
  hanging
  while executing.
 
 What is hanging? The crm shell? Does it react to ctrl-C? Can you
 provide more details.

My guess is that the shell prompt is hanging.
Why?

Because you end the last part of the input with a backslash,
which of course causes the shell to wait for yet another line.

And if you don't type that line (or an additional return)
that shell prompt will wait for a very long time.

If that guess should turn out to be true, I suggest you
sleep more, drink more water or tea or coffee or whatever helps,

Or first learn about shell and do some *nix systems 101 in general
before trying to do cluster stuff.
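
You can see the effect at any interactive prompt:

  $ echo this line ends with a \
  >         <- the shell is now waiting for the rest of the line;
               press Enter (or Ctrl-C) to get the prompt back

Putting the configuration into a file, as suggested further down, also
sidesteps the quoting and continuation issues; with the crm shell that
would be something along the lines of (file name made up):

  crm configure load update /tmp/httpd-vip.crm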

 
  Please find the crm configuration command we are using and the snapshot of
  the bash prompt while executing
  
  -bash-3.2$ crm configure primitive HttpdVIP ocf:heartbeat:IPaddr3 \
  params ip=10.104.231.78 eth_num=eth0:2
  vip_cleanup_file=/var/run/bigha.pid \
  op start interval=0 timeout=120s \
  op stop interval=0 timeout=120s \
   params ip=10.104.231.78 eth_num=eth0:2
  vip_cleanup_file=/var/run/bigha.pid \
  op monitor interval=30s  op start interval=0
  timeout=120s \
   op stop interval=0 timeout=120s \
   op monitor interval=30s
 
 Do you actually type all this on the command line? Why would you
 want to do that, why not use a file. There's no telling if and
 how shell expansion would affect this.
 
 Thanks,
 
 Dejan
 
  can you please help me on this particular sceanrio.
 
  ___
  Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
  http://oss.clusterlabs.org/mailman/listinfo/pacemaker
  
  Project Home: http://www.clusterlabs.org
  Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
  Bugs: 
  http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Unable to execute crm(heartbeat/pacemaker) commands

2011-08-25 Thread Lars Ellenberg
On Thu, Aug 25, 2011 at 11:05:32AM +0200, Lars Ellenberg wrote:
 My guess is that the shell prompt is hanging.
 Why?
 
 Because you end the last part of the input with backslash.
 Which of course causes shell to wait for yet an other line.
 
 And if you don't type that line (or an additional return)
 that shell prompt will wait for a very long time.
 
 If that guess should turn out to be true, I suggest you
 sleep more, drink more water or tea or coffee or whatever helps,
 
 Or first learn about shell and do some *nix systems 101 in general
 before trying to do cluster stuff.

Then again, if it is something completely different,
I apologize for being impertinent ...

Lars

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] compression with heartbeat doesn't seem to work

2011-08-23 Thread Lars Ellenberg
=23,ackseq=244435,lastmsg=442
 Aug 19 07:38:21 usrv-qpr2 heartbeat: [23222]: ERROR: Cannot rexmit pkt 22 
 for usrv-qpr5: seqno too low
 Aug 19 07:38:21 usrv-qpr2 heartbeat: [23222]: info: fromnode =usrv-qpr5, 
 fromnode's ackseq = 244435
 Aug 19 07:38:21 usrv-qpr2 heartbeat: [23222]: info: hist information:
 Aug 19 07:38:21 usrv-qpr2 heartbeat: [23222]: info: hiseq =244943, 
 lowseq=23,ackseq=244435,lastmsg=442
 Aug 19 07:38:21 usrv-qpr2 heartbeat: [23222]: ERROR: Message hist queue is 
 filling up (500 messages in queue)
 Aug 19 07:38:21 usrv-qpr2 heartbeat: [23222]: ERROR: Message hist queue is 
 filling up (500 messages in queue)
 Aug 19 07:38:22 usrv-qpr2 heartbeat: [23222]: info: all clients are now 
 resumed
 
 My questions:
 
 1)  Seems like the compression is not working.  Is there something
 we need to do to enable it?  We have tried both bz2 and  zlib.  We've
 played with the compression threshold as well.

See above.
Because pacemaker sometimes does not mark large message field values as
should-be-compressed in the heartbeat message API way, you need to turn
traditional_compression on, so that heartbeat compresses the full
message instead.
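
In ha.cf terms that is roughly the following (directive names as I
remember them, double-check against the ha.cf shipped with your
heartbeat version):

  compression bz2              # or zlib
  compression_threshold 2      # in kB
  traditional_compression yes  # compress whole messages, as described above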

 2)  How do we get the non DC system back on-line?  Rebooting does not 
 work since the DC can't seem to send the diffs to sync it.
 
 3)  If the diff it is trying to send is truly too long, how do I recover 
 from that?

Sometimes pacemaker needs to send the full cib.
The cib, particularly the status section, will grow over time, as it
accumulates probing, monitoring, and other action results.

If you start off with a cib that is too large, you are out of luck.
If you start with a cib that fits, it may still grow too large over
time, so you may need to do some special maintenance there and
delete outdated status results by hand in time, or similar.

In that case, rather consider using corosync instead,
or reducing the number of your services/clones.
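
A rough way to keep an eye on the size, and to trim the operation
history that a resource has accumulated (the resource id is a
placeholder):

  # how big is the cib right now?
  cibadmin -Q | wc -c
  # drop collected operation history for one resource
  crm_resource --cleanup --resource my_resource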

 4)  Would more information be useful in diagnosing the problem?

I don't think so.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Not seeing VIF/VIP on pacemaker system

2011-07-28 Thread Lars Ellenberg
On Thu, Jul 28, 2011 at 02:09:46PM -0400, Leonard Smith wrote:
 I have a very simple cluster configuration where I have a Virtual IP
 that is shared between two hosts. It is working fine, except that I
 cannot go to the hosts, issue an ifconfig command, and see a virtual
 IP address or the fact that the IP address is bound to the host.
 
 I would expect to see a VIF or at least the fact that the ip address
 is bound to the eth0 interface.
 
 Centos 5.6
 pacemaker-1.0.11-1.2.el5
 pacemaker-libs-1.0.11-1.2.el5
 
 
 
 node $id=xx bos-vs002.foo.bar
 node $id=xx bos-vs001.foo.bar
 
 primitive ClusterIP ocf:heartbeat:IPaddr2 \
   params ip=10.1.0.22 cidr_netmask=255.255.252.0 nic=eth0 \
   op monitor interval=10s
 
 property $id=cib-bootstrap-options \
   dc-version=1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87 \
   cluster-infrastructure=Heartbeat \
   stonith-enabled=false \
   no-quorum-policy=ignore \
   default-resource-stickiness=1000
 
 [root@bos-vs001 ~]# ifconfig -a
 eth0  Link encap:Ethernet  HWaddr 00:16:36:41:D3:6D
   inet addr:10.1.1.1  Bcast:10.1.3.255  Mask:255.255.252.0
   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
   RX packets:454721 errors:0 dropped:0 overruns:0 frame:0
   TX packets:90795 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:1000
   RX bytes:257195727 (245.2 MiB)  TX bytes:160400169 (152.9 MiB)
 
 loLink encap:Local Loopback
   inet addr:127.0.0.1  Mask:255.0.0.0
   UP LOOPBACK RUNNING  MTU:16436  Metric:1
   RX packets:146 errors:0 dropped:0 overruns:0 frame:0
   TX packets:146 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:0
   RX bytes:13592 (13.2 KiB)  TX bytes:13592 (13.2 KiB)

IPaddr != IPaddr2,
ifconfig != ip (from the iproute package)

# this will list the addresses:
ip addr show 
# also try:
ip -o -f inet a s
man ip

If you want/need ifconfig to see those aliases as well, you need to
label them, i.e. add the parameter iflabel to your primitive.
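
With your primitive from above that would be, for example (the label
string itself is arbitrary, it just has to fit into the interface name):

  primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip=10.1.0.22 cidr_netmask=255.255.252.0 nic=eth0 iflabel=vip \
    op monitor interval=10s

ifconfig will then show the address on eth0:vip.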


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Cluster with DRBD : split brain

2011-07-26 Thread Lars Ellenberg
On Wed, Jul 20, 2011 at 11:36:25AM -0400, Digimer wrote:
 On 07/20/2011 11:24 AM, Hugo Deprez wrote:
  Hello Andrew,
  
  in fact DRBD was in standalone mode but the cluster was working :
  
  Here is the syslog of the drbd's split brain :
  
  Jul 15 08:45:34 node1 kernel: [1536023.052245] block drbd0: Handshake
  successful: Agreed network protocol version 91
  Jul 15 08:45:34 node1 kernel: [1536023.052267] block drbd0: conn(
  WFConnection - WFReportParams )
  Jul 15 08:45:34 node1 kernel: [1536023.066677] block drbd0: Starting
  asender thread (from drbd0_receiver [23281])
  Jul 15 08:45:34 node1 kernel: [1536023.066863] block drbd0:
  data-integrity-alg: not-used
  Jul 15 08:45:34 node1 kernel: [1536023.079182] block drbd0:
  drbd_sync_handshake:
  Jul 15 08:45:34 node1 kernel: [1536023.079190] block drbd0: self
  BBA9B794EDB65CDF:9E8FB52F896EF383:C5FE44742558F9E1:1F9E06135B8E296F
  bits:75338 flags:0
  Jul 15 08:45:34 node1 kernel: [1536023.079196] block drbd0: peer
  8343B5F30B2BF674:9E8FB52F896EF382:C5FE44742558F9E0:1F9E06135B8E296F
  bits:769 flags:0
  Jul 15 08:45:34 node1 kernel: [1536023.079200] block drbd0:
  uuid_compare()=100 by rule 90
  Jul 15 08:45:34 node1 kernel: [1536023.079203] block drbd0: Split-Brain
  detected, dropping connection!
  Jul 15 08:45:34 node1 kernel: [1536023.079439] block drbd0: helper
  command: /sbin/drbdadm split-brain minor-0
  Jul 15 08:45:34 node1 kernel: [1536023.083955] block drbd0: meta
  connection shut down by peer.
  Jul 15 08:45:34 node1 kernel: [1536023.084163] block drbd0: conn(
  WFReportParams - NetworkFailure )
  Jul 15 08:45:34 node1 kernel: [1536023.084173] block drbd0: asender
  terminated
  Jul 15 08:45:34 node1 kernel: [1536023.084176] block drbd0: Terminating
  asender thread
  Jul 15 08:45:34 node1 kernel: [1536023.084406] block drbd0: helper
  command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
  Jul 15 08:45:34 node1 kernel: [1536023.084420] block drbd0: conn(
  NetworkFailure - Disconnecting )
  Jul 15 08:45:34 node1 kernel: [1536023.084430] block drbd0: error
  receiving ReportState, l: 4!
  Jul 15 08:45:34 node1 kernel: [1536023.084789] block drbd0: Connection
  closed
  Jul 15 08:45:34 node1 kernel: [1536023.084813] block drbd0: conn(
  Disconnecting - StandAlone )
  Jul 15 08:45:34 node1 kernel: [1536023.086345] block drbd0: receiver
  terminated
  Jul 15 08:45:34 node1 kernel: [1536023.086349] block drbd0: Terminating
  receiver thread
 
 This was a DRBD split-brain, not a pacemaker split. I think that might
 have been the source of confusion.
 
 The split brain occurs when both DRBD nodes lose contact with one
 another and then proceed as StandAlone/Primary/UpToDate. To avoid this,
 configure fencing (stonith) in Pacemaker, then use 'crm-fence-peer.sh'
 in drbd.conf;
 
 ===
 disk {
 fencing resource-and-stonith;
 }
 
 handlers {
  outdate-peer /path/to/crm-fence-peer.sh;
 }
 ===

Thanks, that is basically right.
Let me fill in some details, though:

 This will tell DRBD to block (resource) and fence (stonith). DRBD will

drbd fencing options are fencing resource-only,
and fencing resource-and-stonith. 

resource-only does *not* block IO while the fencing handler runs.

resource-and-stonith does block IO.

 not resume IO until either the fence script exits with a success, or
 until an admin types 'drbdadm resume-io res'.


 The CRM script simply calls pacemaker and asks it to fence the other
 node.

No.  It tries to place a constraint forcing the Master role off of any
node but the one with the good data.

 When a node has actually failed, then the lost node is fenced. If
 both nodes are up but disconnected, as you had, then only the fastest
 node will succeed in calling the fence, and the slower node will be
 fenced before it can call a fence.

The "fenced" node may be restricted from being/becoming Master by that
fencing constraint, or, if pacemaker decides to do so, actually shot by
some node-level fencing agent (stonith).

All that resource-level fencing by placing a constraint obviously only
works as long as the cluster communication is still up.
If not only the drbd replication link has issues, but the cluster
communication is down as well, it becomes a bit more complex.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Location issue: how to force only one specific location, and only as Slave

2011-07-05 Thread Lars Ellenberg
On Tue, Jul 05, 2011 at 11:40:04AM +1000, Andrew Beekhof wrote:
 On Mon, Jul 4, 2011 at 11:42 PM, ruslan usifov ruslan.usi...@gmail.com 
 wrote:
 
 
  2011/6/27 Andrew Beekhof and...@beekhof.net
 
  On Tue, Jun 21, 2011 at 10:22 PM, ruslan usifov ruslan.usi...@gmail.com
  wrote:
   No, i mean that in this constaint:
  
   location ms_drbd_web-U_slave_on_drbd3 ms_drbd_web-U \
       rule role=slave -inf: #uname ne drbd3
  
   pacemaker will try to start slave part of resource (if drbd3 is down) on
    other nodes, but it must not do that.
 
  The only way to express this is to have:
  - a fake resource that can only run on drbd3, and
  - an ordering constraint tells ms_drbd_web-U to start only after the
  fake resource is active
 
 
  In future releases does this change?
 
 Its a planned but unimplemented feature.

(please do not use drbdXYZ as a host name!  Imagine explaining what you
mean by drbd7 on drbd3 to someone else ...)

If I understand correctly, you want to
 * restrict the resource to run only on one specific host
 * prevent it from becoming primary, ever

Then why not (I assume hostname X now):

# disallow anywhere but X
location l_ms_drbd_only_on_X ms_drbd \
 rule -inf: #uname ne X

# but even on X, don't become Primary.
location l_ms_drbd_no_primary_on_X ms_drbd \
 rule $role=Master -inf: #uname eq X

If you want pacemaker to really always do exactly that,
then it seems to be most effective to not try to force that,
but to forbid everything else ;-)
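
To double-check what the policy engine makes of such constraints, the
score view can help; depending on your pacemaker version that is one of:

  ptest -L -s          # pacemaker 1.0.x
  crm_simulate -L -s   # newer pacemaker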

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

