[Pacemaker] Call for review of undocumented parameters in resource agent meta data
On Fri, Jan 30, 2015 at 09:52:49PM +0100, Dejan Muhamedagic wrote:
> Hello,
> We've tagged today (Jan 30) a new stable resource-agents release
> (3.9.6) in the upstream repository. Big thanks go to all contributors!
> Needless to say, without you this release would not be possible.

Big thanks to Dejan, who once again finally did what I meant to do in late 2013 already, but simply pushed off for over a year (and no-one else stepped up, either...). So: Thank You.

I just today noticed that apparently some resource agents accept and use parameters that are not documented in their meta data. I now came up with a bash two-liner, which likely still produces a lot of noise, because it does not take into account that some agents source additional helper files. But here is the list:

--- used, but not described
+++ described, but apparently not used

EvmsSCC  +OCF_RESKEY_ignore_deprecation
Evmsd    +OCF_RESKEY_ignore_deprecation
   ?? intentionally undocumented ??
IPaddr   +OCF_RESKEY_iflabel
IPaddr   -OCF_RESKEY_netmask
   Not sure.
IPaddr2  -OCF_RESKEY_netmask
   intentional, backward compat, quoting the agent:
   # Note: We had a version out there for a while which used
   # netmask instead of cidr_netmask. Don't remove this aliasing code!
Please help review these:

IPsrcaddr  -OCF_RESKEY_ip
IPsrcaddr  +OCF_RESKEY_cidr_netmask
IPv6addr.c -OCF_RESKEY_cidr_netmask
IPv6addr.c -OCF_RESKEY_ipv6addr
IPv6addr.c -OCF_RESKEY_nic
LinuxSCSI  +OCF_RESKEY_ignore_deprecation
Squid      -OCF_RESKEY_squid_confirm_trialcount
Squid      -OCF_RESKEY_squid_opts
Squid      -OCF_RESKEY_squid_suspend_trialcount
SysInfo    -OCF_RESKEY_clone
WAS6       -OCF_RESKEY_profileName
apache     +OCF_RESKEY_use_ipv6
conntrackd -OCF_RESKEY_conntrackd
dnsupdate  -OCF_RESKEY_opts
dnsupdate  +OCF_RESKEY_nsupdate_opts
docker     -OCF_RESKEY_container
ethmonitor -OCF_RESKEY_check_level
ethmonitor -OCF_RESKEY_multiplicator
galera     +OCF_RESKEY_additional_parameters
galera     +OCF_RESKEY_binary
galera     +OCF_RESKEY_client_binary
galera     +OCF_RESKEY_config
galera     +OCF_RESKEY_datadir
galera     +OCF_RESKEY_enable_creation
galera     +OCF_RESKEY_group
galera     +OCF_RESKEY_log
galera     +OCF_RESKEY_pid
galera     +OCF_RESKEY_socket
galera     +OCF_RESKEY_user
   Probably all bogus; it sources mysql-common.sh.
   Someone please have a more detailed look.
iSCSILogicalUnit +OCF_RESKEY_product_id
iSCSILogicalUnit +OCF_RESKEY_vendor_id
   false positive surprise: florian learned some wizardry back then ;-)
   for var in scsi_id scsi_sn vendor_id product_id; do
       envar="OCF_RESKEY_${var}"
       if [ -n "${!envar}" ]; then
           params="${params} ${var}=${!envar}"
       fi
   done
   If such magic is used elsewhere, that could mask "used but not documented" cases.
iface-bridge -OCF_RESKEY_multicast_querier
   !! Yep, that needs to be documented!
mysql-proxy -OCF_RESKEY_group
mysql-proxy -OCF_RESKEY_user
   Oops, apparently my magic scriptlet below needs to learn
   to ignore script comments...
named      -OCF_RESKEY_rootdir
   !! Probably a bug: named_rootdir is documented.
nfsserver  -OCF_RESKEY_nfs_notify_cmd
   !! Yep, that needs to be documented!
nginx      -OCF_RESKEY_client
nginx      +OCF_RESKEY_testclient
   !! "client" is used, but not documented;
   !! "testclient" is documented, but unused... Bug?
nginx      -OCF_RESKEY_nginx
   Bogus. Needs to be dropped from the leading comment block.
oracle     -OCF_RESKEY_tns_admin
   !!
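The iSCSILogicalUnit indirect-expansion trick is easy to get subtly wrong without quoting; here is a defensively quoted sketch of the same idea, wrapped in a hypothetical build_params function (the wrapper name is mine, the variable names are from the agent):

```shell
#!/bin/bash
# Build an option string from optional OCF_RESKEY_* environment variables
# using bash indirect expansion (${!envar}). Quoting "${!envar}" keeps the
# [ -n ... ] test correct when the variable is unset or empty.
build_params() {
    local params="" var envar
    for var in scsi_id scsi_sn vendor_id product_id; do
        envar="OCF_RESKEY_${var}"
        if [ -n "${!envar}" ]; then
            params="${params} ${var}=${!envar}"
        fi
    done
    echo "$params"
}
```

Any meta-data checker that only greps for literal OCF_RESKEY_ occurrences will, as noted above, miss parameters referenced through such indirection.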
   Yep, that needs to be documented!
pingd      +OCF_RESKEY_ignore_deprecation
   ?? intentionally undocumented ??
pingd      -OCF_RESKEY_update
   !! Yep, is undocumented.
sg_persist +OCF_RESKEY_binary
sg_persist -OCF_RESKEY_sg_persist_binary
   !! BUG? binary vs sg_persist_binary
varnish    -OCF_RESKEY_binary
   !! Yep, is undocumented.

Please someone find the time to prepare pull requests to fix these...

Thanks,
Lars

-

The list was generated by the scriptlet below, which can be improved. The improved version should probably be part of a unit test check when building resource-agents.

# In the git checkout of the resource agents,
# get a list of files that look like actual agent scripts.
cd heartbeat
A=$(git ls-files | xargs grep -s -l 'resource-agent ')
# and for each of these files,
# diff the list of OCF_RESKEY_* occurrences
# with the list of parameter name=* ones.
for a in $A; do
    diff -U0 \
        <(grep -h -o
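The tail of the scriptlet above was cut off in the archive. Here is a hedged reconstruction of what such a checker might look like, written as a function over a single agent file so it can be exercised in isolation; the exact grep and sed patterns are my assumptions, not necessarily Lars's original:

```shell
#!/bin/bash
# For one agent script, diff the set of OCF_RESKEY_* variables the code
# references against the set of <parameter name="..."> entries in the
# meta-data. In the output, "-" lines are used-but-undocumented and
# "+" lines are documented-but-apparently-unused.
check_agent_params() {
    local agent="$1"
    diff -U0 \
        <(grep -h -o 'OCF_RESKEY_[[:alnum:]_]*' "$agent" | sort -u) \
        <(grep -h -o 'parameter name="[[:alnum:]_]*"' "$agent" |
            sed -e 's/parameter name="/OCF_RESKEY_/' -e 's/"$//' | sort -u) |
        grep '^[-+]OCF_RESKEY_'
}
```

As noted above, this naive version is fooled by sourced helper files, script comments, and indirect variable expansion, so treat its output as candidates for review, not as bugs.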
[Pacemaker] Announcing the Heartbeat 3.0.6 Release
TL;DR: If you intend to set up a new High Availability cluster using the Pacemaker cluster manager, you typically should not care about Heartbeat, but use recent releases (2.3.x) of Corosync. If you don't care about Heartbeat, don't read further. Unless you are beekhof... there's a question below ;-)

More than 3½ years after the last officially tagged release of Heartbeat, I have seen the need to do a new maintenance release.

The Heartbeat 3.0.6 release tag: 3d59540cf28d
and the change set it points to: cceeb47a7d8f

The main reason for this was that pacemaker more recent than somewhere between 1.1.6 and 1.1.7 would no longer work properly on the Heartbeat cluster stack, because some of the daemons have moved from glue to pacemaker proper and changed their paths. This has been fixed in Heartbeat.

Also, during that time stonith-ng was refactored: it would still reliably fence, but would not understand its own confirmation message, so it was effectively broken. This I fixed in pacemaker.

If you choose to run new Pacemaker with the Heartbeat communication stack, it should be at least 1.1.12 with a few patches; see my December 2014 commits at the top of
https://github.com/lge/pacemaker/commits/linbit-cluster-stack-pcmk-1.1.12
I'm not sure if they got into pacemaker upstream yet. beekhof? Do I need to rebase? Or did I miss you merging these?

---

If you have those patches, consider setting this new ha.cf configuration parameter:

# If pacemaker crmd spawns the pengine itself,
# it sometimes forgets to kill the pengine on shutdown,
# which later may confuse the system after cluster restart.
# Tell the system that Heartbeat is supposed to
# control the pengine directly.
crmd_spawns_pengine off

Here is the shortened Heartbeat changelog; the longer version is available in mercurial:
http://hg.linux-ha.org/heartbeat-STABLE_3_0/shortlog

- fix emergency shutdown due to broken update_ackseq
- fix node dead detection problems
- fix converging of membership (ccm)
- fix init script startup glitch (caused by changes in glue/resource-agents)
- heartbeat.service file for systemd platforms
- new ucast6 UDP IPv6 communication plugin
- package ha_api.py in standard package
- update some man pages, specifically the example ha.cf
- also report ccm membership status for cl_status hbstatus -v
- updated some log messages, or their log levels
- reduce max_delay in broadcast client_status query to one second
- apply various (mostly cosmetic) patches from Debian
- drop HBcompress compression plugins: they are part of cluster glue
- drop openais HBcomm plugin
- better support for current pacemaker versions
- try to not miss a SIGTERM (fix problem with very fast respawn/stop cycle)
- dopd: ignore dead ping nodes
- cl_status improvements
- api internals: reduce IPC round-trips to get at status information
- uid=root is sufficient to use heartbeat api (gid=haclient remains sufficient)
- fix /dev/null as log- or debugfile setting
- move daemon binaries into libexecdir
- document movement of compression plugins into cluster-glue
- fix usage of SO_REUSEPORT in ucast sockets
- fix compile issues with recent gcc and -Werror

Note that a number of the mentioned fixes were created two years ago already, and may have been released in packages for a long time, where vendors have chosen to package them.

As to future plans for Heartbeat: Heartbeat is still useful for non-pacemaker, haresources-mode clusters. We (Linbit) will maintain Heartbeat for the foreseeable future.
That should not be too much of a burden, as it is stable, and due to long years of field exposure, all bugs are known ;-)

The most notable shortcoming when using Heartbeat with Pacemaker clusters is the limited message size. There are currently no plans to remove that limitation.

With its wide choice of communication paths, even exotic communication plugins, and the ability to run arbitrarily many paths, some deployments may still favor it over Corosync. But typically, for new deployments involving Pacemaker, in most cases you should choose Corosync 2.3.x as your membership and communication layer.

For existing deployments using Heartbeat, upgrading to this Heartbeat version is strongly recommended.

Thanks,
Lars Ellenberg

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Two node cluster and no hardware device for stonith.
On Fri, Feb 06, 2015 at 04:15:44PM +0100, Dejan Muhamedagic wrote:
> Hi,
> On Thu, Feb 05, 2015 at 09:18:50AM +0100, Digimer wrote:
> > That is the problem that makes geo-clustering very hard to nearly
> > impossible. You can look at the Booth option for pacemaker, but that
> > requires two (or more) full clusters, plus an arbitrator 3rd
>
> A full cluster can consist of one node only. Hence, it is possible to
> have a kind of stretch two-node [multi-site] cluster based on tickets
> and managed by booth.

In theory. In practice, we rely on proper behaviour of the other site in case a ticket is revoked or cannot be renewed. Relying on a single node for proper behaviour does not inspire as much confidence as relying on a multi-node HA cluster at each site, which we can expect to ensure internal fencing.

With reliable hardware watchdogs, it still should be ok to do stretched two-node HA clusters in a reliable way. Be generous with timeouts. And document which failure modes you expect to handle, and how to deal with the worst-case scenarios if you end up with some failure case that you are not equipped to handle properly.

There are deployments which favor "rather online, with _potential_ split brain" over "rather offline, just in case". Document this, print it out on paper: "I am aware that this may lead to lost transactions, data divergence, data corruption, or data loss. I am personally willing to take the blame, and live with the consequences." Have some boss sign that ^^^ in the real world, using a real pen.

    Lars

--
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
[Pacemaker] Patches: RFC before pull request
Andrew, All,

Please have a look at the patches I queued up here:
https://github.com/lge/pacemaker/commits/for-beekhof

Most (not all) are specific to the heartbeat cluster stack.

Thanks,
Lars

A few comments here:

- This effectively changes crm_mon output, but also changes logging where this method is invoked:
  Low: native_print: report target-role as well
  This is for the "Why does my resource not start?" guys who forgot to remove the limiting target-role setting. Report the target role (unless Started, which is the default anyway) if it limits our abilities (Slave, Stopped), or if it differs from the current status.

- Heartbeat specific:
  Low: allow heartbeat to spawn the pengine itself, and tell crmd about it
  Heartbeat 3.0.6 now may spawn the pengine directly, and will announce this in the environment -- I introduced the setting crmd_spawns_pengine. This improves shutdown behavior. Otherwise I regularly find an orphaned pengine process after pacemaker shutdown.

- Heartbeat specific, as a consequence of the fix below:
  Low: add debugging aid to help spot missing set_msg_callback()s on heartbeat
  In ha_msg_dispatch(), change from rcvmsg() to readmsg(). rcvmsg() is internally simply a wrapper around readmsg(), which silently deletes messages without a matching callback. Use readmsg() directly here. It will only return messages unprocessed by callbacks, so log a warning, notice or debug message depending on message header information, and ha_msg_del() it ourselves.

- Heartbeat specific bug fix:
  High: fix stonith ignoring its own messages on heartbeat
  Since the introduction of the additional F_TYPE messages T_STONITH_NOTIFY and T_STONITH_TIMEOUT_VALUE, and their use as message types in global heartbeat cluster messages, stonith-ng was broken on the heartbeat cluster stack.
  When delegation was made the default, and the result could only be reaped by listening for the T_STONITH_NOTIFY message, no-one (but stonithd itself) would ever notice successful completion, and stonith would be re-issued forever. Registering callbacks for these F_TYPEs fixes these hung stonith and stonith_admin operations on the heartbeat cluster stack.

- Heartbeat specific:
  Medium: fix tracking of peer client process status on heartbeat
  Don't optimistically assume that peer client processes are alive, or that a node that can talk to us is in fact a member of the same ccm partition. Whenever ccm tells us about a new membership, *ask* for peer client process status.

- This oneliner may well be relevant for corosync CPG as well; possibly it is one of the reasons pcmk_cpg_membership() has this funny "appears to be online even though we think it is dead" block?
  fix crm_update_peer_proc to NOT ignore flags if partially set
  The set_bit() function used here actually deals with masks, not bit numbers. The flag argument should in fact be plural: flags. These proc flag bits are not always set one at a time, but for example as crm_proc_crmd | crm_proc_cpg, and not necessarily cleared with the same combination. Ignoring to-be-set flags just because *some* of the flag bits are already set is clearly a bug, and may be the reason for stale process cache information.

- Heartbeat specific:
  Medium: map heartbeat JOIN/LEAVE status to ONLINE/OFFLINE
  The rest of the code deals in "online" and "offline", not "join" and "leave". Need to map these states, or the rest of the code won't work properly.

- Generic: if shutdown is requested before the stonith connection was ever established (due to other problems), insisting on re-trying the stonith connection confused the shutdown.
  Medium: don't trigger a stonith_reconnect if no longer required
  Get rid of some spurious error messages, and speed up shutdown, even if the connection to the stonith daemon failed.
- Non-functional change, just for readability:
  Low: use CRM_NODE_MEMBER, not CRM_NODE_ACTIVE
  ACTIVE is defined to be MEMBER anyway:
  include/crm/cluster.h:#define CRM_NODE_ACTIVE CRM_NODE_MEMBER
  Don't confuse the reader of the code by implying it was something different.

- Heartbeat specific, packaging only:
  Low: heartbeat 3.0.6 knows how to find the daemons; drop compat symlinks
Re: [Pacemaker] How to avoid CRM sending stop when ha.cf gets 2nd node configured
On Sat, Nov 08, 2014 at 12:58:36AM +, aridh bose wrote:
> Hi,
> While using heartbeat and pacemaker, is it possible to bring up the first node, which can go as Master, followed by the second node, which should go as Slave, without causing any issues to the first node? Currently, I see a couple of problems in achieving this:
> 1. Assuming I am not using mcast communication, heartbeat is mandating me to configure the second node's info either in ha.cf or in the /etc/hosts file with an associated IP address. Why can't it come up by itself as Master to start with?
> 2. If I update ha.cf with the 2nd node's info and use 'heartbeat -r', CRM first sends stop on the Master before sending start.
> Appreciate any help or pointers.

Regardless of what you do there, or why, or on which communication stack: how about you first put pacemaker into maintenance-mode, then do your re-architecting of your cluster, and once you are satisfied with the new cluster, take it out of maintenance mode again?

At least that is one of the intended use cases for maintenance mode.

--
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
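For reference, toggling maintenance mode from the shell might look like the following sketch; crm_attribute is the low-level pacemaker tool, and crmsh or pcs provide equivalent higher-level commands (the exact invocation depends on your pacemaker version):

```shell
# Freeze resource management: while maintenance-mode is true, pacemaker
# stops starting, stopping and monitoring resources cluster-wide.
crm_attribute --type crm_config --name maintenance-mode --update true

# ... reconfigure ha.cf, restart the messaging layer, and verify the
# new membership looks sane (e.g. with crm_mon -1) ...

# Hand control back to pacemaker once you are satisfied with the result.
crm_attribute --type crm_config --name maintenance-mode --delete
```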
Re: [Pacemaker] [ha-wg-technical] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015
On Sat, Nov 01, 2014 at 01:19:35AM -0400, Digimer wrote:
> All the cool kids will be there. You want to be a cool kid, right?

Well, no. ;-)
But I'll still be there, and a few other Linbit'ers as well.

Fabio, let us know what we could do to help make it happen.

Lars

On 01/11/14 01:06 AM, Fabio M. Di Nitto wrote:
> just a kind reminder.
> On 9/8/2014 12:30 PM, Fabio M. Di Nitto wrote:
> > All,
> > it's been almost 6 years since we had a face to face meeting for all developers and vendors involved in Linux HA. I'd like to try and organize a new event and piggy-back with DevConf in Brno [1]. DevConf will start Friday the 6th of Feb 2015 in Red Hat's Brno offices. My suggestion would be to have a 2-day dedicated HA summit on the 4th and the 5th of February. The goal for this meeting is, besides getting to know each other and all the social aspects of those events, to tune the directions of the various HA projects and explore common areas of improvement. I am also very open to the idea of extending to 3 days, 1 dedicated to customers/users and 2 dedicated to developers, by starting on the 3rd.
> > Thoughts?
> > Fabio
> > PS Please hit reply-all or include me in CC, just to make sure I'll see an answer :)
> > [1] http://devconf.cz/
> Could you please let me know by end of Nov if you are interested or not? I have heard only from few people so far.
> Cheers,
> Fabio
Re: [Pacemaker] can we update an attribute with cmpxchg atomic compare and exchange semantics?
On Tue, Sep 30, 2014 at 01:51:21PM +1000, Andrew Beekhof wrote:
> On 30 Sep 2014, at 6:22 am, Lars Ellenberg lars.ellenb...@linbit.com wrote:
> > On Wed, Sep 10, 2014 at 11:50:58AM +0200, Lars Ellenberg wrote:
> > > Hi Andrew (and others).
> > > For a certain use case (yes, I'm talking about DRBD peer-fencing on
> > > loss of replication link), it would be nice to be able to say:
> > >   update some_attribute=some_attribute+1 where some_attribute >= 0
> > >   delete some_attribute where some_attribute == 0
> > > Ok, that's not the classic cmpxchg(), more of an atomic_add(); or
> > > similar enough. With hopefully just a single cib roundtrip.
> > > Let me rephrase: Update attribute "this_is_pink" (for node-X with ID attr-ID):
> > > - fail if said attr-ID exists elsewhere (not as the intended attribute
> > >   at the intended place in the xml tree)
> > >   (this comes for free already, I think)
> > > - if it does not exist at all, assume it was present with current value 0
> > > - if the current (or assumed current) value is >= 0, add 1
> > > - if the current value is < 0, fail
> > > (optionally: return new value? old value?)
> > Did anyone read this?
> Yep, but it requires a non-trivial answer so it got deferred :)
> Its a reasonable request; we've spoken about something similar in the past, and its clear that at some point attrd needs to grow some extra capabilities. Exactly when it will bubble up to the top of the todo list is less certain, though I would happily coach someone with the necessary motivation.
> The other thing to mention is that currently the only part that won't work is "if the current value is < 0, fail". Setting value=value+1 will do the rest.

Nice.

> So my question would be... how important is the 'lt 0' case? Actually, come to think of it, it's not a bad default behaviour. Certainly failing value++ if value=-INFINITY would be logically consistent with the existing code. Would that be sufficient?

I need to think about that some more. I may need to actually try this out and try to implement my scenario.
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Pacemaker] can we update an attribute with cmpxchg atomic compare and exchange semantics?
On Wed, Sep 10, 2014 at 11:50:58AM +0200, Lars Ellenberg wrote:
> Hi Andrew (and others).
> For a certain use case (yes, I'm talking about DRBD peer-fencing on
> loss of replication link), it would be nice to be able to say:
>   update some_attribute=some_attribute+1 where some_attribute >= 0
>   delete some_attribute where some_attribute == 0
> Ok, that's not the classic cmpxchg(), more of an atomic_add(); or
> similar enough. With hopefully just a single cib roundtrip.
> Let me rephrase: Update attribute "this_is_pink" (for node-X with ID attr-ID):
> - fail if said attr-ID exists elsewhere (not as the intended attribute
>   at the intended place in the xml tree)
>   (this comes for free already, I think)
> - if it does not exist at all, assume it was present with current value 0
> - if the current (or assumed current) value is >= 0, add 1
> - if the current value is < 0, fail
> (optionally: return new value? old value?)

Did anyone read this?

> My intended use case scenario is this:
> Two DRBD nodes, several DRBD resources, at least a few of them in dual-primary. The replication link breaks. Fence-peer handlers are triggered individually for each resource on both nodes, and try to concurrently modify the cib (place fencing constraints). With the current implementation of crm-fence-peer.sh, it is likely that some DRBD resources win on one node, some win on the other node. The respective losers will have their IO blocked. Which means that most likely, on both nodes some DRBD will stay blocked, some monitor operation will soon fail, some stop operation (to recover from the monitor failure) will soon fail, and the recovery from that will be node-level fencing of the affected node. In short: both nodes will be hard-reset because of a replication link failure.
>
> If I would instead use a single attribute (with a pre-determined ID) for all instances of the fence-peer handler, the first to come would choose the victim node, and all others would just add their count. There will be only one loser, and more importantly: one survivor.
> Once the replication link is re-established, DRBD resynchronization will bring the former loser up-to-date, and the respective after-resync handlers will decrease that breakage count. Once the breakage count hits zero, it can and should be deleted. Presence of the breakage count attribute with a value > 0 would mean "this node must not be promoted", which would be a static constraint to be added to all DRBD resources.
>
> Does that make sense?
>
> (I have more insane proposals, in case we have multiple (more than 2) Primaries during normal operation, but I'm not yet able to write them down without being seriously confused by myself...)
>
> I could open-code it with shell and cibadmin, btw. I did a proof-of-concept once that does:
> a. cibadmin -Q
> b. some calculations, then prepare the update statement xml based on the cib content seen, *including* the cib generation counters
> c. cibadmin -R (or -C, -M, -D, as appropriate); this will fail if the cib was modified in a relevant way since a, because of the included generation counters
> d. repeat as necessary
>
> But that is beyond ugly. And probably fragile. And would often fail for all the wrong reasons, just because some status code has changed and bumped the cib generation counters.
>
> What would be needed to add such functionality? Where would it go? cibadmin? cib? crm_attribute? possibly also attrd?
>
> Thanks, Lars

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
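The cibadmin proof-of-concept described above is an optimistic compare-and-swap retry loop. Its control flow can be sketched generically; here `reader` and `applier` are caller-supplied stand-ins for the `cibadmin -Q` and `cibadmin -R` steps (the applier is expected to fail when the generation counters it embeds have gone stale):

```shell
#!/bin/bash
# Optimistic-concurrency retry loop: read the current state, compute an
# update conditional on that state, try to apply it, and retry from the
# top on conflict, up to a bounded number of attempts.
# $1: command printing the current state (cibadmin -Q equivalent)
# $2: command applying an update given that state; must fail on conflict
#     (cibadmin -R with embedded generation counters equivalent)
# $3: maximum number of attempts (default 5)
cas_retry() {
    local reader="$1" applier="$2" max="${3:-5}" i state
    for ((i = 0; i < max; i++)); do
        state=$("$reader") || return 2   # could not even read the state
        if "$applier" "$state"; then
            return 0                     # applied without conflict
        fi
        # conflict: something else modified the state in between; re-read
    done
    return 1                             # gave up after $max attempts
}
```

As the mail notes, the weakness of doing this against the whole CIB is false conflicts: unrelated status updates bump the generation counters and force needless retries, which is why a server-side atomic increment in attrd would be preferable.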
[Pacemaker] can we update an attribute with cmpxchg atomic compare and exchange semantics?
Hi Andrew (and others).

For a certain use case (yes, I'm talking about DRBD peer-fencing on loss of replication link), it would be nice to be able to say:

  update some_attribute=some_attribute+1 where some_attribute >= 0
  delete some_attribute where some_attribute == 0

Ok, that's not the classic cmpxchg(), more of an atomic_add(); or similar enough. With hopefully just a single cib roundtrip.

Let me rephrase: Update attribute "this_is_pink" (for node-X with ID attr-ID):
- fail if said attr-ID exists elsewhere (not as the intended attribute at the intended place in the xml tree) (this comes for free already, I think)
- if it does not exist at all, assume it was present with current value 0
- if the current (or assumed current) value is >= 0, add 1
- if the current value is < 0, fail
(optionally: return new value? old value?)

My intended use case scenario is this:
Two DRBD nodes, several DRBD resources, at least a few of them in dual-primary. The replication link breaks. Fence-peer handlers are triggered individually for each resource on both nodes, and try to concurrently modify the cib (place fencing constraints). With the current implementation of crm-fence-peer.sh, it is likely that some DRBD resources win on one node, some win on the other node. The respective losers will have their IO blocked. Which means that most likely, on both nodes some DRBD will stay blocked, some monitor operation will soon fail, some stop operation (to recover from the monitor failure) will soon fail, and the recovery from that will be node-level fencing of the affected node. In short: both nodes will be hard-reset because of a replication link failure.

If I would instead use a single attribute (with a pre-determined ID) for all instances of the fence-peer handler, the first to come would choose the victim node, and all others would just add their count. There will be only one loser, and more importantly: one survivor.
Once the replication link is re-established, DRBD resynchronization will bring the former loser up-to-date, and the respective after-resync handlers will decrease that breakage count. Once the breakage count hits zero, it can and should be deleted. Presence of the breakage count attribute with a value > 0 would mean "this node must not be promoted", which would be a static constraint to be added to all DRBD resources.

Does that make sense?

(I have more insane proposals, in case we have multiple (more than 2) Primaries during normal operation, but I'm not yet able to write them down without being seriously confused by myself...)

I could open-code it with shell and cibadmin, btw. I did a proof-of-concept once that does:
a. cibadmin -Q
b. some calculations, then prepare the update statement xml based on the cib content seen, *including* the cib generation counters
c. cibadmin -R (or -C, -M, -D, as appropriate); this will fail if the cib was modified in a relevant way since a, because of the included generation counters
d. repeat as necessary

But that is beyond ugly. And probably fragile. And would often fail for all the wrong reasons, just because some status code has changed and bumped the cib generation counters.

What would be needed to add such functionality? Where would it go? cibadmin? cib? crm_attribute? possibly also attrd?

Thanks,
Lars
Re: [Pacemaker] Configuration recommandations for (very?) large cluster
, crm_msg_crmd, novote, TRUE);
free_xml(novote);

--- include/crm/msg_xml.h.orig	2011-11-28 16:41:47.309414327 +0100
+++ include/crm/msg_xml.h	2011-11-28 16:42:23.921417584 +0100
@@ -33,6 +33,7 @@
 # define F_CRM_USER "crm_user"
 # define F_CRM_JOIN_ID "join_id"
 # define F_CRM_ELECTION_ID "election-id"
+# define F_CRM_DC_PRIO "dc-prio"
 # define F_CRM_ELECTION_AGE_S "election-age-sec"
 # define F_CRM_ELECTION_AGE_US "election-age-nano-sec"
 # define F_CRM_ELECTION_OWNER "election-owner"
--- lib/ais/plugin.c.orig	2011-11-28 16:42:57.002411543 +0100
+++ lib/ais/plugin.c	2011-11-28 16:44:22.160413844 +0100
@@ -409,6 +409,9 @@
     get_config_opt(pcmk_api, local_handle, "use_logd", value, "no");
     pcmk_env.use_logd = value;
 
+    get_config_opt(pcmk_api, local_handle, "dc_prio", value, "1");
+    pcmk_env.dc_prio = value;
+
     get_config_opt(pcmk_api, local_handle, "use_mgmtd", value, "no");
     if (ais_get_boolean(value) == FALSE) {
         int lpc = 0;
@@ -599,6 +602,7 @@
     pcmk_env.logfile = NULL;
     pcmk_env.use_logd = "false";
     pcmk_env.syslog = "daemon";
+    pcmk_env.dc_prio = "1";
 
     if (cs_uid != root_uid) {
         ais_err("Corosync must be configured to start as 'root',
--- lib/ais/utils.c.orig	2011-11-28 16:45:01.940415754 +0100
+++ lib/ais/utils.c	2011-11-28 16:45:33.018412117 +0100
@@ -237,6 +237,7 @@
     setenv("HA_logfacility", pcmk_env.syslog, 1);
     setenv("HA_LOGFACILITY", pcmk_env.syslog, 1);
     setenv("HA_use_logd", pcmk_env.use_logd, 1);
+    setenv("HA_dc_prio", pcmk_env.dc_prio, 1);
     setenv("HA_quorum_type", pcmk_env.quorum, 1);
     /* *INDENT-ON* */
--- lib/ais/utils.h.orig	2011-11-28 16:45:45.143412597 +0100
+++ lib/ais/utils.h	2011-11-28 16:46:37.026410208 +0100
@@ -238,6 +238,7 @@
     const char *syslog;
     const char *logfile;
     const char *use_logd;
+    const char *dc_prio;
     const char *quorum;
 };
--- crmd/messages.c.orig	2012-05-25 16:23:22.913106180 +0200
+++ crmd/messages.c	2012-05-25 16:28:30.330263392 +0200
@@ -36,6 +36,8 @@
 #include crmd_messages.h
 #include crmd_lrm.h
 
+static int our_dc_prio = INT_MIN;
+
 GListPtr fsa_message_queue = NULL;
 extern void crm_shutdown(int nsig);
@@ -693,7 +695,19 @@
     /*== DC-Only Actions ==*/
     if (AM_I_DC) {
         if (strcmp(op, CRM_OP_JOIN_ANNOUNCE) == 0) {
-            return I_NODE_JOIN;
+            if (our_dc_prio == INT_MIN) {
+                char *dc_prio_str = getenv("HA_dc_prio");
+
+                if (dc_prio_str == NULL) {
+                    our_dc_prio = 1;
+                } else {
+                    our_dc_prio = atoi(dc_prio_str);
+                }
+            }
+            if (our_dc_prio == 0)
+                return I_ELECTION;
+            else
+                return I_NODE_JOIN;
 
         } else if (strcmp(op, CRM_OP_JOIN_REQUEST) == 0) {
             return I_JOIN_REQUEST;

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Pacemaker] Multiple node loadbalancing
On Wed, Jul 30, 2014 at 02:59:31PM +0200, Machiel wrote:
> Hi Guys
> We are trying to set up the following; however, we can not seem to find any references on the internet which explain how to configure it. We have 3 machines, and we need to set up load balancing on the machines as follows:
> - Load balancer and apps running on all 3 machines
> - 1 machine should be the load balancer (master) which will balance traffic over all 3 machines including itself.
> - Should this node fail, the second node should take over the task, and if the second node should fail, then the 3rd node should take over as standalone until the other nodes are restored.
> We are only able to find configuration instructions on how to set up load balancing for 2 nodes, which we have done several times, however no info for 3 nodes. We are currently using ldirectord and heartbeat; however, in this setup, if the first node fails, then both the 2nd and 3rd nodes try to take over. (This was configured very long ago though.)

While the communication and membership layer of heartbeat always supported many nodes, the resource manager part of heartbeat (haresources mode) is a very basic shell script, and only supports two-node clusters. With haresources mode of heartbeat, you can only do two-node clusters (if you intend to keep your sanity).

> I would really appreciate any suggestions on this, or even links where I can find the information.

Use pacemaker. Whether you want heartbeat or corosync as the communication and membership layer is up to you. For new installations and recent OS releases, pacemaker + corosync is generally the recommended way.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
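The failover-chain behaviour described above (one balancer at a time, taking over in a fixed order) maps naturally onto Pacemaker location scores. Here is a sketch in crm shell syntax — the resource names, VIP address, ldirectord config path, and node names are all hypothetical, not taken from the poster's setup:

```
primitive p_vip ocf:heartbeat:IPaddr2 \
    params ip=192.168.0.100 cidr_netmask=24
primitive p_ldirectord ocf:heartbeat:ldirectord \
    params configfile=/etc/ha.d/ldirectord.cf
group g_lb p_vip p_ldirectord
# descending scores give the fixed takeover order node1 -> node2 -> node3
location lb-prefer-1 g_lb 300: node1
location lb-prefer-2 g_lb 200: node2
location lb-prefer-3 g_lb 100: node3
```

The application services on the real servers would run as a clone on all three nodes; ldirectord then balances to all of them, including whichever node it currently runs on.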
Re: [Pacemaker] [DRBD-user] DRBD active/passive on Pacemaker+CMAN cluster unexpectedly performs STONITH when promoting
On Fri, Jul 04, 2014 at 06:04:12PM +0200, Giuseppe Ragusa wrote:

The setup almost works (all seems ok with: pcs status, crm_mon -Arf1, corosync-cfgtool -s, corosync-objctl | grep member), but every time it needs a resource promotion (to Master, i.e. becoming primary) it either fails or fences the other node (the one supposed to become Slave, i.e. secondary) and only then succeeds. It happens, for example, both on initial resource definition (when attempting first start) and on a node entering standby (when trying to automatically move the resources by stopping then starting them). I collected a full pcs cluster report and I can provide a CIB dump, but I will initially paste here an excerpt from my configuration, just in case it happens to be a simple configuration error that someone can spot on the fly (hoping...). Keep in mind that the setup has separate redundant network connections for LAN (1 Gib/s LACP to switches), Corosync (1 Gib/s round-robin back-to-back) and DRBD (10 Gib/s round-robin back-to-back), and that FQDNs are correctly resolved through /etc/hosts.

Make sure your DRBD resources are Connected and UpToDate/UpToDate before you let the cluster take over control of who is master.

Thanks for your important reminder. Actually they had been Connected UpToDate/UpToDate, and I subsequently had all of them manually demoted to secondary, then down-ed, before eventually stopping the (manually started) DRBD service. Only at the end did I start/configure the cluster. The problem is now resolved, and it seems that my improper use of rhcs_fence as fence-peer was the culprit (now switched to crm-fence-peer.sh), but I still do not understand why rhcs_fence was called at all in the beginning (once called, it may have caused unforeseen consequences, I admit), since the DRBD docs clearly state that communication disruption must be involved in order to call the fence-peer handler into action.

You likely managed to have data divergence between your instances of DRBD, likely caused by a cluster split-brain.
So DRBD would refuse to connect, and thus would be not connected when promoted.

Just because you can shoot someone does not make your data any better, nor does it tell the victim node that its data is bad (from the shooting node's point of view), so they would just keep killing each other. Don't do that.

Instead, tell the cluster to not even attempt to promote unless the local data is known to be UpToDate *and* the remote data is either known (DRBD is connected) or known to be bad (Outdated or worse). The ocf:linbit:drbd agent has an adjust_master_score parameter for that. See there.

Lars

__
please don't Cc me, but send to list -- I'm subscribed
Re: [Pacemaker] [DRBD-user] DRBD active/passive on Pacemaker+CMAN cluster unexpectedly performs STONITH when promoting
        <device name="pcmk" port="cluster2.verolengo.privatelan"/>
      </method>
    </fence>
  </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
  <fence_daemon clean_start="0" post_fail_delay="30" post_join_delay="30"/>
  <logging debug="on"/>
  <rm disabled="1">
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

--

Pacemaker:

PROPERTIES:

pcs property set default-resource-stickiness=100
pcs property set no-quorum-policy=ignore

STONITH:

pcs stonith create ilocluster1 fence_ilo2 action=off delay=10 \
    ipaddr=ilocluster1.verolengo.privatelan login=cluster2 passwd=test power_wait=4 \
    pcmk_host_check=static-list pcmk_host_list=cluster1.verolengo.privatelan op monitor interval=60s
pcs stonith create ilocluster2 fence_ilo2 action=off \
    ipaddr=ilocluster2.verolengo.privatelan login=cluster1 passwd=test power_wait=4 \
    pcmk_host_check=static-list pcmk_host_list=cluster2.verolengo.privatelan op monitor interval=60s
pcs stonith create pdu1 fence_apc action=off \
    ipaddr=pdu1.verolengo.privatelan login=cluster passwd=test \
    pcmk_host_map=cluster1.verolengo.privatelan:3,cluster1.verolengo.privatelan:4,cluster2.verolengo.privatelan:6,cluster2.verolengo.privatelan:7 \
    pcmk_host_check=static-list pcmk_host_list=cluster1.verolengo.privatelan,cluster2.verolengo.privatelan op monitor interval=60s
pcs stonith level add 1 cluster1.verolengo.privatelan ilocluster1
pcs stonith level add 2 cluster1.verolengo.privatelan pdu1
pcs stonith level add 1 cluster2.verolengo.privatelan ilocluster2
pcs stonith level add 2 cluster2.verolengo.privatelan pdu1
pcs property set stonith-enabled=true
pcs property set stonith-action=off

SAMPLE RESOURCE:

pcs cluster cib dc_cfg
pcs -f dc_cfg resource create DCVMDisk ocf:linbit:drbd \
    drbd_resource=dc_vm op monitor interval=31s role=Master \
    op monitor interval=29s role=Slave \
    op start interval=0 timeout=120s \
    op stop interval=0 timeout=180s
pcs -f dc_cfg resource master DCVMDiskClone DCVMDisk \
    master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
    notify=true \
    target-role=Started is-managed=true
pcs -f dc_cfg resource create DCVM ocf:heartbeat:VirtualDomain \
    config=/etc/libvirt/qemu/dc.xml migration_transport=tcp migration_network_suffix=-10g \
    hypervisor=qemu:///system meta allow-migrate=false target-role=Started is-managed=true \
    op start interval=0 timeout=120s \
    op stop interval=0 timeout=120s \
    op monitor interval=60s timeout=120s
pcs -f dc_cfg constraint colocation add DCVM DCVMDiskClone INFINITY with-rsc-role=Master
pcs -f dc_cfg constraint order promote DCVMDiskClone then start DCVM
pcs -f dc_cfg constraint location DCVM prefers cluster2.verolengo.privatelan=50
pcs cluster cib-push firewall_cfg

Since I know that pcs still has some rough edges, I installed crmsh too, but never actually used it.

Many thanks in advance for your attention.

Kind regards,
Giuseppe Ragusa
Re: [Pacemaker] Not unmoving colocated resources can provoke DRBD split-brain
On Thu, Jun 12, 2014 at 10:10:55AM +1000, Andrew Beekhof wrote:

Referring to the king of drbd... Lars, question for you inline.

===
primitive DRBD-ffm ocf:linbit:drbd params drbd_resource=ffm \
    op start interval=0 timeout=240 \
    op promote interval=0 timeout=90 \
    op demote interval=0 timeout=90 \
    op notify interval=0 timeout=90 \
    op stop interval=0 timeout=100 \
    op monitor role=Slave timeout=20 interval=20 \
    op monitor role=Master timeout=20 interval=10
ms ms-DRBD-ffm DRBD-ffm meta master-max=1 master-node-max=1 \
    clone-max=2 clone-node-max=1 notify=true
colocation coloc-ms-DRBD-ffm-follows-ALL-ffm inf: \
    ms-DRBD-ffm:Master ALL-ffm
order ord-ALL-ffm-before-DRBD-ffm inf: ALL-ffm ms-DRBD-ffm:promote
location loc-ms-DRBD-ffm-korfwm01 ms-DRBD-ffm -inf: korfwm01
location loc-ms-DRBD-ffm-korfwm02 ms-DRBD-ffm -inf: korfwm02
===

# crm node standby korfwf01 ; sleep 10
# crm node online korfwf01 ; sleep 10
# crm resource move ALL-ffm korfwf01 ; sleep 10
# crm node standby korfwf01 ; sleep 10
# crm node online korfwf01 ; sleep 10

*bang* split-brain.

This is because with the last command (online korfwf01) pacemaker starts and then immediately promotes ms-DRBD-ffm, without giving drbd any time to sync with the peer.

Have you seen anything like this before? I don't know that we have any capacity to delay the promotion in the PE... perhaps the agent needs to delay setting a master score if it's out of date? Or maybe loop in the promote action and set a really long timeout?

You want to configure DRBD for fencing resource-and-stonith, and use the fence-peer handler crm-fence-peer.sh (and the corresponding crm-unfence-peer.sh in the after-resync-target handler). Done.

What does that do? If a fencing policy != dont-care is configured, DRBD, if gracefully disconnected (stop), will outdate a secondary. Outdated secondaries refuse to be promoted.
On non-graceful disconnect, a Primary will freeze IO, call the fence-peer handler, which places a constraint pinning the primary role to where it currently is, and on success resume IO. Also, DRBD will not consider itself UpToDate immediately after start, but Consistent at best, which will use a minimal master score (or none at all, see adjust_master_score).

Due to this constraint, pacemaker will not attempt promotion on the node that was fenced (in this case only fenced from becoming Primary, not necessarily shot... it really only places a constraint) until that node is unfenced (the constraint is removed), which happens in the after-resync-target handler (crm-unfence-peer.sh).

If you don't like the freeze-IO part above, you can use the resource-only fencing policy. The and-stonith part is really only about the freeze-IO. crm-fence-peer.sh does NOT (usually) trigger stonith itself. It may wait for a successful stonith, though, if it thinks one is pending.

The only reliable (as can be) way to avoid data divergence with DRBD and pacemaker is to use redundant cluster communications, working and tested node-level fencing on the pacemaker level, *and* fencing resource-and-stonith + crm-fence-peer.sh on the DRBD level.

You may want to use the adjust_master_score parameter of the DRBD resource agent as well, to avoid pacemaker attempting to promote an only-Consistent DRBD, which will usually fail anyway. See the description there.
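For illustration, the constraint that crm-fence-peer.sh places is roughly of the following shape (a sketch in crm syntax — the exact constraint id is derived from the DRBD resource and master resource names, and <surviving-node> stands for the node that called the handler):

```
location drbd-fence-by-handler-ms-DRBD-ffm ms-DRBD-ffm \
    rule $role=Master -inf: #uname ne <surviving-node>
```

Until the after-resync-target handler removes this constraint again, the Master role can only be placed on the surviving node.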
Re: [Pacemaker] DRBD primary/primary + Pacemaker goes into split brain after crm node standby/online
On Mon, Jun 09, 2014 at 08:07:51PM +0200, Alexis de BRUYN wrote:

Hi Everybody,

I have an issue with a 2-node Debian Wheezy primary/primary DRBD Pacemaker/Corosync configuration. After a 'crm node standby' then a 'crm node online', the DRBD volume stays in a split-brain state (cs:StandAlone ro:Primary/Unknown). A soft or hard reboot of one node gets rid of the split brain and/or doesn't create one. I have followed http://www.drbd.org/users-guide-8.3/ and kept my tests as simple as possible (no activity and no filesystem on the DRBD volume). I don't see what I am doing wrong. Could anybody help me with this, please?

Use fencing, both node-level fencing on the Pacemaker level, *and* constraint fencing on the DRBD level:

# cat /etc/drbd.d/sda4.res
resource sda4 {
    device    /dev/drbd0;
    disk      /dev/sda4;
    meta-disk internal;

    startup {
        become-primary-on both;
    }
    handlers {
        split-brain         "/usr/lib/drbd/notify-split-brain.sh root";
        fence-peer          "crm-fence-peer.sh";
        after-resync-target "crm-unfence-peer.sh";
    }
    disk {
        fencing resource-and-stonith;
    }
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    on testvm1 {
        address 192.168.1.201:7788;
    }
    on testvm2 {
        address 192.168.1.202:7788;
    }
    syncer {
        rate 100M;
        al-extents 3389;
    }
}
Re: [Pacemaker] drbd + lvm
On Thu, Mar 13, 2014 at 03:57:28PM -0400, David Vossel wrote:

- Original Message -
From: Infoomatic infooma...@gmx.at
To: pacemaker@oss.clusterlabs.org
Sent: Thursday, March 13, 2014 2:26:00 PM
Subject: [Pacemaker] drbd + lvm

Hi list,

I am having trouble with pacemaker and lvm and stacked drbd resources. The system consists of 2 Ubuntu 12 LTS servers, each having two partitions of an underlying raid 1+0 as a volume group, with one LV each as a drbd backing device. The purpose is usage with VMs and adjusting needed disk space flexibly, so on top of the drbd resources there are LVs for each VM. I created a stack with LCMC, which is like: DRBD-LV-libvirt and DRBD-LV-Filesystem-lxc.

The problem now: the system has hiccups - when VM01 runs on HOST01 (being primary DRBD) and HOST02 is restarting, lvm is reloaded (at boot time) and the LVs are being activated. This of course results in an error, the log entry:

Mar 13 17:58:42 host01 pengine: [27563]: ERROR: native_create_actions: Resource res_LVM_1 (ocf::LVM) is active on 2 nodes attempting recovery

Therefore, as configured, the resource is stopped and started again (on only one node). Thus, all VMs and containers relying on it are also restarted. When I disable the LVs that use the DRBD resource at boot (lvm.conf: volume_list only containing the VG from the partitions of the raid system), a reboot of the secondary does not restart the VMs running on the primary. However, if the primary goes down (e.g. power interruption), the secondary cannot activate the LVs of the VMs because they are not in the list in lvm.conf to be activated. Has anyone had this issue and resolved it? Any ideas? Thanks in advance!

Yep, I've hit this as well. Use the latest LVM agent. I already fixed all of this.

If you exclude the DRBD lower-level devices in your lvm.conf filter (and update your initramfs to have a proper copy of that lvm.conf), and only allow them to be accessed via DRBD, LVM cannot possibly activate them on boot.
But only after DRBD was promoted. Which supposedly happens via pacemaker only. And unless some udev rule auto-activates any VG found immediately, it should only be activated via pacemaker as well.

So something like this should be in your lvm.conf:

filter = [ "a|^/dev/your/system/PVs|", "a|^/dev/drbd|", "r|.|" ]

https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/LVM

Keep your volume_list the way it is and use the 'exclusive=true' LVM option. This will allow the LVM agent to activate volumes that don't exist in the volume_list.

That is a nice feature, but if I'm correct, it is unrelated here.
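As a rough sanity check of what such a filter is meant to accept and reject, the decision can be emulated for a few example device names. This is not LVM's actual parser — just a sketch of the intended first-match-wins behaviour, with /dev/sda4 as an assumed example of a DRBD backing device:

```sh
#!/bin/sh
# Emulate the accept/reject decision of the example filter:
#   filter = [ "a|^/dev/your/system/PVs|", "a|^/dev/drbd|", "r|.|" ]
# The first matching pattern wins; anything unmatched falls through
# to the trailing reject-all.
filter_decision() {
    case "$1" in
        /dev/your/system/PVs*) echo accept ;;  # system PVs stay visible
        /dev/drbd*)            echo accept ;;  # DRBD devices are allowed
        *)                     echo reject ;;  # everything else, e.g. backing disks
    esac
}

filter_decision /dev/drbd0   # accept
filter_decision /dev/sda4    # reject: the lower-level DRBD backing device
```

The point of the reject branch is exactly the boot-time behaviour above: if the backing device never passes the filter, LVM cannot scan and activate the VG behind pacemaker's back.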
Re: [Pacemaker] no-quorum-policy = demote?
On Fri, Apr 11, 2014 at 10:02:59AM +0200, Christian Ciach wrote:

Thank you for pointing me to the environment variables. Unfortunately, none of these work in this case. For example: Assume one node is currently the master. Then, because of a network failure, this node loses quorum. Because no-quorum-policy is set to ignore, this node will keep being a master. In this case there is no change of state, thus the notify function of the OCF agent does not get called by pacemaker. I've already tried this, so I am quite sure about that.

Very very hackish idea:

Set the monitor interval of the Master role to T seconds, and fail (+demote) if there is no quorum. (Or use a dummy resource agent similar to the ping RA, update some node attribute from there... then have a constraint for the Master role on that node attribute.)

In your promote action: refuse to promote if there is no quorum; sleep 3*T (+ time to demote); only then actually promote.

That way, you are reasonably sure that, before you actually promote, the former master had a chance to notice the quorum loss and demote.

But you really should look into booth, or proper fencing.
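The promote-side half of that hack could be sketched in shell like this. It assumes `crm_node -q` prints 1 when the partition has quorum; `do_promote` is a hypothetical stand-in for the agent's real promote work:

```sh
#!/bin/sh
# Sketch of the "refuse, wait, re-check" promote gating described above.
# Assumptions: `crm_node -q` prints 1 when this partition has quorum;
# `do_promote` is a hypothetical stand-in for the real promote action.
T=${T:-10}   # monitor interval of the Master role, in seconds

have_quorum() {
    [ "$(crm_node -q 2>/dev/null)" = "1" ]
}

guarded_promote() {
    have_quorum || return 1   # refuse to promote without quorum
    sleep $((3 * T))          # give a former master time to notice and demote
    have_quorum || return 1   # quorum may have been lost while we slept
    do_promote
}
```

As the reply says, this only makes you "reasonably sure" — booth or proper fencing is the real answer.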
Re: [Pacemaker] Colocation constraint to External Managed Resource
On Thu, Oct 10, 2013 at 06:20:54PM +0200, Robert H. wrote:

Hello,

On 10.10.2013 16:18, Andreas Kurz wrote:

You configured a monitor operation for this unmanaged resource?

Yes, and some parts work as expected; however, some behaviour is strange.

Config (relevant part only):

primitive mysql-percona lsb:mysql \
    op start enabled=false interval=0 \
    op stop enabled=false interval=0 \
    op monitor enabled=true timeout=20s interval=10s \

You probably also want to monitor even if pacemaker thinks this is supposed to be stopped:

    op monitor interval=11s timeout=20s role=Stopped

    meta migration-threshold=2 failure-timeout=30s is-managed=false
clone CLONE-percona mysql-percona \
    meta clone-max=2 clone-node-max=1 is-managed=false
location clone-percona-placement CLONE-percona \
    rule $id=clone-percona-placement-rule -inf: #uname ne NODE1 and #uname ne NODE2
colocation APP-dev2-private-percona-withip inf: IP CLONE-percona

Test: I start with both Percona XtraDB machines running:

IP-dev2-privatevip1 (ocf::heartbeat:IPaddr2): Started NODE2
Clone Set: CLONE-percona [mysql-percona] (unmanaged)
    mysql-percona:0 (lsb:mysql): Started NODE1 (unmanaged)
    mysql-percona:1 (lsb:mysql): Started NODE2 (unmanaged)

shell# /etc/init.d/mysql stop (on NODE2)

... Pacemaker reacts as expected:

IP-dev2-privatevip1 (ocf::heartbeat:IPaddr2): Started NODE1
Clone Set: CLONE-percona [mysql-percona] (unmanaged)
    mysql-percona:0 (lsb:mysql): Started NODE1 (unmanaged)
    mysql-percona:1 (lsb:mysql): Started NODE2 (unmanaged) FAILED

.. then I wait .. after some time (1 min), the resource is shown as running ...
IP-dev2-privatevip1 (ocf::heartbeat:IPaddr2): Started NODE1
Clone Set: CLONE-percona [mysql-percona] (unmanaged)
    mysql-percona:0 (lsb:mysql): Started NODE1 (unmanaged)
    mysql-percona:1 (lsb:mysql): Started NODE2 (unmanaged)

But it is definitely not running:

shell# /etc/init.d/mysql status
MySQL (Percona XtraDB Cluster) is not running [FEHLGESCHLAGEN]

When I run a probe (crm resource reprobe) it switches to:

IP-dev2-privatevip1 (ocf::heartbeat:IPaddr2): Started NODE1
Clone Set: CLONE-percona [mysql-percona] (unmanaged)
    mysql-percona:0 (lsb:mysql): Started NODE1 (unmanaged)
    Stopped: [ mysql-percona:1 ]

Then when I start it again (/etc/init.d/mysql start on NODE2), it stays this way:

IP-dev2-privatevip1 (ocf::heartbeat:IPaddr2): Started NODE1
Clone Set: CLONE-percona [mysql-percona] (unmanaged)
    mysql-percona:0 (lsb:mysql): Started NODE1 (unmanaged)
    Stopped: [ mysql-percona:1 ]

Only a manual reprobe helps:

IP-dev2-privatevip1 (ocf::heartbeat:IPaddr2): Started NODE1
Clone Set: CLONE-percona [mysql-percona] (unmanaged)
    mysql-percona:0 (lsb:mysql): Started NODE1 (unmanaged)
    mysql-percona:1 (lsb:mysql): Started NODE2 (unmanaged)

The same thing happens when I reboot NODE2 (or the other way around).

---

I would expect that crm_mon ALWAYS reflects the local state; however, this looks like a bug to me.

crm_mon reflects what is in the cib. If no-one re-populates the cib with the current state of the world, what it shows will be stale.
Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10
On Mon, Sep 09, 2013 at 01:41:17PM +0200, Andreas Mock wrote:

Hi Lars,

here also my official "Thank you very much" for looking at the problem. I've been looking forward to the official release of drbd 8.4.4. Or do you need disoriented rc testers like me? ;-)

Why not? That's what release candidates are intended for. You'd only have to confirm that it works for you now. Respectively, that it still does not, in which case you'd better report it now than after the release, right?
Re: [Pacemaker] [Linux-HA] Probably a regression of the linbit drbd agent between pacemaker 1.1.8 and 1.1.10
On Mon, Sep 09, 2013 at 02:42:45PM +1000, Andrew Beekhof wrote:

On 06/09/2013, at 5:51 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote:

On Tue, Aug 27, 2013 at 06:51:45AM +0200, Andreas Mock wrote:

Hi Andrew,

as this is a real showstopper at the moment, I invested some more hours to be sure (as far as possible) of not having made an error. Some additions:

1) I mirrored the whole mini drbd config to another pacemaker cluster. Same result: pacemaker 1.1.8 works, pacemaker 1.1.10 does not.

2) When I remove the target role Stopped from the drbd ms resource and insert the config snippet related to the drbd device via 'crm -f file' into a lean running pacemaker config (pacemaker cluster options, stonith resources), it seems to work. That means one of the nodes gets promoted. Then after stopping ('crm resource stop ms_drbd_xxx') and starting again, I see the same promotion error as described.

The drbd resource agent uses /usr/sbin/crm_master. Is there a possibility that feedback given through this client tool changes the timing behaviour of pacemaker? Or the way transitions are scheduled? Any idea that may be related to a change in pacemaker?

I think that recent pacemaker allows for start and promote in the same transition.

At least in the one case I saw logs of, this wasn't the case. The PE computed:

Current cluster status:
Online: [ db05 db06 ]
r_stonith-db05 (stonith:fence_imm): Started db06
r_stonith-db06 (stonith:fence_imm): Started db05
Master/Slave Set: ms_drbd_fodb [r_drbd_fodb]
    Slaves: [ db05 db06 ]
Master/Slave Set: ms_drbd_fodblog [r_drbd_fodblog]
    Slaves: [ db05 db06 ]

Transition Summary:
* Promote r_drbd_fodb:0    (Slave -> Master db05)
* Promote r_drbd_fodblog:0 (Slave -> Master db05)

and it was the promotion of r_drbd_fodb:0 that failed.

Right. Off-list communication revealed that DRBD came up as Consistent only, which is a normal and expected state when using resource-level fencing. The promotion attempt then raced with the connection handshake.
The DRBD fence-peer handler is run (because the data is only Consistent, not UpToDate) and returns successfully, but due to that race, the result is ignored; DRBD stays only Consistent, which is not good enough to be promoted (it needs access to UpToDate data). Once the handshake is done, that also results in access to good data, which is why the next promotion attempt succeeds.

Something in the timing of pacemaker actions has changed between the affected and unaffected versions. Apparently before there was enough time to do the connection handshake before the promote request was made.

This race is fixed with DRBD 8.3.16 and 8.4.4 (currently rc1).

You can avoid the race by not allowing Pacemaker to promote if DRBD is only Consistent. Pacemaker will only attempt promotion if there is a positive master score for the resource. The ocf:linbit:drbd RA hardcodes the master score for Consistent to 5. So you may edit the RA and instead remove the master score for the only-Consistent case. (The above-mentioned fixed DRBD versions also introduce a new adjust_master_score parameter, so this becomes configurable.)

Or you can add a location constraint like this:

location no-master-if-only-consistent ms_drbd_XY \
    rule $role=Master -10: defined #uname

where "defined #uname" is a funny way to express "true", as in: this constraint reduces the resulting master score by 10, always, anywhere. If you have other $role=Master constraints, you may need to play with the scores to achieve the desired outcome.

I suspect you would not be able to reproduce with:

crm resource stop ms_drbd
crm resource demote ms_drbd   (will only make drbd Secondary stuff)
... meanwhile, DRBD will establish the connection ...
crm resource promote ms_drbd  (will then promote one node)

By first allowing DRBD to do the handshake in Secondary/Secondary, and only later allowing it to promote, this sequence also avoids the race.
Cheers,
Lars
Re: [Pacemaker] ha_logd and logfile rotation
On Tue, Aug 06, 2013 at 12:26:02PM -0600, Dan Urist wrote:

I'm trying to use heartbeat with ha_logd, but I can't find any documentation for the proper way to handle log file rotation when using ha_logd. The docs at http://linux-ha.org/wiki/Ha.cf state: "If the logging daemon is used, all log messages will be sent through IPC to the logging daemon, which then writes them into log files. In case the logging daemon dies (for whatever reason), a warning message will be logged and all messages will be written to log files directly." So it's not possible to stop ha_logd, rotate the log files, and then restart it. How can I rotate log files without restarting heartbeat?

If you logrotate with delaycompress (or whatever that is called), it should just notice itself and reopen.

Also, logd is supposed to handle SIGHUP by re-opening the log files. If it does not do that for you, upgrade. If it still does not do that, complain again ;-)
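Combining both points, a logrotate stanza of roughly this shape should work (a sketch — the log file paths and the logd pidfile location are assumptions; adjust them to your installation):

```
/var/log/ha-log /var/log/ha-debug {
        weekly
        rotate 4
        compress
        delaycompress
        missingok
        postrotate
                # logd re-opens its log files on SIGHUP
                # (pidfile path is an assumed example)
                kill -HUP "$(cat /var/run/logd.pid)" 2>/dev/null || true
        endscript
}
```

With delaycompress, the previous log stays uncompressed for one cycle, so a writer that still holds the old file descriptor does not corrupt a compressed file before the SIGHUP takes effect.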
Re: [Pacemaker] Multi-node resource dependency
On Fri, Jul 19, 2013 at 04:49:21PM +0200, Tomcsányi, Domonkos wrote:

Hello everyone,

I have been struggling with this issue for quite some time, so I decided to ask you to see if maybe you can shed some light on the problem. So here is the deal: I have a 4-node cluster, in which the nodes belong together in pairs. In ASCII art it would look like this:

 --------      --------
| NODE 1 |----| NODE 2 |
 --------      --------
    |             |
 --------      --------
| NODE 3 |----| NODE 4 |
 --------      --------

Now the behaviour I would like to achieve: If NODE 1 goes offline, its services should get migrated to NODE 2 AND NODE 3's services should get migrated to NODE 4. If NODE 3 goes offline, its services should get migrated to NODE 4 AND NODE 1's services should get migrated to NODE 2. Of course the same should happen vice versa with NODE 2 and NODE 4. The services of NODE 1 and 2 are naturally the same, but they differ from NODE 3's and 4's services, so I added some 'location' directives to the config so the services can only be started on the right nodes. I tried 'colocation', which is great, but not for this kind of behaviour: if I colocate both resource groups of NODE 1 and 3, only one of them starts (of course, because colocation means the resources/resource groups should be running on the same NODE, so my location directives kick in and prevent, for example, NODE 3's services from starting on NODE 1). So my question is: is it possible to define such behaviour as described above in Pacemaker? If yes, how?

You may use node attributes in colocation constraints.

So you would give your nodes attributes first:

crm node attribute NODE1 set color pink
crm node attribute NODE3 set color pink
crm node attribute NODE2 set color slime
crm node attribute NODE4 set color slime

crm configure colocation c-by-color inf: rsc_a rsc_b rsc_c node-attribute=color

The implicit default node-attribute is #uname ... so using "color", the resources only need to run on nodes with the same value for the node attribute "color".
Lars
Re: [Pacemaker] crond on both nodes (active/passive) but some jobs on active only
On Fri, Jul 05, 2013 at 04:52:35PM +0200, andreas graeper wrote:

when i write a script handled by ocf:heartbeat:anything, i.e. one that signals the cron daemon to reload crontabs when the crontab file is enabled by symlink:start and disabled by symlink:stop, how can i achieve that the script runs after symlink :start and :stop? when i define an order-constraint "R1 then R2", does this implicitly mean R1:start, R2:start and R2:stop, R1:stop?

Not an answer to that specific question, rather a "why even bother" suggestion:

You say: two nodes active/passive, and fetchmail as a cronjob shall run on the active node only.

How do you know the node is active? Maybe some specific file system is mounted? Great. You have files and directories which are only visible on an active node. Why not prefix your cron job lines with

test -e /this/file/only/visible/on/active || exit 0; real cron command follows

or

cd /some/dir/only/on/active || exit 0; real cron command

or a wrapper, if that looks too ugly:

only-on-active real cron command

/bin/only-on-active:
#!/bin/sh
same-active-test-as-above || exit 0
"$@"   # do the real cron command

Lars

2013/7/5 andreas graeper agrae...@googlemail.com

hi, two nodes active/passive and fetchmail as cronjob shall run on the active node only. i use ocf:heartbeat:symlink to move / rename /etc/cron.d/jobs to /etc/cron.d/jobs.disable. i read somewhere that crond ignores files with a dot. but new experience: crond needs to be restarted or signalled. how is this done best within pacemaker? is clone for me? thanks in advance, andreas
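Fleshed out, such a wrapper might look like this (a sketch — the marker path is an assumption; use any path that only exists while the node is active, e.g. a file on the cluster-mounted filesystem):

```sh
#!/bin/sh
# only_on_active <cmd...>: run <cmd> only when this node is "active",
# approximated here by the existence of a marker path.
# Assumption: the marker lives on a filesystem mounted only on the
# active node, so it simply is not visible on the passive one.
only_on_active() {
    marker=${ONLY_ON_ACTIVE_MARKER:-/srv/cluster/.active}
    [ -e "$marker" ] || return 0   # passive node: silently do nothing
    "$@"                           # active node: run the real cron command
}
```

Installed as a standalone script, a crontab entry on both nodes could then read `*/5 * * * * user only-on-active fetchmail ...`, and only the active node actually fetches.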
Re: [Pacemaker] drbd on passive node not started
On Fri, Jun 21, 2013 at 03:36:39PM +0200, andreas graeper wrote: hi, n1 active node is started and everything works fine, but after reboot n2 drbd is not started by pacemaker. when i start drbd manually, crm_mon shows it as slave ( as if there were no problems). maybe someone experienced can have a look into logs ? The logs you provide clearly show that pacemaker *did* start DRBD, and successfully. Wrong timeframe? Lars -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Pacemaker] Monitor and standby
On Wed, Jul 03, 2013 at 12:21:37PM +0200, Denis Witt wrote: Hi List, we have a two node cluster (test1-node1, test1-node2) with an additional quorum node (test1). On all nodes MySQL is running. test1-node1 and test1-node2 sharing the MySQL-Database via DRBD, so only one Node should run MySQL. On test1 there is a MySQL-Slave connected to test1-node1/test1-node2. test1 is always in Standby-Mode. The problem is now that the MySQL-Slave on test1 is shut down by crmd: Jul 3 12:05:12 test2 crmd: [5945]: info: te_rsc_command: Initiating action 22: monitor p_mysql_monitor_0 on test2 (local) Jul 3 12:05:14 test2 pengine: [5944]: ERROR: native_create_actions: Resource p_mysql (lsb::mysql) is active on 2 nodes attempting recovery There. init script status action told pacemaker that mysql was running on both nodes, pacemaker was told it should run only once. pacemaker recovers by stopping both and starting one. Jul 3 12:05:14 test2 pengine: [5944]: notice: LogActions: Restart p_mysql#011(Started test2-node1) Jul 3 12:05:15 test2 crmd: [5945]: info: te_rsc_command: Initiating action 54: stop p_mysql_stop_0 on test2 (local) From my understanding this shouldn't happen as test1 was set to standby before: Jul 3 12:04:48 test2 cib: [5940]: info: cib:diff: + nvpair id=nodes-test2-standby name=standby value=on / How could we solve this? use the mysql RA with proper parameters, so it won't get confused by a different instance of mysql. Or fix the init script status action to be able to distinguish between the cluster mysql instance, and your other mysql instance. Note that a pacemaker node in standby is supposed to not run any resources, so if it notices that DRBD is running there (in Secondary), it will stop it, too. Maybe you and pacemaker disagree about the meaning of standby? -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. 
Re: [Pacemaker] Monitor and standby
On Wed, Jul 03, 2013 at 12:56:35PM +0200, Denis Witt wrote: On Wed, 3 Jul 2013 12:35:34 +0200 Lars Ellenberg lars.ellenb...@linbit.com wrote: Maybe you and pacemaker disagree about the meaning of standby? Hi Lars, obviously, yes. My understanding was that a standby node just adds its vote for quorum but isn't monitored at all. Thanks for clarifying this. We solved it by renaming the Init-Script from mysql to mysqlslave on this node. Now the monitor complains that mysql isn't installed, but we can live with that. What purpose, exactly, is pacemaker supposed to serve in your setup? Why are you using pacemaker at all, if you intend to do everything manually anyways? Lars
Re: [Pacemaker] Fixed! - Re: Problem with dual-PDU fencing node with redundant PSUs
On Fri, Jun 28, 2013 at 07:27:19PM -0400, Digimer wrote: On 06/28/2013 07:22 PM, Andrew Beekhof wrote: On 29/06/2013, at 12:22 AM, Digimer li...@alteeve.ca wrote: On 06/28/2013 06:21 AM, Andrew Beekhof wrote: On 28/06/2013, at 5:22 PM, Lars Marowsky-Bree l...@suse.com wrote: On 2013-06-27T12:53:01, Digimer li...@alteeve.ca wrote: primitive fence_n01_psu1_off stonith:fence_apc_snmp \ params ipaddr=an-p01 pcmk_reboot_action=off port=1 pcmk_host_list=an-c03n01.alteeve.ca primitive fence_n01_psu1_on stonith:fence_apc_snmp \ params ipaddr=an-p01 pcmk_reboot_action=on port=1 pcmk_host_list=an-c03n01.alteeve.ca So every device twice, including location constraints? I see potential for optimization by improving how the fence code handles this ... That's abhorrently complex. (And I'm not sure the 'action' parameter ought to be overwritten.) I'm not crazy about it either because it means the device is tied to a specific command. But it seems to be something all the RHCS people try to do... Maybe something in the rhcs water cooler made us all mad... ;) Glad you got it working, though. location loc_fence_n01_ipmi fence_n01_ipmi -inf: an-c03n01.alteeve.ca [...] I'm not sure you need any of these location constraints, by the way. Did you test if it works without them? Again, this is after just one test. I will want to test it several more times before I consider it reliable. Ideally, I would love to hear Andrew or others confirm this looks sane/correct. It looks correct, but not quite sane. ;-) That seems not to be something you can address, though. I'm thinking that fencing topology should be smart enough to, if multiple fencing devices are specified, to know how to expand them to first all off (if off fails anywhere, it's a failure), then all on (if on fails, it is not a failure). That'd greatly simplify the syntax. The RH agents have apparently already been updated to support multiple ports. I'm really not keen on having the stonith-ng doing this. 
This doesn't help people who have dual power rails/PDUs for power redundancy. I'm yet to be convinced that having two PDUs is helping those people in the first place. If it were actually useful, I suspect more than two/three people would have asked for it in the last decade. Step 1. Use one PDU Step 2. Kill PDU Your node is dead and cannot be fenced. I have multiple independent cluster communication channels. I don't see the node on either of them, I cannot reach its IPMI or equivalent, I cannot reach its PDU I'd argue that a failure mode where all of the above was true, and that node would still be alive, is sufficiently unlikely to just conclude that it is in fact dead. Rather that than a fencing method that returns yes, I rebooted that node when in fact that node did not even notice... Using two separate UPSes and two separate PDUs to feed either PSU in each node (and either switch in a two-switch configuration with bonded network links) means that you can lose a power rail and not have an interruption. I can't say why it's not a more common configuration, but I can say that I do not see another way to provide redundant power. For me, an HA cluster is not truly HA until all single points of failure have been removed. If I do have two independent UPSes and PDUs and PSUs (yes, that is a common setup), and I want a second fencing method as a fallback from IPMI, then yes, it would be nice to have some clean and easy way to tell pacemaker to do that. But not having that fallback fencing method does not introduce a SPOF. Both mainboard (or kernel or resource stop failure or whatever) and BMC would have to fail at the same time for the cluster to block...
-- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
Re: [Pacemaker] owership of created symlink
On Tue, Jun 04, 2013 at 07:15:11PM +0200, andreas graeper wrote: hi, i tried, before starting dovecot+exim+fetchmail to create a symlink /var/mail - /mnt/mirror/var/mail with ra ocf:heartbeat:symlink i changed target : chmod 0775 chown root.mail but i need write permission to /var/mail cause exim wants to create a lock file i tried to manually chown -h root.mail /var/mail and link is now 777 root.mail Ownership and permissions of the link do not matter at all. Same for the mount point. Ownership and permissions of the directory matter. once mounted, do chown / chmod on /mnt/mirror/var/mail/. Also make sure the uid/gid is the same on all nodes. but old problem euid=5xx egid=8 (mail) can not create lock file /var/mail/.lock please help. andreas -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Pacemaker] crm subshell 1.2.4 incompatible to pacemaker 1.1.9?
On Wed, May 15, 2013 at 03:34:14PM +0200, Dejan Muhamedagic wrote: On Tue, May 14, 2013 at 10:03:59PM +0200, Lars Ellenberg wrote: On Tue, May 14, 2013 at 09:59:50PM +0200, Lars Ellenberg wrote: On Mon, May 13, 2013 at 01:53:11PM +0200, Michael Schwartzkopff wrote: Hi, crm tells me it is version 1.2.4 pacemaker tells me it is version 1.1.9 So it should work since incompatibilities are resolved in crm higher than version 1.2.1. Anyways crm tells me nonsense: # crm crm(live)# node crm(live)node# standby node1 ERROR: bad lifetime: node1 Your node is not named node1. check: crm node list Maybe a typo, maybe some case-is-significant nonsense, maybe you just forgot to use the fqdn. maybe the check for is this a known node name is (now) broken? standby with just one argument checks if that argument happens to be a known node name, and assumes that if it is not, it has to be a lifetime, and the current node is used as node name... Maybe we should invert that logic, and instead compare the single argument against allowed lifetime values (reboot, forever), and assume it is supposed to be a node name otherwise? Then the error would become ERROR: unknown node name: node1 Which is probably more useful most of the time. Dejan? Something like this maybe: diff --git a/modules/ui.py.in b/modules/ui.py.in --- a/modules/ui.py.in +++ b/modules/ui.py.in @@ -1185,7 +1185,7 @@ class NodeMgmt(UserInterface): if not args: node = vars.this_node if len(args) == 1: -if not args[0] in listnodes(): +if args[0] in ("reboot", "forever"): Yes, I wanted to look at it again. Another complication is that the lifetime can be just about anything in that date ISO format.
That may well be, but right now those would be rejected by crmsh anyways: if lifetime not in (None, "reboot", "forever"): common_err("bad lifetime: %s" % lifetime) return False -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
Re: [Pacemaker] crm subshell 1.2.4 incompatible to pacemaker 1.1.9?
On Mon, May 13, 2013 at 01:53:11PM +0200, Michael Schwartzkopff wrote: Hi, crm tells me it is version 1.2.4 pacemaker tells me it is version 1.1.9 So it should work since incompatibilities are resolved in crm higher than version 1.2.1. Anyways crm tells me nonsense: # crm crm(live)# node crm(live)node# standby node1 ERROR: bad lifetime: node1 Your node is not named node1. check: crm node list Maybe a typo, maybe some case-is-significant nonsense, maybe you just forgot to use the fqdn. maybe the check for is this a known node name is (now) broken? standby with just one argument checks if that argument happens to be a known node name, and assumes that if it is not, it has to be a lifetime, and the current node is used as node name... Maybe we should invert that logic, and instead compare the single argument against allowed lifetime values (reboot, forever), and assume it is supposed to be a node name otherwise? Then the error would become ERROR: unknown node name: node1 Which is probably more useful most of the time. Dejan? -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
Re: [Pacemaker] crm subshell 1.2.4 incompatible to pacemaker 1.1.9?
On Tue, May 14, 2013 at 09:59:50PM +0200, Lars Ellenberg wrote: On Mon, May 13, 2013 at 01:53:11PM +0200, Michael Schwartzkopff wrote: Hi, crm tells me it is version 1.2.4 pacemaker tells me it is version 1.1.9 So it should work since incompatibilities are resolved in crm higher than version 1.2.1. Anyways crm tells me nonsense: # crm crm(live)# node crm(live)node# standby node1 ERROR: bad lifetime: node1 Your node is not named node1. check: crm node list Maybe a typo, maybe some case-is-significant nonsense, maybe you just forgot to use the fqdn. maybe the check for is this a known node name is (now) broken? standby with just one argument checks if that argument happens to be a known node name, and assumes that if it is not, it has to be a lifetime, and the current node is used as node name... Maybe we should invert that logic, and instead compare the single argument against allowed lifetime values (reboot, forever), and assume it is supposed to be a node name otherwise? Then the error would become ERROR: unknown node name: node1 Which is probably more useful most of the time. Dejan? Something like this maybe: diff --git a/modules/ui.py.in b/modules/ui.py.in --- a/modules/ui.py.in +++ b/modules/ui.py.in @@ -1185,7 +1185,7 @@ class NodeMgmt(UserInterface): if not args: node = vars.this_node if len(args) == 1: -if not args[0] in listnodes(): +if args[0] in ("reboot", "forever"): node = vars.this_node lifetime = args[0] else: -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
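The inverted check proposed in that diff — treat the lone argument as a lifetime only when it is one of the two allowed values, and as a node name otherwise — can be illustrated in shell (the real change is in crmsh's Python, so this is just a sketch of the decision logic):

```shell
# Sketch only: a single argument to "standby" is a lifetime iff it is
# "reboot" or "forever"; anything else is taken as a node name, so an
# unknown node produces a node-name error instead of "bad lifetime".
parse_standby_arg() {
    case "$1" in
        reboot|forever) echo "lifetime=$1 node=<current node>" ;;
        *)              echo "lifetime=default node=$1" ;;
    esac
}

parse_standby_arg forever
parse_standby_arg node1
```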
Re: [Pacemaker] Exchanging data between resource agent instances
On Tue, Mar 19, 2013 at 08:22:39AM +0100, Riccardo Bicelli wrote: Because I'm trying to set up an active/standby scsi cluster using alua. I need to create a dummy device in the same size of the real device. Is that so. What for? Can you explain in more detail? For getting dev size I use blockdev --getsize64 device_name The problem is, when I'm using DRBD, that blockdev fails on slave device. Well, then use awk '/ drbd0$/ { print $3 * 1024 }' /proc/partitions No? -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
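The one-liner works because /proc/partitions is readable regardless of the DRBD role, unlike blockdev --getsize64 on a Secondary. Field 3 is the size in 1 KiB blocks, field 4 the device name. A field-match variant, run here against canned sample input since no drbd0 may exist on the machine at hand:

```shell
# /proc/partitions format: major minor #blocks name (sizes in 1 KiB blocks).
# Matching field 4 exactly avoids false hits from the trailing-regex form.
sample='major minor  #blocks  name

 147        0    1048576 drbd0
   8        0  976762584 sda'

printf '%s\n' "$sample" | awk '$4 == "drbd0" { print $3 * 1024 }'
# prints 1073741824 (1048576 KiB = 1 GiB in bytes)
```

Against the real file, replace the printf pipeline with `awk '$4 == "drbd0" ...' /proc/partitions`.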
Re: [Pacemaker] Exchanging data between resource agent instances
On Mon, Mar 18, 2013 at 08:49:41PM +0100, Riccardo Bicelli wrote: Hello, anyone knows if is it possible to exchange data between two instances of a resource agent? I have a Master/Slave resource agent that, when slave, has to create a dummy device in same size of a given block device (DRBD) running on Master. Why? What do you want to achieve? Since the block device is not accessible when the resource is slave, I was wondering if master could read size of device and report it to the slave. does cat /proc/partitions help? I don't like the idea of putting that size in the cib. -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
Re: [Pacemaker] Pacemaker DRBD as Physical Volume on Encrypted RAID1
On Mon, Mar 04, 2013 at 04:27:24PM -0500, senrab...@aol.com wrote: Hi All: We're new to pacemaker (just got some great help from this forum getting it working with LVM as backing device), and would like to explore the Physical Volume option. We're trying to configure on top of an existing Encrypted RAID1 setup and employ LVM. NOTE: our goal is to run many virtual servers, each in its own logical volume, and it looks like putting LVM on top of the DRBD would allow us to add logical volumes on the fly, but also have a simpler setup with one drbd device for all the logical volumes and one related pacemaker config. Hence, exploring DRBD as a physical volume. A single DRBD has a single activity log; running many virtual servers from there will very likely cause the worst possible workload (many totally random writes). You really want to use DRBD 8.4.3, see https://blogs.linbit.com/p/469/843-random-writes-faster/ for why. Q: For pacemaker to work, how do we do the DRBD disk/device mapping in the drbd.conf file? And should we set things up and encrypt last, or can we apply DRBD and Pacemaker to an existing Encrypted RAID1 setup? Neither Pacemaker nor DRBD particularly cares. If you want to stack the encryption layer on top of DRBD, fine. (you'd probably need to teach some pacemaker resource agent to start the encryption layer). If you want to stack DRBD on top of the encryption layer, just as fine. Unless you provide the decryption key in plaintext somewhere, failover will likely be easier to automate if you have DRBD on top of encryption, so if you want the real device encrypted, I'd recommend to put encryption below DRBD. Obviously, the DRBD replication traffic will still be plaintext in that case. The examples we've seen show mapping between the drbd device and a physical disk (e.g., sdb) in the drbd.conf, and then pvcreate /dev/drbdnum and creating a volume group and logical volume on the drbd device.
So for this type of setup, drbd.conf might look like: device /dev/drbd1; disk /dev/sdb; address xx.xx.xx.xx:7789; meta-disk internal; In our case, because we have an existing RAID1 (md2) and it's encrypted (md2_crypt or /dev/dm-7 ... we're unsure which partition actually has the data), any thoughts on how to do the DRBD mapping? E.g., device /dev/drbd1 minor 1; disk /dev/???; address xx.xx.xx.xx:7789; meta-disk internal; I.e., what goes in the disk /dev/???; line? Would it be disk /dev/md2_crypt;? Yes. And can we do our setup on an existing Encrypted RAID1 setup Yes. (if we do pvcreate on drbd1, we get errors)? Huh? -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
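Putting Lars's answer together, a hedged sketch of the resource section. The resource name, hostnames, addresses, and the exact device-mapper path are assumptions — check `ls /dev/mapper/` for the real name of the opened crypt device on your system:

```
resource r0 {
    device    /dev/drbd1 minor 1;
    meta-disk internal;
    on nodeA {
        disk    /dev/mapper/md2_crypt;   # the *decrypted* dm device, not /dev/md2
        address 10.0.0.1:7789;
    }
    on nodeB {
        disk    /dev/mapper/md2_crypt;
        address 10.0.0.2:7789;
    }
}
```

With internal meta-data, DRBD reserves space at the end of the backing device, which is why running pvcreate on the raw crypt device first (instead of on /dev/drbd1 after DRBD is up) leads to errors.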
Re: [Pacemaker] Trouble with ocf:Squid resource agent
On Mon, Aug 13, 2012 at 02:07:46PM +0200, Dejan Muhamedagic wrote: Hi, On Mon, Jul 30, 2012 at 12:09:10PM -0400, Jake Smith wrote: - Original Message - From: Julien Cornuwel cornu...@gmail.com To: pacemaker@oss.clusterlabs.org Sent: Wednesday, July 25, 2012 5:51:28 AM Subject: Re: [Pacemaker] Trouble with ocf:Squid resource agent Oops! Spoke too fast. The fix below allows squid to start. But the script also has problems in the 'stop' part. It is stuck in an infinite loop and here are the logs (repeats every second) : Jul 25 11:38:47 corsen-a lrmd: [24099]: info: RA output: (Proxy:stop:stderr) /usr/lib/ocf/resource.d//heartbeat/Squid: line 320: kill: -: arguments must be process or job IDs Jul 25 11:38:47 corsen-a lrmd: [24099]: info: RA output: (Proxy:stop:stderr) /usr/lib/ocf/resource.d//heartbeat/Squid: line 320: kill: -: arguments must be process or job IDs Jul 25 11:38:48 corsen-a Squid(Proxy)[24659]: [25682]: INFO: squid:stop_squid:318: try to stop by SIGKILL: - Jul 25 11:38:48 corsen-a Squid(Proxy)[24659]: [25682]: INFO: squid:stop_squid:318: try to stop by SIGKILL: - Being on a deadline, I'll use the lsb script for the moment. If someone figures out how to use this ocf script, I'm very interrested. Did you try to use the current version of the script? It very much looks like you miss out on this fix: commit cbf70945f162aa296dacfc07817f1764a76e412e Author: Dejan Muhamedagic de...@suse.de Date: Mon Oct 1 12:43:29 2012 +0200 Medium: Squid: fix getting PIDs of squid processes (lf#2653) See https://github.com/ClusterLabs/resource-agents/commit/cbf70945f162aa296dacfc07817f1764a76e412e (and some other fixes that come later!) Fixed! The problem comes from the squid ocf script (/usr/lib/ocf/resource.d/heartbeat/Squid) that doesn't handle IPv6 addresses correctly. 
All you have to do is modify the line 198 as such : awk '/(tcp.*[0-9]+\.[0-9]+\.+[0-9]+\.[0-9]+:'$SQUID_PORT' |tcp.*:::'$SQUID_PORT' )/{ This is supposed to be fixed as well in the current version of that script... Yes. If somebody opens a bugzilla at LF (https://developerbugs.linuxfoundation.org/) or an issue at https://github.com/ClusterLabs/resource-agents somebody (hopefully the author) will take care of it. As I wrote, I think both of these are already fixed. Please use resource-agents v3.9.5. Lars -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
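The IPv6-aware match can be exercised against canned netstat-style output. The sample lines and port are made up, and the duplicated `\.+` in the regex as quoted above looks like a transcription artifact — the usual dotted-quad form is used here:

```shell
# Count listeners on the squid port, matching both IPv4 (a.b.c.d:PORT)
# and IPv6 wildcard (:::PORT) sockets, as in the fixed Squid RA.
SQUID_PORT=3128
netstat_sample='tcp        0      0 0.0.0.0:3128   0.0.0.0:*   LISTEN
tcp        0      0 :::3128        :::*        LISTEN
tcp        0      0 0.0.0.0:22     0.0.0.0:*   LISTEN'

printf '%s\n' "$netstat_sample" |
awk '/(tcp.*[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:'"$SQUID_PORT"' |tcp.*:::'"$SQUID_PORT"' )/ { n++ } END { print n+0 }'
# prints 2 (the ssh line on port 22 is not counted)
```

The trailing space after the port in both alternatives is what keeps port 3128 from also matching 31280.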
Re: [Pacemaker] Trouble with ocf:Squid resource agent
On Fri, Feb 08, 2013 at 11:21:15AM +0100, Lars Ellenberg wrote: On Mon, Aug 13, 2012 at 02:07:46PM +0200, Dejan Muhamedagic wrote: Apologies, I did not look at the date of the post. For some reason it appeared as first unread, and I assumed it was recent. D'oh. :-) Please use resource-agents v3.9.5. Lars
Re: [Pacemaker] unable to load drbd module
On Thu, Aug 30, 2012 at 10:09:09AM -0700, ROBERTO GUERERO wrote: hi guys, I was following the pacemaker 1.1 pdf to setup HA on my Centos 6.3 all went ok until i reached storage with drbd, after following the instruction and start to modprobe drbd error showed up and says cannot allocate memory. Kindly advise on how to fix this issue. You likely are using a drbd module compiled against RHEL 6.2 kernel headers on a RHEL 6.3 kernel, and you are 32bit. Does not work, but be happy that you did not try it the other way around: trying to modprobe a drbd compiled against 6.3 headers on a 32bit 6.2 kernel will panic the box... They pretend to have a stable kABI, but still they break occasionally. At least they try harder to keep that kABI stable within a sub-release. Sorry, but there is not much we can do. Please install a kmod-drbd that is matching your kernel, or compile yourself against matching kernel headers. -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Pacemaker] drbd under pacemaker - always get split brain
On Wed, Jul 11, 2012 at 11:38:52AM +0200, Nikola Ciprich wrote: Well, I'd expect that to be safer than your current configuration ... discard-zero-changes will never overwrite data automatically have you tried adding the start-delay to DRBD start operation? I'm curious if that is already sufficient for your problem. Hi, tried <op id="drbd-sas0-start-0" interval="0" name="start" start-delay="10s" timeout="240s"/> (I hope it's the setting you meant, although I'm not sure, I haven't found any documentation on the start-delay option) but didn't help.. Of course not. Your problem is this: DRBD config: allow-two-primaries, but *NO* fencing policy, and *NO* fencing handler. And, as if that was not bad enough already, Pacemaker config: no-quorum-policy=ignore \ stonith-enabled=false D'oh. And then, well, your nodes come up some minute+ after each other, and Pacemaker and DRBD behave exactly as configured: Jul 10 06:00:12 vmnci20 crmd: [3569]: info: do_state_transition: All 1 cluster nodes are eligible to run resources. Note the *1* ... So it starts: Jul 10 06:00:12 vmnci20 pengine: [3568]: notice: LogActions: Start drbd-sas0:0(vmnci20) But leaves: Jul 10 06:00:12 vmnci20 pengine: [3568]: notice: LogActions: Leave drbd-sas0:1(Stopped) as there is no peer node yet. And on the next iteration, we still have only one node: Jul 10 06:00:15 vmnci20 crmd: [3569]: info: do_state_transition: All 1 cluster nodes are eligible to run resources. So we promote: Jul 10 06:00:15 vmnci20 pengine: [3568]: notice: LogActions: Promote drbd-sas0:0(Slave - Master vmnci20) And only some minute later, the peer node joins: Jul 10 06:01:33 vmnci20 crmd: [3569]: info: do_state_transition: State transition S_INTEGRATION - S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ] Jul 10 06:01:33 vmnci20 crmd: [3569]: info: do_state_transition: All 2 cluster nodes responded to the join offer.
So now we can start the peer: Jul 10 06:01:33 vmnci20 pengine: [3568]: notice: LogActions: Leave drbd-sas0:0(Master vmnci20) Jul 10 06:01:33 vmnci20 pengine: [3568]: notice: LogActions: Start drbd-sas0:1(vmnci21) And it even is promoted right away: Jul 10 06:01:36 vmnci20 pengine: [3568]: notice: LogActions: Promote drbd-sas0:1(Slave - Master vmnci21) And within those 3 seconds, DRBD was not able to establish the connection yet. You configured DRBD and Pacemaker to produce data divergence. Not surprisingly, that is exactly what you get. Fix your problem. See above; hint: fencing resource-and-stonith, crm-fence-peer.sh + stonith_admin, add stonith, maybe add a third node so you don't need to ignore quorum, ... And all will be well. -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
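The hint above maps to roughly this drbd.conf fragment (8.3/8.4-era syntax). The handler paths are an assumption — distributions commonly install the scripts under /usr/lib/drbd/, but verify on your system:

```
disk {
    fencing resource-and-stonith;
}
handlers {
    # freeze I/O and fence the peer via a Pacemaker location constraint
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    # lift the constraint once resync has completed
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}
```

This only helps together with working stonith in Pacemaker (stonith-enabled=true and a real fencing device) — the fence-peer handler and stonith cover different failure cases, which is why Lars says you need both.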
Re: [Pacemaker] drbd under pacemaker - always get split brain
On Thu, Jul 12, 2012 at 04:23:51PM +0200, Nikola Ciprich wrote: Hello Lars, thanks for your reply.. Your problem is this: DRBD config: allow-two-primaries, but *NO* fencing policy, and *NO* fencing handler. And, as if that was not bad enough already, Pacemaker config: no-quorum-policy=ignore \ stonith-enabled=false yes, I've written it's just a test cluster on virtual machines. therefore no fencing devices. however I don't think it's the whole problem source, I've tried starting node2 much later after node1 (actually node1 had been running for about 1 day), and got right into the same situation.. pacemaker just doesn't wait long enough before the drbds can connect at all and seems to promote them both. it really seems to be a regression to me, as this was always working well... It is not. Pacemaker may just be quicker to promote now, or in your setup other things may have changed which also changed the timing behaviour. But what you are trying to do has always been broken, and will always be broken. even though I've set no-quorum-policy to freeze, the problem returns as soon as the cluster becomes quorate.. I have all split-brain and fencing scripts in drbd disabled intentionally so I had a chance to investigate, otherwise one of the nodes always committed suicide but there should be no reason for split brain.. Right. That's why shooting as in stonith is not good enough a fencing mechanism in a drbd dual-Primary cluster. You also need to tell the peer that it is outdated, respectively must not become Primary or Master until it synced up (or at least, *starts* to sync up). You can do that using crm-fence-peer.sh (it does not actually tell DRBD that it is outdated, but it tells Pacemaker to not promote that other node, which is even better, if the rest of the system is properly set up). crm-fence-peer.sh alone is also not good enough in certain situations. That's why you need both, the drbd fence-peer mechanism *and* stonith. cheers! nik D'oh.
And then, well, your nodes come up some minute+ after each other, and Pacemaker and DRBD behave exactly as configured: Jul 10 06:00:12 vmnci20 crmd: [3569]: info: do_state_transition: All 1 cluster nodes are eligible to run resources. Note the *1* ... So it starts: Jul 10 06:00:12 vmnci20 pengine: [3568]: notice: LogActions: Start drbd-sas0:0(vmnci20) But leaves: Jul 10 06:00:12 vmnci20 pengine: [3568]: notice: LogActions: Leave drbd-sas0:1(Stopped) as there is no peer node yet. And on the next iteration, we still have only one node: Jul 10 06:00:15 vmnci20 crmd: [3569]: info: do_state_transition: All 1 cluster nodes are eligible to run resources. So we promote: Jul 10 06:00:15 vmnci20 pengine: [3568]: notice: LogActions: Promote drbd-sas0:0(Slave - Master vmnci20) And only some minute later, the peer node joins: Jul 10 06:01:33 vmnci20 crmd: [3569]: info: do_state_transition: State transition S_INTEGRATION - S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ] Jul 10 06:01:33 vmnci20 crmd: [3569]: info: do_state_transition: All 2 cluster nodes responded to the join offer. So now we can start the peer: Jul 10 06:01:33 vmnci20 pengine: [3568]: notice: LogActions: Leave drbd-sas0:0(Master vmnci20) Jul 10 06:01:33 vmnci20 pengine: [3568]: notice: LogActions: Start drbd-sas0:1(vmnci21) And it even is promoted right away: Jul 10 06:01:36 vmnci20 pengine: [3568]: notice: LogActions: Promote drbd-sas0:1(Slave - Master vmnci21) And within those 3 seconds, DRBD was not able to establish the connection yet. You configured DRBD and Pacemaker to produce data divergence. Not suprisingly, that is exactly what you get. Fix your Problem. See above; hint: fencing resource-and-stonith, crm-fence-peer.sh + stonith_admin, add stonith, maybe add a third node so you don't need to ignore quorum, ... And all will be well. 
-- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Pacemaker] Two slave nodes, neither will promote to Master
On Mon, Jun 25, 2012 at 04:48:50PM +0100, Regendoerp, Achim wrote: Hi, I'm currently looking at two VMs which are supposed to mount a drive in a given directory, depending on who's the master. This was decided above me, therefore no DRBD stuff (which would've made things easier), but still using corosync/pacemaker to do the cluster work. As it is currently, both nodes are online and configured, but neither is switching to Master. Lacking a DRBD resource, I tried using the Pacemaker Dummy agent. If that's not the correct RA, please enlighten me on this too.

As has been stated already, to simulate a stateful resource use the ocf:pacemaker:Stateful agent. But... iiuc, you are using a shared disk. Why would you want that dummy resource at all? Why not simply:

Below's the current config:

node NODE01 \
    attributes standby=off
node NODE02 \
    attributes standby=off
primitive clusterIP ocf:heartbeat:IPaddr2 \
    params ip=10.64.96.31 nic=eth1:1 \
    op monitor on-fail=restart interval=5s
primitive clusterIParp ocf:heartbeat:SendArp \
    params ip=10.64.96.31 nic=eth1:1
primitive fs_nfs ocf:heartbeat:Filesystem \
    params device=/dev/vg_shared/lv_nfs_01 directory=/shared fstype=ext4 \
    op start interval=0 timeout=240 \
    op stop interval=0 timeout=240 on-fail=restart

delete that:

- primitive ms_dummy ocf:pacemaker:Dummy \
-     op start interval=0 timeout=240 \
-     op stop interval=0 timeout=240 \
-     op monitor interval=15 role=Master timeout=240 \
-     op monitor interval=30 role=Slave on-fail=restart timeout=240

primitive nfs_share ocf:heartbeat:nfsserver \
    params nfs_ip=10.64.96.31 nfs_init_script=/etc/init.d/nfs nfs_shared_infodir=/shared/nfs nfs_notify_cmd=/sbin/rpc.statd \
    op start interval=0 timeout=240 \
    op stop interval=0 timeout=240 on-fail=restart
group Services clusterIP clusterIParp fs_nfs nfs_share \
    meta target-role=Started is-managed=true multiple-active=stop_start

and that:

- ms ms_nfs ms_dummy \
-     meta target-role=Master master-max=1 master-node=1 clone-max=2 clone-node-max=1 notify=true 
and that:

- colocation services_on_master inf: Services ms_nfs:Master
- order fs_before_services inf: ms_nfs:promote Services:start

property $id=cib-bootstrap-options \
    dc-version=1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558 \
    cluster-infrastructure=openais \
    expected-quorum-votes=2 \
    no-quorum-policy=ignore \
    stonith-enabled=false
rsc_defaults $id=rsc-options \
    resource-stickiness=200

That's all you need for a shared disk cluster. Well. Almost. Of course you have to configure, enable, test and use stonith.
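A minimal sketch of what "configure, enable, test and use stonith" could look like in crm syntax; the agent (stonith:external/ipmi), addresses and credentials are all placeholders for whatever fencing hardware the nodes actually have:

```
# hypothetical fencing setup for NODE01/NODE02; every param here is a placeholder
primitive st-node01 stonith:external/ipmi \
    params hostname=NODE01 ipaddr=10.64.96.41 userid=admin passwd=secret interface=lan \
    op monitor interval=60m
primitive st-node02 stonith:external/ipmi \
    params hostname=NODE02 ipaddr=10.64.96.42 userid=admin passwd=secret interface=lan \
    op monitor interval=60m
location st-node01-placement st-node01 -inf: NODE01
location st-node02-placement st-node02 -inf: NODE02
property stonith-enabled=true
```

The location constraints keep each fencing device off the node it is supposed to kill. Test it deliberately (e.g. with stonith_admin or crm node fence) before trusting it in production.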
Re: [Pacemaker] Why Did Pacemaker Restart this VirtualDomain Resource?
On Tue, Jun 19, 2012 at 11:12:46AM -0500, Andrew Martin wrote: Hi Emmanuel, Thanks for the idea. I looked through the rest of the log and these return code 8 errors on the ocf:linbit:drbd resources are occurring at other intervals (e.g. today) when the VirtualDomain resource is unaffected.

No soft error here. Monitor exit code 8 is OCF_RUNNING_MASTER: expected and healthy. Lars

This seems to indicate that these soft errors do not trigger a restart of the VirtualDomain resource. Is there anything else in the log that could indicate what caused this, or is there somewhere else I can look? Thanks, Andrew

- Original Message - From: emmanuel segura emi2f...@gmail.com To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org Sent: Tuesday, June 19, 2012 9:57:19 AM Subject: Re: [Pacemaker] Why Did Pacemaker Restart this VirtualDomain Resource?

I didn't see any error in your config; the only thing I saw is this:

Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: rsc:p_drbd_vmstore:0 monitor[55] (pid 12323)
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount2:0 monitor[53] (pid 12324)
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: operation monitor[55] on p_drbd_vmstore:0 for client 3856: pid 12323 exited with return code 8
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: operation monitor[53] on p_drbd_mount2:0 for client 3856: pid 12324 exited with return code 8
Jun 14 15:35:31 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount1:0 monitor[54] (pid 12396)

It can be a DRBD problem, but to tell you the truth I'm not sure.
Re: [Pacemaker] Why Did Pacemaker Restart this VirtualDomain Resource?
On Tue, Jun 19, 2012 at 09:38:50AM -0500, Andrew Martin wrote: Hello, I have a 3 node Pacemaker+Heartbeat cluster (two real nodes and one standby quorum node) with Ubuntu 10.04 LTS on the nodes and using the Pacemaker+Heartbeat packages from the Ubuntu HA Team PPA ( https://launchpad.net/~ubuntu-ha-maintainers/+archive/ppa ). I have configured 3 DRBD resources, a filesystem mount, and a KVM-based virtual machine (using the VirtualDomain resource). I have constraints in place so that the DRBD devices must become primary and the filesystem must be mounted before the VM can start: location loc_run_on_most_connected g_vm \ rule $id=loc_run_on_most_connected-rule p_ping: defined p_ping This is the rule This has been working well, however last week Pacemaker all of a sudden stopped the p_vm_myvm resource and then started it up again. I have attached the relevant section of /var/log/daemon.log - I am unable to determine what caused Pacemaker to restart this resource. Based on the log, could you tell me what event triggered this? 
Thanks, Andrew

Jun 14 15:25:00 vmhost1 lrmd: [3853]: info: rsc:p_sysadmin_notify:0 monitor[18] (pid 3661)
Jun 14 15:25:00 vmhost1 lrmd: [3853]: info: operation monitor[18] on p_sysadmin_notify:0 for client 3856: pid 3661 exited with return code 0
Jun 14 15:26:42 vmhost1 cib: [3852]: info: cib_stats: Processed 219 operations (182.00us average, 0% utilization) in the last 10min
Jun 14 15:32:43 vmhost1 lrmd: [3853]: info: operation monitor[22] on p_ping:0 for client 3856: pid 10059 exited with return code 0
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: rsc:p_drbd_vmstore:0 monitor[55] (pid 12323)
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount2:0 monitor[53] (pid 12324)
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: operation monitor[55] on p_drbd_vmstore:0 for client 3856: pid 12323 exited with return code 8
Jun 14 15:35:27 vmhost1 lrmd: [3853]: info: operation monitor[53] on p_drbd_mount2:0 for client 3856: pid 12324 exited with return code 8
Jun 14 15:35:31 vmhost1 lrmd: [3853]: info: rsc:p_drbd_mount1:0 monitor[54] (pid 12396)
Jun 14 15:35:31 vmhost1 lrmd: [3853]: info: operation monitor[54] on p_drbd_mount1:0 for client 3856: pid 12396 exited with return code 8
Jun 14 15:36:42 vmhost1 cib: [3852]: info: cib_stats: Processed 220 operations (272.00us average, 0% utilization) in the last 10min
Jun 14 15:37:34 vmhost1 lrmd: [3853]: info: rsc:p_vm_myvm monitor[57] (pid 14061)
Jun 14 15:37:34 vmhost1 lrmd: [3853]: info: operation monitor[57] on p_vm_myvm for client 3856: pid 14061 exited with return code 0
Jun 14 15:42:35 vmhost1 attrd: [3855]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_ping (1000)
Jun 14 15:42:35 vmhost1 attrd: [3855]: notice: attrd_perform_update: Sent update 163: p_ping=1000

And here the score on the location constraint changes for this node. You asked for "run on most connected", and your pingd resource determined that the other one was better connected. 
Jun 14 15:42:36 vmhost1 crmd: [3856]: info: do_lrm_rsc_op: Performing key=136:2351:0:7f6d66f7-cfe5-4820-8289-0e47d8c9102b op=p_vm_myvm_stop_0 )
Jun 14 15:42:36 vmhost1 lrmd: [3853]: info: rsc:p_vm_myvm stop[58] (pid 18174)
...
Jun 14 15:43:32 vmhost1 attrd: [3855]: notice: attrd_trigger_update: Sending flush op to all hosts for: p_ping (2000)
Jun 14 15:43:32 vmhost1 attrd: [3855]: notice: attrd_perform_update: Sent update 165: p_ping=2000

And there it is back on 2000 again ... Lars
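One way to keep such a transient connectivity dip from bouncing the VM is to let attrd smooth the ping score before the policy engine acts on it. A sketch using the dampen parameter of ocf:pacemaker:ping (the host list and intervals are placeholders):

```
# hypothetical ping clone with dampened attribute updates
primitive p_ping ocf:pacemaker:ping \
    params host_list="10.0.0.1 10.0.0.2" multiplier=1000 dampen=30s \
    op monitor interval=10s timeout=60s
clone cl_ping p_ping
```

With dampen=30s, the p_ping attribute (and therefore the location rule on g_vm) only changes when the degraded connectivity persists for 30 seconds, so a single missed ping round no longer triggers a stop/start cycle.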
Re: [Pacemaker] Changing name/location of resource script
On Tue, Jun 12, 2012 at 08:52:27AM -0700, Walter Feddern wrote: I have a 4 node cluster running about 120 tomcat resources. Currently they are using the stock tomcat resource script ( ocf:heartbeat:tomcat ). As I may need to make some adjustments to the script for our environment, I would like to move it out of the heartbeat directory. I have created a directory 'custom', and can edit the resource manually using: crm configure edit tomcat_rsc1 then making the change using 'vi'. As I have to make the change to 120 resources, I would like to find a way to automate it a bit more, but have not been able to find an easy way to make the change on the command line.

crm configure edit, then :%s/// ... but wait ... see crm configure help filter. Careful, that one is a bit tricky to get right.

Any suggestions? Thanks, Walter.
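One hedged way to automate the change for all 120 resources: dump the configuration, rewrite the class with sed, review, and load it back. The filename cib.txt and the target class ocf:custom:tomcat are assumptions; the heredoc below only stands in for real `crm configure show` output:

```shell
# stand-in for `crm configure show > cib.txt` on a real cluster
cat > cib.txt <<'EOF'
primitive tomcat_rsc1 ocf:heartbeat:tomcat
primitive tomcat_rsc2 ocf:heartbeat:tomcat
EOF

# rewrite the resource class in the dump (keeps a .bak copy for review)
sed -i.bak 's|ocf:heartbeat:tomcat|ocf:custom:tomcat|g' cib.txt

# show the result
grep 'tomcat' cib.txt

# on the cluster, load the edited dump back with:
#   crm configure load replace cib.txt
```

Since `crm configure load replace` replaces the whole configuration, diff cib.txt against the .bak copy before loading, and do it while the cluster is in a quiet state.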
Re: [Pacemaker] Announce: pcs / pcs-gui (Pacemaker/Corosync Configuration System)
On Wed, Jun 06, 2012 at 07:22:47PM +0200, Rasto Levrinc wrote: On Wed, Jun 6, 2012 at 4:45 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Tue, Jun 05, 2012 at 05:15:04PM +0200, Rasto Levrinc wrote: On Tue, Jun 5, 2012 at 1:27 PM, Lars Marowsky-Bree l...@suse.com wrote: On 2012-06-05T09:43:09, Andrew Beekhof and...@beekhof.net wrote: Every argument made so far applies equally to HAWK and the Linbit GUI, yet there was no outcry when they were announced. No, like I said above, that did suck - but the architecture truly is different and drbd-mc just wasn't the right answer for customers who wanted a HTML-only frontend. Besides, this is not an outcry. An outcry is revoking people's mailing list privileges and posting angry blogs. ;-) Ok, I see the point of both sides, so I will not join the outcry. :) Just for the record, the drbd mc / lcmc as an applet and a little bit of backend could look like a web application, only better. ... once it is cleaned up to not try to use up a couple GB of RAM and loop in the GC, while the typical default browser plugin JVM settings allow for a handful of MB, max ... that cleanup may be useful anyways. I haven't seen such behavior and I don't know your configuration, so thanks for the bug-report, I guess. :) To be fair, that was on a slow 32-bit Windows XP machine in an old IE with probably old-ish Java [*], and a default memory setting for the plugin JVM of (I think) 64M. The config was very simple at that point: two nodes, two DRBD resources, one iSCSI target with one LUN and IP each, done from the crm shell. After some time things became visible, but once you started to do something, it would start garbage collecting and never become responsive again. Once we started a standalone Java, and adjusted the memory parameters to allow for 500 or 800 or so MB, it became usable. [*] so it may have been only the old Java, even. Who knows. I have not yet tried to reproduce it in any way. 
But still, even on very simple configurations, the memory consumption of LCMC can be excessive, for whatever reason.
Re: [Pacemaker] Announce: pcs / pcs-gui (Pacemaker/Corosync Configuration System)
On Tue, Jun 05, 2012 at 05:15:04PM +0200, Rasto Levrinc wrote: On Tue, Jun 5, 2012 at 1:27 PM, Lars Marowsky-Bree l...@suse.com wrote: On 2012-06-05T09:43:09, Andrew Beekhof and...@beekhof.net wrote: Every argument made so far applies equally to HAWK and the Linbit GUI, yet there was no outcry when they were announced. No, like I said above, that did suck - but the architecture truly is different and drbd-mc just wasn't the right answer for customers who wanted a HTML-only frontend. Besides, this is not an outcry. An outcry is revoking people's mailing list privileges and posting angry blogs. ;-) Ok, I see the point of both sides, so I will not join the outcry. :) Just for the record, the drbd mc / lcmc as an applet and a little bit backend could look like a web application, only better. ... once it is cleaned up to not try to use up a couple GB of RAM and loop in the GC, while the typical default browser plugin JVM settings allow for a handful of MB, max ... that cleanup may be useful anyways. ;) I still like LCMC. Lars
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Fri, May 25, 2012 at 10:50:25AM +1000, Andrew Beekhof wrote: On Fri, May 25, 2012 at 10:04 AM, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Sun, May 06, 2012 at 09:45:09PM +1000, Andrew Beekhof wrote: On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote: People sometimes think they have a use case for influencing which node will be the DC. Agreed :-) Sometimes it is latency (certain cli commands work faster when done on the DC), Config changes can be run against any node, there is no reason to go to the one on the DC. sometimes they add a mostly-quorum node which may be not quite up to the task of being DC. I'm not sure I buy that. Most of the load would come from the resources themselves. Prohibiting a node from becoming DC completely would mean it can not even be cleanly shut down (with 1.0.x, no MCP), or act on its own resources for certain no-quorum policies. So here is a patch I have been asked to present for discussion, May one ask where it originated? against Pacemaker 1.0, that introduces a dc-prio configuration parameter, which will add some skew to the election algorithm. Open questions: * does it make sense at all? Doubtful :-) * election algorithm compatibility, stability: will the election be correct if some nodes have this patch, and some don't? Unlikely, but you could easily make it so by placing it after the version check (and bumping said version in the patch) * How can it be improved so that a node with dc-prio=0 will give up its DC-role as soon as there is at least one other node with dc-prio > 0? Short of causing an election every time a node joins... I doubt it. Where would be a suitable place in the code/fsa to do so? Just after the call to exit(0) :) Just what I thought ;-) I'd do it at the end of do_started() but only if dc-priority > 0. That way you only cause an election if someone who is likely to win it starts. And people that don't enable this feature are unaffected. 
* Not dc-prio, it's 2012, there's no need to save the extra 4 chars :-) Thanks,
Re: [Pacemaker] Debug message granularity
On Wed, May 23, 2012 at 08:37:44AM +1000, Andrew Beekhof wrote: On Tue, May 22, 2012 at 9:51 PM, Ron Kerry rke...@sgi.com wrote: On 5/22/12 3:33 AM, Andrew Beekhof wrote: and I see nothing in pacemaker itself that gives me any separate controls over its logging verbosity. Which is why I mentioned: You should be able to define PCMK_trace_functions=function1,function2,... as an environment variable to get additional information from just those functions. There is also PCMK_trace_files. Depending on your version you may also be able to set PCMK_debug=crmd,pengine,... or send SIGUSR1 to the process to increase the log level. It might take a bit of searching through source code to find the functions you care about, but it is possible. Thanks! I actually have a couple of different versions I am dealing with. I will poke through the source for the newest one (SLES11 SP2 ... pacemaker 1.1.6) I have and see what I can do. I actually do not have a specific problem I am tracking right now. I am just trying to develop a tool kit of things to do when one of our customers runs into resource issues. Makes sense. FYI: In future versions (1.1.8 onwards) sending SIGUSR1 to a process (or setting PCMK_blackbox) will enable a logging blackbox. This is a rolling buffer of all possible log messages (including debug and optionally traces) that can be dumped to a separate file by sending SIGTRAP. If enabled, we also dump it to a file when asserts are triggered. This provides easy access to copious amounts of debug for resolving issues without requiring rebuilds, restarts or needlessly spamming syslog. /me dances a jig and a reel Lars
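As a hedged illustration of the environment knobs mentioned above; the function and file names here are taken from log excerpts elsewhere in this thread and are only examples, not a recommendation:

```shell
# illustrative: trace only specific pacemaker functions / files on the next
# daemon start (names below are examples pulled from this thread's logs)
export PCMK_trace_functions=do_state_transition,do_lrm_rsc_op
export PCMK_trace_files=utils.c

echo "tracing: $PCMK_trace_functions in $PCMK_trace_files"

# on 1.1.8+, a running daemon's blackbox can be dumped with SIGTRAP, e.g.:
#   kill -TRAP $(pidof crmd)
```

The variables must be set in the environment the cluster daemons are started from (e.g. the init script's sysconfig file), not just in an interactive shell.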
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote: On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg lars.ellenb...@linbit.com wrote: Sorry, sent too early. That would not catch the case of cluster partitions joining, only the pacemaker startup with fully connected cluster communication already up. I thought about a dc-priority default of 100, and only triggering a re-election if I am DC, my dc-priority is 50, and I see a node joining. Hardcoded arbitrary defaults aren't that much fun. You can use any number, but "100 is the magic threshold" is something I wouldn't want to explain to people over and over again. Then don't ;-) Not helping, and irrelevant to this case. Besides, that was an example. Easily possible: move the "I want to lose" vs "I want to win" magic number to be 0, and allow both positive and negative priorities. You get to decide whether positive or negative is the "I'd rather lose" side. Want to make that configurable as well? Right. I don't think this can be made part of the cib configuration; DC election takes place before cibs are resynced, so if you have diverging cibs, you possibly end up with a never-ending election? Then maybe the election is stable enough, even after this change to the algorithm. But you'd need to add another trigger on "dc-priority in configuration changed", complicating this stuff for no reason. We actually discussed node defaults a while back. Those would be similar to resource and op defaults which Pacemaker already has, and set defaults for node attributes for newly joined nodes. At the time the idea was to support putting new joiners in standby mode by default, so when you added a node in a symmetric cluster, you wouldn't need to be afraid that Pacemaker would shuffle resources around.[1] This dc-priority would be another possibly useful use case for this. Not so sure about that. 
[1] Yes, semi-doable with putting the cluster into maintenance mode before firing up the new node, setting that node into standby, and then unsetting maintenance mode. But that's just an additional step that users can easily forget about. Why not simply add the node to the cib, and set it to standby, before it even joins for the first time.
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Fri, May 25, 2012 at 09:05:54PM +1000, Andrew Beekhof wrote: On Fri, May 25, 2012 at 7:48 PM, Florian Haas flor...@hastexo.com wrote: On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote: On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg lars.ellenb...@linbit.com wrote: Sorry, sent too early. That would not catch the case of cluster partitions joining, only the pacemaker startup with fully connected cluster communication already up. I thought about a dc-priority default of 100, and only triggering a re-election if I am DC, my dc-priority is 50, and I see a node joining. Hardcoded arbitrary defaults aren't that much fun. You can use any number, but "100 is the magic threshold" is something I wouldn't want to explain to people over and over again. Then don't ;-) Not helping, and irrelevant to this case. Besides, that was an example. Easily possible: move the "I want to lose" vs "I want to win" magic number to be 0, and allow both positive and negative priorities. You get to decide whether positive or negative is the "I'd rather lose" side. Want to make that configurable as well? Right. Nope, 0 is used as a threshold value in Pacemaker all over the place. So allowing both positive and negative priorities and making 0 the default sounds perfectly sane to me. I don't think this can be made part of the cib configuration; DC election takes place before cibs are resynced, so if you have diverging cibs, you possibly end up with a never-ending election? Then maybe the election is stable enough, even after this change to the algorithm. Andrew? This whole thread makes me want to hurt kittens. Yep... Sorry for that :( Lars
[Pacemaker] Adding a new node in standby.
On Fri, May 25, 2012 at 11:48:29AM +0200, Florian Haas wrote: We actually discussed node defaults a while back. Those would be similar to resource and op defaults which Pacemaker already has, and set defaults for node attributes for newly joined nodes. At the time the idea was to support putting new joiners in standby mode by default, so when you added a node in a symmetric cluster, you wouldn't need to be afraid that Pacemaker would shuffle resources around.[1] [1] Yes, semi-doable with putting the cluster into maintenance mode before firing up the new node, setting that node into standby, and then unsetting maintenance mode. But that's just an additional step that users can easily forget about. Why not simply add the node to the cib, and set it to standby, before it even joins for the first time. Haha, good one. Wait, you weren't joking? Nope. Works for me. Not that I do that very often, but I did, and it worked.
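Pre-seeding the node entry could look like this in crm syntax; the node name newnode3 is a placeholder, and the entry must match the uname the node will eventually join with:

```
# hypothetical: create the node object with standby=on before its first join
crm configure node newnode3 attributes standby=on

# when it finally joins, it comes up in standby and runs nothing until:
#   crm node online newnode3
```

This avoids the maintenance-mode dance entirely: the attribute is already in the cib when the node first appears, so the policy engine never considers placing resources there.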
Re: [Pacemaker] DRBD LVM EXT4 NFS performance
On Thu, May 24, 2012 at 03:34:51PM +0300, Dan Frincu wrote: Hi, On Mon, May 21, 2012 at 4:24 PM, Christoph Bartoschek bartosc...@gmx.de wrote: Florian Haas wrote: Thus I would expect to have a write performance of about 100 MByte/s. But dd gives me only 20 MByte/s.

dd if=/dev/zero of=bigfile.10G bs=8192 count=1310720
1310720+0 records in
1310720+0 records out
10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s

If you used that same dd invocation for your local test that allegedly produced 450 MB/s, you've probably been testing only your page cache. Add oflag=dsync or oflag=direct (the latter will only work locally, as NFS doesn't support O_DIRECT). If your RAID is one of reasonably contemporary SAS or SATA drives, then a sustained to-disk throughput of 450 MB/s would require about 7-9 stripes in a RAID-0 or RAID-10 configuration. Is that what you've got? Or are you writing to SSDs?

I used the same invocation with different filenames each time. To which page cache do you refer? The one on the client or on the server side? We are using RAID-1 with 6 x 2 disks. I have repeated the local test 10 times with different files in a row:

for i in `seq 10`; do time dd if=/dev/zero of=bigfile.10G.$i bs=8192 count=1310720; done

The resulting values on a system that is also used by other programs, as reported by dd, are: 515 MB/s, 480 MB/s, 340 MB/s, 338 MB/s, 360 MB/s, 284 MB/s, 311 MB/s, 320 MB/s, 242 MB/s, 289 MB/s. So I think that the system is capable of more than 200 MB/s, which is way more than what can arrive over the network.

A bit off-topic maybe: whenever you do these kinds of local disk performance tests, to test actual speed and not some caching, as Florian said, you should use the oflag=direct option to dd and also echo 3 > /proc/sys/vm/drop_caches and sync.

You should sync before you drop caches, or you won't drop those caches that were dirty at that time. 
I usually use:

echo 3 > /proc/sys/vm/drop_caches
sync
date
time dd if=/dev/zero of=whatever bs=1G count=x oflag=direct
sync
date

You can assess whether data is still being flushed if the results given by dd differ from those obtained by calculating the amount of data written between the two date calls. It also helps to push more data than the controller can store.

Also, dd is doing one bs-sized chunk at a time. fio with appropriate options can be more useful, once you have learned all those options and how to interpret the results...

Regards, Dan

I've done the measurements on the filesystem that sits on top of LVM and DRBD. Thus I think that DRBD is not the problem. However, the strange thing is that I get 108 MB/s on the clients as soon as I disable the secondary node for DRBD. Maybe there is a strange interaction between DRBD and NFS.

Dedicated replication link? Maybe the additional latency is all that kills you. Do you have a non-volatile write cache on your IO backend? Did you post your drbd configuration settings already?

After reenabling the secondary node, the DRBD synchronization is quite slow. Has anyone an idea what could cause such problems? I have no idea for further analysis.

As a knee-jerk response, that might be the classic issue of NFS filling up the page cache until it hits vm.dirty_ratio and then having a ton of stuff to write to disk, which the local I/O subsystem can't cope with.

Sounds reasonable, but shouldn't the I/O subsystem be capable of writing away anything that arrives? Christoph
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Sun, May 06, 2012 at 09:45:09PM +1000, Andrew Beekhof wrote: On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote: People sometimes think they have a use case for influencing which node will be the DC. Agreed :-) Sometimes it is latency (certain cli commands work faster when done on the DC), Config changes can be run against any node, there is no reason to go to the one on the DC. sometimes they add a mostly-quorum node which may be not quite up to the task of being DC. I'm not sure I buy that. Most of the load would come from the resources themselves. Prohibiting a node from becoming DC completely would mean it can not even be cleanly shut down (with 1.0.x, no MCP), or act on its own resources for certain no-quorum policies. So here is a patch I have been asked to present for discussion, May one ask where it originated? against Pacemaker 1.0, that introduces a dc-prio configuration parameter, which will add some skew to the election algorithm. Open questions: * does it make sense at all? Doubtful :-) * election algorithm compatibility, stability: will the election be correct if some nodes have this patch, and some don't? Unlikely, but you could easily make it so by placing it after the version check (and bumping said version in the patch) * How can it be improved so that a node with dc-prio=0 will give up its DC-role as soon as there is at least one other node with dc-prio > 0? Short of causing an election every time a node joins... I doubt it. Where would be a suitable place in the code/fsa to do so? Thanks,
[Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
, local_handle, use_mgmtd, value, "no");
     if (ais_get_boolean(value) == FALSE) {
         int lpc = 0;
@@ -584,6 +587,7 @@
     pcmk_env.logfile = NULL;
     pcmk_env.use_logd = "false";
     pcmk_env.syslog = "daemon";
+    pcmk_env.dc_prio = "1";
     if (cs_uid != root_uid) {
         ais_err("Corosync must be configured to start as 'root',
--- ./lib/ais/utils.c.orig  2011-05-11 11:27:08.460183200 +0200
+++ ./lib/ais/utils.c       2011-05-11 17:29:09.182064800 +0200
@@ -171,6 +171,7 @@
     setenv("HA_logfacility", pcmk_env.syslog, 1);
     setenv("HA_LOGFACILITY", pcmk_env.syslog, 1);
     setenv("HA_use_logd", pcmk_env.use_logd, 1);
+    setenv("HA_dc_prio", pcmk_env.dc_prio, 1);
     if (pcmk_env.logfile) {
         setenv("HA_debugfile", pcmk_env.logfile, 1);
     }
--- ./lib/ais/utils.h.orig  2011-05-11 11:26:12.757414700 +0200
+++ ./lib/ais/utils.h       2011-05-11 17:36:34.194841700 +0200
@@ -226,6 +226,7 @@
     const char *syslog;
     const char *logfile;
     const char *use_logd;
+    const char *dc_prio;
 };
 extern struct pcmk_env_s pcmk_env;
Re: [Pacemaker] ERROR: te_graph_trigger: Transition failed: terminated pacemaker's problem or mine?
On Mon, Apr 30, 2012 at 01:00:11PM +1000, Andrew Beekhof wrote: On Sat, Apr 28, 2012 at 5:40 AM, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Fri, Apr 27, 2012 at 11:31:23AM +0100, Tim Small wrote: Hi, I'm trying to get to the bottom of a problem I'm seeing with a cluster. At this stage I'm unclear as to whether the issue is with the config or not - the generated error messages seem unclear. So I'm not sure whether I should be staring at the config or the source code at this point, and would appreciate a clue! I'm running with some of the (live) resources in an unmanaged state whilst testing fail-over with other (non-dependant) resources. The managed resources are a number of OpenVZ virtual machines (each comprising 3 primitives - file-system + OpenVZ VE + SendArp). The filesystems are on LVM volume groups, and the single LVM PV for each volume group resides on a DRBD volume. There are n virtual machines per DRBD volume. I'm running pacemaker 1.0.9.1+hg15626-1 on Debian 6.0. Here are some of the messages (configuration follows at the end of the email): Upgrading to 1.0.12, or 1.1.7, may get you a little further. It would not solve the I need to stop that resource first, but I can not as it is unmanaged dependency problem you apparently have here. There's really not a lot the cluster can do in this situation, there's a 50% chance of getting it wrong no matter what we do. In the most recent versions we now log as loudly as possible (LOG_CRIT) that we cant shutdown because something depends on an unmanaged resource. That's in fact what I meant ;-) Not only the cryptic ERROR: te_graph_trigger: Transition failed: terminated but Hey you fool, I cannot do that because you told me not to manage that resource, but the other ones depend on it. Though, you still have to spot that line in the flood... 
Re: [Pacemaker] ERROR: te_graph_trigger: Transition failed: terminated pacemaker's problem or mine?
-SendArp-with-athena-VE inf: athena-SendArp athena-VE
colocation athena-VE-with-athena-FS inf: athena-VE athena-FS
colocation calypso-FS-on-essex03-LVM inf: calypso-FS essex03-LVM

Ok, colo calypso with essex03... but then, why ...

colocation calypso-SendArp-with-calypso-VE inf: calypso-SendArp calypso-VE
colocation calypso-VE-with-calypso-FS inf: calypso-VE calypso-FS
colocation epione-FS-on-essex02-LVM inf: epione-FS essex02-LVM
colocation epione-FS-with-essex02-LVM inf: epione-FS essex02-LVM
colocation epione-SendArp-with-epione-VE inf: epione-SendArp epione-VE
colocation epione-VE-with-epione-FS inf: epione-VE epione-FS
colocation essex02-LVM-with-essex02-DRBD-Master inf: essex02-LVM ms-drbd-essex02:Master
colocation essex03LVM-on-ms-drbd-essex03 inf: essex03-LVM ms-drbd-essex03:Master
colocation essextest-FS-with-essex02-LVM inf: essextest-FS essex02-LVM
colocation essextest-SendArp-with-essextest-VE inf: essextest-SendArp essextest-VE
colocation essextest-VE-with-essextest-FS inf: essextest-VE essextest-FS
order artemis-FS-before-artemis-VE inf: artemis-FS artemis-VE
order artemis-VE-before-artemis-SendArp inf: artemis-VE artemis-SendArp
order athena-FS-before-athena-VE inf: athena-FS athena-VE
order athena-VE-before-athena-SendArp inf: athena-VE athena-SendArp
order calypso-FS-before-calypso-VE inf: calypso-FS calypso-VE
order calypso-VE-before-calypso-SendArp inf: calypso-VE calypso-SendArp
order epione-FS-before-epione-VE inf: epione-FS epione-VE
order epione-VE-before-epione-SendArp inf: epione-VE epione-SendArp
order essex02-lvm-before-artemis-FS inf: essex02-LVM artemis-FS
order essex02-lvm-before-athena-FS inf: essex02-LVM athena-FS
order essex02-lvm-before-calypso-FS inf: essex02-LVM calypso-FS

Order essex02 with calypso? Typo? Is this supposed to be essex03?
order essex02-lvm-before-epione-FS inf: essex02-LVM epione-FS
order essex02-lvm-before-essextest-FS inf: essex02-LVM essextest-FS
order essextest-FS-before-essextest-VE inf: essextest-FS essextest-VE
order essextest-VE-before-essextest-SendArp inf: essextest-VE essextest-SendArp
order ms-drbd-essex02-before-lvm inf: ms-drbd-essex02:promote essex02-LVM:start
order ms-drbd-essex03-before-lvm inf: ms-drbd-essex03:promote essex03-LVM:start
property $id=cib-bootstrap-options \
	dc-version=1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b \
	cluster-infrastructure=openais \
	expected-quorum-votes=2 \
	no-quorum-policy=ignore \
	stonith-enabled=false \
	last-lrm-refresh=1335487560

# crm configure verify
WARNING: artemis-FS: default timeout 20s for stop is smaller than the advised 60
WARNING: artemis-VE: default timeout 20s for stop is smaller than the advised 75
WARNING: athena-FS: default timeout 20s for stop is smaller than the advised 60
WARNING: athena-VE: default timeout 20s for stop is smaller than the advised 75
WARNING: calypso-FS: default timeout 20s for stop is smaller than the advised 60
WARNING: calypso-VE: default timeout 20s for stop is smaller than the advised 75
WARNING: epione-FS: default timeout 20s for stop is smaller than the advised 60
WARNING: epione-VE: default timeout 20s for stop is smaller than the advised 75
WARNING: essex02-LVM: default timeout 20s for stop is smaller than the advised 30
WARNING: essex03-LVM: default timeout 20s for stop is smaller than the advised 30
WARNING: essextest-FS: default timeout 20s for stop is smaller than the advised 60
WARNING: essextest-VE: default timeout 20s for stop is smaller than the advised 75
WARNING: essex02-DRBD: specified timeout 100s for start is smaller than the advised 240
WARNING: essex02-DRBD: default timeout 20s for stop is smaller than the advised 100
WARNING: essex03-DRBD: default timeout 20s for stop is smaller than the advised 100
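For reference, warnings like these go away once each primitive declares explicit op timeouts at or above the advised values. A minimal, untested sketch in crm configure syntax; the RA class and the params shown here are invented placeholders (the original mail does not show the primitive definitions), only the timeout values come from the verify output:

```
primitive artemis-FS ocf:heartbeat:Filesystem \
	params device="/dev/placeholder-vg/artemis" directory="/placeholder/artemis" fstype="ext3" \
	op start interval="0" timeout="60s" \
	op stop interval="0" timeout="60s"
```

The same pattern applies per resource: stop timeouts of 60s/75s for FS/VE resources, 30s for LVM, and 240s/100s for the DRBD start/stop operations.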
Re: [Pacemaker] Sporadic problems of rejoin after split brain situation
: do_te_invoke: Processing graph 16 (ref=pe_calc-dc-1331858128-123) derived from /var/lib/pengine/pe-input-201.bz2 Mar 16 01:35:28 oan1 pengine: [17673]: notice: process_pe_message: Transition 16: PEngine Input stored in: /var/lib/pengine/pe-input-201.bz2 Mar 16 01:35:28 oan1 crmd: [7601]: info: run_graph: Mar 16 01:35:28 oan1 crmd: [7601]: notice: run_graph: Transition 16 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input- 201.bz2): Complete Mar 16 01:35:28 oan1 crmd: [7601]: info: te_graph_trigger: Transition 16 is now complete Mar 16 01:35:28 oan1 crmd: [7601]: info: notify_crmd: Transition 16 status: done - null Mar 16 01:35:28 oan1 crmd: [7601]: info: do_state_transition: State transition S_TRANSITION_ENGINE - S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ] Mar 16 01:35:28 oan1 crmd: [7601]: info: do_state_transition: Starting PEngine Recheck Timer Mar 16 01:35:30 oan1 cib: [7597]: info: cib_process_diff: Diff 0.210.56 - 0.210.57 not applied to 0.210.50: current num_updates is less than required Mar 16 01:35:30 oan1 cib: [7597]: WARN: cib_server_process_diff: Not requesting full refresh in R/W mode Mar 16 01:35:30 oan1 ccm: [7596]: info: Break tie for 2 nodes cluster Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: no mbr_track info Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm Mar 16 01:35:30 oan1 cib: [7597]: info: mem_handle_event: instance=14, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3 Mar 16 01:35:30 oan1 cib: [7597]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=14) Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: no mbr_track info Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: Got 
an event OC_EV_MS_NEW_MEMBERSHIP from ccm Mar 16 01:35:30 oan1 crmd: [7601]: info: mem_handle_event: instance=14, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3 Mar 16 01:35:30 oan1 crmd: [7601]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=14) Mar 16 01:35:30 oan1 crmd: [7601]: info: ccm_event_detail: NEW MEMBERSHIP: trans=14, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3 Mar 16 01:35:30 oan1 crmd: [7601]: info: ccm_event_detail:CURRENT: oan1 [nodeid=0, born=14] Mar 16 01:35:30 oan1 crmd: [7601]: info: populate_cib_nodes_ha: Requesting the list of configured nodes Mar 16 01:35:31 oan1 ccm: [7596]: info: Break tie for 2 nodes cluster Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: no mbr_track info Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm Mar 16 01:35:31 oan1 crmd: [7601]: info: mem_handle_event: instance=15, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3 Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: no mbr_track info Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm Mar 16 01:35:31 oan1 cib: [7597]: info: mem_handle_event: instance=15, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3 Mar 16 01:35:31 oan1 cib: [7597]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=15) Mar 16 01:35:31 oan1 crmd: [7601]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=15) Mar 16 01:35:31 oan1 crmd: [7601]: info: ccm_event_detail: NEW MEMBERSHIP: trans=15, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3 Mar 16 01:35:31 oan1 crmd: [7601]: info: ccm_event_detail:CURRENT: oan1 [nodeid=0, born=15] Mar 16 01:35:31 oan1 cib: [7597]: info: 
cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/205, version=0.210.51): ok (rc=0) Mar 16 01:35:31 oan1 crmd: [7601]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
Re: [Pacemaker] Can Master/Slave resource transit from Master to Stopped directly ?
On Tue, Mar 06, 2012 at 08:18:28PM +0900, Takatoshi MATSUO wrote:
> Hi Dejan
>
> 2012/3/6 Dejan Muhamedagic deja...@fastmail.fm:
>> Hi,
>>
>> On Tue, Mar 06, 2012 at 01:15:45PM +0900, Takatoshi MATSUO wrote:
>>> Hi
>>>
>>> I want Pacemaker to transit from Master to Stopped directly on demote,
>>> without a failcount, for managing PostgreSQL streaming replication.
>>> Can Pacemaker do this?
>>
>> What the RA should do on demote is, well, demote an instance to slave.
>> Why would you want to stop it?
>
> Because PostgreSQL cannot transit from Master to Slave.
>
>> Of course, nothing's stopping you from doing that, and I guess that
>> pacemaker would be able to deal with it eventually. But note that it'll
>> expect the resource to be in the Started state after demote.
>
> It causes the monitor to fail in spite of the demote succeeding.
> I returned $OCF_NOT_RUNNING on demote as a trial, but it incremented a
> failcount.
>
>> $OCF_NOT_RUNNING should be used only by the monitor operation.
>> It'll count as an error with other operations.
>
> Got it.

Actually, Andrew told me on IRC about plans to support this:

  beekhof> oh, and start ops will be able to tell us a resource is master,
           and demote that its stopped
  beekhof> if that's something you feel inclined to take advantage

So, a start could then return $OCF_RUNNING_MASTER to indicate that it went
straight into Master mode, and a demote would be able to indicate it went
straight into Stopped state by returning $OCF_NOT_RUNNING.

No idea when that will be available or in which release.

	Lars
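The return-code rule Dejan states can be illustrated with a toy demote action. The function name and its body are placeholders; only the OCF exit code values and the "success even when stopped" behavior come from the discussion above:

```shell
# Exit codes as defined by the OCF resource agent API:
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Under current semantics, a demote that actually ends with the service
# stopped must still report OCF_SUCCESS; OCF_NOT_RUNNING is only a valid
# result for the monitor action, anywhere else it counts as a failure.
pgsql_demote() {
    echo "demoting: stopping master instance (placeholder)"
    return $OCF_SUCCESS
}

pgsql_demote
echo "demote rc=$?"   # prints: demote rc=0
```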
Re: [Pacemaker] Surprisingly fast start of resources on cluster failover.
On Tue, Mar 06, 2012 at 01:49:11PM +0100, Florian Crouzat wrote:
> Hi,
>
> On a two-node active/passive cluster, I placed a location constraint of
> 50 for #uname node1. As soon as applied, things moved from node2 to node1:

right.

> I have a lsb init script defined as a resource:
>
> $ crm configure show firewall
> primitive firewall lsb:firewall \
>     op monitor on-fail=restart interval=10s \
>     op start interval=0 timeout=3min \
>     op stop interval=0 timeout=1min \
>     meta target-role=Started
>
> This lsb takes a long time to start, at least 55 seconds when fired from
> my shell over ssh. It logs a couple things to std{out,err}.

If a "couple things" actually happen to be a lot, then having stdout/err
on a tty via ssh in xterm ... can slow things down.

Did you also time it as
    time /etc/init.d/firewall >out.txt 2>err.txt

> So, while node1 was taking over, I noticed in
> /var/log/pacemaker/lrmd.log that it only took 24 seconds to start that
> resource.
>
> My question: how come pacemaker starts a resource twice as fast as I do
> from the CLI?

Other than the above suggestion, did you verify that it ends up doing the
same thing when started from pacemaker, compared to when started by you
from the command line? Did you compare the results?
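The redirection experiment suggested above can be tried in isolation. Here `seq` stands in for a chatty init script (the real path, /etc/init.d/firewall, is site-specific and not runnable outside that cluster):

```shell
# Rendering lots of output on an xterm over ssh costs real wall-clock
# time; writing the same output to files (roughly what lrmd sees) can be
# much faster. Compare:
time seq 200000 > /dev/null
time seq 200000 > /tmp/fw-out.txt 2> /tmp/fw-err.txt
echo "captured $(wc -l < /tmp/fw-out.txt) lines of output"
```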
Re: [Pacemaker] Failing to move around IPaddr2 resource
configuration). If you remove the primary address, all secondary addresses
are removed as well. If you get concurrent stops of several such IPs,
those that expect the IP to still be there may fail. Basically, that's a
race condition.

If that is indeed your issue, you can either assign one static IP on that
nic, or
    sysctl -w net.ipv4.conf.all.promote_secondaries=1
(or per device).

> When I manually go and cleanup the failed nodes, they get properly
> assigned to the nodes that aren't down, so if we can't resolve the
> underlying issue, is there a way to automatically attempt to cleanup
> failed resources a limited number of times?

I don't think you want to start the IP somewhere else if it's still
active on the original node.

> My configuration is here, in case there's anything wrong with it.

Looks like you forgot to attach it.

> Anlu
Re: [Pacemaker] Where is MAXMSG defined?
On Tue, Feb 07, 2012 at 01:13:19PM +0200, Adrian Fita wrote:
> Hi. I can't find any trace of "define MAXMSG" in either pacemaker,
> corosync, or heartbeat's source code. I tried with grep -R 'MAXMSG' *
> and nothing. Where is it defined?!

If you are asking about what I think you do, then that would be in glue,
include/clplumbing/ipc.h

But be careful when fiddling with it.
What are you trying to solve, btw?
Re: [Pacemaker] Proper way to migrate multistate resource?
On Mon, Feb 06, 2012 at 04:48:26PM -0800, Chet Burgess wrote:
> Greetings,
>
> I'm somewhat new to pacemaker and have been playing around with a number
> of configurations in a lab. Most recently I've been testing a multistate
> resource using the ocf:pacemaker:Stateful example RA.
>
> While I've gotten the agent to work and notice that if I shutdown or
> kill a node the resources migrate, I can't seem to figure out the proper
> way to migrate the resource between nodes when they are both up. For
> regular resources I've used "crm resource migrate rsc" without issue.
> However when I try this with a multistate resource it doesn't seem to
> work. When I run the command it just puts the slave node into a stopped
> state. If I try and tell it to migrate specifically to the slave node it
> claims to already be running there (which I suppose in a sense it is).

The crm shell does not support roles for the move or migrate command
(yet; maybe in newer versions. Dejan?).

What you need to do is set a location constraint on the role.

* force master role off from one node:
    location you-name-it resource-id \
        rule $role=Master -inf: \
        #uname eq node-where-it-should-be-slave

* or force master role off from all but one node,
  note the double negation in this one:
    location you-name-it resource-id \
        rule $role=Master -inf: \
        #uname ne node-where-it-should-be-master

Cheers,
Lars

> The only method I've found to safely and reliably migrate a multistate
> resource from one node to another is I think it has something to do with
> the resource constraints I used to prefer a particular node, but I'm not
> entirely sure how the constraints and the master/slave state updating
> stuff works.
>
> Am I using the wrong tool to migrate a multistate resource or is my
> configuration wrong in some way? Any input greatly appreciated.
>
> Thank you.
Configuration:

r...@tst3.local1.mc:/home/cfb$ crm configure show
node tst3.local1.mc.metacloud.com
node tst4.local1.mc.metacloud.com
primitive stateful-test ocf:pacemaker:Stateful \
	op monitor interval=30s role=Slave \
	op monitor interval=31s role=Master
ms ms-test stateful-test \
	meta clone-node-max=1 notify=false master-max=1 master-node-max=1 target-role=Master
location ms-test_constraint_1 ms-test 25: tst3.local1.mc.metacloud.com
location ms-test_constraint_2 ms-test 20: tst4.local1.mc.metacloud.com
property $id=cib-bootstrap-options \
	cluster-infrastructure=openais \
	dc-version=1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f \
	last-lrm-refresh=1325273678 \
	expected-quorum-votes=2 \
	no-quorum-policy=ignore \
	stonith-enabled=false
rsc_defaults $id=rsc-options \
	resource-stickiness=100

-- 
Chet Burgess
c...@liquidreality.org
Re: [Pacemaker] Proper way to migrate multistate resource?
On Tue, Feb 07, 2012 at 02:03:32PM +0100, Michael Schwartzkopff wrote:
> These constraints would prevent the MS resource from running in Master
> state even on that node, even in case the preferred node is not
> available any more. This might not be what Chet wanted.

Well, it is just what "crm resource migrate" does, otherwise.
After migration, you obviously need to unmigrate, i.e. delete that
constraint again.
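Concretely, with the resource and node names from Chet's configuration, the temporary constraint would look like this (an untested sketch; the constraint name is arbitrary):

```
location ms-test-migrate ms-test \
	rule $role=Master -inf: \
	#uname ne tst4.local1.mc.metacloud.com
```

Once the promotion on tst4 has settled, "crm configure delete ms-test-migrate" removes it again, which is the manual equivalent of unmigrate.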
Re: [Pacemaker] How to start resources in a Resource Group in parallel
On Thu, Feb 02, 2012 at 08:28:16PM +1100, Andrew Beekhof wrote:
> On Tue, Jan 31, 2012 at 9:52 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:
>> Hi,
>>
>> On Tue, Jan 31, 2012 at 10:29:14AM +0000, Kashif Jawed Siddiqui wrote:
>>> Hi Andrew,
>>>
>>> It is the LRMD_MAX_CHILDREN limit which by default is 4. I see in
>>> forums that this parameter is tunable by adding /etc/sysconfig/pacemaker
>>> with the following line as content
>>>
>>>     LRMD_MAX_CHILDREN=8
>>>
>>> But the above works only for Heartbeat. How do we do it for Corosync?
>>> Can you suggest?
>>
>> It is not heartbeat or corosync specific, but depends on support in the
>> init script (/etc/init.d/corosync). The init script should read the
>> sysconfig file and then invoke lrmadmin to set the max children
>> parameter.
>
> Just a reminder, but systemd unit files cannot do this.
> SLES won't be affected for a while, but openSUSE users will presumably
> start complaining soon.
>
> I recommend:
>
> diff -r 0285b706fcde lrm/lrmd/lrmd.c
> --- a/lrm/lrmd/lrmd.c	Tue Sep 28 19:10:38 2010 +0200
> +++ b/lrm/lrmd/lrmd.c	Thu Feb 02 20:27:33 2012 +1100
> @@ -832,6 +832,13 @@ main(int argc, char ** argv)
>          init_stop(PID_FILE);
>      }
>
> +    if (getenv("LRMD_MAX_CHILDREN")) {
> +        int tmp = atoi(getenv("LRMD_MAX_CHILDREN"));
> +        if (tmp > 4) {
> +            max_child_count = tmp;
> +        }
> +    }
> +
>      return init_start();
>  }

Yes, please...
and of course we have to remember to not only set, but also export
LRMD_MAX_CHILDREN from wherever lrmd will be started from.

	Lars
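On the init-script side, the set-and-export plumbing would look roughly like this sketch (the sysconfig path follows the convention quoted above; the fallback value of 4 matches the lrmd default mentioned in the thread):

```shell
# Source the optional sysconfig file, then export the limit so that a
# subsequently forked lrmd inherits it via its environment.
[ -f /etc/sysconfig/pacemaker ] && . /etc/sysconfig/pacemaker

: "${LRMD_MAX_CHILDREN:=4}"   # fall back to the built-in default of 4
export LRMD_MAX_CHILDREN
echo "LRMD_MAX_CHILDREN=$LRMD_MAX_CHILDREN"
```

Without the `export`, the variable stays local to the init script and the patched lrmd would never see it, which is the point Lars makes above.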
Re: [Pacemaker] don't want to restart clone resource
On Wed, Feb 01, 2012 at 03:43:55PM +0100, Andreas Kurz wrote:
> Hello,
>
> On 02/01/2012 10:39 AM, Fanghao Sha wrote:
>> Hi Lars,
>>
>> Yes, you are right. But how to prevent the orphaned resources from
>> stopping by default, please?
>
> crm configure property stop-orphan-resources=false

Well, sure. But for "normal" orphans, you actually want them to be
stopped.

No, pacemaker needs some additional smarts to recognize that there
actually are no orphans, maybe by first relabeling, and only then
checking for instance label > clone-max.

Did you file a bugzilla? Has that made progress?
Re: [Pacemaker] How to start resources in a Resource Group in parallel
On Tue, Jan 31, 2012 at 11:38:17AM +0000, Kashif Jawed Siddiqui wrote:
> Hi,
>
> Yes, it has to be provided in the init scripts of corosync or heartbeat.
> But for corosync 1.4.2 for SLES, it is not provided.
>
> Can you help me update the corosync init script to include the same?
> A sample script will definitely help.

Well, just look at what the heartbeat start script does:
http://hg.linux-ha.org/heartbeat-STABLE_3_0/file/1f282434405b/heartbeat/init.d/heartbeat.in#l262

The relevant commit adding this was
http://hg.linux-ha.org/heartbeat-STABLE_3_0/rev/f61f00ab4fab

But since you are using SLES, why not complain there, and have them add
it for you?
Re: [Pacemaker] How to flush the arp cache of a router?
On Thu, Jan 26, 2012 at 01:05:07AM +0100, ge...@riseup.net wrote:
> Hi all,
>
> I'm using Debian Stable and corosync/pacemaker/DRBD with Asterisk in a
> master/slave setup. I get calls routed from my carrier to an IP in a
> private net. I'm using a Cisco 876 as the router.
>
> As the resource agent for managing a virtual IP I'm using IPaddr2, which
> should do an ARP broadcast when bringing up the IP (as far as I read).
> However, in my case this doesn't work. I then had a look at SendArp, but
> read that one shouldn't use this in conjunction with IPaddr2. Anyway,
> this didn't work either. In the end I tried to use arping, which works
> great, but I found no way to execute it from the cluster automatically.
> I tried to put it into a file and made this executable, and used "lsb:"
> to call it (which didn't work). Then I googled for hours to find out how
> to call scripts from within crm, but had no success...
>
> Could someone point me in the right direction?

Did you tcpdump? Does IPaddr2's send_arp actually work and send out the
unsolicited ARPs it is supposed to send?

Do you have any "IPaddr2.*: ERROR: Could not send gratuitous arps" in
your logs?

Maybe replacing the call to send_arp with calls to arping will do, as I
described in this thread:
http://www.gossamer-threads.com/lists/linuxha/pacemaker/58444
Re: [Pacemaker] [Problem] The attrd does not sometimes stop.
On Mon, Jan 16, 2012 at 04:46:58PM +1100, Andrew Beekhof wrote:
>> Now we proceed to the next mainloop poll:
>>     poll([{fd=7, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI},
>>           {fd=5, events=POLLIN|POLLPRI}], 3, -1
>> Note the -1 (infinity timeout!)
>>
>> So even though the trigger was (presumably) set, and the ->prepare()
>> should have returned true, the mainloop waits forever for something to
>> happen on those file descriptors.
>>
>> I suggest this: crm_trigger_prepare should set *timeout = 0, if trigger
>> is set. Also think about this race: crm_trigger_prepare was already
>> called, only then the signal came in...
>>
>> diff --git a/lib/common/mainloop.c b/lib/common/mainloop.c
>> index 2e8b1d0..fd17b87 100644
>> --- a/lib/common/mainloop.c
>> +++ b/lib/common/mainloop.c
>> @@ -33,6 +33,13 @@ static gboolean
>>  crm_trigger_prepare(GSource * source, gint * timeout)
>>  {
>>      crm_trigger_t *trig = (crm_trigger_t *) source;
>> +    /* Do not delay signal processing by the mainloop poll stage */
>> +    if (trig->trigger)
>> +        *timeout = 0;
>> +    /* To avoid races between signal delivery and the mainloop poll stage,
>> +     * make sure we always have a finite timeout. Unit: milliseconds. */
>> +    else
>> +        *timeout = 5000; /* arbitrary */
>>      return trig->trigger;
>>  }
>>
>> This scenario does not let the blocked IPC off the hook, though. That is
>> still possible, both for blocking send and blocking receive, so that
>> should probably be fixed as well, somehow. I'm not sure how likely this
>> "stuck in blocking IPC" is, though.
>
> Interesting, are you sure you're in the right function though?
> Trigger and signal events don't have a file descriptor... wouldn't these
> polls be for the IPC related sources, and wouldn't they be setting their
> own timeout?
http://developer.gnome.org/glib/2.30/glib-The-Main-Event-Loop.html#GSourceFuncs

iiuc, mainloop does something similar to (oversimplified):

    timeout = -1; /* infinity */
    for s in all GSource
        tmp_timeout = -1;
        s->prepare(s, &tmp_timeout)
        if (tmp_timeout >= 0 && tmp_timeout < timeout)
            timeout = tmp_timeout;
    poll(GSource fd set, n, timeout);
    for s in all GSource
        if s->check(s)
            s->dispatch(s, ...)

And at some stage it also orders by priority, of course.
Also compare with the comment above /* Sigh... */ in glue G_SIG_prepare().

BTW, the mentioned race between signal delivery and mainloop already
doing the poll stage could potentially be solved by using
cl_signal_set_interrupt(SIGTERM, 1), which would mean we can condense the
prepare to

    if (trig->trigger)
        *timeout = 0;
    return trig->trigger;

Glue (and heartbeat) code base is not that, let's say, "involved",
because someone had been paranoid. But because someone had been paranoid
for a reason ;-)

Cheers,
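The prepare-stage timeout merging in the pseudocode above can be made concrete with a small self-contained toy (not glib's real API; the source values below are invented): -1 means "no opinion, poll may block forever", and any finite value shortens the wait.

```shell
# Merge per-source timeout requests the way the mainloop prepare stage
# does: start from -1 (infinity), let any finite request lower it.
merge_timeouts() {
    timeout=-1
    for tmp in "$@"; do
        if [ "$tmp" -ge 0 ] && { [ "$timeout" -lt 0 ] || [ "$tmp" -lt "$timeout" ]; }; then
            timeout=$tmp
        fi
    done
    echo "$timeout"
}

# fd source: -1, trigger source (already set): 0, timer source: 1000 ms
merge_timeouts -1 0 1000    # prints 0: poll() must not block at all
merge_timeouts -1 1000      # prints 1000: poll() returns for the timer
merge_timeouts -1 -1        # prints -1: poll() may block forever
```

The third case is exactly the stuck situation described in this thread: with only fd sources expressing an opinion, the merged timeout stays at infinity.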
Re: [Pacemaker] [Problem] The attrd does not sometimes stop.
On Mon, Jan 16, 2012 at 11:42:32PM +1100, Andrew Beekhof wrote:
>> http://developer.gnome.org/glib/2.30/glib-The-Main-Event-Loop.html#GSourceFuncs
>>
>> iiuc, mainloop does something similar to (oversimplified):
>>
>>     timeout = -1; /* infinity */
>>     for s in all GSource
>>         tmp_timeout = -1;
>>         s->prepare(s, &tmp_timeout)
>>         if (tmp_timeout >= 0 && tmp_timeout < timeout)
>>             timeout = tmp_timeout;
>>     poll(GSource fd set, n, timeout);
>
> I'm looking at the glib code again now, and it still looks to me like
> the trigger and signal sources do not appear in this fd set. Their setup
> functions would have to have called g_source_add_poll() somewhere, which
> they don't. So I'm still not seeing why it's a trigger or signal
> sources' fault that glib is doing a never-ending call to poll(). poll()
> is going to get called regardless of whether our prepare function
> returns true or not.
>
> Looking closer, crm_trigger_prepare() returning TRUE results in:
>     ready_source->flags |= G_SOURCE_READY;
> which in turn causes:
>     context->timeout = 0;
> which is essentially what adding
>     if (trig->trigger)
>         *timeout = 0;
> to crm_trigger_prepare() was intended to achieve.
>
> Shouldn't the fd, ipc or wait sources (who do call g_source_add_poll()
> and could therefore cause poll() to block forever) have a sane timeout
> in their prepare functions?

Probably should, but they usually have not.

The reasoning probably is, each GSource is responsible for *itself* only.
That is why first all sources are prepared. If no non-fd, non-pollable
source feels the need to reduce the *timeout to something finite in its
prepare(), so be it. Besides, what is "sane"? 1 second? 5? 120? 240?

That's why G_CH_prepare_int() sets the *timeout to 1000, and why I
suggest to set it to 0 if prepare already knows that the trigger is set,
and to some finite amount to avoid getting stuck in poll, in case no
timeout or other source is active which also set some finite timeout.

BTW, if you have *idle* sources, prepare should set timeout to 0.
For those interested, all described below http://developer.gnome.org/glib/2.30/glib-The-Main-Event-Loop.html#GSourceFuncs "For idle sources, the prepare and check functions always return TRUE to indicate that the source is always ready to be processed. The prepare function also returns a timeout value of 0 to ensure that the poll() call doesn't block (since that would be time wasted which could have been spent running the idle function)." ... timeout sources ... "returns a timeout value to ensure that the poll() call doesn't block too long" ... ... file descriptor sources ... timeout to -1 "to indicate that it does not mind how long the poll() call blocks" ... Or is it because the signal itself is interrupting some essential part of G_CH_prepare_int() and friends? In the provided strace, it looks like the SIGTERM is delivered while calling some G_CH_prepare_int, the ->prepare() used by G_main_add_IPC_Channel. Since the signal sources are of higher priority, we are probably past those already in this iteration; we will only notice the trigger in the next check(), after the poll. So it is vital for any non-pollable source such as signals to set a finite timeout in their prepare(), even if we also mark that signal siginterrupt().

for each GSource s:
    if (s->check(s))
        s->dispatch(s, ...)

And at some stage it also orders by priority, of course. Also compare with the comment above /* Sigh... */ in glue's G_SIG_prepare(). "BTW, the mentioned race between signal delivery and mainloop already doing the poll stage could potentially be solved by using cl_signal_set_interrupt(SIGTERM, 1)." As I just wrote above, that race is not solved at all. Only the (necessarily set) finite timeout of the poll would be shortened in that case. "But I can't escape the feeling that calling this just masks the underlying 'why is there a never-ending call to poll() in the first place' issue."
"G_CH_prepare_int() and friends /should/ be setting timeouts so that poll() can return and any sources created by g_idle_source_new() can execute." Actually, thinking further, I'm pretty convinced that poll() with an infinite timeout is the default mode of operation for mainloops with cluster-glue's IPC and FD sources. "And that this is not a good thing :)" Well, if there are *only* pollable sources, it is. If there are any other sources, they should have set their limit on what they think is an acceptable timeout in their prepare(). Far too late, brain shutting down. ;-) ...not a good thing, because it breaks the idle stuff (see above, explanation on developer.gnome.org: idle stuff is expected to set timeout 0, or just a few ms), but most of all because it requires /all/ external events to come out of that poll() call. If you
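The prepare-stage timeout negotiation discussed in this thread can be sketched as a small simulation. This is plain C mimicking the logic, not the actual glib code, and the function name is made up:

```c
#include <assert.h>

/* Simulation of the glib prepare stage: every source proposes a timeout
 * (-1 meaning "don't care"); poll() then gets the smallest non-negative
 * proposal, or -1 (block indefinitely) if nobody proposed one.
 * This is why a single idle/trigger source returning 0 forces poll()
 * to return immediately, and why a mainloop with only pollable sources
 * ends up in poll(..., -1). */
static int aggregate_timeout(const int *proposed, int n)
{
    int timeout = -1; /* infinity */
    for (int i = 0; i < n; i++) {
        int t = proposed[i];
        if (t >= 0 && (timeout < 0 || t < timeout))
            timeout = t;
    }
    return timeout;
}
```

With three FD sources all proposing -1 the aggregate stays -1, which is exactly the `poll(..., 3, -1)` seen in the strace further down the thread.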
Re: [Pacemaker] [Question] About the rotation of the pe-file.
On Fri, Jan 06, 2012 at 10:12:06AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Andrew, Thank you for comments. "Could you try with: while (max >= 0 && sequence > max) {" The problem is not settled by this correction. The rotation is carried out with a value except 0. If you want it to be between [0, max-1], obviously that should be

while (max > 0 && sequence >= max) {
    sequence -= max;
}

Though I wonder why not simply:

if (max == 0)
    return;
if (sequence >= max)
    sequence = 0;

-- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
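Spelled out as compilable C, the two variants contrasted above look like this (illustrative helper names, not the actual pengine code):

```c
#include <assert.h>

/* Variant 1: keep subtracting until the sequence is within [0, max-1]. */
static int wrap_by_subtraction(int sequence, int max)
{
    while (max > 0 && sequence >= max)
        sequence -= max;
    return sequence;
}

/* Variant 2: simply reset to 0 once the limit is reached.  For a
 * pe-file rotation counter the exact wrapped value hardly matters,
 * which is why the simpler form is suggested. */
static int wrap_by_reset(int sequence, int max)
{
    if (max == 0)
        return sequence; /* no rotation limit configured */
    if (sequence >= max)
        sequence = 0;
    return sequence;
}
```

The two variants differ in which in-range value they produce (modulo vs. restart at 0), but both keep the result strictly below max.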
Re: [Pacemaker] [Problem] The attrd does not sometimes stop.
On Tue, Jan 10, 2012 at 04:43:51PM +0900, renayama19661...@ybb.ne.jp wrote: Hi Lars, I attach strace file when a problem reappeared at the end of last year. I used glue which applied your patch for confirmation. It is the file which I picked with attrd by strace -p command right before I stop Heartbeat. Finally SIGTERM caught it, but attrd did not stop. The attrd stopped afterwards when I sent SIGKILL. The strace reveals something interesting: This poll looks like the mainloop poll, but some ->prepare() has modified the timeout to be 0, so we proceed directly to ->check() and then ->dispatch().

poll([{fd=7, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI}, {fd=8, events=POLLIN|POLLPRI}], 3, 0) = 1 ([{fd=8, revents=POLLIN|POLLHUP}])
times({tms_utime=2, tms_stime=3, tms_cutime=0, tms_cstime=0}) = 433738632
recv(4, 0x95af308, 576, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
...
recv(7, 0x95b1657, 3513, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=7, events=0}], 1, 0) = ? ERESTART_RESTARTBLOCK (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
sigreturn() = ? (mask now [])

Ok. Signal received, trigger set. Still finishing this mainloop iteration, though. These recv(), poll() look like invocations of G_CH_prepare_int(). Does not matter much, though.

recv(7, 0x95b1657, 3513, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=7, events=0}], 1, 0) = 0 (Timeout)
recv(7, 0x95b1657, 3513, MSG_DONTWAIT) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=7, events=0}], 1, 0) = 0 (Timeout)
times({tms_utime=2, tms_stime=3, tms_cutime=0, tms_cstime=0}) = 433738634

Now we proceed to the next mainloop poll:

poll([{fd=7, events=POLLIN|POLLPRI}, {fd=4, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN|POLLPRI}], 3, -1

Note the -1 (infinite timeout!)
So even though the trigger was (presumably) set, and the ->prepare() should have returned true, the mainloop waits forever for something to happen on those file descriptors. I suggest this: crm_trigger_prepare should set *timeout = 0 if the trigger is set. Also think about this race: crm_trigger_prepare was already called, and only then the signal came in...

diff --git a/lib/common/mainloop.c b/lib/common/mainloop.c
index 2e8b1d0..fd17b87 100644
--- a/lib/common/mainloop.c
+++ b/lib/common/mainloop.c
@@ -33,6 +33,13 @@ static gboolean
 crm_trigger_prepare(GSource * source, gint * timeout)
 {
     crm_trigger_t *trig = (crm_trigger_t *) source;
+    /* Do not delay signal processing by the mainloop poll stage */
+    if (trig->trigger)
+        *timeout = 0;
+    /* To avoid races between signal delivery and the mainloop poll stage,
+     * make sure we always have a finite timeout. Unit: milliseconds. */
+    else
+        *timeout = 5000; /* arbitrary */
     return trig->trigger;
 }

This scenario does not let the blocked IPC off the hook, though. That is still possible, both for blocking send and blocking receive, so that should probably be fixed as well, somehow. I'm not sure how likely this "stuck in blocking IPC" is, though. -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
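The 0 / 5000 ms split in the proposed patch bounds how long a signal can sit unnoticed. A tiny model of that reasoning (illustrative only, not the pacemaker source):

```c
#include <assert.h>

/* prepare() snapshots the trigger state, so a signal arriving *after*
 * prepare() ran is only noticed in the next check() stage, i.e. once
 * poll() returns.  With an infinite poll timeout (-1) that may be never;
 * with the finite fallback the delay is bounded by that fallback. */
static int worst_case_signal_delay_ms(int trigger_set_at_prepare,
                                      int fallback_ms)
{
    int poll_timeout = trigger_set_at_prepare ? 0 : fallback_ms;
    /* delay until dispatch is at most the poll timeout */
    return poll_timeout;
}
```

So a trigger already set at prepare() time is dispatched without blocking, and a late-arriving signal waits at most the (arbitrary) 5-second fallback instead of forever.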
Re: [Pacemaker] [Problem] The attrd does not sometimes stop.
On Thu, Dec 22, 2011 at 09:54:47AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, Hi Lars, In our environment, the problem recurred with the patch of Mr. Lars. After a problem occurred, I sent TERM signal, but attrd does not seem to receive TERM at all. If you are able to reproduce, you could try to find out what exactly attrd is doing. Various ways to try to do that: cat /proc/<pid-of-attrd>/stack (if your platform supports that); strace it, ltrace it, attach with gdb and provide a stack trace, or even start to single-step it; cause attrd to core dump, and analyse the core. The reconsideration of the patch is necessary for the solution to the problem. Best Regards, Hideo Yamauchi. --- On Tue, 2011/11/15, renayama19661...@ybb.ne.jp renayama19661...@ybb.ne.jp wrote: Hi Dejan, Hi Lars, I understood it. I try the operation of the patch in our environment. To Alan: Will you try a patch? Best Regards, Hideo Yamauchi. --- On Tue, 2011/11/15, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Mon, Nov 14, 2011 at 01:17:37PM +0100, Lars Ellenberg wrote: On Mon, Nov 14, 2011 at 11:58:09AM +1100, Andrew Beekhof wrote: On Mon, Nov 7, 2011 at 8:39 AM, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Thu, Nov 03, 2011 at 01:49:46AM +1100, Andrew Beekhof wrote: On Tue, Oct 18, 2011 at 12:19 PM, renayama19661...@ybb.ne.jp wrote: Hi, We sometimes fail in a stop of attrd. Step1. start a cluster in 2 nodes Step2. stop the first node (/etc/init.d/heartbeat stop). Step3. stop the second node after time passed a little (/etc/init.d/heartbeat stop). The attrd catches the TERM signal, but does not stop. There's no evidence that it actually catches it, only that it is sent. I've seen it before but never figured out why it occurs. I had it once tracked down almost to where it occurs, but then got distracted. Yes, the signal was delivered. I *think* it had to do with attrd doing a blocking read, or looping in some internal message delivery function too often.
I had a quick look at the code again now, to try and remember, but I'm not sure. It *may* be that, because xmlfromIPC(IPC_Channel * ch, int timeout) calls msg = msgfromIPC_timeout(ch, MSG_ALLOWINTR, timeout, &ipc_rc); and MSG_ALLOWINTR will cause msgfromIPC_ll() to do

IPC_INTR:
    if (allow_intr) {
        goto startwait;

Depending on the frequency of delivered signals, this "goto startwait" loop may never exit, because the timeout always starts again from the full passed-in timeout. If only one signal is delivered, it may still take 120 seconds (MAX_IPC_DELAY from crm.h) to be actually processed, as the signal handler only raises a flag for the next mainloop iteration. If a (non-fatal) signal is delivered every few seconds, then the goto loop will never time out. Please someone check this for plausibility ;-) "Most plausible explanation I've heard so far... still odd that only attrd is affected. So what do we do about it?" Reproduce, and confirm that this is what people are seeing. Make attrd non-blocking? Fix the IPC layer to not restart the full timeout, but only the remaining partial time? "Lars and I made a quick patch for cluster-glue (attached). Hideo-san, is there a way for you to verify if it helps? The patch is not perfect and under unfavourable circumstances it may still take a long time for the caller to exit, but it'd be good to know if this is the right spot."
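The suspected "goto startwait" behaviour versus the proposed "only wait the remaining time" fix can be modelled with simulated clocks. These are hypothetical helpers (the real code actually sleeps in the kernel); each returns the total elapsed time until the wait times out, or -1 if it is still waiting after n interrupts:

```c
#include <assert.h>

/* Buggy pattern: every interrupt restarts the *full* timeout, so with
 * interrupts arriving faster than the timeout, it never expires. */
static int wait_restart_full(int timeout_ms, int interrupt_every_ms,
                             int n_interrupts)
{
    int elapsed = 0;
    for (int i = 0; i < n_interrupts; i++) {
        if (interrupt_every_ms >= timeout_ms)
            return elapsed + timeout_ms;   /* timed out before interrupt */
        elapsed += interrupt_every_ms;     /* interrupted: goto startwait */
    }
    return -1; /* still waiting: unbounded while interrupts keep coming */
}

/* Fixed pattern: each retry only waits the *remaining* time, so the
 * total wait is bounded by the original timeout. */
static int wait_restart_remaining(int timeout_ms, int interrupt_every_ms,
                                  int n_interrupts)
{
    int remaining = timeout_ms, elapsed = 0;
    for (int i = 0; i < n_interrupts; i++) {
        if (interrupt_every_ms >= remaining)
            return elapsed + remaining;    /* timed out */
        elapsed += interrupt_every_ms;
        remaining -= interrupt_every_ms;   /* only wait what is left */
    }
    return -1;
}
```

With MAX_IPC_DELAY-style numbers (120 s timeout, a signal every 5 s), the restart-full variant never times out, while the remaining-time variant always finishes after exactly the configured 120 s.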
Cheers, Dejan -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Pacemaker] Remote CRM shell from LCMC
On Wed, Dec 28, 2011 at 12:57:33AM +0100, Rasto Levrinc wrote: Hi, this being a slow news day, There is this great new feature in LCMC, but probably completely useless. :) The LCMC used to show for testing purposes the CRM shell configuration, but people started to use it, so I left it there, made it now editable and added a commit button, that commits the changes. You can see it as a hole in the bottom of the car, if you are stuck you can still power the car by your feet. There are also some unexpected advantages over crm configure edit, see the video. http://youtu.be/X75wzUTRmjU?hd=1 Nice. Sound is missing for me from 3:00 onwards. Just in case that was not intentional... Lars ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] don't want to restart clone resource
On Fri, Dec 09, 2011 at 08:23:36AM +1100, Andrew Beekhof wrote: "Can you file a bug and attach a crm_report to it please? Unfortunately there's not enough information here to figure out the cause (although it does look like a bug)" Node count drops from three to two, rsc:2 becomes the label of orphaned resources, orphans are to be stopped by default? Something like that? 2011/12/1 Sha Fanghao shafang...@gmail.com: Hi, I have a cluster of 3 nodes (CentOS 5.2) using pacemaker-1.0.11 (also 1.0.12), with heartbeat-3.0.3. You can see the configuration:

#crm configure show:
node $id="85e0ca02-7aa4-45c8-9911-4035e1e6ee15" node-2
node $id="a046bd1e-6267-49e5-902d-c87b6ed1dcb9" node-0
node $id="d0f0b2ab-f243-4f78-b541-314fa7d6b346" node-1
primitive failover-ip ocf:heartbeat:IPaddr2 \
    params ip="10.10.5.83" \
    op monitor interval="5s"
primitive master-app-rsc lsb:cluster-master \
    op monitor interval="5s"
primitive node-app-rsc lsb:cluster-node \
    op monitor interval="5s"
group group-dc failover-ip master-app-rsc
clone clone-node-app-rsc node-app-rsc
location rule-group-dc group-dc \
    rule $id="rule-group-dc-rule" -inf: #is_dc eq false
property $id="cib-bootstrap-options" \
    start-failure-is-fatal="false" \
    no-quorum-policy="ignore" \
    symmetric-cluster="true" \
    stonith-enabled="false" \
    dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
    cluster-infrastructure="Heartbeat"

#crm_mon -n -1:
Last updated: Sat Oct 29 08:44:14 2011
Stack: Heartbeat
Current DC: node-0 (a046bd1e-6267-49e5-902d-c87b6ed1dcb9) - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
3 Nodes configured, unknown expected votes
2 Resources configured.
Node node-0 (a046bd1e-6267-49e5-902d-c87b6ed1dcb9): online master-app-rsc (lsb:cluster-master) Started failover-ip (ocf::heartbeat:IPaddr2) Started node-app-rsc:0 (lsb:cluster-node) Started Node node-1 (d0f0b2ab-f243-4f78-b541-314fa7d6b346): online node-app-rsc:1 (lsb:cluster-node) Started Node node-2 (85e0ca02-7aa4-45c8-9911-4035e1e6ee15): online node-app-rsc:2 (lsb:cluster-node) Started The problem: After stopping heartbeat service on node-1, if I remove node-1 with command hb_delnode node-1 crm node delete node-1, then the clone resource(node-app-rsc:2) running on the node-2 will restart and change to node-app-rsc:1. You know, the node-app-rsc is my application, and I don't want it to restart. How could I do, Please? Any help will be very appreciated. :) Best Regards, Fanghao Sha ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] [Linux-HA] Antw: Re: Q: unmanaged MD-RAID auto-recovery
On Fri, Nov 25, 2011 at 01:54:33PM +0100, Florian Haas wrote: On 11/25/11 13:29, Lars Ellenberg wrote: From the log snippet it's not entirely clear whether that's a recurring monitor (interval == whatever you configured, or 20 if default), or a probe (interval == 0). A recurring monitor clearly should not happen at all when unmanaged. That is incorrect. is-managed=false does still monitor the resource. It only prevents pacemaker from sending start/stop etc commands to that resource. My understanding was that only probes would still occur (on cluster-recheck-interval, or when new nodes joined the cluster). And I maintain that that would be the intuitively correct behavior for unmanaged resources. Andrew? Well, your understanding or intuition seem to misguide you this time. But if you think I make shit up ;-) http://www.gossamer-threads.com/lists/linuxha/pacemaker/70606#70606 If the implementation of the monitor action in the RA does trigger auto-recovery or other things, well, then it does. Which seems to operate on the same assumption, really, that an unmanaged resource never has its monitor action executed. I still think that this attempt to auto-recover from _within_ the monitor action is a bit insane, but maybe lmb (who implemented that part, as per git blame) would be able to share his thoughts as to why he did it that way. Well, that's the only place where an auto-recovery of a degraded (not yet failed!) md array can be triggered from pacemaker. There is no $OCF_DEGRADED status code, and no try-resource-internal-recovery action. And if there was, what else could it do? If you rather have some external monitoring page an operator to then log in and do the same actions... If you do md over long distance iSCSI (e.g.), and you lose one of the links, md will detach that leg. If the link comes back, this is where it then could recover, and start to resync. Besides, you explicitly have to request this behaviour of the RA. I think that approach is perfectly sane. 
-- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] Syntax highlighting in vim for crm configure edit
On Fri, Aug 19, 2011 at 05:28:09PM +0300, Dan Frincu wrote: Hi, On Thu, Aug 18, 2011 at 5:53 PM, Digimer li...@alteeve.com wrote: On 08/18/2011 10:39 AM, Trevor Hemsley wrote: Hi all, I have attached a first stab at a vim syntax highlighting file for 'crm configure edit'. To activate this, I have added 'filetype plugin on' to my /root/.vimrc then created /root/.vim/{ftdetect,ftplugin}/pcmk.vim In /root/.vim/ftdetect/pcmk.vim I have the following content: au BufNewFile,BufRead /tmp/tmp* set filetype=pcmk but there may be a better way to make this happen. /root/.vim/pcmk.vim is the attached file. Comments (not too nasty please!) welcome. I've added a couple of extra keywords to the file, to cover a couple more use cases. Other than that, great job. Regards, Dan I would love to see proper support added for CRM syntax highlighting in vim. I will give this a test and write back in a bit. -- Digimer Cool. I remember that I had some initial attempt about a year ago myself writing some vim syntax file, which I attach as pacemaker-crm.vim (took me some minutes to dig it up again). I did not really look at the current pcmk.vim, just tried it, and apparently it does not attempt to give the user hints for common errors, or at least not for those I make most commonly. If you use the pacemaker-crm.vim (which I attached), it would highlight a few things as errors, like spurious space after backslash, spurious backslash before a new primitive definition, forgetting the colon after an order or colocation score, all these things. It is incomplete, and I don't even know anymore what I thought when I wrote it; it was never in active use, and I won't have time to actually work on this myself. I may or may not be able to answer questions ;-) Not perfect, either. Probably detects much more errors than necessary, and does not detect some that would be nice to have detected (brace errors, quotation errors ...).
But if there should be some vim syntax wizard out there, maybe our two attempts on doing it can somehow be merged. I'll just throw it at you, feel free to ignore, or reuse (parts) of it. Cheers, Lars -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com

" Vim syntax file
" Language:    pacemaker-crm configuration style (http://www.clusterlabs.org/doc/crm_cli.html)
" Filename:    pacemaker-crm.vim
" Language:    pacemaker crm configuration text
" Maintainer:  Lars Ellenberg l...@linbit.com
" Last Change: Thu, 18 Feb 2010 16:04:36 +0100
"
" What to do to install this file:
"   $ mkdir -p ~/.vim/syntax
"   $ cp pacemaker-crm.vim ~/.vim/syntax
" to set the filetype manually, just do :setf pacemaker-crm
" TODO: autodetection logic, maybe
"   augroup filetypedetect
"     au BufNewFile,BufRead *.pacemaker-crm setf pacemaker-crm
"   augroup END
" If you do not already have a .vimrc with "syntax on" in it, then do this:
"   $ echo "syntax on" >> ~/.vimrc
" Now every file with a filename matching *.pacemaker-crm will be edited
" using these definitions for syntax highlighting.
" TODO: maybe add some indentation rules as well?
" For version 5.x: Clear all syntax items
" For version 6.x: Quit when a syntax file was already loaded
if version < 600
  syntax clear
elseif exists("b:current_syntax")
  finish
endif

syn clear
syn sync lines=30
syn case ignore

syn match crm_unexpected /[^ ]\+/
syn match crm_lspace transparent /^[ \t]*/ nextgroup=crm_node,crm_container,crm_head
syn match crm_tspace_err /\\[ \t]\+/
syn match crm_tspace_err /\\\n\(primitive\|node\|group\|ms\|order\|location\|colocation\|property\).*/
syn match crm_node transparent /\<node\> \$id=[^ ]\+ \([a-z0-9.-]\+\)\?/
    \ contains=crm_head,crm_assign,crm_nodename
    \ nextgroup=crm_block
syn region crm_block transparent keepend contained start=/[ \t]/ skip=/\\$/ end=/$/
    \ contains=crm_assign,crm_key,crm_meta,crm_tspace_err,crm_ops
syn region crm_order_block transparent keepend contained start=/[ \t]/ skip=/\\$/ end=/$/
    \ contains=crm_order_ref
syn region crm_colo_block transparent keepend contained start=/[ \t]/ skip=/\\$/ end=/$/
    \ contains=crm_colo_ref
syn region crm_meta transparent keepend contained start=/[ \t]meta\>/ skip=/\\$/ end=/$/ end=/[ \t]\(params\|op\)[ \t]/
    \ contains=crm_key,crm_meta_assign
syn keyword crm_container contained group clone ms nextgroup=crm_id
syn keyword crm_head contained node
syn keyword crm_head contained property nextgroup=crm_block
syn keyword crm_head contained primitive nextgroup=crm_res_id
syn keyword crm_head contained location nextgroup=crm_id
syn match crm_id contained nextgroup=crm_ref,crm_block /[ \t]\+\<[a-z0-9_-]\+\>/
syn
Re: [Pacemaker] IPv6addr failure loopback interface
, then "ifconfig | grep" not seeing the address? I think that's not necessary.

    then
        ocf_log info "$process: Started successfully."
        return $OCF_SUCCESS
    else
        ocf_log err "$process: Could not be started: ipv6addr[\"$ipv6addr\"] cidr_netmask[\"$cidr_netmask\"]."
        return $OCF_ERR_GENERIC
    fi
    else
        # If already running, consider start successful
        ocf_log debug "$process: is already running"
        return $OCF_SUCCESS
    fi
}

IPv6addrLO_stop() {
    ocf_log debug "$process: Running STOP function."
    if [ -n "$OCF_RESKEY_stop_timeout" ]
    then
        stop_timeout=$OCF_RESKEY_stop_timeout
    elif [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
        # Allow 2/3 of the action timeout for the orderly shutdown
        # (The origin unit is ms, hence the conversion)
        stop_timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
    else
        stop_timeout=10
    fi

and suddenly, completely different (and much more readable) indentation. Thanks. Still, I think this is not necessary. Or at least, I don't understand what you are trying to protect against: why would "ifconfig del" fail, and a few seconds later succeed? If you really want to retry, this whole function should become

    while iface_has_ipv6 && ! ifconfig del ; do sleep 1; done
    return $OCF_SUCCESS

and the crmd/lrmd will enforce the timeout on you. No need to go fancy and simulate a shutdown escalation like an IP address was a database or something.

    if IPv6addrLO_status
    then
        $IFCONFIG_BIN $IFACE del `cat $pidfile`
        i=0
        while [ $i -lt $stop_timeout ]
        do
            if ! IPv6addrLO_status
            then
                rm -f $pidfile
                return $OCF_SUCCESS
            fi
            sleep 1
            i=`expr $i + 1`
        done
        ocf_log warn "Stop failed. Trying again."
        $IFCONFIG_BIN $IFACE del `cat $pidfile`
        rm -f $pidfile
        if ! IPv6addrLO_status
        then
            ocf_log warn "Stop success."
            return $OCF_SUCCESS
        else
            ocf_log err "Failed to stop."
            return $OCF_ERR_GENERIC
        fi
    else
        # was not running, so stop can be considered successful
        $IFCONFIG_BIN $IFACE del `cat $pidfile`
        rm -f $pidfile
        return $OCF_SUCCESS
    fi
}

IPv6addrLO_monitor() {
    IPv6addrLO_status
    ret=$?
    if [ $ret -eq $OCF_SUCCESS ]
    then
        if [ -n "$OCF_RESKEY_monitor_hook" ]; then
            eval "$OCF_RESKEY_monitor_hook"
            if [ $? -ne $OCF_SUCCESS ]; then
                return ${OCF_ERR_GENERIC}
            fi
            return $OCF_SUCCESS
        else
            true
        fi
    else
        return $ret
    fi
}

IPv6addrLO_validate() {
    ocf_log debug "IPv6addrLO validating: args: [\"$*\"]"
    if [ -x $IFCONFIG_BIN ]
    then
        ocf_log debug "Binary \"$IFCONFIG_BIN\" exists and is executable."
        return $OCF_SUCCESS
    else
        ocf_log err "Binary \"$IFCONFIG_BIN\" does not exist or isn't executable."
        return $OCF_ERR_INSTALLED
    fi
    ocf_log err "Error while validating."
    return $OCF_ERR_GENERIC
}

IPv6addrLO_meta() {
cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="IPv6addrLO">
<version>0.1</version>
<longdesc lang="en">
OCF RA to manage IPv6addr on loopback interface Linux
</longdesc>
<shortdesc lang="en">IPv6 addr on loopback linux</shortdesc>
<parameters>
<parameter name="ipv6addr" required="1">
<longdesc lang="en">
The ipv6 addr to assign to the loopback interface.
</longdesc>
<shortdesc lang="en">Ipv6 addr to the loopback interface.</shortdesc>
<content type="string" default=""/>
</parameter>
<parameter name="cidr_netmask" required="1">
<longdesc lang="en">
The cidr netmask of the ipv6 addr.
</longdesc>
<shortdesc lang="en">netmask of the ipv6 addr.</shortdesc>
<content type="string" default="128"/>
</parameter>
<parameter name="logfile" required="0">
<longdesc lang="en">
File to write STDOUT to
</longdesc>
<shortdesc lang="en">File to write STDOUT to</shortdesc>
<content type="string" />
</parameter>
<parameter name="errlogfile" required="0">
<longdesc lang="en">
File to write STDERR to
</longdesc>
<shortdesc lang="en">File to write STDERR to</shortdesc>
<content type="string" />
</parameter>
</parameters>
<actions>
<action name="start" timeout="20s" />
<action name="stop" timeout="20s" />
<action name="monitor" depth="0" timeout="20s" interval="10" />
<action name="meta-data" timeout="5" />
<action name="validate-all" timeout="5" />
</actions>
</resource-agent>
END
exit 0
}

case "$1" in
meta-data|metadata|meta_data|meta)
    IPv6addrLO_meta
    ;;
start)
    IPv6addrLO_start
    ;;
stop)
    IPv6addrLO_stop
    ;;
monitor)
    IPv6addrLO_monitor
    ;;
validate-all)
    IPv6addrLO_validate
    ;;
*)
    ocf_log err "$0 was called with unsupported arguments:"
    exit $OCF_ERR_UNIMPLEMENTED
    ;;
esac

Cheers, -- : Lars Ellenberg : LINBIT | Your
Re: [Pacemaker] [Drbd-dev] crm_attribute --quiet (was Fwd: [Linux-HA] Should This Worry Me?)
On Mon, Nov 14, 2011 at 09:51:46AM +1100, Andrew Beekhof wrote: confused as to what the correct flag actually is. ocf:linbit:drbd (in both 8.3 and 8.4) uses -Q whereas Pacemaker expects -q as of this commit: commit c11ce5e9b0b13ead02b5fc4add928d7e7f95092e Author: Andrew Beekhof and...@beekhof.net Date: Tue Sep 22 17:29:38 2009 +0200 Medium: Tools: Use -q as the short form for --quiet (for consistency) Mercurial revision: 7289e661e4923beee4b7b45bc85592564ccdc438 Should ocf:linbit:drbd be using -q? Correct. Sorry about that. -Q is still accepted, though. As it is accepted for a larger range of crm_attribute versions, I'll keep it for now. -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
Re: [Pacemaker] Newcomer's question - API?
On Tue, Nov 01, 2011 at 04:52:42PM -, Tim Ward wrote: "You can try looking at LCMC as that is a Java-based GUI that should at least get you going." I did find some Java code but we can't use it because it's GPL, and I didn't want to study it in case I accidentally copied some of it in recreating it. You know, there are effectively no more than two entities you need to talk to, if you wanted the LCMC under some non-GPL licence. Which is Rasto, and LINBIT. Just a thought... -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
Re: [Pacemaker] [Problem] The attrd does not sometimes stop.
On Thu, Nov 03, 2011 at 01:49:46AM +1100, Andrew Beekhof wrote: On Tue, Oct 18, 2011 at 12:19 PM, renayama19661...@ybb.ne.jp wrote: Hi, We sometimes fail in a stop of attrd. Step1. start a cluster in 2 nodes Step2. stop the first node (/etc/init.d/heartbeat stop). Step3. stop the second node after time passed a little (/etc/init.d/heartbeat stop). The attrd catches the TERM signal, but does not stop. There's no evidence that it actually catches it, only that it is sent. I've seen it before but never figured out why it occurs. I had it once tracked down almost to where it occurs, but then got distracted. Yes, the signal was delivered. I *think* it had to do with attrd doing a blocking read, or looping in some internal message delivery function too often. I had a quick look at the code again now, to try and remember, but I'm not sure. It *may* be that, because xmlfromIPC(IPC_Channel * ch, int timeout) calls msg = msgfromIPC_timeout(ch, MSG_ALLOWINTR, timeout, &ipc_rc); and MSG_ALLOWINTR will cause msgfromIPC_ll() to do

IPC_INTR:
    if (allow_intr) {
        goto startwait;

Depending on the frequency of delivered signals, this "goto startwait" loop may never exit, because the timeout always starts again from the full passed-in timeout. If only one signal is delivered, it may still take 120 seconds (MAX_IPC_DELAY from crm.h) to be actually processed, as the signal handler only raises a flag for the next mainloop iteration. If a (non-fatal) signal is delivered every few seconds, then the goto loop will never time out. Please someone check this for plausibility ;-) -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
Re: [Pacemaker] location setting with parenthesis
On Thu, Nov 03, 2011 at 07:23:01PM +0900, 池田 淳子 wrote: Hi, location rsc_location-1 msDRBD \ rule role=master -inf: \ (defined master-prmMySQL:0 and master-prmMySQL:0 gt 0) or \ (defined master-prmMySQL:1 and master-prmMySQL:1 gt 0) Why not using two rules for this location constraint? I expect that to work the same way you want to express in your rule above. Do you mean the following rules? location rsc_location-1 msDRBD \ rule role=master -inf: defined master-prmMySQL:0 and master-prmMySQL:0 gt 0 \ rule role=master -inf: defined master-prmMySQL:1 and master-prmMySQL:1 gt 0 I may be missing something obvious, but why not a colocation constraint between msDRBD and prmMySQL? something like colocation asdf -inf: msDRBD:Master prmMySQL:Master -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
Re: [Pacemaker] location setting with parenthesis
On Thu, Nov 03, 2011 at 09:30:45PM +0100, Andreas Kurz wrote: On 11/03/2011 12:38 PM, Lars Ellenberg wrote: On Thu, Nov 03, 2011 at 07:23:01PM +0900, 池田 淳子 wrote: Hi, location rsc_location-1 msDRBD \ rule role=master -inf: \ (defined master-prmMySQL:0 and master-prmMySQL:0 gt 0) or \ (defined master-prmMySQL:1 and master-prmMySQL:1 gt 0) Why not using two rules for this location constraint? I expect that to work the same way you want to express in your rule above. Do you mean the following rules? location rsc_location-1 msDRBD \ rule role=master -inf: defined master-prmMySQL:0 and master-prmMySQL:0 gt 0 \ rule role=master -inf: defined master-prmMySQL:1 and master-prmMySQL:1 gt 0 I may be missing something obvious, but why not a colocation constraint between msDRBD and prmMySQL? something like colocation asdf -inf: msDRBD:Master prmMySQL:Master I don't think you miss something obvious, lars ;-) yes, that constraint you recommend would be the way to go ... I was only commenting on the parenthesis not on the quality of the rules ;-) Well, actually probably colocation asdf -inf: msDRBD:Master msMySQL:Master assuming prmMySQL was the primitive and msMySQL the ms resource. Anyways, variations of that theme should do fine. -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
Re: [Pacemaker] crm_master triggering assert section != NULL
On Wed, Oct 12, 2011 at 08:08:21PM -0400, Yves Trudeau wrote: What about referring to the git repository here: http://www.clusterlabs.org/wiki/Get_Pacemaker#Building_from_Source http://www.clusterlabs.org/mwiki/index.php?title=Install&diff=1287&oldid=1282 Lars
Re: [Pacemaker] Postgres RA won't start
On Wed, Oct 12, 2011 at 07:41:20PM -0600, Serge Dubrouski wrote: On Wed, Oct 12, 2011 at 9:20 AM, Amar Prasovic a...@linux.org.ba wrote: Thank you all for tips and suggestions. I managed to configure postgres so it actually starts. First, I updated resource-agents (Florian thanks for the tip, still don't know how did I manage to miss that :) ) Second, I deleted the postgres primitive, cleared all failcounts and configured it again like this: primitive postgres_res ocf:heartbeat:pgsql \ params pgctl=/usr/lib/postgresql/8.4/bin/pg_ctl psql=/usr/bin/psql start_opt= pgdata=/var/lib/postgresql/8.4/main config=/etc/postgresql/8.4/main/postgresql.conf pgdba=postgres \ op start interval=0 timeout=120s \ op stop interval=0 timeout=120s \ op monitor interval=30s timeout=30s depth=0 After that, it all worked like a charm. However, I noticed some strange output in the log file, it wasn't there before I updated the resource-agents. Here is the extract from the syslog: http://pastebin.com/ybPi0VMp (postgres_res:monitor:stderr) [: 647: monitor: unexpected operator This error is actually reported with any operator. I tried to start the script from the CLI, I got the same thing with ./pgsql start, ./pgsql status, ./pgsql stop Weird. I don't know what to tell. The RA is basically all right, it just misses one not very important fix. On my system (CentOS 5, PostgreSQL 8.4 or 9.0) it doesn't produce any errors. If I understand your log right, the problem is in line 647 of the RA, which is: [ $1 == validate-all ] && exit $rc (note: == is not =) Make that: [ $1 = validate-all ] && exit $rc -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
Re: [Pacemaker] Postgres RA won't start
On Thu, Oct 13, 2011 at 06:35:27AM -0600, Serge Dubrouski wrote: On Thu, Oct 13, 2011 at 4:29 AM, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Wed, Oct 12, 2011 at 07:41:20PM -0600, Serge Dubrouski wrote: On Wed, Oct 12, 2011 at 9:20 AM, Amar Prasovic a...@linux.org.ba wrote: Thank you all for tips and suggestions. I managed to configure postgres so it actually starts. First, I updated resource-agents (Florian thanks for the tip, still don't know how did I manage to miss that :) ) Second, I deleted the postgres primitive, cleared all failcounts and configured it again like this: primitive postgres_res ocf:heartbeat:pgsql \ params pgctl=/usr/lib/postgresql/8.4/bin/pg_ctl psql=/usr/bin/psql start_opt= pgdata=/var/lib/postgresql/8.4/main config=/etc/postgresql/8.4/main/postgresql.conf pgdba=postgres \ op start interval=0 timeout=120s \ op stop interval=0 timeout=120s \ op monitor interval=30s timeout=30s depth=0 After that, it all worked like a charm. However, I noticed some strange output in the log file, it wasn't there before I updated the resource-agents. Here is the extract from the syslog: http://pastebin.com/ybPi0VMp (postgres_res:monitor:stderr) [: 647: monitor: unexpected operator This error is actually reported with any operator. I tried to start the script from the CLI, I got the same thing with ./pgsql start, ./pgsql status, ./pgsql stop Weird. I don't know what to tell. The RA is basically all right, it just misses one not very important fix. On my system (CentOS 5, PostgreSQL 8.4 or 9.0) it doesn't produce any errors. If I understand your log right, the problem is in line 647 of the RA, which is: [ $1 == validate-all ] && exit $rc (note: == is not =) Theoretically yes, = is for strings and == is for numbers. But why would it create a problem on Debian and not on CentOS, and why has nobody else reported this issue so far? BTW, other RAs use the == operator as well: apache, LVM, portblock, As you found out by now, if they are bash, that's ok. If they are /bin/sh, then that's a bug.
dash for example does not like ==. And no, apache and portblock use these in some embedded awk script. LVM I fixed as well. Make that: [ $1 = validate-all ] && exit $rc -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
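To make the portability point concrete (my own sketch, not code from the RA): POSIX test(1) only specifies = for string comparison; == is a bash/ksh extension, and dash — Debian's /bin/sh — rejects it with exactly the kind of "unexpected operator" error quoted above.

```shell
#!/bin/sh
# Under dash this fails with an "unexpected operator" error:
#   dash -c '[ "$1" == validate-all ] && echo yes' sh validate-all
# The POSIX form below works in every sh; quoting "$1" additionally
# avoids errors when the argument is empty or contains whitespace.
op=validate-all
if [ "$op" = "validate-all" ]; then
    echo matched
fi
```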
Re: [Pacemaker] nginx OCF script - strange syslog output
On Wed, Oct 12, 2011 at 09:23:15PM +0200, Dejan Muhamedagic wrote: Hi, On Wed, Oct 12, 2011 at 05:28:47PM +0200, Amar Prasovic wrote: Hello everyone, I've found one nginx OCF script online and decided to use it since no default script is provided. Here is the script I am using: http://pastebin.com/CCApckew You can always get the latest nginx release from our repository https://github.com/ClusterLabs/resource-agents The good news is, the script is functional, I get nginx running. The sort of bad news is, every ten seconds I get some strange log output. Here is the extract from my syslog: http://pastebin.com/ybPi0VMp I suppose the problem is somewhere with the monitor operation but I cannot figure out where. Parsing of the nginx configuration file is done on each invocation, which is a design bug^W choice of that resource agent, so it is done on every monitor action. Parsing is rudimentary at best. Things get read by awk, passed to shell commands, mangled again through sed and awk, the result being finally eval'ed... A lot of stuff can go wrong there. All of that just to guess the root, pid, and listen directives from the config file. I used this script with Debian 5 some half a year ago and I didn't have this output. It appeared on Debian 6.0.3 Compare the config files (nginx.conf and its includes). Avoid more than one statement on one line. Especially include statements. My guess is that parsing those is partially broken, possibly only for relative paths. No idea what's going on. But it doesn't look good. In particular it looks as if it's trying to execute something it shouldn't. You can add 'set -x' at the top of the RA in between monitors, then take a look at the logs. Beware: you should probably disable the monitor while editing the RA. Or best, try it out on a test cluster.
Thanks, Dejan Now, this is not some essential problem since logrotate is in place and the file is not getting that big, but still it kind of makes reading the file difficult since I have to scroll through thousands of unnecessary lines. -- Amar Prasovic Gaißacher Straße 17 D - 81371 München -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
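Dejan's set -x suggestion can be made a little more targeted — a sketch of my own (the trace-file path is illustrative), tracing only the monitor action so the other operations stay quiet:

```shell
#!/bin/sh
# Put something like this near the top of the RA while debugging.
# Tracing only "monitor" keeps start/stop output readable; redirecting
# stderr to a file (commented out) keeps the trace out of syslog.
maybe_trace() {
    if [ "$1" = "monitor" ]; then
        # exec 2>>/tmp/nginx-ra.trace   # illustrative path
        set -x
    fi
}

# in the RA, right after argument parsing:  maybe_trace "$1"
```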
Re: [Pacemaker] crm_master triggering assert section != NULL
a protection against flapping in case a slave hovers around the replication lag threshold You should get plenty of inspiration there from how the dampen parameter is used in ocf:pacemaker:ping. ok, I'll check The current RA does implement that but it is not required given the context. The new RA does implement flapping protection. - upon demote of a master, the RA _must_ attempt to kill all user (non-system) connections The current RA does not do that but it is easy to implement Yeah, as I assume it would be in the other one. - Slaves must be read-only That's fine, handled by the current RA. Correct. - Monitor should test MySQL and replication. If either is bad, vips should be moved away. Common errors should not trigger actions. Like I said, should be feasible with the node attribute approach outlined above. No reason to muck around with the resources directly. That's handled by the current RA for most of it. The error handling could be added. - Slaves should update their master score according to the state of their replication. Handled by both RAs Right. So, at the minimum, the RA needs to be able to store the master coordinate information, either in the resource parameters or in transient attributes, and must be able to modify resource location scores. The script _was_ working before I got the cib issue, maybe it was purely accidental but it proves the concept. I was actually implementing/testing the relay_log completion stuff. I chose not to use the current agent because I didn't want to manage MySQL itself, just replication. I am wide open to argue any Pacemaker or RA architecture/design part but I don't want to argue the replication requirements, they are fundamental in my mind. Yup, and I still believe that ocf:heartbeat:mysql either already addresses those, or they could be addressed in a much cleaner fashion than writing a new RA.
Now, if the only remaining point is but I want to write an agent that can do _less_ than an existing one (namely, manage only replication, not the underlying daemon), then I guess I can't argue with that, but I'd still believe that would be a suboptimal approach. Ohh... don't get me wrong, I am not the kind of guy that takes pride in having re-invented the flat tire. I want an opensource _solution_ I can offer to my customers. I think part of the problem here is that we are not talking about the same ocf:heartbeat:mysql RA. What is mainstream is what you can get with apt-get install pacemaker on 10.04 LTS for example. This is 1.0.8. I also tried 1.0.11 and still it is obviously not the same version. I got my latest agent version as explained in the clusterlabs FAQ page from: wget -O resource-agents.tar.bz2 http://hg.linux-ha.org/agents/archive/tip.tar.bz2 Where can I get the version you are using :) Regards, Yves Cheers, Florian -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Pacemaker] crm_master triggering assert section != NULL
On Thu, Oct 13, 2011 at 01:21:46AM +0200, Lars Ellenberg wrote: On Wed, Oct 12, 2011 at 05:09:45PM -0400, Yves Trudeau wrote: Hi Florian, On 11-10-12 04:09 PM, Florian Haas wrote: On 2011-10-12 21:46, Yves Trudeau wrote: Hi Florian, sure, let me state the requirements. If those requirements can be met, pacemaker will be much more used to manage MySQL replication. Right now, although at Percona I deal with many large MySQL deployments, none are using the current agent. Another tool, MMM, is currently used, but it is orphaned and suffers from many pretty fundamental flaws (while implementing about the same logic as below). Consider a pool of N identical MySQL servers. In that case we need: - N replication resources (it could be the MySQL RA) - N Reader_vip - 1 Writer_vip Reader vips are used by the application to run queries that do not modify data, usually accessed in round-robin fashion. When the application needs to write something, it uses the writer_vip. That's how read/write splitting is implemented in many many places. So, for the agent, here are the requirements: - No need to manage MySQL itself The resource we are interested in is replication; MySQL itself is at another level. If the RA is to manage MySQL, it must not interfere. - the writer_vip must be assigned only to the master, after it is promoted This is easy with colocation Agreed. - After the promotion of a new master, all slaves should be allowed to complete the application of their relay logs prior to any change master The current RA does not do that but it should be fairly easy to implement. That's a use case for a pre-promote and post-promote notification. Like the mysql RA currently does. - After its promotion and before allowing writes to it, a master should publish its current master file and position. I am using resource parameters in the CIB for these (I am wondering if transient attributes could be used instead) They could, and you should. Like the mysql RA currently does.
The RA I downloaded following instruction of the wiki stating it is the latest sources: wget -O resource-agents.tar.bz2 http://hg.linux-ha.org/agents/archive/tip.tar.bz2 Has moved to github. I'll try to make that more obvious at the website, Hm. That I had already done, not sure what else I could do there. but that won't help for direct download hg archive links. Now, those I simply disabled, so people will notice ;-) http://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/mysql raw download: http://raw.github.com/ClusterLabs/resource-agents/master/heartbeat/mysql Also see this pull request: https://github.com/ClusterLabs/resource-agents/pull/28 -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
Re: [Pacemaker] primary does not run alone
On Tue, Oct 11, 2011 at 09:09:52AM +0900, H.Nakai wrote: Hi, Andreas, Lars, and everybody I will try the newer version. But I want the behavior below. DRBD has fencing policies (fencing resource-and-stonith, for example), which, if configured, cause it to call fencing handlers (handler { fence-peer }) when appropriate. There are various fence-peer handlers. One is the drbd-peer-outdater, which needs dopd, which at this point depends on the heartbeat communication layer. Then there is the crm-fence-peer.sh script, which works by setting a pacemaker location constraint instead of actually setting the peer outdated. See if that works like you think it should.
Primary:
  demote
  wait 5-10 seconds
  check whether Secondary is promoted, still secondary, or disconnected
  if Secondary is promoted and still primary, set local outdate (this means only the Primary is shut down)
  if Secondary is still secondary or disconnected, do not set local outdate (this means both Primary and Secondary are shut down)
  disconnect
  shutdown
Secondary:
  check Primary
  if Primary is primary, set local outdate
  if Primary is demoted (secondary), do not set outdate
  disconnect
  shutdown
(2011/10/08 7:14), Lars Ellenberg wrote: On Fri, Oct 07, 2011 at 11:29:57PM +0200, Andreas Kurz wrote: Hello, On 10/07/2011 04:51 AM, H.Nakai wrote: Hi, I'm from Japan, in trouble. In the case below, the server which was primary sometimes does not run drbd/heartbeat. Server A (primary), Server B (secondary) is running. Shutdown A and immediately shutdown B. Switch on only A; it does not run drbd/heartbeat. It may happen when one server was broken. I'm using, drbd83-8.3.8-1.el5 heartbeat-3.0.5-1.1.el5 pacemaker-1.0.11-1.2.el5 resource-agents-3.9.2-1.1.el5 centos5.6 Servers are using two LANs (eth0, eth1) and not using a serial cable. I checked /usr/lib/ocf/resource.d/linbit/drbd, and inserted some debug code. At drbd_stop(), in the while loop, only when Unconfigured, break and call maybe_outdate_self().
But sometimes, $OCF_RESKEY_CRM_meta_notify_master_uname or $OCF_RESKEY_CRM_meta_notify_promote_uname are not null. So, at maybe_outdate_self(), it is going to set outdate. And it always shows the warning messages below. But the outdated flag is set. State change failed: Disk state is lower than outdated state = { cs:StandAlone ro:Secondary/Unknown ds:Diskless/DUnknown r--- } wanted = { cs:StandAlone ro:Secondary/Unknown ds:Outdated/DUnknown r--- } those are expected and harmless, even though I admit they are annoying. I do not want the outdated flag to be set when both of them are shut down. I want to know what program sets the $OCF_RESKEY_CRM_* variables, under what conditions these variables are set, and when. you need a newer OCF resource agent, at least from DRBD 8.3.9. There was the new parameter stop_outdates_secondary (defaults to true) introduced ... set this to false to change the behavior of your setup and be warned: this increases the chance to come up with old (outdated) data. BTW, that default has changed to false, because of a bug in some version of pacemaker, which got the environment for stop operations wrong. pacemaker 1.0.11 is ok again, iirc. Anyways, if you simply go to DRBD 8.3.11, you should be good. If you want only the agent script, grab it there: http://git.drbd.org/drbd-8.3.git/?a=blob_plain;f=scripts/drbd.ocf Thanks, Nickey -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Pacemaker] primary does not run alone
On Fri, Oct 07, 2011 at 11:29:57PM +0200, Andreas Kurz wrote: Hello, On 10/07/2011 04:51 AM, H.Nakai wrote: Hi, I'm from Japan, in trouble. In the case below, the server which was primary sometimes does not run drbd/heartbeat. Server A (primary), Server B (secondary) is running. Shutdown A and immediately shutdown B. Switch on only A; it does not run drbd/heartbeat. It may happen when one server was broken. I'm using, drbd83-8.3.8-1.el5 heartbeat-3.0.5-1.1.el5 pacemaker-1.0.11-1.2.el5 resource-agents-3.9.2-1.1.el5 centos5.6 Servers are using two LANs (eth0, eth1) and not using a serial cable. I checked /usr/lib/ocf/resource.d/linbit/drbd, and inserted some debug code. At drbd_stop(), in the while loop, only when Unconfigured, break and call maybe_outdate_self(). But sometimes, $OCF_RESKEY_CRM_meta_notify_master_uname or $OCF_RESKEY_CRM_meta_notify_promote_uname are not null. So, at maybe_outdate_self(), it is going to set outdate. And it always shows the warning messages below. But the outdated flag is set. State change failed: Disk state is lower than outdated state = { cs:StandAlone ro:Secondary/Unknown ds:Diskless/DUnknown r--- } wanted = { cs:StandAlone ro:Secondary/Unknown ds:Outdated/DUnknown r--- } those are expected and harmless, even though I admit they are annoying. I do not want the outdated flag to be set when both of them are shut down. I want to know what program sets the $OCF_RESKEY_CRM_* variables, under what conditions these variables are set, and when. you need a newer OCF resource agent, at least from DRBD 8.3.9. There was the new parameter stop_outdates_secondary (defaults to true) introduced ... set this to false to change the behavior of your setup and be warned: this increases the chance to come up with old (outdated) data. BTW, that default has changed to false, because of a bug in some version of pacemaker, which got the environment for stop operations wrong. pacemaker 1.0.11 is ok again, iirc. Anyways, if you simply go to DRBD 8.3.11, you should be good.
If you want only the agent script, grab it there: http://git.drbd.org/drbd-8.3.git/?a=blob_plain;f=scripts/drbd.ocf -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
Re: [Pacemaker] concurrent uses of cibadmin: Signon to CIB failed: connection failed
On Thu, Sep 29, 2011 at 03:45:32PM -0400, Brian J. Murrell wrote: So, in another thread there was a discussion of using cibadmin to mitigate possible concurrency issues of the crm shell. I have written a test program to test that theory, and unfortunately cibadmin also falls down in the face of heavy concurrency, with errors such as: Signon to CIB failed: connection failed Init failed, could not perform requested operations Signon to CIB failed: connection failed Init failed, could not perform requested operations Signon to CIB failed: connection failed Init failed, could not perform requested operations The cib does a listen(sock_fd, 10), implicitly, via glue, clplumbing ipcsocket.c, socket_wait_conn_new(). You get a connection request backlog of 10. Usually that is enough to give a server enough time to accept them in time. If you concurrently create many new client sessions, some client connect() may fail. Those would then need to be retried. My feeling is, any retry logic for concurrency issues should go in some shell wrapper, though. If you really expect to run into too many connect attempts to the cib at the same time regularly, you are doing it wrong ;-) cibadmin seems to have consistent error codes; this particular problem should fall into exit code 10. Effectively my test runs: for x in $(seq 1 50); do cibadmin -o resources -C -x resource-$x.xml & done My complete test program is attached for review/experimentation if you wish. Am I doing something wrong or is this a bug? I'm using pacemaker 1.0.10-1.4.el5 for what it's worth. Cheers, b. -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
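A wrapper along the lines Lars suggests might look like this (hypothetical sketch — the exit code 10 for signon failures is taken from the post above, not verified across pacemaker versions, and the retry count and back-off are arbitrary):

```shell
#!/bin/sh
# Retry a cibadmin invocation when it fails with the (assumed)
# "Signon to CIB failed" exit code 10; give up after 5 attempts
# or immediately on any other error.
cibadmin_retry() {
    tries=0
    while :; do
        "$@" && return 0
        rc=$?
        tries=$((tries + 1))
        if [ "$rc" -ne 10 ] || [ "$tries" -ge 5 ]; then
            return "$rc"
        fi
        sleep 1   # back off briefly before reconnecting
    done
}

# usage: cibadmin_retry cibadmin -o resources -C -x resource-1.xml
```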
Re: [Pacemaker] Trouble with ordering
On Fri, Sep 30, 2011 at 10:06:51AM +0200, Gerald Vogt wrote: Hi! I am running a cluster with 3 nodes. These nodes provide dns service. The purpose of the cluster is to have our two dns service ip addresses online at all times. I use IPaddr2 and that part works. Now I try to extend our setup to check the dns service itself. So far, if a dns server on any node stops or hangs the cluster won't notice. Thus, I wrote a custom ocf script to check whether the dns service on a node is operational (i.e. if the dns server is listening on the ip address and whether it responds to a dns request). All cluster nodes are slave dns servers, therefore the dns server process is running at all times to get zone transfers from the dns master. Obviously, the dns service resource must be colocated with the IP address resource. However, as the dns server is running at all times, the dns service resource must be started or stopped after the ip address. This leads me to something like this: primitive ns1-ip ocf:heartbeat:IPaddr2 ... primitive ns1-dns ocf:custom:dns op monitor interval=30s colocation dns-ip1 inf: ns1-dns ns1-ip order ns1-ip-dns inf: ns1-ip ns1-dns symmetrical=false maybe, if this is what you mean, add: order ns1-ip-dns inf: ns1-ip:stop ns1-dns:stop symmetrical=false Problem 1: it seems as if the order constraint does not wait for an operation on the first resource to finish before it starts the operation on the second. When I migrate an IP address to another node the stop operation on ns1-dns will fail because the ip address is still active on the network interface. I have worked around this by checking for the IP address on the interface in the stop part of my dns script and sleeping 5 seconds if it is still there before checking again and continuing. Shouldn't the stop on ns1-ip first finish before the node initiates the stop on ns1-dns? Problem 2: if the dns service fails, e.g. hangs, the monitor operation fails. 
Thus, the cluster wants to migrate the ip address and service to another node. However, it first initiates a stop on ns1-dns and then on ns1-ip. What I need is ns1-ip to stop before ns1-dns. But this seems impossible to configure. The order constraint only says what operation is executed on ns1-dns depending on the status of ns1-ip. It says what happens after something. It cannot say what happens before something. Is that correct? Or am I missing a configuration option? Thanks, Gerald -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
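Putting Lars's suggestion together with the original constraints, the full set might look like this (a sketch only, using the resource names from the post; I have not verified that the asymmetric :stop order fully covers problem 2):

```
colocation dns-ip1 inf: ns1-dns ns1-ip
# start direction: IP up first, then the dns check resource
order ns1-ip-dns inf: ns1-ip ns1-dns symmetrical=false
# stop direction: also IP first, then the dns check resource
order ns1-ip-dns-stop inf: ns1-ip:stop ns1-dns:stop symmetrical=false
```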
Re: [Pacemaker] OCF exit code 8 triggers WARN message
On Fri, Sep 16, 2011 at 05:02:52PM +0200, Dejan Muhamedagic wrote: Hi Thilo, On Fri, Sep 16, 2011 at 04:41:59PM +0200, Thilo Uttendorfer wrote: Hi, I experience a lot of WARN log entries in several pacemaker cluster setups: Sep 16 11:53:21 server01 lrmd: [23946]: WARN: Managed res1:0:monitor process 26489 exited with return code 8. That's because multi-state resources like DRBD have some special return codes. 8 means OCF_RUNNING_MASTER, which should not trigger a warning. The following patch in cluster-glue solved this issue:
diff -u lib/clplumbing/proctrack.c lib/clplumbing/proctrack.c.patched
--- lib/clplumbing/proctrack.c 2011-09-16 15:48:25.0 +0200
+++ lib/clplumbing/proctrack.c.patched 2011-09-16 15:51:43.0 +0200
@@ -271,7 +271,7 @@
 	if (doreport) {
 		if (deathbyexit) {
-			cl_log((exitcode == 0 ? LOG_INFO : LOG_WARNING)
+			cl_log(((exitcode == 0 || exitcode == 8) ? LOG_INFO : LOG_WARNING)
 			, "Managed %s process %d exited with return code %d."
 			, type, pid, exitcode);
 		}else if (deathbysig) {
I did consider this before but was worried that a process different from an OCF RA instance could exit with such a code. Code 7 (not running) also belongs to this category. Anyway, we should probably add this patch. Hm... As lrmd is not the sole user of that proctrack interface, and not everything lrmd does is a monitor operation, can we add another loglevel flag there, e.g. PT_LOG_OCF_MONITOR, and base the degradation of log level for expected exit codes on that? -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
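For reference, the OCF return codes involved (values as defined in the resource-agents shell library; the helper function below is my own sketch, not lrmd code): for a monitor on a master/slave resource, 0 (running), 7 (not running) and 8 (running master) are all expected results, which is exactly what a generic process tracker cannot know without extra context.

```shell
#!/bin/sh
# Standard OCF return codes relevant here, plus a sketch of the check
# a monitor-aware logger would need (hypothetical helper).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7
OCF_RUNNING_MASTER=8
OCF_FAILED_MASTER=9

is_expected_monitor_rc() {
    case "$1" in
    "$OCF_SUCCESS"|"$OCF_NOT_RUNNING"|"$OCF_RUNNING_MASTER") return 0 ;;
    *) return 1 ;;
    esac
}

# is_expected_monitor_rc 8 -> true; is_expected_monitor_rc 9 -> false
```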
Re: [Pacemaker] Strange DRBD error in cluster operation
On Thu, Sep 01, 2011 at 02:59:56PM +0200, Michael Schwartzkopff wrote: Hi, from time to time we see the DRBD M/S resource failing on one of our clusters. From the logs we see that the monitoring fails with rc=5 (not_installed) and the log entry: lrmd: [2454]: info: RA output: (resDRBD:1:monitor:stderr) /etc/drbd.conf:3: Failed to open include file 'drbd.d/global_common.conf'. This happens about once per week and causes constant trouble. Any ideas what might be the reason for this behavior? You periodically re-create that file from some recipe, and it so happens that at the time of the monitor, it is not there? -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
Re: [Pacemaker] Unable to execute crm(heartbeat/pacemaker) commands
On Thu, Aug 25, 2011 at 10:34:07AM +0200, Dejan Muhamedagic wrote: Hi, On Tue, Aug 23, 2011 at 11:15:09AM +0530, rakesh k wrote: Hi I am using Heartbeat (3.0.3) and pacemaker (1.0.9). We are facing the following issue. Please find the details. We had installed heartbeat and pacemaker on the Linux box (CentOS operating system). We had created an ssh user and provided it to one of the developers. Please find the directory structure and the bash profile for that ssh user. bash-3.2# cat .bash_profile # .bash_profile # User specific environment and startup programs PATH=$PATH:/usr/sbin export PATH bash-3.2# But when one of the developers logs in through ssh to the box where heartbeat/pacemaker is installed, he is unable to execute crm configuration commands. Say, for example: while we are executing the following crm configuration commands, the system hangs. What is hanging? The crm shell? Does it react to ctrl-C? Can you provide more details? My guess is that the shell prompt is hanging. Why? Because you end the last part of the input with a backslash. Which of course causes the shell to wait for yet another line. And if you don't type that line (or an additional return) the shell prompt will wait for a very long time. If that guess should turn out to be true, I suggest you sleep more, drink more water or tea or coffee or whatever helps, or first learn about the shell and do some *nix systems 101 in general before trying to do cluster stuff.
Please find the crm configuration command we are using and the snapshot of the bash prompt while executing: -bash-3.2$ crm configure primitive HttpdVIP ocf:heartbeat:IPaddr3 \ params ip=10.104.231.78 eth_num=eth0:2 vip_cleanup_file=/var/run/bigha.pid \ op start interval=0 timeout=120s \ op stop interval=0 timeout=120s \ params ip=10.104.231.78 eth_num=eth0:2 vip_cleanup_file=/var/run/bigha.pid \ op monitor interval=30s op start interval=0 timeout=120s \ op stop interval=0 timeout=120s \ op monitor interval=30s Do you actually type all this on the command line? Why would you want to do that; why not use a file? There's no telling if and how shell expansion would affect this. Thanks, Dejan. Can you please help me on this particular scenario?
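As Dejan suggests, putting the configuration into a file and loading it sidesteps both the shell-quoting issue and the dangling-backslash problem. A sketch (the file name is hypothetical; the primitive is copied from the post with the duplicated params/op lines dropped):

===
# vip.crm
primitive HttpdVIP ocf:heartbeat:IPaddr3 \
    params ip=10.104.231.78 eth_num=eth0:2 vip_cleanup_file=/var/run/bigha.pid \
    op start interval=0 timeout=120s \
    op stop interval=0 timeout=120s \
    op monitor interval=30s
===

Then load it with: crm configure load update vip.crm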
Re: [Pacemaker] Unable to execute crm(heartbeat/pacemaker) commands
On Thu, Aug 25, 2011 at 11:05:32AM +0200, Lars Ellenberg wrote: My guess is that the shell prompt is hanging. Why? Because you end the last part of the input with a backslash, which of course causes the shell to wait for yet another line. And if you don't type that line (or an additional return), that shell prompt will wait for a very long time. If that guess should turn out to be true, I suggest you sleep more, drink more water or tea or coffee or whatever helps, or first learn about the shell and do some *nix systems 101 in general before trying to do cluster stuff. Then again, if it is something completely different, I apologize for being impertinent ... Lars
Re: [Pacemaker] compression with heartbeat doesn't seem to work
=23,ackseq=244435,lastmsg=442 Aug 19 07:38:21 usrv-qpr2 heartbeat: [23222]: ERROR: Cannot rexmit pkt 22 for usrv-qpr5: seqno too low Aug 19 07:38:21 usrv-qpr2 heartbeat: [23222]: info: fromnode =usrv-qpr5, fromnode's ackseq = 244435 Aug 19 07:38:21 usrv-qpr2 heartbeat: [23222]: info: hist information: Aug 19 07:38:21 usrv-qpr2 heartbeat: [23222]: info: hiseq =244943, lowseq=23,ackseq=244435,lastmsg=442 Aug 19 07:38:21 usrv-qpr2 heartbeat: [23222]: ERROR: Message hist queue is filling up (500 messages in queue) Aug 19 07:38:21 usrv-qpr2 heartbeat: [23222]: ERROR: Message hist queue is filling up (500 messages in queue) Aug 19 07:38:22 usrv-qpr2 heartbeat: [23222]: info: all clients are now resumed

My questions:

1) Seems like the compression is not working. Is there something we need to do to enable it? We have tried both bz2 and zlib. We've played with the compression threshold as well.

See above. Because pacemaker sometimes does not mark large message field values as should-be-compressed in the heartbeat message API way, you need traditional_compression on, to allow heartbeat to compress the full message instead.

2) How do we get the non-DC system back online? Rebooting does not work, since the DC can't seem to send the diffs to sync it.

3) If the diff it is trying to send is truly too long, how do I recover from that?

Sometimes pacemaker needs to send the full cib. The cib, particularly the status section, will grow over time, as it accumulates probing, monitoring, and other action results. If you start off with a cib that is too large, you are out of luck. If you start with a cib that fits, it still may grow too large over time, so you may need to do some special maintenance there: delete outdated status results by hand in time, or similar. Rather consider using corosync instead in that case, or reducing the number of your services/clones.

4) Would more information be useful in diagnosing the problem?

I don't think so. 
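Spelled out as ha.cf directives, the compression setup Lars describes would look something like this (a sketch; check your heartbeat version's documentation for the exact directive names and threshold units):

===
# /etc/ha.d/ha.cf (fragment)
compression bz2               # or: zlib
compression_threshold 2       # compress messages above this size
traditional_compression yes   # compress whole messages, not only
                              # fields explicitly marked compressible
===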
Re: [Pacemaker] Not seeing VIF/VIP on pacemaker system
On Thu, Jul 28, 2011 at 02:09:46PM -0400, Leonard Smith wrote: I have a very simple cluster configuration where I have a virtual IP that is shared between two hosts. It is working fine, except that I cannot go to the hosts, issue an ifconfig command, and see a virtual IP address or the fact that the IP address is bound to the host. I would expect to see a VIF, or at least the fact that the IP address is bound to the eth0 interface. Centos 5.6 pacemaker-1.0.11-1.2.el5 pacemaker-libs-1.0.11-1.2.el5 node $id=xx bos-vs002.foo.bar node $id=xx bos-vs001.foo.bar primitive ClusterIP ocf:heartbeat:IPaddr2 \ params ip=10.1.0.22 cidr_netmask=255.255.252.0 nic=eth0 \ op monitor interval=10s property $id=cib-bootstrap-options \ dc-version=1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87 \ cluster-infrastructure=Heartbeat \ stonith-enabled=false \ no-quorum-policy=ignore \ default-resource-stickiness=1000 [root@bos-vs001 ~]# ifconfig -a eth0 Link encap:Ethernet HWaddr 00:16:36:41:D3:6D inet addr:10.1.1.1 Bcast:10.1.3.255 Mask:255.255.252.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:454721 errors:0 dropped:0 overruns:0 frame:0 TX packets:90795 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:257195727 (245.2 MiB) TX bytes:160400169 (152.9 MiB) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:146 errors:0 dropped:0 overruns:0 frame:0 TX packets:146 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:13592 (13.2 KiB) TX bytes:13592 (13.2 KiB) IPaddr != IPaddr2, ifconfig != ip (from the iproute package) # this will list the addresses: ip addr show # also try: ip -o -f inet a s man ip If you want/need ifconfig to see those aliases as well, you need to label them, i.e. add the parameter iflabel to your primitive. 
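For example, extending the ClusterIP primitive from the post (the label name "vip" is arbitrary; ifconfig would then show the address on eth0:vip; note that cidr_netmask is meant to be a prefix length, so 22 rather than the dotted quad):

===
primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip=10.1.0.22 cidr_netmask=22 nic=eth0 iflabel=vip \
    op monitor interval=10s
===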
Re: [Pacemaker] Cluster with DRBD : split brain
On Wed, Jul 20, 2011 at 11:36:25AM -0400, Digimer wrote: On 07/20/2011 11:24 AM, Hugo Deprez wrote: Hello Andrew, in fact DRBD was in standalone mode but the cluster was working : Here is the syslog of the drbd's split brain : Jul 15 08:45:34 node1 kernel: [1536023.052245] block drbd0: Handshake successful: Agreed network protocol version 91 Jul 15 08:45:34 node1 kernel: [1536023.052267] block drbd0: conn( WFConnection - WFReportParams ) Jul 15 08:45:34 node1 kernel: [1536023.066677] block drbd0: Starting asender thread (from drbd0_receiver [23281]) Jul 15 08:45:34 node1 kernel: [1536023.066863] block drbd0: data-integrity-alg: not-used Jul 15 08:45:34 node1 kernel: [1536023.079182] block drbd0: drbd_sync_handshake: Jul 15 08:45:34 node1 kernel: [1536023.079190] block drbd0: self BBA9B794EDB65CDF:9E8FB52F896EF383:C5FE44742558F9E1:1F9E06135B8E296F bits:75338 flags:0 Jul 15 08:45:34 node1 kernel: [1536023.079196] block drbd0: peer 8343B5F30B2BF674:9E8FB52F896EF382:C5FE44742558F9E0:1F9E06135B8E296F bits:769 flags:0 Jul 15 08:45:34 node1 kernel: [1536023.079200] block drbd0: uuid_compare()=100 by rule 90 Jul 15 08:45:34 node1 kernel: [1536023.079203] block drbd0: Split-Brain detected, dropping connection! Jul 15 08:45:34 node1 kernel: [1536023.079439] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 Jul 15 08:45:34 node1 kernel: [1536023.083955] block drbd0: meta connection shut down by peer. 
Jul 15 08:45:34 node1 kernel: [1536023.084163] block drbd0: conn( WFReportParams - NetworkFailure ) Jul 15 08:45:34 node1 kernel: [1536023.084173] block drbd0: asender terminated Jul 15 08:45:34 node1 kernel: [1536023.084176] block drbd0: Terminating asender thread Jul 15 08:45:34 node1 kernel: [1536023.084406] block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0) Jul 15 08:45:34 node1 kernel: [1536023.084420] block drbd0: conn( NetworkFailure - Disconnecting ) Jul 15 08:45:34 node1 kernel: [1536023.084430] block drbd0: error receiving ReportState, l: 4! Jul 15 08:45:34 node1 kernel: [1536023.084789] block drbd0: Connection closed Jul 15 08:45:34 node1 kernel: [1536023.084813] block drbd0: conn( Disconnecting - StandAlone ) Jul 15 08:45:34 node1 kernel: [1536023.086345] block drbd0: receiver terminated Jul 15 08:45:34 node1 kernel: [1536023.086349] block drbd0: Terminating receiver thread This was a DRBD split-brain, not a pacemaker split. I think that might have been the source of confusion. The split brain occurs when both DRBD nodes lose contact with one another and then proceed as StandAlone/Primary/UpToDate. To avoid this, configure fencing (stonith) in Pacemaker, then use 'crm-fence-peer.sh' in drbd.conf: === disk { fencing resource-and-stonith; } handlers { outdate-peer "/path/to/crm-fence-peer.sh"; } === Thanks, that is basically right. Let me fill in some details, though. On "This will tell DRBD to block (resource) and fence (stonith)": the drbd fencing options are fencing resource-only and fencing resource-and-stonith. resource-only does *not* block IO while the fencing handler runs; resource-and-stonith does block IO. DRBD will then not resume IO until either the fence script exits with success, or until an admin types 'drbdadm resume-io res'. On "The CRM script simply calls pacemaker and asks it to fence the other node": no. It tries to place a constraint forcing the Master role off of any node but the one with the good data. 
When a node has actually failed, then the lost node is fenced. If both nodes are up but disconnected, as you had, then only the faster node will succeed in calling the fence, and the slower node will be fenced before it can call a fence itself. "Fenced" here may mean restricted from being/becoming Master by that fencing constraint, or, if pacemaker decides to do so, actually shot by some node-level fencing agent (stonith). All that resource-level fencing by placing some constraint obviously only works as long as the cluster communication is still up. If not only the drbd replication link had issues, but the cluster communication was down as well, it becomes a bit more complex.
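For completeness, once a DRBD split brain like the one above has happened, manual recovery follows the usual pattern (a sketch for a resource named r0, using DRBD 8.3-era syntax; choose the split-brain victim carefully, since its modifications since the split are discarded):

===
# on the split-brain victim (the node whose changes are thrown away):
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# on the survivor (the node with the good data), if it is StandAlone:
drbdadm connect r0
===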
Re: [Pacemaker] Location issue: how to force only one specific location, and only as Slave
On Tue, Jul 05, 2011 at 11:40:04AM +1000, Andrew Beekhof wrote: On Mon, Jul 4, 2011 at 11:42 PM, ruslan usifov ruslan.usi...@gmail.com wrote: 2011/6/27 Andrew Beekhof and...@beekhof.net On Tue, Jun 21, 2011 at 10:22 PM, ruslan usifov ruslan.usi...@gmail.com wrote: No, I mean that in this constraint: location ms_drbd_web-U_slave_on_drbd3 ms_drbd_web-U \ rule role=slave -inf: #uname ne drbd3 pacemaker will try to start the slave part of the resource (if drbd3 is down) on other nodes, but it must not do that. The only way to express this is to have: - a fake resource that can only run on drbd3, and - an ordering constraint that tells ms_drbd_web-U to start only after the fake resource is active. In future releases, does this change? It's a planned but unimplemented feature. (Please do not use drbdXYZ as a host name! Imagine having to explain what you mean by drbd7 on drbd3 to someone else ...) If I understand correctly, you want to * restrict the resource to run only on one specific host * prevent it from becoming primary, ever. Then why not (I assume hostname X now): # disallow anywhere but X location l_ms_drbd_only_on_X ms_drbd \ rule -inf: #uname ne X # but even on X, don't become Primary. location l_ms_drbd_no_primary_on_X ms_drbd \ rule $role=Master -inf: #uname eq X If you want pacemaker to really always do exactly that, then it seems to be most effective to not try to force that, but to forbid everything else ;-)
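The fake-resource workaround Andrew describes could be sketched like this in crm syntax (resource and constraint names are hypothetical, using the Dummy agent as the placeholder):

===
primitive p_on_drbd3 ocf:pacemaker:Dummy
location l_placeholder_only_on_drbd3 p_on_drbd3 \
    rule -inf: #uname ne drbd3
order o_drbd_after_placeholder inf: p_on_drbd3 ms_drbd_web-U
===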