[Pacemaker] increase the debug level
Hi, I use corosync+pacemaker. I set debug: on in corosync.conf, but there is no additional output in the log. What should I do to get more debug information into the log? Thanks a lot.
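For what it's worth, a minimal sketch of the corosync.conf logging stanza this question is about; the logfile path is an assumption and depends on your install, and note that debug: on must sit inside the logging block (Pacemaker of this era, running as a corosync plugin, should pick up its log settings from the same stanza):

    logging {
            to_syslog: yes
            to_logfile: yes
            logfile: /var/log/cluster/corosync.log
            debug: on
            timestamp: on
    }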
Re: [Pacemaker] Resource failover error
Hello, Many thanks. BR, CFK

-----Original Message-----
From: Andreas Kurz [mailto:andreas.k...@linbit.com]
Sent: Friday, March 18, 2011 4:22 PM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Resource failover error

hello,

On 2011-03-18 08:37, c...@itri.org.tw wrote:

Dear all, I am a new member of this mailing list. Please let me know if the explanation is not clear enough. I set up a CentOS 5.4 cluster environment (2 nodes, alpha1 and alpha2) with the following software:

Corosync 1.3.0
Pacemaker 1.0.10
DRBD 8.3.9

The environment is constructed in Active/Passive cluster mode based on http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf. I set up four resources (IP, DRBD, Filesystem, Apache) and want to test different failover situations. When I kill the corosync process on the active host, Pacemaker seems to fail to move DRBD:Master to the originally passive host, alpha2.

Is there a log entry like 'Multiple primaries not allowed by config'? ... If you only kill corosync and DRBD is still connected and running fine, DRBD will refuse to be promoted on both sides if not configured for that. And yes, stonith would solve this problem.

Regards, Andreas

The Corosync and DRBD configuration files are attached to this mail, and the crm configuration is listed below:

==========
node alpha1
node alpha2
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip=192.168.75.10 cidr_netmask=32 \
        op monitor interval=10s
primitive Disk ocf:linbit:drbd \
        params drbd_resource=ccmadata \
        op monitor interval=60s
primitive FS ocf:heartbeat:Filesystem \
        params device=/dev/drbd0 directory=/var/www/html fstype=ext3
primitive WebSite ocf:heartbeat:apache \
        params configfile=/etc/httpd/conf/httpd.conf \
        op monitor interval=1min
ms DiskClone Disk \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
colocation drbd-with-ip inf: ClusterIP DiskClone:Master
colocation fs-on-drbd inf: FS DiskClone:Master
colocation website-with-fs inf: WebSite FS
order DiskClone-after-IP inf: DiskClone:promote ClusterIP:start
order FS-after-DiskClone inf: DiskClone:promote FS:start
order WebSite-after-FS inf: FS:start WebSite:start
property $id=cib-bootstrap-options \
        dc-version=1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3 \
        cluster-infrastructure=openais \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        no-quorum-policy=ignore
==========

The first abnormal monitoring message from the crm_mon command is:

==========
Last updated: Thu Mar 17 18:19:04 2011
Stack: openais
Current DC: alpha2 - partition WITHOUT quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
4 Resources configured.

Online: [ alpha2 ]
OFFLINE: [ alpha1 ]

Master/Slave Set: DiskClone
        Slaves: [ alpha2 ]
        Stopped: [ Disk:0 ]
==========

The last abnormal monitoring message is:

==========
Last updated: Thu Mar 17 18:20:01 2011
Stack: openais
Current DC: alpha2 - partition WITHOUT quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
4 Resources configured.

Online: [ alpha2 ]
OFFLINE: [ alpha1 ]

Master/Slave Set: DiskClone
        Slaves: [ alpha2 ]
        Stopped: [ Disk:1 ]

Failed actions:
        Disk:1_promote_0 (node=alpha2, call=12, rc=-2, status=Timed Out): unknown exec error
        Disk:0_promote_0 (node=alpha2, call=22, rc=-2, status=Timed Out): unknown exec error
==========

The corosync log on host alpha1 is drbd_test_alpha1.log, and that on host alpha2 is drbd_test_alpha2.log.

My questions are:

1) How do I solve this issue? Am I missing some crm configuration for this situation?
2) According to the corosync log on host alpha2, Pacemaker wants to promote 2 DRBD masters (please correct me if I am wrong). The action fails because the operation mode is set to Active/Passive and only 1 DRBD master is allowed to exist. Should I add additional crm or drbd.conf configuration? 3) I am still studying STONITH. Is my problem a split-brain issue? Thanks for your help. BR, Chia-Feng Kang
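As Andreas hints, DRBD itself refuses a second Primary unless fencing is wired in. A minimal sketch of the drbd.conf (DRBD 8.3-style) fencing hooks commonly paired with Pacemaker; the resource name ccmadata comes from the post, and the handler paths are the stock LINBIT scripts, which may live elsewhere on your install:

    resource ccmadata {
            disk {
                    fencing resource-only;
            }
            handlers {
                    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
            }
    }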
Re: [Pacemaker] [pacemaker][patch 1/4] Simple changes for Pacemaker Explained, chapter 4 Ch_Nodes.xml
ack, some pretty bad english there :-) On Mon, Mar 21, 2011 at 3:27 AM, Marcus Barrow mbar...@redhat.com wrote: Some simple changes for the Pacemaker Explained document. These are for CH_Nodes.xml and consist of some typos, missing words etc. Regards, Marcus Barrow
Re: [Pacemaker] [pacemaker][patch 3/4] Simple changes for Pacemaker Explained, Chapter 6 CH_Constraints.xml
Needs some updates.

+ Scores of all kinds are integral to how a cluster works.

well, not all clusters, just pacemaker ones. How about:

+ Scores of all kinds are integral to how Pacemaker clusters work.

Assume the intent was to avoid confusion with actual sets here?

- <title>Example set of opt-in location constraints</title>
+ <title>Example of opt-in location constraints</title>

Prefer something like:

+ <title>Example usage of opt-in location constraints</title>

or similar, to indicate that they only make sense together.

I usually try to avoid questions as titles:

- <title>What if Two Nodes Have the Same Score</title>
+ <title>What if Two Nodes Have the Same Score?</title>

How about:

+ <title>When Two Nodes Have the Same Score</title>

I like the existing text in this case:

- <title>Specifying the Order Resources Should Start/Stop In</title>
- <para>The way to specify the order in which resources should start is by creating <literal>rsc_order</literal> constraints.</para>
+ <title>Specifying Resource Start/Stop Order</title>
+ <para>Use a <literal>rsc_order</literal> constraint to specify resource ordering.</para>

Also here:

- <entry>The name of a resource that must be started before the then resource is allowed to. </entry>
+ <entry>The name of a resource that must be started before the then resource. </entry>

Although changing then to be a <literal> would be an improvement.

Also, I think colocation makes more sense than resource here:

- <entry>The colocation target. The cluster will decide where to put this resource first and then decide where to put the resource in the rsc field</entry>
+ <entry>The resource target. The cluster will decide where to put this resource first and then decide where to put the colocation resource specified in the rsc field</entry>

+ <para>Resource sets were introduced for ordering and dependency constraints to simplify this situation.</para>

Prefer instead:

+ <para>To simplify the construction of ordering chains, the resource set syntax may be used instead.</para>

+ Using resource sets for complex colocation constraints makes things easier.

Prefer:

+ <para>To simplify the construction of colocation chains, the resource set syntax may be used instead.</para>

nack, the word "equivalent" is important here:

- <title>The equivalent colocation chain expressed using resource_sets</title>
+ <title>A resource set for the same colocation dependency chain</title>

and here:

- <title>A group resource with the equivalent colocation rules</title>
+ <title>A group resource for the same colocation dependency chain</title>

Small improvement to:

+ The only thing that matters is that in order for any member of a set to be active, all the members of the previous set must also be active (and naturally on the same node). When a set has <literal>sequential=true</literal>, then in order for any member to be active, the previous members must also be active.

Suggested:

+ The only thing that matters is that in order for any member of a set to be active, all the members of the previous set<footnote><para>as determined by the display order in the configuration</para></footnote> must also be active (and naturally on the same node).
+ When a set has <literal>sequential=true</literal>, then in order for any member to be active, the previous members must also be active.

Strictly speaking, they do have ordering dependencies, just not within the set.
+ <caption>Visual representation of a colocation chain where the members of the middle set have no order dependencies</caption>

Suggest:

+ <caption>Visual representation of a colocation chain where the members of the middle set have no ordering dependencies with the other sets</caption>

On Mon, Mar 21, 2011 at 3:51 AM, Marcus Barrow mbar...@redhat.com wrote: More simple changes for the Pacemaker Explained document. These are for CH_Constraints.xml and consist of typos and small changes. It also includes a change to Section 6.6, where dependencies on preceding sets and preceding members of sets were described as M=1 and N+1. These were changed to just use the word "preceding", which might be clearer. Regards, Marcus Barrow
Re: [Pacemaker] Filesystem resource agent patch
On Mon, Mar 21, 2011 at 07:17:52AM +0100, Marko Potocnik wrote: Actually the symbolic link is the beautifier. We use different versions of the database server, and with the symbolic link the mount point is always the same. Do I need to do anything else for the patch to make it into the main branch?

I'm not sure about the availability of readlink, and its actual behaviour (exit codes), if it exists. But this patch should still behave anyway, so that's OK.

I personally feel that using symlinks as mount points should not even work, and will confuse more than beautify. But maybe that's just me.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
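For context, a minimal sketch of the kind of change under discussion, assuming the patch resolves a symlinked mount point with readlink before comparing it against the mount table (the variable name MOUNTPOINT is illustrative, not necessarily the RA's actual one):

    # resolve a symlinked mount point to its target; readlink -f is a
    # GNU coreutils extension and may be missing or behave differently
    # on other platforms -- Lars' concern about exit codes
    if [ -L "$MOUNTPOINT" ]; then
            MOUNTPOINT=$(readlink -f "$MOUNTPOINT") || return "$OCF_ERR_GENERIC"
    fi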
[Pacemaker] Fencing order
Hi. Today we had a network outage. Quite a few problems suddenly arose in our setup, including a crashed corosync, the known notify bug in the DRBD RA, and a problem with a VirtualDomain RA timeout on stop. But the fencing behaviour was particularly strange.

Initially, one node (wapgw1-1) parted from the cluster. When the connection was restored, corosync died on that node. It was considered offline and unclean, and was scheduled to be fenced. Fencing by HP iLO did not work (currently, I do not know why). The second-priority fencing method is meatware, and it did take time.

The second node, wapgw1-2, hit the DRBD notify bug and failed to stop some resources. It was online and unclean, and it was also scheduled to be fenced. HP iLO was available for this node, but it was not STONITHed until I manually confirmed the STONITH for wapgw1-1. When I confirmed the first node's restart, the second node was fenced automatically.

Is this ordering intended behaviour or a bug? It's pacemaker 1.0.10, corosync 1.2.7, in a three-node cluster.

--
Pavel Levshin
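For readers unfamiliar with meatware: it waits for a human to power-cycle the node and confirm it, which is why the wapgw1-1 fence "did take time". The confirmation is done with the meatclient tool from cluster-glue, roughly:

    # run on the node hosting the meatware stonith resource,
    # after manually power-cycling the failed node
    meatclient -c wapgw1-1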
Re: [Pacemaker] Very strange behavior on asymmetric cluster
On Sat, Mar 19, 2011 at 4:14 PM, Pavel Levshin pa...@levshin.spb.ru wrote: 19.03.2011 19:10, Dan Frincu:

Even if that is set, we need to verify that the resources are, indeed, NOT running where they shouldn't be; remember, it is our job to ensure that the configured policy is enforced. So, we probe them everywhere to ensure they are indeed not around, and stop them if we find them.

Again, WHY do you need to verify things which cannot happen by setup? If some resource cannot, REALLY CANNOT, exist on a node, and the administrator can confirm this, why rely on the network, the cluster stack, resource agents, electricity in the power outlet, etc. to verify that 2+2 is still 4?

I don't want to step on any toes or anything (mainly because me stepping on somebody's toes without them wearing steel-toe cap boots would leave them toeless), but I've been hearing the ranting go on and on and felt that something was missing from the picture: specifically, an example of why checking for resources on passive nodes is a good thing, which I haven't seen thus far. ... Ok, so far it sounds perfect, but what happens if someone starts the service on the secondary/passive node: by user error, or by upgrading the software and thus activating its automatic startup at the given runlevel and restarting the secondary node (common practice when performing upgrades in a cluster environment), etc.? If Pacemaker were not to check all the nodes for the service being active or not = epic fail. Its state-based model, where it maintains a state of the resources and performs the necessary actions to bring the cluster to that state, is what saves us from the epic-fail moment.

Surely you are right. Resources must be monitored on standby nodes to prevent such a scenario. You can screw up your setup in many other ways, however. And Pacemaker (1.0.10, at least) does not execute recurring monitors on a passive node, so you may start your service by hand and it will go unnoticed for quite some time.

What I am talking about is monitoring (probing) of a resource on a node where this resource cannot exist. For example, suppose you have five nodes in your cluster and a DRBD resource, which can, by its nature, run on no more than two nodes. Then the other three nodes will be occasionally probed for that resource. If that action fails, the resource will be restarted everywhere. If the failing node cannot be fenced, the resource will be dead.

As far as I understand, that would require a definition of a quorum node or another special kind of node where the resource cannot exist. Figuring out such a role from location/colocation rules seems too complex to me. The idea of a quorum node was abandoned long ago in favour of some other features/project that Lars mentioned earlier.

There is still at least one case where such a failure may happen even if the RA is perfect: a misbehaving or highly overloaded node may cause an RA timeout. And bugs or configuration errors may too, of course. A resource should not depend on unrelated things, such as nodes which have no connection to the resource. Then the resource will be more stable.
I'm trying to be impartial here, although I may be biased by my experience to rule in favour of Pacemaker, but here's a thought. It's a free world, and we all have freedom of speech, which I'm also exercising at the moment. If you want something done, do it yourself: patches are being accepted. If you don't have the time, ask people for their help, in a polite manner, wait for them to reply, and kindly ask them again (and prayers are heard: Steven Dake released a patch for automatic redundant ring recovery, http://www.mail-archive.com/openais@lists.linux-foundation.org/msg06072.html, thank you Steven). If you want something done fast, pay some developers to do it for you; say, the folks over at www.linbit.com wouldn't mind some sponsorship (and I'm not affiliated with them in any way, believe it or not; I'm actually doing this without external incentives, from the kindness of my heart, so to speak).

My goal for now is to make the problem clear to the team. It is doubtful that such a patch would be accepted without that, given the current reaction. Moreover, it is not clear how to fix the problem to the best advantage.

This cluster stack is brilliant. It's a pity to see it fail to keep a resource running when it is relatively simple to avoid the unneeded downtime.

Thank you for participating.

P.S. There is a crude workaround: op monitor interval=0 timeout=10 on_fail=nothing. Obviously, it has its own deficiencies.

--
Pavel Levshin

--
Serge Dubrouski.
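A sketch of how that crude workaround would look in crm shell syntax. Note the post writes on_fail=nothing; the crm shell attribute is spelled on-fail, and ignore is the closest documented value, so treat that mapping as an assumption. The resource name and agent are illustrative only:

    # probe-only monitor op whose failure is ignored, so a botched
    # probe on an unrelated node does not restart the resource
    primitive myresource ocf:heartbeat:Dummy \
            op monitor interval=0 timeout=10 on-fail=ignore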
Re: [Pacemaker] Very strange behavior on asymmetric cluster
On Mon, Mar 21, 2011 at 10:43 AM, Carlos G Mendioroz t...@huapi.ba.ar wrote: Serge Dubrouski @ 21/03/2011 13:10 -0300 dixit:

What I am talking about is monitoring (probing) of a resource on a node where this resource cannot exist.

As far as I understand, that would require a definition of a quorum node or another special kind of node where the resource cannot exist. Figuring out such a role from location/colocation rules seems too complex to me. The idea of a quorum node was abandoned long ago in favour of some other features/project that Lars mentioned earlier.

There is already a location rule, and a minus-infinite value. Is that value being used dynamically? If not, could it be used as a marker for "this (resource) can not possibly run on this node, so monitoring is not necessary"?

It is used dynamically quite often. For example, moving a resource out of one node creates such a location rule. Does it mean that, along with moving the resource, Pacemaker has to stop monitoring it on the node it left? I don't think so.

--
Carlos G Mendioroz t...@huapi.ba.ar LW7 EQI Argentina

--
Serge Dubrouski.
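To illustrate both halves of this exchange in crm shell syntax: a hand-written -INFINITY location rule, and the constraint that moving a resource creates dynamically (the cli-standby-* naming is what pacemaker 1.0-era crm_resource -M produced, as far as I recall; resource and node names are placeholders):

    # static rule: WebSite may never run on alpha2
    location no-web-on-alpha2 WebSite -inf: alpha2
    # dynamic rule left behind by "crm_resource -M -r WebSite"
    # while WebSite was active on alpha1
    location cli-standby-WebSite WebSite -inf: alpha1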
Re: [Pacemaker] Very strange behavior on asymmetric cluster
Serge Dubrouski @ 21/03/2011 13:49 -0300 dixit: On Mon, Mar 21, 2011 at 10:43 AM, Carlos G Mendioroz t...@huapi.ba.ar wrote:

There is already a location rule, and a minus-infinite value. Is that value being used dynamically? If not, could it be used as a marker for "this (resource) can not possibly run on this node, so monitoring is not necessary"?

It is used dynamically quite often. For example, moving a resource out of one node creates such a location rule. Does it mean that, along with moving the resource, Pacemaker has to stop monitoring it on the node it left? I don't think so.

Neither do I. That was exactly my precondition :)

Given that RA absence is dealt with OK (i.e. there is no need to install the RA just to enable Pacemaker to do what it needs), I feel it's OK anyway.

I've seen many times arguments of the kind "if the admin does this, then it breaks". I buy no such argument. I'm against systems playing smarter than admins.

--
Carlos G Mendioroz t...@huapi.ba.ar LW7 EQI Argentina
Re: [Pacemaker] Very strange behavior on asymmetric cluster
21.03.2011 20:14, Carlos G Mendioroz:

It is used dynamically quite often. For example, moving a resource out of one node creates such a location rule. Does it mean that, along with moving the resource, Pacemaker has to stop monitoring it on the node it left? I don't think so.

You are right, location rules are not suitable for this case. I'd prefer an additional meta parameter (or two) for the resource, listing included or excluded nodes.

Neither do I. That was exactly my precondition :) Given that RA absence is dealt with OK (i.e. no need to install the RA to enable Pacemaker to do what it needs), I feel it's OK anyway.

It's not completely OK. First, I personally have been in the situation where rc=5 ("not installed") was lost due to a (still existing) bug (http://developerbugs.linux-foundation.org/show_bug.cgi?id=2568). That is a particular case which will eventually be fixed, but there are other ways to get into a similar situation. Why wait for disaster?

Second, an RA is not a resource. You may have two independent resources with one RA, suitable for different nodes. You can overcome this by copying the RA and accessing it by different names for each resource, but that leads back to case #1.

Third, a deleted RA may resurrect after a software upgrade. You can defend yourself against this by using a nonstandard location for your RAs. It may be considered good practice anyway, but IMHO this "best practice" is not described in the documentation.

All of this makes building a highly available cluster more difficult.

I've seen many times arguments of the kind "if the admin does this, then it breaks". I buy no such argument. I'm against systems playing smarter than admins.

So am I. Currently, the system tries to auto-detect resource existence by probing it, even when the admin knows that the resource cannot exist there.

--
Pavel Levshin
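To make Pavel's proposal concrete, a purely hypothetical sketch of what such a meta parameter might look like in crm shell syntax; allowed-nodes is invented for illustration and did not exist in Pacemaker at the time of this thread:

    # hypothetical meta attribute -- NOT a real Pacemaker option --
    # telling the cluster never to probe this resource on other nodes
    primitive mydrbd ocf:linbit:drbd \
            params drbd_resource=r0 \
            meta allowed-nodes="alpha1 alpha2"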
[Pacemaker] Stonith
Hello, I need to build a cluster for a database with its filesystems, and I understand that I need to use STONITH. Which one do you think is the best, and what parameters do I need? Also, how does STONITH exactly work? Thanks in advance.
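The usual answer on this list: the best STONITH device is whatever out-of-band power control your servers already have (iLO, DRAC, IPMI, a switched PDU), since STONITH works by forcibly power-cycling a misbehaving node so the survivors can safely take over its resources. As a minimal sketch, assuming IPMI-capable hardware and the external/ipmi plugin from cluster-glue (the address and credentials are placeholders):

    primitive st-node1 stonith:external/ipmi \
            params hostname=node1 ipaddr=10.0.0.101 userid=admin \
                    passwd=secret interface=lan
    # a node should not be the one to fence itself
    location st-node1-not-on-node1 st-node1 -inf: node1
    property stonith-enabled=true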
Re: [Pacemaker] Fwd: All resources bounce on failback
21.03.2011 1:39, David Morton:

order DB_SHARE_FIRST_DEPOT inf: CL_OCFS2_SHARED DEPOT
order DB_SHARE_FIRST_ESP_AUDIT inf: CL_OCFS2_SHARED ESP_AUDIT

Hmm, doesn't this cause the observed behaviour? An infinite score makes the order mandatory. It is not simple ordering: it requires both actions to always be done together. Order is also symmetric by default. Your rules could be written in plain language as follows:

1. Always start CL_OCFS2_SHARED, then start DEPOT;
1a. Always stop DEPOT, then stop CL_OCFS2_SHARED;
2. Always start CL_OCFS2_SHARED, then start ESP_AUDIT;
2a. Always stop ESP_AUDIT, then stop CL_OCFS2_SHARED;

In your described case, the cluster wants to execute 2a. That causes 1a to be executed, because CL_OCFS2_SHARED stops. Then the cluster starts DEPOT again. Where this behaviour is useful is not clear to me. Could anyone explain?

I would suggest relaxing your ordering rules to a score of 0.

--
Pavel Levshin
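A sketch of the suggested relaxation, reusing the constraint names from the post; with a score of 0 the ordering becomes advisory, so it only dictates sequence when both actions are already scheduled:

    order DB_SHARE_FIRST_DEPOT 0: CL_OCFS2_SHARED DEPOT
    order DB_SHARE_FIRST_ESP_AUDIT 0: CL_OCFS2_SHARED ESP_AUDIT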
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
Hi Dejan,

On Mon, 2011-03-21 at 16:11 +0100, Dejan Muhamedagic wrote: Hi Holger, On Sat, Mar 19, 2011 at 11:55:57AM +0100, Holger Teutsch wrote: Hi Dejan, On Fri, 2011-03-18 at 14:24 +0100, Dejan Muhamedagic wrote: Hi, On Fri, Mar 18, 2011 at 12:21:40PM +0100, Holger Teutsch wrote: Hi, I would like to submit 2 patches of an initial implementation for discussion.

..

[Holger, earlier:] To recall: crm_resource --move resource creates a standby rule that moves the resource off the currently active node, while crm_resource --move resource --node newnode creates a prefer rule that moves the resource to the new node. When dealing with clones and masters, the behaviour was random, as the code only considers the node where the first instance of the clone was started. The new code behaves consistently for the master role of an m/s resource. The options --master and rsc:master are somewhat redundant, as a slave move is not supported; currently it's more an acknowledgement by the user. On the other hand, it is desirable (and was requested several times on the ML) to stop a single resource instance of a clone or master on a specific node. Should that be implemented by something like crm_resource --move-off --resource myresource --node devel2? Or should crm_resource refuse to work on clones, and/or should moving the master role be the default for m/s resources and the --master option discarded?

[Dejan:] I think that we also need to consider the case when clone-max is less than the number of nodes. If I understood correctly what you were saying. So, all of move slave and move master and move clone should be possible.

[Holger, earlier:] I think the following use cases cover what can be done with such an interface:

crm_resource --moveoff --resource myresource --node mynode
  - all resource variants: check whether active on mynode, then create a standby constraint

crm_resource --move --resource myresource
  - primitive/group: convert to --moveoff --node `current_node`
  - clone/master: refused

crm_resource --move --resource myresource --node mynode
  - primitive/group: create a prefer constraint
  - clone/master: refused

crm_resource --move --resource myresource --master --node mynode
  - master: create a prefer constraint for the master role
  - others: refused

They should work (with foreseeable outcome!) regardless of the setting of clone-max.

[Dejan:] This seems quite complicated to me. Took me a while to figure out what's what and where :) Why bother doing the thinking for users?

[Holger:] I'm afraid the matter *is* complicated. The current implementation of crm_resource --move --resource myResource (without a node name) moves the resource off the node it is currently active on by creating a standby constraint. For clones and masters there is no such *single* active node the constraint can be constructed for. Consider this use case: I have 2 nodes and a clone or master and would like to safely get rid of one instance on a particular node (e.g. with agents 1.0.5, the slave of a DB2 HADR pair 8-) ). No idea how that should be done without a move-off functionality.

[Dejan:] The only case which seems to me worth considering is refusing to set a role for non-ms resources. Otherwise, let's let the user move things around and enjoy the consequences.

[Holger:] Definitely not true for production clusters. The tools should produce least-surprise consequences.

Cheers,

Over the weekend I implemented the above-mentioned functionality.
Drop me a note if you want to play with an early snapshot 8-)

Regards
Holger
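For readers following along, a sketch of the existing behaviour being discussed, using the pacemaker 1.0-era short options (long-option spellings varied between releases, so take the exact flags as assumptions):

    crm_resource -M -r myresource            # "standby" rule: move off the current node
    crm_resource -M -r myresource -H mynode  # "prefer" rule: move to mynode
    crm_resource -U -r myresource            # remove the constraints created above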
Re: [Pacemaker] Fwd: All resources bounce on failback
22.03.2011 0:06, David Morton:

Many thanks, Pavel!! Using a value of 0 changes the behaviour to the desired one, and it makes perfect sense when explained in plain terms!! I will experiment with some non-0 values. What situations could cause the order directive not to be honoured with a 0 value?

Advisory ordering is applied only when both actions need to be executed. You have a colocation constraint which takes care of starting OCFS2_SHARED with DEPOT, and then the order constraint determines what starts first. The same logic applies to stop. So this setup should be safe.

Note that the score in an ordering constraint is somewhat misleading. The actual value does not matter; basically, there are only two possible values: above zero for a mandatory constraint, and zero or less for an advisory one.

--
Pavel Levshin