Re: [Pacemaker] Question about crm_mon -n option
(13.03.27 18:01), Andrew Beekhof wrote:
> On Wed, Mar 27, 2013 at 7:44 PM, Kazunori INOUE inouek...@intellilink.co.jp wrote:
>> Hi,
>> I'm using pacemaker-1.1 (c7910371a5, the latest devel). With
>> globally-unique=false, instance numbers are appended to the output of
>> crm_mon -n, just as with globally-unique=true. Is this the intended
>> behavior?
>>
>> $ crm configure show
>>  :
>> primitive prmDummy ocf:pacemaker:Dummy
>> clone clnDummy prmDummy \
>>     meta clone-max=2 clone-node-max=1 globally-unique=false
>>
>> $ crm_mon -n
>>  :
>> Node dev1 (3232261525): online
>>     prmDummy:1 (ocf::pacemaker:Dummy): Started
>> Node dev2 (3232261523): online
>>     prmDummy:0 (ocf::pacemaker:Dummy): Started
>>
>> Without -n, instance numbers are not appended.
>
> Yeah, instance numbers shouldn't show up here.

I wrote a patch that does not display instance numbers when
globally-unique is false:
https://github.com/inouekazu/pacemaker/commit/c9b0ef4e4b3be336a31d83a9297ef23f1adf7c8b

The following files show the output of crm_mon before and after
applying this patch:
- before_applying.log
- after_applying.log

The cluster configuration is as follows.

$ crm configure show
node $id=3232261523 dev2
node $id=3232261525 dev1
primitive prmDummy ocf:pacemaker:Dummy \
    op monitor on-fail=restart interval=10s
primitive prmDummy2 ocf:pacemaker:Dummy \
    op monitor on-fail=restart interval=10s
primitive prmStateful ocf:pacemaker:Stateful \
    op monitor interval=11s role=Master on-fail=restart \
    op monitor interval=12s role=Slave on-fail=restart
ms msStateful prmStateful \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
        notify=true globally-unique=false
clone clnDummy prmDummy \
    meta clone-max=2 clone-node-max=1 globally-unique=false
property $id=cib-bootstrap-options \
    dc-version=1.1.10-1.el6-e8caee8 \
    cluster-infrastructure=corosync \
    no-quorum-policy=ignore \
    stonith-enabled=false \
    startup-fencing=false
rsc_defaults $id=rsc-options \
    resource-stickiness=INFINITY \
    migration-threshold=1

$ crm_mon -r
 :
Full list of resources:

 Clone Set: clnDummy [prmDummy]
     Started: [ dev1 dev2 ]

Best Regards,
Kazunori INOUE

$ crm_mon -1
Last updated: Mon Apr 1 16:09:19 2013
Last change: Mon Apr 1 15:27:44 2013 via cibadmin on dev1
Stack: corosync
Current DC: dev2 (3232261523) - partition with quorum
Version: 1.1.10-1.el6-e8caee8
2 Nodes configured, unknown expected votes
5 Resources configured.

Online: [ dev1 dev2 ]

prmDummy2 (ocf::pacemaker:Dummy): Started dev1
 Master/Slave Set: msStateful [prmStateful]
     Masters: [ dev1 ]
     Stopped: [ prmStateful:1 ]
 Clone Set: clnDummy [prmDummy]
     Started: [ dev1 ]
     Stopped: [ prmDummy:1 ]

Failed actions:
    prmStateful_monitor_12000 (node=dev2, call=38, rc=7, status=complete): not running
    prmDummy_monitor_1 (node=dev2, call=25, rc=7, status=complete): not running

$ crm_mon -n1
Last updated: Mon Apr 1 16:09:26 2013
Last change: Mon Apr 1 15:27:44 2013 via cibadmin on dev1
Stack: corosync
Current DC: dev2 (3232261523) - partition with quorum
Version: 1.1.10-1.el6-e8caee8
2 Nodes configured, unknown expected votes
5 Resources configured.
Node dev1 (3232261525): online
    prmDummy2 (ocf::pacemaker:Dummy): Started
    prmStateful:0 (ocf::pacemaker:Stateful): Master
    prmDummy:0 (ocf::pacemaker:Dummy): Started
Node dev2 (3232261523): online

Failed actions:
    prmStateful_monitor_12000 (node=dev2, call=38, rc=7, status=complete): not running
    prmDummy_monitor_1 (node=dev2, call=25, rc=7, status=complete): not running

$ crm_mon -r1
Last updated: Mon Apr 1 16:09:30 2013
Last change: Mon Apr 1 15:27:44 2013 via cibadmin on dev1
Stack: corosync
Current DC: dev2 (3232261523) - partition with quorum
Version: 1.1.10-1.el6-e8caee8
2 Nodes configured, unknown expected votes
5 Resources configured.

Online: [ dev1 dev2 ]

Full list of resources:

prmDummy2 (ocf::pacemaker:Dummy): Started dev1
 Master/Slave Set: msStateful [prmStateful]
     Masters: [ dev1 ]
     Stopped: [ prmStateful:1 ]
 Clone Set: clnDummy [prmDummy]
     Started: [ dev1 ]
     Stopped: [ prmDummy:1 ]

Failed actions:
    prmStateful_monitor_12000 (node=dev2, call=38, rc=7, status=complete): not running
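For reference, whether a clone is anonymous can be confirmed by
querying its meta attribute directly. A sketch against the
configuration above, using crm_resource options as documented in
crm_resource(8) of that vintage; the "false" echoes the value set in
the configuration:

$ crm_resource --resource clnDummy --meta --get-parameter globally-unique
false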
Re: [Pacemaker] Speeding up startup after migration
----- Original Message -----
From: Vladislav Bogdanov bub...@hoster-ok.com
To: pacemaker@oss.clusterlabs.org
Sent: Friday, March 29, 2013 2:03:27 AM
Subject: Re: [Pacemaker] Speeding up startup after migration

> 29.03.2013 03:31, Andrew Beekhof wrote:
>> On Fri, Mar 29, 2013 at 4:12 AM, Benjamin Kiessling mittages...@l.unchti.me wrote:
>>> Hi,
>>> we've got a small pacemaker cluster running which controls an
>>> active/passive router. On this cluster we've got a semi-large (~30)
>>> number of primitives which are grouped together. On migration it
>>> takes quite a long time until each resource is brought up again,
>>> because they are started sequentially. Is there a way to speed up
>>> the process, ideally by executing these resource agents in parallel?
>>> They are fully independent, so the order in which they finish is of
>>> no concern.
>>
>> I'm guessing you have them in a group? Don't do that and they will
>> fail over in parallel.
>
> Does the current lrmd implementation have a batch-limit like
> cluster-glue's had? I can't find where it is.

The batch-limit option is still around, but has nothing to do with the
lrmd. It does limit how many resources can execute in parallel, but at
the transition engine level rather than the lrmd.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_available_cluster_options

-- Vossel
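For what it's worth, a minimal sketch of the suggested rework: keep the
~30 primitives out of a group and express only the real dependencies
(if any) as pairwise constraints, so that unrelated resources fail over
in parallel. rsc1 and rsc2 are hypothetical resource names:

$ crm configure order rsc1-then-rsc2 inf: rsc1 rsc2
$ crm configure colocation rsc2-with-rsc1 inf: rsc2 rsc1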
Re: [Pacemaker] Same host displayed twice in crm status
----- Original Message -----
From: Nicolas J. nikkro70+pacema...@gmail.com
To: pacemaker@oss.clusterlabs.org
Sent: Friday, March 29, 2013 8:55:30 AM
Subject: [Pacemaker] Same host displayed twice in crm status

> Hi,
> I have a problem with a Corosync/Pacemaker configuration. One host of
> the cluster has been renamed, and now the host is displayed twice in
> the configuration. When I try to remove the host from the
> configuration it works, but if corosync is restarted on one node the
> old host appears again. I tried several ways to delete the host, with
> no effect. How can I delete the wrong host?

For the pacemaker version you are using, try deleting the node from the
configuration in both the node and status sections, then use the
crm_node -R option to remove the node from the cluster's internal
cache. In pacemaker versions >= 1.1.8, only the crm_node -R option is
required to remove a node.

-- Vossel

> I checked the Linux configuration and there is no place where the old
> name is referenced. It's an OEL/Red Hat Linux.
>
> Output:
> [root@vmtestoradg2 ~]# crm status
> Last updated: Fri Mar 29 14:51:56 2013
> Stack: openais
> Current DC: vmtestoradg1 - partition with quorum
> Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
> 4 Nodes configured, 3 expected votes
> 1 Resources configured.
>
> Online: [ vmtestoradg1 vmtestora10g01 vmtestoradg2 ]
> OFFLINE: [ VMTESTORADG2.it.dbi-services.com ]
>
> DG_IP (ocf::heartbeat:IPaddr2): Started vmtestoradg1
>
> [root@vmtestoradg2 ~]# crm node clearstate VMTESTORADG2.it.dbi-services.com
> Do you really want to drop state for node VMTESTORADG2.it.dbi-services.com ? y
> [root@vmtestoradg2 ~]# crm node delete VMTESTORADG2.it.dbi-services.com
> INFO: node VMTESTORADG2.it.dbi-services.com not found by crm_node
> INFO: node VMTESTORADG2.it.dbi-services.com deleted
>
> Thanks in advance
> Best Regards,
> Nicolas J.
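A sketch of the suggested cleanup, using the stale name from the output
above (on some builds crm_node -R may additionally require --force):

# delete the node from the configuration and status sections
[root@vmtestoradg2 ~]# cibadmin --delete --xml-text '<node uname="VMTESTORADG2.it.dbi-services.com"/>'
[root@vmtestoradg2 ~]# cibadmin --delete --xml-text '<node_state uname="VMTESTORADG2.it.dbi-services.com"/>'
# purge it from the cluster's internal membership cache
[root@vmtestoradg2 ~]# crm_node -R VMTESTORADG2.it.dbi-services.com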
Re: [Pacemaker] Speeding up startup after migration
01.04.2013 17:28, David Vossel wrote:
> ----- Original Message -----
> From: Vladislav Bogdanov bub...@hoster-ok.com
> To: pacemaker@oss.clusterlabs.org
> Sent: Friday, March 29, 2013 2:03:27 AM
> Subject: Re: [Pacemaker] Speeding up startup after migration
>
>> [...]
>> Does the current lrmd implementation have a batch-limit like
>> cluster-glue's had? I can't find where it is.
>
> The batch-limit option is still around, but has nothing to do with
> the lrmd. It does limit how many resources can execute in parallel,
> but at the transition engine level rather than the lrmd.

Yep, I know that option; it has been there for a very long time.

So, if I understand correctly, the new lrmd runs as many simultaneous
jobs as possible. Unfortunately, in some circumstances this results in
high node load and timeouts. Is there a way to somehow limit that load?

> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_available_cluster_options
>
> -- Vossel
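For reference, batch-limit is an ordinary cluster property and can be
set like any other; the value here is an arbitrary example, not a
recommendation:

$ crm configure property batch-limit=4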
Re: [Pacemaker] Speeding up startup after migration
----- Original Message -----
From: Vladislav Bogdanov bub...@hoster-ok.com
To: pacemaker@oss.clusterlabs.org
Sent: Monday, April 1, 2013 10:35:39 AM
Subject: Re: [Pacemaker] Speeding up startup after migration

> 01.04.2013 17:28, David Vossel wrote:
>> [...]
>> The batch-limit option is still around, but has nothing to do with
>> the lrmd. It does limit how many resources can execute in parallel,
>> but at the transition engine level rather than the lrmd.
>
> Yep, I know that option; it has been there for a very long time.
>
> So, if I understand correctly, the new lrmd runs as many simultaneous
> jobs as possible. Unfortunately, in some circumstances this results
> in high node load and timeouts. Is there a way to somehow limit that
> load?

Isn't that what the batch-limit option does? Or are you saying you want
a batch-limit type option that is node-specific?

Why are you concerned about this behavior living in the LRMD instead of
at the transition processing level? I believe that if we do any
batch-limiting type behavior at the LRMD level, we're going to run into
problems with the transition timers in the crmd. The LRMD needs to
always perform the actions it is given as soon as possible.

-- Vossel
Re: [Pacemaker] Speeding up startup after migration
01.04.2013 20:09, David Vossel wrote:
> ----- Original Message -----
> From: Vladislav Bogdanov bub...@hoster-ok.com
> To: pacemaker@oss.clusterlabs.org
> Sent: Monday, April 1, 2013 10:35:39 AM
> Subject: Re: [Pacemaker] Speeding up startup after migration
>
>> [...]
>> So, if I understand correctly, the new lrmd runs as many simultaneous
>> jobs as possible. Unfortunately, in some circumstances this results
>> in high node load and timeouts. Is there a way to somehow limit that
>> load?
>
> Isn't that what the batch-limit option does? Or are you saying you
> want a batch-limit type option that is node-specific?
>
> Why are you concerned about this behavior living in the LRMD instead
> of at the transition processing level?

There was a limit in glue's lrmd, and I think it was there for a
reason. I do not know which behavior is better; they are just
different.

> I believe that if we do any batch-limiting type behavior at the LRMD
> level, we're going to run into problems with the transition timers in
> the crmd.

Did that change in crmd after the lrmd replacement?

> The LRMD needs to always perform the actions it is given as soon as
> possible.

Yes, but... heavy load on a host (because of, e.g., 150 CPU-intensive
operations running in parallel) may cause monitor timeouts, which lead
to resource restarts, which in turn lead to stop timeouts and fencing.
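A note on the failure chain described above: the timeouts that trip
first under load are the per-operation ones, which are configured per
resource. A sketch in the style of the configurations quoted earlier in
this digest (the values are arbitrary):

primitive prmDummy ocf:pacemaker:Dummy \
    op monitor interval=10s timeout=60s on-fail=restart \
    op stop timeout=120s on-fail=fence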
Re: [Pacemaker] [Problem][crmsh] The designation of the 'ordered' attribute becomes the error.
Hi Dejan,

On 2013-03-06 11:59, Dejan Muhamedagic wrote:
> Hi Hideo-san,
>
> On Wed, Mar 06, 2013 at 10:37:44AM +0900, renayama19661...@ybb.ne.jp wrote:
>> Hi Dejan, Hi Andrew,
>>
>> In the crm shell, the check of the meta attributes was revised with
>> the following patch:
>> * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3
>> This patch was backported to Pacemaker 1.0.13:
>> * https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py
>>
>> However, the ordered and colocated attributes of a group resource are
>> treated as an error by a crm shell that includes this patch.
>> --
>> (snip)
>> ### Group Configuration ###
>> group master-group \
>>     vip-master \
>>     vip-rep \
>>     meta \
>>         ordered=false
>> (snip)
>>
>> [root@rh63-heartbeat1 ~]# crm configure load update test2339.crm
>> INFO: building help index
>> crm_verify[20028]: 2013/03/06_17:57:18 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
>> WARNING: vip-master: specified timeout 60s for start is smaller than the advised 90
>> WARNING: vip-master: specified timeout 60s for stop is smaller than the advised 100
>> WARNING: vip-rep: specified timeout 60s for start is smaller than the advised 90
>> WARNING: vip-rep: specified timeout 60s for stop is smaller than the advised 100
>> ERROR: master-group: attribute ordered does not exist  <- WHY?
>> Do you still want to commit? y
>> --
>> If I answer `yes` to the confirmation message the change is applied,
>> but the error message itself is the problem.
>> * The same error occurs when I specify the colocated attribute.
>>
>> I also noticed that there is no explanation of ordered/colocated for
>> group resources in the Pacemaker online help. I think that specifying
>> the ordered or colocated attribute of a group resource should not be
>> an error, and that ordered/colocated should be added to the online
>> help.
>
> These attributes are not listed in crmsh. Does the attached patch
> help?

Dejan, will this patch for the missing ordered and collocated group
meta-attributes be included in the next crmsh release? ... I can't see
the patch in the current tip.

Thanks

Regards,
Andreas

> Thanks,
> Dejan
>
>> Best Regards,
>> Hideo Yamauchi.
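For context on the two attributes themselves, since the thread notes
they are missing from the online help: ordered=false lets group members
start and stop without the group's implicit internal ordering, and
collocated=false lets members run on different nodes. A sketch reusing
the group from the report (the CIB spelling is "collocated", although
the thread also writes "colocated"):

group master-group vip-master vip-rep \
    meta ordered=false collocated=true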
Re: [Pacemaker] Speeding up startup after migration
On 2013-04-01T13:09:14, David Vossel dvos...@redhat.com wrote:

>>> So, if I understand correctly, the new lrmd runs as many
>>> simultaneous jobs as possible. Unfortunately, in some circumstances
>>> this results in high node load and timeouts. Is there a way to
>>> somehow limit that load?
>
> Isn't that what the batch-limit option does? Or are you saying you
> want a batch-limit type option that is node-specific? Why are you
> concerned about this behavior living in the LRMD instead of at the
> transition processing level? I believe that if we do any
> batch-limiting type behavior at the LRMD level, we're going to run
> into problems with the transition timers in the crmd. The LRMD needs
> to always perform the actions it is given as soon as possible.

Seriously, folks, the LRM rewrite may turn out not to be the best
example of pacemaker's attention to detail ;-)

Yes, the previous LRM had a per-node concurrency limit. This avoided
overloading the nodes via IO, which is why it was added. (It also
smoothed out spikes in the monitoring calls, should they happen to
coincide.) The default limit on parallel executions was 4, or half the
number of CPU cores, if memory serves.

This turned out to actually improve performance (since it avoided said
spikes) and to avoid timeouts. (While it is true that, given a perfect
scheduler, the total runtime of N_1..100 kicked off all at once should
equal that of N_1..100 kicked off serially, it is quite likely that in
the former case at least a few of those 100 operations will hit their
*individual* timeouts at the LRM level.)

The TE doesn't have enough knowledge to enforce this, since it doesn't
know when monitors get scheduled. The transition timers weren't really
a problem, since they had some leeway accounted for.

If we don't have this functionality any more, I do believe we need it
back. I seem to recall that at the time, Andrew preferred it to be
implemented at the LRM level because that avoided more complex
transition-graph logic (e.g., the batch-limit functionality at a
per-node level, and doing something smart about monitors); but my
memory is hazy on this detail.

Nowadays, since we have the migration-threshold anyway, it may be
possible to do something about it cleanly in the TE, but that would
still leave the monitors unsolved ...

Regards,
    Lars

(PS: 1.1.8 really isn't turning out to be my favorite release. If I
wasn't afraid it'd be received as a rant, I'd try to write up a
post-mortem from my/our perspective to see what might be avoidable in
the future.)

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
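For the record, the old glue LRM exposed that per-node limit at runtime
(a sketch, assuming cluster-glue's classic lrmadmin interface; the
value is an arbitrary example):

# raise the cap on concurrently executing operations on this node
$ lrmadmin -p max-children 8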