Re: [Pacemaker] Question about crm_mon -n option

2013-04-01 Thread Kazunori INOUE
(13.03.27 18:01), Andrew Beekhof wrote:
 On Wed, Mar 27, 2013 at 7:44 PM, Kazunori INOUE
 inouek...@intellilink.co.jp wrote:
 Hi,

 I'm using pacemaker-1.1 (c7910371a5. the latest devel).

 In the case of globally-unique=false, instance numbers are appended
 to the result of crm_mon -n, just as in the case of
 globally-unique=true.
 Is this the intended behavior?

 $ crm configure show
   :
 primitive prmDummy ocf:pacemaker:Dummy
 clone clnDummy prmDummy \
  meta clone-max=2 clone-node-max=1 globally-unique=false

 $ crm_mon -n
   :
 Node dev1 (3232261525): online
  prmDummy:1  (ocf::pacemaker:Dummy): Started
 Node dev2 (3232261523): online
  prmDummy:0  (ocf::pacemaker:Dummy): Started


 Without -n, instance numbers are not appended.

 Yeah, instance numbers shouldn't show up here

I wrote a patch that does not display instance numbers when
globally-unique is false.
https://github.com/inouekazu/pacemaker/commit/c9b0ef4e4b3be336a31d83a9297ef23f1adf7c8b
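
For readers comparing the logs referenced below: the visible effect of the change is only that anonymous clone instances lose their :<n> suffix in the node view. A purely illustrative shell sketch of the renaming (this is not the patch code itself):

$ echo "prmDummy:1" | sed 's/:[0-9]*$//'   # drop the instance suffix for display
prmDummy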

The following files show the crm_mon output before and after applying
this patch.
- before_applying.log
- after_applying.log

The cluster configuration is as follows.

$ crm configure show
node $id=3232261523 dev2
node $id=3232261525 dev1
primitive prmDummy ocf:pacemaker:Dummy \
op monitor on-fail=restart interval=10s
primitive prmDummy2 ocf:pacemaker:Dummy \
op monitor on-fail=restart interval=10s
primitive prmStateful ocf:pacemaker:Stateful \
op monitor interval=11s role=Master on-fail=restart \
op monitor interval=12s role=Slave on-fail=restart
ms msStateful prmStateful \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true globally-unique=false
clone clnDummy prmDummy \
meta clone-max=2 clone-node-max=1 globally-unique=false
property $id=cib-bootstrap-options \
dc-version=1.1.10-1.el6-e8caee8 \
cluster-infrastructure=corosync \
no-quorum-policy=ignore \
stonith-enabled=false \
startup-fencing=false
rsc_defaults $id=rsc-options \
resource-stickiness=INFINITY \
migration-threshold=1


 $ crm_mon -r
   :
 Full list of resources:

   Clone Set: clnDummy [prmDummy]
   Started: [ dev1 dev2 ]

 
 Best Regards,
 Kazunori INOUE


 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org

$ crm_mon -1
Last updated: Mon Apr  1 16:09:19 2013
Last change: Mon Apr  1 15:27:44 2013 via cibadmin on dev1
Stack: corosync
Current DC: dev2 (3232261523) - partition with quorum
Version: 1.1.10-1.el6-e8caee8
2 Nodes configured, unknown expected votes
5 Resources configured.


Online: [ dev1 dev2 ]

 prmDummy2  (ocf::pacemaker:Dummy): Started dev1
 Master/Slave Set: msStateful [prmStateful]
 Masters: [ dev1 ]
 Stopped: [ prmStateful:1 ]
 Clone Set: clnDummy [prmDummy]
 Started: [ dev1 ]
 Stopped: [ prmDummy:1 ]

Failed actions:
prmStateful_monitor_12000 (node=dev2, call=38, rc=7, status=complete): not running
prmDummy_monitor_1 (node=dev2, call=25, rc=7, status=complete): not running
$
$ crm_mon -n1
Last updated: Mon Apr  1 16:09:26 2013
Last change: Mon Apr  1 15:27:44 2013 via cibadmin on dev1
Stack: corosync
Current DC: dev2 (3232261523) - partition with quorum
Version: 1.1.10-1.el6-e8caee8
2 Nodes configured, unknown expected votes
5 Resources configured.


Node dev1 (3232261525): online
prmDummy2   (ocf::pacemaker:Dummy): Started
prmStateful:0   (ocf::pacemaker:Stateful):  Master
prmDummy:0  (ocf::pacemaker:Dummy): Started
Node dev2 (3232261523): online

Failed actions:
prmStateful_monitor_12000 (node=dev2, call=38, rc=7, status=complete): not running
prmDummy_monitor_1 (node=dev2, call=25, rc=7, status=complete): not running
$
$ crm_mon -r1
Last updated: Mon Apr  1 16:09:30 2013
Last change: Mon Apr  1 15:27:44 2013 via cibadmin on dev1
Stack: corosync
Current DC: dev2 (3232261523) - partition with quorum
Version: 1.1.10-1.el6-e8caee8
2 Nodes configured, unknown expected votes
5 Resources configured.


Online: [ dev1 dev2 ]

Full list of resources:

 prmDummy2  (ocf::pacemaker:Dummy): Started dev1
 Master/Slave Set: msStateful [prmStateful]
 Masters: [ dev1 ]
 Stopped: [ prmStateful:1 ]
 Clone Set: clnDummy [prmDummy]
 Started: [ dev1 ]
 Stopped: [ prmDummy:1 ]

Failed actions:
prmStateful_monitor_12000 (node=dev2, call=38, rc=7, 

Re: [Pacemaker] Speeding up startup after migration

2013-04-01 Thread David Vossel




- Original Message -
 From: Vladislav Bogdanov bub...@hoster-ok.com
 To: pacemaker@oss.clusterlabs.org
 Sent: Friday, March 29, 2013 2:03:27 AM
 Subject: Re: [Pacemaker] Speeding up startup after migration
 
 29.03.2013 03:31, Andrew Beekhof wrote:
  On Fri, Mar 29, 2013 at 4:12 AM, Benjamin Kiessling
  mittages...@l.unchti.me wrote:
  Hi,
 
  we've got a small pacemaker cluster running which controls an
  active/passive router. On this cluster we've got a semi-large (~30)
  number of primitives which are grouped together. On migration it takes
  quite a long time until each resource is brought up again because they
  are started sequentially. Is there a way to speed up the process,
  ideally to execute these resource agents in parallel? They are fully
  independent so the order in which they finish is of no concern.
  
  I'm guessing you have them in a group?  Don't do that and they will
  fail over in parallel.
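
A minimal sketch of that ungrouped layout (the resource names and the IP below are hypothetical): keep the primitives independent and colocate each one with whatever defines where the router runs, with no ordering between them:

crm configure primitive router-ip ocf:heartbeat:IPaddr2 params ip=192.168.0.254
crm configure primitive svc1 ocf:heartbeat:Dummy
crm configure primitive svc2 ocf:heartbeat:Dummy
crm configure colocation svc1-with-ip inf: svc1 router-ip
crm configure colocation svc2-with-ip inf: svc2 router-ip
# no order constraints between svc1 and svc2, so they start and stop in parallel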
 
 Does the current lrmd implementation have a batch-limit like the one
 cluster-glue's lrmd had? I can't find where it is.

The batch-limit option is still around, but has nothing to do with the lrmd.  
It does limit how many resources can execute in parallel, but at the transition 
engine level rather than the lrmd.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_available_cluster_options
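
For example, a cluster-wide cap of 10 concurrent actions (the value is illustrative) can be set with either of the following:

$ crm configure property batch-limit=10
$ crm_attribute --type crm_config --name batch-limit --update 10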

-- Vossel

 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Same host displayed twice in crm status

2013-04-01 Thread David Vossel
- Original Message -
 From: Nicolas J. nikkro70+pacema...@gmail.com
 To: pacemaker@oss.clusterlabs.org
 Sent: Friday, March 29, 2013 8:55:30 AM
 Subject: [Pacemaker] Same host displayed twice in crm status
 
 Hi,
 
 I have a problem with a Corosync/Pacemaker configuration.
 One host of the cluster has been renamed and now the host is displayed twice
 in the configuration.
 
 When I try to remove the host from the configuration it works but if corosync
 is restarted on one node, the old host appears again.
 I tried several ways to delete the host with no effect.
 
 How can I delete the wrong host?

For the pacemaker version you are using, try deleting the node from the
configuration in both the nodes and status sections, then use the crm_node -R
option to remove the node from the cluster's internal cache.  In pacemaker
versions >= 1.1.8, only the crm_node -R option is required to remove a node.
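
A hedged sketch of that sequence against the stale entry shown below (the exact cibadmin and crm_node syntax can differ slightly between versions):

STALE=VMTESTORADG2.it.dbi-services.com
cibadmin -D -o nodes  -X "<node uname=\"$STALE\"/>"
cibadmin -D -o status -X "<node_state uname=\"$STALE\"/>"
crm_node -R "$STALE"        # some versions also require --force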

-- Vossel
 
 I checked the Linux configuration and there is no place where the old name is
 referenced.
 It's an OEL/Red Hat linux.
 
 Output
 -
 [root@vmtestoradg2 ~]# crm status
 
 Last updated: Fri Mar 29 14:51:56 2013
 Stack: openais
 Current DC: vmtestoradg1 - partition with quorum
 Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
 4 Nodes configured, 3 expected votes
 1 Resources configured.
 
 
 Online: [ vmtestoradg1 vmtestora10g01 vmtestoradg2 ]
 OFFLINE: [ VMTESTORADG2.it.dbi-services.com ]
 
 DG_IP (ocf::heartbeat:IPaddr2): Started vmtestoradg1
 
 [root@vmtestoradg2 ~]# crm node clearstate VMTESTORADG2.it.dbi-services.com
 Do you really want to drop state for node VMTESTORADG2.it.dbi-services.com ?
 y
 [root@vmtestoradg2 ~]# crm node delete VMTESTORADG2.it.dbi-services.com
 INFO: node VMTESTORADG2.it.dbi-services.com not found by crm_node
 INFO: node VMTESTORADG2.it.dbi-services.com deleted
 
 Thanks in advance
 
 Best Regards,
 
 Nicolas J.
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Speeding up startup after migration

2013-04-01 Thread Vladislav Bogdanov
01.04.2013 17:28, David Vossel wrote:
 
 
 
 
 - Original Message -
 From: Vladislav Bogdanov bub...@hoster-ok.com
 To: pacemaker@oss.clusterlabs.org
 Sent: Friday, March 29, 2013 2:03:27 AM
 Subject: Re: [Pacemaker] Speeding up startup after migration

 29.03.2013 03:31, Andrew Beekhof wrote:
 On Fri, Mar 29, 2013 at 4:12 AM, Benjamin Kiessling
 mittages...@l.unchti.me wrote:
 Hi,

 we've got a small pacemaker cluster running which controls an
 active/passive router. On this cluster we've got a semi-large (~30)
 number of primitives which are grouped together. On migration it takes
 quite a long time until each resource is brought up again because they
 are started sequentially. Is there a way to speed up the process,
 ideally to execute these resource agents in parallel? They are fully
 independent so the order in which they finish is of no concern.

 I'm guessing you have them in a group?  Don't do that and they will
 fail over in parallel.

 Does the current lrmd implementation have a batch-limit like the one
 cluster-glue's lrmd had? I can't find where it is.
 
 The batch-limit option is still around, but has nothing to do with
 the lrmd. It does limit how many resources can execute in parallel, but at
 the transition engine level rather than the lrmd.

Yep, I know that option, it was there for a very long time.

So, if I understand correctly, the new lrmd runs as many simultaneous jobs
as possible. Unfortunately, in some circumstances this would result in
high node load and timeouts. Is there a way to somehow limit that load?

 
 
 http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_available_cluster_options
 
 -- Vossel
 



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Speeding up startup after migration

2013-04-01 Thread David Vossel




- Original Message -
 From: Vladislav Bogdanov bub...@hoster-ok.com
 To: pacemaker@oss.clusterlabs.org
 Sent: Monday, April 1, 2013 10:35:39 AM
 Subject: Re: [Pacemaker] Speeding up startup after migration
 
 01.04.2013 17:28, David Vossel wrote:
  
  
  
  
  - Original Message -
  From: Vladislav Bogdanov bub...@hoster-ok.com
  To: pacemaker@oss.clusterlabs.org
  Sent: Friday, March 29, 2013 2:03:27 AM
  Subject: Re: [Pacemaker] Speeding up startup after migration
 
  29.03.2013 03:31, Andrew Beekhof wrote:
  On Fri, Mar 29, 2013 at 4:12 AM, Benjamin Kiessling
  mittages...@l.unchti.me wrote:
  Hi,
 
  we've got a small pacemaker cluster running which controls an
  active/passive router. On this cluster we've got a semi-large (~30)
  number of primitives which are grouped together. On migration it takes
  quite a long time until each resource is brought up again because they
  are started sequentially. Is there a way to speed up the process,
  ideally to execute these resource agents in parallel? They are fully
  independent so the order in which they finish is of no concern.
 
  I'm guessing you have them in a group?  Don't do that and they will
  fail over in parallel.
 
  Does the current lrmd implementation have a batch-limit like the one
  cluster-glue's lrmd had? I can't find where it is.
  
  The batch-limit option is still around, but has nothing to do with
  the lrmd. It does limit how many resources can execute in parallel, but at
  the transition engine level rather than the lrmd.
 
 Yep, I know that option, it was there for a very long time.
 
 So, if I understand correctly, the new lrmd runs as many simultaneous jobs
 as possible. Unfortunately, in some circumstances this would result in
 high node load and timeouts. Is there a way to somehow limit that load?

Isn't that what the batch-limit option does?  Or are you saying you want a
batch-limit-type option that is node-specific? Why are you concerned about this
behavior living in the LRMD instead of at the transition processing level?

I believe if we do any batch limiting type behavior at the LRMD level we're 
going to run into problems with the transition timers in the crmd.  The LRMD 
needs to always perform the actions it is given as soon as possible.

-- Vossel

  
  
  http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_available_cluster_options
  
  -- Vossel
  
 

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Speeding up startup after migration

2013-04-01 Thread Vladislav Bogdanov
01.04.2013 20:09, David Vossel wrote:
 - Original Message -
 From: Vladislav Bogdanov bub...@hoster-ok.com
 To: pacemaker@oss.clusterlabs.org
 Sent: Monday, April 1, 2013 10:35:39 AM
 Subject: Re: [Pacemaker] Speeding up startup after migration

 01.04.2013 17:28, David Vossel wrote:




 - Original Message -
 From: Vladislav Bogdanov bub...@hoster-ok.com
 To: pacemaker@oss.clusterlabs.org
 Sent: Friday, March 29, 2013 2:03:27 AM
 Subject: Re: [Pacemaker] Speeding up startup after migration

 29.03.2013 03:31, Andrew Beekhof wrote:
 On Fri, Mar 29, 2013 at 4:12 AM, Benjamin Kiessling
 mittages...@l.unchti.me wrote:
 Hi,

 we've got a small pacemaker cluster running which controls an
 active/passive router. On this cluster we've got a semi-large (~30)
 number of primitives which are grouped together. On migration it takes
 quite a long time until each resource is brought up again because they
 are started sequentially. Is there a way to speed up the process,
 ideally to execute these resource agents in parallel? They are fully
 independent so the order in which they finish is of no concern.

 I'm guessing you have them in a group?  Don't do that and they will
 fail over in parallel.

 Does the current lrmd implementation have a batch-limit like the one
 cluster-glue's lrmd had? I can't find where it is.

 The batch-limit option is still around, but has nothing to do with
 the lrmd. It does limit how many resources can execute in parallel, but at
 the transition engine level rather than the lrmd.

 Yep, I know that option, it was there for a very long time.

 So, if I understand correctly, the new lrmd runs as many simultaneous jobs
 as possible. Unfortunately, in some circumstances this would result in
 high node load and timeouts. Is there a way to somehow limit that load?
 
 Isn't that what the batch-limit option does? Or are you saying you
 want a batch-limit-type option that is node-specific? Why are you
 concerned about this behavior living in the LRMD instead of at the
 transition processing level?

There was a limit in glue's lrmd, and I think it was there for a reason.
I do not know which behavior is better; they are just different.

 
 I believe if we do any batch limiting type behavior at the LRMD
 level we're going to run into problems with the transition timers in the crmd.

Did that change in the crmd after the lrmd replacement?

 The LRMD needs to always perform the actions it is given as soon as possible.

Yes, but... heavy load on a host (because of, e.g., 150 CPU-intensive
operations running in parallel) may cause, e.g., monitor timeouts, then
resource restarts, then stop timeouts and fencing.



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem][crmsh]The designation of the 'ordered' attribute becomes the error.

2013-04-01 Thread Andreas Kurz
Hi Dejan,

On 2013-03-06 11:59, Dejan Muhamedagic wrote:
 Hi Hideo-san,
 
 On Wed, Mar 06, 2013 at 10:37:44AM +0900, renayama19661...@ybb.ne.jp wrote:
 Hi Dejan,
 Hi Andrew,

 As for the crm shell, the meta attribute check was revised by the
 following patch:

  * http://hg.savannah.gnu.org/hgweb/crmsh/rev/d1174f42f4b3

 This patch was backported to Pacemaker 1.0.13.

  * 
 https://github.com/ClusterLabs/pacemaker-1.0/commit/fa1a99ab36e0ed015f1bcbbb28f7db962a9d1abc#shell/modules/cibconfig.py

 However, the ordered/colocated attributes of a group resource are treated as
 an error when I use a crm shell that includes this patch.

 --
 (snip)
 ### Group Configuration ###
 group master-group \
 vip-master \
 vip-rep \
 meta \
 ordered=false
 (snip)

 [root@rh63-heartbeat1 ~]# crm configure load update test2339.crm 
 INFO: building help index
 crm_verify[20028]: 2013/03/06_17:57:18 WARN: unpack_nodes: Blind faith: not fencing unseen nodes
 WARNING: vip-master: specified timeout 60s for start is smaller than the advised 90
 WARNING: vip-master: specified timeout 60s for stop is smaller than the advised 100
 WARNING: vip-rep: specified timeout 60s for start is smaller than the advised 90
 WARNING: vip-rep: specified timeout 60s for stop is smaller than the advised 100
 ERROR: master-group: attribute ordered does not exist  - WHY?
 Do you still want to commit? y
 --

 If I answer `yes` to the confirmation message, the change is applied, but it
 is a problem that an error message is displayed.
  * The error occurs in the same way when I specify the colocated attribute.
 And I noticed that there is no explanation of ordered/colocated for group
 resources in the Pacemaker online help.

 I think that specifying the ordered/colocated attributes of a group resource
 should not be treated as an error.
 In addition, I think that ordered/colocated should be added to the online help.
 
 These attributes are not listed in crmsh. Does the attached patch
 help?

Dejan, will this patch for the missing ordered and collocated group
meta-attributes be included in the next crmsh release? I can't see the
patch in the current tip.

Thanks & Regards,
Andreas

 
 Thanks,
 
 Dejan

 Best Regards,
 Hideo Yamauchi.







___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Speeding up startup after migration

2013-04-01 Thread Lars Marowsky-Bree
On 2013-04-01T13:09:14, David Vossel dvos...@redhat.com wrote:

  So, if I understand correctly, the new lrmd runs as many simultaneous jobs
  as possible. Unfortunately, in some circumstances this would result in
  high node load and timeouts. Is there a way to somehow limit that load?
 Isn't that what the batch-limit option does?  Or are you saying you want a
 batch-limit-type option that is node-specific? Why are you concerned about
 this behavior living in the LRMD instead of at the transition processing
 level?
 
 I believe if we do any batch limiting type behavior at the LRMD level we're 
 going to run into problems with the transition timers in the crmd.  The LRMD 
 needs to always perform the actions it is given as soon as possible.

Seriously, folks, the LRM rewrite may turn out not to be the best
example of pacemaker's attention to detail ;-)

Yes, the previous LRM had a per-node concurrency limit. This avoided
overloading the nodes via IO, which is why it was added. (And also
smoothed out spikes in the monitoring calls should they happen to
coincide.) Default limit of parallel executions was 4 or half the number
of CPU cores, if memory serves.
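
Treating the exact cluster-glue syntax as a recollection rather than a checked reference, that limit could be queried and changed at runtime along these lines:

lrmadmin -g max-children      # show the current per-node limit
lrmadmin -p max-children 4    # cap parallel RA executions at 4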

This turned out to actually improve performance (since it avoided said
spikes) and to avoid timeouts. (While it is true that, given a perfect
scheduler, the total runtime of N_1..100 being kicked off all at once
should be equal to N_1..100 being kicked off serially, it's quite
likely that doing the former will mean at least a few of those 100
operations hitting their *individual* timeouts at the LRM level.)

The TE doesn't have enough knowledge to enforce this, since it doesn't
know if monitors get scheduled. The transition timers weren't really a
problem, since they had some leeway accounted for.

If we don't have this functionality right now anymore, I do believe we
need it back.

I do seem to recall that at the time, Andrew preferred it to be
implemented at the LRM level, because it avoided more complex
transition graph logic (e.g., the batch-limit functionality at a
per-node level, and doing something smart about monitors); but my memory
is hazy on this detail.

Nowadays, since we have the migration-threshold anyway, it may be
possible to do something about it cleanly in the TE, but that still
would leave the monitors unsolved ...


Regards,
Lars

(PS: 1.1.8 really isn't turning out to be my favorite release. If I
wasn't afraid it'd be received as a rant, I'd try to write up a post-mortem
from my/our perspective to see what might be avoidable in the future.)

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org