Re: [Pacemaker] Get group behaviour with Master/slave or clones involved

2014-02-17 Thread Néstor C .
2014-02-17 1:22 GMT+01:00 Andrew Beekhof and...@beekhof.net:


 On 21 Jan 2014, at 10:50 pm, Néstor C. xala...@gmail.com wrote:

  Hello.
 
  When you need that some primitives switch in block you can use a group.

 Groups are just a syntactic shortcut for ordering and colocation
 constraints.
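
For illustration (a sketch with placeholder resource names A and B, not taken from this thread), a group such as

    group grp A B

behaves roughly like the explicit pair of constraints

    colocation B_with_A inf: B A
    order A_then_B inf: A:start B:start

i.e. B is placed where A is and starts after it.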


Is there a way to view these hidden constraints?


 
  Is there a way to get this when you have a clone or a master/slave
 involved?
 
  For example:
 
  Imagine a DRBD disk (DR), a filesystem over it (FS) and a service over
 all of it (SRV).
  The first one is an ms resource, and the others are primitives.
 
  The colocation rules are:
  colocation fs_on_dr inf: FS DR:Master
  colocation srv_on_fs inf: SRV FS
 
  The order rules are:
  order fs_after_dr inf: DR:promote FS:start
  order srv_after_fs inf: FS:start SRV:start
 
  How can I switch the entire cluster to the other node if SRV fails (as if
 it were all in a group)?

 In what way does it not do so already with the above constraints?


I want some kind of circular constraint: if SRV fails, move the
entire stack to the other node.
If I add another constraint like:
colocation dr_on_srv inf: DR:Master SRV

It doesn't start.

Thanks!




Re: [Pacemaker] resource is too active problem in a 2-node cluster

2014-02-17 Thread Ajay Aggarwal
Thanks Andrew for pointing towards the OCF resource agent's list of 
must-implement actions. I noticed that our OCF script only implements 
start, stop and monitor. It does not implement meta-data and 
validate-all. Could this error be a result of these unimplemented 
actions?


Thanks

On 02/16/2014 09:15 PM, Andrew Beekhof wrote:

On 12 Feb 2014, at 1:39 am, Ajay Aggarwal aaggar...@verizon.com wrote:


Yes, we have cman (version: cman-3.0.12.1-49). We use manual fencing (I know 
it is not recommended). There is an external monitoring and fencing service 
that we use (our own).

Perhaps the subject line "resource is too active problem in a 2-node cluster" was 
misleading. The real problem is that the resource is *NOT* too active, but pacemaker thinks it is.

It only thinks what the resource agent tells us.
Sounds like script.sh isn't OCF compliant.

  
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_actions.html


Which leads to undesirable recovery procedure. See log lines below

Feb 04 11:27:38 [45167] gol-5-7-0 pengine:  warning: unpack_rsc_op: 
Processing failed op monitor for GOL-HA on gol-5-7-0: unknown error (1)
Feb 04 11:27:38 [45167] gol-5-7-0 pengine:  warning: unpack_rsc_op: 
Processing failed op monitor for GOL-HA on gol-5-7-6: unknown error (1)
Feb 04 11:27:38 [45167] gol-5-7-0 pengine:    error: native_create_actions: 
Resource GOL-HA (ocf::script.sh) is active on 2 nodes attempting recovery





On 02/10/2014 09:43 PM, Digimer wrote:

On 10/02/14 09:13 PM, Aggarwal, Ajay wrote:

I have a 2 node cluster with no-quorum-policy=ignore. I call these nodes as 
node-0 and node-1. In addition, I have two cluster resources in a group; an 
IP-address and an OCF script.

Turning off quorum on a 2-node cluster is fine, in fact, it's required. 
However, that makes stonith all the more important. Without stonith, in any 
cluster but in particular on two-node clusters, things will not work right.

First and foremost; Configure stonith and test to make sure it works.


Pacemaker version: 1.1.10
Corosync version: 1.4.1-15
OS: CentOS 6.4

With CentOS/RHEL 6, you need cman as well. Please be sure to also configure fence_pcmk in 
cluster.conf to hook it into pacemaker's real fencing.
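
For reference, the usual shape of that hook in cluster.conf looks roughly like the sketch below (node names taken from this thread; treat it as a template, not a drop-in configuration):

    <clusternodes>
      <clusternode name="node-0" nodeid="1">
        <fence>
          <method name="pcmk-redirect">
            <device name="pcmk" port="node-0"/>
          </method>
        </fence>
      </clusternode>
      <clusternode name="node-1" nodeid="2">
        <fence>
          <method name="pcmk-redirect">
            <device name="pcmk" port="node-1"/>
          </method>
        </fence>
      </clusternode>
    </clusternodes>
    <fencedevices>
      <fencedevice name="pcmk" agent="fence_pcmk"/>
    </fencedevices>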


What am I doing wrong?

snip

 <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" 
value="false"/>

That. :)

Once you have stonith working, see if the problem remains.
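
Once a real fence device is defined, the flag can be flipped and the device exercised, for example (a sketch assuming the crm shell is installed; pcs has equivalent commands):

    crm configure property stonith-enabled=true
    # deliberately fence a node to prove the device actually works:
    stonith_admin --reboot node-1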





Re: [Pacemaker] node1 fencing itself after node2 being fenced

2014-02-17 Thread Asgaroth
 -Original Message-
 From: Andrew Beekhof [mailto:and...@beekhof.net]
 Sent: 17 February 2014 00:55
 To: li...@blueface.com; The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] node1 fencing itself after node2 being fenced
 
 
 If you have configured cman to use fence_pcmk, then all cman/dlm/clvmd
 fencing operations are sent to Pacemaker.
 If you aren't running pacemaker, then you have a big problem as no-one can
 perform fencing.

I have configured pacemaker as the resource manager and I have it enabled to
start on boot-up too as follows:

chkconfig cman on
chkconfig clvmd on
chkconfig pacemaker on

 
 I don't know if you are testing without pacemaker running, but if so you
 would need to configure cman with real fencing devices.


I have been testing with pacemaker running, and the fencing appears to be
working fine. The issue I seem to have is that clvmd is unable to re-acquire
its locks when attempting to rejoin the cluster after a fence operation, so
it looks like clvmd just hangs when the startup script fires it off on
boot-up. When the 3rd node is in this state (hung clvmd), the other 2
nodes are unable to obtain locks from the third node. As an example, this is
what happens when the 3rd node is hung at the clvmd startup phase after
pacemaker has issued a fence operation (running pvs on node1):

[root@test01 ~]# pvs
  Error locking on node test03: Command timed out
  Unable to obtain global lock.
 
The dlm elements look fine to me here too:

[root@test01 ~]# dlm_tool ls
dlm lockspaces
name      cdr
id        0xa8054052
flags     0x0008 fs_reg
change    member 2 joined 0 remove 1 failed 1 seq 2,2
members   1 2 

name      clvmd
id        0x4104eefa
flags     0x 
change    member 3 joined 1 remove 0 failed 0 seq 3,3
members   1 2 3

So it looks like cman/dlm are operating properly, however, clvmd hangs and
never exits so pacemaker never starts on the 3rd node. So the 3rd node is in
pending state while clvmd is hung:

[root@test02 ~]# crm_mon -Afr -1
Last updated: Mon Feb 17 15:52:28 2014
Last change: Mon Feb 17 15:43:16 2014 via cibadmin on test01
Stack: cman
Current DC: test02 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
3 Nodes configured
15 Resources configured


Node test03: pending
Online: [ test01 test02 ]

Full list of resources:

 fence_test01  (stonith:fence_vmware_soap):Started test01 
 fence_test02  (stonith:fence_vmware_soap):Started test02 
 fence_test03  (stonith:fence_vmware_soap):Started test01 
 Clone Set: fs_cdr-clone [fs_cdr]
 Started: [ test01 test02 ]
 Stopped: [ test03 ]
 Resource Group: sftp01-vip
 vip-001(ocf::heartbeat:IPaddr2):   Started test01 
 vip-002(ocf::heartbeat:IPaddr2):   Started test01 
 Resource Group: sftp02-vip
 vip-003(ocf::heartbeat:IPaddr2):   Started test02 
 vip-004(ocf::heartbeat:IPaddr2):   Started test02 
 Resource Group: sftp03-vip
 vip-005(ocf::heartbeat:IPaddr2):   Started test02 
 vip-006(ocf::heartbeat:IPaddr2):   Started test02 
 sftp01 (lsb:sftp01):   Started test01 
 sftp02 (lsb:sftp02):   Started test02 
 sftp03 (lsb:sftp03):   Started test02 

Node Attributes:
* Node test01:
* Node test02:
* Node test03:

Migration summary:
* Node test03: 
* Node test02: 
* Node test01:




Re: [Pacemaker] resource is too active problem in a 2-node cluster

2014-02-17 Thread Andrew Beekhof

On 18 Feb 2014, at 5:33 am, Ajay Aggarwal aaggar...@verizon.com wrote:

 Thanks Andrew for pointing towards the OCF resource agent's list of must 
 implement actions. I noticed that our OCF script only implements start, stop 
 and monitor. It does not implement meta-data and validate-all.  Could this 
 error be a result of these un-implemented actions? 

Unlikely. More likely the monitor action is not correctly returning 
OCF_NOT_RUNNING if run before the resource is running.
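
For example, a minimal monitor sketch (the pid file path and function name are placeholders, not taken from the poster's script.sh):

    # OCF return codes: 0 = OCF_SUCCESS, 7 = OCF_NOT_RUNNING
    pidfile="/var/run/myservice.pid"    # placeholder

    monitor() {
        if [ ! -f "$pidfile" ]; then
            return 7    # OCF_NOT_RUNNING: the safe answer for the initial probe
        fi
        if kill -0 "$(cat "$pidfile")" 2>/dev/null; then
            return 0    # OCF_SUCCESS: the process is alive here
        fi
        return 7        # OCF_NOT_RUNNING
    }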

 On 02/16/2014 09:15 PM, Andrew Beekhof wrote:
 On 12 Feb 2014, at 1:39 am, Ajay Aggarwal aaggar...@verizon.com
  wrote:
 
 
 Yes, we have cman (version: cman-3.0.12.1-49). We use manual fencing ( I 
 know it is not recommended).  There is an external monitoring and fencing 
 service that we use (our own).
 
 Perhaps subject line resource is too active problem in a 2-node cluster 
 was misleading. Real problem is that resource is *NOT* too active, but 
 pacemaker thinks it is.
 
 It only thinks what the resource agent tells us.
 Sounds like script.sh isn't OCF compliant.
 
  
 http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_actions.html
 
 
 
 Which leads to undesirable recovery procedure. See log lines below
 
 Feb 04 11:27:38 [45167] gol-5-7-0pengine:  warning: unpack_rsc_op: 
 Processing failed op monitor for GOL-HA on gol-5-7-0: unknown error (1)
 Feb 04 11:27:38 [45167] gol-5-7-0pengine:  warning: unpack_rsc_op: 
 Processing failed op monitor for GOL-HA on gol-5-7-6: unknown error (1)
 Feb 04 11:27:38 [45167] gol-5-7-0pengine:error: 
 native_create_actions: Resource GOL-HA (ocf::script.sh) is active on 2 
 nodes attempting recovery
 
 
 
 
 
 On 02/10/2014 09:43 PM, Digimer wrote:
 
 On 10/02/14 09:13 PM, Aggarwal, Ajay wrote:
 
 I have a 2 node cluster with no-quorum-policy=ignore. I call these nodes 
 as node-0 and node-1. In addition, I have two cluster resources in a 
 group; an IP-address and an OCF script.
 
 Turning off quorum on a 2-node cluster is fine, in fact, it's required. 
 However, that makes stonith all the more important. Without stonith, in 
 any cluster but in particualr on two node clusters, things will not work 
 right.
 
 First and foremost; Configure stonith and test to make sure it works.
 
 
Pacemaker version: 1.1.10
Corosync version: 1-4.1-15
OS: CentOS 6.4
 
 With CentOS/RHEL 6, you need cman as well. Please be sure to also 
 configure fence_pcmk in cluster.conf to hook it into pacemaker's real 
 fencing.
 
 
 What am I doing wrong?
 
 snip
 
  <nvpair id="cib-bootstrap-options-stonith-enabled" 
  name="stonith-enabled" value="false"/>
 
 That. :)
 
 Once you have stonith working, see if the problem remains.
 
 
 


Re: [Pacemaker] Get group behaviour with Master/slave or clones involved

2014-02-17 Thread Andrew Beekhof

On 17 Feb 2014, at 10:34 pm, Néstor C. xala...@gmail.com wrote:

 
 
 
 2014-02-17 1:22 GMT+01:00 Andrew Beekhof and...@beekhof.net:
 
 On 21 Jan 2014, at 10:50 pm, Néstor C. xala...@gmail.com wrote:
 
  Hello.
 
  When you need that some primitives switch in block you can use a group.
 
 Groups are just a syntactic shortcut for ordering and colocation constraints.
 
 Is there a way to view these hidden constraints? 
 
 
 
  There is a manner to get this when you have a clone or a master/slave 
  involved?
 
  For example:
 
  Imagine a DRBD disk (DR), a filesystem over it (FS) and a service over all 
  (SRV).
  The first one is an ms resource, and the others are primitives.
 
  The colocation rules are:
  colocation fs_on_dr inf: FS DR:Master
  colocation srv_on_fs inf: SRV FS
 
  The order rules are:
  order fs_after_dr inf: DR:promote FS:start
  order srv_after_fs inf: FS:start SRV:start
 
  How can switch the entire cluster to other node if SRV fails? (like if all 
  was in a group)
 
 In what way does it not do so already with the above constraints?
 
 I want some kind of  circular constraint

We actively detect and prevent colocation and ordering loops.
What version of pacemaker is this?

 like if SRV fails, move the entire stack to the other node.
 If I add another constraint like:
 colocation dr_on_srv inf: DR:Master SRV
 
 It doesn't start.
 
 Thanks!
  
 


Re: [Pacemaker] node1 fencing itself after node2 being fenced

2014-02-17 Thread Andrew Beekhof

On 18 Feb 2014, at 5:52 am, Asgaroth li...@blueface.com wrote:

 -Original Message-
 From: Andrew Beekhof [mailto:and...@beekhof.net]
 Sent: 17 February 2014 00:55
 To: li...@blueface.com; The Pacemaker cluster resource manager
 Subject: Re: [Pacemaker] node1 fencing itself after node2 being fenced
 
 
 If you have configured cman to use fence_pcmk, then all cman/dlm/clvmd
 fencing operations are sent to Pacemaker.
 If you aren't running pacemaker, then you have a big problem as no-one can
 perform fencing.
 
 I have configured pacemaker as the resource manager and I have it enabled to
 start on boot-up too as follows:
 
 chkconfig cman on
 chkconfig clvmd on
 chkconfig pacemaker on
 
 
 I don't know if you are testing without pacemaker running, but if so you
 would need to configure cman with real fencing devices.
 
 
 I have been testing with pacemaker running and the fencing appears to be
 operating fine, the issue I seem to have is that clvmd is unable re-acquire
 its locks when attempting to rejoin the cluster after a fence operation, so
 it looks like clvmd just hangs when the startup script fires it off on
 boot-up. When the 3rd node is in this state (hung clvmd), then the other 2
 nodes are unable to obtain locks from the third node as clvmd has hung, as
 an example, this is what happens when the 3rd node is hung at the clvmd
 startup phase after pacemaker has issued a fence operation (running pvs on
 node1)

The 3rd node should (and needs to be) fenced at this point to allow the cluster 
to continue.
Is this not happening?

Did you specify on-fail=fence for the clvmd agent?
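
For reference, if clvmd were managed as a cluster resource, that could look like the crm shell sketch below; the agent name (ocf:heartbeat:clvm) and the timeouts are assumptions here, not taken from your configuration:

    primitive clvmd ocf:heartbeat:clvm \
        op start timeout=90s interval=0 \
        op stop timeout=90s interval=0 \
        op monitor interval=30s on-fail=fence
    clone clvmd-clone clvmd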

 
 [root@test01 ~]# pvs
  Error locking on node test03: Command timed out
  Unable to obtain global lock.
 
 The dlm elements look fine to me here too:
 
 [root@test01 ~]# dlm_tool ls
 dlm lockspaces
 name  cdr
 id0xa8054052
 flags 0x0008 fs_reg
 changemember 2 joined 0 remove 1 failed 1 seq 2,2
 members   1 2 
 
 name  clvmd
 id0x4104eefa
 flags 0x 
 changemember 3 joined 1 remove 0 failed 0 seq 3,3
 members   1 2 3
 
 So it looks like cman/dlm are operating properly, however, clvmd hangs and
 never exits so pacemaker never starts on the 3rd node. So the 3rd node is in
 pending state while clvmd is hung:
 
 [root@test02 ~]# crm_mon -Afr -1
 Last updated: Mon Feb 17 15:52:28 2014
 Last change: Mon Feb 17 15:43:16 2014 via cibadmin on test01
 Stack: cman
 Current DC: test02 - partition with quorum
 Version: 1.1.10-14.el6_5.2-368c726
 3 Nodes configured
 15 Resources configured
 
 
 Node test03: pending
 Online: [ test01 test02 ]
 
 Full list of resources:
 
 fence_test01  (stonith:fence_vmware_soap):Started test01 
 fence_test02  (stonith:fence_vmware_soap):Started test02 
 fence_test03  (stonith:fence_vmware_soap):Started test01 
 Clone Set: fs_cdr-clone [fs_cdr]
 Started: [ test01 test02 ]
 Stopped: [ test03 ]
 Resource Group: sftp01-vip
 vip-001(ocf::heartbeat:IPaddr2):   Started test01 
 vip-002(ocf::heartbeat:IPaddr2):   Started test01 
 Resource Group: sftp02-vip
 vip-003(ocf::heartbeat:IPaddr2):   Started test02 
 vip-004(ocf::heartbeat:IPaddr2):   Started test02 
 Resource Group: sftp03-vip
 vip-005(ocf::heartbeat:IPaddr2):   Started test02 
 vip-006(ocf::heartbeat:IPaddr2):   Started test02 
 sftp01 (lsb:sftp01):   Started test01 
 sftp02 (lsb:sftp02):   Started test02 
 sftp03 (lsb:sftp03):   Started test02 
 
 Node Attributes:
 * Node test01:
 * Node test02:
 * Node test03:
 
 Migration summary:
 * Node test03: 
 * Node test02: 
 * Node test01:
 
 


Re: [Pacemaker] crm_mon --as-html default permissions

2014-02-17 Thread Andrew Beekhof

On 12 Feb 2014, at 9:53 pm, Marko Potocnik marko.potoc...@gmail.com wrote:

 Hi,
 
 I've upgraded from pacemaker-1.1.7-6.el6.x86_64 to 
 pacemaker-1.1.10-14.el6_5.2.x86_64.
 I use crm_mon with the --as-html option to get the cluster status in an html file. 
 I've noticed that the permissions for the file have changed from 644 to 640. 
 Looking at the source code I see that umask is set to reflect the 640 
 permissions, but not for crm_mon.
 The default system umask is set to 0022 (which gives 644 permissions).
 
 Any idea why I get the 640 permissions?

There doesn't seem to be anything explicit in the crm_mon code. Just a call to 
fopen()

   Any created files will have mode S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | 
S_IROTH | S_IWOTH (0666), as modified by the process's umask value (see 
umask(2)).

However, it seems all code runs the following in crm_log_init():

umask(S_IWGRP | S_IWOTH | S_IROTH);

which could well be the cause
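
A quick arithmetic check of that theory (my calculation, not from the thread): S_IWGRP|S_IWOTH|S_IROTH is octal 0026, and 0666 masked by a 0026 umask gives exactly the observed 0640:

    $ printf '%o\n' $(( 0666 & ~0026 ))
    640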




Re: [Pacemaker] Resource Agents in OpenVZ containers

2014-02-17 Thread Andrew Beekhof

On 17 Feb 2014, at 5:38 pm, emmanuel segura emi2f...@gmail.com wrote:

 example: colocation ipwithpgsql inf: virtualip psql:Master

Ah, so colocating with the host running the VM with the master inside it.
That's not something we can do yet, sorry.

 
 
 2014-02-17 6:25 GMT+01:00 Tomasz Kontusz tomasz.kont...@gmail.com:
 
 
 Andrew Beekhof and...@beekhof.net napisał:
 
 On 16 Feb 2014, at 6:53 am, emmanuel segura emi2f...@gmail.com wrote:
 
  I think if you use pacemaker_remote inside the container, the
 container will be a normal node of your cluster, so you can run pgsql +
 vip in it
 
 
  2014-02-15 19:40 GMT+01:00 Tomasz Kontusz tomasz.kont...@gmail.com:
  Hi
  I'm setting up a cluster which will use OpenVZ containers for
 separating resource's environments.
  So far I see it like this:
   * each node runs Pacemaker
   * each container runs pacemaker_remote, and one kind of resource
 (but there might be multiple containers providing the same resource)
   * containers are started with VirtualDomain agent (I had to patch it
 a bit to work around libvirt/OpenVZ issue),
 each container resource is node-specific (and constrained to only
 run on the right node)
 
  The problem I have is with running pgsql database with virtual IP in
 such setup.
  I want to have IPaddr2 resource started on the node that holds
 container with current pgsql master.
  How can I go about achieving something like that?
 
 Colocate the IP with the OpenVZ VM?
 
 Won't work, because the containers are normally all running (psql01 on node 
 01, psql02 on node 02), and I want to collocate with current master.
 
  Is the idea of using pacemaker_remote in such setup sensible?
 
  --
  Tomasz Kontusz
 
 
 
 
  --
  esta es mi vida e me la vivo hasta que dios quiera
 
 --
 Sent with K-9 Mail.
 
 
 
 
 -- 
 esta es mi vida e me la vivo hasta que dios quiera


Re: [Pacemaker] Why does a node on which no failure has occurred get marked as lost?

2014-02-17 Thread Andrew Beekhof

On 31 Jan 2014, at 6:20 pm, yusuke iida yusk.i...@gmail.com wrote:

 Hi, all
 
 I measure the performance of Pacemaker in the following combinations.
 Pacemaker-1.1.11.rc1
 libqb-0.16.0
 corosync-2.3.2
 
 All nodes are KVM virtual machines.
 
  I forcibly stopped the vm01 node, after starting 14
 nodes.
 virsh destroy vm01 was used for the stop.
 Then, in addition to the forcibly stopped node, other nodes are separated
 from the cluster.
 
 Corosync then outputs the "Retransmit List:" log message in large
 quantities.

Probably best to poke the corosync guys about this.

However, <= 1.1.11 is known to cause significant CPU usage with that many nodes.
I can easily imagine this starving corosync of resources and causing breakage.

I would _highly_ recommend retesting with the current git master of pacemaker.
I merged the new cib code last week which is faster by _two_ orders of 
magnitude and uses significantly less CPU.

I'd be interested to hear your feedback.
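
For reference, one way to build and test the current git master (a sketch assuming a standard autotools build environment; the required -devel packages and install prefix depend on your distribution):

    git clone https://github.com/ClusterLabs/pacemaker.git
    cd pacemaker
    ./autogen.sh && ./configure && make
    sudo make install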

 
 Why does a node on which no failure has occurred get marked as lost?
 
 Please advise if there is a problem somewhere in the setup.
 
 I attached the report when the problem occurred.
 https://drive.google.com/file/d/0BwMFJItoO-fVMkFWWWlQQldsSFU/edit?usp=sharing
 
 Regards,
 Yusuke
 -- 
  
 METRO SYSTEMS CO., LTD 
 
 Yusuke Iida 
 Mail: yusk.i...@gmail.com
  


Re: [Pacemaker] crm_resource -L not trustable right after restart

2014-02-17 Thread Andrew Beekhof

On 22 Jan 2014, at 10:54 am, Brian J. Murrell (brian) br...@interlinx.bc.ca 
wrote:

 On Thu, 2014-01-16 at 14:49 +1100, Andrew Beekhof wrote:
 
 What crm_mon are you looking at?
 I see stuff like:
 
 virt-fencing (stonith:fence_xvm):Started rhos4-node3 
 Resource Group: mysql-group
 mysql-vip(ocf::heartbeat:IPaddr2):   Started rhos4-node3 
 mysql-fs (ocf::heartbeat:Filesystem):Started rhos4-node3 
 mysql-db (ocf::heartbeat:mysql): Started rhos4-node3 
 
 Yes, you are right.  I couldn't see the forest for the trees.
 
 I initially was optimistic about crm_mon being more truthful than
 crm_resource but it turns out it is not.

It can't be, they're both obtaining their data from the same place (the cib).

 
 Take for example these commands to set a constraint and start a resource
 (which has already been defined at this point):
 
 [21/Jan/2014:13:46:40] cibadmin -o constraints -C -X '<rsc_location 
 id="res1-primary" node="node5" rsc="res1" score="20"/>'
 [21/Jan/2014:13:46:41] cibadmin -o constraints -C -X '<rsc_location 
 id="res1-secondary" node="node6" rsc="res1" score="10"/>'
 [21/Jan/2014:13:46:42] crm_resource -r 'res1' -p target-role -m -v 'Started'
 
 and then these repeated calls to crm_mon -1 on node5:
 
 [21/Jan/2014:13:46:42] crm_mon -1
 Last updated: Tue Jan 21 13:46:42 2014
 Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
 Stack: openais
 Current DC: node5 - partition with quorum
 Version: 1.1.10-14.el6_5.1-368c726
 2 Nodes configured
 2 Resources configured
 
 
 Online: [ node5 node6 ]
 
 st-fencing(stonith:fence_product):Started node5 
 res1  (ocf::product:Target):  Started node6 
 
 [21/Jan/2014:13:46:42] crm_mon -1
 Last updated: Tue Jan 21 13:46:42 2014
 Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
 Stack: openais
 Current DC: node5 - partition with quorum
 Version: 1.1.10-14.el6_5.1-368c726
 2 Nodes configured
 2 Resources configured
 
 
 Online: [ node5 node6 ]
 
 st-fencing(stonith:fence_product):Started node5 
 res1  (ocf::product:Target):  Started node6 
 
 [21/Jan/2014:13:46:49] crm_mon -1 -r
 Last updated: Tue Jan 21 13:46:49 2014
 Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
 Stack: openais
 Current DC: node5 - partition with quorum
 Version: 1.1.10-14.el6_5.1-368c726
 2 Nodes configured
 2 Resources configured
 
 
 Online: [ node5 node6 ]
 
 Full list of resources:
 
 st-fencing(stonith:fence_product):Started node5 
 res1  (ocf::product:Target):  Started node5 
 
 The first two are not correct, showing the resource started on node6
 when it was actually started on node5.

Was it running there to begin with?
Answering my own question... yes. It was:

 Jan 21 13:46:41 node5 crmd[8695]:  warning: status_from_rc: Action 6 
 (res1_monitor_0) on node6 failed (target: 7 vs. rc: 0): Error

and then we try to stop it:

 Jan 21 13:46:41 node5 crmd[8695]:   notice: te_rsc_command: Initiating action 
 7: stop res1_stop_0 on node6


So you are correct that something is wrong, but it isn't pacemaker.


  Finally, 7 seconds later, it is
 reporting correctly.  The logs on node{5,6} bear this out.  The resource
 was actually only ever started on node5 and never on node6.

Wrong.





Re: [Pacemaker] [Patch] Information of "Connectivity is lost" is not displayed

2014-02-17 Thread Andrew Beekhof

On 17 Feb 2014, at 5:43 pm, renayama19661...@ybb.ne.jp wrote:

 Hi All,
 
 The following change was made by Mr. Lars.
 
 https://github.com/ClusterLabs/pacemaker/commit/6a17c003b0167de9fe51d5330fb6e4f1b4ffe64c

I'm confused... that patch seems to be the reverse of yours.
Are you saying that we need to undo Lars' one?

 
 There may be corrections needed in other parts that are not covered by the patch I sent.
 
 Best Regards,
 Hideo Yamauchi.
 
 --- On Mon, 2014/2/17, renayama19661...@ybb.ne.jp 
 renayama19661...@ybb.ne.jp wrote:
 
 Hi All,
 
 The crm_mon tool which is attached to Pacemaker1.1 seems to have a problem.
 I send a patch.
 
 Best Regards,
 Hideo Yamauchi.
 


[Pacemaker] [Problem] Fail-over is delayed. (State transition is not calculated.)

2014-02-17 Thread renayama19661014
Hi All,

I checked the behaviour when a failure occurs on one side of a Master/Slave
resource in Pacemaker 1.1.11.

-

Step1) Constitute a cluster.

[root@srv01 ~]# crm_mon -1 -Af
Last updated: Tue Feb 18 18:07:24 2014
Last change: Tue Feb 18 18:05:46 2014 via crmd on srv01
Stack: corosync
Current DC: srv01 (3232238180) - partition with quorum
Version: 1.1.10-9d39a6b
2 Nodes configured
6 Resources configured


Online: [ srv01 srv02 ]

 vip-master (ocf::heartbeat:Dummy): Started srv01 
 vip-rep(ocf::heartbeat:Dummy): Started srv01 
 Master/Slave Set: msPostgresql [pgsql]
 Masters: [ srv01 ]
 Slaves: [ srv02 ]
 Clone Set: clnPingd [prmPingd]
 Started: [ srv01 srv02 ]

Node Attributes:
* Node srv01:
+ default_ping_set  : 100   
+ master-pgsql  : 10
* Node srv02:
+ default_ping_set  : 100   
+ master-pgsql  : 5 

Migration summary:
* Node srv01: 
* Node srv02: 

Step2) Monitor error in vip-master.

[root@srv01 ~]# rm -rf /var/run/resource-agents/Dummy-vip-master.state 

[root@srv01 ~]# crm_mon -1 -Af  
Last updated: Tue Feb 18 18:07:58 2014
Last change: Tue Feb 18 18:05:46 2014 via crmd on srv01
Stack: corosync
Current DC: srv01 (3232238180) - partition with quorum
Version: 1.1.10-9d39a6b
2 Nodes configured
6 Resources configured


Online: [ srv01 srv02 ]

 Master/Slave Set: msPostgresql [pgsql]
 Masters: [ srv01 ]
 Slaves: [ srv02 ]
 Clone Set: clnPingd [prmPingd]
 Started: [ srv01 srv02 ]

Node Attributes:
* Node srv01:
+ default_ping_set  : 100   
+ master-pgsql  : 10
* Node srv02:
+ default_ping_set  : 100   
+ master-pgsql  : 5 

Migration summary:
* Node srv01: 
   vip-master: migration-threshold=1 fail-count=1 last-failure='Tue Feb 18 
18:07:50 2014'
* Node srv02: 

Failed actions:
vip-master_monitor_1 on srv01 'not running' (7): call=30, 
status=complete, last-rc-change='Tue Feb 18 18:07:50 2014', queued=0ms, exec=0ms
-

However, the resource does not fail over.

But when I check the CIB with crm_simulate at this point, the fail-over is
calculated.

-
[root@srv01 ~]# crm_simulate -L -s

Current cluster status:
Online: [ srv01 srv02 ]

 vip-master (ocf::heartbeat:Dummy): Stopped 
 vip-rep(ocf::heartbeat:Dummy): Stopped 
 Master/Slave Set: msPostgresql [pgsql]
 Masters: [ srv01 ]
 Slaves: [ srv02 ]
 Clone Set: clnPingd [prmPingd]
 Started: [ srv01 srv02 ]

Allocation scores:
clone_color: clnPingd allocation score on srv01: 0
clone_color: clnPingd allocation score on srv02: 0
clone_color: prmPingd:0 allocation score on srv01: INFINITY
clone_color: prmPingd:0 allocation score on srv02: 0
clone_color: prmPingd:1 allocation score on srv01: 0
clone_color: prmPingd:1 allocation score on srv02: INFINITY
native_color: prmPingd:0 allocation score on srv01: INFINITY
native_color: prmPingd:0 allocation score on srv02: 0
native_color: prmPingd:1 allocation score on srv01: -INFINITY
native_color: prmPingd:1 allocation score on srv02: INFINITY
clone_color: msPostgresql allocation score on srv01: 0
clone_color: msPostgresql allocation score on srv02: 0
clone_color: pgsql:0 allocation score on srv01: INFINITY
clone_color: pgsql:0 allocation score on srv02: 0
clone_color: pgsql:1 allocation score on srv01: 0
clone_color: pgsql:1 allocation score on srv02: INFINITY
native_color: pgsql:0 allocation score on srv01: INFINITY
native_color: pgsql:0 allocation score on srv02: 0
native_color: pgsql:1 allocation score on srv01: -INFINITY
native_color: pgsql:1 allocation score on srv02: INFINITY
pgsql:1 promotion score on srv02: 5
pgsql:0 promotion score on srv01: 1
native_color: vip-master allocation score on srv01: -INFINITY
native_color: vip-master allocation score on srv02: INFINITY
native_color: vip-rep allocation score on srv01: -INFINITY
native_color: vip-rep allocation score on srv02: INFINITY

Transition Summary:
 * Start   vip-master   (srv02)
 * Start   vip-rep  (srv02)
 * Demote  pgsql:0  (Master - Slave srv01)
 * Promote pgsql:1  (Slave - Master srv02)

-

In addition, the fail-over is calculated when cluster-recheck-interval
fires.

The fail-over is also carried out if I run cibadmin -B.

-
[root@srv01 ~]# cibadmin -B

[root@srv01 ~]# crm_mon -1 -Af
Last updated: Tue Feb 18 18:21:15 2014
Last change: Tue Feb 18 18:21:00 2014 via cibadmin on srv01
Stack: corosync
Current DC: srv01 (3232238180) - partition with quorum
Version: 1.1.10-9d39a6b
2 Nodes configured
6 Resources configured


Online: [ srv01 srv02 ]

 vip-master (ocf::heartbeat:Dummy): Started srv02 
 vip-rep(ocf::heartbeat:Dummy): Started srv02 
 Master/Slave Set: 

Re: [Pacemaker] [Patch] Information of "Connectivity is lost" is not displayed

2014-02-17 Thread renayama19661014
Hi Andrew,

 I'm confused... that patch seems to be the reverse of yours.
 Are you saying that we need to undo Lars' one?

No, I do not understand the intent of Mr. Lars' correction.

However, as it stands, crm_mon does not display the attribute correctly.
Did you perhaps discuss the correction that puts the meta data into the
rsc-parameters with Mr. Lars, or with Mr. David?

Best Regards,
Hideo Yamauchi.

--- On Tue, 2014/2/18, Andrew Beekhof and...@beekhof.net wrote:

 
 On 17 Feb 2014, at 5:43 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi All,
  
  The next change was accomplished by Mr. Lars.
  
  https://github.com/ClusterLabs/pacemaker/commit/6a17c003b0167de9fe51d5330fb6e4f1b4ffe64c
 
 I'm confused... that patch seems to be the reverse of yours.
 Are you saying that we need to undo Lars' one?
 
  
  I may lack the correction of other parts which are not the patch which I 
  sent.
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Mon, 2014/2/17, renayama19661...@ybb.ne.jp 
  renayama19661...@ybb.ne.jp wrote:
  
  Hi All,
  
  The crm_mon tool which is attached to Pacemaker1.1 seems to have a problem.
  I send a patch.
  
  Best Regards,
  Hideo Yamauchi.
  


Re: [Pacemaker] stopped resource was judged to be active

2014-02-17 Thread Andrew Beekhof

On 10 Feb 2014, at 5:28 pm, Kazunori INOUE kazunori.ino...@gmail.com wrote:

 Hi,
 
 Pacemaker stopped, but it was judged that a resource was active.
 I put crm_report here.
 https://drive.google.com/file/d/0B9eNn1AWfKD4S29JWk1ldUJJNGs/edit?usp=sharing
 
 [Steps to reproduce]
 1) start up the cluster
 
 Stack: corosync
 Current DC: bl460g1n7 (3232261593) - partition with quorum
 Version: 1.1.10-21de3a0
 2 Nodes configured
 34 Resources configured
 
 
 Online: [ bl460g1n6 bl460g1n7 ]
 
 Full list of resources:
 ...snip...
 
 
 * election-attrd exists in bl460g1n7.
 Feb  4 14:06:38 bl460g1n7 attrd[28811]: info: election_complete:
 Election election-attrd complete
 
 
 2) banish election-attrd from DC node
  I suppose the condition is that the DC and election-attrd are on
  different nodes.
 
 [bl460g1n7]$ pkill -9 attrd
 Feb  4 14:07:15 bl460g1n6 attrd[16927]: info: election_complete:
 Election election-attrd complete
 
 
 3) stop DC ( after making a resource fail )
 [bl460g1n7]$ stop pacemaker.combined
 Feb  4 14:09:39 bl460g1n7 crmd[28813]:   notice: process_lrm_event:
 LRM operation prmClone9_stop_0 (call=150, rc=0, cib-update=98,
 confirmed=true) ok

There are cases where <= 1.1.11 could lose resource updates like this.
The subsequent behaviour by pacemaker (fencing the node) is correct but clearly 
suboptimal.

Happily, the same code that improves the CIB's performance also makes this 
impossible.
So you should find this problem gone if you try with the current git master.

 :
 Feb  4 14:09:39 bl460g1n7 pacemakerd[28803]: info: main: Exiting 
 pacemakerd
 Feb  4 14:09:39 bl460g1n7 pacemakerd[28803]: info:
 crm_xml_cleanup: Cleaning up memory from libxml2
 
 * pacemaker of bl460g1n7 stopped normally, but bl460g1n6 judged that a
  resource was active.
 Feb  4 14:09:41 bl460g1n6 pengine[16928]:  warning: pe_fence_node:
 Node bl460g1n7 will be fenced because prmClone9:0 is thought to be
 active there
 
 
 Best regards,
 Kazunori INOUE
 


Re: [Pacemaker] pre_notify_demote is issued twice

2014-02-17 Thread Andrew Beekhof

On 6 Feb 2014, at 7:45 pm, Keisuke MORI keisuke.mori...@gmail.com wrote:

 Hi,
 
 I observed that pre_notify_demote is issued twice when a master
 resource is migrating.
 I'm wondering if this is the correct behavior.
 
 Steps to reproduce:
 
  - Start up a 2-node cluster configured for PostgreSQL streaming
  replication using the pgsql RA as a master/slave resource.
 - kill the postgresql process on the master node to induce a fail-over.
 - The fail-over succeeds as expected, but pre_notify_demote was
 executed twice on each node before demoting on the master resource.
 
 100% reproducible on my cluster.
 
 Pacemaker version: 1.1.11-rc4 (source build from the repo)
 OS: RHEL6.4
 
 I have never seen this on Pacemaker-1.0.* cluster with the same configuration.
 
 The relevant logs and pe-inputs are attached.
 
 
 Diagnostics:
 
  (1) The first transition caused by the process failure (pe-input-160)
  initiates pre_notify_demote on both nodes and cancels the slave monitor
  on the slave node.
 {{{
 171 Jan 30 16:08:59 rhel64-1 crmd[8143]:   notice: te_rsc_command:
 Initiating action 9: cancel prmPostgresql_cancel_1 on rhel64-2
 172 Jan 30 16:08:59 rhel64-1 crmd[8143]:   notice: te_rsc_command:
 Initiating action 79: notify prmPostgresql_pre_notify_demote_0 on
 rhel64-1 (local)
 
 175 Jan 30 16:08:59 rhel64-1 crmd[8143]:   notice: te_rsc_command:
 Initiating action 81: notify prmPostgresql_pre_notify_demote_0 on
 rhel64-2
 }}}
 
  (2) When the slave monitor cancellation completes, the transition is aborted
  by "Resource op removal".
 {{{
 176 Jan 30 16:08:59 rhel64-1 crmd[8143]: info: match_graph_event:
 Action prmPostgresql_monitor_1 (9) confirmed on rhel64-2 (rc=0)
 177 Jan 30 16:08:59 rhel64-1 cib[8138]: info: cib_process_request:
 Completed cib_delete operation for section status: OK (rc=0,
 origin=rhel64-2/crmd/21, version=0.37.9)
 178 Jan 30 16:08:59 rhel64-1 crmd[8143]: info:
 abort_transition_graph: te_update_diff:258 - Triggered transition
 abort (complete=0, node=rhel64-2, tag=lrm_rsc_op,
 id=prmPostgresql_monitor_1,
 magic=0:0;26:12:0:acf9a2a3-307c-460b-b786-fc20e6b8aad5, cib=0.37.9) :
 Resource op removal
 }}}
 
  (3) The second transition is calculated after the abort (pe-input-161),
  which results in initiating pre_notify_demote again.

If the demote didn't complete (or wasn't even attempted), then we must send the 
pre_notify_demote again unfortunately.
The real bug may well be that the transition shouldn't have been aborted.

 {{{
 227 Jan 30 16:09:01 rhel64-1 pengine[8142]:   notice:
 process_pe_message: Calculated Transition 15:
 /var/lib/pacemaker/pengine/pe-input-161.bz2
 229 Jan 30 16:09:01 rhel64-1 crmd[8143]:   notice: te_rsc_command:
 Initiating action 78: notify prmPostgresql_pre_notify_demote_0 on
 rhel64-1 (local)
 232 Jan 30 16:09:01 rhel64-1 crmd[8143]:   notice: te_rsc_command:
 Initiating action 80: notify prmPostgresql_pre_notify_demote_0 on
 rhel64-2
 }}}
 
 I think that the transition abort at (2) should not happen.
 
 Regards,
 -- 
 Keisuke MORI
  logs-pre-notify-20140206.tar.bz2


Re: [Pacemaker] [Patch] Information of "Connectivity is lost" is not displayed

2014-02-17 Thread renayama19661014
Hi Andrew,

Thank you for comments.

 can I see the config of yours that crm_mon is not displaying correctly?

It is displayed as follows.
-
[root@srv01 tmp]# crm_mon -1 -Af   
Last updated: Tue Feb 18 19:51:04 2014
Last change: Tue Feb 18 19:48:55 2014 via cibadmin on srv01
Stack: corosync
Current DC: srv01 (3232238180) - partition WITHOUT quorum
Version: 1.1.10-9d39a6b
1 Nodes configured
5 Resources configured


Online: [ srv01 ]

Clone Set: clnPingd [prmPingd]
 Started: [ srv01 ]

Node Attributes:
* Node srv01:
+ default_ping_set  : 0 

Migration summary:
* Node srv01: 

-

I uploaded the log to the following location (trac2781.zip):

 * https://skydrive.live.com/?cid=3A14D57622C66876id=3A14D57622C66876%21117

Best Regards,
Hideo Yamauchi.


--- On Tue, 2014/2/18, Andrew Beekhof and...@beekhof.net wrote:

 
 On 18 Feb 2014, at 12:19 pm, renayama19661...@ybb.ne.jp wrote:
 
  Hi Andrew,
  
  I'm confused... that patch seems to be the reverse of yours.
  Are you saying that we need to undo Lars' one?
  
  No, I do not understand the meaning of the correction of Mr. Lars.
 
 
 name, multiplier and host_list are all resource parameters, not meta 
 attributes.
 so lars' patch should be correct.
 
 can I see the config of yours that crm_mon is not displaying correctly?
 
  
  However, as now, crm_mon does not display a right attribute.
  Possibly did you not discuss the correction to put meta data in 
  rsc-parameters with Mr. Lars? Or Mr. David?
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Tue, 2014/2/18, Andrew Beekhof and...@beekhof.net wrote:
  
  
  On 17 Feb 2014, at 5:43 pm, renayama19661...@ybb.ne.jp wrote:
  
  Hi All,
  
  The next change was accomplished by Mr. Lars.
  
  https://github.com/ClusterLabs/pacemaker/commit/6a17c003b0167de9fe51d5330fb6e4f1b4ffe64c
  
  I'm confused... that patch seems to be the reverse of yours.
  Are you saying that we need to undo Lars' one?
  
  
  I may lack the correction of other parts which are not the patch which I 
  sent.
  
  Best Regards,
  Hideo Yamauchi.
  
  --- On Mon, 2014/2/17, renayama19661...@ybb.ne.jp 
  renayama19661...@ybb.ne.jp wrote:
  
  Hi All,
  
  The crm_mon tool which is attached to Pacemaker1.1 seems to have a 
  problem.
  I send a patch.
  
  Best Regards,
  Hideo Yamauchi.
  


Re: [Pacemaker] stopped resource was judged to be active

2014-02-17 Thread Kazunori INOUE
2014-02-18 10:43 GMT+09:00 Andrew Beekhof and...@beekhof.net:

 On 10 Feb 2014, at 5:28 pm, Kazunori INOUE kazunori.ino...@gmail.com wrote:

 Hi,

 Pacemaker stopped, but it was judged that a resource was active.
 I put crm_report here.
 https://drive.google.com/file/d/0B9eNn1AWfKD4S29JWk1ldUJJNGs/edit?usp=sharing

 [Steps to reproduce]
 1) start up the cluster

 Stack: corosync
 Current DC: bl460g1n7 (3232261593) - partition with quorum
 Version: 1.1.10-21de3a0
 2 Nodes configured
 34 Resources configured


 Online: [ bl460g1n6 bl460g1n7 ]

 Full list of resources:
 ...snip...


 * election-attrd exists in bl460g1n7.
 Feb  4 14:06:38 bl460g1n7 attrd[28811]: info: election_complete:
 Election election-attrd complete


 2) banish election-attrd from DC node
 I suppose that it is a condition that there are DC and election-attrd
 in a different node.

 [bl460g1n7]$ pkill -9 attrd
 Feb  4 14:07:15 bl460g1n6 attrd[16927]: info: election_complete:
 Election election-attrd complete


 3) stop DC ( after making a resource fail )
 [bl460g1n7]$ stop pacemaker.combined
 Feb  4 14:09:39 bl460g1n7 crmd[28813]:   notice: process_lrm_event:
 LRM operation prmClone9_stop_0 (call=150, rc=0, cib-update=98,
 confirmed=true) ok

  There are cases where <= 1.1.11 could lose resource updates like this.
  The subsequent behaviour by pacemaker (fencing the node) is correct but 
  clearly suboptimal.
 
  Happily, the same code that improves the CIB's performance also makes this 
  impossible.
  So you should find this problem gone if you try with the current git 
  master.


OK, I'll try.
Thanks.

 :
 Feb  4 14:09:39 bl460g1n7 pacemakerd[28803]: info: main: Exiting 
 pacemakerd
 Feb  4 14:09:39 bl460g1n7 pacemakerd[28803]: info:
 crm_xml_cleanup: Cleaning up memory from libxml2

 * pacemaker of bl460g1n7 stopped normally, but bl460g1n6 judged that a
  resource was active.
 Feb  4 14:09:41 bl460g1n6 pengine[16928]:  warning: pe_fence_node:
 Node bl460g1n7 will be fenced because prmClone9:0 is thought to be
 active there


 Best regards,
 Kazunori INOUE
