Re: [Pacemaker] Cleanup over secondary node

2013-04-16 Thread Daniel Bareiro
Ho Andrew.

On Monday, 15 April 2013 14:36:48 +1000,
Andrew Beekhof wrote:

  I'm testing Pacemaker+Corosync cluster with KVM virtual machines. When
  restarting a node, I got the following status:
  
  # crm status
  
  Last updated: Sun Apr 14 11:50:00 2013
  Last change: Sun Apr 14 11:49:54 2013
  Stack: openais
  Current DC: daedalus - partition with quorum
  Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
  2 Nodes configured, 2 expected votes
  8 Resources configured.
  
  
  Online: [ atlantis daedalus ]
  
  Resource Group: servicios
  fs_drbd_servicios  (ocf::heartbeat:Filesystem):Started daedalus
  clusterIP  (ocf::heartbeat:IPaddr2):   Started daedalus
  Mysql  (ocf::heartbeat:mysql): Started daedalus
  Apache (ocf::heartbeat:apache):Started daedalus
  Pure-FTPd  (ocf::heartbeat:Pure-FTPd): Started daedalus
  Asterisk   (ocf::heartbeat:asterisk):  Started daedalus
  Master/Slave Set: drbd_serviciosClone [drbd_servicios]
  Masters: [ daedalus ]
  Slaves: [ atlantis ]
  
  Failed actions:
 Asterisk_monitor_0 (node=atlantis, call=12, rc=5, status=complete): not 
  installed
  
  
  The problem is that if I do a cleanup of the Asterisk resource in the
  secondary, this has no effect. It seems to be Paceemaker needs to have
  access to the config file to the resource.

 Not Pacemaker, the resource agent.
 Pacemaker runs a non-recurring monitor operation to see what state the
 service is in, it seems the asterisk agent needs that config file.
 
 I'd suggest changing the agent so that if the asterisk process is not
 running, the agent returns 7 (not running) before trying to access the
 config file.

I was reviewing the resource definition assuming there I might have made
some reference to the Asterisk configuration file, but this was not the
case:

primitive Asterisk ocf:heartbeat:asterisk \
params realtime=true \
op monitor interval=60s \
meta target-role=Started

This agent is the one that is available in the resource-agents package
from Debian Backports repository:

atlantis:~# aptitude show resource-agents
Paquete: resource-agents
Nuevo: sí
Estado: instalado
Instalado automáticamente: sí
Versión: 1:3.9.2-5~bpo60+1
Prioridad: opcional
Sección: admin
Desarrollador: Debian HA Maintainers 
debian-ha-maintain...@lists.alioth.debian.org
Tamaño sin comprimir: 2.228 k
Depende de: libc6 (= 2.4), libglib2.0-0 (= 2.12.0), libnet1 (= 1.1.2.1), 
libplumb2, libplumbgpl2, cluster-glue, python
Tiene conflictos con: cluster-agents (= 1:1.0.4-1), rgmanager (= 3.0.12-2+b1)
Reemplaza: cluster-agents (= 1:1.0.4-1), rgmanager (= 3.0.12-2+b1)
Descripción: Cluster Resource Agents
 The Cluster Resource Agents are a set of scripts to interface with several 
services to operate in a High Availability environment for both Pacemaker and
 rgmanager resource managers.
Página principal: https://github.com/ClusterLabs/resource-agents




Do you know if there is any way to get the behavior that you suggested
me using this agent?


Thanks for your reply.


Regards,
Daniel
-- 
Ing. Daniel Bareiro - GNU/Linux registered user #188.598
Proudly running Debian GNU/Linux with uptime:
21:54:06 up 52 days,  6:01, 11 users,  load average: 0.00, 0.02, 0.00


signature.asc
Description: Digital signature
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] Cleanup over secondary node

2013-04-14 Thread Daniel Bareiro

Hi all!

I'm testing Pacemaker+Corosync cluster with KVM virtual machines. When
restarting a node, I got the following status:

# crm status

Last updated: Sun Apr 14 11:50:00 2013
Last change: Sun Apr 14 11:49:54 2013
Stack: openais
Current DC: daedalus - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
8 Resources configured.


Online: [ atlantis daedalus ]

 Resource Group: servicios
 fs_drbd_servicios  (ocf::heartbeat:Filesystem):Started daedalus
 clusterIP  (ocf::heartbeat:IPaddr2):   Started daedalus
 Mysql  (ocf::heartbeat:mysql): Started daedalus
 Apache (ocf::heartbeat:apache):Started daedalus
 Pure-FTPd  (ocf::heartbeat:Pure-FTPd): Started daedalus
 Asterisk   (ocf::heartbeat:asterisk):  Started daedalus
 Master/Slave Set: drbd_serviciosClone [drbd_servicios]
 Masters: [ daedalus ]
 Slaves: [ atlantis ]

Failed actions:
Asterisk_monitor_0 (node=atlantis, call=12, rc=5, status=complete): not 
installed


The problem is that if I do a cleanup of the Asterisk resource in the
secondary, this has no effect. It seems to be Paceemaker needs to have
access to the config file to the resource. But this is not available,
because it is mounted on the DRBD device that is accessible in the
primary:

Apr 14 11:58:06 atlantis cib: [1136]: info: apply_xml_diff: Digest mis-match: 
expected f6e4778e0ca9d8d681ba86acb83a6086, calculated 
ad03ff3e0622f60c78e8e1ece055bd63
Apr 14 11:58:06 atlantis cib: [1136]: notice: cib_process_diff: Diff 0.825.3 - 
0.825.4 not applied to 0.825.3: Failed application of an update diff
Apr 14 11:58:06 atlantis cib: [1136]: info: cib_server_process_diff: Requesting 
re-sync from peer
Apr 14 11:58:06 atlantis crmd: [1141]: info: delete_resource: Removing resource 
Asterisk for 3141_crm_resource (internal) on atlantis
Apr 14 11:58:06 atlantis crmd: [1141]: info: notify_deleted: Notifying 
3141_crm_resource on atlantis that Asterisk was deleted
Apr 14 11:58:06 atlantis crmd: [1141]: WARN: decode_transition_key: Bad UUID 
(crm-resource-3141) in sscanf result (3) for 0:0:crm-resource-3141
Apr 14 11:58:06 atlantis crmd: [1141]: info: ais_dispatch_message: Membership 
1616: quorum retained
Apr 14 11:58:06 atlantis lrmd: [1138]: info: rsc:Asterisk probe[13] (pid 3144)
Apr 14 11:58:06 atlantis asterisk[3144]: ERROR: Config 
/etc/asterisk/asterisk.conf doesn't exist
Apr 14 11:58:06 atlantis lrmd: [1138]: info: operation monitor[13] on Asterisk 
for client 1141: pid 3144 exited with return code 5
Apr 14 11:58:06 atlantis crmd: [1141]: info: process_lrm_event: LRM operation 
Asterisk_monitor_0 (call=13, rc=5, cib-update=40, confirmed=true) not installed


Is there any way to remedy this situation?


Thanks in advance for your reply.


Regards,
Daniel
-- 
Ing. Daniel Bareiro - GNU/Linux registered user #188.598
Proudly running Debian GNU/Linux with uptime:
11:46:23 up 49 days, 19:53, 12 users,  load average: 0.00, 0.01, 0.00


signature.asc
Description: Digital signature
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] Problems with Pacemaker + Corosync after reboot

2010-12-23 Thread Daniel Bareiro
On Wednesday, 22 December 2010 08:29:02 -0500,
Shravan Mishra wrote:

 Hi,

Hi, Shravan.

 What's happening is that corosync is forking but the exec is not
 happening.

And do you think that what is shown in the logs is consistent with what
is shown using ps?

 I used to see this problem in my case when syslog-ng process was not
 running.
 
 Try checking that and starting it and then start corosync.

Now I see that if I do a shutdown of the node that has the resource
(failover-ip), then this does not migrate to another node. By doing the
test I made sure Pacemaker + Corosync are functioning correctly on both
nodes before doing a shutdown of Atlantis.

Before making a shutdown of Atlantis:

---
daedalus:~# crm_mon --one-shot

Last updated: Thu Dec 23 19:24:09 2010
Stack: openais
Current DC: atlantis - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.


Online: [ atlantis daedalus ]

 failover-ip(ocf::heartbeat:IPaddr):Started atlantis
---

After doing a shutdown of Atlantis:

---
daedalus:~# crm_mon --one-shot

Last updated: Thu Dec 23 19:25:44 2010
Stack: openais
Current DC: daedalus - partition WITHOUT quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.


Online: [ daedalus ]
OFFLINE: [ atlantis ]
---

Here I'm using a configuration like the one presented in the wiki [1].

I am also noting that after the Atlantis launch, corosync makes the fork
without exec (as we assume from what I showed in the previous mail) and
only now is when the resource migrates to Daedalus:

---
daedalus:~# crm_mon --one-shot

Last updated: Thu Dec 23 19:49:11 2010
Stack: openais
Current DC: daedalus - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.


Online: [ daedalus ]
OFFLINE: [ atlantis ]

 failover-ip(ocf::heartbeat:IPaddr):Started daedalus
---


---
atlantis:~# crm_mon --one-shot

Connection to cluster failed: connection failed
---

I tried doing a corosync stop, but the processes are not closed:

atlantis:~# ps auxf
[...]
root  1564  0.0  1.2 168144  3240 ?S19:38   0:00 
/usr/sbin/corosync
root  1565  0.0  1.2 168144  3240 ?S19:38   0:00 
/usr/sbin/corosync
root  1566  0.0  1.2 168144  3240 ?S19:38   0:00 
/usr/sbin/corosync
root  1567  0.0  1.2 168144  3240 ?S19:38   0:00 
/usr/sbin/corosync
root  1568  0.0  1.2 168144  3240 ?S19:38   0:00 
/usr/sbin/corosync
root  1569  0.0  1.2 168144  3240 ?S19:38   0:00 
/usr/sbin/corosync


The only way I found to correctly start corosync is doing a pkill -9
corosync and corosync start:


atlantis:~# ps auxf
[...]
root  2120  0.2  1.9 134288  5060 ?Ssl  19:59   0:00 
/usr/sbin/corosync
root  2128  0.0  4.5  76028 11600 ?SLs  19:59   0:00  \_ 
/usr/lib/heartbeat/stonithd
105   2129  0.1  2.0  79104  5120 ?S19:59   0:00  \_ 
/usr/lib/heartbeat/cib
root  2130  0.0  0.8  71580  2108 ?S19:59   0:00  \_ 
/usr/lib/heartbeat/lrmd
105   2131  0.0  1.3  79968  3340 ?S19:59   0:00  \_ 
/usr/lib/heartbeat/attrd
105   2132  0.0  1.1  80332  2892 ?S19:59   0:00  \_ 
/usr/lib/heartbeat/pengine
105   2133  0.0  1.4  86216  3764 ?S19:59   0:00  \_ 
/usr/lib/heartbeat/crmd


After this, the resource automatically migrates back to Atlantis:

---
daedalus:~# crm_mon --one-shot

Last updated: Thu Dec 23 20:03:18 2010
Stack: openais
Current DC: daedalus - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.


Online: [ atlantis daedalus ]

 failover-ip(ocf::heartbeat:IPaddr):Started atlantis
---


Any idea how to fix this problem with Corosync?

Why to do a shutdown of Atlantis the resource does not migrate to
Daedalus?



Thanks for your reply.

Regards,
Daniel

[1] http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo
-- 
Daniel Bareiro - GNU/Linux registered user #188.598
Proudly