Re: [Pacemaker] Cleanup over secondary node
Hi Andrew.

On Monday, 15 April 2013 14:36:48 +1000, Andrew Beekhof wrote:

>> I'm testing a Pacemaker+Corosync cluster with KVM virtual machines.
>> When restarting a node, I got the following status:
>>
>> # crm status
>> Last updated: Sun Apr 14 11:50:00 2013
>> Last change: Sun Apr 14 11:49:54 2013
>> Stack: openais
>> Current DC: daedalus - partition with quorum
>> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
>> 2 Nodes configured, 2 expected votes
>> 8 Resources configured.
>>
>> Online: [ atlantis daedalus ]
>>
>>  Resource Group: servicios
>>      fs_drbd_servicios  (ocf::heartbeat:Filesystem):    Started daedalus
>>      clusterIP          (ocf::heartbeat:IPaddr2):       Started daedalus
>>      Mysql              (ocf::heartbeat:mysql):         Started daedalus
>>      Apache             (ocf::heartbeat:apache):        Started daedalus
>>      Pure-FTPd          (ocf::heartbeat:Pure-FTPd):     Started daedalus
>>      Asterisk           (ocf::heartbeat:asterisk):      Started daedalus
>>  Master/Slave Set: drbd_serviciosClone [drbd_servicios]
>>      Masters: [ daedalus ]
>>      Slaves: [ atlantis ]
>>
>> Failed actions:
>>     Asterisk_monitor_0 (node=atlantis, call=12, rc=5, status=complete):
>>     not installed
>>
>> The problem is that if I do a cleanup of the Asterisk resource on the
>> secondary, this has no effect. It seems that Pacemaker needs to have
>> access to the resource's config file.
>
> Not Pacemaker, the resource agent. Pacemaker runs a non-recurring
> monitor operation to see what state the service is in; it seems the
> asterisk agent needs that config file.
>
> I'd suggest changing the agent so that if the asterisk process is not
> running, the agent returns 7 (not running) before trying to access the
> config file.
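[A hedged sketch of the change Andrew describes: check the process first in the agent's monitor action and return OCF_NOT_RUNNING (7) before validating the configuration file. The function and config handling below are illustrative only, not the actual code of the heartbeat asterisk agent.]

```shell
#!/bin/sh
# Illustrative sketch -- NOT the real ocf:heartbeat:asterisk agent.
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7

asterisk_monitor() {
    # Hypothetical config path parameter; the real agent derives this
    # from OCF_RESKEY_* environment variables.
    config="$1"

    # Check the process first: on a standby node where the shared
    # filesystem (and thus the config) is absent, a probe then reports
    # "not running" (7) instead of "not installed" (5).
    if ! pgrep -x asterisk >/dev/null 2>&1; then
        return $OCF_NOT_RUNNING
    fi

    # Only once the daemon is known to be running do we insist on
    # the configuration file being readable.
    if [ ! -r "$config" ]; then
        return $OCF_ERR_INSTALLED
    fi

    return $OCF_SUCCESS
}

asterisk_monitor /etc/asterisk/asterisk.conf
echo "monitor rc=$?"
```

[With this ordering, the probe on atlantis would come back as rc=7 and the resource would simply show as stopped there, rather than producing the "not installed" failed action.]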
I was reviewing the resource definition, assuming I might have made some
reference to the Asterisk configuration file there, but this was not the
case:

primitive Asterisk ocf:heartbeat:asterisk \
        params realtime=true \
        op monitor interval=60s \
        meta target-role=Started

This agent is the one available in the resource-agents package from the
Debian Backports repository:

atlantis:~# aptitude show resource-agents
Package: resource-agents
New: yes
State: installed
Automatically installed: yes
Version: 1:3.9.2-5~bpo60+1
Priority: optional
Section: admin
Maintainer: Debian HA Maintainers <debian-ha-maintain...@lists.alioth.debian.org>
Uncompressed Size: 2,228 k
Depends: libc6 (>= 2.4), libglib2.0-0 (>= 2.12.0), libnet1 (>= 1.1.2.1),
         libplumb2, libplumbgpl2, cluster-glue, python
Conflicts: cluster-agents (<= 1:1.0.4-1), rgmanager (<= 3.0.12-2+b1)
Replaces: cluster-agents (<= 1:1.0.4-1), rgmanager (<= 3.0.12-2+b1)
Description: Cluster Resource Agents
 The Cluster Resource Agents are a set of scripts to interface with several
 services to operate in a High Availability environment for both Pacemaker
 and rgmanager resource managers.
Homepage: https://github.com/ClusterLabs/resource-agents

Do you know if there is any way to get the behaviour you suggested using
this agent?

Thanks for your reply.

Regards,
Daniel

--
Ing. Daniel Bareiro - GNU/Linux registered user #188.598
Proudly running Debian GNU/Linux with uptime:
21:54:06 up 52 days, 6:01, 11 users, load average: 0.00, 0.02, 0.00

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
[Pacemaker] Cleanup over secondary node
Hi all!

I'm testing a Pacemaker+Corosync cluster with KVM virtual machines. When
restarting a node, I got the following status:

# crm status
Last updated: Sun Apr 14 11:50:00 2013
Last change: Sun Apr 14 11:49:54 2013
Stack: openais
Current DC: daedalus - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
8 Resources configured.

Online: [ atlantis daedalus ]

 Resource Group: servicios
     fs_drbd_servicios  (ocf::heartbeat:Filesystem):    Started daedalus
     clusterIP          (ocf::heartbeat:IPaddr2):       Started daedalus
     Mysql              (ocf::heartbeat:mysql):         Started daedalus
     Apache             (ocf::heartbeat:apache):        Started daedalus
     Pure-FTPd          (ocf::heartbeat:Pure-FTPd):     Started daedalus
     Asterisk           (ocf::heartbeat:asterisk):      Started daedalus
 Master/Slave Set: drbd_serviciosClone [drbd_servicios]
     Masters: [ daedalus ]
     Slaves: [ atlantis ]

Failed actions:
    Asterisk_monitor_0 (node=atlantis, call=12, rc=5, status=complete):
    not installed

The problem is that if I do a cleanup of the Asterisk resource on the
secondary, this has no effect. It seems that Pacemaker needs to have
access to the resource's config file.
But this is not available, because it lives on the DRBD device, which is
only mounted on the primary:

Apr 14 11:58:06 atlantis cib: [1136]: info: apply_xml_diff: Digest mis-match: expected f6e4778e0ca9d8d681ba86acb83a6086, calculated ad03ff3e0622f60c78e8e1ece055bd63
Apr 14 11:58:06 atlantis cib: [1136]: notice: cib_process_diff: Diff 0.825.3 -> 0.825.4 not applied to 0.825.3: Failed application of an update diff
Apr 14 11:58:06 atlantis cib: [1136]: info: cib_server_process_diff: Requesting re-sync from peer
Apr 14 11:58:06 atlantis crmd: [1141]: info: delete_resource: Removing resource Asterisk for 3141_crm_resource (internal) on atlantis
Apr 14 11:58:06 atlantis crmd: [1141]: info: notify_deleted: Notifying 3141_crm_resource on atlantis that Asterisk was deleted
Apr 14 11:58:06 atlantis crmd: [1141]: WARN: decode_transition_key: Bad UUID (crm-resource-3141) in sscanf result (3) for 0:0:crm-resource-3141
Apr 14 11:58:06 atlantis crmd: [1141]: info: ais_dispatch_message: Membership 1616: quorum retained
Apr 14 11:58:06 atlantis lrmd: [1138]: info: rsc:Asterisk probe[13] (pid 3144)
Apr 14 11:58:06 atlantis asterisk[3144]: ERROR: Config /etc/asterisk/asterisk.conf doesn't exist
Apr 14 11:58:06 atlantis lrmd: [1138]: info: operation monitor[13] on Asterisk for client 1141: pid 3144 exited with return code 5
Apr 14 11:58:06 atlantis crmd: [1141]: info: process_lrm_event: LRM operation Asterisk_monitor_0 (call=13, rc=5, cib-update=40, confirmed=true) not installed

Is there any way to remedy this situation?

Thanks in advance for your reply.

Regards,
Daniel

--
Ing. Daniel Bareiro - GNU/Linux registered user #188.598
Proudly running Debian GNU/Linux with uptime:
11:46:23 up 49 days, 19:53, 12 users, load average: 0.00, 0.01, 0.00
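[For reference, the cleanup attempt described in this thread would typically be issued with one of the following commands. This is a hedged fragment, shown only for context; it can run only against a live cluster, and the per-node form assumes the crm shell and crm_resource options available in the Pacemaker 1.1.x generation used here.]

```shell
# crm shell form: cleanup on all nodes, or limited to one node
crm resource cleanup Asterisk
crm resource cleanup Asterisk atlantis

# lower-level equivalent with crm_resource
crm_resource --cleanup --resource Asterisk --node atlantis
```

[Either form clears the failed-action history and triggers a fresh probe; as the thread shows, that re-probe fails again with rc=5 for as long as the agent insists on a config file that only exists where the DRBD filesystem is mounted.]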
Re: [Pacemaker] Problems with Pacemaker + Corosync after reboot
On Wednesday, 22 December 2010 08:29:02 -0500, Shravan Mishra wrote:

> Hi,

Hi, Shravan.

> What's happening is that corosync is forking but the exec is not
> happening.

And do you think that what is shown in the logs is consistent with what
is shown using ps?

> I used to see this problem in my case when the syslog-ng process was
> not running. Try checking that and starting it, and then start
> corosync.

Now I see that if I do a shutdown of the node that has the resource
(failover-ip), it does not migrate to the other node. Before the test I
made sure that Pacemaker + Corosync were functioning correctly on both
nodes, and only then did a shutdown of Atlantis.

Before making a shutdown of Atlantis:

---
daedalus:~# crm_mon --one-shot
Last updated: Thu Dec 23 19:24:09 2010
Stack: openais
Current DC: atlantis - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.

Online: [ atlantis daedalus ]

 failover-ip    (ocf::heartbeat:IPaddr):        Started atlantis
---

After doing a shutdown of Atlantis:

---
daedalus:~# crm_mon --one-shot
Last updated: Thu Dec 23 19:25:44 2010
Stack: openais
Current DC: daedalus - partition WITHOUT quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.

Online: [ daedalus ]
OFFLINE: [ atlantis ]
---

Here I'm using a configuration like the one presented in the wiki [1].

I also noticed that after Atlantis boots again, corosync forks without
exec (as we concluded from what I showed in the previous mail), and only
then does the resource migrate to Daedalus:

---
daedalus:~# crm_mon --one-shot
Last updated: Thu Dec 23 19:49:11 2010
Stack: openais
Current DC: daedalus - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.
Online: [ daedalus ]
OFFLINE: [ atlantis ]

 failover-ip    (ocf::heartbeat:IPaddr):        Started daedalus
---

---
atlantis:~# crm_mon --one-shot
Connection to cluster failed: connection failed
---

I tried doing a corosync stop, but the processes are not closed:

atlantis:~# ps auxf
[...]
root      1564  0.0  1.2 168144  3240 ?        S    19:38   0:00 /usr/sbin/corosync
root      1565  0.0  1.2 168144  3240 ?        S    19:38   0:00 /usr/sbin/corosync
root      1566  0.0  1.2 168144  3240 ?        S    19:38   0:00 /usr/sbin/corosync
root      1567  0.0  1.2 168144  3240 ?        S    19:38   0:00 /usr/sbin/corosync
root      1568  0.0  1.2 168144  3240 ?        S    19:38   0:00 /usr/sbin/corosync
root      1569  0.0  1.2 168144  3240 ?        S    19:38   0:00 /usr/sbin/corosync

The only way I found to correctly start corosync is doing a pkill -9
corosync and then corosync start:

atlantis:~# ps auxf
[...]
root      2120  0.2  1.9 134288  5060 ?        Ssl  19:59   0:00 /usr/sbin/corosync
root      2128  0.0  4.5  76028 11600 ?        SLs  19:59   0:00  \_ /usr/lib/heartbeat/stonithd
105       2129  0.1  2.0  79104  5120 ?        S    19:59   0:00  \_ /usr/lib/heartbeat/cib
root      2130  0.0  0.8  71580  2108 ?        S    19:59   0:00  \_ /usr/lib/heartbeat/lrmd
105       2131  0.0  1.3  79968  3340 ?        S    19:59   0:00  \_ /usr/lib/heartbeat/attrd
105       2132  0.0  1.1  80332  2892 ?        S    19:59   0:00  \_ /usr/lib/heartbeat/pengine
105       2133  0.0  1.4  86216  3764 ?        S    19:59   0:00  \_ /usr/lib/heartbeat/crmd

After this, the resource automatically migrates back to Atlantis:

---
daedalus:~# crm_mon --one-shot
Last updated: Thu Dec 23 20:03:18 2010
Stack: openais
Current DC: daedalus - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
2 Nodes configured, 2 expected votes
1 Resources configured.

Online: [ atlantis daedalus ]

 failover-ip    (ocf::heartbeat:IPaddr):        Started atlantis
---

Any idea how to fix this problem with Corosync? Why, when I do a
shutdown of Atlantis, does the resource not migrate to Daedalus?

Thanks for your reply.

Regards,
Daniel

[1] http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo

--
Daniel Bareiro - GNU/Linux registered user #188.598
Proudly running Debian GNU/Linux
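[The manual recovery described above can be condensed into a small detection script. This is a sketch of the workaround only, not a fix for the underlying fork-without-exec problem; the threshold and the Debian init-script path in the printed commands are assumptions, and the sketch deliberately prints the recovery commands instead of killing anything itself.]

```shell
#!/bin/sh
# Detect the symptom from the ps output above: several lingering
# /usr/sbin/corosync PIDs that never exec'd into a working daemon.
count=$(pgrep -cx corosync)

if [ "${count:-0}" -gt 1 ]; then
    # More than one long-lived corosync process: print the manual
    # recovery steps used in the thread (run them by hand).
    echo "found $count corosync processes; recovery would be:"
    echo "  pkill -9 -x corosync"
    echo "  /etc/init.d/corosync start   # init-script path assumed (Debian)"
else
    echo "corosync process count is ${count:-0}; nothing to do"
fi
```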