Short description
-----------------
Corosync ignores my resources order settings.

Final goal
----------
Being able to run Zimbra in HA.

Description of the system
-------------------------
This is an Ubuntu 10.04 LTS system, because the current stable Zimbra works on Ubuntu 10.04 and not yet on 12.04.

I've dist-upgraded packages from https://launchpad.net/~ubuntu-ha-maintainers/+archive/ppa as advised on some sites.

My main configuration is based on this document:
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

I've written some OCF resource agents of my own (for Zimbra and some network stuff) and I've already tested them with ocf-tester and ocf-tester-py (my own hack of ocf-tester that lets you test Python-based OCF scripts).

Finally, some package versions:

libcrmcluster1    1.1.6-2ubuntu0~ppa2
libcrmcommon2     1.1.6-2ubuntu0~ppa2
corosync          1.4.2-1ubuntu0~ppa1
libcorosync4      1.4.2-1ubuntu0~ppa1
lvm2              2.02.54-1ubuntu4.1ppa5
pacemaker         1.1.6-2ubuntu0~ppa2
libglib2.0-0      2.24.1-0ubuntu1.1~ppa1
cluster-glue      1.0.8-2ubuntu0~ppa4
libcluster-glue   1.0.8-2ubuntu0~ppa4
resource-agents   1:3.9.2-4ubuntu0~ppa2

crm configure show output:
--------------------------
adrian@zhatest-01:~$ sudo crm configure show
node zhatest-01.domain.com
node zhatest-02.domain.com
primitive ClusterDefaultRoute ocf:btactic:OVHdefaultroute \
        op monitor interval="30s"
primitive ClusterHostRoute ocf:btactic:OVHhostroute \
        params device="eth0" \
        op monitor interval="30s"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params nic="eth0" ip="1.2.3.4" cidr_netmask="32" broadcast="1.2.3.4" \
        op monitor interval="30s"
primitive ClusterOVHFailover ocf:btactic:OVHfailover \
        op monitor interval="120s" timeout="60s" \
        op start interval="0" timeout="660" \
        op stop interval="0" timeout="660" \
        params nichandle="MYLOGIN" password="MYSECRET" failover="1.2.3.4" \
        meta target-role="Started"
primitive ZimbraData ocf:linbit:drbd \
        params drbd_resource="zimbradata" \
        op monitor interval="60s" role="Master" \
        op monitor interval="50s" role="Slave" \
        op start interval="0" role="Master" timeout="240" \
        op start interval="0" role="Slave" timeout="240" \
        op stop interval="0" role="Master" timeout="100" \
        op stop interval="0" role="Slave" timeout="100"
primitive ZimbraFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/zimbradata" directory="/opt/zimbra" fstype="ext4" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"
primitive ZimbraServer ocf:btactic:zimbra \
        op monitor interval="2min" \
        op start interval="0" timeout="360s" \
        op stop interval="0" timeout="360s"
group MySystem ClusterOVHFailover ClusterIP ClusterHostRoute ClusterDefaultRoute
group MyZimbra ZimbraFS ZimbraServer
ms ZimbraDataClone ZimbraData \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location prefer-zhatest-01 MyZimbra 50: zhatest-01.domain.com
colocation everything-together inf: MySystem ZimbraDataClone:Master MyZimbra
order everything-ordered inf: MySystem ZimbraDataClone:promote MyZimbra
property $id="cib-bootstrap-options" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

crm_mon -orVVVV1 output:
------------------------
crm_mon[4215]: 2012/08/18_19:46:39 info: main: Starting crm_mon
crm_mon[4215]: 2012/08/18_19:46:39 info: unpack_config: Startup probes: enabled
crm_mon[4215]: 2012/08/18_19:46:39 notice: unpack_config: On loss of CCM Quorum: Ignore
crm_mon[4215]: 2012/08/18_19:46:39 info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
crm_mon[4215]: 2012/08/18_19:46:39 info: unpack_domains: Unpacking domains
crm_mon[4215]: 2012/08/18_19:46:39 info: determine_online_status: Node zhatest-01.domain.com is online
crm_mon[4215]: 2012/08/18_19:46:39 notice: unpack_rsc_op: Hard error - ZimbraServer_last_failure_0 failed with rc=5: Preventing ZimbraServer from re-starting on zhatest-01.domain.com
============
Last updated: Sat Aug 18 19:46:39 2012
Last change: Sat Aug 18 18:09:51 2012 via crmd on zhatest-01.domain.com
Stack: openais
Current DC: zhatest-01.domain.com - partition WITHOUT quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
8 Resources configured.
============

Online: [ zhatest-01.domain.com ]
OFFLINE: [ zhatest-02.domain.com ]

Full list of resources:

 Resource Group: MySystem
     ClusterOVHFailover (ocf::btactic:OVHfailover): Stopped
     ClusterIP (ocf::heartbeat:IPaddr2): Stopped
     ClusterHostRoute (ocf::btactic:OVHhostroute): Stopped
     ClusterDefaultRoute (ocf::btactic:OVHdefaultroute): Stopped
 Resource Group: MyZimbra
     ZimbraFS (ocf::heartbeat:Filesystem): Stopped
     ZimbraServer (ocf::btactic:zimbra): Stopped
 Master/Slave Set: ZimbraDataClone [ZimbraData]
     Slaves: [ zhatest-01.domain.com ]
     Stopped: [ ZimbraData:1 ]

Operations:
* Node zhatest-01.domain.com:
   ZimbraData:0: migration-threshold=1000000
    + (9) start: rc=0 (ok)
    + (11) monitor: interval=50000ms rc=0 (ok)
   ZimbraServer: migration-threshold=1000000
    + (7) probe: rc=5 (not installed)

Failed actions:
    ZimbraServer_monitor_0 (node=zhatest-01.domain.com, call=7, rc=5, status=complete): not installed

Long description:
-----------------
I expect the system to try to start resources in the following order:

MySystem ZimbraDataClone:Master MyZimbra

which, after expanding group members, is:

ClusterOVHFailover ClusterIP ClusterHostRoute \
ClusterDefaultRoute ZimbraDataClone:Master \
ZimbraFS ZimbraServer

If crm_mon -o shows the operation history, then according to the output above corosync insists on starting ZimbraData first, and I don't want that.

So, that's it. Am I missing something? If you need more logs, don't hesitate to ask for them.

Thank you!

Other questions
---------------
Where is the probe operation that appears in the crm_mon output documented?

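To make the expectation from the long description concrete, here is a minimal Python sketch that expands the groups named in the `everything-ordered` constraint into a flat sequence. The group membership and the constraint entries are copied from the configuration above; the names `GROUPS`, `ORDER` and `expand_order` are made up for this illustration. It only models the order I expect, not how Pacemaker actually schedules actions.

```python
# Group membership, copied from the `crm configure show` output.
GROUPS = {
    "MySystem": [
        "ClusterOVHFailover",
        "ClusterIP",
        "ClusterHostRoute",
        "ClusterDefaultRoute",
    ],
    "MyZimbra": ["ZimbraFS", "ZimbraServer"],
}

# Entries of the `everything-ordered` constraint, left to right.
ORDER = ["MySystem", "ZimbraDataClone:promote", "MyZimbra"]


def expand_order(order, groups):
    """Replace each group name with its members; keep other entries as-is."""
    flat = []
    for entry in order:
        # Entries that are not group names (e.g. the ms promote action)
        # are passed through unchanged.
        flat.extend(groups.get(entry, [entry]))
    return flat


if __name__ == "__main__":
    for position, step in enumerate(expand_order(ORDER, GROUPS), start=1):
        print(f"{position}. {step}")
```

Note that the constraint as written orders the promote action of ZimbraDataClone, so the sketch lists `ZimbraDataClone:promote` rather than the start of the ZimbraData instances.
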
P.S.: This unanswered email is very similar to my issue:
http://lists.linux-ha.org/pipermail/linux-ha/2011-May/043144.html

--
Adrián Gibanel
I.T. Manager
+34 675 683 301
www.btactic.com