Hi,

I'm puppetizing resource deployment for Pacemaker and Corosync, and as part of that I create a resource on one of the three nodes of a cluster. The problem is that I see RecurringOp errors after resource creation, which are probably preventing the resource from failing over. Resource creation itself seems to go through fine, but these RecurringOp errors always appear afterwards (I'm pasting the output of two different commands below):
***************************
vagrant@precise64b:/vagrant/puppet-environments/modules/f5_lbaas/tests$ sudo crm status
============
Last updated: Wed Jul 2 03:52:30 2014
Last change: Wed Jul 2 03:38:20 2014 via cibadmin on precise64b
Stack: cman
Current DC: precise64b - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
3 Nodes configured, unknown expected votes
3 Resources configured.
============

Online: [ precise64b precise64c precise64a ]

f5-lbaas-agent-10.6.143.121_resource (lsb:f5-lbaas-agent-10.6.143.121): Started precise64c
f5-lbaas-agent-10.6.143.122_resource (lsb:f5-lbaas-agent-10.6.143.122): Started precise64b
f5-lbaas-agent-10.6.143.123_resource (lsb:f5-lbaas-agent-10.6.143.123): Started precise64b

Failed actions:
f5-lbaas-agent-10.6.143.120_resource_monitor_0 (node=precise64b, call=2, rc=5, status=complete): not installed
f5-lbaas-agent-10.6.143.121_resource_monitor_0 (node=precise64b, call=3, rc=5, status=complete): not installed
f5-lbaas-agent-10.6.143.122_resource_monitor_0 (node=precise64c, call=7, rc=5, status=complete): not installed
f5-lbaas-agent-10.6.143.123_resource_monitor_0 (node=precise64c, call=8, rc=5, status=complete): not installed
f5-lbaas-agent-10.6.143.120_resource_monitor_0 (node=precise64a, call=2, rc=5, status=complete): not installed
f5-lbaas-agent-10.6.143.121_resource_monitor_0 (node=precise64a, call=3, rc=5, status=complete): not installed
f5-lbaas-agent-10.6.143.122_resource_monitor_0 (node=precise64a, call=4, rc=5, status=complete): not installed
f5-lbaas-agent-10.6.143.123_resource_monitor_0 (node=precise64a, call=5, rc=5, status=complete): not installed
vagrant@precise64b:/vagrant/puppet-environments/modules/f5_lbaas/tests$
***************************
vagrant@precise64b:/vagrant/puppet-environments/modules/f5_lbaas/tests$ sudo crm_verify -L -V
crm_verify[15183]: 2014/07/02_03:39:13 ERROR: RecurringOp: Invalid recurring action f5-lbaas-agent-10.6.143.121_resource-start-10 wth name: 'start'
crm_verify[15183]: 2014/07/02_03:39:13 ERROR: RecurringOp: Invalid recurring action f5-lbaas-agent-10.6.143.121_resource-stop-10 wth name: 'stop'
crm_verify[15183]: 2014/07/02_03:39:13 ERROR: RecurringOp: Invalid recurring action f5-lbaas-agent-10.6.143.122_resource-start-10 wth name: 'start'
crm_verify[15183]: 2014/07/02_03:39:13 ERROR: RecurringOp: Invalid recurring action f5-lbaas-agent-10.6.143.122_resource-stop-10 wth name: 'stop'
Errors found during check: config not valid
vagrant@precise64b:/vagrant/puppet-environments/modules/f5_lbaas/tests$
***************************

What do these errors signify? I found one email exchange on a Pacemaker mailing list suggesting that we shouldn't set intervals and timeouts on the start operation, and likewise on stop, since that would mean Pacemaker attempts to restart the resource every x seconds, times out every y seconds, and repeats. (Link: http://lists.linbit.com/pipermail/drbd-user/2011-September/016938.html) My understanding was that the start interval would only apply to restart attempts after a resource is detected as down.

Nevertheless, I removed these parameters and created a third resource (the first two were created with these parameters), and I still see the same monitor-related errors for the third resource (f5-lbaas-agent-10.6.143.123_resource_monitor_0) in the sudo crm status output. What I don't understand, however, is why this resource doesn't show up in the crm_verify -L -V output.
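If the reading in that thread is right, the check behind the RecurringOp messages amounts to this: start and stop (and, as far as I understand, promote and demote) may not carry a non-zero interval, while a timeout with interval 0 is fine. Below is a minimal sketch of that rule as a pre-flight check. The helper name check_op is hypothetical, it is not part of crmsh or Pacemaker, and it assumes intervals are plain seconds with an optional "s" suffix. It only inspects its arguments and never calls crm.

```shell
#!/bin/sh
# Hypothetical helper (not a crmsh/Pacemaker command).
# Returns 0 if the operation definition looks acceptable, 1 if it is a
# start/stop/promote/demote with a non-zero interval, which is the case
# crm_verify appears to report as "Invalid recurring action".
# Assumption: interval is a number of seconds, optionally suffixed with "s".
check_op() {
    op_name=$1
    op_interval=${2:-0}
    op_seconds=${op_interval%s}   # "10s" -> "10", "0" -> "0"
    case "$op_name" in
        start|stop|promote|demote)
            if [ "$op_seconds" -ne 0 ]; then
                echo "invalid: '$op_name' must not have a non-zero interval ($op_interval)"
                return 1
            fi
            ;;
    esac
    echo "ok: $op_name interval=$op_interval"
    return 0
}

check_op monitor 30s
check_op start 0
check_op start 10 || true   # flagged as invalid; "|| true" keeps the script going
```

Under this sketch, the start-10 and stop-10 operations named in the crm_verify output would be flagged, while the same operations with interval 0 would pass.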
Here are the two commands I use to create the resources. The first sets start and stop parameters; the second sets only the monitor operation:

sudo crm configure primitive $pmk_res_name $pmk_cont_type:$service_name \
    op monitor interval="$mon_interval" timeout="$mon_timeout" \
    op start interval="$start_interval" timeout="$start_timeout" \
    op stop interval="$stop_interval" timeout="$stop_timeout"

sudo crm configure primitive $pmk_res_name $pmk_cont_type:$service_name \
    op monitor interval="$mon_interval" timeout="$mon_timeout"

The bottom line is that if I halt the VM running any of these resources, the resource does not fail over to another VM. I'm not sure what the exact cause is. Any help would be greatly appreciated!

Thanks,
Regards,
Vijay
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org