I think this should be fixed by: https://github.com/beekhof/pacemaker/commit/ea7991f
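A quick way to check whether a local source tree already contains the fix commits referenced in this thread — this is only a sketch, assuming the current directory is a git clone of the beekhof/pacemaker repository; anywhere else both commits will simply report "missing":

```shell
# Check whether the fix commits from this thread are ancestors of HEAD.
# Assumes a git clone of https://github.com/beekhof/pacemaker (hedged:
# outside such a clone, the short hashes won't resolve and report "missing").
status=""
for c in ea7991f d65b270; do
  if git merge-base --is-ancestor "$c" HEAD 2>/dev/null; then
    status="$status $c:present"
  else
    status="$status $c:missing"
  fi
done
echo "fix status:$status"
```

`git merge-base --is-ancestor` exits 0 only when the named commit is reachable from HEAD, so this works even after merges, unlike grepping `git log --oneline`.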
The underlying issue, though, is that the lrmd command timed out, which _should_ have been fixed by: https://github.com/beekhof/pacemaker/commit/d65b270

What are you doing to this poor cluster? :)

On 21 Oct 2013, at 3:59 pm, Kazunori INOUE <kazunori.ino...@gmail.com> wrote:

> Hi,
>
> I'm using pacemaker-1.1 (b6d42ed, the latest devel).
>
> After starting corosync and pacemaker on three nodes, I loaded the
> configuration. An internal error then occurred in crmd and it exited.
>
> $ crm configure load update 3vm+2stonith.cli
> $ for i in n{6..8}; do ssh $i 'grep error: /var/log/ha-log'; done
>
> Oct 21 11:19:43 bl460g1n6 pengine[7684]: error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
> Oct 21 11:19:43 bl460g1n6 pengine[7684]: error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
> Oct 21 11:19:43 bl460g1n6 pengine[7684]: error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
> Oct 21 11:20:51 bl460g1n6 crmd[7685]: error: crm_element_value: Couldn't find lrmd_callid in NULL
> Oct 21 11:20:51 bl460g1n6 crmd[7685]: error: crm_abort: crm_element_value: Triggered assert at xml.c:3336 : data != NULL
> Oct 21 11:20:51 bl460g1n6 crmd[7685]: error: crm_element_value: Couldn't find lrmd_rc in NULL
> Oct 21 11:20:51 bl460g1n6 crmd[7685]: error: crm_abort: crm_element_value: Triggered assert at xml.c:3336 : data != NULL
> Oct 21 11:20:53 bl460g1n6 crmd[7685]: error: internal_ipc_get_reply: Discarding old reply 90 (need 91)
>
> Oct 21 11:20:51 bl460g1n7 crmd[12487]: error: lrmd_send_command: Couldn't perform lrmd_rsc_info operation (timeout=30000): -11: Connection timed out (110)
> Oct 21 11:20:52 bl460g1n7 crmd[12487]: error: lrmd_send_command: Couldn't perform lrmd_rsc_register operation (timeout=0): -114: Connection timed out (110)
> Oct 21 11:20:52 bl460g1n7 crmd[12487]: error: lrmd_send_command: Couldn't perform lrmd_rsc_info operation (timeout=30000): -114: Connection timed out (110)
> Oct 21 11:20:52 bl460g1n7 crmd[12487]: error: get_lrm_resource: Could not add resource prmStonith6-2 to LRM
> Oct 21 11:20:52 bl460g1n7 crmd[12487]: error: do_lrm_invoke: Invalid resource definition
> Oct 21 11:20:52 bl460g1n7 crmd[12487]: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
> Oct 21 11:20:52 bl460g1n7 crmd[12487]: error: lrm_state_verify_stopped: 4 pending LRM operations at shutdown
> Oct 21 11:20:52 bl460g1n7 crmd[12487]: error: lrm_state_verify_stopped: Pending action: prmVM3:13 (prmVM3_monitor_0)
> Oct 21 11:20:52 bl460g1n7 crmd[12487]: error: lrm_state_verify_stopped: Pending action: prmVM2:9 (prmVM2_monitor_0)
> Oct 21 11:20:52 bl460g1n7 crmd[12487]: error: lrm_state_verify_stopped: Pending action: prmVM1:5 (prmVM1_monitor_0)
> Oct 21 11:20:52 bl460g1n7 crmd[12487]: error: lrm_state_verify_stopped: Pending action: prmStonith6-1:17 (prmStonith6-1_monitor_0)
> Oct 21 11:20:52 bl460g1n7 crmd[12487]: error: crmd_fast_exit: Could not recover from internal error
> Oct 21 11:20:52 bl460g1n7 pacemakerd[12477]: error: pcmk_child_exit: Child process crmd (12487) exited: Generic Pacemaker error (201)
>
> Oct 21 11:20:51 bl460g1n8 crmd[1600]: error: lrmd_send_command: Couldn't perform lrmd_rsc_info operation (timeout=30000): -11: Connection timed out (110)
> Oct 21 11:20:52 bl460g1n8 crmd[1600]: error: lrmd_send_command: Couldn't perform lrmd_rsc_register operation (timeout=0): -114: Connection timed out (110)
> Oct 21 11:20:52 bl460g1n8 crmd[1600]: error: lrmd_send_command: Couldn't perform lrmd_rsc_info operation (timeout=30000): -114: Connection timed out (110)
> Oct 21 11:20:52 bl460g1n8 crmd[1600]: error: get_lrm_resource: Could not add resource prmStonith6-2 to LRM
> Oct 21 11:20:52 bl460g1n8 crmd[1600]: error: do_lrm_invoke: Invalid resource definition
> Oct 21 11:20:52 bl460g1n8 crmd[1600]: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
> Oct 21 11:20:52 bl460g1n8 crmd[1600]: error: lrm_state_verify_stopped: 4 pending LRM operations at shutdown
> Oct 21 11:20:52 bl460g1n8 crmd[1600]: error: lrm_state_verify_stopped: Pending action: prmVM3:13 (prmVM3_monitor_0)
> Oct 21 11:20:52 bl460g1n8 crmd[1600]: error: lrm_state_verify_stopped: Pending action: prmVM2:9 (prmVM2_monitor_0)
> Oct 21 11:20:52 bl460g1n8 crmd[1600]: error: lrm_state_verify_stopped: Pending action: prmVM1:5 (prmVM1_monitor_0)
> Oct 21 11:20:52 bl460g1n8 crmd[1600]: error: lrm_state_verify_stopped: Pending action: prmStonith6-1:17 (prmStonith6-1_monitor_0)
> Oct 21 11:20:52 bl460g1n8 crmd[1600]: error: crmd_fast_exit: Could not recover from internal error
> Oct 21 11:20:52 bl460g1n8 pacemakerd[1591]: error: pcmk_child_exit: Child process crmd (1600) exited: Generic Pacemaker error (201)
>
> Best Regards,
> Kazunori INOUE
> <crmd_internal_error.tar.bz2>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org