On Fri, Feb 17, 2012 at 10:49 AM,  <renayama19661...@ybb.ne.jp> wrote:
> Hi Andrew,
>
> Thank you for the comment.
>
>> I'm getting to this soon, really :-)
>> First it was corosync 2.0 stuff, so that /something/ in fedora-17
>> works, then fixing everything I broke when adding corosync 2.0
>> support.
>
> All right!
>
> I await your answer.
I somehow missed that the failure was "not configured":

Failed actions:
    prmVIP_monitor_0 (node=rh57-1, call=2, rc=6, status=complete): not configured

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
lists rc=6 as fatal, but I believe we changed that behaviour (the
stopping aspect) in the PE, because there was also insufficient
information for the agent to stop the service.  Attempting the stop
anyway would result in the node being fenced, the resource being
probed again (which fails, along with the subsequent stop), then the
node being fenced again, and so on.

So two things: this log message should include the human-readable
version of rc=6 (see the sketch at the very end of this message):

Jan 6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster

and the docs need to be updated.

>
> Best Regards,
> Hideo Yamauchi.
>
> --- On Thu, 2012/2/16, Andrew Beekhof <and...@beekhof.net> wrote:
>
>> Sorry!
>>
>> I'm getting to this soon, really :-)
>> First it was corosync 2.0 stuff, so that /something/ in fedora-17
>> works, then fixing everything I broke when adding corosync 2.0
>> support.
>>
>> On Tue, Feb 14, 2012 at 11:20 AM,  <renayama19661...@ybb.ne.jp> wrote:
>> > Hi Andrew,
>> >
>> > About this problem, how did it turn out afterwards?
>> >
>> > Best Regards,
>> > Hideo Yamauchi.
>> >
>> > --- On Mon, 2012/1/16, renayama19661...@ybb.ne.jp <renayama19661...@ybb.ne.jp> wrote:
>> >
>> >> Hi Andrew,
>> >>
>> >> Thank you for the comments.
>> >>
>> >> > Could you send me the PE file related to this log please?
>> >> >
>> >> > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
>> >>
>> >> The old file is gone.
>> >> I am sending the log and a PE file reproduced by the same procedure.
>> >>
>> >> * trac1818.zip
>> >> * https://skydrive.live.com/?cid=3a14d57622c66876&id=3A14D57622C66876%21127
>> >>
>> >> Best Regards,
>> >> Hideo Yamauchi.
>> >>
>> >> --- On Mon, 2012/1/16, Andrew Beekhof <and...@beekhof.net> wrote:
>> >>
>> >> > On Fri, Jan 6, 2012 at 12:37 PM,  <renayama19661...@ybb.ne.jp> wrote:
>> >> > > Hi Andrew,
>> >> > >
>> >> > > Thank you for the comment.
>> >> > >
>> >> > >> But it should have a subsequent stop action which would set it back to
>> >> > >> being inactive.
>> >> > >> Did that not happen in this case?
>> >> > >
>> >> > > Yes.
>> >> >
>> >> > Could you send me the PE file related to this log please?
>> >> >
>> >> > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
>> >> >
>> >> > > Only the "verify_stopped" log is recorded.
>> >> > > The stop handling of the resource that failed its probe was not carried out.
>> >> > >
>> >> > > -----------------------------
>> >> > > ######### yamauchi PREV STOP ##########
>> >> > > Jan 6 19:21:56 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15
>> >> > > Jan 6 19:21:56 rh57-1 ifcheckd: [3462]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan 6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: Requesting the list of configured nodes
>> >> > > Jan 6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting ifcheckd
>> >> > > Jan 6 19:21:58 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/crmd process group 3461 with signal 15
>> >> > > Jan 6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan 6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: Requesting shutdown
>> >> > > Jan 6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN cause=C_SHUTDOWN origin=crm_shutdown ]
>> >> > > Jan 6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
>> >> > > Jan 6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: Sending shutdown request to DC: rh57-1
>> >> > > Jan 6 19:21:59 rh57-1 crmd: [3461]: info: handle_shutdown_request: Creating shutdown request for rh57-1 (state=S_POLICY_ENGINE)
>> >> > > Jan 6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (1325845319)
>> >> > > Jan 6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: Sent update 14: shutdown=1325845319
>> >> > > Jan 6 19:21:59 rh57-1 crmd: [3461]: info: abort_transition_graph: te_update_diff:150 - Triggered transition abort (complete=1, tag=nvpair, id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : Transient attribute: update
>> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New Transition Timer (I_PE_CALC) just popped!
>> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
>> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, quorate=0
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On loss of CCM Quorum: Ignore
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind faith: not fencing unseen nodes
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: determine_online_status: Node rh57-1 is shutting down
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from re-starting anywhere in the cluster
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: group_print: Resource Group: grpUltraMonkey
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: prmVIP (ocf::heartbeat:LVM): Stopped
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: group_print: Resource Group: grpStonith1
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: prmStonith1-2 (stonith:external/ssh): Stopped
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: prmStonith1-3 (stonith:meatware): Stopped
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: group_print: Resource Group: grpStonith2
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: prmStonith2-2 (stonith:external/ssh): Started rh57-1
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: prmStonith2-3 (stonith:meatware): Started rh57-1
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print: Clone Set: clnPingd
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: short_print: Started: [ rh57-1 ]
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: clnPingd: Rolling back scores from prmVIP
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmPingd:0 cannot run anywhere
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmVIP cannot run anywhere
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith1-2: Rolling back scores from prmStonith1-3
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-2 cannot run anywhere
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith1-3 cannot run anywhere
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: prmStonith2-2: Rolling back scores from prmStonith2-3
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-2 cannot run anywhere
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: Resource prmStonith2-3 cannot run anywhere
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: stage6: Scheduling Node rh57-1 for shutdown
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave resource prmVIP (Stopped)
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave resource prmStonith1-2 (Stopped)
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave resource prmStonith1-3 (Stopped)
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop resource prmStonith2-2 (rh57-1)
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop resource prmStonith2-3 (rh57-1)
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop resource prmPingd:0 (rh57-1)
>> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: process_pe_message: Transition 4: PEngine Input stored in: /var/lib/pengine/pe-input-4.bz2
>> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: unpack_graph: Unpacked transition 4: 9 actions in 9 synapses
>> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from /var/lib/pengine/pe-input-4.bz2
>> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
>> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 24 fired and confirmed
>> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 21: stop prmPingd:0_stop_0 on rh57-1 (local)
>> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[10] on prmPingd:0 for client 3461, its parameters: CRM_meta_interval=[10000] multiplier=[100] CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] name=[default_ping_set] CRM_meta_clone_max=[1] crm_feature_set=[3.0.1] host_list=[192.168.40.1] CRM_meta_globally_unique=[false] CRM_meta_name=[monitor] CRM_meta_clone=[0] CRM_meta_clone_node_max=[1] CRM_meta_notify=[false] cancelled
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=21:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmPingd:0_stop_0 )
>> >> > > Jan 6 19:22:02 rh57-1 pingd: [3529]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmPingd:0 stop[14] (pid 3612)
>> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[14] on prmPingd:0 for client 3461: pid 3612 exited with return code 0
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_monitor_10000 (call=10, status=1, cib-update=0, confirmed=true) Cancelled
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmPingd:0_stop_0 (call=14, rc=0, cib-update=45, confirmed=true) ok
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmPingd:0_stop_0 (21) confirmed on rh57-1 (rc=0)
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 25 fired and confirmed
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 4 fired and confirmed
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 16: stop prmStonith2-3_stop_0 on rh57-1 (local)
>> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[13] on prmStonith2-3 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[600s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor] cancelled
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=16:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-3_stop_0 )
>> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-3 stop[15] (pid 3617)
>> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3617]: info: Try to stop STONITH resource <rsc_id=prmStonith2-3> : Device=meatware
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_monitor_3600000 (call=13, status=1, cib-update=0, confirmed=true) Cancelled
>> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on prmStonith2-3 for client 3461: pid 3617 exited with return code 0
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, confirmed=true) ok
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0)
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local)
>> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation monitor[11] on prmStonith2-2 for client 3461, its parameters: CRM_meta_interval=[3600000] stonith-timeout=[60s] hostlist=[rh57-2] CRM_meta_timeout=[60000] crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor] cancelled
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 op=prmStonith2-2_stop_0 )
>> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 stop[16] (pid 3619)
>> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH resource <rsc_id=prmStonith2-2> : Device=external/ssh
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_monitor_3600000 (call=11, status=1, cib-update=0, confirmed=true) Cancelled
>> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on prmStonith2-2 for client 3461: pid 3619 exited with return code 0
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, confirmed=true) ok
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0)
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: Executing crm-event (28): do_shutdown on rh57-1
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: crm-event (28) is a local shutdown
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: ====================================================
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete
>> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: Transition 4 is now complete
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role released
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> >> > > Jan 6 19:22:03 rh57-1 pengine: [3464]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Transitioner is now inactive
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: Disconnecting STONITH...
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently connected.
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Terminating the pengine
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent -TERM to pengine: [3464]
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting for subsystems to exit
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 420 ms (> 100 ms) before being called (GSource: 0x179d9b0)
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: started at 429442052 should have started at 429442010
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: Process pengine:[3464] exited (signal=0, exitcode=0)
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 80 ms (> 30 ms) (GSource: 0x179d9b0)
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: Received HUP from pengine:[3464]
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: Connection to the Policy Engine released
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All subsystems stopped, continuing
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: Resource prmVIP was active at shutdown. You may ignore this error if it is unmanaged.
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: Disconnected from the LRM
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: Disconnected from Heartbeat
>> >> > > Jan 6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) removed from ccm
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: Disconnecting CIB
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
>> >> > > Jan 6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: We are now in R/O mode
>> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
>> >> > > Jan 6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC Channel to 3461 is not connected
>> >> > > Jan 6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
>> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: WARN: send_via_callback_channel: Delivery of reply to client 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed
>> >> > > Jan 6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] stopped (0)
>> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
>> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/attrd process group 3460 with signal 15
>> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 50 ms (> 30 ms) (GSource: 0x7b28140)
>> >> > > Jan 6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan 6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: Exiting
>> >> > > Jan 6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting...
>> >> > > Jan 6 19:22:04 rh57-1 attrd: [3460]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
>> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/stonithd process group 3459 with signal 15
>> >> > > Jan 6 19:22:04 rh57-1 stonithd: [3459]: notice: /usr/lib64/heartbeat/stonithd normally quit.
>> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15
>> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
>> >> > > Jan 6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down
>> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/cib process group 3457 with signal 15
>> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 40 ms (> 30 ms) (GSource: 0x7b28140)
>> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: Disconnected 0 clients
>> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: All clients disconnected...
>> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: initiate_exit: Disconnecting heartbeat
>> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: Exiting...
>> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: info: main: Done
>> >> > > Jan 6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) removed from ccm
>> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: info: killing /usr/lib64/heartbeat/ccm process group 3456 with signal 15
>> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD took too long to execute: 60 ms (> 30 ms) (GSource: 0x7b28140)
>> >> > > Jan 6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going to shut down
>> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO process 3446 with signal 15
>> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3447 with signal 15
>> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3448 with signal 15
>> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE process 3449 with signal 15
>> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD process 3450 with signal 15
>> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 exited. 5 remaining
>> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 exited. 4 remaining
>> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 exited. 3 remaining
>> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 exited. 2 remaining
>> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 exited. 1 remaining
>> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat shutdown complete.
>> >> > >
>> >> > > -----------------------------
>> >> > >
>> >> > > Best Regards,
>> >> > > Hideo Yamauchi.
>> >> > >
>> >> > > --- On Fri, 2012/1/6, Andrew Beekhof <and...@beekhof.net> wrote:
>> >> > >
>> >> > >> On Tue, Dec 27, 2011 at 6:15 PM,  <renayama19661...@ybb.ne.jp> wrote:
>> >> > >> > Hi All,
>> >> > >> >
>> >> > >> > When Pacemaker is stopped while there is a resource that failed its probe, crmd outputs the following error message.
>> >> > >> >
>> >> > >> > Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX was active at shutdown. You may ignore this error if it is unmanaged.
>> >> > >> >
>> >> > >> > Because a resource that failed its probe never starts,
>> >> > >>
>> >> > >> But it should have a subsequent stop action which would set it back to
>> >> > >> being inactive.
>> >> > >> Did that not happen in this case?
>> >> > >>
>> >> > >> > this error message is not right.
>> >> > >> >
>> >> > >> > I think the following correction may be good, but we are not certain.
>> >> > >> >
>> >> > >> > * crmd/lrm.c
>> >> > >> > (snip)
>> >> > >> >         } else if(op->rc == EXECRA_NOT_RUNNING) {
>> >> > >> >                 active = FALSE;
>> >> > >> > +       } else if(op->rc != EXECRA_OK && op->interval == 0
>> >> > >> > +                 && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
>> >> > >> > +               active = FALSE;
>> >> > >> >         } else {
>> >> > >> >                 active = TRUE;
>> >> > >> >         }
>> >> > >> > (snip)
>> >> > >> >
>> >> > >> > In the development sources of Pacemaker, the handling of this code path seems to have changed considerably.
>> >> > >> > We would like to ask that this change be backported to the Pacemaker 1.0 series, if possible.
>> >> > >> >
>> >> > >> > Best Regards,
>> >> > >> > Hideo Yamauchi.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
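-----------------------------

To make the log-message point concrete: below is a minimal sketch of the
kind of rc-to-text lookup being asked for. It is illustrative only, not
Pacemaker's actual helper; the function name ocf_rc2string is invented
for the example, and the strings follow the OCF return-code names from
the documentation page linked above.

    /* Hypothetical helper: map an OCF return code to its human-readable
     * name, so a message like unpack_rsc_op's "failed with rc=6" could
     * instead read "failed with rc=6 (not configured)". */
    #include <stdio.h>

    static const char *ocf_rc2string(int rc)
    {
        switch (rc) {
        case 0:  return "ok";                      /* OCF_SUCCESS */
        case 1:  return "unknown error";           /* OCF_ERR_GENERIC */
        case 2:  return "invalid parameter";       /* OCF_ERR_ARGS */
        case 3:  return "unimplemented feature";   /* OCF_ERR_UNIMPLEMENTED */
        case 4:  return "insufficient privileges"; /* OCF_ERR_PERM */
        case 5:  return "not installed";           /* OCF_ERR_INSTALLED */
        case 6:  return "not configured";          /* OCF_ERR_CONFIGURED */
        case 7:  return "not running";             /* OCF_NOT_RUNNING */
        default: return "unknown";
        }
    }

    int main(void)
    {
        /* Prints: prmVIP_monitor_0 failed with rc=6 (not configured) */
        printf("prmVIP_monitor_0 failed with rc=6 (%s)\n", ocf_rc2string(6));
        return 0;
    }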
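And for readers following the crmd/lrm.c patch quoted above, its logic
can be read in isolation as the standalone sketch below. This is an
illustration of the idea, not the actual crmd source: the op_result
struct and the op_says_active() helper are invented for the example,
EXECRA_OK and EXECRA_NOT_RUNNING mirror the cluster-glue constants
(OCF return codes 0 and 7), and strcmp against "monitor" stands in for
safe_str_eq(op->op_type, CRMD_ACTION_STATUS).

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define EXECRA_OK          0   /* agent reported success */
    #define EXECRA_NOT_RUNNING 7   /* agent reported "not running" */

    struct op_result {
        const char *op_type;       /* e.g. "monitor", "start", "stop" */
        int interval;              /* 0 for probes and one-shot actions */
        int rc;                    /* the agent's OCF return code */
    };

    /* Decide whether an operation result means the resource is active.
     * The added branch: a failed probe (an interval == 0 monitor) means
     * the resource never started, so it must not be counted as active at
     * shutdown -- avoiding the spurious verify_stopped ERROR. */
    static bool op_says_active(const struct op_result *op)
    {
        if (op->rc == EXECRA_NOT_RUNNING) {
            return false;
        } else if (op->rc != EXECRA_OK && op->interval == 0
                   && strcmp(op->op_type, "monitor") == 0) {
            return false;          /* failed probe: never started */
        } else {
            return true;
        }
    }

    int main(void)
    {
        /* The probe from the logs: prmVIP_monitor_0 failed with rc=6. */
        struct op_result probe = { "monitor", 0, 6 };
        printf("active = %s\n", op_says_active(&probe) ? "true" : "false");
        return 0;
    }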