On Fri, Feb 24, 2012 at 10:03 AM, Andrew Beekhof <and...@beekhof.net> wrote:
> On Wed, Feb 22, 2012 at 11:31 AM, <renayama19661...@ybb.ne.jp> wrote:
>> Hi Andrew,
>>
>> Thank you for your comment.
>>
>> Sorry, I cannot quite understand your answer.
>>
>> Does your answer mean the following?
>>
>> 1) The system administrator must deal with the rc=6 (fatal) log themselves.
>> 2) And this needs to be reflected in the documentation.
> No to both.
>
>> And does it mean that the following log should not be output until a
>> system administrator intervenes?
>>
>> Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped: Resource XXXXX
>> was active at shutdown. You may ignore this error if it is unmanaged.
>
> Right. There was actually a third part... a slightly more restrictive
> version of your original patch: https://github.com/beekhof/pacemaker/commit/543ee8e
>
> --- a/crmd/lrm.c
> +++ b/crmd/lrm.c
> @@ -694,6 +694,9 @@ is_rsc_active(const char *rsc_id)
>
>      } else if (entry->last->rc == EXECRA_NOT_RUNNING) {
>          return FALSE;
> +
> +    } else if (entry->last->interval == 0 && entry->last->rc == EXECRA_NOT_CONFIGURED) {
> +        return FALSE;
>      }
>
>      return TRUE;
>
>
>> Best Regards,
>> Hideo Yamauchi.
>>
>> --- On Tue, 2012/2/21, Andrew Beekhof <and...@beekhof.net> wrote:
>>
>>> On Fri, Feb 17, 2012 at 10:49 AM, <renayama19661...@ybb.ne.jp> wrote:
>>> > Hi Andrew,
>>> >
>>> > Thank you for your comment.
>>> >
>>> >> I'm getting to this soon, really :-)
>>> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
>>> >> works, then fixing everything I broke when adding corosync 2.0
>>> >> support.
>>> >
>>> > All right!
>>> >
>>> > I will wait for your answer.
>>>
>>> I somehow missed that the failure was "not configured":
>>>
>>> Failed actions:
>>>     prmVIP_monitor_0 (node=rh57-1, call=2, rc=6, status=complete): not configured
>>>
>>> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-ocf-return-codes.html
>>> lists rc=6 as fatal, but I believe we changed that behaviour (the
>>> stopping aspect) in the PE, as there was also insufficient information
>>> for the agent to stop the service.
>>> Which would result in the node being fenced, the resource being probed
>>> again (which fails, along with the subsequent stop), then the node being
>>> fenced again, etc.
>>>
>>> So two things:
>>>
>>> This log message should include the human-readable version of rc=6:
>>>
>>> Jan 6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: Hard
>>> error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP from
>>> re-starting anywhere in the cluster
>>>
>>> And the docs need to be updated.
>>>
>>> >
>>> > Best Regards,
>>> > Hideo Yamauchi.
>>> >
>>> > --- On Thu, 2012/2/16, Andrew Beekhof <and...@beekhof.net> wrote:
>>> >
>>> >> Sorry!
>>> >>
>>> >> I'm getting to this soon, really :-)
>>> >> First it was corosync 2.0 stuff, so that /something/ in fedora-17
>>> >> works, then fixing everything I broke when adding corosync 2.0
>>> >> support.
>>> >>
>>> >> On Tue, Feb 14, 2012 at 11:20 AM, <renayama19661...@ybb.ne.jp> wrote:
>>> >> > Hi Andrew,
>>> >> >
>>> >> > How did this problem turn out in the end?
>>> >> >
>>> >> > Best Regards,
>>> >> > Hideo Yamauchi.
>>> >> >
>>> >> >
>>> >> > --- On Mon, 2012/1/16, renayama19661...@ybb.ne.jp
>>> >> > <renayama19661...@ybb.ne.jp> wrote:
>>> >> >
>>> >> >> Hi Andrew,
>>> >> >>
>>> >> >> Thank you for your comments.
>>> >> >>
>>> >> >> > Could you send me the PE file related to this log please?
>>> >> >> >
>>> >> >> > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing
>>> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from
>>> >> >> > /var/lib/pengine/pe-input-4.bz2
>>> >> >>
>>> >> >> The old file is gone.
>>> >> >> I am sending the log and the PE file reproduced by the same procedure.
>>> >> >>
>>> >> >> * trac1818.zip
>>> >> >> * https://skydrive.live.com/?cid=3a14d57622c66876&id=3A14D57622C66876%21127
>>> >> >>
>>> >> >> Best Regards,
>>> >> >> Hideo Yamauchi.
>>> >> >>
>>> >> >>
>>> >> >> --- On Mon, 2012/1/16, Andrew Beekhof <and...@beekhof.net> wrote:
>>> >> >>
>>> >> >> > On Fri, Jan 6, 2012 at 12:37 PM, <renayama19661...@ybb.ne.jp>
>>> >> >> > wrote:
>>> >> >> > > Hi Andrew,
>>> >> >> > >
>>> >> >> > > Thank you for your comment.
>>> >> >> > > >>> >> >> > >> But it should have a subsequent stop action which would set it >>> >> >> > >> back to >>> >> >> > >> being inactive. >>> >> >> > >> Did that not happen in this case? >>> >> >> > > >>> >> >> > > Yes. >>> >> >> > >>> >> >> > Could you send me the PE file related to this log please? >>> >> >> > >>> >> >> > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: Processing >>> >> >> > graph 4 (ref=pe_calc-dc-1325845321-26) derived from >>> >> >> > /var/lib/pengine/pe-input-4.bz2 >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> > > Log of "verify_stopped" is only recorded. >>> >> >> > > The stop handling of resource that failed in probe was not >>> >> >> > > carried out. >>> >> >> > > >>> >> >> > > ----------------------------- >>> >> >> > > ######### yamauchi PREV STOP ########## >>> >> >> > > Jan 6 19:21:56 rh57-1 heartbeat: [3443]: info: killing >>> >> >> > > /usr/lib64/heartbeat/ifcheckd process group 3462 with signal 15 >>> >> >> > > Jan 6 19:21:56 rh57-1 ifcheckd: [3462]: info: >>> >> >> > > crm_signal_dispatch: Invoking handler for signal 15: Terminated >>> >> >> > > Jan 6 19:21:56 rh57-1 ifcheckd: [3462]: info: do_node_walk: >>> >> >> > > Requesting the list of configured nodes >>> >> >> > > Jan 6 19:21:58 rh57-1 ifcheckd: [3462]: info: main: Exiting >>> >> >> > > ifcheckd >>> >> >> > > Jan 6 19:21:58 rh57-1 heartbeat: [3443]: info: killing >>> >> >> > > /usr/lib64/heartbeat/crmd process group 3461 with signal 15 >>> >> >> > > Jan 6 19:21:58 rh57-1 crmd: [3461]: info: crm_signal_dispatch: >>> >> >> > > Invoking handler for signal 15: Terminated >>> >> >> > > Jan 6 19:21:58 rh57-1 crmd: [3461]: info: crm_shutdown: >>> >> >> > > Requesting shutdown >>> >> >> > > Jan 6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: >>> >> >> > > State transition S_IDLE -> S_POLICY_ENGINE [ input=I_SHUTDOWN >>> >> >> > > cause=C_SHUTDOWN origin=crm_shutdown ] >>> >> >> > > Jan 6 19:21:58 rh57-1 crmd: [3461]: info: do_state_transition: >>> >> >> > 
> All 1 cluster nodes are eligible to run resources. >>> >> >> > > Jan 6 19:21:58 rh57-1 crmd: [3461]: info: do_shutdown_req: >>> >> >> > > Sending shutdown request to DC: rh57-1 >>> >> >> > > Jan 6 19:21:59 rh57-1 crmd: [3461]: info: >>> >> >> > > handle_shutdown_request: Creating shutdown request for rh57-1 >>> >> >> > > (state=S_POLICY_ENGINE) >>> >> >> > > Jan 6 19:21:59 rh57-1 attrd: [3460]: info: attrd_trigger_update: >>> >> >> > > Sending flush op to all hosts for: shutdown (1325845319) >>> >> >> > > Jan 6 19:21:59 rh57-1 attrd: [3460]: info: attrd_perform_update: >>> >> >> > > Sent update 14: shutdown=1325845319 >>> >> >> > > Jan 6 19:21:59 rh57-1 crmd: [3461]: info: >>> >> >> > > abort_transition_graph: te_update_diff:150 - Triggered transition >>> >> >> > > abort (complete=1, tag=nvpair, >>> >> >> > > id=status-1fdd5e2a-44b6-44b9-9993-97fa120072a4-shutdown, >>> >> >> > > name=shutdown, value=1325845319, magic=NA, cib=0.101.16) : >>> >> >> > > Transient attribute: update >>> >> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: crm_timer_popped: New >>> >> >> > > Transition Timer (I_PE_CALC) just popped! 
>>> >> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke: Query >>> >> >> > > 44: Requesting the current CIB: S_POLICY_ENGINE >>> >> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_pe_invoke_callback: >>> >> >> > > Invoking the PE: query=44, ref=pe_calc-dc-1325845321-26, seq=1, >>> >> >> > > quorate=0 >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: unpack_config: On >>> >> >> > > loss of CCM Quorum: Ignore >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: unpack_config: Node >>> >> >> > > scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0 >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: WARN: unpack_nodes: Blind >>> >> >> > > faith: not fencing unseen nodes >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: >>> >> >> > > determine_online_status: Node rh57-1 is shutting down >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: ERROR: unpack_rsc_op: >>> >> >> > > Hard error - prmVIP_monitor_0 failed with rc=6: Preventing prmVIP >>> >> >> > > from re-starting anywhere in the cluster >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: group_print: >>> >> >> > > Resource Group: grpUltraMonkey >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: >>> >> >> > > prmVIP (ocf::heartbeat:LVM): Stopped >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: group_print: >>> >> >> > > Resource Group: grpStonith1 >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: >>> >> >> > > prmStonith1-2 (stonith:external/ssh): Stopped >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: >>> >> >> > > prmStonith1-3 (stonith:meatware): Stopped >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: group_print: >>> >> >> > > Resource Group: grpStonith2 >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: native_print: >>> >> >> > > prmStonith2-2 (stonith:external/ssh): Started rh57-1 >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: 
native_print: >>> >> >> > > prmStonith2-3 (stonith:meatware): Started rh57-1 >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: clone_print: >>> >> >> > > Clone Set: clnPingd >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: short_print: >>> >> >> > > Started: [ rh57-1 ] >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: >>> >> >> > > clnPingd: Rolling back scores from prmVIP >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: >>> >> >> > > Resource prmPingd:0 cannot run anywhere >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: >>> >> >> > > Resource prmVIP cannot run anywhere >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: >>> >> >> > > prmStonith1-2: Rolling back scores from prmStonith1-3 >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: >>> >> >> > > Resource prmStonith1-2 cannot run anywhere >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: >>> >> >> > > Resource prmStonith1-3 cannot run anywhere >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: rsc_merge_weights: >>> >> >> > > prmStonith2-2: Rolling back scores from prmStonith2-3 >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: >>> >> >> > > Resource prmStonith2-2 cannot run anywhere >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: native_color: >>> >> >> > > Resource prmStonith2-3 cannot run anywhere >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: stage6: Scheduling >>> >> >> > > Node rh57-1 for shutdown >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave >>> >> >> > > resource prmVIP (Stopped) >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave >>> >> >> > > resource prmStonith1-2 (Stopped) >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Leave >>> >> >> > > resource prmStonith1-3 (Stopped) >>> >> >> > > Jan 6 
19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop >>> >> >> > > resource prmStonith2-2 (rh57-1) >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop >>> >> >> > > resource prmStonith2-3 (rh57-1) >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: notice: LogActions: Stop >>> >> >> > > resource prmPingd:0 (rh57-1) >>> >> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_state_transition: >>> >> >> > > State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ >>> >> >> > > input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ] >>> >> >> > > Jan 6 19:22:01 rh57-1 pengine: [3464]: info: process_pe_message: >>> >> >> > > Transition 4: PEngine Input stored in: >>> >> >> > > /var/lib/pengine/pe-input-4.bz2 >>> >> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: unpack_graph: Unpacked >>> >> >> > > transition 4: 9 actions in 9 synapses >>> >> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: do_te_invoke: >>> >> >> > > Processing graph 4 (ref=pe_calc-dc-1325845321-26) derived from >>> >> >> > > /var/lib/pengine/pe-input-4.bz2 >>> >> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: >>> >> >> > > Pseudo action 19 fired and confirmed >>> >> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: te_pseudo_action: >>> >> >> > > Pseudo action 24 fired and confirmed >>> >> >> > > Jan 6 19:22:01 rh57-1 crmd: [3461]: info: te_rsc_command: >>> >> >> > > Initiating action 21: stop prmPingd:0_stop_0 on rh57-1 (local) >>> >> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation >>> >> >> > > monitor[10] on prmPingd:0 for client 3461, its parameters: >>> >> >> > > CRM_meta_interval=[10000] multiplier=[100] >>> >> >> > > CRM_meta_on_fail=[restart] CRM_meta_timeout=[60000] >>> >> >> > > name=[default_ping_set] CRM_meta_clone_max=[1] >>> >> >> > > crm_feature_set=[3.0.1] host_list=[192.168.40.1] >>> >> >> > > CRM_meta_globally_unique=[false] CRM_meta_name=[monitor] >>> >> >> > > CRM_meta_clone=[0] 
CRM_meta_clone_node_max=[1] >>> >> >> > > CRM_meta_notify=[false] cancelled >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: >>> >> >> > > Performing key=21:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 >>> >> >> > > op=prmPingd:0_stop_0 ) >>> >> >> > > Jan 6 19:22:02 rh57-1 pingd: [3529]: info: crm_signal_dispatch: >>> >> >> > > Invoking handler for signal 15: Terminated >>> >> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmPingd:0 >>> >> >> > > stop[14] (pid 3612) >>> >> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[14] on >>> >> >> > > prmPingd:0 for client 3461: pid 3612 exited with return code 0 >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM >>> >> >> > > operation prmPingd:0_monitor_10000 (call=10, status=1, >>> >> >> > > cib-update=0, confirmed=true) Cancelled >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM >>> >> >> > > operation prmPingd:0_stop_0 (call=14, rc=0, cib-update=45, >>> >> >> > > confirmed=true) ok >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: >>> >> >> > > Action prmPingd:0_stop_0 (21) confirmed on rh57-1 (rc=0) >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: >>> >> >> > > Pseudo action 25 fired and confirmed >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: >>> >> >> > > Pseudo action 4 fired and confirmed >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: >>> >> >> > > Initiating action 16: stop prmStonith2-3_stop_0 on rh57-1 (local) >>> >> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation >>> >> >> > > monitor[13] on prmStonith2-3 for client 3461, its parameters: >>> >> >> > > CRM_meta_interval=[3600000] stonith-timeout=[600s] >>> >> >> > > hostlist=[rh57-2] CRM_meta_timeout=[60000] >>> >> >> > > crm_feature_set=[3.0.1] priority=[2] CRM_meta_name=[monitor] >>> >> >> > > cancelled >>> >> >> > > Jan 6 19:22:02 rh57-1 
crmd: [3461]: info: do_lrm_rsc_op: >>> >> >> > > Performing key=16:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 >>> >> >> > > op=prmStonith2-3_stop_0 ) >>> >> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-3 >>> >> >> > > stop[15] (pid 3617) >>> >> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3617]: info: Try to stop STONITH >>> >> >> > > resource <rsc_id=prmStonith2-3> : Device=meatware >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM >>> >> >> > > operation prmStonith2-3_monitor_3600000 (call=13, status=1, >>> >> >> > > cib-update=0, confirmed=true) Cancelled >>> >> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[15] on >>> >> >> > > prmStonith2-3 for client 3461: pid 3617 exited with return code 0 >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM >>> >> >> > > operation prmStonith2-3_stop_0 (call=15, rc=0, cib-update=46, >>> >> >> > > confirmed=true) ok >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: >>> >> >> > > Action prmStonith2-3_stop_0 (16) confirmed on rh57-1 (rc=0) >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_rsc_command: >>> >> >> > > Initiating action 15: stop prmStonith2-2_stop_0 on rh57-1 (local) >>> >> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: cancel_op: operation >>> >> >> > > monitor[11] on prmStonith2-2 for client 3461, its parameters: >>> >> >> > > CRM_meta_interval=[3600000] stonith-timeout=[60s] >>> >> >> > > hostlist=[rh57-2] CRM_meta_timeout=[60000] >>> >> >> > > crm_feature_set=[3.0.1] priority=[1] CRM_meta_name=[monitor] >>> >> >> > > cancelled >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: do_lrm_rsc_op: >>> >> >> > > Performing key=15:4:0:f1bcc681-b4b6-4f96-8de0-925a814014f9 >>> >> >> > > op=prmStonith2-2_stop_0 ) >>> >> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: rsc:prmStonith2-2 >>> >> >> > > stop[16] (pid 3619) >>> >> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3619]: info: Try to stop STONITH >>> >> 
>> > > resource <rsc_id=prmStonith2-2> : Device=external/ssh >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM >>> >> >> > > operation prmStonith2-2_monitor_3600000 (call=11, status=1, >>> >> >> > > cib-update=0, confirmed=true) Cancelled >>> >> >> > > Jan 6 19:22:02 rh57-1 lrmd: [3458]: info: operation stop[16] on >>> >> >> > > prmStonith2-2 for client 3461: pid 3619 exited with return code 0 >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: process_lrm_event: LRM >>> >> >> > > operation prmStonith2-2_stop_0 (call=16, rc=0, cib-update=47, >>> >> >> > > confirmed=true) ok >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: match_graph_event: >>> >> >> > > Action prmStonith2-2_stop_0 (15) confirmed on rh57-1 (rc=0) >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_pseudo_action: >>> >> >> > > Pseudo action 20 fired and confirmed >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: >>> >> >> > > Executing crm-event (28): do_shutdown on rh57-1 >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_crm_command: >>> >> >> > > crm-event (28) is a local shutdown >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: run_graph: >>> >> >> > > ==================================================== >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: notice: run_graph: >>> >> >> > > Transition 4 (Complete=9, Pending=0, Fired=0, Skipped=0, >>> >> >> > > Incomplete=0, Source=/var/lib/pengine/pe-input-4.bz2): Complete >>> >> >> > > Jan 6 19:22:02 rh57-1 crmd: [3461]: info: te_graph_trigger: >>> >> >> > > Transition 4 is now complete >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_state_transition: >>> >> >> > > State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP >>> >> >> > > cause=C_FSA_INTERNAL origin=notify_crmd ] >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_dc_release: DC role >>> >> >> > > released >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: 
Sent >>> >> >> > > -TERM to pengine: [3464] >>> >> >> > > Jan 6 19:22:03 rh57-1 pengine: [3464]: info: >>> >> >> > > crm_signal_dispatch: Invoking handler for signal 15: Terminated >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: >>> >> >> > > Transitioner is now inactive >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_te_control: >>> >> >> > > Disconnecting STONITH... >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: >>> >> >> > > tengine_stonith_connection_destroy: Fencing daemon disconnected >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: notice: Not currently >>> >> >> > > connected. >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: >>> >> >> > > Terminating the pengine >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent >>> >> >> > > -TERM to pengine: [3464] >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting >>> >> >> > > for subsystems to exit >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: WARN: >>> >> >> > > register_fsa_input_adv: do_shutdown stalled the FSA with pending >>> >> >> > > inputs >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All >>> >> >> > > subsystems stopped, continuing >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: WARN: do_log: FSA: Input >>> >> >> > > I_RELEASE_SUCCESS from do_dc_release() received in state >>> >> >> > > S_STOPPING >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: >>> >> >> > > Terminating the pengine >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: stop_subsystem: Sent >>> >> >> > > -TERM to pengine: [3464] >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: Waiting >>> >> >> > > for subsystems to exit >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All >>> >> >> > > subsystems stopped, continuing >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: >>> >> >> > > Dispatch function for SIGCHLD was 
delayed 420 ms (> 100 ms) >>> >> >> > > before being called (GSource: 0x179d9b0) >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: G_SIG_dispatch: >>> >> >> > > started at 429442052 should have started at 429442010 >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: crmdManagedChildDied: >>> >> >> > > Process pengine:[3464] exited (signal=0, exitcode=0) >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: WARN: G_SIG_dispatch: >>> >> >> > > Dispatch function for SIGCHLD took too long to execute: 80 ms (> >>> >> >> > > 30 ms) (GSource: 0x179d9b0) >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: pe_msg_dispatch: >>> >> >> > > Received HUP from pengine:[3464] >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: pe_connection_destroy: >>> >> >> > > Connection to the Policy Engine released >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_shutdown: All >>> >> >> > > subsystems stopped, continuing >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: ERROR: verify_stopped: >>> >> >> > > Resource prmVIP was active at shutdown. You may ignore this >>> >> >> > > error if it is unmanaged. >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_lrm_control: >>> >> >> > > Disconnected from the LRM >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_ha_control: >>> >> >> > > Disconnected from Heartbeat >>> >> >> > > Jan 6 19:22:03 rh57-1 ccm: [3456]: info: client (pid=3461) >>> >> >> > > removed from ccm >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_cib_control: >>> >> >> > > Disconnecting CIB >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: >>> >> >> > > crmd_cib_connection_destroy: Connection to the CIB terminated... 
>>> >> >> > > Jan 6 19:22:03 rh57-1 cib: [3457]: info: cib_process_readwrite: >>> >> >> > > We are now in R/O mode >>> >> >> > > Jan 6 19:22:03 rh57-1 crmd: [3461]: info: do_exit: Performing >>> >> >> > > A_EXIT_0 - gracefully exiting the CRMd >>> >> >> > > Jan 6 19:22:03 rh57-1 cib: [3457]: WARN: send_ipc_message: IPC >>> >> >> > > Channel to 3461 is not connected >>> >> >> > > Jan 6 19:22:04 rh57-1 crmd: [3461]: info: free_mem: Dropping >>> >> >> > > I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL >>> >> >> > > origin=do_stop ] >>> >> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: WARN: >>> >> >> > > send_via_callback_channel: Delivery of reply to client >>> >> >> > > 3461/5f69edda-aec9-42c7-ae52-045a05d1c5db failed >>> >> >> > > Jan 6 19:22:04 rh57-1 crmd: [3461]: info: do_exit: [crmd] >>> >> >> > > stopped (0) >>> >> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: WARN: do_local_notify: A-Sync >>> >> >> > > reply to crmd failed: reply failed >>> >> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: info: killing >>> >> >> > > /usr/lib64/heartbeat/attrd process group 3460 with signal 15 >>> >> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: >>> >> >> > > Dispatch function for SIGCHLD took too long to execute: 50 ms (> >>> >> >> > > 30 ms) (GSource: 0x7b28140) >>> >> >> > > Jan 6 19:22:04 rh57-1 attrd: [3460]: info: crm_signal_dispatch: >>> >> >> > > Invoking handler for signal 15: Terminated >>> >> >> > > Jan 6 19:22:04 rh57-1 attrd: [3460]: info: attrd_shutdown: >>> >> >> > > Exiting >>> >> >> > > Jan 6 19:22:04 rh57-1 attrd: [3460]: info: main: Exiting... >>> >> >> > > Jan 6 19:22:04 rh57-1 attrd: [3460]: info: >>> >> >> > > attrd_cib_connection_destroy: Connection to the CIB terminated... 
>>> >> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: info: killing >>> >> >> > > /usr/lib64/heartbeat/stonithd process group 3459 with signal 15 >>> >> >> > > Jan 6 19:22:04 rh57-1 stonithd: [3459]: notice: >>> >> >> > > /usr/lib64/heartbeat/stonithd normally quit. >>> >> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: info: killing >>> >> >> > > /usr/lib64/heartbeat/lrmd -r process group 3458 with signal 15 >>> >> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: >>> >> >> > > Dispatch function for SIGCHLD took too long to execute: 40 ms (> >>> >> >> > > 30 ms) (GSource: 0x7b28140) >>> >> >> > > Jan 6 19:22:04 rh57-1 lrmd: [3458]: info: lrmd is shutting down >>> >> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: info: killing >>> >> >> > > /usr/lib64/heartbeat/cib process group 3457 with signal 15 >>> >> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: >>> >> >> > > Dispatch function for SIGCHLD took too long to execute: 40 ms (> >>> >> >> > > 30 ms) (GSource: 0x7b28140) >>> >> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: info: crm_signal_dispatch: >>> >> >> > > Invoking handler for signal 15: Terminated >>> >> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: info: cib_shutdown: >>> >> >> > > Disconnected 0 clients >>> >> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: info: cib_process_disconnect: >>> >> >> > > All clients disconnected... >>> >> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: >>> >> >> > > initiate_exit: Disconnecting heartbeat >>> >> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: info: terminate_cib: >>> >> >> > > Exiting... 
>>> >> >> > > Jan 6 19:22:04 rh57-1 cib: [3457]: info: main: Done >>> >> >> > > Jan 6 19:22:04 rh57-1 ccm: [3456]: info: client (pid=3457) >>> >> >> > > removed from ccm >>> >> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: info: killing >>> >> >> > > /usr/lib64/heartbeat/ccm process group 3456 with signal 15 >>> >> >> > > Jan 6 19:22:04 rh57-1 heartbeat: [3443]: WARN: G_SIG_dispatch: >>> >> >> > > Dispatch function for SIGCHLD took too long to execute: 60 ms (> >>> >> >> > > 30 ms) (GSource: 0x7b28140) >>> >> >> > > Jan 6 19:22:04 rh57-1 ccm: [3456]: info: received SIGTERM, going >>> >> >> > > to shut down >>> >> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBFIFO >>> >> >> > > process 3446 with signal 15 >>> >> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE >>> >> >> > > process 3447 with signal 15 >>> >> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD >>> >> >> > > process 3448 with signal 15 >>> >> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBWRITE >>> >> >> > > process 3449 with signal 15 >>> >> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: killing HBREAD >>> >> >> > > process 3450 with signal 15 >>> >> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3448 >>> >> >> > > exited. 5 remaining >>> >> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3447 >>> >> >> > > exited. 4 remaining >>> >> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3450 >>> >> >> > > exited. 3 remaining >>> >> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3446 >>> >> >> > > exited. 2 remaining >>> >> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: Core process 3449 >>> >> >> > > exited. 1 remaining >>> >> >> > > Jan 6 19:22:05 rh57-1 heartbeat: [3443]: info: rh57-1 Heartbeat >>> >> >> > > shutdown complete. 
>>> >> >> > >
>>> >> >> > > -----------------------------
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > Best Regards,
>>> >> >> > > Hideo Yamauchi.
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > --- On Fri, 2012/1/6, Andrew Beekhof <and...@beekhof.net> wrote:
>>> >> >> > >
>>> >> >> > >> On Tue, Dec 27, 2011 at 6:15 PM, <renayama19661...@ybb.ne.jp>
>>> >> >> > >> wrote:
>>> >> >> > >> > Hi All,
>>> >> >> > >> >
>>> >> >> > >> > When Pacemaker stops while there is a resource that failed its
>>> >> >> > >> > probe, crmd outputs the following error message:
>>> >> >> > >> >
>>> >> >> > >> > Dec 28 00:07:36 rh57-1 crmd: [3206]: ERROR: verify_stopped:
>>> >> >> > >> > Resource XXXXX was active at shutdown. You may ignore this
>>> >> >> > >> > error if it is unmanaged.
>>> >> >> > >> >
>>> >> >> > >> > Because a resource that failed its probe never started,
>>> >> >> > >>
>>> >> >> > >> But it should have a subsequent stop action which would set it
>>> >> >> > >> back to being inactive.
>>> >> >> > >> Did that not happen in this case?
>>> >> >> > >>
>>> >> >> > >> > this error message is not right.
>>> >> >> > >> >
>>> >> >> > >> > I think the following correction may be good, but we are not
>>> >> >> > >> > certain about it.
>>> >> >> > >> >
>>> >> >> > >> >
>>> >> >> > >> > * crmd/lrm.c
>>> >> >> > >> > (snip)
>>> >> >> > >> >             } else if(op->rc == EXECRA_NOT_RUNNING) {
>>> >> >> > >> >                     active = FALSE;
>>> >> >> > >> > +           } else if(op->rc != EXECRA_OK && op->interval == 0
>>> >> >> > >> > +                     && safe_str_eq(op->op_type, CRMD_ACTION_STATUS)) {
>>> >> >> > >> > +                   active = FALSE;
>>> >> >> > >> >             } else {
>>> >> >> > >> >                     active = TRUE;
>>> >> >> > >> >             }
>>> >> >> > >> > (snip)
>>> >> >> > >> >
>>> >> >> > >> > In the development sources of Pacemaker, the handling of this
>>> >> >> > >> > case seems to have changed considerably.
>>> >> >> > >> > We would like to request that this change be backported to the
>>> >> >> > >> > Pacemaker 1.0 series.
>>> >> >> > >> >
>>> >> >> > >> > Best Regards,
>>> >> >> > >> > Hideo Yamauchi.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org