Actually, I was wrong: the version in use is 1.1.10. So how can I find out which process is taking so long?
Thanks

On 3/23/14, 7:35 PM, "Andrew Beekhof" <and...@beekhof.net> wrote:

> On 21 Mar 2014, at 3:57 am, Drapeau, Mathieu <mathieu.drap...@intel.com> wrote:
>
>> Hello,
>> With pacemaker 1.1.8-7 from EL6, crmd died unexpectedly, generating these logs during a failover:
>
> Please update to 1.1.10 from the EL6 update channels:
>
> http://blog.clusterlabs.org/blog/2014/potential-for-data-corruption-in-pacemaker-1-dot-1-6-through-1-dot-1-9/
>
>> crmd[10419]: error: crmd_node_update_complete: Node update 79 failed: Timer expired (-62)
>
> It looks like your hardware is overloaded and an operation that shouldn't have taken very long has timed out.
>
>> crmd[10419]: error: do_log: FSA: Input I_ERROR from crmd_node_update_complete() received in state S_IDLE
>> crmd[10419]: notice: do_state_transition: State transition S_IDLE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=crmd_node_update_complete ]
>> crmd[10419]: warning: do_recover: Fast-tracking shutdown in response to errors
>> crmd[10419]: warning: do_election_vote: Not voting in election, we're in state S_RECOVERY
>> crmd[10419]: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
>> crmd[10419]: notice: lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown (2 ops remaining)
>> crmd[10419]: notice: lrm_state_verify_stopped: Recurring action testfs-MDT0000_6cda68:21 (testfs-MDT0000_6cda68_monitor_5000) incomplete at shutdown
>> crmd[10419]: notice: lrm_state_verify_stopped: Recurring action MGS_f055b7:30 (MGS_f055b7_monitor_5000) incomplete at shutdown
>> crmd[10419]: error: lrm_state_verify_stopped: 3 resources were active at shutdown.
>> crmd[10419]: notice: do_lrm_control: Disconnected from the LRM
>> crmd[10419]: notice: terminate_cs_connection: Disconnecting from Corosync
>> corosync[10370]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x2589f40, async-conn=0x2589f40) left
>> crmd[10419]: error: crmd_fast_exit: Could not recover from internal error
>> pacemakerd[10408]: error: pcmk_child_exit: Child process crmd (10419) exited: Generic Pacemaker error (201)
>> pacemakerd[10408]: notice: pcmk_process_exit: Respawning failed child process: crmd
>>
>> What could have happened, and how can I prevent crmd from dying?
>>
>> Thanks,
>> Mat
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org