Hi, I'm sorry, but the problem was caused by some kind of watchdog script, which stopped the heartbeat service.
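
For anyone seeing the same symptom: in the ha-log quoted below, the first sign of the shutdown is the heartbeat master process sending signal 15 to its child processes, which is consistent with the service being stopped from outside rather than with a Pacemaker failure. Here is a minimal Python sketch for locating that first line in a log. The function name and the regular expression are my own assumptions, derived only from the log lines in this thread; this is not an official tool.

```python
import re
from typing import Iterable, Optional

# Assumed pattern, based on the "killing ... process group" lines quoted in this thread.
KILL_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"heartbeat: \[(?P<pid>\d+)\]: info: killing (?P<target>\S+) "
    r"process group (?P<pgid>\d+) with signal (?P<signal>\d+)"
)


def first_shutdown_line(lines: Iterable[str]) -> Optional[dict]:
    """Return details of the first 'killing ... process group' entry, or None."""
    for number, line in enumerate(lines, start=1):
        match = KILL_RE.match(line)
        if match:
            # All captured values are kept as strings, exactly as logged.
            return {"line": number, **match.groupdict()}
    return None


if __name__ == "__main__":
    sample = [
        "Jun 15 13:47:57 salina.hnc3.lan attrd: [28082]: info: "
        "attrd_perform_update: Sent update 13: probe_complete=true",
        "Jun 15 13:48:01 salina.hnc3.lan heartbeat: [28067]: info: "
        "killing /usr/lib/heartbeat/crmd process group 28083 with signal 15",
    ]
    print(first_shutdown_line(sample))
```

If the first match appears without any preceding error from crmd or cib, it may be worth checking cron jobs, monitoring agents, or init scripts that could have stopped the service at that timestamp.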
Kind Regards
f_c

-------- Original Message --------
Date: Tue, 15 Jun 2010 14:13:19 +0200
From: "Testuser SST" <[email protected]>
To: [email protected]
Subject: [Pacemaker] after update one node in crm is getting offline CentOS

Hi,

I have just updated from heartbeat 2.x to the latest pacemaker with heartbeat and corosync from the clusterlabs repo on a 2-node CentOS cluster (uninstalled the heartbeat rpm, then ran yum install from the new repo). The cluster holds one IP resource. When I start the first node with "service heartbeat start", everything works fine, and I can run crm_standby -v off -U NODENAME without any problem. But when I start the second node, the first node stays OK, while the second shuts itself down after a while, before I can even run a crm_standby command.

Node which is up: susan.hnc3.lan
Node which is initiating the shutdown: salina.hnc3.lan

crm_config:

INFO: building help index
node $id="5ed513ff-7d45-4cbe-b7ef-51be8b5a66ef" susan.hnc3.lan \
        attributes standby="off"
node $id="6f1f209b-751c-4a57-bae2-b9874be248cf" salina.hnc3.lan \
        attributes standby="on"
primitive mySQL_IP ocf:heartbeat:IPaddr2 \
        params ip="192.168.18.151" cidr_netmask="24" \
        op monitor interval="30s"
property $id="cib-bootstrap-options" \
        dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

The last lines of the ha-log of salina:

Jun 15 13:47:50 salina.hnc3.lan cib: [28079]: info: crm_update_peer_proc: susan.hnc3.lan.ais is now online
Jun 15 13:47:50 salina.hnc3.lan cib: [28079]: info: crm_update_peer_proc: susan.hnc3.lan.crmd is now online
Jun 15 13:47:50 salina.hnc3.lan cib: [28079]: info: crm_update_peer: Node salina.hnc3.lan: id=0 state=member (new) addr=(null) votes=-1 born=2 seen=2 proc=00000000000000000000000000000100
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=2)
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: ccm_event_detail: NEW MEMBERSHIP: trans=2, nodes=2, new=2, lost=0 n_idx=0, new_idx=0, old_idx=4
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: ccm_event_detail: CURRENT: susan.hnc3.lan [nodeid=1, born=1]
Jun 15 13:47:50 salina.hnc3.lan cib: [28079]: info: crm_update_peer_proc: salina.hnc3.lan.ais is now online
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: ccm_event_detail: CURRENT: salina.hnc3.lan [nodeid=0, born=2]
Jun 15 13:47:50 salina.hnc3.lan cib: [28079]: info: crm_update_peer_proc: salina.hnc3.lan.crmd is now online
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: ccm_event_detail: NEW: susan.hnc3.lan [nodeid=1, born=1]
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: ccm_event_detail: NEW: salina.hnc3.lan [nodeid=0, born=2]
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: crm_get_peer: Node susan.hnc3.lan now has id: 1
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: crm_update_peer: Node susan.hnc3.lan: id=1 state=member (new) addr=(null) votes=-1 born=1 seen=2 proc=00000000000000000000000000000200
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: crm_update_peer_proc: susan.hnc3.lan.ais is now online
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: crm_update_peer: Node salina.hnc3.lan: id=0 state=member (new) addr=(null) votes=-1 born=2 seen=2 proc=00000000000000000000000000000200
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: crm_update_peer_proc: salina.hnc3.lan.ais is now online
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: do_started: The local CRM is operational
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Jun 15 13:47:52 salina.hnc3.lan cib: [28079]: info: cib_process_diff: Diff 0.21.5 -> 0.21.6 not applied to 0.18.0: current "epoch" is less than required
Jun 15 13:47:52 salina.hnc3.lan crmd: [28083]: info: update_dc: Set DC to susan.hnc3.lan (3.0.1)
Jun 15 13:47:52 salina.hnc3.lan cib: [28079]: info: cib_server_process_diff: Requesting re-sync from peer
Jun 15 13:47:52 salina.hnc3.lan cib: [28079]: WARN: cib_diff_notify: Local-only Change (client:crmd, call: 37): 0.0.0 (Application of an update diff failed, requesting a full refresh)
Jun 15 13:47:52 salina.hnc3.lan cib: [28079]: info: cib_replace_notify: Replaced: 0.18.0 -> 0.21.6 from susan.hnc3.lan
Jun 15 13:47:52 salina.hnc3.lan crmd: [28083]: info: update_attrd: Connecting to attrd...
Jun 15 13:47:52 salina.hnc3.lan crmd: [28083]: info: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: find_hash_entry: Creating hash entry for terminate
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: find_hash_entry: Creating hash entry for shutdown
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_local_callback: Sending full refresh (origin=crmd)
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_trigger_update: Sending flush op to all hosts for: terminate (<null>)
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_perform_update: Delaying operation terminate=<null>: cib not connected
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (<null>)
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from salina.hnc3.lan
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from salina.hnc3.lan
Jun 15 13:47:52 salina.hnc3.lan cib: [28092]: info: write_cib_contents: Wrote version 0.21.0 of the CIB to disk (digest: 807f50e28ff0e448e9a6a3b82a879d1c)
Jun 15 13:47:52 salina.hnc3.lan cib: [28092]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib-19.raw
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: cib_connect: Connected to the CIB after 2 signon attempts
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: cib_connect: Sending full refresh
ll>) l>)
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from salina.hnc3.lan
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from salina.hnc3.lan
50e28ff0e448e9a6a3b82a879d1c) ib.7YiWtn (digest: /var/lib/heartbeat/crm/cib.R6wctE)
Jun 15 13:47:52 salina.hnc3.lan cib: [28079]: info: Managed write_cib_contents process 28092 exited with return code 0.
Jun 15 13:47:53 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from salina.hnc3.lan
Jun 15 13:47:53 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from salina.hnc3.lan
]/transient_attributes": ok (rc=0) ]/lrm": ok (rc=0)
Jun 15 13:47:55 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from susan.hnc3.lan
Jun 15 13:47:55 salina.hnc3.lan attrd: [28082]: info: find_hash_entry: Creating hash entry for probe_complete
Jun 15 13:47:55 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from susan.hnc3.lan
Jun 15 13:47:55 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from susan.hnc3.lan
op=mySQL_IP_monitor_0 )
Jun 15 13:47:55 salina.hnc3.lan lrmd: [28080]: info: rsc:mySQL_IP:2: probe
Jun 15 13:47:55 salina.hnc3.lan lrmd: [28080]: WARN: Managed mySQL_IP:monitor process 28097 exited with return code 7.
pdate=8, confirmed=true) not running (true)
Jun 15 13:47:57 salina.hnc3.lan attrd: [28082]: info: attrd_perform_update: Sent update 13: probe_complete=true
Jun 15 13:48:01 salina.hnc3.lan heartbeat: [28067]: info: killing /usr/lib/heartbeat/crmd process group 28083 with signal 15
Jun 15 13:48:01 salina.hnc3.lan crmd: [28083]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jun 15 13:48:01 salina.hnc3.lan crmd: [28083]: info: crm_shutdown: Requesting shutdown
Jun 15 13:48:01 salina.hnc3.lan crmd: [28083]: info: do_shutdown_req: Sending shutdown request to DC: susan.hnc3.lan
Jun 15 13:48:02 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: Update relayed from susan.hnc3.lan
602481)
Jun 15 13:48:02 salina.hnc3.lan attrd: [28082]: info: attrd_perform_update: Sent update 16: shutdown=1276602481
Jun 15 13:48:03 salina.hnc3.lan crmd: [28083]: info: handle_request: Shutting down
TOP cause=C_HA_MESSAGE origin=route_message ]
Jun 15 13:48:03 salina.hnc3.lan crmd: [28083]: info: do_shutdown: All subsystems stopped, continuing
Jun 15 13:48:03 salina.hnc3.lan crmd: [28083]: info: do_lrm_control: Disconnected from the LRM
Jun 15 13:48:03 salina.hnc3.lan ccm: [28078]: info: client (pid=28083) removed from ccm
Jun 15 13:48:03 salina.hnc3.lan crmd: [28083]: info: do_cib_control: Disconnecting CIB
Jun 15 13:48:03 salina.hnc3.lan crmd: [28083]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
Jun 15 13:48:03 salina.hnc3.lan crmd: [28083]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
origin=do_stop ]
Jun 15 13:48:03 salina.hnc3.lan crmd: [28083]: info: do_exit: [crmd] stopped (0)
Jun 15 13:48:03 salina.hnc3.lan heartbeat: [28067]: info: killing /usr/lib/heartbeat/attrd process group 28082 with signal 15
Jun 15 13:48:03 salina.hnc3.lan attrd: [28082]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jun 15 13:48:03 salina.hnc3.lan attrd: [28082]: info: attrd_shutdown: Exiting
Jun 15 13:48:03 salina.hnc3.lan attrd: [28082]: info: main: Exiting...
Jun 15 13:48:03 salina.hnc3.lan attrd: [28082]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
15
Jun 15 13:48:03 salina.hnc3.lan stonithd: [28081]: notice: /usr/lib/heartbeat/stonithd normally quit.
15
Jun 15 13:48:03 salina.hnc3.lan lrmd: [28080]: info: lrmd is shutting down
Jun 15 13:48:03 salina.hnc3.lan heartbeat: [28067]: info: killing /usr/lib/heartbeat/cib process group 28079 with signal 15
Jun 15 13:48:03 salina.hnc3.lan cib: [28079]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jun 15 13:48:03 salina.hnc3.lan cib: [28079]: info: cib_shutdown: Disconnected 0 clients
Jun 15 13:48:03 salina.hnc3.lan cib: [28079]: info: cib_process_disconnect: All clients disconnected...
Jun 15 13:48:03 salina.hnc3.lan cib: [28079]: info: initiate_exit: Sending disconnect notification to 2 peers...
Jun 15 13:48:04 salina.hnc3.lan cib: [28079]: info: cib_process_shutdown_req: Shutdown ACK from susan.hnc3.lan
Jun 15 13:48:04 salina.hnc3.lan cib: [28079]: info: terminate_cib: cib_process_shutdown_req: Disconnecting heartbeat
Jun 15 13:48:04 salina.hnc3.lan cib: [28079]: info: terminate_cib: Exiting...
'all' (origin=susan.hnc3.lan/susan.hnc3.lan/(null), version=0.0.0): ok (rc=0)
Jun 15 13:48:04 salina.hnc3.lan cib: [28079]: info: ha_msg_dispatch: Lost connection to heartbeat service.
Jun 15 13:48:04 salina.hnc3.lan cib: [28079]: info: main: Done
Jun 15 13:48:04 salina.hnc3.lan ccm: [28078]: info: client (pid=28079) removed from ccm
Jun 15 13:48:04 salina.hnc3.lan heartbeat: [28067]: info: killing /usr/lib/heartbeat/ccm process group 28078 with signal 15
Jun 15 13:48:04 salina.hnc3.lan ccm: [28078]: info: received SIGTERM, going to shut down
Jun 15 13:48:04 salina.hnc3.lan heartbeat: [28067]: info: client [/usr/lib/heartbeat/ipfail] is not running.
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: killing HBREAD process 28072 with signal 15
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: killing HBWRITE process 28073 with signal 15
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: killing HBREAD process 28074 with signal 15
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: killing HBFIFO process 28070 with signal 15
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: killing HBWRITE process 28071 with signal 15
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: Core process 28074 exited. 5 remaining
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: Core process 28072 exited. 4 remaining
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: Core process 28071 exited. 3 remaining
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: Core process 28070 exited. 2 remaining
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: Core process 28073 exited. 1 remaining
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: salina.hnc3.lan Heartbeat shutdown complete.

The full logfile is attached.

Any assistance would be much appreciated!

Kind Regards
f_c
salina_ha-log
Description: Binary data
_______________________________________________ Pacemaker mailing list: [email protected] http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
