[Pacemaker] Solved !!! Fwd: after update one node in crm is getting offline CentOS

Testuser SST Tue, 15 Jun 2010 05:33:12 -0700

Hi,

I'm sorry, but the problem was caused by some kind of watchdog script that
stopped the heartbeat service.
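The salina ha-log quoted below supports this. The first sign of the shutdown is heartbeat itself killing crmd with signal 15 at 13:48:01, and crmd only sends its shutdown request to the DC after that. This is consistent with the heartbeat service being stopped from outside the cluster. Below is a minimal Python sketch for spotting that point in a log. It assumes the ha-log line format shown in this thread, and the function names are hypothetical (not part of Pacemaker or Heartbeat). It also converts the value in "shutdown=1276602481", which appears to be a Unix timestamp, using the +0200 offset from the original message's Date header.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches heartbeat's own shutdown lines, e.g.
# "... heartbeat: [28067]: info: killing /usr/lib/heartbeat/crmd process group 28083 with signal 15"
# The pattern is an assumption based only on the log format quoted in this thread.
KILL_RE = re.compile(
    r"heartbeat: \[\d+\]: info: killing (\S+) process group (\d+) with signal 15"
)


def first_local_stop(lines):
    """Return (timestamp, program) for the first SIGTERM heartbeat sends to a client, or None."""
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            # The syslog-style timestamp is the first three fields, e.g. "Jun 15 13:48:01".
            stamp = " ".join(line.split()[:3])
            return stamp, match.group(1)
    return None


def shutdown_attr_to_local(value, utc_offset_hours=2):
    """Convert a shutdown attribute value (assumed to be Unix seconds) to local time."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(int(value), tz=tz)


if __name__ == "__main__":
    sample = [
        "Jun 15 13:47:57 salina.hnc3.lan attrd: [28082]: info: attrd_perform_update: Sent update 13: probe_complete=true",
        "Jun 15 13:48:01 salina.hnc3.lan heartbeat: [28067]: info: killing /usr/lib/heartbeat/crmd process group 28083 with signal 15",
        "Jun 15 13:48:01 salina.hnc3.lan crmd: [28083]: info: crm_shutdown: Requesting shutdown",
    ]
    print(first_local_stop(sample))
    print(shutdown_attr_to_local("1276602481"))
```

On these sample lines the sketch prints ('Jun 15 13:48:01', '/usr/lib/heartbeat/crmd') and 2010-06-15 13:48:01+02:00. The shutdown attribute therefore records the same second in which heartbeat sent SIGTERM.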


Kind Regards 

f_c


-------- Original Message --------
Date: Tue, 15 Jun 2010 14:13:19 +0200
From: "Testuser  SST" <[email protected]>
To: [email protected]
Subject: [Pacemaker] after update one node in crm is getting offline CentOS

Hi,

I have just upgraded a two-node CentOS cluster from heartbeat 2.x to the latest
Pacemaker, with Heartbeat and Corosync from the clusterlabs repo (I uninstalled
the heartbeat RPM and then ran yum install from the new repo).
The cluster holds one IP resource. When I start the first node with
"service heartbeat start", everything works fine, and I can run crm_standby
-v off -U NODENAME without any problem. But when I start the second node, the
first node stays fine, while the second shuts itself down after a while, before
I can even run a crm_standby command.

Node that stays up: susan.hnc3.lan
Node that initiates the shutdown: salina.hnc3.lan

crm_config:

node $id="5ed513ff-7d45-4cbe-b7ef-51be8b5a66ef" susan.hnc3.lan \
        attributes standby="off"
node $id="6f1f209b-751c-4a57-bae2-b9874be248cf" salina.hnc3.lan \
        attributes standby="on"
primitive mySQL_IP ocf:heartbeat:IPaddr2 \
        params ip="192.168.18.151" cidr_netmask="24" \
        op monitor interval="30s"
property $id="cib-bootstrap-options" \
        dc-version="1.0.8-9881a7350d6182bae9e8e557cf20a3cc5dac3ee7" \
        cluster-infrastructure="Heartbeat" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
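In the configuration above, salina.hnc3.lan carries standby="on". That is why crm_standby -v off has to be run for it before it can host the IP resource. As an illustration only, here is a small Python sketch that reads each node's standby attribute from `crm configure show` output laid out as above. The function name is hypothetical, and the pattern only handles this exact two-line node/attributes layout.

```python
import re

# Matches a node definition followed by its attributes line, as in the pasted config:
#   node $id="..." susan.hnc3.lan \
#           attributes standby="off"
# This pattern is an assumption based only on that layout.
NODE_RE = re.compile(
    r'^node \$id="[^"]+" (\S+) \\\n\s+attributes standby="(on|off)"',
    re.MULTILINE,
)


def standby_states(config_text):
    """Return {node_name: True if the node is configured for standby}."""
    return {name: state == "on" for name, state in NODE_RE.findall(config_text)}


CONFIG = '''node $id="5ed513ff-7d45-4cbe-b7ef-51be8b5a66ef" susan.hnc3.lan \\
        attributes standby="off"
node $id="6f1f209b-751c-4a57-bae2-b9874be248cf" salina.hnc3.lan \\
        attributes standby="on"
'''

print(standby_states(CONFIG))
```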



The last lines of salina's ha-log:

Jun 15 13:47:50 salina.hnc3.lan cib: [28079]: info: crm_update_peer_proc: susan.hnc3.lan.ais is now online
Jun 15 13:47:50 salina.hnc3.lan cib: [28079]: info: crm_update_peer_proc: susan.hnc3.lan.crmd is now online
Jun 15 13:47:50 salina.hnc3.lan cib: [28079]: info: crm_update_peer: Node salina.hnc3.lan: id=0 state=member (new) addr=(null) votes=-1 born=2 seen=2 proc=00000000000000000000000000000100
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=2)
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: ccm_event_detail: NEW MEMBERSHIP: trans=2, nodes=2, new=2, lost=0 n_idx=0, new_idx=0, old_idx=4
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: ccm_event_detail:  CURRENT: susan.hnc3.lan [nodeid=1, born=1]
Jun 15 13:47:50 salina.hnc3.lan cib: [28079]: info: crm_update_peer_proc: salina.hnc3.lan.ais is now online
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: ccm_event_detail:  CURRENT: salina.hnc3.lan [nodeid=0, born=2]
Jun 15 13:47:50 salina.hnc3.lan cib: [28079]: info: crm_update_peer_proc: salina.hnc3.lan.crmd is now online
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: ccm_event_detail:  NEW:    susan.hnc3.lan [nodeid=1, born=1]
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: ccm_event_detail:  NEW:    salina.hnc3.lan [nodeid=0, born=2]
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: crm_get_peer: Node susan.hnc3.lan now has id: 1
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: crm_update_peer: Node susan.hnc3.lan: id=1 state=member (new) addr=(null) votes=-1 born=1 seen=2 proc=00000000000000000000000000000200
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: crm_update_peer_proc: susan.hnc3.lan.ais is now online
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: crm_update_peer: Node salina.hnc3.lan: id=0 state=member (new) addr=(null) votes=-1 born=2 seen=2 proc=00000000000000000000000000000200
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: crm_update_peer_proc: salina.hnc3.lan.ais is now online
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: do_started: The local CRM is operational
Jun 15 13:47:50 salina.hnc3.lan crmd: [28083]: info: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Jun 15 13:47:52 salina.hnc3.lan cib: [28079]: info: cib_process_diff: Diff 0.21.5 -> 0.21.6 not applied to 0.18.0: current "epoch" is less than required
Jun 15 13:47:52 salina.hnc3.lan crmd: [28083]: info: update_dc: Set DC to susan.hnc3.lan (3.0.1)
Jun 15 13:47:52 salina.hnc3.lan cib: [28079]: info: cib_server_process_diff: Requesting re-sync from peer
Jun 15 13:47:52 salina.hnc3.lan cib: [28079]: WARN: cib_diff_notify: Local-only Change (client:crmd, call: 37): 0.0.0 (Application of an update diff failed, requesting a full refresh)
Jun 15 13:47:52 salina.hnc3.lan cib: [28079]: info: cib_replace_notify: Replaced: 0.18.0 -> 0.21.6 from susan.hnc3.lan
Jun 15 13:47:52 salina.hnc3.lan crmd: [28083]: info: update_attrd: Connecting to attrd...
Jun 15 13:47:52 salina.hnc3.lan crmd: [28083]: info: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: find_hash_entry: Creating hash entry for terminate
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: find_hash_entry: Creating hash entry for shutdown
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_local_callback: Sending full refresh (origin=crmd)
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_trigger_update: Sending flush op to all hosts for: terminate (<null>)
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_perform_update: Delaying operation terminate=<null>: cib not connected
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_trigger_update: Sending flush op to all hosts for: shutdown (<null>)
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from salina.hnc3.lan
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from salina.hnc3.lan
Jun 15 13:47:52 salina.hnc3.lan cib: [28092]: info: write_cib_contents: Wrote version 0.21.0 of the CIB to disk (digest: 807f50e28ff0e448e9a6a3b82a879d1c)
Jun 15 13:47:52 salina.hnc3.lan cib: [28092]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib-19.raw
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: cib_connect: Connected to the CIB after 2 signon attempts
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: cib_connect: Sending full refresh
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from salina.hnc3.lan
Jun 15 13:47:52 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from salina.hnc3.lan
ib.7YiWtn (digest: /var/lib/heartbeat/crm/cib.R6wctE)
Jun 15 13:47:52 salina.hnc3.lan cib: [28079]: info: Managed write_cib_contents process 28092 exited with return code 0.
Jun 15 13:47:53 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from salina.hnc3.lan
Jun 15 13:47:53 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from salina.hnc3.lan
]/transient_attributes": ok (rc=0)
]/lrm": ok (rc=0)
Jun 15 13:47:55 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from susan.hnc3.lan
Jun 15 13:47:55 salina.hnc3.lan attrd: [28082]: info: find_hash_entry: Creating hash entry for probe_complete
Jun 15 13:47:55 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from susan.hnc3.lan
Jun 15 13:47:55 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: flush message from susan.hnc3.lan
 op=mySQL_IP_monitor_0 )
Jun 15 13:47:55 salina.hnc3.lan lrmd: [28080]: info: rsc:mySQL_IP:2: probe
Jun 15 13:47:55 salina.hnc3.lan lrmd: [28080]: WARN: Managed mySQL_IP:monitor process 28097 exited with return code 7.
pdate=8, confirmed=true) not running
 (true)
Jun 15 13:47:57 salina.hnc3.lan attrd: [28082]: info: attrd_perform_update: Sent update 13: probe_complete=true
Jun 15 13:48:01 salina.hnc3.lan heartbeat: [28067]: info: killing /usr/lib/heartbeat/crmd process group 28083 with signal 15
Jun 15 13:48:01 salina.hnc3.lan crmd: [28083]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jun 15 13:48:01 salina.hnc3.lan crmd: [28083]: info: crm_shutdown: Requesting shutdown
Jun 15 13:48:01 salina.hnc3.lan crmd: [28083]: info: do_shutdown_req: Sending shutdown request to DC: susan.hnc3.lan
Jun 15 13:48:02 salina.hnc3.lan attrd: [28082]: info: attrd_ha_callback: Update relayed from susan.hnc3.lan
Jun 15 13:48:02 salina.hnc3.lan attrd: [28082]: info: attrd_perform_update: Sent update 16: shutdown=1276602481
Jun 15 13:48:03 salina.hnc3.lan crmd: [28083]: info: handle_request: Shutting down
TOP cause=C_HA_MESSAGE origin=route_message ]
Jun 15 13:48:03 salina.hnc3.lan crmd: [28083]: info: do_shutdown: All subsystems stopped, continuing
Jun 15 13:48:03 salina.hnc3.lan crmd: [28083]: info: do_lrm_control: Disconnected from the LRM
Jun 15 13:48:03 salina.hnc3.lan ccm: [28078]: info: client (pid=28083) removed from ccm
Jun 15 13:48:03 salina.hnc3.lan crmd: [28083]: info: do_cib_control: Disconnecting CIB
Jun 15 13:48:03 salina.hnc3.lan crmd: [28083]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
Jun 15 13:48:03 salina.hnc3.lan crmd: [28083]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
origin=do_stop ]
Jun 15 13:48:03 salina.hnc3.lan crmd: [28083]: info: do_exit: [crmd] stopped (0)
Jun 15 13:48:03 salina.hnc3.lan heartbeat: [28067]: info: killing /usr/lib/heartbeat/attrd process group 28082 with signal 15
Jun 15 13:48:03 salina.hnc3.lan attrd: [28082]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jun 15 13:48:03 salina.hnc3.lan attrd: [28082]: info: attrd_shutdown: Exiting
Jun 15 13:48:03 salina.hnc3.lan attrd: [28082]: info: main: Exiting...
Jun 15 13:48:03 salina.hnc3.lan attrd: [28082]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
Jun 15 13:48:03 salina.hnc3.lan stonithd: [28081]: notice: /usr/lib/heartbeat/stonithd normally quit.
Jun 15 13:48:03 salina.hnc3.lan lrmd: [28080]: info: lrmd is shutting down
Jun 15 13:48:03 salina.hnc3.lan heartbeat: [28067]: info: killing /usr/lib/heartbeat/cib process group 28079 with signal 15
Jun 15 13:48:03 salina.hnc3.lan cib: [28079]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jun 15 13:48:03 salina.hnc3.lan cib: [28079]: info: cib_shutdown: Disconnected 0 clients
Jun 15 13:48:03 salina.hnc3.lan cib: [28079]: info: cib_process_disconnect: All clients disconnected...
Jun 15 13:48:03 salina.hnc3.lan cib: [28079]: info: initiate_exit: Sending disconnect notification to 2 peers...
Jun 15 13:48:04 salina.hnc3.lan cib: [28079]: info: cib_process_shutdown_req: Shutdown ACK from susan.hnc3.lan
Jun 15 13:48:04 salina.hnc3.lan cib: [28079]: info: terminate_cib: cib_process_shutdown_req: Disconnecting heartbeat
Jun 15 13:48:04 salina.hnc3.lan cib: [28079]: info: terminate_cib: Exiting...
'all' (origin=susan.hnc3.lan/susan.hnc3.lan/(null), version=0.0.0): ok (rc=0)
Jun 15 13:48:04 salina.hnc3.lan cib: [28079]: info: ha_msg_dispatch: Lost connection to heartbeat service.
Jun 15 13:48:04 salina.hnc3.lan cib: [28079]: info: main: Done
Jun 15 13:48:04 salina.hnc3.lan ccm: [28078]: info: client (pid=28079) removed from ccm
Jun 15 13:48:04 salina.hnc3.lan heartbeat: [28067]: info: killing /usr/lib/heartbeat/ccm process group 28078 with signal 15
Jun 15 13:48:04 salina.hnc3.lan ccm: [28078]: info: received SIGTERM, going to shut down
Jun 15 13:48:04 salina.hnc3.lan heartbeat: [28067]: info: client [/usr/lib/heartbeat/ipfail] is not running.
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: killing HBREAD process 28072 with signal 15
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: killing HBWRITE process 28073 with signal 15
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: killing HBREAD process 28074 with signal 15
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: killing HBFIFO process 28070 with signal 15
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: killing HBWRITE process 28071 with signal 15
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: Core process 28074 exited. 5 remaining
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: Core process 28072 exited. 4 remaining
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: Core process 28071 exited. 3 remaining
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: Core process 28070 exited. 2 remaining
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: Core process 28073 exited. 1 remaining
Jun 15 13:48:06 salina.hnc3.lan heartbeat: [28067]: info: salina.hnc3.lan Heartbeat shutdown complete.
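The CIB messages near the start of this log look alarming but appear to be the normal catch-up of a node that joins with an older CIB. These are `Diff 0.21.5 -> 0.21.6 not applied to 0.18.0: current "epoch" is less than required`, followed by `Replaced: 0.18.0 -> 0.21.6 from susan.hnc3.lan`. salina keeps running for about nine seconds after them. A CIB version is written admin_epoch.epoch.num_updates, and roughly speaking a diff can only be applied on top of the version it was generated from. Below is a minimal Python sketch of that check. It is a simplified model, not Pacemaker's actual code, and `CibVersion` and `handle_diff` are hypothetical names.

```python
from typing import NamedTuple


class CibVersion(NamedTuple):
    """CIB version triple, written admin_epoch.epoch.num_updates in the logs."""
    admin_epoch: int
    epoch: int
    num_updates: int

    @classmethod
    def parse(cls, text):
        admin_epoch, epoch, num_updates = (int(part) for part in text.split("."))
        return cls(admin_epoch, epoch, num_updates)


def handle_diff(local, diff_from, diff_to):
    """Simplified model (an assumption, not Pacemaker's code) of how a peer's diff is treated."""
    local, diff_from, diff_to = (CibVersion.parse(v) for v in (local, diff_from, diff_to))
    # NamedTuples compare field by field, matching the dotted version ordering.
    if local == diff_from:
        return "apply", diff_to
    if local < diff_from:
        # Local copy is too old for this diff: ask a peer for the whole CIB instead.
        return "request full resync", local
    # Local copy is already newer than the diff's starting point: nothing to apply.
    return "ignore", local


print(handle_diff("0.18.0", "0.21.5", "0.21.6"))
print(handle_diff("0.21.5", "0.21.5", "0.21.6"))
```

For the versions in salina's log, the first call takes the "request full resync" branch. This corresponds to the "Requesting re-sync from peer" and "Replaced: 0.18.0 -> 0.21.6" lines rather than to anything that would stop the node.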


Full logfile is attached

Any assistance would be much appreciated!

Kind Regards

f_c

salina_ha-log
Description: Binary data

_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
