On 07/07/2015 03:58 AM, Arjun Pandey wrote:
> Hi Ken
>
> If I look at the logs on the other node around the same time, I see this. I
> can't figure out the reason based on these. Attaching the corosync.log for
> the other node as well.
I don't see anything there either. The relevant part could be earlier or
later. Are you using the same version of pacemaker and corosync on both
nodes?

> Jun 01 13:55:44 [1965] messi crmd: info: do_dc_join_offer_one: An unknown node joined - (re-)offer to any unconfirmed nodes
> Jun 01 13:55:44 [1965] messi crmd: info: join_make_offer: Making join offers based on membership 224
> Jun 01 13:55:44 [1965] messi crmd: info: join_make_offer: Skipping messi: already known 4
> Jun 01 13:55:44 [1965] messi crmd: info: join_make_offer: join-2: Sending offer to ronaldo
> Jun 01 13:55:44 [1960] messi cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/138)
> Jun 01 13:55:44 [1960] messi cib: info: cib_perform_op: Diff: --- 0.80.2 2
> Jun 01 13:55:44 [1960] messi cib: info: cib_perform_op: Diff: +++ 0.80.3 (null)
> Jun 01 13:55:44 [1965] messi crmd: info: crm_update_peer_join: join_make_offer: Node ronaldo[2] - join-2 phase 0 -> 1
> Jun 01 13:55:44 [1965] messi crmd: info: abort_transition_graph: Transition aborted: Peer Halt (source=do_te_invoke:158, 1)
> Jun 01 13:55:44 [1960] messi cib: info: cib_perform_op: + /cib: @num_updates=3
> Jun 01 13:55:44 [1960] messi cib: info: cib_perform_op: + /cib/status/node_state[@id='ronaldo']: @crmd=online, @crm-debug-origin=peer_update_callback
> Jun 01 13:55:44 [1960] messi cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=messi/crmd/138, version=0.80.3)
> Jun 01 13:55:44 [1960] messi cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=ronaldo/crmd/3, version=0.80.3)
> Jun 01 13:55:45 [1965] messi crmd: info: do_dc_join_offer_one: join-2: Processing join_announce request from ronaldo in state S_INTEGRATION
> Jun 01 13:55:45 [1965] messi crmd: info: crm_update_peer_join: do_dc_join_offer_one: Node ronaldo[2] - join-2 phase 1 -> 0
> Jun 01 13:55:45 [1965] messi crmd: info: join_make_offer: join-2: Sending offer to ronaldo
> Jun 01 13:55:45 [1965] messi crmd: info: crm_update_peer_join: join_make_offer: Node ronaldo[2] - join-2 phase 0 -> 1
> Jun 01 13:55:45 [1965] messi crmd: info: crm_update_peer_join: join_make_offer: Node messi[1] - join-2 phase 4 -> 0
> Jun 01 13:55:45 [1965] messi crmd: info: join_make_offer: join-2: Sending offer to messi
> Jun 01 13:55:45 [1965] messi crmd: info: crm_update_peer_join: join_make_offer: Node messi[1] - join-2 phase 0 -> 1
> Jun 01 13:55:45 [1965] messi crmd: info: abort_transition_graph: Transition aborted: Node join (source=do_dc_join_offer_one:236, 1)
> Jun 01 13:55:45 [1965] messi crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node messi[1] - join-2 phase 1 -> 2
> Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node ronaldo[2] - join-2 phase 1 -> 2
> Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_expected: do_dc_join_filter_offer: Node ronaldo[2] - expected state is now member (was down)
> Jun 01 13:55:46 [1965] messi crmd: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
> Jun 01 13:55:46 [1965] messi crmd: info: crmd_join_phase_log: join-2: messi=integrated
> Jun 01 13:55:46 [1965] messi crmd: info: crmd_join_phase_log: join-2: ronaldo=integrated
> Jun 01 13:55:46 [1965] messi crmd: info: do_dc_join_finalize: join-2: Syncing our CIB to the rest of the cluster
> Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_join: finalize_join_for: Node messi[1] - join-2 phase 2 -> 3
> Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_join: finalize_join_for: Node ronaldo[2] - join-2 phase 2 -> 3
> Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_join: do_dc_join_ack: Node messi[1] - join-2 phase 3 -> 4
> Jun 01 13:55:46 [1965] messi crmd: info: do_dc_join_ack: join-2: Updating node state to member for messi
> Jun 01 13:55:46 [1965] messi crmd: info: erase_status_tag: Deleting xpath: //node_state[@uname='messi']/lrm
> Jun 01 13:55:46 [1960] messi cib: info: cib_process_replace: Digest matched on replace from messi: 5138b696984c7b834dd2b528dadabe0d
> Jun 01 13:55:46 [1960] messi cib: info: cib_process_replace: Replaced 0.80.3 with 0.80.3 from messi
> Jun 01 13:55:46 [1965] messi crmd: info: crm_update_peer_join: do_dc_join_ack: Node ronaldo[2] - join-2 phase 3 -> 4
>
> Regards
> Arjun
>
> On Wed, Jul 1, 2015 at 7:17 PM, Ken Gaillot <kgail...@redhat.com> wrote:
>
>> On 06/30/2015 11:30 PM, Arjun Pandey wrote:
>>> Hi
>>>
>>> I am running a 2-node cluster with this config on CentOS 6.5/6.6:
>>>
>>> Master/Slave Set: foo-master [foo]
>>>      Masters: [ messi ]
>>>      Stopped: [ ronaldo ]
>>> eth1-CP    (ocf::pw:IPaddr):    Started messi
>>> eth2-UP    (ocf::pw:IPaddr):    Started messi
>>> eth3-UPCP  (ocf::pw:IPaddr):    Started messi
>>>
>>> where I have a multi-state resource foo being run in master/slave mode,
>>> and the IPaddr RA is just a modified IPaddr2 RA. Additionally, I have a
>>> colocation constraint for the IP addresses to be colocated with the master.
>>>
>>> Sometimes when I set up the cluster, I find that one of the nodes (the
>>> second node that joins) gets stopped, and I find this log:
>>>
>>> 2015-06-01T13:55:46.153941+05:30 ronaldo pacemaker: Starting Pacemaker Cluster Manager
>>> 2015-06-01T13:55:46.233639+05:30 ronaldo attrd[25988]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
>>> 2015-06-01T13:55:46.234162+05:30 ronaldo crmd[25990]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ]
>>> 2015-06-01T13:55:46.234701+05:30 ronaldo attrd[25988]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
>>> 2015-06-01T13:55:46.234708+05:30 ronaldo attrd[25988]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
>>> ************************ This looks to be the likely reason ************************
>>> 2015-06-01T13:55:46.254310+05:30 ronaldo crmd[25990]: error: handle_request: We didn't ask to be shut down, yet our DC is telling us to.
>>> ************************************************************************************
>>
>> Hi Arjun,
>>
>> I'd check the other node's logs at this time, to see why it requested
>> the shutdown.
>>
>>> 2015-06-01T13:55:46.254577+05:30 ronaldo crmd[25990]: notice: do_state_transition: State transition S_NOT_DC -> S_STOPPING [ input=I_STOP cause=C_HA_MESSAGE origin=route_message ]
>>> 2015-06-01T13:55:46.255134+05:30 ronaldo crmd[25990]: notice: lrm_state_verify_stopped: Stopped 0 recurring operations at shutdown... waiting (2 ops remaining)
>>>
>>> Based on the logs, pacemaker on the active node was stopping the
>>> secondary cloud every time it joined the cluster.
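If the DC is initiating the stop, the decision itself will be logged on
messi, so it may be worth widening the search window there; something
along these lines (the log path matches your attachment; the grep
patterns are just the usual suspects, not exact message texts):

    # on messi (the DC), look for anything hinting at a shutdown or
    # fencing decision around the 13:55 join
    grep -iE 'shutdown|stonith|fence' /var/log/cluster/corosync.log

    # or collect everything from both nodes for that window
    crm_report -f "2015-06-01 13:50" -t "2015-06-01 14:00" /tmp/join-issue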
>>> This issue seems similar to
>>> http://pacemaker.oss.clusterlabs.narkive.com/rVvN8May/node-sends-shutdown-request-to-other-node-error
>>>
>>> Packages used:
>>> pacemaker-1.1.12-4.el6.x86_64
>>> pacemaker-libs-1.1.12-4.el6.x86_64
>>> pacemaker-cli-1.1.12-4.el6.x86_64
>>> pacemaker-cluster-libs-1.1.12-4.el6.x86_64
>>> pacemaker-debuginfo-1.1.10-14.el6.x86_64
>>> pcsc-lite-libs-1.5.2-13.el6_4.x86_64
>>> pcs-0.9.90-2.el6.centos.2.noarch
>>> pcsc-lite-1.5.2-13.el6_4.x86_64
>>> pcsc-lite-openct-0.6.19-4.el6.x86_64
>>> corosync-1.4.1-17.el6.x86_64
>>> corosynclib-1.4.1-17.el6.x86_64
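To follow up on the version question at the top: the quickest check is to
compare the installed and running versions directly on both nodes, e.g.:

    # run on both messi and ronaldo, then compare the output
    rpm -q pacemaker corosync pcs
    pacemakerd --version
    corosync -v

If they differ, I'd bring both nodes onto the same pacemaker/corosync
builds before debugging this further.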
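Also, for reference, the colocation you described would normally look
something like this with pcs (resource names taken from your status
output; the exact role syntax may differ on pcs 0.9.90, so compare
against your actual `pcs constraint show` / `cibadmin -Q` output):

    # tie each floating IP to wherever the foo master is promoted
    pcs constraint colocation add eth1-CP with master foo-master INFINITY
    pcs constraint colocation add eth2-UP with master foo-master INFINITY
    pcs constraint colocation add eth3-UPCP with master foo-master INFINITY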