Re: Re Re: [Linux-HA] Problem about hb_addnode and hb_delnode

Dejan Muhamedagic Wed, 02 Apr 2008 04:22:12 -0700

Hi,

On Wed, Apr 02, 2008 at 01:55:14PM +0800, ???? wrote:
> Thank you very much for your explanation, Andrew Beekhof. 
> 
> >I'm pretty sure you have to run that command on all the remaining nodes
> >Which would explain why some nodes think there are 3 cluster members
> >and others think there are two (i.e. the first 3 errors)


No, it should be enough to run the command on one node. But the
node to be removed should be stopped beforehand.

> But I still have some questions.
> 1. When hb_addnode was executed on node "slave", the log entry
> "*** info: hb_add_one_node: Adding new node[master] to configuration." 
> appeared in the logfiles of all nodes,
> which means that all nodes (including node2) should already know that a new
> node "master" wants to join the cluster.
> Do I have to run "hb_addnode master" on node "node2"?

No, that shouldn't be necessary.

> 2. If my guess is wrong, how can I run hb_addnode on all remaining nodes at 
> the same time? Or how can I prevent the remaining nodes from rebooting
> themselves?
> The log indicates that node2 will reboot in a minute when it finds the error below:
> "ccm[4806]: 2008/03/31_19:08:38 ERROR: ccm_control_process: Node count from 
> node slave does not agree: local count=2, count in message=3".

Did you read this howto on removing/deleting nodes:

http://linux-ha.org/DRBD/HowTov2
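
If you want to check the logs on the other nodes for the same complaint, something like the following might help. It is only a sketch: the regular expression is inferred from the single ccm error line you quoted, and the function name find_count_mismatches is made up for illustration, not part of heartbeat.

```python
import re

# Pattern inferred from the one ccm error line quoted in this thread;
# other heartbeat versions may format the message differently.
MISMATCH = re.compile(
    r"ccm\[(?P<pid>\d+)\]: (?P<ts>\S+) ERROR: ccm_control_process: "
    r"Node count from node (?P<peer>\S+) does not agree: "
    r"local count=(?P<local>\d+), count in message=(?P<remote>\d+)"
)


def find_count_mismatches(log_text):
    """Return one dict per 'Node count ... does not agree' entry in log_text."""
    # Collapse all whitespace runs so an entry that was wrapped across
    # lines (as in a mail body) still matches the single-line pattern.
    flat = " ".join(log_text.split())
    return [
        {
            "timestamp": m["ts"],
            "peer": m["peer"],
            "local": int(m["local"]),
            "remote": int(m["remote"]),
        }
        for m in MISMATCH.finditer(flat)
    ]


if __name__ == "__main__":
    sample = (
        "ccm[4806]: 2008/03/31_19:08:38 ERROR: ccm_control_process: "
        "Node count from node slave does not agree: "
        "local count=2, count in message=3\n"
    )
    for hit in find_count_mismatches(sample):
        print(hit)
```

A node that reports a local count different from the count in the message has a different membership list from the peer named in that entry, so that is the node whose ha.cf and hostcache are worth comparing first.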

Thanks,

Dejan

> Thanks!
> 
> The log looks like this:
> ////////////////////////////////////////////////////
> heartbeat[4767]: 2008/03/31_19:08:17 info: hb_add_one_node: Adding new node[master] to configuration.
> ccm[4806]: 2008/03/31_19:08:38 ERROR: ccm_control_process: Node count from node slave does not agree: local count=2, count in message=3
> ccm[4806]: 2008/03/31_19:08:38 ERROR: Please make sure ha.cf files on all nodes have same nodes list or add "autojoin any" to ha.cf
> ccm[4806]: 2008/03/31_19:08:38 info: If this problem persists, check the heartbeat 'hostcache' files in the cluster to look for problems.
> cib[4807]: 2008/03/31_19:08:38 info: mem_handle_func:IPC broken, ccm is dead before the client!
> cib[4807]: 2008/03/31_19:08:38 ERROR: cib_ccm_dispatch: CCM connection appears to have failed: rc=-1.
> cib[4807]: 2008/03/31_19:08:38 ERROR: cib_ccm_dispatch: Exiting to recover from CCM connection failure
> crmd[4811]: 2008/03/31_19:08:38 info: mem_handle_func:IPC broken, ccm is dead before the client!
> crmd[4811]: 2008/03/31_19:08:38 ERROR: ccm_dispatch: CCM connection appears to have failed: rc=-1.
> crmd[4811]: 2008/03/31_19:08:38 ERROR: do_log: [[FSA]] Input I_ERROR from ccm_dispatch() received in state (S_IDLE)
> crmd[4811]: 2008/03/31_19:08:38 info: do_state_transition: State transition S_IDLE -> S_RECOVERY [ input=I_ERROR cause=C_CCM_CALLBACK origin=ccm_dispatch ]
> crmd[4811]: 2008/03/31_19:08:38 ERROR: do_recover: Action A_RECOVER (0000000001000000) not supported
> crmd[4811]: 2008/03/31_19:08:38 WARN: do_election_vote: Not voting in election, we're in state S_RECOVERY
> crmd[4811]: 2008/03/31_19:08:38 info: do_dc_release: DC role released
> mgmtd[4812]: 2008/03/31_19:08:38 CRIT: cib_native_dispatch: Lost connection to the CIB service [4807/callback].
> tengine[4841]: 2008/03/31_19:08:38 ERROR: cib_native_msgready: Message pending on command channel [4807]
> heartbeat[4767]: 2008/03/31_19:08:38 WARN: Managed /opt/ha/lib/heartbeat/ccm process 4806 exited with return code 1.
> attrd[4810]: 2008/03/31_19:08:38 ERROR: cib_native_msgready: Message pending on command channel [4807]
> crmd[4811]: 2008/03/31_19:08:38 info: stop_subsystem: Sent -TERM to pengine: [4842]
> pengine[4842]: 2008/03/31_19:08:38 info: pengine_shutdown: Exiting PEngine (SIGTERM)
> heartbeat[4767]: 2008/03/31_19:08:38 EMERG: Rebooting system.  Reason: /opt/ha/lib/heartbeat/ccm
> attrd[4810]: 2008/03/31_19:08:38 ERROR: crm_log_message_adv: #========= cib:cmd message start ==========#
> tengine[4841]: 2008/03/31_19:08:38 ERROR: crm_log_message_adv: #========= cib:cmd message start ==========#
> crmd[4811]: 2008/03/31_19:08:38 info: stop_subsystem: Sent -TERM to tengine: [4841]
> ////////////////////////////////////////////////////
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems