Hi,

On Wed, Apr 02, 2008 at 01:55:14PM +0800, ???? wrote:
> Thank you very much for your explanation, Andrew Beekhof.
>
> >I'm pretty sure you have to run that command on all the remaining nodes
> >Which would explain why some nodes think there are 3 cluster members
> >and others think there are two (i.e. the first 3 errors)

No, it should be enough to run the command on one node. But the
node to be removed should be stopped beforehand.

> But I still have some questions.
> 1. When hb_addnode was executed on node "slave", the log entry
> "*** info: hb_add_one_node: Adding new node[master] to configuration."
> appeared in the logfiles of all nodes,
> which means that all nodes (including node2) should have known that a new node
> "master" wants to join the cluster.
> Do I have to run "hb_addnode master" on node "node2"?

No, that shouldn't be necessary.

> 2. If my guess is wrong, how could I run hb_addnode on all remaining nodes at
> the same time? Or, how can I prevent the remaining nodes' OS from
> rebooting themselves?
> The log indicates that node2 will reboot in a minute when it finds the error below:
> "ccm[4806]: 2008/03/31_19:08:38 ERROR: ccm_control_process: Node count from
> node slave does not agree: local count=2, count in message=3".

Did you read this howto on removing/deleting nodes:

http://linux-ha.org/DRBD/HowTov2

Thanks,

Dejan

> Thanks!
>
> The log looks like this:
> ////////////////////////////////////////////////////
> heartbeat[4767]: 2008/03/31_19:08:17 info: hb_add_one_node: Adding new
> node[master] to configuration.
> ccm[4806]: 2008/03/31_19:08:38 ERROR: ccm_control_process: Node count from
> node slave does not agree: local count=2, count in message=3
> ccm[4806]: 2008/03/31_19:08:38 ERROR: Please make sure ha.cf files on all
> nodes have same nodes list or add "autojoin any" to ha.cf
> ccm[4806]: 2008/03/31_19:08:38 info: If this problem persists, check the
> heartbeat 'hostcache' files in the cluster to look for problems.
> cib[4807]: 2008/03/31_19:08:38 info: mem_handle_func:IPC broken, ccm is dead
> before the client!
> cib[4807]: 2008/03/31_19:08:38 ERROR: cib_ccm_dispatch: CCM connection
> appears to have failed: rc=-1.
> cib[4807]: 2008/03/31_19:08:38 ERROR: cib_ccm_dispatch: Exiting to recover
> from CCM connection failure
> crmd[4811]: 2008/03/31_19:08:38 info: mem_handle_func:IPC broken, ccm is dead
> before the client!
> crmd[4811]: 2008/03/31_19:08:38 ERROR: ccm_dispatch: CCM connection appears
> to have failed: rc=-1.
> crmd[4811]: 2008/03/31_19:08:38 ERROR: do_log: [[FSA]] Input I_ERROR from
> ccm_dispatch() received in state (S_IDLE)
> crmd[4811]: 2008/03/31_19:08:38 info: do_state_transition: State transition
> S_IDLE -> S_RECOVERY [ input=I_ERROR cause=C_CCM_CALLBACK origin=ccm_dispatch
> ]
> crmd[4811]: 2008/03/31_19:08:38 ERROR: do_recover: Action A_RECOVER
> (0000000001000000) not supported
> crmd[4811]: 2008/03/31_19:08:38 WARN: do_election_vote: Not voting in
> election, we're in state S_RECOVERY
> crmd[4811]: 2008/03/31_19:08:38 info: do_dc_release: DC role released
> mgmtd[4812]: 2008/03/31_19:08:38 CRIT: cib_native_dispatch: Lost connection
> to the CIB service [4807/callback].
> tengine[4841]: 2008/03/31_19:08:38 ERROR: cib_native_msgready: Message
> pending on command channel [4807]
> heartbeat[4767]: 2008/03/31_19:08:38 WARN: Managed /opt/ha/lib/heartbeat/ccm
> process 4806 exited with return code 1.
> attrd[4810]: 2008/03/31_19:08:38 ERROR: cib_native_msgready: Message pending
> on command channel [4807]
> crmd[4811]: 2008/03/31_19:08:38 info: stop_subsystem: Sent -TERM to pengine:
> [4842]
> pengine[4842]: 2008/03/31_19:08:38 info: pengine_shutdown: Exiting PEngine
> (SIGTERM)
> heartbeat[4767]: 2008/03/31_19:08:38 EMERG: Rebooting system.
> Reason: /opt/ha/lib/heartbeat/ccm
> attrd[4810]: 2008/03/31_19:08:38 ERROR: crm_log_message_adv: #=========
> cib:cmd message start ==========#
> tengine[4841]: 2008/03/31_19:08:38 ERROR: crm_log_message_adv: #=========
> cib:cmd message start ==========#
> crmd[4811]: 2008/03/31_19:08:38 info: stop_subsystem: Sent -TERM to tengine:
> [4841]
> ////////////////////////////////////////////////////
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
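
For anyone hitting the same ccm error: a minimal sketch, in Python, of how one might pull the disagreeing node counts out of a heartbeat logfile, to see which peer reported a different membership size. This is not part of heartbeat; the function name find_count_mismatches and the sample list are made up for illustration, and the pattern assumes each log entry sits on a single line (not wrapped as in the quoted mail).

```python
import re

# Matches the ccm error quoted in this thread. Assumption: the log entry
# is on one line, not wrapped by a mail client.
COUNT_MISMATCH = re.compile(
    r"Node count from node (?P<peer>\S+) does not agree: "
    r"local count=(?P<local>\d+), count in message=(?P<remote>\d+)"
)


def find_count_mismatches(lines):
    """Return (peer, local_count, message_count) for each mismatch entry."""
    found = []
    for line in lines:
        m = COUNT_MISMATCH.search(line)
        if m:
            found.append(
                (m.group("peer"), int(m.group("local")), int(m.group("remote")))
            )
    return found


# Two entries from the log in this thread, unwrapped onto single lines.
sample = [
    "heartbeat[4767]: 2008/03/31_19:08:17 info: hb_add_one_node: "
    "Adding new node[master] to configuration.",
    "ccm[4806]: 2008/03/31_19:08:38 ERROR: ccm_control_process: "
    "Node count from node slave does not agree: "
    "local count=2, count in message=3",
]

print(find_count_mismatches(sample))  # [('slave', 2, 3)]
```

Here the result says that node "slave" announced 3 members while the local node still counted 2, which is the disagreement ccm complains about before it exits.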