pacemaker 1.1.12-11.12
openais 1.1.4-5.24.5
corosync 1.4.7-0.23.5

It's a two-node active/passive cluster. We just upgraded from SLES 11 SP 3 to SLES 11 SP 4 (nothing else changed), but when we try to start the cluster service we get the following error:

"Totem is unable to form a cluster because of an operating system or network fault."

The firewall is stopped and disabled on both nodes, and both nodes can ping/ssh/vnc each other.
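Since ping/ssh only exercises ICMP/TCP, I also want to rule out a problem specific to unicast UDP on the totem port. A throwaway probe I can run between the nodes (just a sketch; port 5405 comes from the corosync.conf below, and udptest.py is my own scratch name, not part of corosync):

udptest.py:
#!/usr/bin/env python
# Throwaway unicast UDP probe for the totem port (5405 per corosync.conf).
# Run "python udptest.py listen" on one node, then
# "python udptest.py send <peer-ip>" on the other, and swap directions.
import socket
import sys

PORT = 5405  # mcastport from corosync.conf; with udpu this is the unicast port

if sys.argv[1] == "listen":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("0.0.0.0", PORT))
    data, peer = s.recvfrom(1024)
    print("received %r from %s" % (data, peer[0]))
else:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.sendto(b"totem-port-probe", (sys.argv[2], PORT))
    print("sent probe to %s:%d" % (sys.argv[2], PORT))

If the probe arrives in both directions, the UDP path itself should be clean.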

corosync.conf:
aisexec {
    group:    root
    user:    root
}
service {
    use_mgmtd:    yes
    use_logd:    yes
    ver:    0
    name:    pacemaker
}
totem {
    rrp_mode:    none
    join:    60
    max_messages:    20
    vsftype:    none
    token:    5000
    consensus:    6000

    interface {
        bindnetaddr:    192.168.150.0

        member {
            memberaddr:     192.168.150.12
        }
        member {
            memberaddr:      192.168.150.13
        }
        mcastport:    5405

        ringnumber:    0

    }
    secauth:    off
    version:    2
    transport:    udpu
    token_retransmits_before_loss_const:    10
    clear_node_high_bit:    new
}
logging {
    to_logfile:    no
    to_syslog:    yes
    debug:    off
    timestamp:    off
    to_stderr:    no
    fileline:    off
    syslog_facility:    daemon
}
amf {
    mode:    disable
}
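For context on how totem picks its interface: as I understand it, it binds to the interface whose address masks to bindnetaddr (using the interface's own netmask), and corosync 1.x falls back to loopback when nothing matches or the NIC is down at startup. A small sketch of that matching (Python 3; the /24 mask is my assumption):

import ipaddress

# totem compares each interface's network (addr & netmask) with bindnetaddr;
# the /24 below is assumed, corosync takes the mask from the interface itself.
net = ipaddress.ip_network("192.168.150.0/24")
for addr in ("192.168.150.12", "10.0.0.5", "127.0.0.1"):
    print(addr, "matches" if ipaddress.ip_address(addr) in net else "no match")
# If no interface matches (e.g. the NIC is down), corosync 1.x binds to
# 127.0.0.1 instead, which is what the log below seems to show.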

/var/log/messages:
Apr  6 17:51:49 prd1 corosync[8672]: [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Apr  6 17:51:49 prd1 corosync[8672]: [MAIN ] Corosync built-in features: nss
Apr  6 17:51:49 prd1 corosync[8672]: [MAIN ] Successfully configured openais services to load
Apr  6 17:51:49 prd1 corosync[8672]: [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Apr  6 17:51:49 prd1 corosync[8672]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Apr  6 17:51:49 prd1 corosync[8672]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Apr  6 17:51:49 prd1 corosync[8672]: [TOTEM ] The network interface is down.
Apr  6 17:51:49 prd1 corosync[8672]: [SERV ] Service engine loaded: openais cluster membership service B.01.01
Apr  6 17:51:49 prd1 corosync[8672]: [SERV ] Service engine loaded: openais event service B.01.01
Apr  6 17:51:49 prd1 corosync[8672]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
Apr  6 17:51:49 prd1 corosync[8672]: [SERV ] Service engine loaded: openais availability management framework B.01.01
Apr  6 17:51:49 prd1 corosync[8672]: [SERV ] Service engine loaded: openais message service B.03.01
Apr  6 17:51:49 prd1 corosync[8672]: [SERV ] Service engine loaded: openais distributed locking service B.03.01
Apr  6 17:51:49 prd1 corosync[8672]: [SERV ] Service engine loaded: openais timer service A.01.01
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: process_ais_conf: Reading configure
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: config_find_init: Local handle: 7685269064754659330 for logging
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: config_find_next: Processing additional logging options...
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: get_config_opt: Found 'off' for option: debug
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: get_config_opt: Found 'no' for option: to_logfile
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: get_config_opt: Found 'yes' for option: to_syslog
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: get_config_opt: Found 'daemon' for option: syslog_facility
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: config_find_init: Local handle: 8535092201842016259 for quorum
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: config_find_next: No additional configuration supplied for: quorum
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: get_config_opt: No default for option: provider
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: config_find_init: Local handle: 8054506479773810692 for service
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: config_find_next: Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: config_find_next: Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: config_find_next: Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: config_find_next: Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: config_find_next: Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: config_find_next: Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: config_find_next: Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: config_find_next: Processing additional service options...
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: get_config_opt: Found '0' for option: ver
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: get_config_opt: Defaulting to 'pcmk' for option: clustername
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: get_config_opt: Found 'yes' for option: use_logd
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: get_config_opt: Found 'yes' for option: use_mgmtd
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: pcmk_startup: CRM: Initialized
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] Logging: Initialized pcmk_startup
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: pcmk_startup: Service: 9
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: pcmk_startup: Local hostname: prd1
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: pcmk_update_nodeid: Local node id: 2130706433
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: update_member: Creating entry for node 2130706433 born on 0
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: update_member: 0x64c9c0 Node 2130706433 now known as prd1 (was: (null))
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: update_member: Node prd1 now has 1 quorum votes (was 0)
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: update_member: Node 2130706433/prd1 is now: member
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process cib
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: spawn_child: Forked child 8677 for process cib
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: spawn_child: Forked child 8678 for process stonith-ng
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: spawn_child: Forked child 8679 for process lrmd
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process attrd
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: spawn_child: Forked child 8680 for process attrd
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process pengine
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: spawn_child: Forked child 8681 for process pengine
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: spawn_child: Using uid=90 and group=90 for process crmd
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: spawn_child: Forked child 8682 for process crmd
Apr  6 17:51:49 prd1 corosync[8672]: [pcmk ] info: spawn_child: Forked child 8683 for process mgmtd
Apr  6 17:51:49 prd1 corosync[8672]: [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.1.12
Apr  6 17:51:49 prd1 corosync[8672]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Apr  6 17:51:49 prd1 corosync[8672]: [SERV ] Service engine loaded: corosync configuration service
Apr  6 17:51:49 prd1 corosync[8672]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Apr  6 17:51:49 prd1 corosync[8672]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Apr  6 17:51:49 prd1 corosync[8672]: [SERV ] Service engine loaded: corosync profile loading service
Apr  6 17:51:49 prd1 corosync[8672]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Apr  6 17:51:49 prd1 corosync[8672]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Apr  6 17:51:49 prd1 corosync[8672]: [TOTEM ] adding new UDPU member {192.168.150.12}
Apr  6 17:51:49 prd1 corosync[8672]: [TOTEM ] adding new UDPU member {192.168.150.13}
Apr  6 17:51:50 prd1 lrmd[8679]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr  6 17:51:50 prd1 mgmtd: [8683]: info: Pacemaker-mgmt Git Version: 969d213
Apr  6 17:51:50 prd1 mgmtd: [8683]: WARN: Core dumps could be lost if multiple dumps occur.
Apr  6 17:51:50 prd1 mgmtd: [8683]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
Apr  6 17:51:50 prd1 mgmtd: [8683]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
Apr  6 17:51:50 prd1 attrd[8680]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr  6 17:51:50 prd1 pengine[8681]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr  6 17:51:50 prd1 attrd[8680]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Apr  6 17:51:50 prd1 cib[8677]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr  6 17:51:50 prd1 crmd[8682]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr  6 17:51:50 prd1 attrd[8680]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:50 prd1 corosync[8672]: [pcmk ] info: pcmk_ipc: Recorded connection 0x7f944c04acf0 for attrd/8680
Apr  6 17:51:50 prd1 crmd[8682]:   notice: main: CRM Git Version: f47ea56
Apr  6 17:51:50 prd1 attrd[8680]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:50 prd1 attrd[8680]:   notice: main: Starting mainloop...
Apr  6 17:51:50 prd1 stonith-ng[8678]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Apr  6 17:51:50 prd1 stonith-ng[8678]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Apr  6 17:51:50 prd1 stonith-ng[8678]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:50 prd1 corosync[8672]: [pcmk ] info: pcmk_ipc: Recorded connection 0x658190 for stonith-ng/8678
Apr  6 17:51:50 prd1 corosync[8672]: [pcmk ] info: update_member: Node prd1 now has process list: 00000000000000000000000000151312 (1381138)
Apr  6 17:51:50 prd1 corosync[8672]: [pcmk ] info: pcmk_ipc: Sending membership update 0 to stonith-ng
Apr  6 17:51:50 prd1 stonith-ng[8678]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:50 prd1 cib[8677]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Apr  6 17:51:50 prd1 cib[8677]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:50 prd1 corosync[8672]: [pcmk ] info: pcmk_ipc: Recorded connection 0x65d450 for cib/8677
Apr  6 17:51:50 prd1 corosync[8672]: [pcmk ] info: pcmk_ipc: Sending membership update 0 to cib
Apr  6 17:51:50 prd1 cib[8677]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:50 prd1 cib[8677]: notice: crm_update_peer_state: cib_peer_update_callback: Node prd1[2130706433] - state is now lost (was (null))
Apr  6 17:51:50 prd1 cib[8677]: notice: crm_update_peer_state: plugin_handle_membership: Node prd1[2130706433] - state is now member (was lost)
Apr  6 17:51:50 prd1 mgmtd: [8683]: info: Started.
Apr  6 17:51:51 prd1 crmd[8682]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Apr  6 17:51:51 prd1 crmd[8682]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:51 prd1 corosync[8672]: [pcmk ] info: pcmk_ipc: Recorded connection 0x661b00 for crmd/8682
Apr  6 17:51:51 prd1 corosync[8672]: [pcmk ] info: pcmk_ipc: Sending membership update 0 to crmd
Apr  6 17:51:51 prd1 crmd[8682]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Apr  6 17:51:51 prd1 stonith-ng[8678]: notice: setup_cib: Watching for stonith topology changes
Apr  6 17:51:51 prd1 stonith-ng[8678]: notice: crm_update_peer_state: st_peer_update_callback: Node prd1[2130706433] - state is now lost (was (null))
Apr  6 17:51:51 prd1 stonith-ng[8678]: notice: crm_update_peer_state: plugin_handle_membership: Node prd1[2130706433] - state is now member (was lost)
Apr  6 17:51:51 prd1 crmd[8682]: notice: crm_update_peer_state: plugin_handle_membership: Node prd1[2130706433] - state is now member (was (null))
Apr  6 17:51:51 prd1 crmd[8682]: notice: do_started: The local CRM is operational
Apr  6 17:51:51 prd1 crmd[8682]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Apr  6 17:51:51 prd1 stonith-ng[8678]: notice: unpack_config: On loss of CCM Quorum: Ignore
Apr  6 17:52:12 prd1 crmd[8682]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Apr  6 17:52:35 prd1 corosync[8672]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Apr  6 17:52:36 prd1 corosync[8672]: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
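One detail in the log that stands out to me: as far as I know, the plugin-era stack derives the node id from the bound IPv4 address as a 32-bit integer, and "Local node id: 2130706433" decodes to loopback, which lines up with the "[TOTEM ] The network interface is down" line earlier. A quick check (Python 3):

import ipaddress

# "Local node id: 2130706433" from the log, read back as an IPv4 address
print(ipaddress.ip_address(2130706433))  # prints 127.0.0.1

So corosync appears to be binding to 127.0.0.1 rather than the 192.168.150.x interface at startup, even though ping/ssh between the nodes work once the system is up.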


--
Regards,

Muhammad Sharfuddin
<http://www.nds.com.pk>
