Hi Y'all,

I'm having some issues getting things running on a stock CentOS 5.4 install, and I was hoping someone could point me in the right direction...

Through the epel and clusterlabs repos that are referenced in the wiki, I installed:

corosync-1.2.0-1.el5
openais-1.1.0-1.el5
pacemaker-1.0.7-4.el5
(and all dependencies, via yum)

and it all installed fine, according to yum. I installed /etc/corosync/corosync.conf as follows:

-----
# Please read the corosync.conf.5 manual page
compatibility: whitetank

aisexec {
       user:   root
       group:  root
}

totem {
       version: 2

       # How long before declaring a token lost (ms)
       token:          5000

       # How many token retransmits before forming a new configuration
       token_retransmits_before_loss_const: 20

       # How long to wait for join messages in the membership protocol (ms)
       join:           1000

       # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
       consensus:      7500

       # Turn off the virtual synchrony filter
       vsftype:        none

       # Number of messages that may be sent by one processor on receipt of the token
       max_messages:   20

       # Disable encryption
       secauth:        off

       # How many threads to use for encryption/decryption
       threads:        0

       # Limit generated nodeids to 31-bits (positive signed integers)
       clear_node_high_bit: yes

       # Optionally assign a fixed node id (integer)
       # nodeid:         1234
       interface {
               ringnumber: 0
               bindnetaddr: 10.1.0.255
               mcastaddr: 226.94.1.90
               mcastport: 4000
       }
}

logging {
       fileline: off
       to_stderr: yes
       to_logfile: yes
       to_syslog: yes
       logfile: /var/log/corosync.log
       debug: off
       timestamp: on
       logger_subsys {
               subsys: AMF
               debug: off
       }
}

amf {
       mode: disabled
}

service {
       # Load the Pacemaker Cluster Resource Manager
       name: pacemaker
       ver:  0
}
-----

Then I tried:

# /etc/init.d/corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]

but then when I run crm_mon, it hangs here:

"Attempting connection to the cluster...."

and nothing happens.  A 'ps' shows corosync in a weird state:

[r...@server ~]# ps -afe | grep coro
root     12942     1  0 08:20 ?        00:00:00 corosync
root     12947 12942  0 08:20 ?        00:00:00 [corosync] <defunct>
root     12955 12858  0 08:20 pts/0    00:00:00 grep coro
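
In case it helps anyone reproduce this, here is a rough sketch of how the defunct child can be picked out of the process table without eyeballing it. The function name filter_defunct is made up for illustration, and the ps column options assume the procps version of ps that CentOS ships.

```shell
# Keep only zombie (defunct) entries from "ps -eo pid,ppid,stat,comm" output.
# Reads ps output on stdin; prints "PID PPID COMMAND" for each process whose
# STAT column starts with Z. The header line is skipped because "STAT" does
# not start with Z.
filter_defunct() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Typical use (assumes procps ps):
#   ps -eo pid,ppid,stat,comm | filter_defunct
```

The PPID column is the interesting part here, since it shows which process has not yet reaped the child.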

I also tried starting corosync via '/etc/init.d/openais start' after changing the line in the /etc/init.d/openais script:

export COROSYNC_DEFAULT_CONFIG_IFACE="openaisserviceenableexperimental:corosync_parser"

and it seems to start, but crm_mon still can't connect: I still get "Attempting connection to the cluster...." and corosync is still in a defunct state. Has anyone else had this problem? Could the RPMs from epel and clusterlabs be incompatible with each other in some way?

Here is a clip from /var/log/corosync.log:

Mar 07 08:20:04 corosync [MAIN ] Corosync Cluster Engine ('1.2.0'): started and ready to provide service.
Mar 07 08:20:04 corosync [MAIN  ] Corosync built-in features: nss rdma
Mar 07 08:20:04 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Mar 07 08:20:04 corosync [TOTEM ] Initializing transport (UDP/IP).
Mar 07 08:20:04 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Mar 07 08:20:04 corosync [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Mar 07 08:20:04 corosync [TOTEM ] The network interface [10.1.1.84] is now up.
Mar 07 08:20:04 corosync [pcmk  ] info: process_ais_conf: Reading configure
Mar 07 08:20:04 corosync [pcmk ] info: config_find_init: Local handle: 5650605097994944514 for logging
Mar 07 08:20:04 corosync [pcmk ] info: config_find_next: Processing additional logging options...
Mar 07 08:20:04 corosync [pcmk ] info: get_config_opt: Found 'off' for option: debug
Mar 07 08:20:04 corosync [pcmk ] info: get_config_opt: Defaulting to 'off' for option: to_file
Mar 07 08:20:04 corosync [pcmk ] info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
Mar 07 08:20:04 corosync [pcmk ] info: config_find_init: Local handle: 2730409743423111171 for service
Mar 07 08:20:04 corosync [pcmk ] info: config_find_next: Processing additional service options...
Mar 07 08:20:04 corosync [pcmk ] info: get_config_opt: Defaulting to 'pcmk' for option: clustername
Mar 07 08:20:04 corosync [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_logd
Mar 07 08:20:04 corosync [pcmk ] info: get_config_opt: Defaulting to 'no' for option: use_mgmtd
Mar 07 08:20:04 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
Mar 07 08:20:04 corosync [pcmk  ] Logging: Initialized pcmk_startup
Mar 07 08:20:04 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Mar 07 08:20:04 corosync [pcmk ] ERROR: pcmk_startup: Child 12947 spawned to record non-fatal assertion failure line 544: pwentry != NULL
Mar 07 08:20:04 corosync [pcmk ] ERROR: pcmk_startup: Cluster user hacluster does not exist
Mar 07 08:20:04 corosync [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.0.7
Mar 07 08:20:04 corosync [SERV ] Service engine loaded: corosync extended virtual synchrony service
Mar 07 08:20:04 corosync [SERV ] Service engine loaded: corosync configuration service
Mar 07 08:20:04 corosync [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Mar 07 08:20:04 corosync [SERV ] Service engine loaded: corosync cluster config database access v1.01
Mar 07 08:20:04 corosync [SERV ] Service engine loaded: corosync profile loading service
Mar 07 08:20:04 corosync [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Mar 07 08:20:04 corosync [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 44: memb=0, new=0, lost=0
Mar 07 08:20:04 corosync [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 44: memb=1, new=1, lost=0
Mar 07 08:20:04 corosync [pcmk ] info: update_member: Creating entry for node 1409351946 born on 44
Mar 07 08:20:04 corosync [pcmk ] info: update_member: Node 1409351946/unknown is now: member
Mar 07 08:20:04 corosync [pcmk ] info: pcmk_peer_update: NEW: .pending. 1409351946
Mar 07 08:20:05 corosync [pcmk ] info: pcmk_peer_update: MEMB: .pending. 1409351946
Mar 07 08:20:05 corosync [pcmk ] info: pcmk_update_nodeid: Local node id: 1409351946
Mar 07 08:20:05 corosync [pcmk ] info: update_member: Node (null) now has 1 quorum votes (was 0)
Mar 07 08:20:05 corosync [pcmk ] info: send_member_notification: Sending membership update 44 to 0 children
Mar 07 08:20:05 corosync [pcmk ] info: update_member: Node (null) now has process list: 00000000000000000000000000000002 (2)
Mar 07 08:20:05 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Mar 07 08:20:05 corosync [pcmk ] info: update_member: 0xec71ac0 Node 1409351946 now known as (was: (null))
Mar 07 08:20:05 corosync [pcmk ] info: send_member_notification: Sending membership update 44 to 0 children
Mar 07 08:20:05 corosync [MAIN ] Completed service synchronization, ready to provide service.
Mar 07 08:22:59 corosync [SERV  ] Unloading all Corosync service engines.
Mar 07 08:22:59 corosync [pcmk ] notice: pcmk_shutdown: Shuting down Pacemaker
Mar 07 08:22:59 corosync [pcmk ] notice: pcmk_shutdown: crmd confirmed stopped
Mar 07 08:22:59 corosync [pcmk ] notice: pcmk_shutdown: pengine confirmed stopped
Mar 07 08:22:59 corosync [pcmk ] notice: pcmk_shutdown: attrd confirmed stopped
Mar 07 08:22:59 corosync [pcmk ] notice: pcmk_shutdown: lrmd confirmed stopped
Mar 07 08:22:59 corosync [pcmk ] notice: pcmk_shutdown: cib confirmed stopped
Mar 07 08:22:59 corosync [pcmk ] notice: pcmk_shutdown: stonithd confirmed stopped
Mar 07 08:22:59 corosync [pcmk  ] notice: pcmk_shutdown: Shutdown complete
Mar 07 08:22:59 corosync [SERV ] Service engine unloaded: Pacemaker Cluster Manager 1.0.7
Mar 07 08:22:59 corosync [SERV ] Service engine unloaded: corosync extended virtual synchrony service
Mar 07 08:22:59 corosync [SERV ] Service engine unloaded: corosync configuration service
Mar 07 08:22:59 corosync [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Mar 07 08:22:59 corosync [SERV ] Service engine unloaded: corosync cluster config database access v1.01
Mar 07 08:22:59 corosync [SERV ] Service engine unloaded: corosync profile loading service
Mar 07 08:22:59 corosync [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Mar 07 08:22:59 corosync [MAIN ] Corosync Cluster Engine exiting with status -1 at main.c:158.
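
For anyone who wants to narrow the clip down, here is a minimal sketch that pulls just the ERROR records out of the log and checks whether a given local account exists. The only ERROR records in the clip concern the hacluster user, so that is the account name to try. The function names corosync_errors and user_exists are made up for illustration, the default path is the logfile setting from my corosync.conf, and I am assuming the real file has one record per line.

```shell
# Print only the ERROR records from a corosync log file.
# $1 is the log path; it defaults to the logfile set in corosync.conf above.
corosync_errors() {
    logfile="${1:-/var/log/corosync.log}"
    grep 'ERROR:' "$logfile"
}

# Succeed (exit status 0) if the named local account exists, fail otherwise.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Typical use:
#   corosync_errors
#   user_exists hacluster && echo "hacluster exists" || echo "hacluster is missing"
```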

Any hints welcome!!

TIA,
erich

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
