Hello,

Our cluster was working OK on the corosync stack, with corosync 2.3.0 and pacemaker 1.1.8.
After upgrading (full versions and configs below), we began to have problems with node names. It's a two-node cluster, with node names "turifel" (DC) and "selavi".

When selavi joins the cluster, we get this warning in selavi's log:

-----
Jun 27 11:54:29 selavi attrd[11998]: notice: corosync_node_name: Unable to get node name for nodeid 168385827
Jun 27 11:54:29 selavi attrd[11998]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
-----

This is OK, and also happened with version 1.1.8.

At the corosync level, all seems OK:

----
Jun 27 11:51:18 turifel corosync[6725]: [TOTEM ] A processor joined or left the membership and a new membership (10.9.93.35:1184) was formed.
Jun 27 11:51:18 turifel corosync[6725]: [QUORUM] Members[2]: 168385827 168385835
Jun 27 11:51:18 turifel corosync[6725]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 27 11:51:18 turifel crmd[19526]: notice: crm_update_peer_state: pcmk_quorum_notification: Node selavi[168385827] - state is now member (was lost)
----

But when pacemaker starts on selavi (the new node), turifel's log shows this:

----
Jun 27 11:54:28 turifel crmd[19526]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=peer_update_callback ]
Jun 27 11:54:28 turifel crmd[19526]: warning: crm_get_peer: Node 'selavi' and 'selavi' share the same cluster nodeid: 168385827
Jun 27 11:54:28 turifel crmd[19526]: warning: crmd_cs_dispatch: Recieving messages from a node we think is dead: selavi[0]
Jun 27 11:54:29 turifel crmd[19526]: warning: crm_get_peer: Node 'selavi' and 'selavi' share the same cluster nodeid: 168385827
Jun 27 11:54:29 turifel crmd[19526]: warning: do_state_transition: Only 1 of 2 cluster nodes are eligible to run resources - continue 0
Jun 27 11:54:29 turifel attrd[19524]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
----

And selavi remains in pending state. Sometimes turifel (the DC) fences selavi, but other times it stays pending forever.

On the turifel node, all resources give warnings like this one:

warning: custom_action: Action p_drbd_ha0:0_monitor_0 on selavi is unrunnable (pending)

On both nodes, uname -n and crm_node -n give the correct node names (selavi and turifel respectively).

Do you think it's a configuration problem? Below I give information about versions and configurations.

Best regards,
Bernardo.

-----
Versions (git/hg compiled versions):

corosync:        2.3.0.66-615d
pacemaker:       1.1.9-61e4b8f
cluster-glue:    1.0.11
libqb:           0.14.4.43-bb4c3
resource-agents: 3.9.5.98-3b051
crmsh:           1.2.5

The cluster also runs drbd, dlm and gfs2, but I think those versions are irrelevant here.
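For completeness, this is a quick way to cross-check the name/nodeid mapping that each layer sees on a node. It is only a sketch (I have not pasted the actual output here, and the corosync-cmapctl key names are from the 2.x tools):

----
uname -n                            # kernel hostname
crm_node -n                         # node name as pacemaker resolves it
crm_node -l                         # cluster nodes known to pacemaker
corosync-quorumtool -l              # membership as corosync/votequorum sees it
corosync-cmapctl | grep ^nodelist   # explicit nodelist entries, if any are configured
----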
--------
Output of pacemaker configuration:

./configure --prefix=/opt/ha --without-cman \
    --without-heartbeat --with-corosync \
    --enable-fatal-warnings=no --with-lcrso-dir=/opt/ha/libexec/lcrso

pacemaker configuration:
  Version                 = 1.1.9 (Build: 61e4b8f)
  Features                = generated-manpages ascii-docs ncurses libqb-logging libqb-ipc lha-fencing upstart nagios corosync-native snmp libesmtp
  Prefix                  = /opt/ha
  Executables             = /opt/ha/sbin
  Man pages               = /opt/ha/share/man
  Libraries               = /opt/ha/lib
  Header files            = /opt/ha/include
  Arch-independent files  = /opt/ha/share
  State information       = /opt/ha/var
  System configuration    = /opt/ha/etc
  Corosync Plugins        = /opt/ha/lib
  Use system LTDL         = yes
  HA group name           = haclient
  HA user name            = hacluster
  CFLAGS                  = -I/opt/ha/include -I/opt/ha/include -I/opt/ha/include/heartbeat -I/opt/ha/include -I/opt/ha/include -ggdb -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Wmissing-prototypes -Wmissing-declarations -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wunused-but-set-variable -Wpointer-arith -Wstrict-prototypes -Wwrite-strings
  Libraries               = -lgnutls -lcorosync_common -lplumb -lpils -lqb -lbz2 -lxslt -lxml2 -lc -luuid -lpam -lrt -ldl -lglib-2.0 -lltdl -L/opt/ha/lib -lqb -ldl -lrt -lpthread
  Stack Libraries         = -L/opt/ha/lib -lqb -ldl -lrt -lpthread -L/opt/ha/lib -lcpg -L/opt/ha/lib -lcfg -L/opt/ha/lib -lcmap -L/opt/ha/lib -lquorum

----
Corosync config:

totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    cluster_name: fiestaha

    interface {
        ringnumber: 0
        ttl: 1
        bindnetaddr: 10.9.93.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}

logging {
    fileline: off
    to_stderr: yes
    to_logfile: no
    to_syslog: yes
    syslog_facility: local7
    debug: off
    timestamp: on

    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
    wait_for_all: 0
}
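One thing that may or may not be related: there is no nodelist section in corosync.conf, so corosync itself has no explicit name for either nodeid (hence the "Unable to get node name for nodeid" notices). Just as a sketch of what an explicit mapping would look like; the addresses below are decoded from the nodeids in the logs (168385827 -> 10.9.93.35 for selavi, 168385835 -> 10.9.93.43 for turifel), so treat them as assumptions and double-check them:

nodelist {
    node {
        ring0_addr: 10.9.93.43
        name: turifel
        nodeid: 168385835
    }
    node {
        ring0_addr: 10.9.93.35
        name: selavi
        nodeid: 168385827
    }
}

With multicast the cluster forms without a nodelist anyway; this would only give the daemons an explicit name to look up for each nodeid instead of falling back to uname -n.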