cibadmin -Ql works, the problem is persistent after the upgrade, and the logs for "crmd" revealed the problem:

Sep 27 16:19:22 [5074] ip-10-82-197-219 crmd: info: crm_ipc_connect: Could not establish cib_shm connection: Connection refused (111)
Sep 27 16:19:22 [5074] ip-10-82-197-219 crmd: debug: cib_native_signon_raw: Connection unsuccessful (0 (nil))
Sep 27 16:19:22 [5074] ip-10-82-197-219 crmd: debug: cib_native_signon_raw: Connection to CIB failed: Transport endpoint is not connected

I will keep searching for a solution, but in the meantime, if you have a moment, any hint would be welcome.

Many thanks,

On Thu, Sep 26, 2013 at 9:25 PM, Andrew Beekhof <[email protected]> wrote:
>
> On 27/09/2013, at 8:45 AM, Radoslaw Garbacz
> <[email protected]> wrote:
>
>> Hi,
>>
>> I have a problem starting up a cluster after upgrading corosync from
>> 1.4 to 2.3.2 and pacemaker from 1.8 to 1.9.
>>
>> All "crm_node" calls report well, but any CIB manipulation fails, i.e.:
>> * crm_node -q: 1
>> * crm_node -l: OK
>> * crm_node -p: OK
>> * cibadmin -Q: Call cib_query failed (-62): Timer expired
>
> Does cibadmin -Ql work?
> If so, there might be a DC election going on (look in the logs for "crmd").
> Is the error transient or persistent?
>
>>
>> No iptables, no SELinux, 3 nodes cluster, corosync.conf:
>> ...
>> ringnumber: 0
>> bindnetaddr: ...
>> mcastport: 7800
>> }
>>
>> transport: udpu
>>
>> Any help greatly appreciated.
>>
>> Below is some more information:
>>
>> * pacemaker logs:
>>
>> Sep 26 22:24:00 [2836] ip-10-114-210-162 cib: info: crm_client_new: Connecting 0x111b780 for uid=0 gid=0 pid=2883 id=977d6f23-963b-41a4-8fe0-a63024080d41
>> Sep 26 22:24:00 [2836] ip-10-114-210-162 cib: info: cib_process_request: Forwarding cib_query operation for section 'all' to master (origin=local/cibadmin/2)
>> Sep 26 22:24:30 [2836] ip-10-114-210-162 cib: info: crm_client_destroy: Destroying 0 events
>>
>> * ps axf | grep pacemaker|corosync:
>>
>> 2806 ?      Ssl  0:10 corosync
>> 2834 pts/1  S    0:00 pacemakerd
>> 2836 ?      Ss   0:01  \_ /usr/libexec/pacemaker/cib
>> 2837 ?      Ss   0:00  \_ /usr/libexec/pacemaker/stonithd
>> 2838 ?      Ss   0:00  \_ /usr/libexec/pacemaker/lrmd
>> 2839 ?      Ss   0:00  \_ /usr/libexec/pacemaker/attrd
>> 2840 ?      Ss   0:00  \_ /usr/libexec/pacemaker/pengine
>> 2841 ?      Ss   0:00  \_ /usr/libexec/pacemaker/crmd
>>
>> * strace cibadmin -Q:
>>
>> open("/dev/shm/qb-cib_rw-event-2836-2897-12-data", O_RDWR) = 6
>> ftruncate(6, 20480000) = 0
>> mmap(NULL, 40960000, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa221692000
>> mmap(0x7fa221692000, 20480000, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 6, 0) = 0x7fa221692000
>> mmap(0x7fa222a1a000, 20480000, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 6, 0) = 0x7fa222a1a000
>> close(6) = 0
>> close(5) = 0
>> close(6) = -1 EBADF (Bad file descriptor)
>> fstat(4, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
>> fcntl(4, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>> sendto(4, "~", 1, MSG_NOSIGNAL, NULL, 0) = 1
>> futex(0x7fa22df4cb60, FUTEX_WAKE_PRIVATE, 2147483647) = 0
>> gettimeofday({1380234692, 68879}, NULL) = 0
>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>> gettimeofday({1380234692, 69522}, NULL) = 0
>> sendto(4, "\274", 1, MSG_NOSIGNAL, NULL, 0) = 1
>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>> gettimeofday({1380234692, 70085}, NULL) = 0
>> gettimeofday({1380234692, 70197}, NULL) = 0
>> poll([{fd=4, events=POLLIN}], 1, 30000) = 0 (Timeout)
>> gettimeofday({1380234722, 91625}, NULL) = 0
>> write(2, "Call cib_query failed (-62): Tim"..., 43Call cib_query failed (-62): Timer expired
>> ) = 43
>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>>
>> * netstat -lxp:
>>
>> Active UNIX domain sockets (only servers)
>> Proto RefCnt Flags     Type    State      I-Node  PID/Program name  Path
>> unix  2      [ ACC ]   STREAM  LISTENING  20021   2836/cib          @cib_rw
>> unix  2      [ ACC ]   STREAM  LISTENING  19958   2838/lrmd         @lrmd
>> unix  2      [ ACC ]   STREAM  LISTENING  19789   2806/corosync     @quorum
>> unix  2      [ ACC ]   STREAM  LISTENING  19786   2806/corosync     @cmap
>> unix  2      [ ACC ]   STREAM  LISTENING  20020   2836/cib          @cib_ro
>> unix  2      [ ACC ]   STREAM  LISTENING  20057   2837/stonithd     @stonith-ng
>> unix  2      [ ACC ]   STREAM  LISTENING  19787   2806/corosync     @cfg
>> unix  2      [ ACC ]   STREAM  LISTENING  19906   2834/pacemakerd   @pacemakerd
>> unix  2      [ ACC ]   STREAM  LISTENING  19788   2806/corosync     @cpg
>> unix  2      [ ACC ]   STREAM  LISTENING  20022   2836/cib          @cib_shm
>> unix  2      [ ACC ]   STREAM  LISTENING  19985   2840/pengine      @pengine
>>
>> Thanks in advance,
>>
>> --
>> Best Regards,
>>
>> Radoslaw Garbacz
>>
>> _______________________________________________
>> Pacemaker mailing list: [email protected]
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>

--
Best Regards,

Radoslaw Garbacz
XtremeData Incorporation
