On 27/09/2013, at 8:45 AM, Radoslaw Garbacz <radoslaw.garb...@xtremedatainc.com> wrote:
> Hi,
>
> I have a problem starting up a cluster after upgrading corosync from
> 1.4 to 2.3.2 and pacemaker from 1.8 to 1.9.
>
> All "crm_node" calls report well, but any CIB manipulation fails, i.e.:
> * crm_node -q: 1
> * crm_node -l: OK
> * crm_node -p: OK
> * cibadmin -Q: Call cib_query failed (-62): Timer expired

Does cibadmin -Ql work?
If so, there might be a DC election going on (look in the logs for "crmd").
Is the error transient or persistent?

>
> No iptables, no SELinux, 3 nodes cluster, corosync.conf:
> ...
> ringnumber: 0
> bindnetaddr: ...
> mcastport: 7800
> }
>
> transport: udpu
>
>
>
> Any help greatly appreciated.
>
>
> Below is some more information:
>
> * pacemaker logs:
>
> Sep 26 22:24:00 [2836] ip-10-114-210-162 cib: info:
> crm_client_new: Connecting 0x111b780 for uid=0 gid=0 pid=2883
> id=977d6f23-963b-41a4-8fe0-a63024080d41
> Sep 26 22:24:00 [2836] ip-10-114-210-162 cib: info:
> cib_process_request: Forwarding cib_query operation for section
> 'all' to master (origin=local/cibadmin/2)
> Sep 26 22:24:30 [2836] ip-10-114-210-162 cib: info:
> crm_client_destroy: Destroying 0 events
>
>
> * ps axf | grep pacemaker|corosync:
>
> 2806 ?        Ssl    0:10 corosync
> 2834 pts/1    S      0:00 pacemakerd
> 2836 ?        Ss     0:01  \_ /usr/libexec/pacemaker/cib
> 2837 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
> 2838 ?        Ss     0:00  \_ /usr/libexec/pacemaker/lrmd
> 2839 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd
> 2840 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pengine
> 2841 ?        Ss     0:00  \_ /usr/libexec/pacemaker/crmd
>
>
> * strace cibadmin -Q:
>
> open("/dev/shm/qb-cib_rw-event-2836-2897-12-data", O_RDWR) = 6
> ftruncate(6, 20480000) = 0
> mmap(NULL, 40960000, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> 0x7fa221692000
> mmap(0x7fa221692000, 20480000, PROT_READ|PROT_WRITE,
> MAP_SHARED|MAP_FIXED, 6, 0) = 0x7fa221692000
> mmap(0x7fa222a1a000, 20480000, PROT_READ|PROT_WRITE,
> MAP_SHARED|MAP_FIXED, 6, 0) = 0x7fa222a1a000
> close(6) = 0
> close(5) = 0
> close(6) = -1 EBADF (Bad file descriptor)
> fstat(4, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
> fcntl(4, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
> sendto(4, "~", 1, MSG_NOSIGNAL, NULL, 0) = 1
> futex(0x7fa22df4cb60, FUTEX_WAKE_PRIVATE, 2147483647) = 0
> gettimeofday({1380234692, 68879}, NULL) = 0
> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
> gettimeofday({1380234692, 69522}, NULL) = 0
> sendto(4, "\274", 1, MSG_NOSIGNAL, NULL, 0) = 1
> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
> gettimeofday({1380234692, 70085}, NULL) = 0
> gettimeofday({1380234692, 70197}, NULL) = 0
> poll([{fd=4, events=POLLIN}], 1, 30000) = 0 (Timeout)
> gettimeofday({1380234722, 91625}, NULL) = 0
> write(2, "Call cib_query failed (-62): Tim"..., 43Call cib_query
> failed (-62): Timer expired
> ) = 43
> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>
>
> * netstat -lxp:
>
> Active UNIX domain sockets (only servers)
> Proto RefCnt Flags Type State I-Node PID/Program
> name Path
> unix 2 [ ACC ] STREAM LISTENING 20021 2836/cib
> @cib_rw
> unix 2 [ ACC ] STREAM LISTENING 19958 2838/lrmd
> @lrmd
> unix 2 [ ACC ] STREAM LISTENING 19789 2806/corosync
> @quorum
> unix 2 [ ACC ] STREAM LISTENING 19786 2806/corosync
> @cmap
> unix 2 [ ACC ] STREAM LISTENING 20020 2836/cib
> @cib_ro
> unix 2 [ ACC ] STREAM LISTENING 20057 2837/stonithd
> @stonith-ng
> unix 2 [ ACC ] STREAM LISTENING 19787 2806/corosync
> @cfg
> unix 2 [ ACC ] STREAM LISTENING 19906
> 2834/pacemakerd @pacemakerd
> unix 2 [ ACC ] STREAM LISTENING 19788 2806/corosync
> @cpg
> unix 2 [ ACC ] STREAM LISTENING 20022 2836/cib
> @cib_shm
> unix 2 [ ACC ] STREAM LISTENING 19985 2840/pengine
> @pengine
>
>
>
> Thanks in advance,
>
> --
> Best Regards,
>
> Radoslaw Garbacz
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
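
To follow up on the suggestion to look in the logs for "crmd": below is a rough Python sketch for pulling one daemon's entries out of the pacemaker log. The line layout is inferred only from the three cib entries quoted in this thread, so the regular expression is an assumption and may need adjusting for other builds. The names LOG_LINE, SAMPLE and lines_from_daemon are made up for illustration, and the sample holds only the quoted cib lines with the mail wrapping undone.

```python
import re

# Layout assumed from the cib entries quoted in this thread, e.g.
#   "Sep 26 22:24:00 [2836] ip-10-114-210-162 cib: info: crm_client_new: ..."
LOG_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"\[(?P<pid>\d+)\]\s+(?P<host>\S+)\s+"
    r"(?P<daemon>[\w-]+):\s+(?P<level>\w+):\s+(?P<message>.*)$"
)

# The three log entries from the original report, re-joined onto single lines.
SAMPLE = [
    "Sep 26 22:24:00 [2836] ip-10-114-210-162 cib: info: crm_client_new: "
    "Connecting 0x111b780 for uid=0 gid=0 pid=2883 "
    "id=977d6f23-963b-41a4-8fe0-a63024080d41",
    "Sep 26 22:24:00 [2836] ip-10-114-210-162 cib: info: cib_process_request: "
    "Forwarding cib_query operation for section 'all' to master "
    "(origin=local/cibadmin/2)",
    "Sep 26 22:24:30 [2836] ip-10-114-210-162 cib: info: crm_client_destroy: "
    "Destroying 0 events",
]


def lines_from_daemon(lines, daemon="crmd"):
    """Return the parsed fields of every line logged by the given daemon."""
    matches = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("daemon") == daemon:
            matches.append(m.groupdict())
    return matches


if __name__ == "__main__":
    # The sample contains cib entries only; on a real node, read the whole
    # pacemaker log file instead and filter for "crmd".
    for entry in lines_from_daemon(SAMPLE, "cib"):
        print(entry["stamp"], entry["message"])
```

The quoted excerpt contains no crmd entries at all, so filtering it for "crmd" returns nothing; that says nothing about an election either way until the full log is checked.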