On 28/09/2013, at 5:37 AM, Radoslaw Garbacz <radoslaw.garb...@xtremedatainc.com> wrote:
> The problem was actually of a different nature - nothing to do with
> cib_shm. The logs showed later on that the connection to cib was
> established, just the corosync configuration file didn't have a proper
> quorum section, which caused the experienced problems.
>
> After fixing the "quorum" section in "corosync.conf", everything works.

I would not have expected that one would result in the other.
Glad you got it sorted out though!

>
> many thanks,
>
>
> On Fri, Sep 27, 2013 at 2:16 PM, Radoslaw Garbacz
> <radoslaw.garb...@xtremedatainc.com> wrote:
>> cibadmin -Ql works, problem is persistent after upgrade, and the logs
>> for "crmd" revealed the problem:
>>
>> Sep 27 16:19:22 [5074] ip-10-82-197-219 crmd: info: crm_ipc_connect: Could not establish cib_shm connection: Connection refused (111)
>> Sep 27 16:19:22 [5074] ip-10-82-197-219 crmd: debug: cib_native_signon_raw: Connection unsuccessful (0 (nil))
>> Sep 27 16:19:22 [5074] ip-10-82-197-219 crmd: debug: cib_native_signon_raw: Connection to CIB failed: Transport endpoint is not connected
>>
>> I will keep searching for the solution, but in the meantime, if you had a
>> moment, any hint would be welcomed.
>>
>> many thanks,
>>
>>
>> On Thu, Sep 26, 2013 at 9:25 PM, Andrew Beekhof <and...@beekhof.net> wrote:
>>>
>>> On 27/09/2013, at 8:45 AM, Radoslaw Garbacz
>>> <radoslaw.garb...@xtremedatainc.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a problem starting up a cluster after upgrading corosync from
>>>> 1.4 to 2.3.2 and pacemaker from 1.8 to 1.9.
>>>>
>>>> All "crm_node" calls report well, but any CIB manipulation fails, i.e.:
>>>> * crm_node -q: 1
>>>> * crm_node -l: OK
>>>> * crm_node -p: OK
>>>> * cibadmin -Q: Call cib_query failed (-62): Timer expired
>>>
>>> Does cibadmin -Ql work?
>>> If so, there might be a DC election going on (look in the logs for "crmd").
>>> Is the error transient or persistent?
>>>
>>>>
>>>> No iptables, no SELinux, 3 nodes cluster, corosync.conf:
>>>> ...
>>>> ringnumber: 0
>>>> bindnetaddr: ...
>>>> mcastport: 7800
>>>> }
>>>>
>>>> transport: udpu
>>>>
>>>>
>>>>
>>>> Any help greatly appreciated.
>>>>
>>>>
>>>> Below is some more information:
>>>>
>>>> * pacemaker logs:
>>>>
>>>> Sep 26 22:24:00 [2836] ip-10-114-210-162 cib: info: crm_client_new: Connecting 0x111b780 for uid=0 gid=0 pid=2883 id=977d6f23-963b-41a4-8fe0-a63024080d41
>>>> Sep 26 22:24:00 [2836] ip-10-114-210-162 cib: info: cib_process_request: Forwarding cib_query operation for section 'all' to master (origin=local/cibadmin/2)
>>>> Sep 26 22:24:30 [2836] ip-10-114-210-162 cib: info: crm_client_destroy: Destroying 0 events
>>>>
>>>>
>>>> * ps axf | grep pacemaker|corosync:
>>>>
>>>> 2806 ?        Ssl    0:10 corosync
>>>> 2834 pts/1    S      0:00 pacemakerd
>>>> 2836 ?        Ss     0:01  \_ /usr/libexec/pacemaker/cib
>>>> 2837 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
>>>> 2838 ?        Ss     0:00  \_ /usr/libexec/pacemaker/lrmd
>>>> 2839 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd
>>>> 2840 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pengine
>>>> 2841 ?        Ss     0:00  \_ /usr/libexec/pacemaker/crmd
>>>>
>>>>
>>>> * strace cibadmin -Q:
>>>>
>>>> open("/dev/shm/qb-cib_rw-event-2836-2897-12-data", O_RDWR) = 6
>>>> ftruncate(6, 20480000) = 0
>>>> mmap(NULL, 40960000, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa221692000
>>>> mmap(0x7fa221692000, 20480000, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 6, 0) = 0x7fa221692000
>>>> mmap(0x7fa222a1a000, 20480000, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 6, 0) = 0x7fa222a1a000
>>>> close(6) = 0
>>>> close(5) = 0
>>>> close(6) = -1 EBADF (Bad file descriptor)
>>>> fstat(4, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
>>>> fcntl(4, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
>>>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>>>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>>>> sendto(4, "~", 1, MSG_NOSIGNAL, NULL, 0) = 1
>>>> futex(0x7fa22df4cb60, FUTEX_WAKE_PRIVATE, 2147483647) = 0
>>>> gettimeofday({1380234692, 68879}, NULL) = 0
>>>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>>>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>>>> gettimeofday({1380234692, 69522}, NULL) = 0
>>>> sendto(4, "\274", 1, MSG_NOSIGNAL, NULL, 0) = 1
>>>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>>>> gettimeofday({1380234692, 70085}, NULL) = 0
>>>> gettimeofday({1380234692, 70197}, NULL) = 0
>>>> poll([{fd=4, events=POLLIN}], 1, 30000) = 0 (Timeout)
>>>> gettimeofday({1380234722, 91625}, NULL) = 0
>>>> write(2, "Call cib_query failed (-62): Tim"..., 43Call cib_query failed (-62): Timer expired
>>>> ) = 43
>>>> poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
>>>>
>>>>
>>>> * netstat -lxp:
>>>>
>>>> Active UNIX domain sockets (only servers)
>>>> Proto RefCnt Flags     Type    State      I-Node  PID/Program name  Path
>>>> unix  2      [ ACC ]   STREAM  LISTENING  20021   2836/cib          @cib_rw
>>>> unix  2      [ ACC ]   STREAM  LISTENING  19958   2838/lrmd         @lrmd
>>>> unix  2      [ ACC ]   STREAM  LISTENING  19789   2806/corosync     @quorum
>>>> unix  2      [ ACC ]   STREAM  LISTENING  19786   2806/corosync     @cmap
>>>> unix  2      [ ACC ]   STREAM  LISTENING  20020   2836/cib          @cib_ro
>>>> unix  2      [ ACC ]   STREAM  LISTENING  20057   2837/stonithd     @stonith-ng
>>>> unix  2      [ ACC ]   STREAM  LISTENING  19787   2806/corosync     @cfg
>>>> unix  2      [ ACC ]   STREAM  LISTENING  19906   2834/pacemakerd   @pacemakerd
>>>> unix  2      [ ACC ]   STREAM  LISTENING  19788   2806/corosync     @cpg
>>>> unix  2      [ ACC ]   STREAM  LISTENING  20022   2836/cib          @cib_shm
>>>> unix  2      [ ACC ]   STREAM  LISTENING  19985   2840/pengine      @pengine
>>>>
>>>>
>>>>
>>>> Thanks in advance,
>>>>
>>>> --
>>>> Best Regards,
>>>>
>>>> Radoslaw Garbacz
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>
>>
>>
>>
>> --
>> Best Regards,
>>
>> Radoslaw Garbacz
>> XtremeData Incorporation
>
>
>
> --
> Best Regards,
>
> Radoslaw Garbacz
> XtremeData Incorporation
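
For reference: the thread does not show what the corrected quorum section looked like. As a rough sketch only, assuming the votequorum provider that ships with corosync 2.x and the 3-node cluster described above, such a section in corosync.conf might look like the fragment below. The expected_votes value is an assumption derived from the stated node count, not taken from the poster's actual file.

```
quorum {
    # Assumption: the votequorum service shipped with corosync 2.x
    provider: corosync_votequorum
    # Assumption: matches the 3-node cluster mentioned in the thread
    expected_votes: 3
}
```

After restarting corosync with a quorum section in place, `corosync-quorumtool -s` can be used to check whether the cluster reports itself as quorate.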