Re: [Openais] aisexec crashes with SIGABRT
Steve, our setup is a 2-node cluster: - mstel21 (172.24.100.10) - attached gdb output from here - mstel22 (172.24.100.12) log file from 172.24.100.12: Aug 29 08:04:49 172.24.100.12 openais[4656]: [TOTEM] The token was lost in the OPERATIONAL state. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes). Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes). Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering GATHER state from 2. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering GATHER state from 0. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Creating commit token because I am the rep. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Storing new sequence id for ring ff4 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering COMMIT state. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering RECOVERY state. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] position [0] member 172.24.100.12: Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] previous ring seq 4080 rep 172.24.100.10 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] aru 425c1 high delivered 425c1 received flag 1 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Did not need to originate any messages in recovery. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Sending initial ORF token Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] CLM CONFIGURATION CHANGE Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] New Configuration: Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] r(0) ip(172.24.100.12) Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] Members Left: Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] r(0) ip(172.24.100.10) Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] Members Joined: Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] CLM CONFIGURATION CHANGE Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] New Configuration: Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] r(0) ip(172.24.100.12) Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] Members Left: Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] Members Joined: Aug 29 08:04:50 172.24.100.12 openais[4656]: [SYNC ] This node is within the primary component and will provide service. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering OPERATIONAL state. Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] got nodejoin message 172.24.100.12 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT ] Channel device_state, total 1, local 1 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT ] Node r(0) ip(172.24.100.12) , count 1 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT ] Channel mwi, total 1, local 1 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT ] Node r(0) ip(172.24.100.12) , count 1 Thanks, -- Chris Kirke Director - Systems Architecture Multi Service Corporation www.multiservice.com +1.913.663.9483 (direct) +1.816.718.0468 (mobile) +1.913.217.9318 (fax) On Tue, Aug 23, 2011 at 11:45, Christopher A. Kirke caki...@multiservice.com wrote: Steve, appreciate the quick response, only happened to be running strace during one of the crashes. i've updated /etc/init.d/openais to enable core dump and changed openais.conf to run as root:asterisk instead of asterisk:asterisk so /var/lib/openais is available for writing. will post gdb output from the next aisexec crash. Thanks, -- Chris Kirke Director - Systems Architecture Multi Service Corporation www.multiservice.com +1.913.663.9483 (direct) +1.816.718.0468 (mobile) +1.913.217.9318 (fax) On Mon, Aug 22, 2011 at 12:13, Steven Dake sd...@redhat.com wrote: On 08/22/2011 09:58 AM, Christopher A. Kirke wrote: currently using the REL5-provided package on two nodes on local Cisco-switched LAN: openais.x86_64 0.80.6-28.el5_6.1 installed with following configuration: # Please read the openais.conf.5 manual page aisexec { user: asterisk group: asterisk } totem { version: 2 secauth: off threads: 0 interface { ringnumber: 0 bindnetaddr: 172.24.100.0 mcastaddr: 239.255.4.1 mcastport: 5405 } } logging { debug: off syslog_facility: local1 syslog_priority: info timestamp: off to_file: no to_syslog: yes } amf { mode: disabled } to enable Asterisk distributed device state. we see cases where aisexec crashes, both with Asterisk running and stopped - strace output below: 0.73 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}], 3, 10) = 0 (Timeout) 0.009994 0.010031 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}], 3, 237) = 1 ([{fd=1, revents=POLLIN}]) 0.07 0.37 recvmsg(1, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5149),
Re: [Openais] aisexec crashes with SIGABRT
On 08/29/2011 06:26 PM, Christopher A. Kirke wrote: Steve, The core file doesn't have debuginfo installed when it was analyzed. The package you want is something like openais-debuginfo. You may have to enable the debuginfo yum repo if you have not already. Regards -steve our setup is a 2-node cluster: * mstel21 (172.24.100.10) - attached gdb output from here * mstel22 (172.24.100.12) log file from 172.24.100.12 http://172.24.100.12: Aug 29 08:04:49 172.24.100.12 openais[4656]: [TOTEM] The token was lost in the OPERATIONAL state. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes). Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes). Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering GATHER state from 2. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering GATHER state from 0. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Creating commit token because I am the rep. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Storing new sequence id for ring ff4 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering COMMIT state. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering RECOVERY state. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] position [0] member 172.24.100.12 http://172.24.100.12: Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] previous ring seq 4080 rep 172.24.100.10 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] aru 425c1 high delivered 425c1 received flag 1 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Did not need to originate any messages in recovery. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Sending initial ORF token Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] CLM CONFIGURATION CHANGE Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] New Configuration: Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] r(0) ip(172.24.100.12) Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] Members Left: Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] r(0) ip(172.24.100.10) Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] Members Joined: Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] CLM CONFIGURATION CHANGE Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] New Configuration: Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] r(0) ip(172.24.100.12) Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] Members Left: Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] Members Joined: Aug 29 08:04:50 172.24.100.12 openais[4656]: [SYNC ] This node is within the primary component and will provide service. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering OPERATIONAL state. Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] got nodejoin message 172.24.100.12 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT ] Channel device_state, total 1, local 1 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT ] Node r(0) ip(172.24.100.12) , count 1 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT ] Channel mwi, total 1, local 1 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT ] Node r(0) ip(172.24.100.12) , count 1 Thanks, -- Chris Kirke Director - Systems Architecture Multi Service Corporation www.multiservice.com http://www.multiservice.com +1.913.663.9483 (direct) +1.816.718.0468 (mobile) +1.913.217.9318 (fax) On Tue, Aug 23, 2011 at 11:45, Christopher A. Kirke caki...@multiservice.com mailto:caki...@multiservice.com wrote: Steve, appreciate the quick response, only happened to be running strace during one of the crashes. i've updated /etc/init.d/openais to enable core dump and changed openais.conf to run as root:asterisk instead of asterisk:asterisk so /var/lib/openais is available for writing. will post gdb output from the next aisexec crash. Thanks, -- Chris Kirke Director - Systems Architecture Multi Service Corporation www.multiservice.com http://www.multiservice.com +1.913.663.9483 tel:%2B1.913.663.9483 (direct) +1.816.718.0468 tel:%2B1.816.718.0468 (mobile) +1.913.217.9318 tel:%2B1.913.217.9318 (fax) On Mon, Aug 22, 2011 at 12:13, Steven Dake sd...@redhat.com mailto:sd...@redhat.com wrote: On 08/22/2011 09:58 AM, Christopher A. Kirke wrote: currently using the REL5-provided package on two nodes on local Cisco-switched LAN: openais.x86_64 0.80.6-28.el5_6.1 installed with following configuration: # Please read the openais.conf.5 manual page aisexec { user: asterisk group: asterisk } totem {
Re: [Openais] aisexec crashes with SIGABRT
On 08/30/2011 06:39 AM, Christopher A. Kirke wrote: Steve, i sometimes need to be smacked by the obvious :^) updated analysis attached ... THe core happens because the event service from openais de-referenes an object with a reference count of 0. This shouldn't happen, but can't explain why it does. We stopped supporting the sa forum services last year in this project, but understanding that we may not be able to fix the problem, If you could provide more details of the exact scenario which triggers this problem, that might be helpful in reproducing the issue. Regards -steve Thanks, -- Chris Kirke Director - Systems Architecture Multi Service Corporation www.multiservice.com http://www.multiservice.com +1.913.663.9483 (direct) +1.816.718.0468 (mobile) +1.913.217.9318 (fax) On Mon, Aug 29, 2011 at 22:43, Steven Dake sd...@redhat.com mailto:sd...@redhat.com wrote: On 08/29/2011 06:26 PM, Christopher A. Kirke wrote: Steve, The core file doesn't have debuginfo installed when it was analyzed. The package you want is something like openais-debuginfo. You may have to enable the debuginfo yum repo if you have not already. Regards -steve our setup is a 2-node cluster: * mstel21 (172.24.100.10) - attached gdb output from here * mstel22 (172.24.100.12) log file from 172.24.100.12 http://172.24.100.12: Aug 29 08:04:49 172.24.100.12 openais[4656]: [TOTEM] The token was lost in the OPERATIONAL state. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Receive multicast socket recv buffer size (262142 bytes). Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes). Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering GATHER state from 2. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering GATHER state from 0. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Creating commit token because I am the rep. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Storing new sequence id for ring ff4 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering COMMIT state. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering RECOVERY state. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] position [0] member 172.24.100.12 http://172.24.100.12: Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] previous ring seq 4080 rep 172.24.100.10 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] aru 425c1 high delivered 425c1 received flag 1 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Did not need to originate any messages in recovery. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Sending initial ORF token Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] CLM CONFIGURATION CHANGE Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] New Configuration: Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] r(0) ip(172.24.100.12) Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] Members Left: Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] r(0) ip(172.24.100.10) Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] Members Joined: Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] CLM CONFIGURATION CHANGE Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] New Configuration: Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] r(0) ip(172.24.100.12) Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] Members Left: Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] Members Joined: Aug 29 08:04:50 172.24.100.12 openais[4656]: [SYNC ] This node is within the primary component and will provide service. Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering OPERATIONAL state. Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM ] got nodejoin message 172.24.100.12 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT ] Channel device_state, total 1, local 1 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT ] Node r(0) ip(172.24.100.12) , count 1 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT ] Channel mwi, total 1, local 1 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT ] Node r(0) ip(172.24.100.12) , count 1 Thanks, -- Chris Kirke Director - Systems Architecture Multi Service Corporation www.multiservice.com http://www.multiservice.com http://www.multiservice.com +1.913.663.9483 tel:%2B1.913.663.9483 (direct) +1.816.718.0468 tel:%2B1.816.718.0468 (mobile) +1.913.217.9318 tel:%2B1.913.217.9318 (fax) On Tue, Aug 23,
Re: [Openais] aisexec crashes with SIGABRT
Steve, appreciate the quick response, only happened to be running strace during one of the crashes. i've updated /etc/init.d/openais to enable core dump and changed openais.conf to run as root:asterisk instead of asterisk:asterisk so /var/lib/openais is available for writing. will post gdb output from the next aisexec crash. Thanks, -- Chris Kirke Director - Systems Architecture Multi Service Corporation www.multiservice.com +1.913.663.9483 (direct) +1.816.718.0468 (mobile) +1.913.217.9318 (fax) On Mon, Aug 22, 2011 at 12:13, Steven Dake sd...@redhat.com wrote: On 08/22/2011 09:58 AM, Christopher A. Kirke wrote: currently using the REL5-provided package on two nodes on local Cisco-switched LAN: openais.x86_64 0.80.6-28.el5_6.1 installed with following configuration: # Please read the openais.conf.5 manual page aisexec { user: asterisk group: asterisk } totem { version: 2 secauth: off threads: 0 interface { ringnumber: 0 bindnetaddr: 172.24.100.0 mcastaddr: 239.255.4.1 mcastport: 5405 } } logging { debug: off syslog_facility: local1 syslog_priority: info timestamp: off to_file: no to_syslog: yes } amf { mode: disabled } to enable Asterisk distributed device state. we see cases where aisexec crashes, both with Asterisk running and stopped - strace output below: 0.73 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}], 3, 10) = 0 (Timeout) 0.009994 0.010031 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}], 3, 237) = 1 ([{fd=1, revents=POLLIN}]) 0.07 0.37 recvmsg(1, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5149), sin_addr=inet_addr(172.24.100.10)}, msg_iov(1)=[{\2\0\\377\254\30d\n\254\30d\n\2\0\254\30d\n\10\0\2\0\254\30d\n\10\0\4\0\0\0..., 1}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 82 0.10 0.72 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}], 3, 237) = 1 ([{fd=3, revents=POLLIN}]) 0.180257 0.180339 recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5149), sin_addr=inet_addr(172.24.100.10)}, msg_iov(1)=[{\0\0\\377\254\30d\fN\1\0\0004/\0\0N\1\0\0\0\0\0\0\254\30d\n\2\0\254\30..., 1}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 70 0.22 0.000104 sendmsg(2, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5405), sin_addr=inet_addr(172.24.100.10)}, msg_iov(1)=[{\0\0\\377\254\30d\fN\1\0\0005/\0\0N\1\0\0\0\0\0\0\254\30d\n\2\0\254\30..., 70}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 70 0.14 0.72 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}], 3, 209) = 1 ([{fd=4, revents=POLLIN}]) 0.037614 0.037682 accept(4, {sa_family=AF_FILE, path=@}, [4294967298]) = 5 0.23 0.96 fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 0.12 0.40 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06 0.70 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}], 4, 172) = 1 ([{fd=5, revents=POLLIN}]) 0.000158 0.000207 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06 0.29 recvmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{\1\0\0\0\252*\0\0\224\343TE\0\0\0\0\270\355\362+\0\0\0\0, 24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=819, uid=301, gid=301}}, msg_flags=0}, MSG_NOSIGNAL) = 24 0.14 0.84 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0 0.06 0.28 sendto(5, \1\0\0\0\0\0\0\0, 8, MSG_WAITALL, NULL, 0) = 8 0.07 0.31 shmget(0x4554e394, 308, 0600) = 5144599 0.000118 0.000198 shmat(5144599, 0, 0) = ? 0.002286 0.002332 semget(0x2bf2edb8, 3, 0600) = 1081360 0.000108 0.000155 mmap(NULL, 20, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x41781000 0.000212 0.000262 mprotect(0x41781000, 4096, PROT_NONE) = 0 0.20 0.51 clone(child_stack=0x417b0f90, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x417b1710, tls=0x417b1680, child_tidptr=0x417b1710) = 859 0.46 0.000109 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}], 4, 168) = 1 ([{fd=4, revents=POLLIN}]) 0.000924 0.000980 accept(4, {sa_family=AF_FILE, path=@}, [4294967298]) = 6 0.11 0.63 fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 0.06 0.49 setsockopt(6, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06 0.33 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}, {fd=5,
[Openais] aisexec crashes with SIGABRT
currently using the REL5-provided package on two nodes on local Cisco-switched LAN: openais.x86_64 0.80.6-28.el5_6.1 installed with following configuration: # Please read the openais.conf.5 manual page aisexec { user: asterisk group: asterisk } totem { version: 2 secauth: off threads: 0 interface { ringnumber: 0 bindnetaddr: 172.24.100.0 mcastaddr: 239.255.4.1 mcastport: 5405 } } logging { debug: off syslog_facility: local1 syslog_priority: info timestamp: off to_file: no to_syslog: yes } amf { mode: disabled } to enable Asterisk distributed device state. we see cases where aisexec crashes, both with Asterisk running and stopped - strace output below: 0.73 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}], 3, 10) = 0 (Timeout) 0.009994 0.010031 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}], 3, 237) = 1 ([{fd=1, revents=POLLIN}]) 0.07 0.37 recvmsg(1, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5149), sin_addr=inet_addr(172.24.100.10)}, msg_iov(1)=[{\2\0\\377\254\30d\n\254\30d\n\2\0\254\30d\n\10\0\2\0\254\30d\n\10\0\4\0\0\0..., 1}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 82 0.10 0.72 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}], 3, 237) = 1 ([{fd=3, revents=POLLIN}]) 0.180257 0.180339 recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5149), sin_addr=inet_addr(172.24.100.10)}, msg_iov(1)=[{\0\0\\377\254\30d\fN\1\0\0004/\0\0N\1\0\0\0\0\0\0\254\30d\n\2\0\254\30..., 1}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 70 0.22 0.000104 sendmsg(2, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5405), sin_addr=inet_addr(172.24.100.10)}, msg_iov(1)=[{\0\0\\377\254\30d\fN\1\0\0005/\0\0N\1\0\0\0\0\0\0\254\30d\n\2\0\254\30..., 70}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 70 0.14 0.72 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}], 3, 209) = 1 ([{fd=4, revents=POLLIN}]) 0.037614 0.037682 accept(4, {sa_family=AF_FILE, path=@}, [4294967298]) = 5 0.23 0.96 fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 0.12 0.40 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06 0.70 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}], 4, 172) = 1 ([{fd=5, revents=POLLIN}]) 0.000158 0.000207 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06 0.29 recvmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{\1\0\0\0\252*\0\0\224\343TE\0\0\0\0\270\355\362+\0\0\0\0, 24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=819, uid=301, gid=301}}, msg_flags=0}, MSG_NOSIGNAL) = 24 0.14 0.84 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0 0.06 0.28 sendto(5, \1\0\0\0\0\0\0\0, 8, MSG_WAITALL, NULL, 0) = 8 0.07 0.31 shmget(0x4554e394, 308, 0600) = 5144599 0.000118 0.000198 shmat(5144599, 0, 0) = ? 0.002286 0.002332 semget(0x2bf2edb8, 3, 0600) = 1081360 0.000108 0.000155 mmap(NULL, 20, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x41781000 0.000212 0.000262 mprotect(0x41781000, 4096, PROT_NONE) = 0 0.20 0.51 clone(child_stack=0x417b0f90, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x417b1710, tls=0x417b1680, child_tidptr=0x417b1710) = 859 0.46 0.000109 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}], 4, 168) = 1 ([{fd=4, revents=POLLIN}]) 0.000924 0.000980 accept(4, {sa_family=AF_FILE, path=@}, [4294967298]) = 6 0.11 0.63 fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 0.06 0.49 setsockopt(6, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06 0.33 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}, {fd=6, events=POLLIN|POLLNVAL}], 5, 167) = 1 ([{fd=6, revents=POLLIN}]) 0.07 0.44 setsockopt(6, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06 0.42 recvmsg(6, {msg_name(0)=NULL, msg_iov(1)=[{\4\0\0\0\252*\0\0Rv-b\0\0\0\0A\246\10B\0\0\0\0, 24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=819, uid=301, gid=301}}, msg_flags=0}, MSG_NOSIGNAL) = 24 0.10 0.72 setsockopt(6, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0 0.06 0.41 sendto(6, \1\0\0\0\0\0\0\0, 8, MSG_WAITALL, NULL, 0) = 8 0.07 0.37 shmget(0x622d7652, 308, 0600) = 5177368 0.06 0.36 shmat(5177368, 0, 0) = ? 0.002768 0.002814 semget(0x4208a641, 3, 0600) = 1114129 0.08 0.54 mmap(NULL, 20, PROT_READ|PROT_WRITE,
Re: [Openais] aisexec crashes with SIGABRT
On 08/22/2011 09:58 AM, Christopher A. Kirke wrote: currently using the REL5-provided package on two nodes on local Cisco-switched LAN: openais.x86_64 0.80.6-28.el5_6.1 installed with following configuration: # Please read the openais.conf.5 manual page aisexec { user: asterisk group: asterisk } totem { version: 2 secauth: off threads: 0 interface { ringnumber: 0 bindnetaddr: 172.24.100.0 mcastaddr: 239.255.4.1 mcastport: 5405 } } logging { debug: off syslog_facility: local1 syslog_priority: info timestamp: off to_file: no to_syslog: yes } amf { mode: disabled } to enable Asterisk distributed device state. we see cases where aisexec crashes, both with Asterisk running and stopped - strace output below: 0.73 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}], 3, 10) = 0 (Timeout) 0.009994 0.010031 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}], 3, 237) = 1 ([{fd=1, revents=POLLIN}]) 0.07 0.37 recvmsg(1, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5149), sin_addr=inet_addr(172.24.100.10)}, msg_iov(1)=[{\2\0\\377\254\30d\n\254\30d\n\2\0\254\30d\n\10\0\2\0\254\30d\n\10\0\4\0\0\0..., 1}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 82 0.10 0.72 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}], 3, 237) = 1 ([{fd=3, revents=POLLIN}]) 0.180257 0.180339 recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5149), sin_addr=inet_addr(172.24.100.10)}, msg_iov(1)=[{\0\0\\377\254\30d\fN\1\0\0004/\0\0N\1\0\0\0\0\0\0\254\30d\n\2\0\254\30..., 1}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 70 0.22 0.000104 sendmsg(2, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5405), sin_addr=inet_addr(172.24.100.10)}, msg_iov(1)=[{\0\0\\377\254\30d\fN\1\0\0005/\0\0N\1\0\0\0\0\0\0\254\30d\n\2\0\254\30..., 70}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 70 0.14 0.72 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}], 3, 209) = 1 ([{fd=4, revents=POLLIN}]) 0.037614 0.037682 accept(4, {sa_family=AF_FILE, path=@}, [4294967298]) = 5 0.23 0.96 fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 0.12 0.40 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06 0.70 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}], 4, 172) = 1 ([{fd=5, revents=POLLIN}]) 0.000158 0.000207 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06 0.29 recvmsg(5, {msg_name(0)=NULL, msg_iov(1)=[{\1\0\0\0\252*\0\0\224\343TE\0\0\0\0\270\355\362+\0\0\0\0, 24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=819, uid=301, gid=301}}, msg_flags=0}, MSG_NOSIGNAL) = 24 0.14 0.84 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0 0.06 0.28 sendto(5, \1\0\0\0\0\0\0\0, 8, MSG_WAITALL, NULL, 0) = 8 0.07 0.31 shmget(0x4554e394, 308, 0600) = 5144599 0.000118 0.000198 shmat(5144599, 0, 0) = ? 0.002286 0.002332 semget(0x2bf2edb8, 3, 0600) = 1081360 0.000108 0.000155 mmap(NULL, 20, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x41781000 0.000212 0.000262 mprotect(0x41781000, 4096, PROT_NONE) = 0 0.20 0.51 clone(child_stack=0x417b0f90, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x417b1710, tls=0x417b1680, child_tidptr=0x417b1710) = 859 0.46 0.000109 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}], 4, 168) = 1 ([{fd=4, revents=POLLIN}]) 0.000924 0.000980 accept(4, {sa_family=AF_FILE, path=@}, [4294967298]) = 6 0.11 0.63 fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 0.06 0.49 setsockopt(6, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06 0.33 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}, {fd=6, events=POLLIN|POLLNVAL}], 5, 167) = 1 ([{fd=6, revents=POLLIN}]) 0.07 0.44 setsockopt(6, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06 0.42 recvmsg(6, {msg_name(0)=NULL, msg_iov(1)=[{\4\0\0\0\252*\0\0Rv-b\0\0\0\0A\246\10B\0\0\0\0, 24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=819, uid=301, gid=301}}, msg_flags=0}, MSG_NOSIGNAL) = 24 0.10 0.72 setsockopt(6, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0 0.06 0.41 sendto(6, \1\0\0\0\0\0\0\0, 8, MSG_WAITALL, NULL, 0) = 8 0.07 0.37 shmget(0x622d7652, 308, 0600) = 5177368 0.06