Re: [Openais] aisexec crashes with SIGABRT

2011-08-30 Thread Christopher A. Kirke
Steve,

our setup is a 2-node cluster:

   - mstel21 (172.24.100.10) - attached gdb output from here
   - mstel22 (172.24.100.12)

log file from 172.24.100.12:

Aug 29 08:04:49 172.24.100.12 openais[4656]: [TOTEM] The token was lost in
the OPERATIONAL state.
Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Receive multicast
socket recv buffer size (262142 bytes).
Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Transmit multicast
socket send buffer size (262142 bytes).
Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering GATHER state
from 2.
Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering GATHER state
from 0.
Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Creating commit token
because I am the rep.
Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Storing new sequence id
for ring ff4
Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering COMMIT state.

Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering RECOVERY
state.
Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] position [0] member
172.24.100.12:
Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] previous ring seq 4080
rep 172.24.100.10
Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] aru 425c1 high
delivered 425c1 received flag 1
Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Did not need to
originate any messages in recovery.
Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Sending initial ORF
token
Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] CLM CONFIGURATION
CHANGE
Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] New Configuration:
Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] r(0) ip(172.24.100.12)

Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] Members Left:
Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] r(0) ip(172.24.100.10)

Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] Members Joined:
Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] CLM CONFIGURATION
CHANGE
Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] New Configuration:
Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] r(0) ip(172.24.100.12)

Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] Members Left:
Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] Members Joined:
Aug 29 08:04:50 172.24.100.12 openais[4656]: [SYNC ] This node is within the
primary component and will provide service.
Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering OPERATIONAL
state.
Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] got nodejoin message
172.24.100.12
Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT  ] Channel device_state,
total 1, local 1
Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT  ] Node r(0)
ip(172.24.100.12) , count 1
Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT  ] Channel mwi, total 1,
local 1
Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT  ] Node r(0)
ip(172.24.100.12) , count 1

Thanks,
--
Chris Kirke
Director - Systems Architecture
Multi Service Corporation
www.multiservice.com
+1.913.663.9483 (direct)
+1.816.718.0468 (mobile)
+1.913.217.9318 (fax)




On Tue, Aug 23, 2011 at 11:45, Christopher A. Kirke 
caki...@multiservice.com wrote:

 Steve,

 appreciate the quick response, only happened to be running strace during
 one of the crashes.
 i've updated /etc/init.d/openais to enable core dump and changed
 openais.conf to run
 as root:asterisk instead of asterisk:asterisk so /var/lib/openais is
 available for writing.

 will post gdb output from the next aisexec crash.

 Thanks,
 --
 Chris Kirke
 Director - Systems Architecture
 Multi Service Corporation
 www.multiservice.com
 +1.913.663.9483 (direct)
 +1.816.718.0468 (mobile)
 +1.913.217.9318 (fax)




 On Mon, Aug 22, 2011 at 12:13, Steven Dake sd...@redhat.com wrote:

 On 08/22/2011 09:58 AM, Christopher A. Kirke wrote:
  currently using the REL5-provided package on two nodes on local
  Cisco-switched LAN:
 
  openais.x86_64   0.80.6-28.el5_6.1
  installed
 
  with following configuration:
 
  # Please read the openais.conf.5 manual page
 
  aisexec {
  user: asterisk
  group: asterisk
  }
 
  totem {
  version: 2
  secauth: off
  threads: 0
  interface {
  ringnumber: 0
  bindnetaddr: 172.24.100.0
  mcastaddr: 239.255.4.1
  mcastport: 5405
  }
  }
 
  logging {
  debug: off
  syslog_facility: local1
  syslog_priority: info
  timestamp: off
  to_file: no
  to_syslog: yes
  }
 
  amf {
  mode: disabled
  }
 
  to enable Asterisk distributed device state.
 
  we see cases where aisexec crashes, both with Asterisk running and
  stopped - strace output below:
 
   0.73 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
  events=POLLIN|POLLNVAL}], 3, 10) = 0 (Timeout) 0.009994
   0.010031 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
  events=POLLIN|POLLNVAL}], 3, 237) = 1 ([{fd=1, revents=POLLIN}])
 0.07
   0.37 recvmsg(1, {msg_name(16)={sa_family=AF_INET,
  sin_port=htons(5149), 

Re: [Openais] aisexec crashes with SIGABRT

2011-08-30 Thread Steven Dake
On 08/29/2011 06:26 PM, Christopher A. Kirke wrote:
 Steve,
 

The core file doesn't have debuginfo installed when it was analyzed.
The package you want is something like openais-debuginfo.  You may
have to enable the debuginfo yum repo if you have not already.

Regards
-steve
 our setup is a 2-node cluster:
 
   * mstel21 (172.24.100.10) - attached gdb output from here
   * mstel22 (172.24.100.12)
 
 log file from 172.24.100.12 http://172.24.100.12:
 
 Aug 29 08:04:49 172.24.100.12 openais[4656]: [TOTEM] The token was lost
 in the OPERATIONAL state.  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Receive multicast
 socket recv buffer size (262142 bytes).  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Transmit multicast
 socket send buffer size (262142 bytes).  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering GATHER
 state from 2.  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering GATHER
 state from 0.  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Creating commit
 token because I am the rep.  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Storing new
 sequence id for ring ff4  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering COMMIT
 state.  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering RECOVERY
 state.  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] position [0] member
 172.24.100.12 http://172.24.100.12:  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] previous ring seq
 4080 rep 172.24.100.10  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] aru 425c1 high
 delivered 425c1 received flag 1  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Did not need to
 originate any messages in recovery.  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Sending initial ORF
 token  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] CLM CONFIGURATION
 CHANGE  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] New Configuration:  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] r(0)
 ip(172.24.100.12)   
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] Members Left:  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] r(0)
 ip(172.24.100.10)   
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] Members Joined:  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] CLM CONFIGURATION
 CHANGE  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] New Configuration:  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] r(0)
 ip(172.24.100.12)   
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] Members Left:  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] Members Joined:  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [SYNC ] This node is within
 the primary component and will provide service.  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering
 OPERATIONAL state.  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] got nodejoin
 message 172.24.100.12  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT  ] Channel
 device_state, total 1, local 1  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT  ] Node r(0)
 ip(172.24.100.12) , count 1  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT  ] Channel mwi, total
 1, local 1  
 Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT  ] Node r(0)
 ip(172.24.100.12) , count 1  
 
 Thanks,
 --
 Chris Kirke
 Director - Systems Architecture
 Multi Service Corporation
 www.multiservice.com http://www.multiservice.com
 +1.913.663.9483 (direct)
 +1.816.718.0468 (mobile)
 +1.913.217.9318 (fax)
 
 
 
 
 On Tue, Aug 23, 2011 at 11:45, Christopher A. Kirke
 caki...@multiservice.com mailto:caki...@multiservice.com wrote:
 
 Steve,
 
 appreciate the quick response, only happened to be running
 strace during one of the crashes.
 i've updated /etc/init.d/openais to enable core dump and changed
 openais.conf to run
 as root:asterisk instead of asterisk:asterisk so /var/lib/openais is
 available for writing.
 
 will post gdb output from the next aisexec crash.
 
 Thanks,
 --
 Chris Kirke
 Director - Systems Architecture
 Multi Service Corporation
 www.multiservice.com http://www.multiservice.com
 +1.913.663.9483 tel:%2B1.913.663.9483 (direct)
 +1.816.718.0468 tel:%2B1.816.718.0468 (mobile)
 +1.913.217.9318 tel:%2B1.913.217.9318 (fax)
 
 
 
 
 On Mon, Aug 22, 2011 at 12:13, Steven Dake sd...@redhat.com
 mailto:sd...@redhat.com wrote:
 
 On 08/22/2011 09:58 AM, Christopher A. Kirke wrote:
  currently using the REL5-provided package on two nodes on local
  Cisco-switched LAN:
 
  openais.x86_64   0.80.6-28.el5_6.1
  installed
 
  with following configuration:
 
  # Please read the openais.conf.5 manual page
 
  aisexec {
  user: asterisk
  group: asterisk
  }
 
  totem {
  

Re: [Openais] aisexec crashes with SIGABRT

2011-08-30 Thread Steven Dake
On 08/30/2011 06:39 AM, Christopher A. Kirke wrote:
 Steve,
 
 i sometimes need to be smacked by the obvious :^)
 
 updated analysis attached ...
 

THe core happens because the event service from openais de-referenes an
object with a reference count of 0.  This shouldn't happen, but can't
explain why it does.

We stopped supporting the sa forum services last year in this project,
but understanding that we may not be able to fix the problem, If you
could provide more details of the exact scenario which triggers this
problem, that might be helpful in reproducing the issue.

Regards
-steve

 Thanks,
 --
 Chris Kirke
 Director - Systems Architecture
 Multi Service Corporation
 www.multiservice.com http://www.multiservice.com
 +1.913.663.9483 (direct)
 +1.816.718.0468 (mobile)
 +1.913.217.9318 (fax)
 
 
 
 
 On Mon, Aug 29, 2011 at 22:43, Steven Dake sd...@redhat.com
 mailto:sd...@redhat.com wrote:
 
 On 08/29/2011 06:26 PM, Christopher A. Kirke wrote:
  Steve,
 
 
 The core file doesn't have debuginfo installed when it was analyzed.
 The package you want is something like openais-debuginfo.  You may
 have to enable the debuginfo yum repo if you have not already.
 
 Regards
 -steve
  our setup is a 2-node cluster:
 
* mstel21 (172.24.100.10) - attached gdb output from here
* mstel22 (172.24.100.12)
 
  log file from 172.24.100.12 http://172.24.100.12:
 
  Aug 29 08:04:49 172.24.100.12 openais[4656]: [TOTEM] The token was
 lost
  in the OPERATIONAL state.
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Receive multicast
  socket recv buffer size (262142 bytes).
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Transmit
 multicast
  socket send buffer size (262142 bytes).
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering GATHER
  state from 2.
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering GATHER
  state from 0.
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Creating commit
  token because I am the rep.
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Storing new
  sequence id for ring ff4
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering COMMIT
  state.
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering RECOVERY
  state.
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] position [0]
 member
  172.24.100.12 http://172.24.100.12:
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] previous ring seq
  4080 rep 172.24.100.10
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] aru 425c1 high
  delivered 425c1 received flag 1
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Did not need to
  originate any messages in recovery.
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] Sending
 initial ORF
  token
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] CLM CONFIGURATION
  CHANGE
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] New
 Configuration:
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] r(0)
  ip(172.24.100.12)
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] Members Left:
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] r(0)
  ip(172.24.100.10)
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] Members Joined:
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] CLM CONFIGURATION
  CHANGE
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] New
 Configuration:
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] r(0)
  ip(172.24.100.12)
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] Members Left:
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] Members Joined:
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [SYNC ] This node is
 within
  the primary component and will provide service.
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [TOTEM] entering
  OPERATIONAL state.
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [CLM  ] got nodejoin
  message 172.24.100.12
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT  ] Channel
  device_state, total 1, local 1
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT  ] Node r(0)
  ip(172.24.100.12) , count 1
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT  ] Channel mwi,
 total
  1, local 1
  Aug 29 08:04:50 172.24.100.12 openais[4656]: [EVT  ] Node r(0)
  ip(172.24.100.12) , count 1
 
  Thanks,
  --
  Chris Kirke
  Director - Systems Architecture
  Multi Service Corporation
  www.multiservice.com http://www.multiservice.com
 http://www.multiservice.com
  +1.913.663.9483 tel:%2B1.913.663.9483 (direct)
  +1.816.718.0468 tel:%2B1.816.718.0468 (mobile)
  +1.913.217.9318 tel:%2B1.913.217.9318 (fax)
 
 
 
 
  On Tue, Aug 23, 

Re: [Openais] aisexec crashes with SIGABRT

2011-08-23 Thread Christopher A. Kirke
Steve,

appreciate the quick response, only happened to be running strace during one
of the crashes.
i've updated /etc/init.d/openais to enable core dump and changed
openais.conf to run
as root:asterisk instead of asterisk:asterisk so /var/lib/openais is
available for writing.

will post gdb output from the next aisexec crash.

Thanks,
--
Chris Kirke
Director - Systems Architecture
Multi Service Corporation
www.multiservice.com
+1.913.663.9483 (direct)
+1.816.718.0468 (mobile)
+1.913.217.9318 (fax)




On Mon, Aug 22, 2011 at 12:13, Steven Dake sd...@redhat.com wrote:

 On 08/22/2011 09:58 AM, Christopher A. Kirke wrote:
  currently using the REL5-provided package on two nodes on local
  Cisco-switched LAN:
 
  openais.x86_64   0.80.6-28.el5_6.1
  installed
 
  with following configuration:
 
  # Please read the openais.conf.5 manual page
 
  aisexec {
  user: asterisk
  group: asterisk
  }
 
  totem {
  version: 2
  secauth: off
  threads: 0
  interface {
  ringnumber: 0
  bindnetaddr: 172.24.100.0
  mcastaddr: 239.255.4.1
  mcastport: 5405
  }
  }
 
  logging {
  debug: off
  syslog_facility: local1
  syslog_priority: info
  timestamp: off
  to_file: no
  to_syslog: yes
  }
 
  amf {
  mode: disabled
  }
 
  to enable Asterisk distributed device state.
 
  we see cases where aisexec crashes, both with Asterisk running and
  stopped - strace output below:
 
   0.73 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
  events=POLLIN|POLLNVAL}], 3, 10) = 0 (Timeout) 0.009994
   0.010031 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
  events=POLLIN|POLLNVAL}], 3, 237) = 1 ([{fd=1, revents=POLLIN}])
 0.07
   0.37 recvmsg(1, {msg_name(16)={sa_family=AF_INET,
  sin_port=htons(5149), sin_addr=inet_addr(172.24.100.10)},
 
 msg_iov(1)=[{\2\0\\377\254\30d\n\254\30d\n\2\0\254\30d\n\10\0\2\0\254\30d\n\10\0\4\0\0\0...,
  1}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 82
  0.10
   0.72 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
  events=POLLIN|POLLNVAL}], 3, 237) = 1 ([{fd=3, revents=POLLIN}])
 0.180257
   0.180339 recvmsg(3, {msg_name(16)={sa_family=AF_INET,
  sin_port=htons(5149), sin_addr=inet_addr(172.24.100.10)},
 
 msg_iov(1)=[{\0\0\\377\254\30d\fN\1\0\0004/\0\0N\1\0\0\0\0\0\0\254\30d\n\2\0\254\30...,
  1}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 70
  0.22
   0.000104 sendmsg(2, {msg_name(16)={sa_family=AF_INET,
  sin_port=htons(5405), sin_addr=inet_addr(172.24.100.10)},
 
 msg_iov(1)=[{\0\0\\377\254\30d\fN\1\0\0005/\0\0N\1\0\0\0\0\0\0\254\30d\n\2\0\254\30...,
  70}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 70 0.14
   0.72 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
  events=POLLIN|POLLNVAL}], 3, 209) = 1 ([{fd=4, revents=POLLIN}])
 0.037614
   0.037682 accept(4, {sa_family=AF_FILE, path=@}, [4294967298]) = 5
  0.23
   0.96 fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 0.12
   0.40 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
 0.06
   0.70 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
  events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}], 4, 172) = 1
  ([{fd=5, revents=POLLIN}]) 0.000158
   0.000207 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
 0.06
   0.29 recvmsg(5, {msg_name(0)=NULL,
  msg_iov(1)=[{\1\0\0\0\252*\0\0\224\343TE\0\0\0\0\270\355\362+\0\0\0\0,
  24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
  cmsg_type=SCM_CREDENTIALS{pid=819, uid=301, gid=301}}, msg_flags=0},
  MSG_NOSIGNAL) = 24 0.14
   0.84 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0
 0.06
   0.28 sendto(5, \1\0\0\0\0\0\0\0, 8, MSG_WAITALL, NULL, 0) = 8
  0.07
   0.31 shmget(0x4554e394, 308, 0600) = 5144599 0.000118
   0.000198 shmat(5144599, 0, 0)  = ? 0.002286
   0.002332 semget(0x2bf2edb8, 3, 0600) = 1081360 0.000108
   0.000155 mmap(NULL, 20, PROT_READ|PROT_WRITE,
  MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x41781000 0.000212
   0.000262 mprotect(0x41781000, 4096, PROT_NONE) = 0 0.20
   0.51 clone(child_stack=0x417b0f90,
 
 flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
  parent_tidptr=0x417b1710, tls=0x417b1680, child_tidptr=0x417b1710) = 859
  0.46
   0.000109 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
  events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}], 4, 168) = 1
  ([{fd=4, revents=POLLIN}]) 0.000924
   0.000980 accept(4, {sa_family=AF_FILE, path=@}, [4294967298]) = 6
  0.11
   0.63 fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 0.06
   0.49 setsockopt(6, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
 0.06
   0.33 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
  events=POLLIN|POLLNVAL}, {fd=5, 

[Openais] aisexec crashes with SIGABRT

2011-08-22 Thread Christopher A. Kirke
currently using the REL5-provided package on two nodes on local
Cisco-switched LAN:

openais.x86_64   0.80.6-28.el5_6.1
installed

with following configuration:

# Please read the openais.conf.5 manual page

aisexec {
user: asterisk
group: asterisk
}

totem {
version: 2
secauth: off
threads: 0
 interface {
ringnumber: 0
bindnetaddr: 172.24.100.0
 mcastaddr: 239.255.4.1
mcastport: 5405
}
}

logging {
debug: off
syslog_facility: local1
 syslog_priority: info
timestamp: off
to_file: no
 to_syslog: yes
}

amf {
mode: disabled
}

to enable Asterisk distributed device state.

we see cases where aisexec crashes, both with Asterisk running and stopped -
strace output below:

 0.73 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
events=POLLIN|POLLNVAL}], 3, 10) = 0 (Timeout) 0.009994
 0.010031 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
events=POLLIN|POLLNVAL}], 3, 237) = 1 ([{fd=1, revents=POLLIN}]) 0.07
 0.37 recvmsg(1, {msg_name(16)={sa_family=AF_INET,
sin_port=htons(5149), sin_addr=inet_addr(172.24.100.10)},
msg_iov(1)=[{\2\0\\377\254\30d\n\254\30d\n\2\0\254\30d\n\10\0\2\0\254\30d\n\10\0\4\0\0\0...,
1}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 82
0.10
 0.72 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
events=POLLIN|POLLNVAL}], 3, 237) = 1 ([{fd=3, revents=POLLIN}]) 0.180257
 0.180339 recvmsg(3, {msg_name(16)={sa_family=AF_INET,
sin_port=htons(5149), sin_addr=inet_addr(172.24.100.10)},
msg_iov(1)=[{\0\0\\377\254\30d\fN\1\0\0004/\0\0N\1\0\0\0\0\0\0\254\30d\n\2\0\254\30...,
1}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 70
0.22
 0.000104 sendmsg(2, {msg_name(16)={sa_family=AF_INET,
sin_port=htons(5405), sin_addr=inet_addr(172.24.100.10)},
msg_iov(1)=[{\0\0\\377\254\30d\fN\1\0\0005/\0\0N\1\0\0\0\0\0\0\254\30d\n\2\0\254\30...,
70}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 70 0.14
 0.72 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
events=POLLIN|POLLNVAL}], 3, 209) = 1 ([{fd=4, revents=POLLIN}]) 0.037614
 0.037682 accept(4, {sa_family=AF_FILE, path=@}, [4294967298]) = 5
0.23
 0.96 fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 0.12
 0.40 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06
 0.70 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}], 4, 172) = 1
([{fd=5, revents=POLLIN}]) 0.000158
 0.000207 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06
 0.29 recvmsg(5, {msg_name(0)=NULL,
msg_iov(1)=[{\1\0\0\0\252*\0\0\224\343TE\0\0\0\0\270\355\362+\0\0\0\0,
24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_CREDENTIALS{pid=819, uid=301, gid=301}}, msg_flags=0},
MSG_NOSIGNAL) = 24 0.14
 0.84 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0 0.06
 0.28 sendto(5, \1\0\0\0\0\0\0\0, 8, MSG_WAITALL, NULL, 0) = 8
0.07
 0.31 shmget(0x4554e394, 308, 0600) = 5144599 0.000118
 0.000198 shmat(5144599, 0, 0)  = ? 0.002286
 0.002332 semget(0x2bf2edb8, 3, 0600) = 1081360 0.000108
 0.000155 mmap(NULL, 20, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x41781000 0.000212
 0.000262 mprotect(0x41781000, 4096, PROT_NONE) = 0 0.20
 0.51 clone(child_stack=0x417b0f90,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x417b1710, tls=0x417b1680, child_tidptr=0x417b1710) = 859
0.46
 0.000109 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}], 4, 168) = 1
([{fd=4, revents=POLLIN}]) 0.000924
 0.000980 accept(4, {sa_family=AF_FILE, path=@}, [4294967298]) = 6
0.11
 0.63 fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 0.06
 0.49 setsockopt(6, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06
 0.33 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}, {fd=6,
events=POLLIN|POLLNVAL}], 5, 167) = 1 ([{fd=6, revents=POLLIN}]) 0.07
 0.44 setsockopt(6, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06
 0.42 recvmsg(6, {msg_name(0)=NULL,
msg_iov(1)=[{\4\0\0\0\252*\0\0Rv-b\0\0\0\0A\246\10B\0\0\0\0, 24}],
msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_CREDENTIALS{pid=819, uid=301, gid=301}}, msg_flags=0},
MSG_NOSIGNAL) = 24 0.10
 0.72 setsockopt(6, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0 0.06
 0.41 sendto(6, \1\0\0\0\0\0\0\0, 8, MSG_WAITALL, NULL, 0) = 8
0.07
 0.37 shmget(0x622d7652, 308, 0600) = 5177368 0.06
 0.36 shmat(5177368, 0, 0)  = ? 0.002768
 0.002814 semget(0x4208a641, 3, 0600) = 1114129 0.08
 0.54 mmap(NULL, 20, PROT_READ|PROT_WRITE,

Re: [Openais] aisexec crashes with SIGABRT

2011-08-22 Thread Steven Dake
On 08/22/2011 09:58 AM, Christopher A. Kirke wrote:
 currently using the REL5-provided package on two nodes on local
 Cisco-switched LAN:
 
 openais.x86_64   0.80.6-28.el5_6.1
 installed
 
 with following configuration:
 
 # Please read the openais.conf.5 manual page
 
 aisexec {
 user: asterisk
 group: asterisk
 }
 
 totem {
 version: 2
 secauth: off
 threads: 0
 interface {
 ringnumber: 0
 bindnetaddr: 172.24.100.0
 mcastaddr: 239.255.4.1
 mcastport: 5405
 }
 }
 
 logging {
 debug: off
 syslog_facility: local1
 syslog_priority: info
 timestamp: off
 to_file: no
 to_syslog: yes
 }
 
 amf {
 mode: disabled
 }
 
 to enable Asterisk distributed device state.
 
 we see cases where aisexec crashes, both with Asterisk running and
 stopped - strace output below:
 
  0.73 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
 events=POLLIN|POLLNVAL}], 3, 10) = 0 (Timeout) 0.009994
  0.010031 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
 events=POLLIN|POLLNVAL}], 3, 237) = 1 ([{fd=1, revents=POLLIN}]) 0.07
  0.37 recvmsg(1, {msg_name(16)={sa_family=AF_INET,
 sin_port=htons(5149), sin_addr=inet_addr(172.24.100.10)},
 msg_iov(1)=[{\2\0\\377\254\30d\n\254\30d\n\2\0\254\30d\n\10\0\2\0\254\30d\n\10\0\4\0\0\0...,
 1}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 82
 0.10
  0.72 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
 events=POLLIN|POLLNVAL}], 3, 237) = 1 ([{fd=3, revents=POLLIN}]) 0.180257
  0.180339 recvmsg(3, {msg_name(16)={sa_family=AF_INET,
 sin_port=htons(5149), sin_addr=inet_addr(172.24.100.10)},
 msg_iov(1)=[{\0\0\\377\254\30d\fN\1\0\0004/\0\0N\1\0\0\0\0\0\0\254\30d\n\2\0\254\30...,
 1}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 70
 0.22
  0.000104 sendmsg(2, {msg_name(16)={sa_family=AF_INET,
 sin_port=htons(5405), sin_addr=inet_addr(172.24.100.10)},
 msg_iov(1)=[{\0\0\\377\254\30d\fN\1\0\0005/\0\0N\1\0\0\0\0\0\0\254\30d\n\2\0\254\30...,
 70}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 70 0.14
  0.72 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
 events=POLLIN|POLLNVAL}], 3, 209) = 1 ([{fd=4, revents=POLLIN}]) 0.037614
  0.037682 accept(4, {sa_family=AF_FILE, path=@}, [4294967298]) = 5
 0.23
  0.96 fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 0.12
  0.40 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06
  0.70 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
 events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}], 4, 172) = 1
 ([{fd=5, revents=POLLIN}]) 0.000158
  0.000207 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06
  0.29 recvmsg(5, {msg_name(0)=NULL,
 msg_iov(1)=[{\1\0\0\0\252*\0\0\224\343TE\0\0\0\0\270\355\362+\0\0\0\0,
 24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
 cmsg_type=SCM_CREDENTIALS{pid=819, uid=301, gid=301}}, msg_flags=0},
 MSG_NOSIGNAL) = 24 0.14
  0.84 setsockopt(5, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0 0.06
  0.28 sendto(5, \1\0\0\0\0\0\0\0, 8, MSG_WAITALL, NULL, 0) = 8
 0.07
  0.31 shmget(0x4554e394, 308, 0600) = 5144599 0.000118
  0.000198 shmat(5144599, 0, 0)  = ? 0.002286
  0.002332 semget(0x2bf2edb8, 3, 0600) = 1081360 0.000108
  0.000155 mmap(NULL, 20, PROT_READ|PROT_WRITE,
 MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0) = 0x41781000 0.000212
  0.000262 mprotect(0x41781000, 4096, PROT_NONE) = 0 0.20
  0.51 clone(child_stack=0x417b0f90,
 flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
 parent_tidptr=0x417b1710, tls=0x417b1680, child_tidptr=0x417b1710) = 859
 0.46
  0.000109 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
 events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}], 4, 168) = 1
 ([{fd=4, revents=POLLIN}]) 0.000924
  0.000980 accept(4, {sa_family=AF_FILE, path=@}, [4294967298]) = 6
 0.11
  0.63 fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 0.06
  0.49 setsockopt(6, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06
  0.33 poll([{fd=1, events=POLLIN}, {fd=3, events=POLLIN}, {fd=4,
 events=POLLIN|POLLNVAL}, {fd=5, events=POLLIN|POLLNVAL}, {fd=6,
 events=POLLIN|POLLNVAL}], 5, 167) = 1 ([{fd=6, revents=POLLIN}]) 0.07
  0.44 setsockopt(6, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0 0.06
  0.42 recvmsg(6, {msg_name(0)=NULL,
 msg_iov(1)=[{\4\0\0\0\252*\0\0Rv-b\0\0\0\0A\246\10B\0\0\0\0, 24}],
 msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET,
 cmsg_type=SCM_CREDENTIALS{pid=819, uid=301, gid=301}}, msg_flags=0},
 MSG_NOSIGNAL) = 24 0.10
  0.72 setsockopt(6, SOL_SOCKET, SO_PASSCRED, [0], 4) = 0 0.06
  0.41 sendto(6, \1\0\0\0\0\0\0\0, 8, MSG_WAITALL, NULL, 0) = 8
 0.07
  0.37 shmget(0x622d7652, 308, 0600) = 5177368 0.06