Re: [SR-Users] Kamailio stops responding after 10 days or so

2011-04-26 Thread Daniel-Constantin Mierla

Hello,

Thanks for the feedback. I am going to work on it asap.

Cheers,
Daniel

Re: [SR-Users] Kamailio stops responding after 10 days or so

2011-04-19 Thread Morten Isaksen
Hi,

It happened again.

I got the result from gdb. Here is the output from a bt inside gdb
for each worker and one bt full for one of the workers.

They are all stuck in futexlock.h
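
One way to check an observation like this mechanically, once each worker's bt output has been saved to its own file, is to count how many of those files contain a frame in the lock header. This is only a minimal sketch: the function name count_traces_in and the bt.*.txt file names are made up for illustration and are not part of Kamailio or gdb.

```sh
#!/bin/sh
# Hypothetical helper: print how many of the given trace files contain a
# frame located in source file $1 (matched literally as "<name>:").
count_traces_in() {
    src="$1"
    shift
    n=0
    for f in "$@"; do
        if grep -qF -- "$src:" "$f"; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Calling `count_traces_in futexlock.h bt.*.txt` prints the number of saved traces with a frame in futexlock.h. If that number equals the number of workers, every worker is waiting on a lock.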

My C debugging skills are a bit rusty, so I hope one of you has some ideas.

/Morten

(gdb) bt
#0  0x00329b6d0a19 in syscall () from /lib64/libc.so.6
#1  0x2acaf17b2e0d in futex_get (t=, type=, ps=0x7fffafb7d7a0) at ../../mem/../futexlock.h:113
#2  publ_cback_func (t=, type=, ps=0x7fffafb7d7a0) at send_publish.c:272
#3  0x2acaeeb97815 in run_trans_callbacks_internal (cb_lst=0x2acaf3cfd260, type=256, trans=0x2acaf3cfd1f0, params=0x7fffafb7d7a0) at t_hooks.c:290
#4  0x2acaeeb97a6e in run_trans_callbacks (type=0, trans=0x2, req=0x0, rpl=0x246, code=0) at t_hooks.c:317
#5  0x2acaeebbfcb3 in local_reply (t=0x2acaf3cfd1f0, p_msg=, branch=0, msg_status=, cancel_bitmap=) at t_reply.c:1847
#6  0x2acaeebc29e8 in reply_received (p_msg=0x8d36c8) at t_reply.c:2133
#7  0x0044780e in forward_reply (msg=0x8d36c8) at forward.c:689
#8  0x0047ff02 in receive_msg (buf=0x874a40 "SIP/2.0 200 OK\r\nVia: SIP/2.0/UDP 178.21.248.8;branch=z9hG4bKcd3a.d99d6403.0\r\nTo: sip:+380432571...@sip.uni-tel.dk;user=phone;tag=4e546d1f8c10c6f99af9c51d895c9c87-c9ca\r\nFrom: sip:+380432571...@sip.uni-"..., len=, rcv_info=0x7fffafb7dd20) at receive.c:257
#9  0x005067ab in udp_rcv_loop () at udp_server.c:520
#10 0x00455cdf in main_loop () at main.c:1447
#11 0x00456de2 in main (argc=, argv=0x7fffafb7dfe8) at main.c:2251


(gdb) bt
#0  0x00329b6d0a19 in syscall () from /lib64/libc.so.6
#1  0x2acaeeb7ea32 in futex_get (i=) at ../../mem/../futexlock.h:100
#2  _lock (i=) at lock.h:98
#3  lock_hash (i=) at h_table.c:98
#4  0x2acaeeba1b99 in t_lookup_request (p_msg=0x8d36c8, leave_new_locked=0, cancel=0x7fffafb7d5e0) at t_lookup.c:548
#5  0x2acaeeba3532 in t_check_msg (p_msg=0x8d36c8, param_branch=) at t_lookup.c:1104
#6  0x2acaeebab7b1 in t_check_trans (msg=0x2acaf1f4b2e8, foo=, bar=0x2 ) at tm.c:1881
#7  0x0041345c in do_action (h=0x7fffafb7dab0, a=0x8a6f98, msg=0x8d36c8) at action.c:860
#8  0x00415d2b in run_actions (h=0x7fffafb7dab0, a=0x89f8d8, msg=0x8d36c8) at action.c:1293
#9  0x00416084 in run_top_route (a=0x89f8d8, msg=0x8d36c8, c=) at action.c:1341
#10 0x0047ff4c in receive_msg (buf=0x874a40 "INVITE sip:20322595@178.21.248.8:5060;transport=udp SIP/2.0\r\nRecord-Route: \r\nVia: SIP/2.0/UDP 178.21.248.20;branch=z9hG4bKdd3a.3d0516e1.0\r\nVia: SIP/2.0/UDP 81.27."..., len=, rcv_info=0x7fffafb7dd20) at receive.c:196
#11 0x005067ab in udp_rcv_loop () at udp_server.c:520
#12 0x00455cdf in main_loop () at main.c:1447
#13 0x00456de2 in main (argc=, argv=0x7fffafb7dfe8) at main.c:2251

#0  0x00329b6d0a19 in syscall () from /lib64/libc.so.6
#1  0x2acaeeb7ea32 in futex_get (i=) at ../../mem/../futexlock.h:100
#2  _lock (i=) at lock.h:98
#3  lock_hash (i=) at h_table.c:98
#4  0x2acaeeba1b99 in t_lookup_request (p_msg=0x8d36c8, leave_new_locked=0, cancel=0x7fffafb7d5e0) at t_lookup.c:548
#5  0x2acaeeba3532 in t_check_msg (p_msg=0x8d36c8, param_branch=) at t_lookup.c:1104
#6  0x2acaeebab7b1 in t_check_trans (msg=0x2acaf1f4b2e8, foo=, bar=0x2 ) at tm.c:1881
#7  0x0041345c in do_action (h=0x7fffafb7dab0, a=0x8a6f98, msg=0x8d36c8) at action.c:860
#8  0x00415d2b in run_actions (h=0x7fffafb7dab0, a=0x89f8d8, msg=0x8d36c8) at action.c:1293
#9  0x00416084 in run_top_route (a=0x89f8d8, msg=0x8d36c8, c=) at action.c:1341
#10 0x0047ff4c in receive_msg (buf=0x874a40 "INVITE sip:20322595@178.21.248.8:5060;transport=udp SIP/2.0\r\nRecord-Route: \r\nVia: SIP/2.0/UDP 178.21.248.20;branch=z9hG4bKdd3a.3d0516e1.0\r\nVia: SIP/2.0/UDP 81.27."..., len=, rcv_info=0x7fffafb7dd20) at receive.c:196
#11 0x005067ab in udp_rcv_loop () at udp_server.c:520
#12 0x00455cdf in main_loop () at main.c:1447
#13 0x00456de2 in main (argc=, argv=0x7fffafb7dfe8) at main.c:2251

(gdb) bt
#0  0x00329b6d0a19 in syscall () from /lib64/libc.so.6
#1  0x2acaeeb7e9fa in futex_get (i=) at ../../mem/../futexlock.h:113
#2  _lock (i=) at lock.h:98
#3  lock_hash (i=) at h_table.c:98
#4  0x2acaeebca0e3 in t_uac_prepare (uac_r=0x7fffafb7c270, dst_req=0x7fffafb7c060, dst_cell=0x7fffafb7c058) at uac.c:319
#5  0x2acaeebcb4ae in t_uac_with_ids (uac_r=0x2acaf1f4b2e8, ret_index=0x0, ret_label=0x0) at uac.c:531
#6  0x2acaeebcc9c7 in request (uac_r=0x7fffafb7c270, ruri=0x11, to=0x1b, from=0x8e0fe0, next_hop=0x2acaf19c01c0) at uac.c:778
#7  0x2acaf17af3ff in send_publish (publ=) at send_publish.c:556
#8  0x2acaf19c4f67 in dialog_publish (state=, entity=, peer=, callid=, initiator=, lifetime=43200, localtag=0x0, remotetag=0x0, localtarget=0x0, remotetarget=0x0) at dialog_publish.c:349
#9  0x2acaf19c6880 in __dialog_created (dlg=0x2acaf575eb38, type=, _para

Re: [SR-Users] Kamailio stops responding after 10 days or so

2011-04-11 Thread marius zbihlei

On 04/09/2011 11:09 AM, Morten Isaksen wrote:

> Hi Marius,

Hello,

Thanks for the update, but unfortunately I can't spot anything wrong. As
Daniel noted, please send a trace of the worker process (you can also
take the worker's pid from the log file). Attach with GDB several times
and issue a bt (or bt full) command each time. If the traces are
different, this will help. Also run ngrep to confirm that you indeed
have traffic and that Kamailio is not responding at all.
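
The repeated-attach procedure could be scripted roughly as follows. This is a sketch under assumptions: the function names, the bt.PID.N.txt file naming and the GDB and KAMAILIO_BIN variables are invented for illustration. Only the gdb options -batch and -ex and the "gdb program pid" form are standard gdb usage, and the binary path is the one that appears in the log line of the original report.

```sh
#!/bin/sh
# Assumed defaults; adjust to the local installation.
GDB="${GDB:-gdb}"
KAMAILIO_BIN="${KAMAILIO_BIN:-/usr/local/sbin/kamailio}"

# snapshot_bt PID COUNT INTERVAL
# Attach to PID COUNT times, INTERVAL seconds apart, and save each
# backtrace to bt.PID.N.txt in the current directory.
snapshot_bt() {
    pid="$1"; count="$2"; interval="$3"
    i=1
    while [ "$i" -le "$count" ]; do
        "$GDB" -batch -ex bt "$KAMAILIO_BIN" "$pid" > "bt.$pid.$i.txt" 2>&1
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}

# traces_differ FILE...
# Succeed (exit 0) if any file differs from the first one, fail otherwise.
traces_differ() {
    first="$1"
    shift
    for f in "$@"; do
        cmp -s "$first" "$f" || return 0
    done
    return 1
}
```

For example, `snapshot_bt 9186 5 2` followed by `traces_differ bt.9186.*.txt` would take five snapshots of process 9186 two seconds apart and report whether any of them changed. Identical snapshots suggest the worker is blocked at one spot rather than making progress.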


Marius


___
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
sr-users@lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users


Re: [SR-Users] Kamailio stops responding after 10 days or so

2011-04-09 Thread Morten Isaksen
Hi Marius,

The carrier avp is always set in route[0].

My failure_route looks like this.

xlog("L_WARN", "Failure route - M=$rm RURI=$ru F=$fu T=$tu IP=$si ID=$ci\n");
if (t_check_status("408|404|5[0-9][0-9]|6[0-9][0-9]") &&
        !t_check_status("503")) {
    revert_uri();
    if (!cr_next_domain("$avp(s:carrier)", "$avp(s:domain)", "$rU",
            "$avp(s:host)", "$T_reply_code", "$avp(s:domain)")) {
        xlog("L_ERR", "cr_next_domain failed\n");
        exit;
    }
    if (!cr_route("$avp(s:carrier)", "$avp(s:domain)", "$rU", "$rU",
            "call_id")) {
        xlog("L_ERR", "cr_route failed\n");
        exit;
    }
    $avp(s:host) = $rd;
    t_on_failure("COREROUTE");
    append_branch();
    xlog("L_WARN", "Outgoing M=$rm RURI=$ru F=$fu T=$tu IP=$si ID=$ci\n");
    xlog("$si -> $rd");
    if (!t_relay()) {
        xlog("L_ERR", "t_relay failed\n");
        exit;
    }
}



On Fri, Apr 8, 2011 at 11:08 AM, marius zbihlei wrote:
> On 04/07/2011 10:02 PM, Morten Isaksen wrote:
>>
>> Hi!
>>
>> Kamailio 3.0.3.
>>
>> I have a strange problem with one of our Kamailio servers. This one is
>> used for routing (with carrierroute) and to send presence information
>> (with pua module)
>>
> >> Once every 10 days or so I get this error and then Kamailio stops
>> responding to any SIP packets.
>>
>> Apr  6 08:05:48 sip-core-1 /usr/local/sbin/kamailio[9186]: WARNING:
>> 

Re: [SR-Users] Kamailio stops responding after 10 days or so

2011-04-08 Thread marius zbihlei

On 04/07/2011 10:02 PM, Morten Isaksen wrote:

Hi!

Kamailio 3.0.3.

I have a strange problem with one of our Kamailio servers. This one is
used for routing (with carrierroute) and to send presence information
(with pua module)

Once every 10 days or so I get this error and then Kamailio stops
responding to any SIP packets.

Apr  6 08:05:48 sip-core-1 /usr/local/sbin/kamailio[9186]: WARNING:

Re: [SR-Users] Kamailio stops responding after 10 days or so

2011-04-07 Thread Morten Isaksen
Hi,

No, normal CPU usage.

Thanks for the suggestion - I will try that next time.

/Morten

On Thu, Apr 7, 2011 at 10:52 PM, Daniel-Constantin Mierla
 wrote:
> Hello,
>
> Do you see high CPU usage from Kamailio?
>
> What you can do is attach gdb to the Kamailio processes and see what
> they are doing:
>
> gdb /path/to/kamailio pid_of_a_kamailio_process
> bt
>
> You should attach to the SIP worker processes; you can find the type of
> each process with 'kamctl ps'.
>
> Cheers,
> Daniel
>
> On 4/7/11 9:02 PM, Morten Isaksen wrote:
>>
>> Hi!
>>
>> Kamailio 3.0.3.
>>
>> I have a strange problem with one of our Kamailio servers. This one is
>> used for routing (with carrierroute) and to send presence information
>> (with pua module)
>>
> >> Once every 10 days or so I get this error and then Kamailio stops
>> responding to any SIP packets.
>>
>> Apr  6 08:05:48 sip-core-1 /usr/local/sbin/kamailio[9186]: WARNING:
>> 

Re: [SR-Users] Kamailio stops responding after 10 days or so

2011-04-07 Thread Daniel-Constantin Mierla

Hello,

Do you see high CPU usage from Kamailio?

What you can do is attach gdb to the Kamailio processes and see what
they are doing:

gdb /path/to/kamailio pid_of_a_kamailio_process
bt

You should attach to the SIP worker processes; you can find the type of
each process with 'kamctl ps'.
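
For several workers at once, the same two gdb steps can be run non-interactively. What follows is a minimal sketch, assuming the worker PIDs are passed as arguments (for example, read off the 'kamctl ps' listing by hand). The names collect_bt, GDB and KAMAILIO_BIN and the bt.PID.txt output files are illustrative and not Kamailio tooling.

```sh
#!/bin/sh
# Assumed defaults; adjust to the local installation.
GDB="${GDB:-gdb}"
KAMAILIO_BIN="${KAMAILIO_BIN:-/usr/local/sbin/kamailio}"

# collect_bt PID...
# Run "bt" in batch mode against each PID and save the output to bt.PID.txt.
# In batch mode gdb exits after the command, detaching from the process.
collect_bt() {
    for pid in "$@"; do
        echo "== pid $pid" >&2
        "$GDB" -batch -ex bt "$KAMAILIO_BIN" "$pid" > "bt.$pid.txt" 2>&1
    done
}
```

A call such as `collect_bt 9186 9187` would leave one trace file per worker in the current directory, which is easier to attach to a mail than output copied from an interactive session.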


Cheers,
Daniel

On 4/7/11 9:02 PM, Morten Isaksen wrote:

Hi!

Kamailio 3.0.3.

I have a strange problem with one of our Kamailio servers. This one is
used for routing (with carrierroute) and to send presence information
(with pua module)

Once every 10 days or so I get this error and then Kamailio stops
responding to any SIP packets.

Apr  6 08:05:48 sip-core-1 /usr/local/sbin/kamailio[9186]: WARNING: