[Kamailio-Users] kamailio / deadlock3

2010-01-28 Thread Aymeric Moizard


Hi again people!

I'm currently having some trouble with my sip.antisip.com server.

Within the past 2 or 3 days, kamailio has sometimes fallen into
some kind of deadlock.

I've been checking my logs while the deadlock happens, and it
seems (although I'm not sure from the logs alone) that only UDP
support is broken: I can see some TLS and TCP registrations but
do not see the usual UDP traffic (keep-alives, for example).

Any idea?

Aymeric MOIZARD / ANTISIP
amsip - http://www.antisip.com
osip2 - http://www.osip.org
eXosip2 - http://savannah.nongnu.org/projects/exosip/


___
Kamailio (OpenSER) - Users mailing list
Users@lists.kamailio.org
http://lists.kamailio.org/cgi-bin/mailman/listinfo/users
http://lists.openser-project.org/cgi-bin/mailman/listinfo/users


Re: [Kamailio-Users] kamailio / deadlock3

2010-01-28 Thread Aymeric Moizard


Got some more info:

The UDP deadlock always seems to happen after a SUBSCRIBE
is sent (over UDP) to mobipouce.com:

Jan 28 11:00:40 ns26829 /usr/sbin/kamailio[13363]: ERROR:core:tcp_blocking_connect: poll error: flags 18
Jan 28 11:00:40 ns26829 /usr/sbin/kamailio[13363]: ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111) Connection refused
Jan 28 11:00:40 ns26829 /usr/sbin/kamailio[13363]: ERROR:core:tcpconn_connect: tcp_blocking_connect failed
Jan 28 11:00:40 ns26829 /usr/sbin/kamailio[13363]: ERROR:core:tcp_send: connect failed
Jan 28 11:00:40 ns26829 /usr/sbin/kamailio[13363]: ERROR:tm:msg_send: tcp_send failed
Jan 28 11:00:40 ns26829 /usr/sbin/kamailio[13363]: ERROR:tm:t_forward_nonack: sending request failed

These logs appear each time a SUBSCRIBE is relayed to another server,
mobipouce.com. But the deadlock doesn't happen every time.
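The log sequence above (a poll error, then SO_ERROR 111 "Connection refused") suggests the usual non-blocking connect technique: connect() with O_NONBLOCK, poll() for POLLOUT, then read the real result with getsockopt(SO_ERROR). As a hedged illustration only, here is a minimal POSIX C sketch of that technique; tcp_connect_probe is a hypothetical name and this is not kamailio's tcp_blocking_connect.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe (not kamailio code): start a non-blocking TCP
 * connect, wait for it with poll(), then read the real outcome with
 * getsockopt(SO_ERROR). Returns 0 if the connect succeeded, otherwise a
 * positive errno value such as ECONNREFUSED (111 on Linux) or ETIMEDOUT.
 * The socket is closed in every case. */
int tcp_connect_probe(const char *ip, unsigned short port, int timeout_ms)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return EINVAL;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags >= 0)
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);

    int err = 0;
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        if (errno != EINPROGRESS) {
            err = errno; /* refused synchronously, e.g. on loopback */
        } else {
            struct pollfd pf = { .fd = fd, .events = POLLOUT };
            int n = poll(&pf, 1, timeout_ms);
            if (n == 0) {
                err = ETIMEDOUT;
            } else if (n < 0) {
                err = errno;
            } else {
                /* Whether revents has POLLERR/POLLHUP set or only POLLOUT,
                 * SO_ERROR holds the actual connect() result. */
                socklen_t len = sizeof err;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
                    err = errno;
            }
        }
    }
    close(fd);
    return err;
}
```

Against a host with nothing listening on the target port, such a probe would be expected to return ECONNREFUSED, which matches the "(111) Connection refused" line in the log.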


mobipouce.com is an existing, running server to which I can connect over
UDP and TCP. However, its SRV record returns 2 hosts, one of which is down.
(And I never got a reply to the SUBSCRIBE, whether or not it fell into the
deadlock case.)


If I can reproduce it, what steps could I take to get more information
about the issue? Any kamctl command?


Regards,
Aymeric MOIZARD / ANTISIP


Re: [Kamailio-Users] kamailio / deadlock3

2010-01-28 Thread Daniel-Constantin Mierla

Hello,

On 1/28/10 11:18 AM, Aymeric Moizard wrote:


> If I can reproduce it, what steps could I take to get more information
> about the issue? Any kamctl command?


Does it recover by itself, or do you have to restart it? How much CPU usage do you see?

If one or more processes are eating a lot of CPU, use gdb to attach to the
PID of a process using a lot of CPU and get the backtrace (bt command):


gdb /path/to/kamailio pid
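To choose which PID to attach to, one option on Linux is to compare the CPU time of each kamailio process. The sketch below is a hypothetical helper, not a kamailio tool: it assumes the /proc/<pid>/stat layout documented in proc(5), where fields 14 and 15 are utime and stime in clock ticks, and busiest_pid and stat_cpu_ticks are names made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse one /proc/<pid>/stat line and return utime + stime (fields 14
 * and 15) in clock ticks, or -1 if the line is malformed. The command
 * name (field 2) is in parentheses and may contain spaces, so parsing
 * starts after the last ')'. */
long long stat_cpu_ticks(const char *line)
{
    const char *p = strrchr(line, ')');
    if (p == NULL)
        return -1;
    unsigned long long utime = 0, stime = 0;
    /* skip fields 3..13 (state, ppid, pgrp, session, tty_nr, tpgid,
     * flags, minflt, cminflt, majflt, cmajflt) */
    int n = sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %llu %llu",
                   &utime, &stime);
    return n == 2 ? (long long)(utime + stime) : -1;
}

/* Return the PID of the process named comm (as shown in /proc, truncated
 * to 15 characters by the kernel) with the most CPU time, or -1. */
long busiest_pid(const char *comm)
{
    char pattern[64];
    snprintf(pattern, sizeof pattern, "(%s)", comm);
    DIR *d = opendir("/proc");
    if (d == NULL)
        return -1;
    long best_pid = -1;
    long long best_ticks = -1;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        char *end;
        long pid = strtol(e->d_name, &end, 10);
        if (*end != '\0' || pid <= 0)
            continue; /* not a numeric /proc/<pid> entry */
        char path[64], line[1024];
        snprintf(path, sizeof path, "/proc/%ld/stat", pid);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue; /* the process may have exited meanwhile */
        int ok = fgets(line, sizeof line, f) != NULL;
        fclose(f);
        if (!ok || strstr(line, pattern) == NULL)
            continue;
        long long t = stat_cpu_ticks(line);
        if (t > best_ticks) {
            best_ticks = t;
            best_pid = pid;
        }
    }
    closedir(d);
    return best_pid;
}
```

For example, busiest_pid("kamailio") returns a PID that could be passed to gdb as in the command above, or -1 if none is found. Since utime and stime are cumulative since process start, a process spinning right now stands out best when sampled twice and compared.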

Cheers,
Daniel





--
Daniel-Constantin Mierla
* http://www.asipto.com/




Re: [Kamailio-Users] kamailio / deadlock3

2010-01-28 Thread Aymeric Moizard




On Thu, 28 Jan 2010, Daniel-Constantin Mierla wrote:


> Does it recover by itself, or do you have to restart it? How much CPU usage do you see?


I haven't noticed any CPU issue; I'll check precisely next time. (But
traffic keeps growing, as kamailio doesn't answer anymore.)


> If one or more processes are eating a lot of CPU, use gdb to attach to the
> PID of a process using a lot of CPU and get the backtrace (bt command):
>
> gdb /path/to/kamailio pid


I think I can reproduce it now, so I'll give it a try.

It's definitely after the SRV check: the server chooses the
sip2.mobipouce.com server, where no SIP server is running,
and fails to connect. Then the network capture shows that
kamailio is still sending a few SIP packets (like NOTIFY),
but no SIP answers come out of kamailio.

I will do more testing, but I guess one can reproduce it
by relaying to mobipouce.com!

Aymeric



Re: [Kamailio-Users] kamailio / deadlock3

2010-01-28 Thread Aymeric Moizard


Hi again,

Here is the backtrace I have, unfortunately without debug symbols!
I found the same for many of the kamailio processes: "sched_yield"
is pending forever. My system is Debian etch.

#0  0xe424 in __kernel_vsyscall ()
#1  0xb7cef4ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6
#2  0x080a93fd in tcp_send ()
#3  0xb7975679 in send_pr_buffer () from /usr/lib/kamailio/modules/tm.so
#4  0xb79789ac in t_forward_nonack () from /usr/lib/kamailio/modules/tm.so
#5  0xb7974784 in t_relay_to () from /usr/lib/kamailio/modules/tm.so
#6  0xb7983a11 in load_tm () from /usr/lib/kamailio/modules/tm.so
#7  0x081cf810 in mem_pool ()
#8  0x in ?? ()

I guess most t_relay operations towards my "mobipouce.com" domain,
with one IP being down, break the kamailio processes one after the
other... I'm not sure every such t_relay operation always breaks
exactly one process each time.

I went through the lock/unlock calls in tcp_main.c, but it seems every
lock has at least one matching unlock...
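A backtrace that stays in sched_yield() under tcp_send() is consistent with a process spinning on a lock that another holder never released. The sketch below is a hypothetical "test-and-set, then yield the CPU" lock written with C11 atomics; it is not kamailio's lock implementation, only an illustration of why a waiter in such a loop would show sched_yield() at the top of its stack indefinitely. In kamailio the lock would live in shared memory and be contended between processes; this single-process sketch leaves that out.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin-then-yield lock (not kamailio's fast lock). */
typedef struct {
    atomic_flag held;
} yield_lock_t;

#define YIELD_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Blocks until the lock is free. If the holder never releases it, the
 * caller loops in sched_yield() forever: the pattern seen in the
 * backtrace above. */
void yield_lock_get(yield_lock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        sched_yield();
}

/* Same as yield_lock_get() but gives up after max_tries attempts, so
 * the "holder never releases" case can be shown without hanging. */
bool yield_lock_try(yield_lock_t *l, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (!atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
            return true;
        sched_yield();
    }
    return false;
}

void yield_lock_release(yield_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

With one yield_lock_try() holding the lock and no release, every later yield_lock_get() on the same lock would never return.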

Tks,
Aymeric MOIZARD / ANTISIP

Re: [Kamailio-Users] kamailio / deadlock3

2010-01-28 Thread Henning Westerholt
On Thursday 28 January 2010, Aymeric Moizard wrote:
> Here is the backtrace I have, unfortunately without debug symbols!
> I found the same for many of the kamailio processes: "sched_yield"
> is pending forever. My system is Debian etch.
> [...]

Hi Aymeric,

I remember that we observed this "sched_yield" problem on an old 0.9 system
after some time (weeks or months). We did not find the solution in that
case; after a restart it was gone again.

You mentioned in an earlier mail that you see this related to UDP traffic, but
from the log file and your investigations, you think it's related to TCP?

Regards,

Henning



Re: [Kamailio-Users] kamailio / deadlock3

2010-01-28 Thread Aymeric Moizard



On Thu, 28 Jan 2010, Henning Westerholt wrote:


> Hi Aymeric,
>
> I remember that we observed this "sched_yield" problem on an old 0.9 system
> after some time (weeks or months). We did not find the solution in that
> case; after a restart it was gone again.
>
> You mentioned in an earlier mail that you see this related to UDP traffic, but
> from the log file and your investigations, you think it's related to TCP?


This is the exact case:
1-> A SUBSCRIBE is sent over UDP to (and received by) kamailio.
2-> kamailio does an SRV record lookup for "mobipouce.com".
3-> kamailio tries sip2.mobipouce.com (91.199.234.47) over TCP first.
4-> The connection fails with these logs:
Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:core:tcp_blocking_connect: poll error: flags 18
Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111) Connection refused
Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:core:tcpconn_connect: tcp_blocking_connect failed
Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:core:tcp_send: connect failed
Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:tm:msg_send: tcp_send failed
Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:tm:t_forward_nonack: sending request failed
5-> I guess kamailio is supposed to try the other SRV record value,
sip.mobipouce.com (91.199.234.46), but it doesn't.

Thus, I'm guessing the issue is related to the SRV record with failover OR
just to the TCP failure. It is not related to UDP at all.
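Step 5 expects a fallback to the next SRV target. RFC 2782 orders SRV targets by priority (lowest value first) and uses weight to choose among equal priorities. As a hedged illustration only, not kamailio's DNS failover code, the sketch below tries already-resolved targets in priority order and moves on when a TCP connect fails. The srv_target type and connect_srv_failover function are hypothetical; DNS resolution and weight-based selection are deliberately left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* One already-resolved SRV target; lower priority is tried first. */
typedef struct {
    unsigned short priority;
    unsigned short weight; /* ignored in this sketch */
    const char *ip;        /* address the target host resolved to */
    unsigned short port;
} srv_target;

static int by_priority(const void *a, const void *b)
{
    const srv_target *x = a, *y = b;
    return (int)x->priority - (int)y->priority;
}

/* Blocking TCP connect; returns a connected fd or -1. */
static int tcp_connect_ip(const char *ip, unsigned short port)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Sorts targets by priority (qsort is not stable, so the order among
 * equal priorities is unspecified) and returns the first connected fd,
 * storing the index of the target used in *used; -1 if all fail. */
int connect_srv_failover(srv_target *t, size_t n, size_t *used)
{
    qsort(t, n, sizeof *t, by_priority);
    for (size_t i = 0; i < n; i++) {
        int fd = tcp_connect_ip(t[i].ip, t[i].port);
        if (fd >= 0) {
            if (used != NULL)
                *used = i;
            return fd;
        }
        /* e.g. ECONNREFUSED from a host with no SIP server: try next */
    }
    return -1;
}
```

With the two targets described in this thread, a refused connect to the first host would make such a loop continue to the second one; whether and how kamailio retries a forwarded request is not something this sketch shows.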


It's definitely possible to reproduce the issue now!

I guess anyone can try your version of kamailio and t_relay a message
to "mobipouce.com", and you'll fall into that case! Sending plenty of
those messages will eventually lock all kamailio processes.

Regards,
Aymeric MOIZARD / ANTISIP

Re: [Kamailio-Users] kamailio / deadlock3

2010-01-28 Thread Daniel-Constantin Mierla
I am CC-ing sr-dev, since the TCP code comes from SER and Andrei may have
more insights...



On 1/28/10 2:41 PM, Aymeric Moizard wrote:



> Thus, I'm guessing the issue is related to the SRV record with failover OR
> just to the TCP failure. It is not related to UDP at all.


So the TCP connect failed and the TCP worker returned, as it printed the
message, and, to be sure I got it right, the UDP worker (the one that
received the request) got blocked?




> Sending plenty of those messages will eventually lock all kamailio
> processes.


All of them? TCP and UDP?

Cheers,
Daniel




Re: [Kamailio-Users] kamailio / deadlock3

2010-01-28 Thread Aymeric Moizard


Again, additional information:

Doing a new capture: after the failure, I can see that a TCP
connection is made to the second SRV record: sip.mobipouce.com
(91.199.234.46)


I got:
SYN ACK -> sip.mobipouce.com
ACK <- sip.mobipouce.com
PSH, ACK <- sip.mobipouce.com
ACK -> sip.mobipouce.com

I'm guessing that this is where the process is deadlocked (frame
#2 0x080a93fd in tcp_send () of the backtrace), because no SUBSCRIBE
is sent afterwards...

Strangely, in this "tcp_send" function there is no
TCPCONN_LOCK/TCPCONN_UNLOCK; instead, there is a

lock_get(&c->write_lock);
...
lock_release(&c->write_lock);

Maybe it is still correct anyway...
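Having one lock_get() and one lock_release() in a function is not enough on its own: every early return between them must release the lock too, otherwise the next writer on that connection waits forever. A minimal C sketch of the usual single-exit (goto) pattern follows; conn_t, conn_send, wl_trylock and wl_unlock are hypothetical names that only mirror the c->write_lock snippet above, and the lock is a plain C11 atomic_flag rather than kamailio's lock type.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical connection type; not kamailio's struct tcp_connection. */
typedef struct {
    int fd;
    atomic_flag write_lock;
} conn_t;

bool wl_trylock(atomic_flag *l) { return !atomic_flag_test_and_set(l); }
void wl_unlock(atomic_flag *l) { atomic_flag_clear(l); }

/* Writes len bytes under c->write_lock. Every exit after the lock is
 * taken goes through the single "out:" label, so an error can never
 * leave the lock held. Returns the byte count or -errno; -EBUSY if the
 * lock is already held (a simplification: real code would wait). */
ssize_t conn_send(conn_t *c, const char *buf, size_t len)
{
    ssize_t ret;
    size_t done = 0;

    if (!wl_trylock(&c->write_lock))
        return -EBUSY;
    while (done < len) {
        ssize_t n = write(c->fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            ret = -errno;
            goto out; /* a bare "return -errno;" here would leak the lock */
        }
        done += (size_t)n;
    }
    ret = (ssize_t)done;
out:
    wl_unlock(&c->write_lock);
    return ret;
}
```

The design choice is simply that there is exactly one place where the lock is released, which makes a missed unlock on an error path easy to spot in review.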

Tks,
Aymeric MOIZARD / ANTISIP

Re: [Kamailio-Users] kamailio / deadlock3

2010-01-28 Thread Aymeric Moizard


Some other answers below:

On Thu, 28 Jan 2010, Daniel-Constantin Mierla wrote:

> So the TCP connect failed and the TCP worker returned, as it printed the
> message, and, to be sure I got it right, the UDP worker (the one that
> received the request) got blocked?


1-> The TCP connect failed.
2-> The second SRV target is used: the TCP connect succeeds, but the
process locks in tcp_send.

That's what I understand.

I have tested a TCP connection to my server: it seems to still be
working.


> All of them? TCP and UDP?


Only UDP!
Aymeric



Re: [Kamailio-Users] kamailio / deadlock3

2010-01-28 Thread Daniel-Constantin Mierla

Hello,

On 1/28/10 3:34 PM, Aymeric Moizard wrote:


> Here is the backtrace I have, unfortunately without debug symbols!



Can you recompile with debug symbols? Did you install it from a
package or from sources? Debug symbols will give more hints about the
exact place in the function...


I will try to reproduce it, but right now I do not have a proper
environment for testing...


Thanks,
Daniel




