Re: [SR-Users] problem unreferencing dialog in dialog module

2011-03-18 Thread Anton Roman
Hi,

more than 3 million calls have been processed and no problem (crash,
growth in memory allocation...) has been noticed since the update, so
this check works for us.

Thanks a lot,
regards

2011/3/4 Daniel-Constantin Mierla mico...@gmail.com

 Hello,

 just committed a safety check for this case. If anyone can give it some
 tests, then we can backport.

 I will analyze to see why it got in such case, but anyhow it is better and
 safer to detect bogus dereferences to dialogs and not crash.

 Thanks,
 Daniel


 On 3/3/11 11:34 AM, Timo Reimann wrote:

 Argh:


 On 03.03.2011 11:11, Timo Reimann wrote:

 What I can tell though is that the crash happens because too much dialog
 reference counter decrementing takes place. Although I have no clue why,

^

 ...the crash happens,

  I believe the implementation of unref_dlg_unsafe() (a macro) could be
 somewhat more robust by not unlinking and destroying a dialog when the
 counter drops below zero. That is, instead of running the following block

 if ((_dlg)->ref<=0) { \
 unlink_unsafe_dlg( _d_entry, _dlg);\
 LM_DBG("ref <=0 for dialog %p\n",_dlg);\
 destroy_dlg(_dlg);\
 }\


  for _dlg->ref <= 0, I see no reason to change the compare operator to ==.

 I see no reason *not* to change compare operator to ==. That is, I want
 the block to execute iff the reference counter is found to be zero.


 --Timo

 ___
 SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
 sr-users@lists.sip-router.org
 http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users


 --
 Daniel-Constantin Mierla
 http://www.asipto.com




Re: [SR-Users] problem unreferencing dialog in dialog module

2011-03-05 Thread Anton Roman
Ok,

 I updated the code in the server. I'm testing the changes on Tuesday and
I'll send feedback to the list.

We found dialog module very useful because of the information and
functionality it provides. For example, we are using its exported function
dlg_end_dlg to cleanly end all the active calls when stopping Kamailio is
required for maintenance reasons. We are also using the dlg_bridge function
to implement click-to-dial applications and it works fine.

On the other hand, in the logs of the server where we detected the
unreference problem, we quite often see the log lines shown below. I don't
know whether they are related to the unreference problem. Since the first one
has CRITICAL log level, I'm not sure whether it indicates a real problem or
whether Kamailio can safely deal with it:

Mar  2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: CRITICAL: dialog
[dlg_hash.c:615]: bogus event 5 in state 4 for dlg
*0x7f2d0a3d30e0*[306:1818049706] with clid '
92515995-3508071667-342...@usmiap1etx02.mydomain.com' and tags
'3508071667-342428' '7A242CC-0'
Mar  2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: dialog
[dlg_hash.c:770]: dialog *0x7f2d0a3d30e0* changed from state 4 to state 4,
due event 5
Mar  2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: tm
[t_lookup.c:1379]: DEBUG: t_newtran: msg id=4077 , global msg id=4076 , T on
entrance=(nil)
Mar  2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: tm
[t_lookup.c:528]: t_lookup_request: start searching: hash=356, isACK=0
Mar  2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: tm
[t_lookup.c:470]: DEBUG: RFC3261 transaction matched,
tid=3178c7ec929daf0e4ade2b303de82a20
Mar  2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: tm
[t_lookup.c:728]: DEBUG: t_lookup_request: transaction found
(T=0x7f2d0a82bca0)
Mar  2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: tm
[t_reply.c:1430]: DEBUG: reply retransmitted. buf=0x7f2d2eff4160: SIP/2.0
5..., shmem=0x7f2d0a72cb90: SIP/2.0 5
Mar  2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: dialog
[dlg_hash.c:599]: unref dlg *0x7f2d0a3d30e0* with 1 - 3
Mar  2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: core
[usr_avp.c:646]: DEBUG:destroy_avp_list: destroying list (nil)
Mar  2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: core
[usr_avp.c:646]:


From dlg_hash.h
...
DLG_STATE_CONFIRMED 4 /*!< confirmed dialog */
...
DLG_EVENT_REQPRACK 5 /*!< PRACK request */
...

I understand it means we are receiving a PRACK in a confirmed dialog (ACK
received), doesn't it? I guess it can be due either to an error in the SIP
stack on the caller side or to this PRACK being a retransmission caused by
network issues (not probable, I think).
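
To count how often this combination shows up in the logs, the event and state numbers can be pulled out of the "bogus event" lines. The following is only a minimal sketch: parse_bogus_event() and is_prack_in_confirmed() are hypothetical helper names, not part of Kamailio, and the two constants are just the values quoted from dlg_hash.h in this message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values as quoted from dlg_hash.h in this message. */
#define DLG_STATE_CONFIRMED 4
#define DLG_EVENT_REQPRACK  5

/*
 * Extracts the event and state numbers from a log line containing
 * "bogus event E in state S". Returns 0 on success, -1 if the line
 * does not contain that pattern.
 */
int parse_bogus_event(const char *line, int *event, int *state)
{
    const char *p;

    assert(line != NULL && event != NULL && state != NULL);
    p = strstr(line, "bogus event ");
    if (p == NULL)
        return -1;
    if (sscanf(p, "bogus event %d in state %d", event, state) != 2)
        return -1;
    return 0;
}

/* Returns 1 when the pair means "PRACK received in a confirmed dialog". */
int is_prack_in_confirmed(int event, int state)
{
    return event == DLG_EVENT_REQPRACK && state == DLG_STATE_CONFIRMED;
}
```

Feeding it the CRITICAL line above yields event 5 and state 4, that is, the PRACK-in-confirmed-dialog case.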

Thanks a lot,
regards

Antón


2011/3/4 Daniel-Constantin Mierla mico...@gmail.com

  Hello,


 On 3/3/11 10:19 AM, Anton Roman wrote:

 Hello,

  thanks for your quick reply, my answer is inline.

 2011/3/2 Daniel-Constantin Mierla mico...@gmail.com

  Hello,

 looks like related to the callbacks for dialog module. Are you loading
 other modules that require dialog module?


 we are using some features of the dialog module, such as ending dialogs
 after a timeout period, and we are using the engage_mediaproxy() function
 as well. It's an old configuration we had to put in production without
 enough time to test it. Do you recommend not using the dialog module if it
 is not strictly required?


 usage of dialog module was always safe and working great for me. But I use
 it mostly alone, never with mediaproxy module, just with pua_dialoginfo
 module in some cases. From the logs, the crash was related to the callback
 system exported by dialog module for the other modules willing to hook into
 dialog, which is why I asked about the other modules, to be sure there is at
 least one binding to dialog.

 So, like with other modules, if there is a problem discovered there, it is
 important that we fix it - this is a module used a lot by many. Therefore
 usage is encouraged when needed :-)

 Cheers,
 Daniel


 --
 Daniel-Constantin Mierla
 http://www.asipto.com




Re: [SR-Users] problem unreferencing dialog in dialog module

2011-03-04 Thread Daniel-Constantin Mierla

Hello,

just committed a safety check for this case. If anyone can give it some 
tests, then we can backport.


I will analyze to see why it got in such case, but anyhow it is better 
and safer to detect bogus dereferences to dialogs and not crash.


Thanks,
Daniel

On 3/3/11 11:34 AM, Timo Reimann wrote:

Argh:


On 03.03.2011 11:11, Timo Reimann wrote:

What I can tell though is that the crash happens because too much dialog
reference counter decrementing takes place. Although I have no clue why,

^

...the crash happens,


I believe the implementation of unref_dlg_unsafe() (a macro) could be
somewhat more robust by not unlinking and destroying a dialog when the
counter drops below zero. That is, instead of running the following block

if ((_dlg)->ref<=0) { \
 unlink_unsafe_dlg( _d_entry, _dlg);\
 LM_DBG("ref <=0 for dialog %p\n",_dlg);\
 destroy_dlg(_dlg);\
}\


for _dlg->ref <= 0, I see no reason to change the compare operator to ==.

I see no reason *not* to change compare operator to ==. That is, I want
the block to execute iff the reference counter is found to be zero.


--Timo



--
Daniel-Constantin Mierla
http://www.asipto.com




Re: [SR-Users] problem unreferencing dialog in dialog module

2011-03-03 Thread Timo Reimann
Hey,


On 03.03.2011 10:19, Anton Roman wrote:
 Checking the timestamps from the acc and the crash log, the BYE for
 the dialog was before the crash but the To-tag is not printed from
 dlg_hash.c, although it is in the acc for INVITE and BYE. Do you
 have parallel forking in front of this SIP server? I mean, is there
 another proxy that can do parallel forking then send two or more
 branches to this instance?
 
 AFAIK the client who is sending those calls is not doing parallel
 forking; they are sending calls over a SIP trunk to our Kamailio. They
 are calling PSTN numbers and we are sending those calls to a gateway,
 so they shouldn't do parallel forking. I'll get some traces to check it.

Your trace shows that there are two worker processes dealing with the
segfault-triggering dialog, process ID 32155 and 32158. I cannot see
from your trace what module caused the latter process to execute
unref_dlg() in dlg_hash.c, however.

What I can tell though is that the crash happens because too much dialog
reference counter decrementing takes place. Although I have no clue why,
I believe the implementation of unref_dlg_unsafe() (a macro) could be
somewhat more robust by not unlinking and destroying a dialog when the
counter drops below zero. That is, instead of running the following block

if ((_dlg)->ref<=0) { \
unlink_unsafe_dlg( _d_entry, _dlg);\
LM_DBG("ref <=0 for dialog %p\n",_dlg);\
destroy_dlg(_dlg);\
}\

for _dlg->ref <= 0, I see no reason to change the compare operator to ==.

Of course, that just cures the symptoms. A coredump would be really
helpful in identifying the root of the crash problem but I don't know
why it wasn't generated in your case. Your configuration looks good to me.


Cheers,

--Timo



Re: [SR-Users] problem unreferencing dialog in dialog module

2011-03-03 Thread Timo Reimann
Argh:


On 03.03.2011 11:11, Timo Reimann wrote:
 What I can tell though is that the crash happens because too much dialog
 reference counter decrementing takes place. Although I have no clue why,
   ^

...the crash happens,

 I believe the implementation of unref_dlg_unsafe() (a macro) could be
 somewhat more robust by not unlinking and destroying a dialog when the
 counter drops below zero. That is, instead of running the following block
 
 if ((_dlg)->ref<=0) { \
 unlink_unsafe_dlg( _d_entry, _dlg);\
 LM_DBG("ref <=0 for dialog %p\n",_dlg);\
 destroy_dlg(_dlg);\
 }\


 for _dlg->ref <= 0, I see no reason to change the compare operator to ==.

I see no reason *not* to change compare operator to ==. That is, I want
the block to execute iff the reference counter is found to be zero.
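
The difference between the two comparisons can be illustrated with a small stand-alone sketch. This is not the dialog module code: struct dlg_sketch, destroy_dlg_stub() and unref_dlg_checked() are hypothetical names, and the stub only counts destroy calls in place of unlink_unsafe_dlg() and destroy_dlg(). It assumes that an unref which would take the counter below zero should be reported and otherwise ignored.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the dialog cell; only the counter matters here. */
struct dlg_sketch {
    int ref;        /* reference counter */
    int destroyed;  /* counts destroy calls instead of freeing memory */
};

/* Stand-in for unlink_unsafe_dlg() followed by destroy_dlg(). */
void destroy_dlg_stub(struct dlg_sketch *d)
{
    d->destroyed++;
}

/*
 * Drops cnt references. Returns 0 on a normal unref and -1 when the
 * request would push the counter below zero; in that case the dialog
 * is left untouched instead of being destroyed a second time.
 */
int unref_dlg_checked(struct dlg_sketch *d, int cnt)
{
    assert(d != NULL);
    if (cnt <= 0 || d->ref < cnt) {
        fprintf(stderr, "bogus unref: ref %d, cnt %d for dlg %p\n",
                d->ref, cnt, (void *)d);
        return -1;
    }
    d->ref -= cnt;
    if (d->ref == 0)    /* == rather than <=: destroy exactly once */
        destroy_dlg_stub(d);
    return 0;
}
```

Starting from a counter of 2, two unrefs with cnt 1 destroy the dialog once, and a third unref returns -1 without destroying it again. With a <= comparison and no guard, that third call would run the destroy path a second time, which matches the double free seen in the crash log.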


--Timo



[SR-Users] problem unreferencing dialog in dialog module

2011-03-02 Thread Anton Roman
Hi all,

we are running Kamailio 3.1.2 in a production environment, using the dialog
module, and it crashed two hours ago.


Here are the logs we got (additional log fragments with the acc records
involved in this call are appended at the end of the mail):

Mar  2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28927]: CRITICAL: dialog
[dlg_hash.c:599]: bogus ref -1 with cnt 1 for dlg 0x7f23f472db30
[2490:1070436595] with clid 'e0a20cb844d211e0acd8001422093865@CLIENT IP'
and tags '1577886432-3759264324-335599788-1698171170' ''
Mar  2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28927]: : core
[mem/q_malloc.c:446]: BUG: qm_free: freeing already freed pointer, first
free: dialog: dlg_cb.c: destroy_dlg_callbacks_list(80) - aborting
Mar  2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28896]: ALERT: core
[main.c:741]: child process 28927 exited by a signal 6
Mar  2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28896]: ALERT: core
[main.c:744]: core was not generated
Mar  2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28896]: INFO: core
[main.c:756]: INFO: terminating due to SIGCHLD
Mar  2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28948]: INFO: core
[main.c:807]: INFO: signal 15 received
Mar  2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28942]: INFO: core
[main.c:807]: INFO: signal 15 received

We got the Kamailio code from git last week:

sercmd core.info
{
version: kamailio 3.1.2
id: 4ace86
compiler: gcc 4.3.2
compiled: 09:12:36 Feb 23 2011
flags: STATS: Off, USE_IPV6, USE_TCP, USE_TLS, TLS_HOOKS, USE_RAW_SOCKS,
DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC,
DBG_QM_MALLOC, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE,
USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
}

The problem looks like this other one already fixed:
http://lists.sip-router.org/pipermail/sr-users/2009-November/027351.html

We have set Kamailio to debug level in case it happens again.

On another note, I need to know why the core is not being generated. I have
already checked the points mentioned in
http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:corefiles

1. disable_core_dump is not set in the config file.

2. From /etc/default/kamailio:
...
DUMP_CORE=yes
...

3. From /etc/init.d/kamailio:
...
if test "$DUMP_CORE" = "yes" ; then
# set proper ulimit
ulimit -c unlimited

# directory for the core dump files
 COREDIR=/home/corefiles
 [ -d $COREDIR ] || mkdir $COREDIR
 chmod 777 $COREDIR
 echo "$COREDIR/core.%e.sig%s.%p" > /proc/sys/kernel/core_pattern
fi
...

4. Write permissions of $COREDIR:

ls -hall /home
...
drwxrwxrwx  2 root   root   4.0K 2010-12-21 09:15 corefiles
...
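
The checks above look at configuration files. The values that matter in the end are the ones the process itself sees, because "ulimit -c" only affects processes started from that shell. The following is a minimal C sketch, assuming Linux: core_limit(), core_pattern() and report_core_settings() are hypothetical helper names, while getrlimit(RLIMIT_CORE) and /proc/sys/kernel/core_pattern are the standard interfaces. For an already running Kamailio, the same limit is listed in /proc/<pid>/limits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Reads the soft core-size limit of the calling process. 0 on success. */
int core_limit(rlim_t *soft)
{
    struct rlimit rl;

    assert(soft != NULL);
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    return 0;
}

/* Copies the kernel core pattern (without newline) into buf. 0 on success. */
int core_pattern(char *buf, size_t len)
{
    FILE *f;

    assert(buf != NULL && len > 0);
    f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Prints both settings as seen by the calling process. */
void report_core_settings(void)
{
    rlim_t soft;
    char pattern[256];

    if (core_limit(&soft) == 0) {
        if (soft == RLIM_INFINITY)
            printf("core limit: unlimited\n");
        else
            printf("core limit: %llu bytes\n", (unsigned long long)soft);
    }
    if (core_pattern(pattern, sizeof pattern) == 0)
        printf("core_pattern: %s\n", pattern);
}
```

A soft limit of 0 here would explain "core was not generated" even when the init script contains the ulimit call, for example if the daemon was started through a different path than the init script.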

What else should I check?

Thanks in advance,
regards

Antón


*Acc records related to the dialog whose destruction causes the problem:*

Mar  2 14:42:44 kamailio2 /usr/local/sbin/kamailio[28902]: NOTICE: acc
[acc.c:275]: ACC: transaction answered:
timestamp=1299073364;method=INVITE;from_tag=1577886432-3759264324-335599788-1698171170;to_tag=5FFAEA34-6A;call_id=e0a20cb844d211e0acd8001422093865@client
IP;code=200;reason=OK;src_user=caller number;src_domain=client
IP;dst_ouser=called
number;dst_user=called number;dst_domain=10.90.1.251;src_ip=client IP

...

Mar  2 14:42:44 kamailio2 /usr/local/sbin/kamailio[28920]: NOTICE: acc
[acc.c:275]: ACC: request acknowledged:
timestamp=1299073364;method=ACK;from_tag=1577886432-3759264324-335599788-1698171170;to_tag=5FFAEA34-6A;call_id=e0a20cb844d211e0acd8001422093865@client
IP;code=200;reason=OK;src_user=caller number;src_domain=client
IP;dst_ouser=called number;dst_user=called
number;dst_domain=10.90.1.251;src_ip=client IP
...


Mar  2 14:43:00 kamailio2 /usr/local/sbin/kamailio[28903]: ERROR: script:
ACK WITHOUT MATCHING TRANSACTION in e0a20cb844d211e0acd8001422093865@client
IP call... ignore and discard.

...

Mar  2 14:43:00 kamailio2 /usr/local/sbin/kamailio[28904]: NOTICE: acc
[acc.c:275]: ACC: transaction answered:
timestamp=1299073380;method=BYE;from_tag=1577886432-3759264324-335599788-1698171170;to_tag=5FFAEA34-6A;call_id=e0a20cb844d211e0acd8001422093865@client
IP;code=200;reason=OK;src_user=caller number;src_domain=client
IP;dst_ouser=called number;dst_user=called
number;dst_domain=10.90.1.251;src_ip=client IP