Re: [SR-Users] KEMI and Native mixing

2020-10-26 Thread Daniel-Constantin Mierla
Hello,

adding a bit more on this topic: it should also be possible to execute
native config routing blocks from KEMI using
KSR.cfgutils.route_if_exists() or KSR.route().
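
For illustration, a minimal Lua sketch (the route name RELAY is just an
example, assuming such a block is defined in the native kamailio.cfg) that
triggers a native block through the functions mentioned above:

    -- inside the KEMI Lua script loaded by app_lua
    function ksr_request_route()
        -- run the native config block route[RELAY], if it is defined
        KSR.cfgutils.route_if_exists("RELAY")
        -- or, per the note above:
        -- KSR.route("RELAY")
    end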

Just test first when combining the use of route or event_route blocks
(especially the latter) with KEMI, to be sure that the conditions in the C
code do not prevent execution of particular native config blocks when the
config engine is set to a KEMI interpreter. Open an issue on the tracker
whenever such a case is found, so it can be sorted out properly.

Cheers,
Daniel

On 23.10.20 07:19, Arsen Semenov wrote:
> Hi Marcin,
> To add a bit to the comments above: once you set cfgengine to your
> preferred language, the route blocks become functions of that language,
> which have to be defined, but you can still keep some event_route blocks
> in the native Kamailio language; they are initialised at startup (see the
> sketch below).
> If you migrate a large script, I'd suggest splitting it into several
> scripts, divided logically or by route, so that you can load/reload and
> troubleshoot them separately. A testing framework can be of much help.
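>
> As a minimal sketch of the Lua side (script path and contents are just an
> illustration, assuming app_lua loads the script and kamailio.cfg sets
> cfgengine "lua"): the former request_route block becomes the Lua function
> ksr_request_route(), while event_route blocks can stay native:
>
>     -- kamailio.cfg keeps: loadmodule "app_lua.so",
>     --   modparam("app_lua", "load", "/etc/kamailio/kamailio.lua")
>     --   and sets: cfgengine "lua"
>     function ksr_request_route()
>         KSR.info("handling request in Lua\n")
>         -- migrated routing logic goes here
>     end
>     -- event_route[...] blocks can remain in kamailio.cfg and are
>     -- initialised at startup, as noted above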
>
>
> Regards,
>
>
> On Thu, Oct 22, 2020 at 10:13 PM Marcin Kowalczyk wrote:
>
> Hi,
>
>  Thanks for your reply. 
> I have quite a large and complex kamailio.cfg and it would be
> quite a challenging task to migrate it in one go. This is the
> reason why I was thinking of doing it block by block, to be sure
> it will not break the logic. 
>
> Regards
> Marcin
>
> On Wed, Oct 21, 2020 at 8:16 PM Henning Westerholt wrote:
>
> Hello,
>
>  
>
> Other people might be able to add more, but what you certainly
> can do is have a usual kamailio.cfg and then use e.g.
> lua_run(..) to execute some functions defined in the loaded
> Lua script.
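>
> For example, a minimal sketch (the function name and script path are
> hypothetical) of a Lua function that a native route block could invoke
> with lua_run("handle_invite"):
>
>     -- loaded from kamailio.cfg, e.g. with:
>     --   loadmodule "app_lua.so"
>     --   modparam("app_lua", "load", "/etc/kamailio/migration.lua")
>     -- and called from a native route block via lua_run("handle_invite")
>     function handle_invite()
>         KSR.info("INVITE handled by migrated Lua logic\n")
>         -- logic moved out of kamailio.cfg goes here
>     end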
>
> In my opinion you will add more complexity if you also start
> to mix the routes, but it might be fine for a certain
> migration period.
>
>  
>
> Cheers,
>
>  
>
> Henning
>
>  
>
>  
>
> -- 
>
> Henning Westerholt – https://skalatan.de/blog/
> 
>
> Kamailio services – https://gilawa.com 
>
>  
>
> *From:* sr-users *On Behalf Of* Marcin Kowalczyk
> *Sent:* Wednesday, October 21, 2020 7:25 PM
> *To:* sr-users@lists.kamailio.org
> *Subject:* [SR-Users] KEMI and Native mixing
>
>  
>
> Hi,
>
>  
>
>  Is it possible to mix both KEMI (Lua) and native configs in
> one Kamailio instance? So that some blocks are called from the
> native script and some others from KEMI (Lua)? 
>
>  
>
> Marcin
>
>
>
>
> -- 
> Arsen Semenov
>
>

-- 
Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Funding: https://www.paypal.me/dcmierla

___
Kamailio (SER) - Users Mailing List
sr-users@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users


[SR-Users] Kamailio unresponsive with Dialog+DMQ

2020-10-26 Thread Patrick Wakano
Hello list,
Hope all are doing well!

We are running load tests on our Kamailio server, which is just making
inbound and outbound calls, and eventually (there is no identified pattern)
Kamailio freezes and of course all calls start to fail. It does not crash;
it just stops responding and has to be killed with -9. When this happens,
SIP messages are not processed, the DMQ keepalive fails (so the other node
reports it as down) and dialog keepalives are not sent, but registrations
from the UAC still seem to go out (logs from local_route are seen).
We don't have a high call rate, at most 3 or 4 CPS, and it reaches around
1900 active calls. We are now using Kamailio 5.2.8 installed from the repo
on a CentOS 7 server. The dialog module has keepalives active and DMQ (with
2 workers) is being used on an active-active instance.
From the investigation using GDB pasted below, I can see that the UDP
workers are stuck on a lock, either in a callback from t_relay...
#0  0x7ffb74e9bbf9 in syscall () from /lib64/libc.so.6
#1  0x7ffb2b1bce08 in futex_get (lock=0x7ffb35217b90) at
../../core/futexlock.h:108
#2  0x7ffb2b1bec44 in bcast_dmq_message1 (peer=0x7ffb35e8bf38,
body=0x7fff2e95ffb0, except=0x0, resp_cback=0x7ffb2a8a0ab0
, max_forwards=1, content_type=0x7ffb2a8a0a70
, incl_inactive=0) at dmq_funcs.c:156
#3  0x7ffb2b1bf46b in bcast_dmq_message (peer=0x7ffb35e8bf38,
body=0x7fff2e95ffb0, except=0x0, resp_cback=0x7ffb2a8a0ab0
, max_forwards=1, content_type=0x7ffb2a8a0a70
) at dmq_funcs.c:188
#4  0x7ffb2a6448fa in dlg_dmq_send (body=0x7fff2e95ffb0, node=0x0) at
dlg_dmq.c:88
#5  0x7ffb2a64da5d in dlg_dmq_replicate_action (action=DLG_DMQ_UPDATE,
dlg=0x7ffb362ea3c8, needlock=1, node=0x0) at dlg_dmq.c:628
#6  0x7ffb2a61f28e in dlg_on_send (t=0x7ffb36c98120, type=16,
param=0x7fff2e9601e0) at dlg_handlers.c:739
#7  0x7ffb2ef285b6 in run_trans_callbacks_internal
(cb_lst=0x7ffb36c98198, type=16, trans=0x7ffb36c98120,
params=0x7fff2e9601e0) at t_hooks.c:260
#8  0x7ffb2ef286d0 in run_trans_callbacks (type=16,
trans=0x7ffb36c98120, req=0x7ffb742f27e0, rpl=0x0, code=-1) at t_hooks.c:287
#9  0x7ffb2ef38ac1 in prepare_new_uac (t=0x7ffb36c98120,
i_req=0x7ffb742f27e0, branch=0, uri=0x7fff2e9603e0, path=0x7fff2e9603c0,
next_hop=0x7ffb742f2a58, fsocket=0x7ffb73e3e968, snd_flags=..., fproto=0,
flags=2, instance=0x7fff2e9603b0, ruid=0x7fff2e9603a0,
location_ua=0x7fff2e960390) at t_fwd.c:381
#10 0x7ffb2ef3d02d in add_uac (t=0x7ffb36c98120,
request=0x7ffb742f27e0, uri=0x7ffb742f2a58, next_hop=0x7ffb742f2a58,
path=0x7ffb742f2e20, proxy=0x0, fsocket=0x7ffb73e3e968, snd_flags=...,
proto=0, flags=2, instance=0x7ffb742f2e30, ruid=0x7ffb742f2e48,
location_ua=0x7ffb742f2e58) at t_fwd.c:811
#11 0x7ffb2ef4535a in t_forward_nonack (t=0x7ffb36c98120,
p_msg=0x7ffb742f27e0, proxy=0x0, proto=0) at t_fwd.c:1699
#12 0x7ffb2ef20505 in t_relay_to (p_msg=0x7ffb742f27e0, proxy=0x0,
proto=0, replicate=0) at t_funcs.c:334

or loose_route...
#0  0x7ffb74e9bbf9 in syscall () from /lib64/libc.so.6
#1  0x7ffb2b1bce08 in futex_get (lock=0x7ffb35217b90) at
../../core/futexlock.h:108
#2  0x7ffb2b1bec44 in bcast_dmq_message1 (peer=0x7ffb35e8bf38,
body=0x7fff2e9629d0, except=0x0, resp_cback=0x7ffb2a8a0ab0
, max_forwards=1, content_type=0x7ffb2a8a0a70
, incl_inactive=0) at dmq_funcs.c:156
#3  0x7ffb2b1bf46b in bcast_dmq_message (peer=0x7ffb35e8bf38,
body=0x7fff2e9629d0, except=0x0, resp_cback=0x7ffb2a8a0ab0
, max_forwards=1, content_type=0x7ffb2a8a0a70
) at dmq_funcs.c:188
#4  0x7ffb2a6448fa in dlg_dmq_send (body=0x7fff2e9629d0, node=0x0) at
dlg_dmq.c:88
#5  0x7ffb2a64da5d in dlg_dmq_replicate_action (action=DLG_DMQ_STATE,
dlg=0x7ffb363e0c10, needlock=0, node=0x0) at dlg_dmq.c:628
#6  0x7ffb2a62b3bf in dlg_onroute (req=0x7ffb742f11d0,
route_params=0x7fff2e962ce0, param=0x0) at dlg_handlers.c:1538
#7  0x7ffb2e7db203 in run_rr_callbacks (req=0x7ffb742f11d0,
rr_param=0x7fff2e962d80) at rr_cb.c:96
#8  0x7ffb2e7eb2f9 in after_loose (_m=0x7ffb742f11d0, preloaded=0) at
loose.c:945
#9  0x7ffb2e7eb990 in loose_route (_m=0x7ffb742f11d0) at loose.c:979

or  t_check_trans:
#0  0x7ffb74e9bbf9 in syscall () from /lib64/libc.so.6
#1  0x7ffb2a5ea9c6 in futex_get (lock=0x7ffb35e78804) at
../../core/futexlock.h:108
#2  0x7ffb2a5f1c46 in dlg_lookup_mode (h_entry=1609, h_id=59882,
lmode=0) at dlg_hash.c:709
#3  0x7ffb2a5f27aa in dlg_get_by_iuid (diuid=0x7ffb36326bd0) at
dlg_hash.c:777
#4  0x7ffb2a61ba1d in dlg_onreply (t=0x7ffb36952988, type=2,
param=0x7fff2e963bf0) at dlg_handlers.c:437
#5  0x7ffb2ef285b6 in run_trans_callbacks_internal
(cb_lst=0x7ffb36952a00, type=2, trans=0x7ffb36952988,
params=0x7fff2e963bf0) at t_hooks.c:260
#6  0x7ffb2ef286d0 in run_trans_callbacks (type=2,
trans=0x7ffb36952988, req=0x7ffb3675c360, rpl=0x7ffb742f1930, code=200) at
t_hooks.c:287
#7  0x7ffb2ee7037f in t_reply_matching (p_msg=0x7ffb742f1930,
p_branch=0x7fff2e963ebc) at t_lookup.c:997
#8  0x7ffb2ee725e4 in t_check_msg (p_msg=0x7ffb

Re: [SR-Users] Kamailio unresponsive with Dialog+DMQ

2020-10-26 Thread Patrick Wakano
Sorry to bother again, but another problem we are facing, which now seems
related, is increasing memory usage similar to a memory leak.
During the load test, Kamailio's shared memory starts increasing fast,
making it run out of shmem. Today it happened again and I could retrieve
some info, and it looks like the "leak" is likely due to the DMQ workers.
The kamcmd shmem stats showed high usage in core and dmq:
# kamcmd mod.stats all shm
Module: core
{
sip_msg_shm_clone(496): 959632704
counters_prefork_init(207): 61440
cfg_clone_str(130): 392
cfg_shmize(217): 1496
main_loop(1295): 8
init_pt(113): 8
init_pt(108): 8
init_pt(107): 5920
cfg_parse_str(906): 80
register_timer(1012): 432
cfg_register_ctx(47): 96
init_tcp(4714): 8192
init_tcp(4708): 32768
init_tcp(4700): 8
init_tcp(4693): 8
init_tcp(4686): 8
init_tcp(4680): 8
init_tcp(4668): 8
init_avps(90): 8
init_avps(89): 8
init_dst_blacklist(437): 16384
init_dst_blacklist(430): 8
timer_alloc(515): 96
init_dns_cache(358): 8
init_dns_cache(350): 16384
init_dns_cache(343): 16
init_dns_cache(336): 8
init_timer(284): 8
init_timer(283): 16384
init_timer(282): 8
init_timer(281): 8
init_timer(270): 8
init_timer(238): 8
init_timer(221): 278544
init_timer(220): 8
init_timer(207): 8
cfg_child_cb_new(828): 64
sr_cfg_init(360): 8
sr_cfg_init(353): 8
sr_cfg_init(346): 8
sr_cfg_init(334): 8
sr_cfg_init(322): 8
qm_shm_lock_init(1202): 8
Total: 960071600
}
Module: dmq
{
alloc_job_queue(250): 64
shm_str_dup(723): 48
build_dmq_node(164): 896
add_peer(68): 312
mod_init(240): 8
mod_init(233): 48
init_dmq_node_list(70): 24
init_peer_list(33): 24
job_queue_push(286): 15369848
Total: 15371272
}

From GDB I could see both workers were stuck here:
Thread 1 (Thread 0x7f4a3cba7740 (LWP 17401)):
#0  0x7f4a3c29dbf9 in syscall () from /lib64/libc.so.6
#1  0x7f49f1a44bdd in futex_get (lock=0x7fff1d14f564) at
../../core/futexlock.h:121
#2  0x7f49f1a4fe57 in dmq_send_all_dlgs (dmq_node=0x0) at dlg_dmq.c:657
mypid = 17401
index = 2543
entry = {first = 0x0, last = 0x0, next_id = 107271, lock = {val =
2}, locker_pid = {val = 17393}, rec_lock_level = 0}
dlg = 0x0
__FUNCTION__ = "dmq_send_all_dlgs"
#3  0x7f49f1a4c000 in dlg_dmq_handle_msg (msg=0x7f49fe88b2b8,
resp=0x7fff1d14f8e0, node=0x7f49fd4be5d8) at dlg_dmq.c:391
#4  0x7f49f25da34a in worker_loop (id=1) at worker.c:113
#5  0x7f49f25d7d35 in child_init (rank=0) at dmq.c:300

Thread 1 (Thread 0x7f4a3cba7740 (LWP 17400)):
#0  0x7f4a3c29dbf9 in syscall () from /lib64/libc.so.6
#1  0x7f49f1a44bdd in futex_get (lock=0x7fff1d14f564) at
../../core/futexlock.h:121
#2  0x7f49f1a4fe57 in dmq_send_all_dlgs (dmq_node=0x0) at dlg_dmq.c:657
mypid = 17400
index = 1080
entry = {first = 0x7f49fe575ad0, last = 0x7f49fe575ad0, next_id =
54971, lock = {val = 2}, locker_pid = {val = 17385}, rec_lock_level = 0}
dlg = 0x0
__FUNCTION__ = "dmq_send_all_dlgs"
#3  0x7f49f1a4c000 in dlg_dmq_handle_msg (msg=0x7f49fe437878,
resp=0x7fff1d14f8e0, node=0x7f49fd4be5d8) at dlg_dmq.c:391
#4  0x7f49f25da34a in worker_loop (id=0) at worker.c:113
#5  0x7f49f25d7d35 in child_init (rank=0) at dmq.c:300

From my analysis, it seems memory was increasing because every new call was
adding a new job to the DMQ workers' queue, but the workers were stuck on
the mutex and not consuming the jobs. We could not figure out which process
held the mutex, because from GDB the locker_pid pointed to a UDP process
and a timer process, but both seemed to be just waiting in normal operation.

Also, I didn't create a ticket because 5.2 is deprecated, but I found it
worth reporting this potential problem on the list, since the DMQ code
looks similar between 5.4 and 5.2.

