[SR-Users] Re: Finding a way to gracefully restart kamailio with dialog module enabled and DMQ sync

2023-07-07 Thread Nick F
Hello,

Have you tried to use REDIS as a backend for the DLG module?
You can count active calls per node like:


# initval=0 makes a missing key read as 0 and lets $shtinc create it
modparam("htable", "htable", "dialog_counter=>size=8;initval=0;")

event_route[dialog:start] {
    # atomic increment, no explicit sht_lock()/sht_unlock() needed
    $var(n) = $shtinc(dialog_counter=>count);
}

event_route[dialog:end] {
    # atomic decrement when the call ends (BYE or dialog timeout)
    $var(n) = $shtdec(dialog_counter=>count);
}

# No decrement in event_route[dialog:failed]: dialog:start only runs once
# the 200 OK is processed, so a failed dialog was never counted.



Then you can read the counter on each node with:

kamcmd htable.get dialog_counter count




[SR-Users] Re: Finding a way to gracefully restart kamailio with dialog module enabled and DMQ sync

2023-07-05 Thread Daniel-Constantin Mierla
Hello,

While I have done some work on the dialog module over time, it is not
one of my favourite modules, apart from using it to enforce a maximum
call duration (for which it should work fine). Also, I never ended up
using it for CDR generation; I prefer the event-based accounting of the
acc module, which can record more events, even for the same call.

That said, for limiting active calls I usually rely on other solutions
built in the config file, leveraging htable or various backends. Also,
for values that I need for the duration of the call, I use htable.

Anyhow, I find it strange that after a restart a request within the
dialog does not match the record loaded in memory, because the record is
obviously there: as you say, the dialog times out at some later point.
Did you change the value of the hash_size modparam?

Have you captured the SIP traffic, and can you see the 'did' parameter
in the Route headers of the BYE?
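
As a rough illustration of the check suggested above, the fragment below
logs BYE requests whose local Route header carries no 'did' parameter.
It is only a sketch under assumptions: the rr, siputils, textops and
xlog modules are loaded, the dialog module's rr_param is left at its
default name "did", and the fragment is placed where the config already
calls loose_route() for in-dialog requests.

```
# Fragment for the existing within-dialog handling in request_route.
if (has_totag() && is_method("BYE")) {
    if (loose_route()) {
        # check_route_param() must be called after loose_route()
        if (!check_route_param("did=")) {
            xlog("L_WARN", "BYE without did in Route: ci=$ci route=$hdr(Route)\n");
        }
        # ... existing in-dialog handling continues here ...
    } else {
        xlog("L_WARN", "BYE not routed by Route header: ci=$ci ru=$ru\n");
    }
}
```

Comparing these log lines before and after a restart should show whether
the BYE still carries the dialog identifier at all, or whether it
arrives intact and the lookup itself fails.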

Cheers,
Daniel


[SR-Users] Re: Finding a way to gracefully restart kamailio with dialog module enabled and DMQ sync

2023-07-05 Thread Benoît Panizzon
Hi Daniel

PS: Kamailio 5.5 in use so not on the edge yet.

Thank you for helping regarding that issue and maybe hinting how it
could be improved.

> What is the purpose of DMQ replication? To limit active calls?

Exactly. Our subscriptions contain a certain number of 'channels'. If
they are all in use, the customer is busy.

So I use profile counters to track the used channel count per customer.

> What exactly happens? What does "corrupts" mean? What data/fields
> become corrupted?

It looks like the/some dialogues just don't exist any more after a
reload. Or they exist but are not being found.

Observed issues:

* Dialogue variables that were populated before the restart do not
  exist any more.
* When a call ends, the corresponding dialogue is not found, so the
  dialog module is unable to close the CDR. When the dialogue timeout
  hits, the CDR is then written with duration = timeout value, which is
  far longer than the actual duration.
* Profile counters for dialogues that were not found when the call ended
  are still present, so 'POTS' customers with 'one' channel stay 'busy'
  until the dialogue timeout hits.
* The database accumulates data from dialogues that no longer exist.

The specific error I see when a dialogue should be ended and Kamailio
can no longer find it after a restart is:

ERROR: dialog [dlg_dmq.c:289]: dlg_dmq_handle_msg(): dialog [838:15539] not found

If you could help, I could try to dig out the full log of a dialogue
experiencing that issue.

Dialog Parameters used:

modparam("dialog", "send_bye", 1)
modparam("dialog", "timer_procs", 0)
modparam("dialog", "db_mode", 1 )
modparam("dialog", "db_url", DBLOCAL )
modparam("dialog", "dlg_flag", FLT_DLG )
modparam("dialog", "dlg_match_mode", 1)
modparam("dialog", "dlg_extra_hdrs", "Hint: Initiated by IMP Core Proxy\r\n")
modparam("dialog", "hash_size",  4096 )
# Do not send any keepalive messages in dialog
modparam("dialog", "ka_timer",  0)
modparam("dialog", "ka_interval",  30 )
modparam("dialog", "enable_stats",  1 )
modparam("dialog", "detect_spirals", 1 )
modparam("dialog", "bridge_controller", "sip:control...@imp.ch")
modparam("dialog", "default_timeout", 43200 )
modparam("dialog", "timeout_avp", "$avp(dlgtimeout)")   # Needs to be same as sst timeout!
modparam("dialog", "profiles_no_value", "callcounter;total_sbcincoming");
modparam("dialog", "profiles_with_value", "dispatchout;sbcincoming;trunkincoming;cpeincoming;safariincoming;custprofilecounter;legcounter");
modparam("dialog", "enable_dmq", 1)
modparam("dialog", "h_id_start", -1) # Use server_id
modparam("dialog", "h_id_step", 2)

Each node uses a local database (defined as DBLOCAL); the nodes do not
use our common 'remote' database, which provides, for example, customer
authentication information, for dialog storage.
 
> Regarding rejecting calls to cool down the instance before a restart,
> check whether 305 Use Proxy is supported by the origin of the calls;
> it might be more suitable.

Our registrar nodes run Kamailio too, so implementing that would be an
option.
Regarding our IC to other TSPs and carriers, I would have to check. At
the moment they are all connected via a commercial vendor SBC, so if
that SBC can handle 305 on INVITE (it can on REGISTER), that would work.

But one of our goals is to eventually also get rid of that SBC, which
has some limitations and a not very advantageous 'feature' licensing
model, in favour of open source and flexibility, by using Kamailio and
rtpengine for that task. But then we would have to check with every IC
we have. I know that at the moment 503 is understood by all our
switches connected to Kamailio, and our registrars also treat 503 as a
failure and fail over to the other node in the dispatcher list.
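
The drain idea discussed in this thread (reject new calls with 503 while
a runtime flag is set, and let established calls finish) could be
sketched roughly as follows. This is a minimal sketch, not a tested
configuration: the table name "maint" and the key "drain" are made up
for the example, and it assumes the htable, sl, textops and siputils
modules are loaded.

```
# initval=0: an unset flag reads as 0 (node accepts calls)
modparam("htable", "htable", "maint=>size=4;initval=0;")

request_route {
    # Only new calls are rejected: INVITE without To-tag.
    # In-dialog requests of established calls keep flowing.
    if (is_method("INVITE") && !has_totag()) {
        if ($sht(maint=>drain) == 1) {
            sl_send_reply("503", "Service Unavailable");
            exit;
        }
    }
    # ... regular routing continues here ...
}
```

The flag can then be toggled at runtime without a restart, for example
with 'kamcmd htable.seti maint drain 1' before the maintenance and
'kamcmd htable.seti maint drain 0' afterwards. As long as the table is
not configured for DMQ replication, the flag stays local to the node.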

-- 
Kind regards

-Benoît Panizzon- @ home office and reachable as usual
-- 
I m p r o W a r e   A G  -  Head of Commerce Customers
__

Zurlindenstrasse 29     Tel  +41 61 826 93 00
CH-4133 Pratteln        Fax  +41 61 826 93 01
Switzerland             Web  http://www.imp.ch
__
__
Kamailio - Users Mailing List - Non Commercial Discussions
To unsubscribe send an email to sr-users-le...@lists.kamailio.org
Important: keep the mailing list in the recipients, do not reply only to the 
sender!
Edit mailing list options or unsubscribe:


[SR-Users] Re: Finding a way to gracefully restart kamailio with dialog module enabled and DMQ sync

2023-07-05 Thread Daniel-Constantin Mierla
Hello,

What is the purpose of DMQ replication? To limit active calls?

What exactly happens? What does "corrupts" mean? What data/fields
become corrupted?

Regarding rejecting calls to cool down the instance before a restart,
check whether 305 Use Proxy is supported by the origin of the calls;
it might be more suitable.
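
For illustration only, a 305 reply pointing at the other node could look
like the fragment below. It assumes the sl and textops modules are
loaded; the route name and the address node2.example.com are
placeholders, and whether the upstream peers actually follow a 305 still
has to be verified.

```
# Hypothetical helper route: redirect a new call to the other core node.
route[REDIRECT_NEW_CALL] {
    # The Contact header carries the proxy the caller should use instead.
    append_to_reply("Contact: <sip:node2.example.com:5060>\r\n");
    sl_send_reply("305", "Use Proxy");
    exit;
}
```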

Cheers,
Daniel

On 05.07.23 11:38, Benoît Panizzon wrote:
> Hi Gang
>
> We are still having massive issues on how to safely reload kamailio
> after config changes when using the dialog module and DMQ.
>
> If there are active dialogues, kamailio corrupts them on a restart even
> when using MySQL as dialog backend.
>
> As we use two core nodes for redundancy, I am looking for a way to
> gracefully restart kamailio.
>
> I am considering adding some key in a hash table, or anything else I
> can change at runtime, to tell Kamailio not to accept any new calls
> (effectively rejecting INVITEs without To-tag with 503, hopefully
> causing the registrar or IC peer to resend the INVITE to the other
> node).
>
> Then wait until no more dialogues are active, so Kamailio can safely
> be restarted.
>
> My issue now: how can I find out whether one specific node no longer
> has any active dialogues?
>
> 'kamcmd dlg.stats_active' returns the count across all DMQ-synced
> nodes, not that of the local one.
>
> Any suggestions or other ideas on how I can 'reload' the Kamailio
> config without disrupting active dialogues?
>
> My last resort would be to look into the database:
> modparam("dialog", "h_id_start", -1) # Use server_id
> modparam("dialog", "h_id_step", 2)
>
> So odd/even H_ID should tell me the number per node. But I see a lot of
> orphan dialogues hanging around in the database not being cleaned so I
> guess that will not be reliable at all.
>
> Yes, I know I will get the question: 'Why do you need to restart
> kamailio that often'.
>
> We have started production on our kamailio based TSP platform. And of
> course, despite a LOT of testing beforehand, there is always some issue
> that pops up. At the moment, I have to implement a config change about
> once or twice a week to fix some new minor issues.
>
> I hope that at some point in the future we will have a stable config
> which will last for several months, but at the moment this is the
> situation.
>

-- 
Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio World Conference - www.kamailioworld.com
