Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-09-15 Thread Bogdan-Andrei Iancu


Hi Yury,

For the crash: is there a core file we can check?

For the memory usage: please try to get a memory dump for further
investigation [1].


[1] https://opensips.org/Documentation/TroubleShooting-OutOfMem

Best regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS Summit 27-30 Sept 2022, Athens
  https://www.opensips.org/events/Summit-2022Athens/

Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-09-14 Thread Yury Kirsanov

Hi Bogdan,
Thanks a lot for your help and support! The only question I now have is
why OpenSIPS was crashing if all TCP processes were blocked waiting for a
connection. It kept consuming more and more memory and then crashed with a
segfault upon reaching the limit set by the -m memory parameter. I do
understand that the TCP listeners were in blocking mode and could not do
any work, or forward any SIP packets, until the session was fully
established, but isn't it a bug that OpenSIPS starts to eat memory and then
crashes? Do I need to open a bug report on this? Thanks!

Best regards,
Yury.


Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-09-14 Thread Yury Kirsanov

Hi Bogdan,
Looks like my problem was quite complex, as I had the following issues:

1. tcp_async was off
2. TCP timeouts were set very high

I first tried just enabling tcp_async and that didn't help: after a restart
and the resulting TCP SYN storm, OpenSIPS started to consume memory and the
processes got locked up again. Then I started to tune other parameters.
Here's what the config looked like before:

# Proto TCP
loadmodule "proto_tcp.so"
modparam("proto_tcp", "tcp_async", 1)
modparam("proto_tcp", "tcp_send_timeout", 5000)
modparam("proto_tcp", "tcp_async_local_connect_timeout", 5000)
modparam("proto_tcp", "tcp_async_local_write_timeout", 5000)
modparam("proto_tcp", "tcp_max_msg_chunks", 16)

I had a very high tcp_send_timeout because some of our customers connect
from across the globe and have high latency. Of course that latency is not
5 seconds, but I set the timeout that high just to make sure they would be
able to connect. I ended up with this config:

# Proto TCP
loadmodule "proto_tcp.so"
modparam("proto_tcp", "tcp_async", 1)
modparam("proto_tcp", "tcp_send_timeout", 1000)
modparam("proto_tcp", "tcp_async_local_connect_timeout", 500)
modparam("proto_tcp", "tcp_async_local_write_timeout", 500)
modparam("proto_tcp", "tcp_max_msg_chunks", 16)
modparam("proto_tcp", "tcp_parallel_handling", 1)

And looks like OpenSIPS is now able to survive restarts!

One more thing I tried earlier was rate-limiting TCP connections with
iptables, which helped even with my incorrect configuration and blocking
TCP mode. I rate-limited TCP SYN packets on my public interface, on the TCP
ports that go to OpenSIPS, using the iptables rate-limit module with 10
packets per second and a burst of 50 packets. This can be adjusted as
required, depending on the rate of new connections. Hope this helps someone
who runs into the same trouble!

I will continue monitoring our OpenSIPS instances and if everything works
fine after restart I will enable auto-scaler to test it with the new patch.

Thanks a lot for your help, Bogdan, that's much appreciated!

Best regards,
Yury.


Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-09-14 Thread Yury Kirsanov

Hi Bogdan,
Thanks for your answer, I've checked my configs and yes, for some reason I
had tcp_async off!!! I will definitely switch it on for now and then give
it a try!!! Can't believe I missed that one!!!

Best regards,
Yury.

Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-09-14 Thread Bogdan-Andrei Iancu


Hi Yury,

You need to check your TCP settings and make sure your OpenSIPS will (1)
not try to perform a TCP connect against destinations known to be unable
to accept one (like TCP/WS endpoints behind NAT) - see
tcp_no_new_conn_bflag [1] - or (2) not block for a long time while
attempting a connect - see tcp_connect_timeout [2] or consider
enabling async [3].


[1] https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_no_new_conn_bflag
[2] https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_connect_timeout
[3] https://opensips.org/html/docs/modules/3.2.x/proto_tcp.html#idp168992
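
For anyone landing on this thread later, here is a minimal sketch of how the three settings mentioned above might look in opensips.cfg. The timeout value, the flag names TCP_NO_CONNECT and NAT_CLIENT, and the route name are illustrative assumptions, not values taken from this thread. The flag quoting style may also differ between OpenSIPS versions, so please check it against the 3.2 documentation before use:

```
# --- core (global) parameters ---
# Abort a blocking TCP connect attempt after this many milliseconds
# (200 is an illustrative value, not a recommendation).
tcp_connect_timeout = 200

# Branch flag meaning "reuse an existing TCP connection only, never open
# a new one" (the flag name itself is arbitrary).
tcp_no_new_conn_bflag = TCP_NO_CONNECT

# --- proto_tcp module ---
loadmodule "proto_tcp.so"
# Non-blocking (async) connect and write.
modparam("proto_tcp", "tcp_async", 1)

# --- routing logic (hypothetical route name, assumes the tm module is loaded) ---
route[to_tcp_client] {
	# NAT_CLIENT is a hypothetical branch flag, assumed to have been set
	# earlier for endpoints known to sit behind NAT.
	if (isbflagset("NAT_CLIENT"))
		setbflag("TCP_NO_CONNECT");

	t_relay();
}
```

With the flag set, a send towards an endpoint that has no existing connection should fail right away with a send error instead of tying up a worker in a connect attempt.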

Regards,

Bogdan-Andrei Iancu



Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-09-13 Thread Yury Kirsanov

Hi Bogdan,
Thanks for this update, but it looks like I can't check the autoscaler
because of this first issue with the blocking TCP connect. Is there a way
to resolve it? Am I doing something wrong, or does it have something to do
with the OpenSIPS code? And yes, you're right: as soon as I restart OpenSIPS
with a lot of SIP devices trying to connect to it, it goes crazy, starts to
consume memory and stops forwarding packets, sitting at 100% load until it
runs out of memory and segfaults. Sometimes I can't even restart it to get
it back to a normal state; it just loops into the same crash whatever I try.

I've compiled OpenSIPS 3.3.1 with your patch and was able to start it, but
I'm not sure; maybe I was just lucky this time.

What should I do? Thanks!

Best regards,
Yury.

On Tue, 13 Sept 2022, 18:56 Bogdan-Andrei Iancu, 
wrote:

> Hi Yury,
>
> It looks like you have multiple overlapping issues here. The traps you
> sent have nothing to do with auto-scaling, but with a blocking TCP
> connect for SIP: most of the procs are blocked in a sync TCP connect.
>
> Regards,
>
> Bogdan-Andrei Iancu
>
> OpenSIPS Founder and Developer
>   https://www.opensips-solutions.com
> OpenSIPS Summit 27-30 Sept 2022, Athens
>   https://www.opensips.org/events/Summit-2022Athens/
>
> On 9/12/22 4:39 PM, Yury Kirsanov wrote:
>
> Hi Bogdan,
> I've applied the patch (had to find where to apply it manually for 3.2.8
> downloaded from Web page, line 1568 instead of 1652) and restarted the
> server with only about 300-350 SIP devices and immediately got into same
> issue. I'm attaching two GDB dumps made within several minutes from each
> other. Autoscale was now OFF, please see my previous message as currently
> for some reason I'm experiencing lockups even when it's off :(
>
>
> Best regards,
> Yury.
>
> On Mon, Sep 12, 2022 at 7:48 PM Bogdan-Andrei Iancu 
> wrote:
>
>> Hi Yuri,
>>
>> Could you give this patch a try? it should fix the blocking you
>> experience (it should apply on 3.2 too).
>>
>> Best regards,
>>
>> Bogdan-Andrei Iancu
>>
>> OpenSIPS Founder and Developer
>>   https://www.opensips-solutions.com
>> OpenSIPS Summit 27-30 Sept 2022, Athens
>>   https://www.opensips.org/events/Summit-2022Athens/
>>
>> On 9/7/22 2:54 PM, Bogdan-Andrei Iancu wrote:
>>
>> Hi Yury,
>>
>> Thanks for the detailed info here - let me review some code and run
>> some tests, as at this point I have a good idea of the direction to dig
>> into.
>>
>> I will update here.
>>
>> Best regards,
>>
>> Bogdan-Andrei Iancu
>>
>> OpenSIPS Founder and Developer
>>   https://www.opensips-solutions.com
>> OpenSIPS Summit 27-30 Sept 2022, Athens
>>   https://www.opensips.org/events/Summit-2022Athens/
>>
>> On 9/6/22 11:24 AM, Yury Kirsanov wrote:
>>
>> Hi Bogdan,
>> Yes, I'm listening on all types of sockets including UDP, TCP and TLS on
>> the outside public interface and then forward traffic into internal LAN via
>> UDP only.
>>
>> Previously it was getting stuck quite easily, now I had to wait for a
>> while before this actually happened. I've routed part of my customers to
>> this server to obtain this result so I will have to do that again.
>>
>> As soon as I see one of the processes stuck I'll do the trap command and
>> send you all the details including processes load, ps output and so on.
>>
>> For now I had to switch autoscaling off and just create many listeners.
>> Do I understand correctly that I need to restart OpenSIPS in order to apply
>> autoscaling profiles and reload-routes is not sufficient?
>>
>> Also, do I need separate UDP profiles for public and private interfaces?
>> And do I need to apply autoscaling profile just to a socket or I need to
>> specify udp or tcp_workers with autoscaler too?
>>
>> Thanks and best regards,
>> Yury.
>>
>> On Tue, 6 Sept 2022, 18:18 Bogdan-Andrei Iancu, 
>> wrote:
>>
>>> Hi Yury,
>>>
>>> Thanks for the info. I see that the stuck process (24) is an
>>> auto-scaled one (based on its id). Do you have SIP traffic going from UDP
>>> to TCP, or are you doing some HEP capturing for SIP? I saw a recent
>>> similar report where a UDP auto-scaled worker got stuck when trying to
>>> communicate with the TCP main/manager process (in order to handle a TCP
>>> operation).
>>>
>>> BTW, any chance you could run "opensips-cli -x trap" when you have that
>>> stuck process, just to see where it is stuck? And is it hard to
>>> reproduce? I may ask you to extract some information from the running
>>> process.
>>>
>>> Regards,
>>>
>>> Bogdan-Andrei Iancu
>>>
>>> OpenSIPS Founder and Developer
>>>   https://www.opensips-solutions.com
>>> OpenSIPS Summit 27-30 Sept 2022, Athens
>>>   https://www.opensips.org/events/Summit-2022Athens/
>>>
>>> On 9/3/22 6:54 PM, Yury Kirsanov wrote:
>>>

Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-09-13 Thread Bogdan-Andrei Iancu


Hi Yury,

It looks like you have multiple overlapping issues here. The traps you
sent have nothing to do with auto-scaling, but with a blocking TCP
connect for SIP: most of the procs are blocked in a sync TCP connect.


Regards,

Bogdan-Andrei Iancu



Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-09-12 Thread Yury Kirsanov

Hi Bogdan,
Just FYI, the patch didn't apply automatically because the line to patch
is at line 1568 in OpenSIPS 3.2.8 as downloaded from the web server (not
from Git), so I applied it manually.

Best regards,
Yury.


Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-09-12 Thread Yury Kirsanov

Hi Bogdan,
Thanks, I'll try this patch today and if anything locks up will definitely
do a trap before restarting! Thanks again!

Best regards,
Yury.


Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-09-12 Thread Bogdan-Andrei Iancu


Hi Yury,

Could you give this patch a try? It should fix the blocking you
experience (it should apply on 3.2 too).


Best regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS Summit 27-30 Sept 2022, Athens
  https://www.opensips.org/events/Summit-2022Athens/



diff --git a/net/net_tcp.c b/net/net_tcp.c
index fff9aab4a..b6edce32f 100644
--- a/net/net_tcp.c
+++ b/net/net_tcp.c
@@ -1652,13 +1652,13 @@ static void tcp_main_server(void)
 	 * processes (get fd, new connection a.s.o)
 	 * NOTE: we add even the socks for the inactive/unfork processes - the
 	 *   socks are already created, but the triggering is from proc to
-	 *   main, having them into reactor is harmless - thye will never
+	 *   main, having them into reactor is harmless - they will never
 	 *   trigger as there is no proc on the other end to write us */
 	for (n=1; n<[...]
-		if (n!=process_no && pt[n].unix_sock>0)
-			if (reactor_add_reader( pt[n].unix_sock, F_TCP_WORKER,
+		if (n!=process_no && pt[n].tcp_socks_holder[0]>0)
+			if (reactor_add_reader( pt[n].tcp_socks_holder[0], F_TCP_WORKER,
 			RCT_PRIO_PROC, &pt[n])<0){
 				LM_ERR("failed to add process %d (%s) unix socket "
 					"to the fd list\n", n, pt[n].desc);

Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-09-12 Thread Bogdan-Andrei Iancu


Hi Yury,

Maybe you can get a trap output while the procs are at 100% and before
everything dies?
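
One way to catch that window is to poll the load statistics and fire the trap as soon as a process is pegged. The sketch below is only an illustration: it reuses the two commands quoted in this thread ("opensips-cli -x mi get_statistics load:" and "opensips-cli -x trap"), assumes the statistics command prints plain JSON as in the outputs posted here, and makes no assumption about what the trap command prints. The function names, the threshold and the polling interval are made up for the example.

```python
import json
import subprocess
import time

# Commands as quoted in this thread.
LOAD_CMD = ["opensips-cli", "-x", "mi", "get_statistics", "load:"]
TRAP_CMD = ["opensips-cli", "-x", "trap"]


def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def busy_processes(stats, threshold=100):
    """Return IDs of processes whose 1-minute load is at or above threshold."""
    prefix = "load:load1m-proc-"
    return sorted(
        int(name[len(prefix):])
        for name, value in stats.items()
        if name.startswith(prefix) and value >= threshold
    )


def watch(runner=run, interval=10, max_polls=None, sleep=time.sleep):
    """Poll the load statistics; run the trap once a process is pegged.

    Returns (pegged_ids, trap_stdout), or ([], None) if max_polls is reached.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        stats = json.loads(runner(LOAD_CMD))
        stuck = busy_processes(stats)
        if stuck:
            return stuck, runner(TRAP_CMD)
        polls += 1
        sleep(interval)
    return [], None
```

Calling watch() on the OpenSIPS host blocks until some process reports a 1-minute load of 100 and then runs the trap once; the runner and sleep arguments exist only so the loop can be exercised without a live server.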


Best regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS Summit 27-30 Sept 2022, Athens
  https://www.opensips.org/events/Summit-2022Athens/


Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-09-12 Thread Yury Kirsanov

Hi Bogdan,
We've run into another issue. This time I was just restarting the OpenSIPS
server during busy hours, when about 2500 SIP devices were registering and
making calls (the number of dialogs was only around 100-200, but there were
a lot of packets), and I was unable to restart OpenSIPS successfully. Some
processes got stuck at 100% load almost immediately, then started to
consume more and more memory; after eating up all the memory they died and
OpenSIPS stopped processing SIP packets.

I believe it's similar to the autoscaler issue. In this case I only had
16 UDP workers and 16 TCP workers, and it took more time for OpenSIPS to
run into the issue. With the autoscaler on, it wasn't able to open that
many processes at once, so the currently active ones got stuck very fast
and the crash happened almost immediately.

I'm running a Redis cache on localhost to store where to proxy each SIP
packet; if there's no record for a SIP device, I query a REST server and
cache its response. REST server load was no more than 25% during the
restart, when all SIP devices were urgently trying to reconnect to
OpenSIPS, so I don't think they are the issue.

I'm using async REST calls and believe there should be no issues with my
configuration script, even though it runs a lot of nested routes due to
the async REST requests. Hopefully I didn't forget an 'exit' statement
anywhere, but if that were the case the OpenSIPS service would be locking
up at any time.

OpenSIPS itself is running as a virtual machine on a VMware host, and I
could see it consuming up to 100% CPU of a 40-core host when it was
locking up. VMware readiness for the VM was also spiking to 1500 ms during
these lock-ups, meaning that the VM was waiting for cores to free up in
order to get CPU time.

The only way out of this situation for me was to run multiple OpenSIPS VMs
and spread the load between them. No matter what I tried, I wasn't able to
get OpenSIPS running fine again, even though it had been working perfectly
for more than a week in this configuration and under the same load; but I
had only been starting/restarting it at night, when there were no active
calls.

I'm happy to share my configuration file with you privately if required.

Hope this helps!

Thanks and best regards,
Yury.


Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-09-07 Thread Bogdan-Andrei Iancu


Hi Yury,

Thanks for the detailed info here. Let me review some code and
run some tests; at this point I have a good idea of the direction to
dig into.


I will update here.

Best regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS Summit 27-30 Sept 2022, Athens
  https://www.opensips.org/events/Summit-2022Athens/


Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-09-06 Thread Yury Kirsanov

Hi Bogdan,
This has finally happened: OpenSIPS is stuck again at 100% load on one of
its processes. Here's the output of the "load:" statistics command:

opensips-cli -x mi get_statistics load:
{
    "load:load-proc-1": 0,
    "load:load1m-proc-1": 0,
    "load:load10m-proc-1": 0,
    "load:load-proc-2": 0,
    "load:load1m-proc-2": 0,
    "load:load10m-proc-2": 0,
    "load:load-proc-3": 0,
    "load:load1m-proc-3": 0,
    "load:load10m-proc-3": 0,
    "load:load-proc-4": 0,
    "load:load1m-proc-4": 0,
    "load:load10m-proc-4": 0,
    "load:load-proc-5": 0,
    "load:load1m-proc-5": 0,
    "load:load10m-proc-5": 8,
    "load:load-proc-6": 0,
    "load:load1m-proc-6": 0,
    "load:load10m-proc-6": 6,
    "load:load-proc-13": 0,
    "load:load1m-proc-13": 0,
    "load:load10m-proc-13": 0,
    "load:load-proc-14": 0,
    "load:load1m-proc-14": 0,
    "load:load10m-proc-14": 0,
    "load:load-proc-21": 0,
    "load:load1m-proc-21": 0,
    "load:load10m-proc-21": 0,
    "load:load-proc-22": 0,
    "load:load1m-proc-22": 0,
    "load:load10m-proc-22": 0,
    "load:load-proc-23": 0,
    "load:load1m-proc-23": 0,
    "load:load10m-proc-23": 0,
    "load:load-proc-24": 100,
    "load:load1m-proc-24": 100,
    "load:load10m-proc-24": 100,
    "load:load": 12,
    "load:load1m": 12,
    "load:load10m": 14,
    "load:load-all": 10,
    "load:load1m-all": 10,
    "load:load10m-all": 11,
    "load:processes_number": 13
}

As you can see, process 24 has been at 100% load for more than a minute
already.
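
For anyone reading such a dump by hand, a short script can pick out the pegged process. This is a minimal sketch, assuming the key naming seen in the output above ("load:load-proc-N", "load:load1m-proc-N", "load:load10m-proc-N"); the function names and the threshold are made up for the example, and the sample holds only a few entries copied from that output.

```python
import json
import re

# Matches per-process keys such as "load:load1m-proc-24".
PROC_KEY = re.compile(r"^load:(load|load1m|load10m)-proc-(\d+)$")


def per_process_load(stats):
    """Group the per-process load statistics by process ID."""
    table = {}
    for name, value in stats.items():
        match = PROC_KEY.match(name)
        if match:
            window, proc_id = match.group(1), int(match.group(2))
            table.setdefault(proc_id, {})[window] = value
    return table


def pegged(stats, threshold=100):
    """Return IDs whose load is at or above threshold in every window."""
    return sorted(
        proc_id
        for proc_id, windows in per_process_load(stats).items()
        if len(windows) == 3 and min(windows.values()) >= threshold
    )


# A few entries copied from the output above.
sample = json.loads("""
{
    "load:load-proc-5": 0,
    "load:load1m-proc-5": 0,
    "load:load10m-proc-5": 8,
    "load:load-proc-24": 100,
    "load:load1m-proc-24": 100,
    "load:load10m-proc-24": 100,
    "load:load": 12,
    "load:load1m-all": 10
}
""")
print(pegged(sample))
```

Requiring all three windows to be at the threshold filters out short spikes, which is why process 5 (only 8 in its 10-minute window) is not reported.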

Here's the output of process list, it's a UDP socket listener on internal
interface that's stuck at 100% load:

opensips-cli -x mi ps
{
    "Processes": [
        {
            "ID": 0,
            "PID": 5457,
            "Type": "attendant"
        },
        {
            "ID": 1,
            "PID": 5463,
            "Type": "HTTPD 10.x.x.x:"
        },
        {
            "ID": 2,
            "PID": 5464,
            "Type": "MI FIFO"
        },
        {
            "ID": 3,
            "PID": 5465,
            "Type": "time_keeper"
        },
        {
            "ID": 4,
            "PID": 5466,
            "Type": "timer"
        },
        {
            "ID": 5,
            "PID": 5467,
            "Type": "SIP receiver udp:10.x.x.x:5060"
        },
        {
            "ID": 6,
            "PID": 5470,
            "Type": "SIP receiver udp:10.x.x.x:5060"
        },
        {
            "ID": 13,
            "PID": 5477,
            "Type": "SIP receiver udp:103.x.x.x:7060"
        },
        {
            "ID": 14,
            "PID": 5478,
            "Type": "SIP receiver udp:103.x.x.x:7060"
        },
        {
            "ID": 21,
            "PID": 5485,
            "Type": "TCP receiver"
        },
        {
            "ID": 22,
            "PID": 5486,
            "Type": "Timer handler"
        },
        {
            "ID": 23,
            "PID": 5487,
            "Type": "TCP main"
        },
        {
            "ID": 24,
            "PID": 5759,
            "Type": "SIP receiver udp:10.x.x.x:5060"
        }
    ]
}
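
To see at a glance how many workers each socket currently has, the process list can be grouped by socket. This is a rough sketch that assumes the "SIP receiver <socket>" naming shown in the ps output above; the function name is made up, and the sample keeps only a subset of the entries from that output.

```python
from collections import defaultdict


def workers_per_socket(ps):
    """Map each SIP socket to the IDs of the receiver processes serving it."""
    prefix = "SIP receiver "
    groups = defaultdict(list)
    for proc in ps["Processes"]:
        if proc["Type"].startswith(prefix):
            groups[proc["Type"][len(prefix):]].append(proc["ID"])
    return dict(groups)


# Subset of the process list above.
sample = {
    "Processes": [
        {"ID": 5, "PID": 5467, "Type": "SIP receiver udp:10.x.x.x:5060"},
        {"ID": 6, "PID": 5470, "Type": "SIP receiver udp:10.x.x.x:5060"},
        {"ID": 13, "PID": 5477, "Type": "SIP receiver udp:103.x.x.x:7060"},
        {"ID": 14, "PID": 5478, "Type": "SIP receiver udp:103.x.x.x:7060"},
        {"ID": 23, "PID": 5487, "Type": "TCP main"},
        {"ID": 24, "PID": 5759, "Type": "SIP receiver udp:10.x.x.x:5060"},
    ]
}
print(workers_per_socket(sample))
```

With this sample the internal socket udp:10.x.x.x:5060 shows three workers (5, 6 and 24) while the public one has two, which matches process 24 being an extra worker on the internal interface.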

opensips -V
version: opensips 3.2.8 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC,
F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16,
MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: d2496fed5
main.c compiled on 16:17:53 Aug 24 2022 with gcc 9

This time the server has some load, but it's still not heavy at all, and
I'm using async requests for the REST queries.

This is my autoscaling section:

# Scaling section
auto_scaling_profile = PROFILE_UDP_PUB
 scale up to 16 on 70% for 4 cycles within 5
 scale down to 2 on 20% for 5 cycles

auto_scaling_profile = PROFILE_UDP_PRIV
 scale up to 16 on 70% for 4 cycles within 5
 scale down to 2 on 20% for 5 cycles

auto_scaling_profile = PROFILE_TCP
 scale up to 16 on 70% for 4 cycles within 5
 scale down to 2 on 20% for 10 cycles
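
As a reading aid, here is a toy model of how one such profile could behave. It is a sketch under stated assumptions and not the OpenSIPS implementation: it assumes "scale up to 16 on 70% for 4 cycles within 5" means adding one worker when the load is at or above 70% in at least 4 of the last 5 cycles, and "scale down to 2 on 20% for 5 cycles" means removing one worker after 5 consecutive cycles below 20%. The class name, the starting worker count and the load samples are invented for the example.

```python
from collections import deque


class ProfileSim:
    """Toy model of one auto-scaling profile (assumed semantics)."""

    def __init__(self, procs, max_procs, up_pct, up_cycles, up_window,
                 min_procs, down_pct, down_cycles):
        self.procs = procs
        self.max_procs = max_procs
        self.up_pct = up_pct
        self.up_cycles = up_cycles
        self.min_procs = min_procs
        self.down_pct = down_pct
        self.down_cycles = down_cycles
        self.recent = deque(maxlen=up_window)  # last up_window load samples
        self.low_streak = 0  # consecutive cycles below down_pct

    def step(self, load):
        """Feed one load sample (percent) and return the worker count."""
        self.recent.append(load)
        self.low_streak = self.low_streak + 1 if load < self.down_pct else 0
        high = sum(1 for sample in self.recent if sample >= self.up_pct)
        if high >= self.up_cycles and self.procs < self.max_procs:
            self.procs += 1
            self.recent.clear()  # start a fresh window after scaling up
        elif self.low_streak >= self.down_cycles and self.procs > self.min_procs:
            self.procs -= 1
            self.low_streak = 0
        return self.procs


# PROFILE_UDP_PRIV from the message above; starting at 2 workers is assumed.
sim = ProfileSim(procs=2, max_procs=16, up_pct=70, up_cycles=4, up_window=5,
                 min_procs=2, down_pct=20, down_cycles=5)
history = [sim.step(load) for load in [80, 80, 80, 80, 10, 10, 10, 10, 10]]
print(history)
```

Under this model a single worker reporting a constant 100% load would keep its group's average high for as long as it stays stuck, so the profile would never see the sustained low load it needs to scale back down.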

And this is how I apply it to the sockets; I'm not applying it to UDP
workers at all:

socket=udp:10.x.x.x:5060 use_auto_scaling_profile PROFILE_UDP_PRIV
socket=udp:103.x.x.x:7060 use_auto_scaling_profile PROFILE_UDP_PUB

tcp_workers = 1 use_auto_scaling_profile PROFILE_TCP

I can't get this process unstuck until I restart OpenSIPS.

Just to add: if I turn off auto scaling, enable 16 UDP and 16 TCP workers
and just specify the sockets without any parameters, the load goes to 0.
See the attached graph: load was at 25% the whole time until I restarted
OpenSIPS in normal mode, and then it immediately dropped to 0:

[image: image.png]

Here's an output of load:

opensips-cli -x mi get_statistics load:
{
"load:load-proc-1": 0,
"load:load1m-proc-1": 0,
"load:load10m-proc-1": 0,
"load:load-proc-2": 0,
"load:load1m-proc-2": 0,
"load:load10m-proc-2": 0,
"load:load-proc-3": 0,
"load:load1m-proc-3": 0,
"load:load10m-proc-3": 0,
"load:load-proc-4":

Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-09-06 Thread Bogdan-Andrei Iancu


Hi Yury,

Thanks for the info. I see that the stuck process (24) is an
auto-scaled one (based on its ID). Do you have SIP traffic going from
UDP to TCP, or are you doing some HEP capturing for SIP? I saw a recent
similar report where an auto-scaled UDP worker got stuck while trying to
communicate with the TCP main/manager process (in order to handle a
TCP operation).


BTW, any chance you could run "opensips-cli -x trap" while you have that
stuck process, just to see where it is stuck? And is it hard to
reproduce? I may ask you to extract some information from the running
process.


Regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS Summit 27-30 Sept 2022, Athens
  https://www.opensips.org/events/Summit-2022Athens/

On 9/3/22 6:54 PM, Yury Kirsanov wrote:

Hi Bogdan,
This has finally happened, OS is stuck again in 100% for one of its 
processes. Here's the output of load: command:


opensips-cli -x mi get_statistics load:
{
    "load:load-proc-1": 0,
    "load:load1m-proc-1": 0,
    "load:load10m-proc-1": 0,
    "load:load-proc-2": 0,
    "load:load1m-proc-2": 0,
    "load:load10m-proc-2": 0,
    "load:load-proc-3": 0,
    "load:load1m-proc-3": 0,
    "load:load10m-proc-3": 0,
    "load:load-proc-4": 0,
    "load:load1m-proc-4": 0,
    "load:load10m-proc-4": 0,
    "load:load-proc-5": 0,
    "load:load1m-proc-5": 0,
    "load:load10m-proc-5": 8,
    "load:load-proc-6": 0,
    "load:load1m-proc-6": 0,
    "load:load10m-proc-6": 6,
    "load:load-proc-13": 0,
    "load:load1m-proc-13": 0,
    "load:load10m-proc-13": 0,
    "load:load-proc-14": 0,
    "load:load1m-proc-14": 0,
    "load:load10m-proc-14": 0,
    "load:load-proc-21": 0,
    "load:load1m-proc-21": 0,
    "load:load10m-proc-21": 0,
    "load:load-proc-22": 0,
    "load:load1m-proc-22": 0,
    "load:load10m-proc-22": 0,
    "load:load-proc-23": 0,
    "load:load1m-proc-23": 0,
    "load:load10m-proc-23": 0,
    "load:load-proc-24": 100,
    "load:load1m-proc-24": 100,
    "load:load10m-proc-24": 100,
    "load:load": 12,
    "load:load1m": 12,
    "load:load10m": 14,
    "load:load-all": 10,
    "load:load1m-all": 10,
    "load:load10m-all": 11,
    "load:processes_number": 13
}

As you can see, process 24 is consuming 100% of time for more than a 
minute already


Here's the output of process list, it's a UDP socket listener on 
internal interface that's stuck at 100% load:


opensips-cli -x mi ps
{
    "Processes": [
        {
            "ID": 0,
            "PID": 5457,
            "Type": "attendant"
        },
        {
            "ID": 1,
            "PID": 5463,
            "Type": "HTTPD 10.x.x.x:"
        },
        {
            "ID": 2,
            "PID": 5464,
            "Type": "MI FIFO"
        },
        {
            "ID": 3,
            "PID": 5465,
            "Type": "time_keeper"
        },
        {
            "ID": 4,
            "PID": 5466,
            "Type": "timer"
        },
        {
            "ID": 5,
            "PID": 5467,
            "Type": "SIP receiver udp:10.x.x.x:5060"
        },
        {
            "ID": 6,
            "PID": 5470,
            "Type": "SIP receiver udp:10.x.x.x:5060"
        },
        {
            "ID": 13,
            "PID": 5477,
            "Type": "SIP receiver udp:103.x.x.x:7060"
        },
        {
            "ID": 14,
            "PID": 5478,
            "Type": "SIP receiver udp:103.x.x.x:7060"
        },
        {
            "ID": 21,
            "PID": 5485,
            "Type": "TCP receiver"
        },
        {
            "ID": 22,
            "PID": 5486,
            "Type": "Timer handler"
        },
        {
            "ID": 23,
            "PID": 5487,
            "Type": "TCP main"
        },
        {
            "ID": 24,
            "PID": 5759,
            "Type": "SIP receiver udp:10.x.x.x:5060"
        }
    ]
}

opensips -V
version: opensips 3.2.8 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, 
Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, 
MAX_URI_SIZE 1024, BUF_SIZE 65535

poll method support: poll, epoll, sigio_rt, select.
git revision: d2496fed5
main.c compiled on 16:17:53 Aug 24 2022 with gcc 9

This time the server has some load, but it's still not heavy at all, 
plus I'm using async requests for REST queries.


This is my autoscaling section:

# Scaling section
auto_scaling_profile = PROFILE_UDP_PUB
     scale up to 16 on 70% for 4 cycles within 5
     scale down to 2 on 20% for 5 cycles

auto_scaling_profile = PROFILE_UDP_PRIV
     scale up to 16 on 70% for 4 cycles within 5
     scale down to 2 on 20% for 5 cycles

auto_scaling_profile = PROFILE_TCP
     scale up to 16 on 70% for 4 cycles within 5
     scale down to 2 on 20% for 10 cycles

And that's how I apply it to sockets; I'm not applying it to UDP 
workers at all:


socket=udp:10.x.x.x:5060

Re: [OpenSIPS-Users] Autoscaler in 3.2.x

2022-08-25 Thread Bogdan-Andrei Iancu


Hi Yury,

And when that scaling up happens, do you actually have traffic, or is 
your OpenSIPS idle?


Also, could you run `opensips-cli -x mi get_statistics load:` (note the 
colon at the end)?


Regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
  https://www.opensips-solutions.com
OpenSIPS Summit 27-30 Sept 2022, Athens
  https://www.opensips.org/events/Summit-2022Athens/

On 8/25/22 10:57 AM, Yury Kirsanov wrote:

Hi all,
I've run into a strange issue: if I enable the autoscaler on OpenSIPS 
3.2.x (tried 5, 6, 7 and now 8) on a server without any load, using a 
'socket' statement like this:


auto_scaling_profile = PROFILE_UDP_PRIV
     scale up to 16 on 30% for 4 cycles within 5
     scale down to 2 on 10% for 5 cycles

udp_workers=4

socket=udp:10.x.x.x:5060 use_auto_scaling_profile PROFILE_UDP_PRIV

then after a while the OpenSIPS load goes up to some high number, the 
autoscaler starts to open new processes up to the maximum number 
specified in the profile, and then the load stays at that number, for 
example:


opensips-cli -x mi get_statistics load
{
    "load:load": 60
}

It never changes and looks just 'stuck'.

Any ideas why this is happening in my case? Or should I file a bug 
report? Thanks.


Regards,
Yury.

___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users

