Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-16 Thread Alex Balashov

Hi Calvin,

I'm really glad you were able to get things sorted out, and I apologise 
if the thread got testy. I do appreciate your follow-up, which I think 
will benefit readers looking for similar answers.


A few inline thoughts:

On 6/15/20 4:04 PM, Calvin Ellison wrote:

I attempted to reproduce the original breakdown around 3000 CPS using 
the default 212992 byte receive buffer and could not, which tells me I 
broke a cardinal rule of load testing and changed more than one thing at 
a time. Also, don't do load testing when tired. I suspect that I had 
also made a change to the sipp scenario recv/sched loops, or I had 
unknowingly broken something while checking out the tuned package.


In several decades of doing backend systems programming, I've not found 
tuning Linux kernel defaults to be generally fruitful for improving 
throughput to any non-trivial degree. The defaults are sensible for 
almost all use-cases, all the more so given modern hardware and 
multi-core processors and the rest.


This is in sharp contrast to the conservative defaults some applications 
(e.g. Apache, MySQL) ship with on many distributions. I think the idea 
behind such conservative settings is to constrain the application so 
that in the event of a DDoS or similar event, it does not take over all 
available hardware resources, which would impede response and resolution.


But on the kernel settings, the only impactful changes I have ever seen 
are minor adjustments to slightly improve very niche server load 
problems of a rather global nature (e.g. related to I/O scheduling, NIC 
issues, storage, etc). This wasn't that kind of scenario.


In most respects, it just follows from first principles and Occam's 
Razor, IMHO. There's no reason for kernels to ship tuned unnecessarily 
conservatively to deny average users something on the order of _several 
times'_ more performance from their hardware, and any effort to do that 
would be readily apparent and, it stands to reason, staunchly opposed. 
It therefore also stands to reason that there isn't some silver bullet 
or magic setting that unlocks multiplicative performance gains, if only 
one just knows the secret sauce or thinks to tweak it--for the simple 
reason that if such a tweak existed, it would be systemically 
rationalised away, absent a clear and persuasive basis for such an 
artificial and contrived limit to exist. I cannot conceive of what such 
a basis would look like, and I'd like to think that's not just a failure 
of imagination.


Or in other words, it goes with the commonsensical, "If it seems too 
good to be true, it is," intuition. The basic fundamentals of the 
application, and to a lesser but still very significant extent the 
hardware (in terms of its relative homogeneity nowadays), determine 
99.9% of the performance characteristics, and matter a thousand times 
more than literally anything one can tweak.


I deeply appreciate Alex's insistence that I was wrong and to keep 
digging. I am happy to retract my claim regarding "absolutely terrible 
sysctl defaults". Using synchronous/blocking DB queries, the 8-core 
server reached 14,000 CPS, at which point I declared it fixed and went 
to bed. It could probably go higher: there's only one DB query with a 
<10ms response time, Memcache for the query response, and some logic to 
decide how to respond. There's only a single non-200 final response, so 
it's probably as minimalist as it gets.


I would agree that with such a minimal call processing loop, given a 
generous number of CPU cores you shouldn't be terribly limited.


If anyone else is trying to tune their setup, I think Alex's advice to 
"not run more than 2 * (CPU threads) [children]" is the best place to 
start. I had inherited this project from someone else's work under 
version 1.11 and they had used 128 children. They were using remote DB 
servers with much higher latency than the local DBs we have today, so 
that might have been the reason. Or they were just wrong to begin with.


Aye. Barring a workload consisting of exceptionally latent blocking 
service queries, there's really not a valid reason to ever have that 
many child processes, and even if one does have such a workload, plenty 
of reasons to lean on the fundamental latency problem rather than 
working around it with more child processes.


With the proviso that I am not an expert in modern-day OpenSIPS 
concurrency innards, the common OpenSER heritage prescribes a preforked 
worker process pool with SysV shared memory for inter-process 
communication (IPC). Like any shared memory space, this requires mutex 
locking so that multiple threads (in this case, processes) don't 
access/modify the same data structures at the same time in ways that 
step on the others. Because every process holds and waits on these 
locks, this model works well when there aren't very many processes and 
their path to execution is mostly clear and not especially volatile, and 
when as little data is shared as possible.
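
As a footnote, if the build in question actually uses SysV shared memory 
rather than anonymous mmap (per the proviso above, I won't claim which a 
given package does), the segments and semaphore arrays are visible from 
the shell. A quick sketch, with the opensips user name being an assumption:

---
# Shared memory segments owned by the OpenSIPS user (name illustrative):
ipcs -m | grep opensips
# Semaphore arrays used for the locking described above:
ipcs -s | grep opensips
---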

Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-15 Thread Calvin Ellison
I attempted to reproduce the original breakdown around 3000 CPS using the
default 212992 byte receive buffer and could not, which tells me I broke a
cardinal rule of load testing and changed more than one thing at a time.
Also, don't do load testing when tired. I suspect that I had also made a
change to the sipp scenario recv/sched loops, or I had unknowingly
broken something while checking out the tuned package.

I deeply appreciate Alex's insistence that I was wrong and to keep digging. I
am happy to retract my claim regarding "absolutely terrible sysctl
defaults". Using synchronous/blocking DB queries, the 8-core server reached
14,000 CPS, at which point I declared it fixed and went to bed. It could
probably go higher: there's only one DB query with a <10ms response time,
Memcache for the query response, and some logic to decide how to respond.
There's only a single non-200 final response, so it's probably as
minimalist as it gets.

If anyone else is trying to tune their setup, I think Alex's advice to "not
run more than 2 * (CPU threads) [children]" is the best place to start. I
had inherited this project from someone else's work under version 1.11 and
they had used 128 children. They were using remote DB servers with much
higher latency than the local DBs we have today, so that might have been
the reason. Or they were just wrong to begin with.

The Description for Asynchronous Statements is extremely tempting and was
what started me down that path; it might be missing a qualification that
Async can be an improvement for slow blocking operations, but the
additional overhead may be a disadvantage for very fast blocking
operations.

Thank you to everyone who responded to this topic.

Regards,

*Calvin Ellison*
Senior Voice Operations Engineer
calvin.elli...@voxox.com


On Fri, Jun 12, 2020 at 6:42 PM Alex Balashov 
wrote:
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread Alex Balashov

On 6/12/20 9:11 PM, Calvin Ellison wrote:

You suggested to "Monitor your receive queue scrupulously at a very high 
timing resolution". How do I do this?


If using pre-systemd systems, e.g. EL6:

# netstat --inet -n -l | grep 5060

If it's systemd and beyond -- I'm sure Ubuntu Server 18 is, though I 
have no experience with it:


# ss -4nl | grep 5060

Example:

---
[root@allegro-1 ~]# ss -4nl | grep 5060
udp   UNCONN   0   0        10.150.20.5:5060   *:*
udp   UNCONN   0   0        10.150.20.2:5060   *:*
udp   UNCONN   0   0      209.51.167.66:5060   *:*
tcp   LISTEN   0   128      10.150.20.2:5060   *:*
---

The third column there (all-0s) is the RecvQ, as can be gleaned from the 
header this command outputs:



---
Netid  State   Recv-Q  Send-Q  Local Address:Port    Peer Address:Port
---

For `netstat`, it would be the second column.

To monitor it at a low interval, for example 200 ms (5 times per sec), 
you could do something like:


---
#!/bin/bash

# Sample the Recv-Q (3rd column of `ss -4nl`) of the first socket matching
# port 5060, five times per second, with a millisecond timestamp.
while : ; do
echo -n "$(date +"%T.%3N"): "
ss -4nl | grep 5060 | head -1 | awk '{print $3}'
sleep 0.2
done
---

That should give you some idea of where the value sits in general.

You propose there is a pathological issue and the increased buffer size 
is masking it. How do I determine what that issue is?


Without knowing what your exact routing workflow is, I can't say.

However, 99.9% of the time, it occurs in blocking queries to databases 
or other data sources.


I've asked repeatedly about children, shared memory, process 
memory, timer_partitions, etc. but the only answers have been "try 
more". I've been trying more and less of these things two weeks and 
changing the buffers was the only thing that appeared to have any 
immediate impact. How do I know when enough is enough versus too much?


I wrote this article several years ago for Kamailio, but the same basic 
considerations apply to OpenSIPS:


http://www.evaristesys.com/blog/tuning-kamailio-for-high-throughput-and-performance/

"Try more" is definitely not the answer except in cases where the 
workload is overwhelmingly network I/O-bound and/or database-bound. 
If it were, the most natural course of action would be to spawn a 
functionally infinite number of children. However, children create 
context switches, contend with each other for CPU time (less a concern 
if most of the workload is waiting on blocking external I/O) and fight 
for various global shared memory structures and locks (still a concern 
regardless). So, there is a point of diminishing returns for any given 
workload. All other things being equal, as per the article, the 
reasonable number of child processes is equal to the number of available 
CPU threads (in /proc/cpuinfo). This number can be increased if the 
workload is very I/O-bound, but only to a point. It's hard to say 
exactly what that point is, and it does have to be empirically 
determined, but I would not run more than 2 * (CPU threads).
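
As a rough, concrete starting point (a sketch, not a recommendation), 
that figure can be derived from the machine itself; "children" is the 
core setting being discussed here:

---
# Derive a starting "children" value from the hardware thread count,
# per the 1x-2x guidance above.
THREADS=$(nproc)
echo "starting point:           children=${THREADS}"
echo "upper bound (2x threads): children=$((THREADS * 2))"
---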


Note, there have been no memory-related log messages. The 16-thread 
servers have 48GB RAM and the 8-thread servers have 16GB. I'm happy to 
give all that to OpenSIPS once I know the right way to carve it up.


I see no rationality in giving it all to OpenSIPS.

It's worth bearing in mind that there are two kinds of memory allocations:

- Shared memory, used by the system for global/system-wide data 
constructs, such as transaction memory, dialog state, etc.;


- Package memory, memory that is private to each process and used for 
handling the immediate message. That means every child process 
pre-allocates the package memory requested, so this value should of 
course be much, much smaller than your shared memory pool size.


But still, when you consider all the data that OpenSIPS needs to keep in 
the course of call processing, a lot of it is ephemeral and 
transaction-associated. Once the call is set up, the INVITE transaction 
is disposed. Other call state may add up to a few kilobytes per call at 
most (notwithstanding page sizes and blocks in the underlying 
allocator), but nothing on the order of gigabytes upon gigabytes. 
Assuming 4 KB per call and 200,000 concurrent calls, that's ~800 MB, and 
that is a very generous assumption indeed.
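
For what it's worth, the arithmetic behind that estimate, with the 
startup flags assumed from the stock opensips binary and the sizes 
purely illustrative:

---
# ~4 KB of call state at 200,000 concurrent calls:
CALLS=200000
PER_CALL_KB=4
echo "$(( CALLS * PER_CALL_KB / 1024 )) MB of shared memory"   # ~781 MB
# Shared (-m) and package (-M) memory are typically sized at startup, e.g.:
#   opensips -m 1024 -M 16 -f /etc/opensips/opensips.cfg
---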


-- Alex

--
Alex Balashov | Principal | Evariste Systems LLC

Tel: +1-706-510-6800 / +1-800-250-5920 (toll-free)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/

___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread Calvin Ellison
On Fri, Jun 12, 2020 at 5:23 PM Alex Balashov 
wrote:

> One should see the forest for the trees, instead of cultivating a myopic
> preoccupation with short-term, stop-gap solutions.
>

Understanding that text lacks tone, this is a rather off-putting comment for a
mailing list intended to help users. I appreciate your time and feedback,
there's no need to be insulting. Perhaps you could stop assuming what my
preoccupations and scope of vision are and concentrate on the problem and
the solution? The question now is why increasing buffers made any
difference at all.

You suggested to "Monitor your receive queue scrupulously at a very high
timing resolution". How do I do this?

You propose there is a pathological issue and the increased buffer size is
masking it. How do I determine what that issue is?

I've asked repeatedly about children, shared memory, process
memory, timer_partitions, etc. but the only answers have been "try more".
I've been trying more and less of these things for two weeks and changing the
buffers was the only thing that appeared to have any immediate impact. How
do I know when enough is enough versus too much?

Note, there have been no memory-related log messages. The 16-thread servers
have 48GB RAM and the 8-thread servers have 16GB. I'm happy to give all
that to OpenSIPS once I know the right way to carve it up.

Should I even be using 2.4?
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread Alex Balashov

On 6/12/20 8:03 PM, Calvin Ellison wrote:

That's been the point of this discussion. Unfortunately, the answers 
so far have added up to "keep changing settings until you find what 
works best" and "buffers are a Ponzi scheme", despite an immediate 3x 
performance increase.


Perhaps if one adopts a rather esoteric understanding of "performance 
increase" ... in principle, queueing more packets indicates the 
opposite. In that light, one could look at a lower receive queue depth 
as an optimisation, actually.


One should see the forest for the trees, instead of cultivating a myopic 
preoccupation with short-term, stop-gap solutions.


Two things are logically possible:

(1) The receive queue backlog keeps stacking up until the size of 16.7m 
is exhausted as well; this is clearly not happening, or it would not be 
triumphantly claimed as a solution;


(2) The higher buffer provides the elasticity needed to cope with 
stochastic I/O wait conditions which, for a given base CPS load, occur 
within a certain range of latencies and at a certain frequency 
distribution. Because the blocking conditions are not always uniformly 
present--they're stochastic, after all--in those low-tide moments, the 
receive queue drains.


The larger buffer solves #2, but not by way of providing for a 
"performance increase". Nothing is performing better. It's a lever--a 
quite imperfect one--to cope with the fact that something is performing 
_quite poorly_. However, fortunately for you, it happens on an 
intermittent basis, otherwise you'd have quickly encountered scenario #1.


#2 is a better and more manageable problem than #1, in strictly relative 
terms, but both are pathological and need to be addressed. To say that 
this amounts to a "tuning" or "optimisation" that yields a "performance 
increase" is profoundly misleading.


-- Alex

--
Alex Balashov | Principal | Evariste Systems LLC

Tel: +1-706-510-6800 / +1-800-250-5920 (toll-free)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/

___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread Alex Balashov
The value is perfectly reasonable for an application that is properly 
coping with its request load.


On 6/12/20 7:49 PM, David Villasmil wrote:
Keep in mind they don't make Ubuntu for SIP applications, which have 
their own idiosyncrasies. They make it general purpose. So finding a 
value that doesn't work perfectly with what you need for this very 
specific application is not a big deal.


On Sat, 13 Jun 2020 at 00:38, Calvin Ellison wrote:


I doubt the system will be using all of that buffer. I also don't
know if the issue was in the receive buffer or send buffer since I
changed both at once. Many resources are available online from
people who have already done much more scientific testing that
indicate the default values should be increased for certain
applications, which is the reason I changed it to begin with.
There's no one-size-fits-all for server configurations, and what
works for this UDP application with a small number of clients might
not work well for a different application with many TCP connections.

"absolutely terrible" may be too strong of a way to put it, but the
before and after don't lie.

On Fri, Jun 12, 2020 at 4:02 PM Alex Balashov wrote:

But increasing the depth of the queue by 78x (if I'm not mistaken,
212992 is the default--at least, it is on all my CentOS 7.x and 8.x

___
Users mailing list
Users@lists.opensips.org 
http://lists.opensips.org/cgi-bin/mailman/listinfo/users

--
Regards,

David Villasmil
email: david.villasmil.w...@gmail.com 


phone: +34669448337

___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users



--
Alex Balashov | Principal | Evariste Systems LLC

Tel: +1-706-510-6800 / +1-800-250-5920 (toll-free)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/

___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread Calvin Ellison
On Fri, Jun 12, 2020 at 4:35 PM David Villasmil <
david.villasmil.w...@gmail.com> wrote:

> Basically, the application is not processing the received packets as
> quickly as it should, so the kernel stores the packets in the buffer so it
> doesn’t have to throw them away.
>
> It’s not so difficult to understand. If this is happening all the time,
> you won’t solve this by making the buffer bigger. You solve this by
> figuring out why the application is not processing the packets fast enough.
>

That's been the point of this discussion. Unfortunately, the answers so
far have added up to "keep changing settings until you find what works
best" and "buffers are a Ponzi scheme", despite an immediate 3x performance
increase.
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread Calvin Ellison
Agreed, it's not a big deal that the kernel default is 212992; it's that
value because it's enough for lots of things. It is a big deal that not
being aware of its effect could be the difference between less than
3,000 CPS and more than 10,000.

On Fri, Jun 12, 2020 at 4:50 PM David Villasmil <
david.villasmil.w...@gmail.com> wrote:

> Keep in mind they don't make Ubuntu for SIP applications, which have their
> own idiosyncrasies. They make it general purpose. So finding a value that
> doesn't work perfectly with what you need for this very specific
> application is not a big deal.
>
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread David Villasmil
Keep in mind they don't make Ubuntu for SIP applications, which have their
own idiosyncrasies. They make it general purpose. So finding a value that
doesn't work perfectly with what you need for this very specific
application is not a big deal.

On Sat, 13 Jun 2020 at 00:38, Calvin Ellison 
wrote:

> I doubt the system will be using all of that buffer. I also don't know if
> the issue was in the receive buffer or send buffer since I changed both at
> once. Many resources are available online from people who have already done
> much more scientific testing that indicate the default values should be
> increased for certain applications, which is the reason I changed it to
> begin with. There's no one-size-fits-all for server configurations, and
> what works for this UDP application with a small number of clients might
> not work well for a different application with many TCP connections.
>
> "absolutely terrible" may be too strong of a way to put it, but the
> before and after don't lie.
>
> On Fri, Jun 12, 2020 at 4:02 PM Alex Balashov 
> wrote:
>
>> But increasing the depth of the queue by 78x (if I'm not mistaken,
>> 212992 is the default--at least, it is on all my CentOS 7.x and 8.x
>>
>> ___
> Users mailing list
> Users@lists.opensips.org
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>
-- 
Regards,

David Villasmil
email: david.villasmil.w...@gmail.com
phone: +34669448337
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread Calvin Ellison
I doubt the system will be using all of that buffer. I also don't know if
the issue was in the receive buffer or send buffer since I changed both at
once. Many resources are available online from people who have already done
much more scientific testing that indicate the default values should be
increased for certain applications, which is the reason I changed it to
begin with. There's no one-size-fits-all for server configurations, and
what works for this UDP application with a small number of clients might
not work well for a different application with many TCP connections.

"absolutely terrible" may be too strong of a way to put it, but the
before and after don't lie.

On Fri, Jun 12, 2020 at 4:02 PM Alex Balashov 
wrote:

> But increasing the depth of the queue by 78x (if I'm not mistaken,
> 212992 is the default--at least, it is on all my CentOS 7.x and 8.x
>
>
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread David Villasmil
Basically, the application is not processing the received packets as
quickly as it should, so the kernel stores the packets in the buffer so it
doesn’t have to throw them away.

It’s not so difficult to understand. If this is happening all the time, you
won’t solve this by making the buffer bigger. You solve this by figuring
out why the application is not processing the packets fast enough.


On Sat, 13 Jun 2020 at 00:28, Alex Balashov 
wrote:

> On 6/12/20 7:20 PM, Calvin Ellison wrote:
>
> > I think the important point here is that the receive buffers are used to
> > hold received data until it is read by the application. In fact, too
> > small of a receive buffer would cause packets to be discarded outright,
> > regardless of how fast the application can respond. Not knowing how
> > large of a buffer is needed was the problem, not the raw processing
> > power. It doesn't matter how fast I can eat if the server only has very
> > small plates to bring the food every trip from the kitchen.
>
> In absolute terms, this is true. But if your kitchen is putting out so
> much food that not even ~200,000 plates "in flight" will do, you've got
> a bigger problem to address and adding more plates is just papering it
> over.
>
> Monitor your receive queue scrupulously at a very high timing
> resolution. If you found default values for rmem_max to be "absolutely
> terrible", that means the backlog was increasing monotonically until you
> ran out of space. If you increase the queue depth, you will be able to
> prolong this effect for a while.
>
> The kernel's packet queue is a backstop--an emergency release valve, not
> a main thoroughfare. It's there to help you deal with ephemeral
> congestion caused by things like periodic big-lock background process
> contention, scheduler hiccups, disk controller patrol reads, etc.  But
> the base load should result in a long-run queue backlog of zero.
> Applications which properly cope with their workload don't cause
> non-trivial packet or connection queueing on the OS side.
>
> -- Alex
>
> --
> Alex Balashov | Principal | Evariste Systems LLC
>
> Tel: +1-706-510-6800 / +1-800-250-5920 (toll-free)
> Web: http://www.evaristesys.com/, http://www.csrpswitch.com/
>
> ___
> Users mailing list
> Users@lists.opensips.org
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>
-- 
Regards,

David Villasmil
email: david.villasmil.w...@gmail.com
phone: +34669448337
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread Alex Balashov

On 6/12/20 7:20 PM, Calvin Ellison wrote:

I think the important point here is that the receive buffers are used to 
hold received data until it is read by the application. In fact, too 
small of a receive buffer would cause packets to be discarded outright, 
regardless of how fast the application can respond. Not knowing how 
large of a buffer is needed was the problem, not the raw processing 
power. It doesn't matter how fast I can eat if the server only has very 
small plates to bring the food every trip from the kitchen.


In absolute terms, this is true. But if your kitchen is putting out so 
much food that not even ~200,000 plates "in flight" will do, you've got 
a bigger problem to address and adding more plates is just papering it 
over.


Monitor your receive queue scrupulously at a very high timing 
resolution. If you found default values for rmem_max to be "absolutely 
terrible", that means the backlog was increasing monotonically until you 
ran out of space. If you increase the queue depth, you will be able to 
prolong this effect for a while.


The kernel's packet queue is a backstop--an emergency release valve, not 
a main thoroughfare. It's there to help you deal with ephemeral 
congestion caused by things like periodic big-lock background process 
contention, scheduler hiccups, disk controller patrol reads, etc.  But 
the base load should result in a long-run queue backlog of zero. 
Applications which properly cope with their workload don't cause 
non-trivial packet or connection queueing on the OS side.
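
A quick way to check whether the kernel is actually dropping datagrams 
for lack of buffer space, rather than merely queueing them (assuming a 
Linux host with iproute2; the counter name is worth verifying on your 
kernel):

---
# RcvbufErrors increments each time a UDP datagram is dropped because
# the socket's receive buffer was full.
nstat -az UdpRcvbufErrors
# Or read the raw counters directly:
grep ^Udp: /proc/net/snmp
---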


-- Alex

--
Alex Balashov | Principal | Evariste Systems LLC

Tel: +1-706-510-6800 / +1-800-250-5920 (toll-free)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/

___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread Calvin Ellison
I think the important point here is that the receive buffers are used to
hold received data until it is read by the application. In fact, too small
of a receive buffer would cause packets to be discarded outright,
regardless of how fast the application can respond. Not knowing how large
of a buffer is needed was the problem, not the raw processing power. It
doesn't matter how fast I can eat if the server only has very small plates
to bring the food every trip from the kitchen.


On Fri, Jun 12, 2020 at 4:02 PM Alex Balashov 
wrote:

> Perhaps a simpler way to look at it: buffers. It's in the name - they
> buffer things.
>
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread Alex Balashov

On 6/12/20 7:08 PM, Calvin Ellison wrote:

I don't disagree that there is a physical limit to what any hardware can 
achieve. However, there was a dramatic difference between the default 
setting and the increased setting.

I do: the default setting leaves packets beyond the depth of the queue 
dropped, which, in the case of SIP requests and some responses, causes 
them to be retransmitted, which creates a positive feedback loop. 
Letting them stack up allows more of them to be attended to eventually, 
reducing that dynamic.


But telling the OS network stack to hold more packets because you can't 
process them fast enough isn't a solution to the problem of not 
processing them fast enough. You're infinitely better off just 
processing them faster.


-- Alex

--
Alex Balashov | Principal | Evariste Systems LLC

Tel: +1-706-510-6800 / +1-800-250-5920 (toll-free)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/

___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread Calvin Ellison
I don't disagree that there is a physical limit to what any hardware can
achieve. However, there was a dramatic difference between the default
setting and the increased setting. I have no explanation for this, as I am
not a kernel or network developer.  I'm happy to run the load test again
both ways and see if there was any difference in response times.


On Fri, Jun 12, 2020 at 3:51 PM Alex Balashov 
wrote:

> There's no free lunch, but it seems like you and others want one. :-)
>
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread Alex Balashov
Perhaps a simpler way to look at it: buffers. It's in the name - they 
buffer things.


If your "work queue" buffer (which is what the packet RecvQ 
fundamentally is) is chronically full, to the degree that it needs to be 
increased significantly, it means things aren't consistently leaving out 
at the other end at the same velocity that they're coming in.


In small amounts, this regulating capacity is acceptable and 
necessary--that's why buffers exist.


But increasing the depth of the queue by 78x (if I'm not mistaken, 
212992 is the default--at least, it is on all my CentOS 7.x and 8.x 
systems, which I guess also have "absolutely terrible sysctl defaults") 
is faker than a Ponzi scheme. In some other contexts, this would be 
called morally bankrupt and intellectually fraudulent. I guess here we 
call it "mad dialer CPS" or whatever.


-- Alex

On 6/12/20 6:50 PM, Alex Balashov wrote:
There's no free lunch, but it seems like you and others want one. :-) 
Increasing these values just increases the depth of the kernel's packet 
queue for the sync processes to consume as able. It doesn't mean they're 
able, and accordingly, request response time will go up.


A healthy system that is able to keep up with the load you're throwing 
at it should show a receive queue at +/- 0 most of the time, maybe with 
some ephemeral spikes but generally trending around 0. If packets are 
stacking up in the RecvQ, it means the SIP worker processes aren't 
available enough to consume them all in a timely fashion.


Leaning on async won't help here if the workload is largely CPU-bound. 
If it's largely bound over waiting on network I/O from external 
services, it merely deputises the problem of notifying you when there's 
a response from those services to the kernel. But - vitally - it won't 
get your requests processed faster, and setup latency is a very 
important consideration in real-time communications, especially from the 
perspective of interoperability with the synchronous/circuit-switched PSTN.


In short, async isn't magic, and neither is increasing the receive 
queue. It's simple thermodynamics; there's only so much CPU available, 
and depending on the nature of the workload, throughput becomes more a 
linear function of available CPU hardware threads, or less, but slower, 
if it's largely I/O-bound.


The metaphor of a balloon is appropriate. You're pushing the problem 
around by squeezing one part of the balloon, causing another to enlarge. 
Various parts of the balloon can be squeezed - async vs. sync, various 
queues and buffers, etc. But the internal volume of air held by the 
balloon is more or less the same. A little slack can be added into the 
system through your rmem_max technique, as long as you're willing to 
tolerate increased processing latency--and it will generate increased 
latency; if it didn't, you wouldn't need to increase it--but ultimately, 
you're just pushing the air around the balloon. A fixed amount of CPU 
and memory is available to accommodate the large number of processes 
that sleep on an external I/O-bound workload, and there are diminishing 
returns from both internal OpenSIPS contention and context switching.


I'm not saying there aren't some local minima and maxima, but they 
aren't as magnitudinal as folks think. It's not that Ubuntu Server is 
mistuned, it's that you're abusing it. :-) You can't put the milk back 
in the cow, although it's quite a spectacle ...


-- Alex

On 6/12/20 6:02 PM, Calvin Ellison wrote:
I noticed a way-too-small receive buffer value in the OpenSIPS startup 
messages and it turns out that a fresh Ubuntu 18 Server install has 
absolutely terrible sysctl defaults for high-performance networking. I 
got my 8-core lab from less than 2,000 CPS up to 14,000 CPS using a 
spread of all dips in non-async mode just by setting the following to 
match "maxbuffer=16777216":


net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

Does OpenSIPS have guidelines for sysctl and other OS parameters?

    Async requires the TM module which adds additional overhead and
    memory allocation.


According to the docs:
"By requiring less processes to complete the same amount of work in the
same amount of time, process context switching is minimized and
overall CPU usage is improved. Less processes will also eat up less
system memory."

So which is it? When should async be used, and when should async not 
be used? One can only invest so many hours in load testing 
combinations of sync/async, the number of children, timer_partitions, 
etc. Some fuzzy math based on CPU core count, SpecInt Rate, BogoMIPS, 
etc. would be a great starting point.


Regards,

*Calvin Ellison*
Senior Voice Operations Engineer
calvin.elli...@voxox.com 


___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users





--
Alex Balashov | Principal | Evariste Systems LLC

Tel: 

Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread Alex Balashov
There's no free lunch, but it seems like you and others want one. :-) 
Increasing these values just increases the depth of the kernel's packet 
queue for the sync processes to consume as able. It doesn't mean they're 
able, and accordingly, request response time will go up.


A healthy system that is able to keep up with the load you're throwing 
at it should show a receive queue at +/- 0 most of the time, maybe with 
some ephemeral spikes but generally trending around 0. If packets are 
stacking up in the RecvQ, it means the SIP worker processes aren't 
available enough to consume them all in a timely fashion.


Leaning on async won't help here if the workload is largely CPU-bound. 
If it's largely bound over waiting on network I/O from external 
services, it merely deputises the problem of notifying you when there's 
a response from those services to the kernel. But - vitally - it won't 
get your requests processed faster, and setup latency is a very 
important consideration in real-time communications, especially from the 
perspective of interoperability with the synchronous/circuit-switched PSTN.


In short, async isn't magic, and neither is increasing the receive 
queue. It's simple thermodynamics; there's only so much CPU available, 
and depending on the nature of the workload, throughput becomes more a 
linear function of available CPU hardware threads, or less, but slower, 
if it's largely I/O-bound.


The metaphor of a balloon is appropriate. You're pushing the problem 
around by squeezing one part of the balloon, causing another to enlarge. 
Various parts of the balloon can be squeezed - async vs. sync, various 
queues and buffers, etc. But the internal volume of air held by the 
balloon is more or less the same. A little slack can be added into the 
system through your rmem_max technique, as long as you're willing to 
tolerate increased processing latency--and it will generate increased 
latency; if it didn't, you wouldn't need to increase it--but ultimately, 
you're just pushing the air around the balloon. A fixed amount of CPU 
and memory is available to accommodate the large number of processes 
that sleep on an external I/O-bound workload, and there are diminishing 
returns from both internal OpenSIPS contention and context switching.


I'm not saying there aren't some local minima and maxima, but they 
aren't as magnitudinal as folks think. It's not that Ubuntu Server is 
mistuned, it's that you're abusing it. :-) You can't put the milk back 
in the cow, although it's quite a spectacle ...


-- Alex

On 6/12/20 6:02 PM, Calvin Ellison wrote:
I noticed a way-too-small receive buffer value in the OpenSIPS startup 
messages and it turns out that a fresh Ubuntu 18 Server install has 
absolutely terrible sysctl defaults for high-performance networking. I 
got my 8-core lab from less than 2,000 CPS up to 14,000 CPS using a 
spread of all dips in non-async mode just by setting the following to 
match "maxbuffer=16777216":


net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

Does OpenSIPS have guidelines for sysctl and other OS parameters?

Async requires the TM module which adds additional overhead and
memory allocation.


According to the docs:
"By requiring less processes to complete the same amount of work in the
same amount of time, process context switching is minimized and
overall CPU usage is improved. Less processes will also eat up less
system memory."

So which is it? When should async be used, and when should async not be 
used? One can only invest so many hours in load testing combinations of 
sync/async, the number of children, timer_partitions, etc. Some fuzzy 
math based on CPU core count, SpecInt Rate, BogoMIPS, etc. would be a 
great starting point.


Regards,

*Calvin Ellison*
Senior Voice Operations Engineer
calvin.elli...@voxox.com 


___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users



--
Alex Balashov | Principal | Evariste Systems LLC

Tel: +1-706-510-6800 / +1-800-250-5920 (toll-free)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/

___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-12 Thread Calvin Ellison
 I noticed a way-too-small receive buffer value in the OpenSIPS startup
messages and it turns out that a fresh Ubuntu 18 Server install has
absolutely terrible sysctl defaults for high-performance networking. I got
my 8-core lab from less than 2,000 CPS up to 14,000 CPS using a spread of
all dips in non-async mode just by setting the following to match
"maxbuffer=16777216":

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
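
For reference, a minimal way to check and persist those values, assuming 
a sysctl.d-style distro such as Ubuntu 18 (my understanding is that 
OpenSIPS's maxbuffer is capped by net.core.rmem_max):

---
# Inspect the current limits:
sysctl net.core.rmem_max net.core.rmem_default net.core.wmem_max
# Persist the larger values and reload:
cat <<'EOF' | sudo tee /etc/sysctl.d/90-opensips-buffers.conf
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
EOF
sudo sysctl --system
---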

Does OpenSIPS have guidelines for sysctl and other OS parameters?

> Async requires the TM module which adds additional overhead and memory
> allocation.


According to the docs:
"By requiring less processes to complete the same amount of work in the
same amount of time, process context switching is minimized and
overall CPU usage is improved. Less processes will also eat up less
system memory."

So which is it? When should async be used, and when should async not be
used? One can only invest so many hours in load testing combinations of
sync/async, the number of children, timer_partitions, etc. Some fuzzy math
based on CPU core count, SpecInt Rate, BogoMIPS, etc. would be a great
starting point.

Regards,

*Calvin Ellison*
Senior Voice Operations Engineer
calvin.elli...@voxox.com
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-10 Thread Saint Michael
I do 3000+ CPS with Opensips and MySQL using unixODBC, no problem.  My
query is for routing only. I read a 280 MM table (RocksDB)  for every call.
So it's comparable.
It seems to work flawlessly. However, I only use $60,000 servers from Dell,
R920s, with 64 physical cores and 120 threads, plus 1.5 TB of RAM. My
Opensips is under Vmware ESX 6.X.
Honestly, I paid an Opensips guru to assemble the application for me.
I designed the logic based on my previous switch for Asterisk that could not
handle the pressure. Basically all routing is done in MariaDB. Opensips
just asks a question using a Stored Procedure.
The brain is the database, and opensips executes instructions. It just
works.


On Wed, Jun 10, 2020 at 5:15 PM Jon Abrams  wrote:

> I built a similar functioning platform back in 2015 based on similar
> hardware (Westmere Xeons, hyperthreading enabled)  running bare metal on
> Centos6. At some point we bumped it up to dual X5670s (cheap upgrade
> nowadays), but it was handling 12000 CPS peaks on 1 server with 3000-5000
> CPS sustained for large parts of the day. I don't think you are too far off
> in hardware.
> This was on version 1.9, so there was no Async. IIRC it was either 32 or
> 64 children. Async requires the TM module which adds additional overhead
> and memory allocation.
> The LRN database was stored in mysql with a very simple table (TN, LRN) to
> keep memory usage down so that it could be pinned in memory (server had 48
> or 72GB I think). MySQL was set to dump the innodb buffer cache to disk on
> restart so that the whole database would be back in memory on restart.
> Doing a full table scan would initially populate the MySQL cache.
> Blacklists and other smaller datasets were stored in OpenSIPs using the
> userblacklist module. There are better ways to do that in version 2 and
> onwards. Bigger lists were stored in memcached. I prefer redis for this
> purpose now.
>
> I would suggest simplifying testing by using a single MySQL server and
> bypassing the F5 to eliminate that as a source of connection problems or
> additional latencies.
> In the OpenSIPs script, eliminate everything but 1 dip, probably just dip
> the LRN to start.
> Performance test the stripped down scenario with sipp. Based on past
> experience, you should be able to hit or come close to your performance
> goal with only 1 dip in play.
> If you do hit your performance targets, keep adding more dips one by one
> until it breaks.
> If you can't reach your performance target with this stripped down
> scenario, then I'd suggest testing without the async and transactions
> enabled. I wouldn't think transactions would be a necessity in this
> scenario. I ran into CPS problems on that other open source SIP server when
> using async under heavy load. The transaction creation was chewing up CPU
> and memory. I'm not sure how different the implementation is here.
> I seem to start having problems with sipp when I hit a few thousand CPS
> due to it being single threaded. You probably will need to run multiple
> parallel sipp processes for your load test, if not already.
> If using an OS with systemd journald for logging, that will be a big
> bottleneck in and of itself with even small amounts of logging.
> In 1.9, I hacked together a module to create a timestamp string with ms
> for logging query latencies for diagnostic purposes. There may be a better
> out of the box way to do it now.
> For children sizing, I would suggest benchmarking with at least 16
> children and then doubling it to compare performance.
> Watch the logs for out of memory or defragment messages and bump up shared
> memory or package memory if necessary. Package memory is probably going to
> be your problem, but it doesn't sound like it is a problem yet.
>
> BR,
> Jon Abrams
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Wed, Jun 10, 2020 at 3:20 PM Calvin Ellison 
> wrote:
>
>> We've checked our F5 BigIP configuration and added a second database
>> server to the pool. Both DBs have been checked for max connections, open
>> files, etc. Memcached has been moved to a dedicated server. Using a SIPp
>> scenario for load testing from a separate host, things seem to fall apart
>> on OpenSIPS around 3,000 CPS with every CPU core at or near 100% and no
>> logs indicating fallback to sync/blocking mode. Both databases barely
>> noticed the few hundred connections. Does this seem reasonable for a dual
>> CPU server with 8 cores and 16 threads?
>>
>>
>> https://ark.intel.com/content/www/us/en/ark/products/47925/intel-xeon-processor-e5620-12m-cache-2-40-ghz-5-86-gt-s-intel-qpi.html
>>
>> What is the OpenSIPS opinion on Hyper-Threading?
>>
>> Is there a way to estimate max CPS based on SPECrate, BogoMIPS, or some
>> other metric?
>>
>> I would love to know if my opensips.cfg has any mistakes, omissions, or
>> inefficiencies. Is there a person or group who does sanity checks?
>>
>> What should I be looking at within OpenSIPS during a load test to
>> identify bottlenecks?
>>
>> I'm still 

Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-10 Thread Jon Abrams
I built a similar functioning platform back in 2015 based on similar
hardware (Westmere Xeons, hyperthreading enabled)  running bare metal on
Centos6. At some point we bumped it up to dual X5670s (cheap upgrade
nowadays), but it was handling 12000 CPS peaks on 1 server with 3000-5000
CPS sustained for large parts of the day. I don't think you are too far off
in hardware.
This was on version 1.9, so there was no Async. IIRC it was either 32 or 64
children. Async requires the TM module which adds additional overhead and
memory allocation.
The LRN database was stored in mysql with a very simple table (TN, LRN) to
keep memory usage down so that it could be pinned in memory (server had 48
or 72GB I think). MySQL was set to dump the innodb buffer cache to disk on
restart so that the whole database would be back in memory on restart.
Doing a full table scan would initially populate the MySQL cache.
Blacklists and other smaller datasets were stored in OpenSIPs using the
userblacklist module. There are better ways to do that in version 2 and
onwards. Bigger lists were stored in memcached. I prefer redis for this
purpose now.
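
For what it's worth, the buffer pool persistence mentioned above might 
be configured roughly like this (variable names per MySQL 5.6+/MariaDB 
10+; the file path and table name are illustrative):

---
# Dump the InnoDB buffer pool page list at shutdown and reload it at
# startup, so the working set is warm again after a restart.
cat <<'EOF' | sudo tee /etc/mysql/conf.d/buffer-pool-warm.cnf
[mysqld]
innodb_buffer_pool_dump_at_shutdown = ON
innodb_buffer_pool_load_at_startup  = ON
EOF
# Warm the cache initially with a full scan (table name illustrative):
mysql -e 'SELECT COUNT(*) FROM lrn.lrn_map;'
---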

I would suggest simplifying testing by using a single MySQL server and
bypassing the F5 to eliminate that as a source of connection problems or
additional latencies.
In the OpenSIPs script, eliminate everything but 1 dip, probably just dip
the LRN to start.
Performance test the stripped down scenario with sipp. Based on past
experience, you should be able to hit or come close to your performance
goal with only 1 dip in play.
If you do hit your performance targets, keep adding more dips one by one
until it breaks.
If you can't reach your performance target with this stripped down
scenario, then I'd suggest testing without the async and transactions
enabled. I wouldn't think transactions would be a necessity in this
scenario. I ran into CPS problems on that other open source SIP server when
using async under heavy load. The transaction creation was chewing up CPU
and memory. I'm not sure how different the implementation is here.
I seem to start having problems with sipp when I hit a few thousand CPS due
to it being single threaded. You probably will need to run multiple
parallel sipp processes for your load test, if not already.
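
For example, spreading the offered load across a few sipp instances 
might look like this (scenario file, target, rates, and ports are all 
illustrative):

---
TARGET="10.0.0.10:5060"
# Four sipp processes at 3,000 CPS each, each bound to its own local port.
for i in 1 2 3 4; do
  sipp -sf uac_invite.xml -r 3000 -rp 1000 -l 20000 \
       -p $((5070 + i)) "$TARGET" >"sipp_$i.log" 2>&1 &
done
wait
---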
If using an OS with systemd journald for logging, that will be a big
bottleneck in and of itself with even small amounts of logging.
In 1.9, I hacked together a module to create a timestamp string with ms for
logging query latencies for diagnostic purposes. There may be a better out
of the box way to do it now.
For children sizing, I would suggest benchmarking with at least 16 children
and then doubling it to compare performance.
Watch the logs for out of memory or defragment messages and bump up shared
memory or package memory if necessary. Package memory is probably going to
be your problem, but it doesn't sound like it is a problem yet.

BR,
Jon Abrams
















On Wed, Jun 10, 2020 at 3:20 PM Calvin Ellison 
wrote:

> We've checked our F5 BigIP configuration and added a second database
> server to the pool. Both DBs have been checked for max connections, open
> files, etc. Memcached has been moved to a dedicated server. Using a SIPp
> scenario for load testing from a separate host, things seem to fall apart
> on OpenSIPS around 3,000 CPS with every CPU core at or near 100% and no
> logs indicating fallback to sync/blocking mode. Both databases barely
> noticed the few hundred connections. Does this seem reasonable for a dual
> CPU server with 8 cores and 16 threads?
>
>
> https://ark.intel.com/content/www/us/en/ark/products/47925/intel-xeon-processor-e5620-12m-cache-2-40-ghz-5-86-gt-s-intel-qpi.html
>
> What is the OpenSIPS opinion on Hyper-Threading?
>
> Is there a way to estimate max CPS based on SPECrate, BogoMIPS, or some
> other metric?
>
> I would love to know if my opensips.cfg has any mistakes, omissions, or
> inefficiencies. Is there a person or group who does sanity checks?
>
> What should I be looking at within OpenSIPS during a load test to identify
> bottlenecks?
>
> I'm still looking for guidance on the things below, especially children
> vs timer_partitions:
>
> Is there an established method for fine-tuning these things?
>> shared memory
>> process memory
>> children
>> db_max_async_connections
>> listen=... use_children
>> modparam("tm", "timer_partitions", ?)
>
>
> What else is worth considering?
>
> Regards,
>
> Calvin Ellison
> Senior Voice Operations Engineer
> calvin.elli...@voxox.com
>
> On Thu, Jun 4, 2020 at 5:18 PM David Villasmil <
> david.villasmil.w...@gmail.com> wrote:
> >
> > Maybe you are hitting the max connections? How many connections are
> there when it starts to show those errors?
> ___
> Users mailing list
> Users@lists.opensips.org
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>
___
Users mailing list
Users@lists.opensips.org

Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-10 Thread Calvin Ellison
We've checked our F5 BigIP configuration and added a second database server
to the pool. Both DBs have been checked for max connections, open files,
etc. Memcached has been moved to a dedicated server. Using a SIPp scenario
for load testing from a separate host, things seem to fall apart on
OpenSIPS around 3,000 CPS with every CPU core at or near 100% and no logs
indicating fallback to sync/blocking mode. Both databases barely noticed
the few hundred connections. Does this seem reasonable for a dual CPU
server with 8 cores and 16 threads?

https://ark.intel.com/content/www/us/en/ark/products/47925/intel-xeon-processor-e5620-12m-cache-2-40-ghz-5-86-gt-s-intel-qpi.html

What is the OpenSIPS opinion on Hyper-Threading?

Is there a way to estimate max CPS based on SPECrate, BogoMIPS, or some
other metric?

I would love to know if my opensips.cfg has any mistakes, omissions, or
inefficiencies. Is there a person or group who does sanity checks?

What should I be looking at within OpenSIPS during a load test to identify
bottlenecks?

I'm still looking for guidance on the things below, especially children
vs timer_partitions:

Is there an established method for fine-tuning these things?
> shared memory
> process memory
> children
> db_max_async_connections
> listen=... use_children
> modparam("tm", "timer_partitions", ?)


What else is worth considering?

Regards,

Calvin Ellison
Senior Voice Operations Engineer
calvin.elli...@voxox.com

On Thu, Jun 4, 2020 at 5:18 PM David Villasmil <
david.villasmil.w...@gmail.com> wrote:
>
> Maybe you are hitting the max connections? How many connections are there
when it starts to show those errors?
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-05 Thread David Villasmil
No idea, but can always check with any of several utilities, I.e.: netstat

On Fri, 5 Jun 2020 at 01:37, Calvin Ellison 
wrote:

> On Thu, Jun 4, 2020 at 5:18 PM David Villasmil
>  wrote:
> >
> > Maybe you are hitting the max connections? How many connections are
> there when it starts to show those errors?
>
> I'd definitely benefit from a monitor on this. Is this available from
> within opensips?
>
> ___
> Users mailing list
> Users@lists.opensips.org
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>
-- 
Regards,

David Villasmil
email: david.villasmil.w...@gmail.com
phone: +34669448337
___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-04 Thread Calvin Ellison
On Thu, Jun 4, 2020 at 5:18 PM David Villasmil
 wrote:
>
> Maybe you are hitting the max connections? How many connections are there 
> when it starts to show those errors?

I'd definitely benefit from a monitor on this. Is this available from
within opensips?

___
Users mailing list
Users@lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users


Re: [OpenSIPS-Users] Fine tuning high CPS and mysql queries

2020-06-04 Thread David Villasmil
Maybe you are hitting the max connections? How many connections are there
when it starts to show those errors?

Re: [OpenSIPS-Users] Fine tuning high CPS and msyql queries

2020-06-04 Thread Calvin Ellison
> A) Is the LRN database located locally on the OpenSIPs box or is it remote?

We are using an F5 BIG-IP to proxy a pool of database servers.
Opensips is showing two connection-related errors:

Jun  4 10:41:48 TC-521 /usr/sbin/opensips[12318]:
ERROR:db_mysql:db_mysql_connect: driver error(2013): Lost connection
to MySQL server at 'reading authorization packet', system error: 110
Jun  4 10:41:48 TC-521 /usr/sbin/opensips[12318]:
ERROR:db_mysql:db_mysql_new_connection: initial connect failed
Jun  4 10:41:48 TC-521 /usr/sbin/opensips[12318]:
ERROR:core:db_init_async: failed to open new DB connection on
mysql://:@10.0.5.38:0/
Jun  4 10:41:48 TC-521 /usr/sbin/opensips[12318]:
INFO:db_mysql:db_mysql_async_raw_query: Failed to open new connection
(current: 1 + 8). Running in sync mode!
Jun  4 10:41:48 TC-521 /usr/sbin/opensips[12318]:
INFO:db_mysql:switch_state_to_disconnected: disconnect event for
0x7f8903f16d10
Jun  4 10:41:48 TC-521 /usr/sbin/opensips[12318]:
INFO:db_mysql:reset_all_statements: resetting all statements on
connection: (0x7f8903f16bb0) 0x7f8903f16d10
Jun  4 10:41:48 TC-521 /usr/sbin/opensips[12318]:
INFO:db_mysql:connect_with_retry: re-connected successful for
0x7f8903f16d10

Jun  4 10:44:29 TC-521 /usr/sbin/opensips[12342]:
ERROR:db_mysql:db_mysql_connect: driver error(2003): Can't connect to
MySQL server on '10.0.5.38' (110)
Jun  4 10:44:29 TC-521 /usr/sbin/opensips[12342]:
ERROR:db_mysql:db_mysql_new_connection: initial connect failed
Jun  4 10:44:29 TC-521 /usr/sbin/opensips[12342]:
ERROR:core:db_init_async: failed to open new DB connection on
mysql://:@10.0.5.38:0/
Jun  4 10:44:29 TC-521 /usr/sbin/opensips[12342]:
INFO:db_mysql:db_mysql_async_raw_query: Failed to open new connection
(current: 1 + 10). Running in sync mode!
Jun  4 10:44:29 TC-521 /usr/sbin/opensips[12342]:
INFO:db_mysql:switch_state_to_disconnected: disconnect event for
0x7f8903f16d10
Jun  4 10:44:29 TC-521 /usr/sbin/opensips[12342]:
INFO:db_mysql:reset_all_statements: resetting all statements on
connection: (0x7f8903f16bb0) 0x7f8903f16d10
Jun  4 10:44:29 TC-521 /usr/sbin/opensips[12342]:
INFO:db_mysql:connect_with_retry: re-connected successful for
0x7f8903f16d10

MariaDB is also showing an error from its perspective:

2020-06-04 23:40:27 64783 [Warning] Aborted connection 64783 to db:
'unconnected' user: 'anonymous' host: '8.38.42.13' (Got timeout
reading communication packets)

> B) Have you tried only doing sync database queries? Async introduces some 
> overhead, and I'm not sure if it causes extra database connections to be 
> created. When using sync there is a connection per child process that stays 
> up.

Using synchronous mode appeared to be causing context switching issues
under heavy load. We specifically moved to async for this reason and
that appeared to reduce the CPU load dramatically. From the docs:

"Using the asynchronous, "suspend-resume" logic instead of forking a
large number of processes in order to scale also has the advantage of
optimizing system resource usage, increasing its maximal throughput.
By requiring less processes to complete the same amount of work in the
same amount of time, process context switching is minimized and
overall CPU usage is improved. Less processes will also eat up less
system memory."

I've been tweaking each of the configuration settings I've mentioned,
but without any clear path forward. Would 3.x provide any solutions?

Is it possible to have too many children or timer partitions, and
starve opensips with context switches? Would that cause connection
issues?

> C) Does the database have enough memory to contain the LRN and DNC datasets 
> fully in memory? The extra latency for the non-cache hits sent to the 
> database may stack up if the database has to hit disk.

The DB reports query response times around 0.001s and doesn't show any
sign of strain. I'm not personally familiar with the TokuDB engine, but
I'm led to believe the entire dataset is in memory. I have two DBAs
triple-checking things. It's possible we're hitting a max-connections or
open-files limit that's set too low. Sometimes our peak hours include
traffic spikes as well.

> D) How many child processes are you using now? If you are hitting 100% you 
> may need to increase them.

Only one child hits 100% at first; the others topple over after that.
This seems to be related to the intermittent database connection errors.
We'll see what raising max connections and the ulimits on the server
does. I've also backed off on children and increased the async
connection pool size, keeping the same total maximum number of
connections. Presumably this will reduce context switches and timer
delays.
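
To make the arithmetic concrete (the numbers below are illustrative, not
our production values), each worker keeps one synchronous connection plus
up to db_max_async_connections asynchronous ones per DB URL, so in
opensips.cfg:

# illustrative only: roughly children x (1 + db_max_async_connections)
# = 8 x (1 + 10) = 88 connections on the database side, per DB URL
children=8
db_max_async_connections=10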

> E) Are your memcached processes using heavy cpu? If you are caching multiple
> lists, I've found it helps to use a unique memcached instance per list.

All of the various SIP dips call the same DB stored procedure, which
returns many fields. Those fields are cached as a CSV string, so any
cached dip can be used by any

Re: [OpenSIPS-Users] Fine tuning high CPS and msyql queries

2020-06-04 Thread Saint Michael
Calvin, feel free to log in to the container and change the MariaDB
config; it is all in a single file, /etc/my.cnf.
I use TokuDB, a newer engine that is better than InnoDB. Lately I have
shifted to RocksDB, which is better still, designed by Facebook.
I have not updated that box because "if it ain't broke, don't fix it".
But I am open to changing the engine in the second container, if you
so desire.
MariaDB is better than MySQL here because it gives us a thread pool,
something MySQL offers only to paying customers. So I think it should
work.



Re: [OpenSIPS-Users] Fine tuning high CPS and msyql queries

2020-06-04 Thread Jon Abrams
A) Is the LRN database located locally on the OpenSIPs box or is it remote?
B) Have you tried only doing sync database queries? Async introduces some
overhead, and I'm not sure if it causes extra database connections to be
created. When using sync there is a connection per child process that stays
up.
C) Does the database have enough memory to contain the LRN and DNC datasets
fully in memory? The extra latency for the non-cache hits sent to the
database may stack up if the database has to hit disk.
D) How many child processes are you using now? If you are hitting 100% you
may need to increase them.
E) Are your memcached processes using heavy cpu? If you are caching
multiple lists, I've found it helps to use a unique memcached instance
per list.
F) Look for memory-related log messages. If memory starts getting
exhausted you will see defrag messages. This will chew up available
computation cycles.

- Jon Abrams


[OpenSIPS-Users] Fine tuning high CPS and msyql queries

2020-06-04 Thread Calvin Ellison
The scenario is INVITE -> MySQL query -> non-200 final response. No
calls are connected here, only dipping things like LRN, Do Not Call,
and Wireless/Landline. A similar service runs on a second port,
specific to a different kind of traffic and dip. We're using async
avp_db_query and memcached, with about 3:1 cache hits.

Our target is up to 10,000 CPS across two OpenSIPS servers, which are
dual-CPU Xeon E5620 machines with 48G RAM. Both run memcached, and both
servers use both memcached instances to share a distributed cache thanks
to this:
'modparam("cachedb_memcached","cachedb_url","memcached:lrn://lrn-d,lrn-e/")'.
At a glance there are over 200 million total cached items, distributed
nearly equally.
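
The cache-aside logic is roughly this (a simplified sketch; the key
format, TTL, reply codes, and route names are placeholders, and the real
query is a stored procedure returning a CSV of fields):

route[lrn_dip] {
    if (cache_fetch("memcached:lrn", "lrn_$rU", $avp(dip_csv))) {
        # cache hit (about 3 of every 4 dips) -- no DB round trip
        send_reply("604", "Does Not Exist Anywhere");
        exit;
    }
    # cache miss: dip the DB asynchronously, then store the result
    async(avp_db_query("SELECT lrn FROM lrn_data WHERE did='$rU'",
            "$avp(dip_csv)"),
        resume_dip);
}

route[resume_dip] {
    if ($rc > 0)
        cache_store("memcached:lrn", "lrn_$rU", "$avp(dip_csv)", 86400);
    # the final non-200 reply is built from $avp(dip_csv) here
    send_reply("604", "Does Not Exist Anywhere");
}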

The issue is that individual child processes start getting stuck at
100% CPU. Logs indicate connection failures to the MySQL database
causing children to run in sync mode, and there are warnings about
delayed timer jobs tm-timer and blcore-expire. Eventually, the service
becomes unresponsive. Restarting OpenSIPS restores service and the
children return to single-digit CPU utilization, but eventually the
children get stuck again.

I'm not certain if the issue is on the database server, or if the
opensips servers are overloaded, or if the config is just not right
yet.

Is there an established method for fine-tuning these things?
shared memory
process memory
children
db_max_async_connections
listen=... use_children
modparam("tm", "timer_partitions", ?)

What else is worth considering?
Does a child ever return to async mode after running in sync mode?
How do I know when my servers have reached their limit?
opensips.cfg is available on request.

version: opensips 2.4.7 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC,
F_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16,
MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: 9e1fcc915
main.c compiled on  with gcc 7

*re-built using dpkg-buildpackage including the patch to support DB
floating point types:
https://opensips.org/pipermail/users/2020-March/042528.html

$ lsb_release -d
Description:Ubuntu 18.04.4 LTS

$ uname -a
Linux TC-521 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC
2020 x86_64 x86_64 x86_64 GNU/Linux

$ free -mw
              total    used    free  shared  buffers   cache  available
Mem:          48281    1085     337      87     1729    45128      46551

$ lscpu
Architecture:x86_64
CPU op-mode(s):  32-bit, 64-bit
Byte Order:  Little Endian
CPU(s):  16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):   2
NUMA node(s):2
Vendor ID:   GenuineIntel
CPU family:  6
Model:   44
Model name:  Intel(R) Xeon(R) CPU   E5620  @ 2.40GHz
Stepping:2
CPU MHz: 2527.029
BogoMIPS:4788.05
Virtualization:  VT-x
L1d cache:   32K
L1i cache:   32K
L2 cache:256K
L3 cache:12288K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15
Flags:   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts
rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca
sse4_1 sse4_2 popcnt aes lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow
vnmi flexpriority ept vpid dtherm ida arat flush_l1d

Regards,

Calvin Ellison
Senior Voice Operations Engineer
calvin.elli...@voxox.com
