Re: Few problems seen in haproxy? (threads, connections).

2018-10-16 Thread Krishna Kumar (Engineering)
Hi Willy,

My systems were out of rotation for some other tests, so I did not get to
this till now. I have pulled the latest bits just now and tested. Regarding
maxconn, I simply set maxconn in global/defaults to 1 million and have this
line in the backend section:
default-server maxconn 100
I have not seen the non-null Queue/Max you mentioned earlier.
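For reference, a minimal sketch of the relevant pieces of the configuration
described above (section and server names are placeholders, not the actual
config):

    global
        maxconn 1000000

    defaults
        maxconn 1000000

    backend bk_servers
        default-server maxconn 100
        # around 60 backend servers are configured along these lines
        server srv1 10.0.0.1:80
        server srv2 10.0.0.2:80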

The FD lock time has gone down to zero, but the LB time has increased by
about 50% from last time (7700 ns to 11600 ns; I am using 'balance
leastconn'). The test was run for 1 minute:
$ wrk -c 4800 -t 48 -d 60s http://www.flipkart.com/128

The results were for 32 threads, which is the same configuration I tested
with earlier. Both of these tests were run with threads pinned to NUMA-1
cores (cores 1, 3, 5, ... 47) and IRQs pinned to NUMA-0 cores (0, 2, 4,
... 46). However, since there are more threads than distinct NUMA-1 CPUs,
the thread pinning recycles CPUs from 1-47 back to 1-15, which may explain
the much higher lock numbers that I am seeing. When I changed this to use
all CPUs (0-31), the LBPRM lock took 74339.117 ns per operation, but
performance dropped from 210K to 80K.
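For reference, a sketch of the IRQ pinning described above (the interface
name "eth0" is an assumption; adapt it and the CPU step to the actual NIC
and topology):

    # stop irqbalance so the manual affinity settings stick
    systemctl stop irqbalance
    # pin each NIC queue interrupt to an even-numbered (NUMA-0) CPU
    cpu=0
    for irq in $(awk -F: '/eth0/ {gsub(/ /, "", $1); print $1}' /proc/interrupts); do
        echo "$cpu" > "/proc/irq/$irq/smp_affinity_list"
        cpu=$(( (cpu + 2) % 48 ))
    done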

Overall, I am not yet at ease with threading, or I will have to settle for
12 threads on the 12 non-hyperthreaded cores of a single socket.

The lock output for the case where all threads are pinned to NUMA-1 cores
(and hence some cores run two threads) is inlined at the end of this mail.

Thanks,
- Krishna

Stats about Lock FD:
# write lock  : 2
# write unlock: 2 (0)
# wait time for write : 0.001 msec
# wait time for write/lock: 302.000 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock TASK_RQ:
# write lock  : 373317
# write unlock: 373317 (0)
# wait time for write : 341.875 msec
# wait time for write/lock: 915.775 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock TASK_WQ:
# write lock  : 373432
# write unlock: 373432 (0)
# wait time for write : 491.524 msec
# wait time for write/lock: 1316.235 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock LISTENER:
# write lock  : 1248
# write unlock: 1248 (0)
# wait time for write : 0.295 msec
# wait time for write/lock: 236.341 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock PROXY:
# write lock  : 12524202
# write unlock: 12524202 (0)
# wait time for write : 20979.972 msec
# wait time for write/lock: 1675.154 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock SERVER:
# write lock  : 50100330
# write unlock: 50100330 (0)
# wait time for write : 76908.311 msec
# wait time for write/lock: 1535.086 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock LBPRM:
# write lock  : 50096808
# write unlock: 50096808 (0)
# wait time for write : 584505.012 msec
# wait time for write/lock: 11667.510 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock BUF_WQ:
# write lock  : 35653802
# write unlock: 35653802 (0)
# wait time for write : 80406.420 msec
# wait time for write/lock: 2255.199 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock STRMS:
# write lock  : 9602
# write unlock: 9602 (0)
# wait time for write : 5.613 msec
# wait time for write/lock: 584.594 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock VARS:
# write lock  : 37596611
# write unlock: 37596611 (0)
# wait time for write : 2285.148 msec
# wait time for write/lock: 60.781 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec


On Mon, Oct 15, 2018 at 11:14 PM Willy Tarreau  wrote:

> Hi again,
>
> finally I got rid of the FD lock for single-threaded accesses (most of
> them), and based on Olivier's suggestion, I implemented a per-thread
> wait queue, and cache-aligned some list heads to avoid undesired cache
> line sharing. For me all of this combined resulted in a performance
> increase of 25% on a 12-threads workload. I'm interested in your test
> results, all of this is in the latest master.
>
> If you still see LBPRM a lot, I can send you the experimental patch
> to move the element inside the tree without unlinking/relinking it
> and we can see if that provides any benefit or not (I'm not convinced).
>
> Cheers,
> Willy
>


Re: Few problems seen in haproxy? (threads, connections).

2018-10-15 Thread Willy Tarreau
Hi again,

finally I got rid of the FD lock for single-threaded accesses (most of
them), and based on Olivier's suggestion, I implemented a per-thread
wait queue, and cache-aligned some list heads to avoid undesired cache
line sharing. For me all of this combined resulted in a performance
increase of 25% on a 12-threads workload. I'm interested in your test
results, all of this is in the latest master.

If you still see LBPRM a lot, I can send you the experimental patch
to move the element inside the tree without unlinking/relinking it
and we can see if that provides any benefit or not (I'm not convinced).

Cheers,
Willy



Re: Few problems seen in haproxy? (threads, connections).

2018-10-14 Thread Willy Tarreau
Hi Krishna,

On Thu, Oct 11, 2018 at 12:04:31PM +0530, Krishna Kumar (Engineering) wrote:
> I must say the improvements are pretty impressive!
> 
> Earlier number reported with 24 processes: 519K
> Earlier number reported with 24 threads:  79K
> New RPS with system irq tuning, today's git,
>configuration changes, 24 threads:353K
> Old code with same tuning gave:  290K

OK that's much better but I'm still horrified by the time taken in
the load balancing algorithm. I thought it could be fwlc_reposition(),
which contains an eb32_delete()+eb32_insert(), so I decided to replace
this with a new eb32_move() which moves the node within the tree, and
it didn't change anything here. Also I figured that I cannot manage to
reach that high time spent in this lock (300ms here, 58s for you). There
is one possible difference that might explain it, do you have a maxconn
setting on your servers ? If so, is it possible that it's reached ? You
can take a look at your stats page and see if the "Queue/Max" entry for
any backend is non-null.
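For reference, the same numbers can also be read from the stats socket (a
sketch; the socket path is an assumption and requires a "stats socket" line
in the global section):

    echo "show stat" | socat stdio /var/run/haproxy.sock | \
        cut -d, -f1,2,3,4 | column -s, -t
    # fields 3 and 4 are qcur/qmax, i.e. the "Queue" Cur/Max columns
    # of the stats page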

Indeed, I'm seeing that once a server is saturated, we skip it for the
next one. This part can be expensive. Ideally we should remove such servers
from the tree until they're unblocked, but there is one special case making
this difficult, which is the dynamic limitation (minconn+maxconn+fullconn).
However I think we could improve this so that only this use case would be
affected and not the other ones.
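As an illustration of that dynamic limitation (a sketch with placeholder
names, not taken from this thread's configuration):

    backend bk_app
        fullconn 1000
        # each server's effective maxconn is scaled between minconn and
        # maxconn depending on the backend's load relative to fullconn
        server srv1 10.0.0.1:80 minconn 50 maxconn 500
        server srv2 10.0.0.2:80 minconn 50 maxconn 500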

I'm also seeing that this lock could be replaced by an RW lock. But before
taking a deeper look, I'm interested in verifying that it's indeed the
situation you're facing.

Thanks,
Willy



Re: Few problems seen in haproxy? (threads, connections).

2018-10-11 Thread Krishna Kumar (Engineering)
I must say the improvements are pretty impressive!

Earlier number reported with 24 processes: 519K
Earlier number reported with 24 threads:  79K
New RPS with system irq tuning, today's git,
   configuration changes, 24 threads:353K
Old code with same tuning gave:  290K

My test machine is a 2-NUMA-node server, with CPUs 0, 2, ..., 22 on node 0
and 1, 3, ..., 23 on node 1. That adds up to 24 CPUs; the remaining 24 are
HT siblings: node 0 has 24, 26, ..., 46 and node 1 has 25, 27, ..., 47. This
may explain why it scales well up to 24. The 16 IRQs of the NIC are pinned
to CPUs 0, 2, ..., 30. Hoping performance improves further.

12 threads: 280K
16 threads: 318K
24 threads: 353K (occasional drop till 330K)
32 threads: 238K

I am attaching 2 text files of the lock metrics for 24 and 32 threads. A
vimdiff shows the differences nicely (fd, task_rq, task_wq, proxy, server,
lbprm and buf_wq increased significantly).
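For reference, these per-lock statistics are what HAProxy dumps on exit when
built with lock debugging (a sketch; the DEBUG flag is the relevant part,
the other options follow the -vv output quoted later in this thread):

    make TARGET=linux2628 USE_OPENSSL=1 USE_PCRE=1 USE_ZLIB=yes \
         DEBUG="-DDEBUG_THREAD"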

Thanks!

On Thu, Oct 11, 2018 at 8:53 AM Krishna Kumar (Engineering) <
krishna...@flipkart.com> wrote:

> Thanks, will do that.
>
> On Thu, Oct 11, 2018 at 8:37 AM Willy Tarreau  wrote:
>
>> On Thu, Oct 11, 2018 at 08:18:21AM +0530, Krishna Kumar (Engineering)
>> wrote:
>> > Hi Willy,
>> >
>> > Thank you very much for the in-depth analysis and configuration setting
>> > suggestions.
>> > I believe I have got the 3 key items to continue based on your mail:
>> >
>> > 1. Thread pinning
>> > 2. Fix system irq pinning accordingly
>> > 3. Listen on all threads.
>> >
>> > I will post the configuration changes and the results.
>>
>> By the way, please pull the latest master fixes. I've addressed two issues
>> there with locking :
>>   - one where the scheduler work was slightly too high, increasing the
>> time
>> spent on RQ lock
>>   - another one where I messed up on a fix, causing lock-free pools to be
>> disabled (as seen in your output, where the POOL lock appears a lot)
>>
>> On some tests I've run here, I've found the stick-tables lock to be a
>> bottleneck when tracking is enabled. I don't have a short-term solution
>> to this, but looking at the code it's obvious that it can significantly
>> be improved (though it will take quite some time). I'll probably at least
>> try to replace it with an RW lock as I think it could improve the
>> situation.
>>
>> The FD lock is another one requiring some lift-up. I'm certain it's
>> possible,
>> I just don't know if it will not degrade low-thread count performance by
>> using too many atomic ops instead. We'll have to experiment.
>>
>> Cheers,
>> Willy
>>
>
Stats about Lock FD: 
 # write lock  : 407304162
 # write unlock: 407304155 (-7)
 # wait time for write : 39627.620 msec
 # wait time for write/lock: 97.292 nsec
 # read lock   : 0
 # read unlock : 0 (0)
 # wait time for read  : 0.000 msec
 # wait time for read/lock : 0.000 nsec
Stats about Lock TASK_RQ: 
 # write lock  : 2230051
 # write unlock: 2230051 (0)
 # wait time for write : 63163.277 msec
 # wait time for write/lock: 28323.692 nsec
 # read lock   : 0
 # read unlock : 0 (0)
 # wait time for read  : 0.000 msec
 # wait time for read/lock : 0.000 nsec
Stats about Lock TASK_WQ: 
 # write lock  : 14897430
 # write unlock: 14897430 (0)
 # wait time for write : 49136.313 msec
 # wait time for write/lock: 3298.308 nsec
 # read lock   : 0
 # read unlock : 0 (0)
 # wait time for read  : 0.000 msec
 # wait time for read/lock : 0.000 nsec
Stats about Lock POOL: 
 # write lock  : 0
 # write unlock: 0 (0)
 # wait time for write : 0.000 msec
 # wait time for write/lock: 0.000 nsec
 # read lock   : 0
 # read unlock : 0 (0)
 # wait time for read  : 0.000 msec
 # wait time for read/lock : 0.000 nsec
Stats about Lock LISTENER: 
 # write lock  : 5500
 # write unlock: 5500 (0)
 # wait time for write : 0.076 msec
 # wait time for write/lock: 13.734 nsec
 # read lock   : 0
 # read unlock : 0 (0)
 # wait time for read  : 0.000 msec
 # wait time for read/lock : 0.000 nsec
Stats about Lock PROXY: 
 # write lock  : 7368276
 # write unlock: 7368276 (0)
 # wait time for write : 16768.394 msec
 # wait time for write/lock: 2275.755 nsec
 # read lock   : 0
 # read unlock : 0 (0)
 # wait time for read  : 0.000 msec
 # wait time for read/lock : 0.000 nsec
Stats about Lock SERVER: 
 # write lock  : 29505947
 # write unlock: 29505947 (0)
 # wait time for write : 38455.325 msec
 # wait time for write/lock: 1303.308 nsec
 # read lock   : 0
 # read unlock : 0 (0)
 # wait time for read  : 

Re: Few problems seen in haproxy? (threads, connections).

2018-10-10 Thread Willy Tarreau
On Thu, Oct 11, 2018 at 08:18:21AM +0530, Krishna Kumar (Engineering) wrote:
> Hi Willy,
> 
> Thank you very much for the in-depth analysis and configuration setting
> suggestions.
> I believe I have got the 3 key items to continue based on your mail:
> 
> 1. Thread pinning
> 2. Fix system irq pinning accordingly
> 3. Listen on all threads.
> 
> I will post the configuration changes and the results.

By the way, please pull the latest master fixes. I've addressed two issues
there with locking :
  - one where the scheduler work was slightly too high, increasing the time
spent on RQ lock
  - another one where I messed up on a fix, causing lock-free pools to be
disabled (as seen in your output, where the POOL lock appears a lot)

On some tests I've run here, I've found the stick-tables lock to be a
bottleneck when tracking is enabled. I don't have a short-term solution
to this, but looking at the code it's obvious that it can significantly
be improved (though it will take quite some time). I'll probably at least
try to replace it with an RW lock as I think it could improve the situation.

The FD lock is another one requiring some lift-up. I'm certain it's possible,
I just don't know if it will not degrade low-thread count performance by
using too many atomic ops instead. We'll have to experiment.

Cheers,
Willy



Re: Few problems seen in haproxy? (threads, connections).

2018-10-10 Thread Krishna Kumar (Engineering)
Hi Willy,

Thank you very much for the in-depth analysis and configuration setting
suggestions.
I believe I have got the 3 key items to continue based on your mail:

1. Thread pinning
2. Fix system irq pinning accordingly
3. Listen on all threads.

I will post the configuration changes and the results.

Regards,
- Krishna


On Wed, Oct 10, 2018 at 6:39 PM Willy Tarreau  wrote:

> Hi Krishna,
>
> On Tue, Oct 02, 2018 at 09:18:19PM +0530, Krishna Kumar (Engineering)
> wrote:
> (...)
> > 1. HAProxy system:
> > Kernel: 4.17.13,
> > CPU: 48 core E5-2670 v3
> > Memory: 128GB memory
> > NIC: Mellanox 40g with IRQ pinning
> >
> > 2. Client, 48 core similar to server. Test command line:
> > wrk -c 4800 -t 48 -d 30s http:///128
> >
> > 3. HAProxy version: I am testing both 1.8.14 and 1.9-dev3 (git checkout
> as
> > of
> > Oct 2nd).
> > # haproxy-git -vv
> > HA-Proxy version 1.9-dev3 2018/09/29
> (...)
> > 4. HAProxy results for #processes and #threads
> > #Threads-RPS Procs-RPS
> > 1 20903 19280
> > 2 46400 51045
> > 4 96587 142801
> > 8 172224 254720
> > 12 210451 437488
> > 16 173034 437375
> > 24 79069 519367
> > 32 55607 586367
> > 48 31739 596148
>
> Our largest thread test was on 12 cores and it happens that in your case
> it's also the optimal one.
>
> However I do have some comments about your config, before going back to
> real thread issues :
>
> > # cpu-map auto:1/1-48 0-39
>   => you must absolutely pin your processes, and they must be pinned
>  to cores *not* shared with the network card. That's critical.
>  Moreover it's also important that threads are not split across
>  multiple physical CPUs because the remote L3 cache access time
>  over QPI/UPI is terrible. When you run on 12 threads with two
>  12-cores/24-threads CPUs, you could very well have haproxy using
>  12 threads from 6 cores, and the NIC using 12 threads from the
>  other 6 cores of the same physical CPU. The second socket is,
>  as usual, useless for anything requiring low latency. However
>  it's perfect to run SSL. So you could be interested in testing
>  if running the NIC on one socket (try to figure what node the
>  PCIe lanes are physically connected to), and haproxy on the other
>  one. It *could* be possible that you get more performance from 12
>  cores of each but I strongly doubt it based on a number of tests.
>  If you use SSL however it's different as you will benefit from
>  lots of cores much more than low latency.
>
> > bind :80 process 1/1-48
>   => it's also capital for scalability to have individual bind lines. Here
>  you have a single socket accessed from all 48 threads. There's no
>  efficient thread load balancing here. By having this :
>
>  bind :80 process 1/1
>  bind :80 process 1/2
>  ...
>  bind :80 process 1/47
>  bind :80 process 1/48
>
>  You will let the kernel perform the load balancing and distribute a
>  fair load to all threads. This way none of them will risk to pick a
>  larger share of the incoming connections than optimal. I know it's
>  annoying to configure at the moment, I've been thinking about having
>  a way to automatically iterate from a single config line (like the
>  "auto" feature of cpu-map), but for now it's not done.
>
> Now back to the thread measurements :
>
> > 5. Lock stats for 1.9-dev3: Some write locks on average took a lot more
> time
> >to acquire, e.g. "POOL" and "TASK_WQ". For 48 threads, I get:
> > Stats about Lock FD:
> > # write lock  : 143933900
> > # write unlock: 143933895 (-5)
> > # wait time for write : 11370.245 msec
>
> This one definitely is huge. We know some work is still needed on this lock
> and that there are still a few low hanging fruits but not much savings to
> expect short term. This output is very revealing however of the importance
> of this lock.
>
> > # wait time for write/lock: 78.996 nsec
>
> That's roughly the time it takes to access the other CPU's cache, so using
> your two sockets for the same process definitely hurts a lot here.
>
> > Stats about Lock TASK_RQ:
> > # write lock  : 2062874
> > # write unlock: 2062875 (1)
> > # wait time for write : 7820.234 msec
>
> This one is still far too large for what we'd hope, even though it
> has significantly shrunk since 1.8. It could be related to the poor
> distribution of the incoming connections across threads.
>
> > # wait time for write/lock: 3790.941 nsec
>
> Wow, 3.8 microseconds to acquire the write lock is a lot! I'm starting
> to suspect some longer tree walks than expected. Again, lack of fairness
> between threads can make this significantly worse than it should.
>
> > Stats about Lock TASK_WQ:
> > # write lock  : 2601227
> > # write unlock: 2601227 (0)
> > # wait time for write : 5019.811 msec
> > # wait time for write/lock: 1929.786 nsec
>
> For this one I had an idea that could significantly improve the situation.
> It will unlikely be for 1.9 

Re: Few problems seen in haproxy? (threads, connections).

2018-10-10 Thread Willy Tarreau
Hi Krishna,

On Tue, Oct 02, 2018 at 09:18:19PM +0530, Krishna Kumar (Engineering) wrote:
(...)
> 1. HAProxy system:
> Kernel: 4.17.13,
> CPU: 48 core E5-2670 v3
> Memory: 128GB memory
> NIC: Mellanox 40g with IRQ pinning
> 
> 2. Client, 48 core similar to server. Test command line:
> wrk -c 4800 -t 48 -d 30s http:///128
> 
> 3. HAProxy version: I am testing both 1.8.14 and 1.9-dev3 (git checkout as
> of
> Oct 2nd).
> # haproxy-git -vv
> HA-Proxy version 1.9-dev3 2018/09/29
(...)
> 4. HAProxy results for #processes and #threads
> #Threads-RPS Procs-RPS
> 1 20903 19280
> 2 46400 51045
> 4 96587 142801
> 8 172224 254720
> 12 210451 437488
> 16 173034 437375
> 24 79069 519367
> 32 55607 586367
> 48 31739 596148

Our largest thread test was on 12 cores and it happens that in your case
it's also the optimal one.

However I do have some comments about your config, before going back to
real thread issues :

> # cpu-map auto:1/1-48 0-39
  => you must absolutely pin your processes, and they must be pinned
 to cores *not* shared with the network card. That's critical.
 Moreover it's also important that threads are not split across
 multiple physical CPUs because the remote L3 cache access time
 over QPI/UPI is terrible. When you run on 12 threads with two
 12-cores/24-threads CPUs, you could very well have haproxy using
 12 threads from 6 cores, and the NIC using 12 threads from the
 other 6 cores of the same physical CPU. The second socket is,
 as usual, useless for anything requiring low latency. However
 it's perfect to run SSL. So you could be interested in testing
 if running the NIC on one socket (try to figure what node the
 PCIe lanes are physically connected to), and haproxy on the other
 one. It *could* be possible that you get more performance from 12
 cores of each but I strongly doubt it based on a number of tests.
 If you use SSL however it's different as you will benefit from
 lots of cores much more than low latency.
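As an illustration of such pinning (a sketch, not from this thread's actual
config: it assumes the NIC IRQs stay on the node-0 CPUs 0, 2, ..., 30 as
described elsewhere in this thread, so the 12 haproxy threads are placed on
node 1's 12 physical cores; verify the numbering with lscpu first):

     nbthread 12
     cpu-map auto:1/1-12 1 3 5 7 9 11 13 15 17 19 21 23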

> bind :80 process 1/1-48
  => it's also capital for scalability to have individual bind lines. Here
 you have a single socket accessed from all 48 threads. There's no
 efficient thread load balancing here. By having this :

 bind :80 process 1/1
 bind :80 process 1/2
 ...
 bind :80 process 1/47
 bind :80 process 1/48

 You will let the kernel perform the load balancing and distribute a
 fair load to all threads. This way none of them will risk to pick a
 larger share of the incoming connections than optimal. I know it's
 annoying to configure at the moment, I've been thinking about having
 a way to automatically iterate from a single config line (like the
 "auto" feature of cpu-map), but for now it's not done.

Now back to the thread measurements :

> 5. Lock stats for 1.9-dev3: Some write locks on average took a lot more time
>to acquire, e.g. "POOL" and "TASK_WQ". For 48 threads, I get:
> Stats about Lock FD:
> # write lock  : 143933900
> # write unlock: 143933895 (-5)
> # wait time for write : 11370.245 msec

This one definitely is huge. We know some work is still needed on this lock
and that there are still a few low hanging fruits but not much savings to
expect short term. This output is very revealing however of the importance
of this lock.

> # wait time for write/lock: 78.996 nsec

That's roughly the time it takes to access the other CPU's cache, so using
your two sockets for the same process definitely hurts a lot here.

> Stats about Lock TASK_RQ:
> # write lock  : 2062874
> # write unlock: 2062875 (1)
> # wait time for write : 7820.234 msec

This one is still far too large for what we'd hope, even though it
has significantly shrunk since 1.8. It could be related to the poor
distribution of the incoming connections across threads.

> # wait time for write/lock: 3790.941 nsec

Wow, 3.8 microseconds to acquire the write lock is a lot! I'm starting
to suspect some longer tree walks than expected. Again, lack of fairness
between threads can make this significantly worse than it should.

> Stats about Lock TASK_WQ:
> # write lock  : 2601227
> # write unlock: 2601227 (0)
> # wait time for write : 5019.811 msec
> # wait time for write/lock: 1929.786 nsec

For this one I had an idea that could significantly improve the situation.
It will unlikely be for 1.9 though.

> Stats about Lock POOL:
> # write lock  : 2823393
> # write unlock: 2823393 (0)
> # wait time for write : 11984.706 msec

Interesting. Very interesting... I thought we merged the lock-free pool
code. Maybe I was mistaken. Or I misunderstood something about the nature
of the changes that went in early during the 1.9-dev cycle. This one is
not expected to be hard to address, and I even see how we could go further
with a small thread-local cache since I developed such a thing a few years
ago (though it was over-engineered) that could easily be retrofitted here.

> # wait time for 

Re: Few problems seen in haproxy? (threads, connections).

2018-10-05 Thread Willy Tarreau
Hi Krishna,

On Fri, Oct 05, 2018 at 02:25:13PM +0530, Krishna Kumar (Engineering) wrote:
> Sorry for repeating once again, but this is my last unsolicited
> mail on this topic. Any directions for what to look out for?

Sorry, but I didn't even have the time to read your mail over the last
two days and still have a huge backlog pending. Maybe next week.

Regards,
Willy



Re: Few problems seen in haproxy? (threads, connections).

2018-10-05 Thread Krishna Kumar (Engineering)
Sorry for repeating once again, but this is my last unsolicited
mail on this topic. Any directions for what to look out for?

Thanks,
- Krishna


On Thu, Oct 4, 2018 at 8:42 AM Krishna Kumar (Engineering) <
krishna...@flipkart.com> wrote:

> Re-sending in case this mail was missed. To summarise the 3 issues seen:
>
> 1. Performance drops 18x with higher number of nbthreads as compared to
> nbprocs.
> 2. CPU utilisation remains at 100% after wrk finishes for 30 seconds (for
> 1.9-dev3
> for nbprocs and nbthreads).
> 3. Sockets on client remain in FIN-WAIT-2, while on HAProxy it remains in
> either
>  CLOSE-WAIT (towards clients) and ESTAB (towards the backend servers),
> till
>  the server/client timeout expires.
>
> The tests for threads and processes were done on the same systems, so
> there is
> no difference in system parameters.
>
> Thanks,
> - Krishna
>
>
> On Tue, Oct 2, 2018 at 9:18 PM Krishna Kumar (Engineering) <
> krishna...@flipkart.com> wrote:
>
>> Hi Willy, and community developers,
>>
>> I am not sure if I am doing something wrong, but wanted to report
>> some issues that I am seeing. Please let me know if this is a problem.
>>
>> 1. HAProxy system:
>> Kernel: 4.17.13,
>> CPU: 48 core E5-2670 v3
>> Memory: 128GB memory
>> NIC: Mellanox 40g with IRQ pinning
>>
>> 2. Client, 48 core similar to server. Test command line:
>> wrk -c 4800 -t 48 -d 30s http:///128
>>
>> 3. HAProxy version: I am testing both 1.8.14 and 1.9-dev3 (git checkout
>> as of
>> Oct 2nd).
>> # haproxy-git -vv
>> HA-Proxy version 1.9-dev3 2018/09/29
>> Copyright 2000-2018 Willy Tarreau 
>>
>> Build options :
>>   TARGET  = linux2628
>>   CPU = generic
>>   CC  = gcc
>>   CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
>> -fwrapv -fno-strict-overflow -Wno-unused-label -Wno-sign-compare
>> -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers
>> -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
>>   OPTIONS = USE_ZLIB=yes USE_OPENSSL=1 USE_PCRE=1
>>
>> Default settings :
>>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
>>
>> Built with OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
>> Running on OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
>> OpenSSL library supports TLS extensions : yes
>> OpenSSL library supports SNI : yes
>> OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
>> Built with transparent proxy support using: IP_TRANSPARENT
>> IPV6_TRANSPARENT IP_FREEBIND
>> Encrypted password support via crypt(3): yes
>> Built with multi-threading support.
>> Built with PCRE version : 8.38 2015-11-23
>> Running on PCRE version : 8.38 2015-11-23
>> PCRE library supports JIT : no (USE_PCRE_JIT not set)
>> Built with zlib version : 1.2.8
>> Running on zlib version : 1.2.8
>> Compression algorithms supported : identity("identity"),
>> deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
>> Built with network namespace support.
>>
>> Available polling systems :
>>   epoll : pref=300,  test result OK
>>poll : pref=200,  test result OK
>>  select : pref=150,  test result OK
>> Total: 3 (3 usable), will use epoll.
>>
>> Available multiplexer protocols :
>> (protocols markes as  cannot be specified using 'proto' keyword)
>>   h2 : mode=HTTP   side=FE
>> : mode=TCP|HTTP   side=FE|BE
>>
>> Available filters :
>> [SPOE] spoe
>> [COMP] compression
>> [TRACE] trace
>>
>> 4. HAProxy results for #processes and #threads
>> #Threads-RPS Procs-RPS
>> 1 20903 19280
>> 2 46400 51045
>> 4 96587 142801
>> 8 172224 254720
>> 12 210451 437488
>> 16 173034 437375
>> 24 79069 519367
>> 32 55607 586367
>> 48 31739 596148
>>
>> 5. Lock stats for 1.9-dev3: Some write locks on average took a lot more
>> time
>>to acquire, e.g. "POOL" and "TASK_WQ". For 48 threads, I get:
>> Stats about Lock FD:
>> # write lock  : 143933900
>> # write unlock: 143933895 (-5)
>> # wait time for write : 11370.245 msec
>> # wait time for write/lock: 78.996 nsec
>> # read lock   : 0
>> # read unlock : 0 (0)
>> # wait time for read  : 0.000 msec
>> # wait time for read/lock : 0.000 nsec
>> Stats about Lock TASK_RQ:
>> # write lock  : 2062874
>> # write unlock: 2062875 (1)
>> # wait time for write : 7820.234 msec
>> # wait time for write/lock: 3790.941 nsec
>> # read lock   : 0
>> # read unlock : 0 (0)
>> # wait time for read  : 0.000 msec
>> # wait time for read/lock : 0.000 nsec
>> Stats about Lock TASK_WQ:
>> # write lock  : 2601227
>> # write unlock: 2601227 (0)
>> # wait time for write : 5019.811 msec
>> # wait time for write/lock: 1929.786 nsec
>> # read lock   : 0
>> # read unlock : 0 (0)
>> # wait time for read  : 0.000 msec
>> # wait time for read/lock : 0.000 nsec
>> Stats about Lock POOL:
>> # write lock  : 2823393
>> # write unlock: 2823393 (0)
>> # wait time for write : 11984.706 msec
>> # wait time for write/lock: 4244.788 nsec
>> # read lock   : 0
>> # read unlock : 0 (0)
>> # wait time for 

Re: Few problems seen in haproxy? (threads, connections).

2018-10-04 Thread Krishna Kumar (Engineering)
Thanks, will take a look!

On Thu, Oct 4, 2018 at 12:58 PM Илья Шипицин  wrote:

> what I going to try (when I will have some spare time) is sampling with
> google perftools
>
> https://github.com/gperftools/gperftools
>
> they are great in cpu profiling.
> you can try them youself if you have time/wish :)
>
>
> чт, 4 окт. 2018 г. в 11:53, Krishna Kumar (Engineering) <
> krishna...@flipkart.com>:
>
>> 1. haproxy config: Same as given above (both processes and threads were
>> given in the mail)
>> 2. nginx: default, no changes.
>> 3. sysctl's: nothing set. All changes as described earlier (e.g.
>> irqbalance, irq pinning, etc).
>> 4. nf_conntrack: disabled
>> 5. dmesg: no messages.
>>
>> With the same system and settings, threads gives 18x lesser RPS than
>> processes, along with
>> the other 2 issues given in my mail today.
>>
>>
>> On Thu, Oct 4, 2018 at 12:09 PM Илья Шипицин 
>> wrote:
>>
>>> haproxy config, nginx config
>>> non default sysctl (if any)
>>>
>>> as a side note, can you have a look at "dmesg" output ? do you have nf
>>> conntrack enabled ? what are its limits ?
>>>
>>> чт, 4 окт. 2018 г. в 9:59, Krishna Kumar (Engineering) <
>>> krishna...@flipkart.com>:
>>>
 Sure.

 1. Client: Use one of the following two setup's:
 - a single baremetal (48 core, 40g) system
   Run: "wrk -c 4800 -t 48 -d 30s http://:80/128", or,
 - 100 2 core vm's.
   Run "wrk -c 16 -t 2 -d 30s http://:80/128" from
   each VM and summarize the results using some
   parallel-ssh setup.

 2. HAProxy running on a single baremetal (same system config
 as client - 48 core, 40g, 4.17.13 kernel, irq tuned to use different
 cores of the same NUMA node for each irq, kill irqbalance, with
 haproxy configuration file as given in my first mail. Around 60
 backend servers are configured in haproxy.

 3. Backend servers are 2 core VM's running nginx and serving
 a file called "/128", which is 128 bytes in size.

 Let me know if you need more information.

 Thanks,
 - Krishna


 On Thu, Oct 4, 2018 at 10:21 AM Илья Шипицин 
 wrote:

> load testing is somewhat good.
> can you describe an overall setup ? (I want to reproduce and play with
> it)
>
> чт, 4 окт. 2018 г. в 8:16, Krishna Kumar (Engineering) <
> krishna...@flipkart.com>:
>
>> Re-sending in case this mail was missed. To summarise the 3 issues
>> seen:
>>
>> 1. Performance drops 18x with higher number of nbthreads as compared
>> to nbprocs.
>> 2. CPU utilisation remains at 100% after wrk finishes for 30 seconds
>> (for 1.9-dev3
>> for nbprocs and nbthreads).
>> 3. Sockets on client remain in FIN-WAIT-2, while on HAProxy it
>> remains in either
>>  CLOSE-WAIT (towards clients) and ESTAB (towards the backend
>> servers), till
>>  the server/client timeout expires.
>>
>> The tests for threads and processes were done on the same systems, so
>> there is
>> no difference in system parameters.
>>
>> Thanks,
>> - Krishna
>>
>>
>> On Tue, Oct 2, 2018 at 9:18 PM Krishna Kumar (Engineering) <
>> krishna...@flipkart.com> wrote:
>>
>>> Hi Willy, and community developers,
>>>
>>> I am not sure if I am doing something wrong, but wanted to report
>>> some issues that I am seeing. Please let me know if this is a
>>> problem.
>>>
>>> 1. HAProxy system:
>>> Kernel: 4.17.13,
>>> CPU: 48 core E5-2670 v3
>>> Memory: 128GB memory
>>> NIC: Mellanox 40g with IRQ pinning
>>>
>>> 2. Client, 48 core similar to server. Test command line:
>>> wrk -c 4800 -t 48 -d 30s http:///128
>>>
>>> 3. HAProxy version: I am testing both 1.8.14 and 1.9-dev3 (git
>>> checkout as of
>>> Oct 2nd).
>>> # haproxy-git -vv
>>> HA-Proxy version 1.9-dev3 2018/09/29
>>> Copyright 2000-2018 Willy Tarreau 
>>>
>>> Build options :
>>>   TARGET  = linux2628
>>>   CPU = generic
>>>   CC  = gcc
>>>   CFLAGS  = -O2 -g -fno-strict-aliasing
>>> -Wdeclaration-after-statement -fwrapv -fno-strict-overflow
>>> -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter
>>> -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered
>>> -Wno-missing-field-initializers -Wtype-limits
>>>   OPTIONS = USE_ZLIB=yes USE_OPENSSL=1 USE_PCRE=1
>>>
>>> Default settings :
>>>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents
>>> = 200
>>>
>>> Built with OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
>>> Running on OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
>>> OpenSSL library supports TLS extensions : yes
>>> OpenSSL library supports SNI : yes
>>> OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
>>> Built with transparent proxy support 

Re: Few problems seen in haproxy? (threads, connections).

2018-10-04 Thread Илья Шипицин
What I am going to try (when I have some spare time) is sampling with
google perftools

https://github.com/gperftools/gperftools

They are great at CPU profiling.
You can try them yourself if you have the time/wish :)
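A sketch of one way to do it (library and config paths are assumptions, and
the profile is only written when haproxy exits cleanly):

    # run haproxy in the foreground with the gperftools CPU profiler preloaded
    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libprofiler.so \
        CPUPROFILE=/tmp/haproxy.prof haproxy -db -f /etc/haproxy/haproxy.cfg

    # after the run, show the hottest functions
    pprof --text $(which haproxy) /tmp/haproxy.prof | head -30
    # (the tool is installed as google-pprof on some distributions)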


Thu, 4 Oct 2018 at 11:53, Krishna Kumar (Engineering) <
krishna...@flipkart.com>:

> 1. haproxy config: Same as given above (both processes and threads were
> given in the mail)
> 2. nginx: default, no changes.
> 3. sysctl's: nothing set. All changes as described earlier (e.g.
> irqbalance, irq pinning, etc).
> 4. nf_conntrack: disabled
> 5. dmesg: no messages.
>
> With the same system and settings, threads gives 18x lesser RPS than
> processes, along with
> the other 2 issues given in my mail today.
>
>
> On Thu, Oct 4, 2018 at 12:09 PM Илья Шипицин  wrote:
>
>> haproxy config, nginx config
>> non default sysctl (if any)
>>
>> as a side note, can you have a look at "dmesg" output ? do you have nf
>> conntrack enabled ? what are its limits ?
>>
>> чт, 4 окт. 2018 г. в 9:59, Krishna Kumar (Engineering) <
>> krishna...@flipkart.com>:
>>
>>> Sure.
>>>
>>> 1. Client: Use one of the following two setup's:
>>> - a single baremetal (48 core, 40g) system
>>>   Run: "wrk -c 4800 -t 48 -d 30s http://:80/128", or,
>>> - 100 2 core vm's.
>>>   Run "wrk -c 16 -t 2 -d 30s http://:80/128" from
>>>   each VM and summarize the results using some
>>>   parallel-ssh setup.
>>>
>>> 2. HAProxy running on a single baremetal (same system config
>>> as client - 48 core, 40g, 4.17.13 kernel, irq tuned to use different
>>> cores of the same NUMA node for each irq, kill irqbalance, with
>>> haproxy configuration file as given in my first mail. Around 60
>>> backend servers are configured in haproxy.
>>>
>>> 3. Backend servers are 2 core VM's running nginx and serving
>>> a file called "/128", which is 128 bytes in size.
>>>
>>> Let me know if you need more information.
>>>
>>> Thanks,
>>> - Krishna
>>>
>>>
>>> On Thu, Oct 4, 2018 at 10:21 AM Илья Шипицин 
>>> wrote:
>>>
 load testing is somewhat good.
 can you describe an overall setup ? (I want to reproduce and play with
 it)

 чт, 4 окт. 2018 г. в 8:16, Krishna Kumar (Engineering) <
 krishna...@flipkart.com>:

> Re-sending in case this mail was missed. To summarise the 3 issues
> seen:
>
> 1. Performance drops 18x with higher number of nbthreads as compared
> to nbprocs.
> 2. CPU utilisation remains at 100% after wrk finishes for 30 seconds
> (for 1.9-dev3
> for nbprocs and nbthreads).
> 3. Sockets on client remain in FIN-WAIT-2, while on HAProxy it remains
> in either
>  CLOSE-WAIT (towards clients) and ESTAB (towards the backend
> servers), till
>  the server/client timeout expires.
>
> The tests for threads and processes were done on the same systems, so
> there is
> no difference in system parameters.
>
> Thanks,
> - Krishna
>
>
> On Tue, Oct 2, 2018 at 9:18 PM Krishna Kumar (Engineering) <
> krishna...@flipkart.com> wrote:
>
>> Hi Willy, and community developers,
>>
>> I am not sure if I am doing something wrong, but wanted to report
>> some issues that I am seeing. Please let me know if this is a problem.
>>
>> 1. HAProxy system:
>> Kernel: 4.17.13,
>> CPU: 48 core E5-2670 v3
>> Memory: 128GB memory
>> NIC: Mellanox 40g with IRQ pinning
>>
>> 2. Client, 48 core similar to server. Test command line:
>> wrk -c 4800 -t 48 -d 30s http:///128
>>
>> 3. HAProxy version: I am testing both 1.8.14 and 1.9-dev3 (git
>> checkout as of
>> Oct 2nd).
>> # haproxy-git -vv
>> HA-Proxy version 1.9-dev3 2018/09/29
>> Copyright 2000-2018 Willy Tarreau 
>>
>> Build options :
>>   TARGET  = linux2628
>>   CPU = generic
>>   CC  = gcc
>>   CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
>> -fwrapv -fno-strict-overflow -Wno-unused-label -Wno-sign-compare
>> -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers
>> -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
>>   OPTIONS = USE_ZLIB=yes USE_OPENSSL=1 USE_PCRE=1
>>
>> Default settings :
>>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents =
>> 200
>>
>> Built with OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
>> Running on OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
>> OpenSSL library supports TLS extensions : yes
>> OpenSSL library supports SNI : yes
>> OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
>> Built with transparent proxy support using: IP_TRANSPARENT
>> IPV6_TRANSPARENT IP_FREEBIND
>> Encrypted password support via crypt(3): yes
>> Built with multi-threading support.
>> Built with PCRE version : 8.38 2015-11-23
>> Running on PCRE version : 8.38 

Re: Few problems seen in haproxy? (threads, connections).

2018-10-04 Thread Krishna Kumar (Engineering)
1. haproxy config: Same as given above (both processes and threads were
given in the mail)
2. nginx: default, no changes.
3. sysctl's: nothing set. All changes as described earlier (e.g.
irqbalance, irq pinning, etc).
4. nf_conntrack: disabled
5. dmesg: no messages.

With the same system and settings, threads give 18x lower RPS than
processes, along with the other 2 issues given in my mail today.


On Thu, Oct 4, 2018 at 12:09 PM Илья Шипицин  wrote:

> haproxy config, nginx config
> non default sysctl (if any)
>
> as a side note, can you have a look at "dmesg" output ? do you have nf
> conntrack enabled ? what are its limits ?
>
> чт, 4 окт. 2018 г. в 9:59, Krishna Kumar (Engineering) <
> krishna...@flipkart.com>:
>
>> Sure.
>>
>> 1. Client: Use one of the following two setup's:
>> - a single baremetal (48 core, 40g) system
>>   Run: "wrk -c 4800 -t 48 -d 30s http://:80/128", or,
>> - 100 2 core vm's.
>>   Run "wrk -c 16 -t 2 -d 30s http://:80/128" from
>>   each VM and summarize the results using some
>>   parallel-ssh setup.
>>
>> 2. HAProxy running on a single baremetal (same system config
>> as client - 48 core, 40g, 4.17.13 kernel, irq tuned to use different
>> cores of the same NUMA node for each irq, kill irqbalance, with
>> haproxy configuration file as given in my first mail. Around 60
>> backend servers are configured in haproxy.
>>
>> 3. Backend servers are 2 core VM's running nginx and serving
>> a file called "/128", which is 128 bytes in size.
>>
>> Let me know if you need more information.
>>
>> Thanks,
>> - Krishna
>>
>>
>> On Thu, Oct 4, 2018 at 10:21 AM Илья Шипицин 
>> wrote:
>>
>>> load testing is somewhat good.
>>> can you describe an overall setup ? (I want to reproduce and play with
>>> it)
>>>
>>> чт, 4 окт. 2018 г. в 8:16, Krishna Kumar (Engineering) <
>>> krishna...@flipkart.com>:
>>>
 Re-sending in case this mail was missed. To summarise the 3 issues seen:

 1. Performance drops 18x with higher number of nbthreads as compared to
 nbprocs.
 2. CPU utilisation remains at 100% after wrk finishes for 30 seconds
 (for 1.9-dev3
 for nbprocs and nbthreads).
 3. Sockets on client remain in FIN-WAIT-2, while on HAProxy it remains
 in either
  CLOSE-WAIT (towards clients) and ESTAB (towards the backend
 servers), till
  the server/client timeout expires.

 The tests for threads and processes were done on the same systems, so
 there is
 no difference in system parameters.

 Thanks,
 - Krishna


 On Tue, Oct 2, 2018 at 9:18 PM Krishna Kumar (Engineering) <
 krishna...@flipkart.com> wrote:

> Hi Willy, and community developers,
>
> I am not sure if I am doing something wrong, but wanted to report
> some issues that I am seeing. Please let me know if this is a problem.
>
> 1. HAProxy system:
> Kernel: 4.17.13,
> CPU: 48 core E5-2670 v3
> Memory: 128GB memory
> NIC: Mellanox 40g with IRQ pinning
>
> 2. Client, 48 core similar to server. Test command line:
> wrk -c 4800 -t 48 -d 30s http:///128
>
> 3. HAProxy version: I am testing both 1.8.14 and 1.9-dev3 (git
> checkout as of
> Oct 2nd).
> # haproxy-git -vv
> HA-Proxy version 1.9-dev3 2018/09/29
> Copyright 2000-2018 Willy Tarreau 
>
> Build options :
>   TARGET  = linux2628
>   CPU = generic
>   CC  = gcc
>   CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
> -fwrapv -fno-strict-overflow -Wno-unused-label -Wno-sign-compare
> -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers
> -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
>   OPTIONS = USE_ZLIB=yes USE_OPENSSL=1 USE_PCRE=1
>
> Default settings :
>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents =
> 200
>
> Built with OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
> Running on OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
> OpenSSL library supports TLS extensions : yes
> OpenSSL library supports SNI : yes
> OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
> Built with transparent proxy support using: IP_TRANSPARENT
> IPV6_TRANSPARENT IP_FREEBIND
> Encrypted password support via crypt(3): yes
> Built with multi-threading support.
> Built with PCRE version : 8.38 2015-11-23
> Running on PCRE version : 8.38 2015-11-23
> PCRE library supports JIT : no (USE_PCRE_JIT not set)
> Built with zlib version : 1.2.8
> Running on zlib version : 1.2.8
> Compression algorithms supported : identity("identity"),
> deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
> Built with network namespace support.
>
> Available polling systems :
>   epoll : pref=300,  test result OK
>poll : pref=200,  test result OK
>   

Re: Few problems seen in haproxy? (threads, connections).

2018-10-04 Thread Илья Шипицин
haproxy config, nginx config
non default sysctl (if any)

as a side note, can you have a look at "dmesg" output ? do you have nf
conntrack enabled ? what are its limits ?

Thu, 4 Oct 2018 at 9:59, Krishna Kumar (Engineering) <
krishna...@flipkart.com>:

> Sure.
>
> 1. Client: Use one of the following two setup's:
> - a single baremetal (48 core, 40g) system
>   Run: "wrk -c 4800 -t 48 -d 30s http://:80/128", or,
> - 100 2 core vm's.
>   Run "wrk -c 16 -t 2 -d 30s http://:80/128" from
>   each VM and summarize the results using some
>   parallel-ssh setup.
>
> 2. HAProxy running on a single baremetal (same system config
> as client - 48 core, 40g, 4.17.13 kernel, irq tuned to use different
> cores of the same NUMA node for each irq, kill irqbalance, with
> haproxy configuration file as given in my first mail. Around 60
> backend servers are configured in haproxy.
>
> 3. Backend servers are 2 core VM's running nginx and serving
> a file called "/128", which is 128 bytes in size.
>
> Let me know if you need more information.
>
> Thanks,
> - Krishna
>
>
> On Thu, Oct 4, 2018 at 10:21 AM Илья Шипицин  wrote:
>
>> load testing is somewhat good.
>> can you describe an overall setup ? (I want to reproduce and play with it)
>>
>> чт, 4 окт. 2018 г. в 8:16, Krishna Kumar (Engineering) <
>> krishna...@flipkart.com>:
>>
>>> Re-sending in case this mail was missed. To summarise the 3 issues seen:
>>>
>>> 1. Performance drops 18x with higher number of nbthreads as compared to
>>> nbprocs.
>>> 2. CPU utilisation remains at 100% after wrk finishes for 30 seconds
>>> (for 1.9-dev3
>>> for nbprocs and nbthreads).
>>> 3. Sockets on client remain in FIN-WAIT-2, while on HAProxy it remains
>>> in either
>>>  CLOSE-WAIT (towards clients) and ESTAB (towards the backend
>>> servers), till
>>>  the server/client timeout expires.
>>>
>>> The tests for threads and processes were done on the same systems, so
>>> there is
>>> no difference in system parameters.
>>>
>>> Thanks,
>>> - Krishna
>>>
>>>
>>> On Tue, Oct 2, 2018 at 9:18 PM Krishna Kumar (Engineering) <
>>> krishna...@flipkart.com> wrote:
>>>
 Hi Willy, and community developers,

 I am not sure if I am doing something wrong, but wanted to report
 some issues that I am seeing. Please let me know if this is a problem.

 1. HAProxy system:
 Kernel: 4.17.13,
 CPU: 48 core E5-2670 v3
 Memory: 128GB memory
 NIC: Mellanox 40g with IRQ pinning

 2. Client, 48 core similar to server. Test command line:
 wrk -c 4800 -t 48 -d 30s http:///128

 3. HAProxy version: I am testing both 1.8.14 and 1.9-dev3 (git checkout
 as of
 Oct 2nd).
 # haproxy-git -vv
 HA-Proxy version 1.9-dev3 2018/09/29
 Copyright 2000-2018 Willy Tarreau 

 Build options :
   TARGET  = linux2628
   CPU = generic
   CC  = gcc
   CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
 -fwrapv -fno-strict-overflow -Wno-unused-label -Wno-sign-compare
 -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers
 -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
   OPTIONS = USE_ZLIB=yes USE_OPENSSL=1 USE_PCRE=1

 Default settings :
   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents =
 200

 Built with OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
 Running on OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
 OpenSSL library supports TLS extensions : yes
 OpenSSL library supports SNI : yes
 OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
 Built with transparent proxy support using: IP_TRANSPARENT
 IPV6_TRANSPARENT IP_FREEBIND
 Encrypted password support via crypt(3): yes
 Built with multi-threading support.
 Built with PCRE version : 8.38 2015-11-23
 Running on PCRE version : 8.38 2015-11-23
 PCRE library supports JIT : no (USE_PCRE_JIT not set)
 Built with zlib version : 1.2.8
 Running on zlib version : 1.2.8
 Compression algorithms supported : identity("identity"),
 deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
 Built with network namespace support.

 Available polling systems :
   epoll : pref=300,  test result OK
poll : pref=200,  test result OK
  select : pref=150,  test result OK
 Total: 3 (3 usable), will use epoll.

 Available multiplexer protocols :
 (protocols markes as  cannot be specified using 'proto'
 keyword)
   h2 : mode=HTTP   side=FE
 : mode=TCP|HTTP   side=FE|BE

 Available filters :
 [SPOE] spoe
 [COMP] compression
 [TRACE] trace

 4. HAProxy results for #processes and #threads
 #Threads-RPS Procs-RPS
 1 20903 19280
 2 46400 51045
 4 96587 142801
 8 172224 254720
 12 210451 437488
 16 173034 437375

Re: Few problems seen in haproxy? (threads, connections).

2018-10-03 Thread Krishna Kumar (Engineering)
Sure.

1. Client: Use one of the following two setups:
- a single baremetal (48 core, 40g) system
  Run: "wrk -c 4800 -t 48 -d 30s http://:80/128", or,
- 100 2-core VMs.
  Run "wrk -c 16 -t 2 -d 30s http://:80/128" from
  each VM and summarize the results using some
  parallel-ssh setup.

2. HAProxy running on a single baremetal (same system config
as the client - 48 core, 40g, 4.17.13 kernel, IRQs tuned to use different
cores of the same NUMA node for each irq, irqbalance killed), with the
haproxy configuration file as given in my first mail. Around 60
backend servers are configured in haproxy.

3. Backend servers are 2-core VMs running nginx and serving
a file called "/128", which is 128 bytes in size.
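For reference, such a test object can be created with a one-liner like the
following (the nginx docroot path is an assumption):

    head -c 128 /dev/urandom > /usr/share/nginx/html/128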

Let me know if you need more information.

Thanks,
- Krishna


On Thu, Oct 4, 2018 at 10:21 AM Илья Шипицин  wrote:

> load testing is somewhat good.
> can you describe an overall setup ? (I want to reproduce and play with it)
>
> чт, 4 окт. 2018 г. в 8:16, Krishna Kumar (Engineering) <
> krishna...@flipkart.com>:
>
>> Re-sending in case this mail was missed. To summarise the 3 issues seen:
>>
>> 1. Performance drops 18x with higher number of nbthreads as compared to
>> nbprocs.
>> 2. CPU utilisation remains at 100% after wrk finishes for 30 seconds (for
>> 1.9-dev3
>> for nbprocs and nbthreads).
>> 3. Sockets on client remain in FIN-WAIT-2, while on HAProxy it remains in
>> either
>>  CLOSE-WAIT (towards clients) and ESTAB (towards the backend
>> servers), till
>>  the server/client timeout expires.
>>
>> The tests for threads and processes were done on the same systems, so
>> there is
>> no difference in system parameters.
>>
>> Thanks,
>> - Krishna
>>
>>
>> On Tue, Oct 2, 2018 at 9:18 PM Krishna Kumar (Engineering) <
>> krishna...@flipkart.com> wrote:
>>
>>> Hi Willy, and community developers,
>>>
>>> I am not sure if I am doing something wrong, but wanted to report
>>> some issues that I am seeing. Please let me know if this is a problem.
>>>
>>> 1. HAProxy system:
>>> Kernel: 4.17.13,
>>> CPU: 48 core E5-2670 v3
>>> Memory: 128GB memory
>>> NIC: Mellanox 40g with IRQ pinning
>>>
>>> 2. Client, 48 core similar to server. Test command line:
>>> wrk -c 4800 -t 48 -d 30s http:///128
>>>
>>> 3. HAProxy version: I am testing both 1.8.14 and 1.9-dev3 (git checkout
>>> as of
>>> Oct 2nd).
>>> # haproxy-git -vv
>>> HA-Proxy version 1.9-dev3 2018/09/29
>>> Copyright 2000-2018 Willy Tarreau 
>>>
>>> Build options :
>>>   TARGET  = linux2628
>>>   CPU = generic
>>>   CC  = gcc
>>>   CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
>>> -fwrapv -fno-strict-overflow -Wno-unused-label -Wno-sign-compare
>>> -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers
>>> -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
>>>   OPTIONS = USE_ZLIB=yes USE_OPENSSL=1 USE_PCRE=1
>>>
>>> Default settings :
>>>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
>>>
>>> Built with OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
>>> Running on OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
>>> OpenSSL library supports TLS extensions : yes
>>> OpenSSL library supports SNI : yes
>>> OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
>>> Built with transparent proxy support using: IP_TRANSPARENT
>>> IPV6_TRANSPARENT IP_FREEBIND
>>> Encrypted password support via crypt(3): yes
>>> Built with multi-threading support.
>>> Built with PCRE version : 8.38 2015-11-23
>>> Running on PCRE version : 8.38 2015-11-23
>>> PCRE library supports JIT : no (USE_PCRE_JIT not set)
>>> Built with zlib version : 1.2.8
>>> Running on zlib version : 1.2.8
>>> Compression algorithms supported : identity("identity"),
>>> deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
>>> Built with network namespace support.
>>>
>>> Available polling systems :
>>>   epoll : pref=300,  test result OK
>>>poll : pref=200,  test result OK
>>>  select : pref=150,  test result OK
>>> Total: 3 (3 usable), will use epoll.
>>>
>>> Available multiplexer protocols :
>>> (protocols markes as  cannot be specified using 'proto' keyword)
>>>   h2 : mode=HTTP   side=FE
>>> : mode=TCP|HTTP   side=FE|BE
>>>
>>> Available filters :
>>> [SPOE] spoe
>>> [COMP] compression
>>> [TRACE] trace
>>>
>>> 4. HAProxy results for #processes and #threads
>>> #Threads-RPS Procs-RPS
>>> 1 20903 19280
>>> 2 46400 51045
>>> 4 96587 142801
>>> 8 172224 254720
>>> 12 210451 437488
>>> 16 173034 437375
>>> 24 79069 519367
>>> 32 55607 586367
>>> 48 31739 596148
>>>
>>> 5. Lock stats for 1.9-dev3: Some write locks on average took a lot more
>>> time
>>>to acquire, e.g. "POOL" and "TASK_WQ". For 48 threads, I get:
>>> Stats about Lock FD:
>>> # write lock  : 143933900
>>> # write unlock: 143933895 (-5)
>>> # wait time for write : 11370.245 msec
>>> # wait time for write/lock: 78.996 nsec
>>> # read lock   : 0

Re: Few problems seen in haproxy? (threads, connections).

2018-10-03 Thread Илья Шипицин
load testing is somewhat good.
can you describe an overall setup ? (I want to reproduce and play with it)

Thu, 4 Oct 2018 at 8:16, Krishna Kumar (Engineering) <
krishna...@flipkart.com>:

> Re-sending in case this mail was missed. To summarise the 3 issues seen:
>
> 1. Performance drops 18x with higher number of nbthreads as compared to
> nbprocs.
> 2. CPU utilisation remains at 100% after wrk finishes for 30 seconds (for
> 1.9-dev3
> for nbprocs and nbthreads).
> 3. Sockets on client remain in FIN-WAIT-2, while on HAProxy it remains in
> either
>  CLOSE-WAIT (towards clients) and ESTAB (towards the backend servers),
> till
>  the server/client timeout expires.
>
> The tests for threads and processes were done on the same systems, so
> there is
> no difference in system parameters.
>
> Thanks,
> - Krishna
>
>
> On Tue, Oct 2, 2018 at 9:18 PM Krishna Kumar (Engineering) <
> krishna...@flipkart.com> wrote:
>
>> Hi Willy, and community developers,
>>
>> I am not sure if I am doing something wrong, but wanted to report
>> some issues that I am seeing. Please let me know if this is a problem.
>>
>> 1. HAProxy system:
>> Kernel: 4.17.13,
>> CPU: 48 core E5-2670 v3
>> Memory: 128GB memory
>> NIC: Mellanox 40g with IRQ pinning
>>
>> 2. Client, 48 core similar to server. Test command line:
>> wrk -c 4800 -t 48 -d 30s http:///128
>>
>> 3. HAProxy version: I am testing both 1.8.14 and 1.9-dev3 (git checkout
>> as of
>> Oct 2nd).
>> # haproxy-git -vv
>> HA-Proxy version 1.9-dev3 2018/09/29
>> Copyright 2000-2018 Willy Tarreau 
>>
>> Build options :
>>   TARGET  = linux2628
>>   CPU = generic
>>   CC  = gcc
>>   CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
>> -fwrapv -fno-strict-overflow -Wno-unused-label -Wno-sign-compare
>> -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers
>> -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
>>   OPTIONS = USE_ZLIB=yes USE_OPENSSL=1 USE_PCRE=1
>>
>> Default settings :
>>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
>>
>> Built with OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
>> Running on OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
>> OpenSSL library supports TLS extensions : yes
>> OpenSSL library supports SNI : yes
>> OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
>> Built with transparent proxy support using: IP_TRANSPARENT
>> IPV6_TRANSPARENT IP_FREEBIND
>> Encrypted password support via crypt(3): yes
>> Built with multi-threading support.
>> Built with PCRE version : 8.38 2015-11-23
>> Running on PCRE version : 8.38 2015-11-23
>> PCRE library supports JIT : no (USE_PCRE_JIT not set)
>> Built with zlib version : 1.2.8
>> Running on zlib version : 1.2.8
>> Compression algorithms supported : identity("identity"),
>> deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
>> Built with network namespace support.
>>
>> Available polling systems :
>>   epoll : pref=300,  test result OK
>>poll : pref=200,  test result OK
>>  select : pref=150,  test result OK
>> Total: 3 (3 usable), will use epoll.
>>
>> Available multiplexer protocols :
>> (protocols markes as  cannot be specified using 'proto' keyword)
>>   h2 : mode=HTTP   side=FE
>> : mode=TCP|HTTP   side=FE|BE
>>
>> Available filters :
>> [SPOE] spoe
>> [COMP] compression
>> [TRACE] trace
>>
>> 4. HAProxy results for #processes and #threads
>> #Threads-RPS Procs-RPS
>> 1 20903 19280
>> 2 46400 51045
>> 4 96587 142801
>> 8 172224 254720
>> 12 210451 437488
>> 16 173034 437375
>> 24 79069 519367
>> 32 55607 586367
>> 48 31739 596148
>>
>> 5. Lock stats for 1.9-dev3: Some write locks on average took a lot more
>> time
>>to acquire, e.g. "POOL" and "TASK_WQ". For 48 threads, I get:
>> Stats about Lock FD:
>> # write lock  : 143933900
>> # write unlock: 143933895 (-5)
>> # wait time for write : 11370.245 msec
>> # wait time for write/lock: 78.996 nsec
>> # read lock   : 0
>> # read unlock : 0 (0)
>> # wait time for read  : 0.000 msec
>> # wait time for read/lock : 0.000 nsec
>> Stats about Lock TASK_RQ:
>> # write lock  : 2062874
>> # write unlock: 2062875 (1)
>> # wait time for write : 7820.234 msec
>> # wait time for write/lock: 3790.941 nsec
>> # read lock   : 0
>> # read unlock : 0 (0)
>> # wait time for read  : 0.000 msec
>> # wait time for read/lock : 0.000 nsec
>> Stats about Lock TASK_WQ:
>> # write lock  : 2601227
>> # write unlock: 2601227 (0)
>> # wait time for write : 5019.811 msec
>> # wait time for write/lock: 1929.786 nsec
>> # read lock   : 0
>> # read unlock : 0 (0)
>> # wait time for read  : 0.000 msec
>> # wait time for read/lock : 0.000 nsec
>> Stats about Lock POOL:
>> # write lock  : 2823393
>> # write unlock: 2823393 (0)
>> # wait time for write : 11984.706 msec
>> # wait time for write/lock: 4244.788 nsec
>> # read lock   : 0
>> # read unlock : 0 (0)
>> # wait time for read  : 0.000 msec
>> # wait time for 

Re: Few problems seen in haproxy? (threads, connections).

2018-10-03 Thread Krishna Kumar (Engineering)
Re-sending in case this mail was missed. To summarise the 3 issues seen:

1. Performance drops ~18x at the higher nbthread counts as compared to
   nbproc (31739 vs 596148 RPS at 48 workers, per the table below).
2. CPU utilisation remains at 100% after the 30-second wrk run finishes
   (seen on 1.9-dev3, with both nbproc and nbthread).
3. Sockets on the client remain in FIN-WAIT-2, while on HAProxy they remain
   in CLOSE-WAIT (towards the clients) and ESTAB (towards the backend
   servers) until the server/client timeout expires; a quick way to observe
   this is sketched below.

The tests for threads and processes were done on the same systems, so there
is no difference in system parameters.
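
For issue 3, the stuck states can be confirmed with standard ss state
filters, along these lines (the backend port below is only a placeholder;
substitute the real one):

  # on the wrk client
  ss -tan state fin-wait-2 | wc -l
  # on the HAProxy machine
  ss -tan state close-wait | wc -l                        # towards clients
  ss -tan state established '( dport = :8080 )' | wc -l   # towards backends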

Thanks,
- Krishna


On Tue, Oct 2, 2018 at 9:18 PM Krishna Kumar (Engineering) <
krishna...@flipkart.com> wrote:

> Hi Willy, and community developers,
>
> I am not sure if I am doing something wrong, but wanted to report
> some issues that I am seeing. Please let me know if this is a problem.
>
> 1. HAProxy system:
> Kernel: 4.17.13,
> CPU: 48 core E5-2670 v3
> Memory: 128GB memory
> NIC: Mellanox 40g with IRQ pinning
>
> 2. Client, 48 core similar to server. Test command line:
> wrk -c 4800 -t 48 -d 30s http:///128
>
> 3. HAProxy version: I am testing both 1.8.14 and 1.9-dev3 (git checkout as
> of Oct 2nd).
> # haproxy-git -vv
> HA-Proxy version 1.9-dev3 2018/09/29
> Copyright 2000-2018 Willy Tarreau 
>
> Build options :
>   TARGET  = linux2628
>   CPU = generic
>   CC  = gcc
>   CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
> -fwrapv -fno-strict-overflow -Wno-unused-label -Wno-sign-compare
> -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers
> -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
>   OPTIONS = USE_ZLIB=yes USE_OPENSSL=1 USE_PCRE=1
>
> Default settings :
>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
>
> Built with OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
> Running on OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
> OpenSSL library supports TLS extensions : yes
> OpenSSL library supports SNI : yes
> OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
> Built with transparent proxy support using: IP_TRANSPARENT
> IPV6_TRANSPARENT IP_FREEBIND
> Encrypted password support via crypt(3): yes
> Built with multi-threading support.
> Built with PCRE version : 8.38 2015-11-23
> Running on PCRE version : 8.38 2015-11-23
> PCRE library supports JIT : no (USE_PCRE_JIT not set)
> Built with zlib version : 1.2.8
> Running on zlib version : 1.2.8
> Compression algorithms supported : identity("identity"),
> deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
> Built with network namespace support.
>
> Available polling systems :
>   epoll : pref=300,  test result OK
>poll : pref=200,  test result OK
>  select : pref=150,  test result OK
> Total: 3 (3 usable), will use epoll.
>
> Available multiplexer protocols :
> (protocols markes as <default> cannot be specified using 'proto' keyword)
>   h2 : mode=HTTP   side=FE
>   <default> : mode=TCP|HTTP   side=FE|BE
>
> Available filters :
> [SPOE] spoe
> [COMP] compression
> [TRACE] trace
>
> 4. HAProxy results for #processes and #threads
> #Thr/Procs   Threads-RPS   Procs-RPS
>  1                 20903       19280
>  2                 46400       51045
>  4                 96587      142801
>  8                172224      254720
> 12                210451      437488
> 16                173034      437375
> 24                 79069      519367
> 32                 55607      586367
> 48                 31739      596148
>
> 5. Lock stats for 1.9-dev3: Some write locks on average took a lot more
> time to acquire, e.g. "POOL" and "TASK_WQ". For 48 threads, I get:
> Stats about Lock FD:
> # write lock  : 143933900
> # write unlock: 143933895 (-5)
> # wait time for write : 11370.245 msec
> # wait time for write/lock: 78.996 nsec
> # read lock   : 0
> # read unlock : 0 (0)
> # wait time for read  : 0.000 msec
> # wait time for read/lock : 0.000 nsec
> Stats about Lock TASK_RQ:
> # write lock  : 2062874
> # write unlock: 2062875 (1)
> # wait time for write : 7820.234 msec
> # wait time for write/lock: 3790.941 nsec
> # read lock   : 0
> # read unlock : 0 (0)
> # wait time for read  : 0.000 msec
> # wait time for read/lock : 0.000 nsec
> Stats about Lock TASK_WQ:
> # write lock  : 2601227
> # write unlock: 2601227 (0)
> # wait time for write : 5019.811 msec
> # wait time for write/lock: 1929.786 nsec
> # read lock   : 0
> # read unlock : 0 (0)
> # wait time for read  : 0.000 msec
> # wait time for read/lock : 0.000 nsec
> Stats about Lock POOL:
> # write lock  : 2823393
> # write unlock: 2823393 (0)
> # wait time for write : 11984.706 msec
> # wait time for write/lock: 4244.788 nsec
> # read lock   : 0
> # read unlock : 0 (0)
> # wait time for read  : 0.000 msec
> # wait time for read/lock : 0.000 nsec
> Stats about Lock LISTENER:
> # write lock  : 184
> # write unlock: 184 (0)
> # wait time for write : 0.011 msec
> # wait time for write/lock: 60.554 nsec
> # read lock   : 0
> # read unlock : 0 (0)
> # wait time for read  : 0.000 msec
> # wait time for read/lock : 0.000 nsec
> Stats about Lock PROXY:
> # write lock  : 291557

Few problems seen in haproxy? (threads, connections).

2018-10-02 Thread Krishna Kumar (Engineering)
Hi Willy, and community developers,

I am not sure if I am doing something wrong, but wanted to report
some issues that I am seeing. Please let me know if this is a problem.

1. HAProxy system:
Kernel: 4.17.13,
CPU: 48 core E5-2670 v3
Memory: 128GB memory
NIC: Mellanox 40g with IRQ pinning
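
(The IRQ pinning above was presumably done along these lines; this is only
a sketch, and the interface name eth2, the 48-CPU round-robin layout and
the use of smp_affinity_list are assumptions rather than the exact script
used:)

  i=0
  for irq in $(awk '/eth2/ { sub(":", "", $1); print $1 }' /proc/interrupts); do
      echo "$i" > /proc/irq/"$irq"/smp_affinity_list   # one CPU per queue IRQ
      i=$(( (i + 1) % 48 ))
  done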

2. Client, 48 core similar to server. Test command line:
wrk -c 4800 -t 48 -d 30s http:///128

3. HAProxy version: I am testing both 1.8.14 and 1.9-dev3 (git checkout as
of Oct 2nd).
# haproxy-git -vv
HA-Proxy version 1.9-dev3 2018/09/29
Copyright 2000-2018 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
-fwrapv -fno-strict-overflow -Wno-unused-label -Wno-sign-compare
-Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers
-Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_ZLIB=yes USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
Running on OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT
IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.38 2015-11-23
Running on PCRE version : 8.38 2015-11-23
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"),
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols markes as <default> cannot be specified using 'proto' keyword)
  h2 : mode=HTTP   side=FE
  <default> : mode=TCP|HTTP   side=FE|BE

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace

4. HAProxy results for #processes and #threads
#Thr/Procs   Threads-RPS   Procs-RPS
 1                 20903       19280
 2                 46400       51045
 4                 96587      142801
 8                172224      254720
12                210451      437488
16                173034      437375
24                 79069      519367
32                 55607      586367
48                 31739      596148
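
(For context, the thread and process runs above would have been driven by
global sections roughly like the following; this is only a sketch for the
48-worker case, and the real cpu-map layout, bind lines and maxconn values
may well have differed:)

  # thread mode: one process, 48 threads
  global
      nbthread 48
      cpu-map auto:1/1-48 0-47

  # process mode: 48 independent processes
  global
      nbproc 48
      cpu-map auto:1-48 0-47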

5. Lock stats for 1.9-dev3: Some write locks on average took a lot more time
   to acquire, e.g. "POOL" and "TASK_WQ". For 48 threads, I get:
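
Note on reproducing these counters: the "Stats about Lock ..." dump below is
only produced by a lock-debug build (the counters are typically printed when
the process exits). Presumably something like the following build was used,
matching the OPTIONS shown above; DEBUG_THREAD is the flag that enables the
per-lock wait-time accounting:

  make TARGET=linux2628 USE_ZLIB=yes USE_OPENSSL=1 USE_PCRE=1 \
       DEBUG=-DDEBUG_THREAD
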
Stats about Lock FD:
# write lock  : 143933900
# write unlock: 143933895 (-5)
# wait time for write : 11370.245 msec
# wait time for write/lock: 78.996 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock TASK_RQ:
# write lock  : 2062874
# write unlock: 2062875 (1)
# wait time for write : 7820.234 msec
# wait time for write/lock: 3790.941 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock TASK_WQ:
# write lock  : 2601227
# write unlock: 2601227 (0)
# wait time for write : 5019.811 msec
# wait time for write/lock: 1929.786 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock POOL:
# write lock  : 2823393
# write unlock: 2823393 (0)
# wait time for write : 11984.706 msec
# wait time for write/lock: 4244.788 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock LISTENER:
# write lock  : 184
# write unlock: 184 (0)
# wait time for write : 0.011 msec
# wait time for write/lock: 60.554 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock PROXY:
# write lock  : 291557
# write unlock: 291557 (0)
# wait time for write : 109.694 msec
# wait time for write/lock: 376.235 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock SERVER:
# write lock  : 1188511
# write unlock: 1188511 (0)
# wait time for write : 854.171 msec
# wait time for write/lock: 718.690 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock LBPRM:
# write lock  : 1184709
# write unlock: 1184709 (0)
# wait time for write : 778.947 msec
# wait time for write/lock: 657.501 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read  : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock BUF_WQ:
# write lock  : 669247
# write unlock: 669247 (0)
# wait time for write : 252.265 msec
# wait time for write/lock: 376.939 nsec
# read lock