Hi,

Yeah, the clients are not the problem; we are using 5 different datacenters with 
5 machines each, so ~25 machines. Hardcore loadtesting :)
Btw, the loadtests are run transatlantic, so that adds latency etc.

After some more testing yesterday we found exactly what you mentioned here: 
running stud with too many processes made the results much worse.
The best setup turned out to be stud with n=3 and haproxy with nbproc=1. 
Increasing n to 4, 5, 6... degraded the results again.

When I got those results I was running stud with n=6, which caused a lot of 
response time problems. However, I don't see those response times in the 
haproxy logs now when running with n=3. So how could stud with n=6 affect 
the response time on the "backend" side in the haproxy logs?
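(For anyone digging into this: the timer block in haproxy's HTTP log lines, e.g. 4519/0/73/50215/54809, is Tq/Tw/Tc/Tr/Tt in milliseconds: time reading the request, time queued, backend connect time, server response time, and total. A rough awk sketch to flag slow backend connects or responses, assuming the stock log format where the timers are the 10th whitespace-separated field and the log path is an example:)

```shell
# Print log lines where backend connect (Tc) or server response (Tr)
# exceeds 1s; adjust the field index ($10) for your own log format.
awk '{ split($10, t, "/"); if (t[3]+0 > 1000 || t[4]+0 > 1000) print }' /var/log/haproxy.log
```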

We are currently using the latest version of stud from github, 
bumptech-stud-0.2-76-g8012fe3. Are the "emericbr patches" merged 
in there, or is that a fork?

The loadtest client does a renegotiation for every connection. The scenario 
contains 3 small images.
Each connection makes 3 requests, repeated 10 times, with a 3-7s wait between 
each request.
This is to keep each connection open as long as possible and get many active 
connections. (We also have scenarios doing a lot of conns/s etc.)
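As a sanity check on the numbers, that scenario pins the expected concurrency down fairly well (rough figures, ignoring request service time):

```shell
# Little's law: active connections ~= arrival rate x avg connection lifetime.
reqs_per_conn=$(( 3 * 10 ))   # 3 requests, repeated 10 times
avg_wait=5                    # midpoint of the 3-7s think time, in seconds
lifetime=$(( reqs_per_conn * avg_wait ))   # ~150s held per connection
rate=150                      # new connections per second in the test
echo "expected active connections: $(( rate * lifetime ))"
```

At ~150 conns/s that lands around 22.5k concurrent connections, the same ballpark as the ~30k where the response times start climbing.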

Yeah, Aloha would have been cool to test. But this is not for us, this is for a 
customer :)

These are my main sysctl values which gave me visible performance improvement:
net.ipv4.tcp_max_syn_backlog=262144
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_no_metrics_save=1
net.core.somaxconn=262144

net.ipv4.ip_local_port_range=1024 65536
net.ipv4.tcp_tw_recycle=1

These are some more I have tried, but they did not give me much 
improvement:
#net.ipv4.tcp_rmem=4096 87380 16777216
#net.ipv4.tcp_wmem=4096 65536 16777216
#net.ipv4.tcp_fin_timeout = 3
#net.ipv4.tcp_max_orphans = 262144
#net.ipv4.tcp_synack_retries = 2
#net.ipv4.tcp_syn_retries = 2

#net.core.rmem_max=16777216
#net.core.wmem_max=16777216
#net.core.netdev_max_backlog = 262144
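In case it's useful to anyone reproducing this, the values above can be applied live and persisted roughly like so (a sketch; paths assume a standard Linux box):

```shell
# Apply a value at runtime (root needed):
sysctl -w net.ipv4.tcp_max_syn_backlog=262144
sysctl -w net.core.somaxconn=262144
sysctl -w net.ipv4.tcp_tw_reuse=1

# To persist across reboots, add the same key=value lines to
# /etc/sysctl.conf and reload them with:
sysctl -p

# Caveat: tcp_tw_recycle is known to misbehave with clients behind NAT,
# so it is worth re-testing with it disabled.
```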

/E


-----Original Message-----
From: Baptiste [mailto:bed...@gmail.com] 
Sent: den 1 november 2011 16:08
To: Erik Torlen
Cc: haproxy@formilux.org
Subject: Re: Haproxy timing issues

Hi,

First question: are you sure you're reaching the limit of
haproxy/varnish and not the limit of your client?
Mainly concerning the increasing response time.

How many CPUs do you have in your VM? Starting too many stud processes
could be counter-productive.
I doubt doing CPU affinity in a VM improves something :)

Concerning the logs, the times we can see on your client side are very
high! Too high :)
3-4s for HAProxy to get the full request.

How are you running stud?
Which options? Are you using the one with emericbr patches?
Are your requests reusing the same SSL session ID, or do you
negotiate a new one for each connection?

Have you checked your network statistics, on both the client and server sides?
netstat -in and netstat -s
Are there a lot of drops, retransmissions, congestion, etc.?
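For example, something along these lines on each box (counter names vary a bit between kernels):

```shell
netstat -in                                          # per-interface RX/TX errors and drops
netstat -s | egrep -i 'retrans|drop|overflow|listen' # TCP-level trouble signs
```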

On your last log line, we can see that HAProxy took 22s to establish a
TCP connection to your Varnish...

Can you share your stud, haproxy, and varnish configurations, the
version of each piece of software, and the startup parameters for Varnish?
What kind of tool do you use on your client to run your load test?
What sysctls have you already tuned?


Unfortunately, the Aloha does not run on Amazon :)


cheers,


On Tue, Nov 1, 2011 at 9:16 PM, Erik Torlen <erik.tor...@apicasystem.com> wrote:
> Hi,
>
> I am currently (and have been from time to time the last weeks) doing some 
> heavy loadtesting against haproxy with stud in front of it handling the ssl.
>
> My loadtest has been focused on loadtesting SSL traffic through stud against 
> haproxy on amazon ec2.
>
> Our current problem is that we cannot get more than ~30k active connections 
> (~150 conns/s) before we start to see increased response times (>10-60s) on 
> the client side. Running with 38k connections now and seeing much higher 
> response times.
>
> The setup is:
> 1 instance running haproxy + stud
> 2 instances running varnish, serving 3 cached images
>
> Varnish has 100% cache hit ratio so nothing goes to the backend.
>
> We have tried the m1.xlarge and the c1.xlarge. The m1.xlarge uses almost 
> 100% cpu during the loadtests, while the c1.xlarge has a lot of resources left 
> (stud using a few percent per process) and haproxy ~60-70% cpu.
> The only difference is that the c1.xlarge gives noticeably better response 
> times before the actual problem, where resp times start increasing, kicks in.
>
> Haproxy is running with nbproc=1.
> Stud is running with n=6 and a shared session cache. (Tried it with n=3 as 
> well.)
>
> From the haproxy logs I could see the time it takes to establish a 
> connection to the backend and receive the data:
>
> Haproxy.log
> Nov  1 18:40:35 127.0.0.1 haproxy[18511]: x.x.x.x:54113 
> [01/Nov/2011:18:39:40.273] varnish varnish/varnish1 4519/0/73/50215/54809 200 
> 2715 - - ---- 238/236/4/5/0 0/0 "GET /assets/images/icons/elite_logo_beta.png 
> HTTP/1.1"
> Nov  1 18:40:35 127.0.0.1 haproxy[18511]: x.x.x.x:55635 
> [01/Nov/2011:18:39:41.547] varnish varnish/varnish1 3245/0/81/50207/53535 200 
> 1512 - - ---- 238/236/3/4/0 0/0 "GET /assets/images/icons/favicon.ico 
> HTTP/1.1"
> ...
> Nov  1 18:40:44 127.0.0.1 haproxy[18511]: x.x.x.x:34453 
> [01/Nov/2011:18:39:25.330] varnish varnish/varnish1 3082/0/225/32661/79559 
> 200 1512 - - ---- 234/232/1/2/0 0/0 "GET /assets/images/icons/favicon.ico 
> HTTP/1.1"
> Nov  1 18:40:44 127.0.0.1 haproxy[18511]: x.x.x.x:53731 
> [01/Nov/2011:18:39:25.036] varnish varnish/varnish1 3377/0/216/32669/79854 
> 200 1725 - - ---- 233/231/0/1/0 0/0 "GET /assets/images/create/action_btn.png 
> HTTP/1.1"
>
> Haproxy.err (NOTE: 504 error here)
>
> Nov  1 18:40:11 127.0.0.1 haproxy[18511]: x.x.x.x:34885 
> [01/Nov/2011:18:39:07.597] varnish varnish/varnish1 4299/0/27/-1/64330 504 
> 194 - - sH-- 10916/10914/4777/2700/0 0/0 "GET 
> /assets/images/icons/favicon.ico HTTP/1.1"
> Nov  1 18:40:12 127.0.0.1 haproxy[18511]: x.x.x.x:58878 
> [01/Nov/2011:18:39:12.621] varnish varnish/varnish2 314/0/55/-1/60374 504 194 
> - - sH-- 3692/3690/3392/1623/0 0/0 "GET /assets/images/icons/favicon.ico 
> HTTP/1.1"
>
> Nov  1 18:40:18 127.0.0.1 haproxy[18511]: x.x.x.x:35505 
> [01/Nov/2011:18:39:42.670] varnish varnish/varnish1 3515/0/22078/10217/35811 
> 200 1512 - - ---- 1482/1481/1238/710/1 0/0 "GET 
> /assets/images/icons/favicon.ico HTTP/1.1"
> Nov  1 18:40:18 127.0.0.1 haproxy[18511]: x.x.x.x:40602 
> [01/Nov/2011:18:39:42.056] varnish varnish/varnish1 4126/0/22081/10226/36435 
> 200 1512 - - ---- 1475/1474/1231/703/1 0/0 "GET 
> /assets/images/icons/favicon.ico HTTP/1.1"
>
>
> Here is the logs from running haproxy with varnish as a backend on the local 
> machine:
>
> Haproxy.log
> Nov  1 20:00:52 127.0.0.1 haproxy[18953]: x.x.x.x:38552 
> [01/Nov/2011:20:00:45.157] varnish varnish/local_varnish 7513/0/0/0/7513 200 
> 1725 - - ---- 4/3/0/1/0 0/0 "GET /assets/images/create/action_btn.png 
> HTTP/1.1"
> Nov  1 20:00:54 127.0.0.1 haproxy[18953]: x.x.x.x:40850 
> [01/Nov/2011:20:00:48.219] varnish varnish/local_varnish 6524/0/0/0/6524 200 
> 1725 - - ---- 2/1/0/1/0 0/0 "GET /assets/images/create/action_btn.png 
> HTTP/1.1"
>
> Haproxy.err
> Nov  1 20:00:38 127.0.0.1 haproxy[18953]: x.x.x.x:39649 
> [01/Nov/2011:20:00:08.665] varnish varnish/local_varnish 
> 7412/0/22090/23/29525 200 1511 - - ---- 15700/15698/267/268/1 0/0 "GET 
> /assets/images/icons/favicon.ico HTTP/1.1"
> Nov  1 20:00:38 127.0.0.1 haproxy[18953]: x.x.x.x:54565 
> [01/Nov/2011:20:00:12.255] varnish varnish/local_varnish 
> 3823/0/22090/23/25936 200 1511 - - ---- 15700/15698/266/267/1 0/0 "GET 
> /assets/images/icons/favicon.ico HTTP/1.1"
>
> I see in all these tests that haproxy-stats shows 1% idle, but "top" 
> shows haproxy using ~70% cpu?
>
> The "jungle" aka Amazon and its internal network cause a lot of latency 
> when running varnish on an external machine. The response times get better 
> (~0.5s improvement) when running varnish locally.
> But there are still very high response times in haproxy.err when running 
> varnish locally?
>
> I have played around with sysctl values and found some that improved 
> performance.
> My feeling is that I need to tune some more values to get beyond this 
> level; any suggestions?
>
> Kind Regards
> Erik
>
>
