Hi Willy and the list,

I couldn't find time for haproxy for some weeks. Now that I'm on holidays,
I'm trying to review some patches I had on my test machine.
One of them adds the possibility to limit the number of HTTP keep-alive
connections, to allow better concurrency between clients.

I propose to add a suboption to "http-server-close" to let haproxy fall back
to a "httpclose" mode once a certain number of connections is reached on the
frontend.
The value can be defined in two ways (see the short sketch after the
examples):
- as an absolute limit
  Example:
    maxconn 1000
    option http-server-close limit 500

- or as a percentage of the frontend maxconn
  Example:
    maxconn 1000
    option http-server-close limit 75%
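
To make the arithmetic explicit, here is a minimal sketch of how the
effective limit could be computed from the configured value (illustrative
only; the function name and the string form of the limit are made up for the
example, this is not haproxy code):

# Illustrative sketch, not the actual patch.
def effective_keepalive_limit(maxconn, limit_spec):
    """Number of frontend connections that may keep HTTP keep-alive."""
    if limit_spec.endswith('%'):
        return maxconn * int(limit_spec[:-1]) // 100
    return min(int(limit_spec), maxconn)

print(effective_keepalive_limit(1000, "500"))   # -> 500
print(effective_keepalive_limit(1000, "75%"))   # -> 750

Above that threshold, new responses would be processed as with "option
httpclose" until the number of frontend connections falls below the limit
again.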

Let me illustrate the benefits, sorry if it's a bit long to read ;-)

* THE CONFIGURATION

First, I used this configuration:
(maxconn values were set to 150 to ease the tests on a laptop that was not
tuned for a high number of connections)
global
        log localhost local7 debug err

defaults
        timeout server 60s
        timeout client 60s
        timeout connect 5s
        timeout http-keep-alive 5s
        log global
        option httplog

listen scl-without-limit
        bind :8000
        maxconn 150
        mode http
        option http-server-close
        capture request header User-Agent len 5
        server local 127.0.0.1:80 maxconn 150

listen close
        bind :8001
        maxconn 150
        mode http
        option httpclose
        capture request header User-Agent len 5
        server local 127.0.0.1:80 maxconn 150

listen scl-with-limit-75pct
        bind :8002
        maxconn 150
        mode http
        option http-server-close limit 75%
        capture request header User-Agent len 5
        server local 127.0.0.1:80 maxconn 150

listen scl-with-limit-95pct
        bind :8003
        maxconn 150
        mode http
        option http-server-close limit 95%
        capture request header User-Agent len 5
        server local 127.0.0.1:80 maxconn 150

listen scl-with-limit-50pct
        bind :8004
        maxconn 150
        mode http
        option http-server-close limit 50%
        capture request header User-Agent len 5
        server local 127.0.0.1:80 maxconn 150

listen scl-with-limit-25pct
        bind :8005
        maxconn 150
        mode http
        option http-server-close limit 25%
        capture request header User-Agent len 5
        server local 127.0.0.1:80 maxconn 150

And I defined a test URL that waits some time before replying (100ms in
these tests).
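
For reference, a stand-in backend producing this behaviour could be as
simple as the following Python sketch (only an illustration, not the exact
backend used for these tests):

# Minimal test backend: every GET sleeps 100ms before answering.
# Port 80 matches the "server local 127.0.0.1:80" lines above and therefore
# needs privileges; any other port works if the config is adjusted.
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn

class ThreadingHTTPServer(ThreadingMixIn, HTTPServer):
    pass

class DelayHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"             # allow keep-alive from "ab -k"

    def do_GET(self):
        time.sleep(0.1)                       # simulate 100ms of server work
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

ThreadingHTTPServer(("127.0.0.1", 80), DelayHandler).serve_forever()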

* THE SCENARIO

The scenario I used is:
ab -H "User-Agent: test1" -n10000 -c150 -k http://localhost:<port>/ &
sleep 1
ab -H "User-Agent: test2" -n10000 -c150 -k http://localhost:<port>/ &
sleep 1
curl -H "User-Agent: test3" http://localhost:<port>/

and as soon as both "ab" instances are done, I launch a final "ab" test for
comparison:
ab -H "User-Agent: test4" -n10000 -c150 -k http://localhost:<port>/

I've written a log analyzer to sum up the scenario execution, second by
second.
For each test, it shows:
- the HTTP keep-alive efficiency
- when the test could really obtain its first response (the '|' character
indicates that the test has started but is still waiting for a connection)
- how long the test ran to obtain the last response
as well as the global keep-alive efficiency measured (a rough sketch of how
such an efficiency can be derived from the logs follows).
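
For information, here is one way such a per-second efficiency can be
approximated from haproxy's httplog output (my analyzer may compute it
slightly differently; "efficiency" below is simply the share of requests
that reused an already-seen client ip:port):

# Rough illustration only. Reads httplog lines on stdin and prints, per
# second, per frontend and per captured User-Agent, the percentage of
# requests that did not open a new client connection.
import re
import sys
from collections import defaultdict

LOG = re.compile(
    r'(?P<client>\d{1,3}(?:\.\d{1,3}){3}:\d+) '   # client ip:port
    r'\[(?P<date>[^\]]+)\] '                      # accept date
    r'(?P<frontend>\S+) '                         # frontend name
    r'.*'                                         # timers, status, counters...
    r'\{(?P<ua>[^}]*)\}'                          # captured User-Agent
)

seen = set()                                      # client ip:port already seen
requests = defaultdict(int)                       # (second, fe, ua) -> total
reused = defaultdict(int)                         # (second, fe, ua) -> reused

for line in sys.stdin:
    m = LOG.search(line)
    if not m:
        continue
    second = m.group('date').split('.')[0]        # drop the milliseconds
    key = (second, m.group('frontend'), m.group('ua'))
    requests[key] += 1
    if m.group('client') in seen:
        reused[key] += 1
    seen.add(m.group('client'))

for key in sorted(requests):
    print('%s  %-22s %-6s %.2f' % (key[0], key[1], key[2],
                                   100.0 * reused[key] / requests[key]))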

* USING option http-server-close

Let's see what happens with this scenario when we use the current
"http-server-close" option :

Date      Frontend              {test1}  {test2}  {test3}  {test4}  Global
00:00:00  scl-without-limit     100                                 100
00:00:01  scl-without-limit     100      |                          100
00:00:02  scl-without-limit     100      |        |                 100
00:00:03  scl-without-limit     100      |        |                 100
00:00:04  scl-without-limit     100      |        |                 100
00:00:05  scl-without-limit     100      |        |                 100
00:00:06  scl-without-limit     100      |        |                 100
00:00:07  scl-without-limit     100      |        |                 100
00:00:08  scl-without-limit     100      |        |                 100
00:00:09  scl-without-limit     100      |        |                 100
00:00:10  scl-without-limit     100      |        |                 100
00:00:11  scl-without-limit     100      |        |                 100
00:00:12  scl-without-limit     100      |        |                 100
00:00:13  scl-without-limit              100      |                 100
00:00:14  scl-without-limit              100      |                 100
00:00:15  scl-without-limit              100      |                 100
00:00:16  scl-without-limit              100      |                 100
00:00:17  scl-without-limit              100      |                 100
00:00:18  scl-without-limit              100      |                 100
00:00:19  scl-without-limit              100      |                 100
00:00:20  scl-without-limit              100      |                 100
00:00:21  scl-without-limit                       100               100
00:01:22  scl-without-limit                                100      100
00:01:23  scl-without-limit                                100      100
00:01:24  scl-without-limit                                100      100
00:01:25  scl-without-limit                                100      100
00:01:26  scl-without-limit                                100      100
00:01:27  scl-without-limit                                100      100
00:01:28  scl-without-limit                                100      100
00:01:29  scl-without-limit                                100      100

- test1 used all the connections allowed by haproxy.
- test2 couldn't obtain any connection until test1 was finished.
- test3 also had to wait until test1 and test2 were finished (sometimes it can
be processed in parallel with test2, depending on the ability of test2 to take
all the connections first).
- each test could use keep-alive connections.

* USING option httpclose

Now, if we compare with "option httpclose":
Date      Frontend              {test1}  {test2}  {test3}  {test4}  Global
00:00:00  close                 0                                   0
00:00:01  close                 0        0                          0
00:00:02  close                 0        0        0                 0
00:00:03  close                 0        0                          0
00:00:04  close                 0        0                          0
00:00:05  close                 0        0                          0
00:00:06  close                 0        0                          0
00:00:07  close                 0        0                          0
00:00:08  close                 0        0                          0
00:00:09  close                 0        0                          0
00:00:10  close                 0        0                          0
00:00:11  close                 0        0                          0
00:00:12  close                 0        0                          0
00:00:13  close                 0        0                          0
00:00:14  close                 0        0                          0
00:00:15  close                          0                          0
00:00:16  close                                            0        0
00:00:17  close                                            0        0
00:00:18  close                                            0        0
00:00:19  close                                            0        0
00:00:20  close                                            0        0
00:00:21  close                                            0        0
00:00:22  close                                            0        0
00:00:23  close                                            0        0

- test1, test2 and test3 could run concurrently.
- as intended, no keep-alive connections were used.

* NOW USING http-server-close limit 75%

Once patched, how does haproxy manage the same scenario when 75% of the
connections are allowed to use HTTP keep-alive?

Date      Frontend              {test1}  {test2}  {test3}  {test4}  Global
00:00:00  scl-with-limit-75pct  75.57                               75.57
00:00:01  scl-with-limit-75pct  88.87    0                          73.12
00:00:02  scl-with-limit-75pct  93.24    0        0                 74.93
00:00:03  scl-with-limit-75pct  93.56    0                          74.77
00:00:04  scl-with-limit-75pct  93.92    0                          73.47
00:00:05  scl-with-limit-75pct  94.39    0                          74.61
00:00:06  scl-with-limit-75pct  92.86    0                          74.16
00:00:07  scl-with-limit-75pct  94.64    0                          74.12
00:00:08  scl-with-limit-75pct  92.39    0                          73.88
00:00:09  scl-with-limit-75pct  91.67    7.97                       47.92
00:00:10  scl-with-limit-75pct           15.2                       15.2
00:00:11  scl-with-limit-75pct           14.91                      14.91
00:00:12  scl-with-limit-75pct           14.78                      14.78
00:00:13  scl-with-limit-75pct           14.94                      14.94
00:00:14  scl-with-limit-75pct           16.92                      16.92
00:00:15  scl-with-limit-75pct           100                        100
00:00:16  scl-with-limit-75pct                             73.83    73.83
00:00:17  scl-with-limit-75pct                             74.68    74.68
00:00:18  scl-with-limit-75pct                             73.6     73.6
00:00:19  scl-with-limit-75pct                             74.42    74.42
00:00:20  scl-with-limit-75pct                             74.55    74.55
00:00:21  scl-with-limit-75pct                             74.65    74.65
00:00:22  scl-with-limit-75pct                             73.56    73.56
00:00:23  scl-with-limit-75pct                             74.62    74.62

- test1, test2 and test3 could run concurrently.
- 75% of the global connections could still use HTTP keep-alive.
- as test2 started after test1 reached the limit, it couldn't use keep-alive
connections until test1 finished.
- test4 shows that once running alone, it could again use keep-alive on almost
75% of its connections.

The same observations can be made with different values, depending on how we
want to tune the proxy.
For example with 95% of the connections:
Date      Frontend              {test1}  {test2}  {test3}  {test4}  Global
00:00:00  scl-with-limit-95pct  94.32                               94.32
00:00:01  scl-with-limit-95pct  98.73    0                          94.1
00:00:02  scl-with-limit-95pct  100      0        |                 94.3
00:00:03  scl-with-limit-95pct  99.3     0        |                 93.83
00:00:04  scl-with-limit-95pct  99.35    0        0                 94.02
00:00:05  scl-with-limit-95pct  100      0                          93.88
00:00:06  scl-with-limit-95pct  99.42    0                          94.27
00:00:07  scl-with-limit-95pct  100      0                          93.86
00:00:08  scl-with-limit-95pct  97.73    0                          79.26
00:00:09  scl-with-limit-95pct  0        87.87                      86.29
00:00:10  scl-with-limit-95pct           100                        100
00:00:11  scl-with-limit-95pct           100                        100
00:00:12  scl-with-limit-95pct           100                        100
00:00:13  scl-with-limit-95pct           88.93                      88.93
00:00:14  scl-with-limit-95pct           79.38                      79.38
00:00:15  scl-with-limit-95pct           78.86                      78.86
00:00:16  scl-with-limit-95pct           84.3                       84.3
00:00:17  scl-with-limit-95pct                             94.49    94.49
00:00:18  scl-with-limit-95pct                             93.94    93.94
00:00:19  scl-with-limit-95pct                             94.01    94.01
00:00:20  scl-with-limit-95pct                             94.27    94.27
00:00:21  scl-with-limit-95pct                             93.99    93.99
00:00:22  scl-with-limit-95pct                             93.97    93.97
00:00:23  scl-with-limit-95pct                             93.92    93.92
00:00:24  scl-with-limit-95pct                             94.23    94.23

- as most of the connections were used by test1 and test2, test3 was a little
delayed until a connection became available (because 5% of them are not
persistent).

The same tests were also done with a 1MB static file, with the same results.

If the idea is OK for you, I can release a patch once I've had the time to
review the documentation.

-- 
Cyril Bonté
