Hi Willy and the list,

I couldn't find time for haproxy for some weeks. Now that I'm on holidays, I'm trying to review some patches I had on my test machine. One of them adds the possibility to limit the number of HTTP keep-alive connections, to allow better concurrency between clients.
I propose to add a suboption to "http-server-close" to let haproxy fall back to "httpclose" mode once a certain number of connections is reached on the frontend. The value can be defined:

- as an absolute limit. Example:

    maxconn 1000
    option http-server-close limit 500

- or as a percentage of the frontend maxconn. Example:

    maxconn 1000
    option http-server-close limit 75%

Let me illustrate the benefits, sorry if it's a bit long to read ;-)

* THE CONFIGURATION

First, I used this configuration (maxconn values were set to 150 to ease the tests on a laptop that was not tuned for a high number of connections):

    global
        log localhost local7 debug err

    defaults
        timeout server 60s
        timeout client 60s
        timeout connect 5s
        timeout http-keep-alive 5s
        log global
        option httplog

    listen scl-without-limit
        bind :8000
        maxconn 150
        mode http
        option http-server-close
        capture request header User-Agent len 5
        server local 127.0.0.1:80 maxconn 150

    listen close
        bind :8001
        maxconn 150
        mode http
        option httpclose
        capture request header User-Agent len 5
        server local 127.0.0.1:80 maxconn 150

    listen scl-with-limit-75pct
        bind :8002
        maxconn 150
        mode http
        option http-server-close limit 75%
        capture request header User-Agent len 5
        server local 127.0.0.1:80 maxconn 150

    listen scl-with-limit-95pct
        bind :8003
        maxconn 150
        mode http
        option http-server-close limit 95%
        capture request header User-Agent len 5
        server local 127.0.0.1:80 maxconn 150

    listen scl-with-limit-50pct
        bind :8004
        maxconn 150
        mode http
        option http-server-close limit 50%
        capture request header User-Agent len 5
        server local 127.0.0.1:80 maxconn 150

    listen scl-with-limit-25pct
        bind :8005
        maxconn 150
        mode http
        option http-server-close limit 25%
        capture request header User-Agent len 5
        server local 127.0.0.1:80 maxconn 150

And I defined a test URL that waits some time before replying (100 ms in these tests).
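To make the two forms of the "limit" value concrete, here is a small sketch (in Python, purely illustrative; the actual patch is of course C code inside haproxy) of how a configured limit could resolve to an absolute connection threshold. parse_limit is a hypothetical helper name, not anything from the haproxy sources:

```python
def parse_limit(value: str, maxconn: int) -> int:
    """Resolve a 'limit' suboption to an absolute connection count.

    value   -- the configured limit, e.g. "500" or "75%"
    maxconn -- the frontend's maxconn
    """
    if value.endswith("%"):
        pct = int(value[:-1])
        if not 0 <= pct <= 100:
            raise ValueError("percentage must be between 0 and 100")
        return maxconn * pct // 100
    return int(value)

# The two examples from the proposal:
print(parse_limit("500", 1000))   # absolute limit -> 500
print(parse_limit("75%", 1000))   # percent of maxconn -> 750
```

With the test configuration below (maxconn 150), "limit 75%" would resolve to about 112 connections allowed to stay keep-alive.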
* THE SCENARIO

The scenario I used is:

    ab -H "User-Agent: test1" -n10000 -c150 -k http://localhost:<port>/ &
    sleep 1
    ab -H "User-Agent: test2" -n10000 -c150 -k http://localhost:<port>/ &
    sleep 1
    curl -H "User-Agent: test3" http://localhost:<port>/

and as soon as both "ab" instances are done, I launch a final "ab" test to compare:

    ab -H "User-Agent: test4" -n10000 -c150 -k http://localhost:<port>/

I've written a log analyzer to sum up the scenario execution, second by second. For each test, it shows:
- the HTTP keep-alive efficiency;
- when the test could really obtain its first response (the '|' character indicates that the test has started but is waiting for a connection);
- how long the test ran to obtain the last response, and the global keep-alive efficiency measured.

* USING option http-server-close

Let's see what happens with this scenario when we use the current "http-server-close" option:

    Date     Frontend           {test1} {test2} {test3} {test4} Global
    00:00:00 scl-without-limit    100                             100
    00:00:01 scl-without-limit    100      |                      100
    00:00:02 scl-without-limit    100      |       |              100
    00:00:03 scl-without-limit    100      |       |              100
    00:00:04 scl-without-limit    100      |       |              100
    00:00:05 scl-without-limit    100      |       |              100
    00:00:06 scl-without-limit    100      |       |              100
    00:00:07 scl-without-limit    100      |       |              100
    00:00:08 scl-without-limit    100      |       |              100
    00:00:09 scl-without-limit    100      |       |              100
    00:00:10 scl-without-limit    100      |       |              100
    00:00:11 scl-without-limit    100      |       |              100
    00:00:12 scl-without-limit    100      |       |              100
    00:00:13 scl-without-limit            100      |              100
    00:00:14 scl-without-limit            100      |              100
    00:00:15 scl-without-limit            100      |              100
    00:00:16 scl-without-limit            100      |              100
    00:00:17 scl-without-limit            100      |              100
    00:00:18 scl-without-limit            100      |              100
    00:00:19 scl-without-limit            100      |              100
    00:00:20 scl-without-limit            100      |              100
    00:00:21 scl-without-limit            100                     100
    00:01:22 scl-without-limit                            100     100
    00:01:23 scl-without-limit                            100     100
    00:01:24 scl-without-limit                            100     100
    00:01:25 scl-without-limit                            100     100
    00:01:26 scl-without-limit                            100     100
    00:01:27 scl-without-limit                            100     100
    00:01:28 scl-without-limit                            100     100
    00:01:29 scl-without-limit                            100     100

- test1 used all the connections allowed by haproxy.
- test2 couldn't obtain any connection until test1 finished.
- test3 also had to wait until test1 and test2 were finished (sometimes it can be processed in parallel with test2, depending on the ability of test2 to grab all the connections first).
- each test could use keep-alive connections.

* USING option httpclose

Now, compare with "option httpclose":

    Date     Frontend {test1} {test2} {test3} {test4} Global
    00:00:00 close       0                               0
    00:00:01 close       0       0                       0
    00:00:02 close       0       0       0               0
    00:00:03 close       0       0                       0
    00:00:04 close       0       0                       0
    00:00:05 close       0       0                       0
    00:00:06 close       0       0                       0
    00:00:07 close       0       0                       0
    00:00:08 close       0       0                       0
    00:00:09 close       0       0                       0
    00:00:10 close       0       0                       0
    00:00:11 close       0       0                       0
    00:00:12 close       0       0                       0
    00:00:13 close       0       0                       0
    00:00:14 close       0       0                       0
    00:00:15 close               0                       0
    00:00:16 close               0                       0
    00:00:17 close               0                       0
    00:00:18 close               0                       0
    00:00:19 close               0                       0
    00:00:20 close               0                       0
    00:00:21 close               0                       0
    00:00:22 close               0                       0
    00:00:23 close               0                       0

- test1, test2 and test3 could run concurrently.
- as wanted, no keep-alive connections were used.

* NOW USING option http-server-close limit 75%

Once patched, how does haproxy manage the same scenario when 75% of the connections may use HTTP keep-alive?
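The per-second tables in this message come from the log analyzer mentioned in the scenario section. As a rough illustration of what such a tool computes, here is a minimal sketch, assuming a simplified, pre-parsed input of (second, captured User-Agent tag, connection-reused flag) tuples rather than raw httplog lines; the real analyzer's input format and its exact definition of "keep-alive efficiency" may differ:

```python
from collections import defaultdict

def ka_efficiency(records):
    """records: iterable of (second, tag, reused) tuples, where 'reused'
    is True when the request was served on an already-open connection.
    Returns {(second, tag): percentage of reused requests}."""
    totals = defaultdict(int)
    reused_counts = defaultdict(int)
    for second, tag, reused in records:
        totals[(second, tag)] += 1
        if reused:
            reused_counts[(second, tag)] += 1
    return {key: round(100.0 * reused_counts[key] / totals[key], 2)
            for key in totals}

# Toy input: at second 0, test1 reuses 3 of its 4 connections.
sample = [(0, "test1", True), (0, "test1", True),
          (0, "test1", True), (0, "test1", False)]
print(ka_efficiency(sample))   # {(0, 'test1'): 75.0}
```

With that definition in mind, here is the 75% run: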
    Date     Frontend             {test1} {test2} {test3} {test4} Global
    00:00:00 scl-with-limit-75pct  75.57                            75.57
    00:00:01 scl-with-limit-75pct  88.87     0                      73.12
    00:00:02 scl-with-limit-75pct  93.24     0       0              74.93
    00:00:03 scl-with-limit-75pct  93.56     0                      74.77
    00:00:04 scl-with-limit-75pct  93.92     0                      73.47
    00:00:05 scl-with-limit-75pct  94.39     0                      74.61
    00:00:06 scl-with-limit-75pct  92.86     0                      74.16
    00:00:07 scl-with-limit-75pct  94.64     0                      74.12
    00:00:08 scl-with-limit-75pct  92.39     0                      73.88
    00:00:09 scl-with-limit-75pct  91.67    7.97                    47.92
    00:00:10 scl-with-limit-75pct           15.2                    15.2
    00:00:11 scl-with-limit-75pct          14.91                    14.91
    00:00:12 scl-with-limit-75pct          14.78                    14.78
    00:00:13 scl-with-limit-75pct          14.94                    14.94
    00:00:14 scl-with-limit-75pct          16.92                    16.92
    00:00:15 scl-with-limit-75pct           100                     100
    00:00:16 scl-with-limit-75pct                          73.83    73.83
    00:00:17 scl-with-limit-75pct                          74.68    74.68
    00:00:18 scl-with-limit-75pct                          73.6     73.6
    00:00:19 scl-with-limit-75pct                          74.42    74.42
    00:00:20 scl-with-limit-75pct                          74.55    74.55
    00:00:21 scl-with-limit-75pct                          74.65    74.65
    00:00:22 scl-with-limit-75pct                          73.56    73.56
    00:00:23 scl-with-limit-75pct                          74.62    74.62

- test1, test2 and test3 could run concurrently.
- 75% of the global connections could still use HTTP keep-alive.
- as test2 started after test1 had reached the limit, it couldn't use keep-alive connections until test1 finished.
- test4 shows that, once alone, the test could use almost 75% keep-alive connections.

The same observations can be made with different values, depending on how we want to tune the proxy.
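A quick sanity check on these 75% figures, assuming the limit simply caps how many frontend connections may stay keep-alive (my reading of the proposal, not confirmed against the patch internals):

```python
maxconn = 150
limit_pct = 75

# Connections allowed to remain keep-alive under the limit
ka_conns = maxconn * limit_pct // 100      # 112 of 150 connections
expected_pct = 100.0 * ka_conns / maxconn  # steady-state efficiency
print(ka_conns, round(expected_pct, 1))    # prints: 112 74.7
```

This matches the ~73-75% global efficiency that test4 reaches once it runs alone, and the same reasoning applies to the 95% run below.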
For example, with 95% of the connections:

    Date     Frontend             {test1} {test2} {test3} {test4} Global
    00:00:00 scl-with-limit-95pct  94.32                            94.32
    00:00:01 scl-with-limit-95pct  98.73     0                      94.1
    00:00:02 scl-with-limit-95pct   100      0       |              94.3
    00:00:03 scl-with-limit-95pct  99.3      0       |              93.83
    00:00:04 scl-with-limit-95pct  99.35     0       0              94.02
    00:00:05 scl-with-limit-95pct   100      0                      93.88
    00:00:06 scl-with-limit-95pct  99.42     0                      94.27
    00:00:07 scl-with-limit-95pct   100      0                      93.86
    00:00:08 scl-with-limit-95pct  97.73     0                      79.26
    00:00:09 scl-with-limit-95pct    0     87.87                    86.29
    00:00:10 scl-with-limit-95pct           100                     100
    00:00:11 scl-with-limit-95pct           100                     100
    00:00:12 scl-with-limit-95pct           100                     100
    00:00:13 scl-with-limit-95pct          88.93                    88.93
    00:00:14 scl-with-limit-95pct          79.38                    79.38
    00:00:15 scl-with-limit-95pct          78.86                    78.86
    00:00:16 scl-with-limit-95pct          84.3                     84.3
    00:00:17 scl-with-limit-95pct                          94.49    94.49
    00:00:18 scl-with-limit-95pct                          93.94    93.94
    00:00:19 scl-with-limit-95pct                          94.01    94.01
    00:00:20 scl-with-limit-95pct                          94.27    94.27
    00:00:21 scl-with-limit-95pct                          93.99    93.99
    00:00:22 scl-with-limit-95pct                          93.97    93.97
    00:00:23 scl-with-limit-95pct                          93.92    93.92
    00:00:24 scl-with-limit-95pct                          94.23    94.23

- as most of the connections are used by test1 and test2, test3 was slightly delayed until a connection became available (because 5% of the connections are not persistent).

The same tests were also done with a 1 MB static file, with the same results.

If the idea is OK for you, I can release a patch once I've had time to review the documentation.

-- 
Cyril Bonté