On Mon, Dec 09, 2013 at 03:43:09PM +0000, Annika Wickert wrote:
> - Two Intel(R) Xeon(R) CPU X6550 @ 2.00GHz in each cluster node
> - 2x Emulex Corporation OneConnect 10Gb NIC (rev 02) in each cluster node
> - 32gbit RAM in each cluster node
> - Two nodes per cluster (active-active in the new one)

I have not had the opportunity to test Emulex NICs yet. It is possible
that they disable some TCP optimizations by default, resulting in worse
performance with splice().

> - Debian Squeeze / 3.1.0-1-amd64 / Tickrate 250
> - CentOS release 6.4 (Final) / 3.11.5-1.el6 / Tickrate 1000
>
> The higher the tickrate, the higher the CPU load. You quadrupled
> the tickrate, and your load what - quadrupled? I suggest you
> try a lower tickrate in the very same configuration.

250 is the best tick rate for network-related traffic: it allows a
number of timing conversions to milliseconds to be done with a simple
shift instead of a divide, without hammering the system too often.

> - We are forcing by splice-request / splice-response

OK so I suspect this is purely TCP.

> I believe splice is not always more efficient than recv/send;

Confirmed, especially with small transfers (less than a page = 4 kB).

> use splice-auto to use it less aggressively (doc: splice-auto):
>
> For testing we disabled splicing on one of the cluster members on the new
> cluster (after successful tests). Now load drops below 8 from 16. So I maybe
> try it with splice-auto and if that does not help with a new haproxy build
> with the following git commits:
> http://haproxy.1wt.eu/git?p=haproxy.git;a=commit;h=61d39a0e2a047df78f7f3bfcf5584090913cdc65

Oh good point, I completely forgot about this one. Yes, it could be the
culprit!

> http://haproxy.1wt.eu/git?p=haproxy.git;a=commit;h=fa8e2bc68c583a227ebc78bab5779b84065b28da
>
> Haproxy uses heuristics to estimate if kernel splicing might improve
> performance or not. Both directions are handled independently. Note
> that the heuristics used are not much aggressive in order to limit
> excessive use of splicing.

Yes, the heuristics consist of detecting whether haproxy manages to read
a full buffer at once and to purge it at once. If that works, the traffic
is considered high enough to make good use of splice().
Otherwise, with incomplete buffers, it sticks to recv/send. It tends to
work really well in web environments, where you don't want favicon.ico
to be spliced but you do want your photos to be.

Regards,
Willy
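
For illustration only, the splice-auto heuristic described above can be
sketched roughly as follows. This is a toy Python model, not haproxy's
actual code: the function name, the buffer size and the idea of passing
the byte counts of the last read and write as plain integers are all
assumptions made up for the example.

```python
# Toy model of the splice-auto heuristic described in this mail.
# NOT haproxy code: BUFFER_SIZE and choose_transfer_mode() are
# hypothetical names, and the buffer size is an assumed value.

BUFFER_SIZE = 16384  # assumed buffer size in bytes, for illustration only


def choose_transfer_mode(bytes_read: int, bytes_sent: int,
                         buffer_size: int = BUFFER_SIZE) -> str:
    """Pick "splice" only when the last pass filled the whole buffer in
    one read and emptied it in one write; otherwise stay on recv/send."""
    filled_at_once = bytes_read >= buffer_size
    purged_at_once = bytes_sent >= bytes_read
    if filled_at_once and purged_at_once:
        return "splice"
    return "recv/send"


if __name__ == "__main__":
    # A favicon-sized object never fills the buffer.
    print(choose_transfer_mode(bytes_read=1150, bytes_sent=1150))
    # A large photo fills the buffer and is flushed in one go.
    print(choose_transfer_mode(bytes_read=16384, bytes_sent=16384))
```

Since both directions are handled independently, a real implementation
would presumably keep one such decision per direction of the connection;
the sketch only models a single direction.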