Hi Erik,

On Fri, Mar 12, 2010 at 11:08:08AM +0100, Erik Gulliksson wrote:
> Hi!
>
> First, I'd like to thank Willy and the other haproxy contributors for
> bringing this wonderful piece of software into the world :)
Thanks !

> For the last 2 years now we have been running haproxy 1.3 successfully
> to load balance our frontend applications and storage services. Mainly
> the requests passing through our haproxy instances are WebDAV
> commands. Since there were some new sought-after features announced in
> the new stable 1.4 branch, yesterday we gave it a go and upgraded
> haproxy from 1.3.22 to 1.4.1 in our production environment (simply
> replaced the active binary with the -sf switch). After the new version
> was deployed, our incoming traffic slowly dropped from approximately
> 150Mbps to 80Mbps (as ongoing requests were still processed by
> 1.3.22). The configuration file was not changed between the two
> versions, so we have not yet started to use any of the new config
> options for 1.4 (http-server-close etc). Because of the drop in
> throughput we have now rolled back to 1.3.22 (and traffic levels are
> back to normal).

Did you observe anything special about the CPU usage ? Was it lower
than with 1.3 ? If so, it would indicate some additional delay
somewhere. If it was higher, it could indicate that the
Transfer-Encoding parser takes too many cycles, but my preliminary
tests proved it to be quite efficient.

> What differentiates our service from most other online services is
> that we are more of a "content consumer" rather than a content
> provider. The requests that generate our traffic volume are mostly
> large and small PUT requests with Transfer-Encoding: chunked. Is this
> type of request included in any of your tests or benchmarks?

No, I've run POST requests (very similar to PUT), except that there
was no Transfer-Encoding in the requests. It's interesting that you're
doing that in the request, because Apache removed support for
TE:chunked a few years ago because there were no users. Also, most of
my POST tests were not performance related.

> Do you have a clue of what might have changed in the code base to
> cause this behavior?
> Any suggestions for where to go from here (other than sticking with
> 1.3 :) are greatly appreciated.

A big part has changed: in previous versions, haproxy did not care at
all about the payload, it only saw headers. Now, with keep-alive
support, it has to find request/response boundaries and as such must
parse the Transfer-Encoding and Content-Length headers.

However, transfer encoding is nice to components such as haproxy
because it's very cheap. Haproxy reads a chunk size (one line), then
forwards that many bytes, then reads a new chunk size, etc... So this
is really a cheap operation. My tests have shown no issue at gigabit/s
speeds with just a few bytes per chunk.

I suspect that the application tries to use the chunked encoding to
simulate bidirectional access. In this case, it might be waiting for
data pending in the kernel buffers which were sent by haproxy with the
MSG_MORE flag, indicating that more data are following (and so you
should observe a low CPU usage).

Could you please do a small test : in src/stream_sock.c, please
comment out line 616 :

    615         /* this flag has precedence over the rest */
    616 //      if (b->flags & BF_SEND_DONTWAIT)
    617                 send_flag &= ~MSG_MORE;

It will unconditionally disable the use of MSG_MORE. If this fixes the
issue for you, I'll probably have to add an option to disable this
packet merging for very specific applications.

Regards,
Willy