On Thu, May 03, 2018 at 02:51:12PM +0200, Pavlos Parissis wrote:
> On 03/05/2018 02:45 uu, Olivier Houchard wrote:
> > Hi Pavlos,
> >
> > On Thu, May 03, 2018 at 12:45:42PM +0200, Pavlos Parissis wrote:
> >> Hi,
> >>
> >> Linux kernel version 4.14 adds support for zero-copy from user memory to
> >> TCP sockets by setting MSG_ZEROCOPY flag. This is for the sending side of
> >> the socket, for the receiving side of the socket we need to wait for
> >> kernel version 4.18.
> >>
> >> Will you consider enabling this on HAProxy?
> >>
> >> More info can be found here,
> >> https://www.kernel.org/doc/html/latest/networking/msg_zerocopy.html
> >
> > After some discussion with Willy, we're not sure it is worth it.
> > It would force us to release buffer much later than we do actually, it can't
> > be used with SSL, and we already achieve zero-copy by using splicing.
> >
> > Is there any specific case where you think it'd be a huge win ?
> >
>
> The only use case that I can think of is HTTP streaming. But, without testing
> it we can't say a lot.

In fact, for HTTP streaming, splicing already does it all, and even better,
since it only manipulates a few pointers in the kernel between the source and
destination socket buffers. Userspace is not even involved.

Also, it's important to remember that while copies are best avoided whenever
possible, they aren't that dramatic at common traffic rates. I've already
reached 60 Gbps of forwarded traffic with and without splicing on a 4-core
machine.

One aspect to keep in mind is the following. A typical Xeon system will
achieve around 20 GB/s of in-L3 memcpy() bandwidth. For a typical 16kB buffer,
that's only 760 ns to copy the whole buffer, which is roughly the cost of the
extra syscall needed to check that the transfer completed. At 10 Gbps, this
represents only 6.25% of the total processing time.

And there's something much more important: with the copy operation, the buffer
is released after these 760 ns and immediately recycled for other connections.
This ensures that memory usage remains low and that most transfer operations
happen in the L3 cache instead of RAM. If you use zero-copy here, your memory
will instead be pinned for the time it takes to cycle through many other
connections and get back to processing this FD. That can very easily become
10-100 microseconds, or 15-150 times more, resulting in much more RAM usage
for temporary buffers, and thus a much higher cache footprint.

In my opinion, MSG_ZEROCOPY was designed for servers, such as those that
stream video, which produce their own data and don't need to recycle their
buffers. That's definitely not our case here: we're just forwarding ephemeral
data, so we can recycle buffers very quickly, and through splicing we can even
avoid seeing this data at all.

Hoping this helps,
Willy
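For anyone wanting to redo the copy-cost arithmetic above, here is a rough sketch using only the round figures from the mail. It assumes "16kB" means 16384 bytes and takes "around 20 GB/s" as exactly 20e9 bytes/s, so it lands near 819 ns per copy rather than the quoted 760 ns; the 6.25% share does not depend on buffer size, since it is just the ratio of the link rate to the memcpy() rate. The constant and function names are made up for illustration, and real hardware will of course differ.

```python
# Back-of-the-envelope check of the copy-cost figures quoted in the mail.
# All inputs are the round numbers from the text, not measurements.

MEMCPY_BANDWIDTH = 20e9    # bytes/s: "around 20 GB/s" of in-L3 memcpy()
BUFFER_SIZE = 16 * 1024    # bytes: assumes "16kB" means 16384 bytes
LINK_RATE = 10e9           # bits/s: 10 Gbps


def copy_time_ns(size_bytes, bandwidth_bytes_per_s):
    """Time to memcpy() one buffer, in nanoseconds."""
    return size_bytes / bandwidth_bytes_per_s * 1e9


def wire_time_ns(size_bytes, rate_bits_per_s):
    """Time for one buffer's worth of data to cross the link, in nanoseconds."""
    return size_bytes * 8 / rate_bits_per_s * 1e9


copy_ns = copy_time_ns(BUFFER_SIZE, MEMCPY_BANDWIDTH)
wire_ns = wire_time_ns(BUFFER_SIZE, LINK_RATE)

# Share of the per-buffer time budget spent copying at 10 Gbps.
copy_share = copy_ns / wire_ns

# How much longer a buffer stays pinned if it is only released after
# 10 to 100 microseconds instead of right after the copy.
pinned_ratio_low = 10_000 / copy_ns
pinned_ratio_high = 100_000 / copy_ns

print(f"copy time per buffer : {copy_ns:.1f} ns")
print(f"wire time per buffer : {wire_ns:.1f} ns")
print(f"copy share at 10 Gbps: {copy_share:.2%}")
print(f"pinned-time ratio    : {pinned_ratio_low:.0f}x to {pinned_ratio_high:.0f}x")
```

With these inputs the pinned-time ratio comes out in the same order of magnitude as the 15-150 times mentioned above, which is all the argument needs.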

