On 9 Oct 2012, at 10:24, Dan Van Der Ster wrote:
> We currently run fileservers with udpsize=2MB, and at that size we have a 30 
> client limit in our test environment. With a buffer size=8MB (increased 
> kernel max with sysctl and fileserver option), we don't see any dropped UDP 
> packets during our client-reading stress test, but still get some dropped 
> packets if all clients write to the server. With a 16MB buffer we don't see 
> any dropped packets at all in reading or writing.

This was discussed in Edinburgh as part of the CERN site report (which I'd 
recommend to anyone interested in AFS server performance), but it's just 
occurred to me that nothing made it back to the list. As I've been looking at 
this whole area in more detail for the work I'm doing on YFS's RX stack, I 
thought it would be worth summarising what's happening here.

Sizing the UDP buffer for RX is tricky because, unlike TCP, a single UDP buffer 
has to be large enough to handle all of the currently outstanding streams (TCP 
has a buffer per connection; we have a single buffer per server).

In order to avoid any packet loss at all, the UDP buffer has to be big enough 
to handle all of the packets which may be in flight at a particular moment in 
time. For each simultaneous call that the server can handle, there must be a 
full window's worth of packets available. Simultaneous calls are determined by 
the number of threads in the server - so for a typical OpenAFS installation, 
this is 128 (threads) x 32 (window size) = 4096 packets. Calls which are 
"waiting for a thread" can then each consume a single packet (they only have a 
window size of 1 until an application thread is allocated and data starts being 
consumed). You also need packets to be able to send pings and various other RX 
housekeeping - typically 1 packet for each client that the server has "recently" 
seen.
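
To make that arithmetic concrete, here's a rough back-of-the-envelope sketch 
(in Python, purely for illustration). The waiting-call and recently-seen client 
counts are invented example figures, not anything the fileserver will report:

    # Rough packet-count estimate for sizing the RX receive buffer.
    threads = 128         # fileserver worker threads
    window = 32           # RX window size, in packets
    waiting_calls = 200   # assumed calls queued "waiting for a thread"
    recent_clients = 500  # assumed recently-seen clients (pings etc.)

    in_flight = threads * window               # 128 * 32 = 4096 packets
    total = in_flight + waiting_calls + recent_clients
    print(total)                               # ~4800 with these assumptions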

With the 1.4.x series, packet loss was a bad thing - the fast recovery 
implementation was broken, so any packet loss at all would put the connection 
back to square one, which had a significant effect on throughput. With 1.6.x, 
packet loss is generally dealt with through fast recovery, and the impact on 
throughput is smaller. That said, avoiding unnecessary packet loss is always 
good!

For a heavily loaded OpenAFS server with 128 threads, I would plan to have a 
receive buffer of around 5000 packets. If you increase the number of threads, 
you should similarly increase the size of your packet buffers.

Converting that number of packets into a buffer size is a bit of a dark art. 
I'm only going to discuss the situation on Linux.

The first wrinkle is that internally Linux takes your selected buffer size, and 
doubles it. So setting a buffer of 8Mbytes actually permits 16Mbytes of kernel 
memory to be used by the RX socket.
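
You can see the doubling for yourself with a trivial Python snippet (the 8Mbyte 
request is just the example figure from above; the value you read back is 
capped by net.core.rmem_max, so you may need to raise that sysctl first):

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 8 * 1024 * 1024)
    # On Linux this prints 16777216 (16Mbytes) if rmem_max allows the
    # full request - the kernel has doubled what we asked for.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    s.close()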

The second wrinkle is that allocations from this buffer are counted according 
to the way in which memory is managed by your network card, by the socket 
buffer, and by the kernel's allocator. Very roughly, each packet will take the 
MTU of your network, plus the socket buffer overhead, rounded up to the next 
bucket used by the kernel allocator. With a standard ethernet, the MTU will be 
1500 bytes. The socket buffer overhead depends on your kernel architecture and 
configuration, but is potentially around 600 bytes on x86_64. This is large 
enough to put each packet allocation into the 4096-byte allocator bucket.
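
As a rough illustration of the accounting (the 600 byte overhead figure is only 
an approximation, as noted above):

    mtu = 1500           # standard ethernet frame payload
    skb_overhead = 600   # approximate socket buffer overhead on x86_64

    raw = mtu + skb_overhead              # ~2100 bytes per packet
    # Round up to the next power-of-two allocator bucket.
    bucket = 1 << (raw - 1).bit_length()
    print(bucket)                         # 4096 bytes charged per packet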

So, setting a UDP buffer of 8Mbytes from user space is _just_ enough to handle 
4096 incoming RX packets on a standard ethernet. However, it doesn't give you 
enough overhead to handle pings and other management packets. 16Mbytes should 
be plenty (a rough end-to-end calculation is sketched after the list below), 
provided that you don't:

a) Dramatically increase the number of threads on your fileserver
b) Increase the RX window size
c) Increase the ethernet frame size of your network (what impact this has 
depends on the internals of your network card implementation)
d) Have a large number of 1.6.0 clients on your network
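
Putting the pieces together, a rough sizing helper might look something like 
the following. This is only a sketch: the socket buffer overhead and the 
headroom allowance for waiting calls and pings are assumptions, not values the 
kernel will tell you:

    def rx_udp_buffer_request(threads=128, window=32, mtu=1500,
                              skb_overhead=600, headroom_packets=1000):
        """Estimate the receive buffer size (in bytes) to request from
        user space for the fileserver's RX socket."""
        # Each packet is charged at the next power-of-two allocator bucket.
        per_packet = 1 << (mtu + skb_overhead - 1).bit_length()
        # A full window per thread, plus headroom for waiting calls,
        # pings and other housekeeping.
        packets = threads * window + headroom_packets
        kernel_bytes = packets * per_packet
        # Linux doubles whatever user space asks for, so request half.
        return kernel_bytes // 2

    print(rx_udp_buffer_request())  # ~10Mbytes requested, ~20Mbytes in-kernel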

To summarise, and to stress Dan's original point - if you're running with the 
fileserver default buffer size (64k, 16 packets), or with the standard Linux 
maximum buffer size (128k, 32 packets), you almost certainly don't have enough 
buffer space for a loaded fileserver.

Hope that all helps!

Cheers,

Simon

