On 12 May 2011, at 12:48, Anton Lundin wrote:

> The SMP scaling in the fileserver is really bad. Has anyone done any
> profiling on what is causing this? Is any work being done on this?

In general, recent work on the fileserver has been focussing on correctness
rather than on performance. We do have a number of results that point at poor
SMP scaling of both the 1.4.x and (sadly) the 1.6.x fileservers. In
particular, many workloads seem to benefit from running fewer threads than the
maximum permitted. This is obviously not ideal.

As Derrick noted, the first thing to try would be the 1.6.0 prerelease
fileserver. There are substantial changes in various parts of the fileserver in
1.6.x, even if you don't end up running demand attach. As far as I'm aware,
little benchmarking of these changes has been performed, so it would be very
interesting to see how both the demand-attach and normal fileservers perform
in your tests.

One thing that has received substantial performance attention in the 1.6.x
series is the RX transport protocol. We know that the RX that will ship in
1.6.x is substantially faster than that in 1.4.x. If you are on an i?86
platform, some of these performance improvements will only be apparent if you
build for an i586 (or i686) architecture.
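As an aside, my guess - and it is only a guess - is that much of the
i586/i686 gain comes from RX being able to use native atomic instructions,
which x86 only acquired in later generations; built for plain i386, the same
operations have to fall back to a mutex. A minimal sketch of the difference
(the counter_inc_* names are mine, not RX's real atomics API):

#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Portable fallback: every increment pays for a lock round-trip. */
static int
counter_inc_mutex(int *counter)
{
    int old;

    pthread_mutex_lock(&counter_lock);
    old = (*counter)++;
    pthread_mutex_unlock(&counter_lock);
    return old;
}

/* On i486 and later, the compiler can emit a single LOCK XADD instead. */
static int
counter_inc_atomic(int *counter)
{
    return __sync_fetch_and_add(counter, 1);
}

int
main(void)
{
    int c = 0;

    counter_inc_mutex(&c);
    counter_inc_atomic(&c);
    return c == 2 ? 0 : 1;
}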

There are also a couple of RX "features" that will cause single-user workloads
to scale particularly badly.

Firstly, hot threads. In a typical dispatcher architecture, one thread reads
from the network and hands incoming packets off to worker threads to handle
them. This entails a context switch, and the data must be passed between
threads. To avoid this, RX has "hot threads": the thread which receives an
incoming packet is the one which handles it, and the next free thread takes
over listening on the network. The thread handling any given packet is
therefore constantly changing. Where there is a substantial amount of context
associated with a packet (connection data, volume data, inode data, etc.), and
successive handling threads are scheduled on different cores, all of that
state is constantly migrating between cores. You might find, therefore, that
disabling hot threads actually improves your performance.
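To make the handoff concrete, here is a minimal, self-contained sketch of the
pattern - the names and the condition-variable "listener token" are mine, not
RX's actual internals. Whichever thread holds the token reads the next packet,
passes the token on, and then services the packet itself:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  token_free = PTHREAD_COND_INITIALIZER;
static int listener_busy = 0;   /* does some thread hold the token? */
static int next_packet = 0;     /* stand-in for the network socket  */

static int
recv_packet(void)               /* pretend to read from the wire */
{
    return __sync_fetch_and_add(&next_packet, 1);
}

static void
process_packet(int id, int pkt) /* pretend to service the request */
{
    printf("thread %d handling packet %d\n", id, pkt);
}

static void *
hot_thread_worker(void *arg)
{
    int id = *(int *)arg;

    for (;;) {
        int pkt;

        pthread_mutex_lock(&lock);       /* wait to become the listener */
        while (listener_busy)
            pthread_cond_wait(&token_free, &lock);
        listener_busy = 1;
        pthread_mutex_unlock(&lock);

        pkt = recv_packet();             /* read in the receiving thread */

        pthread_mutex_lock(&lock);       /* promote a new listener...    */
        listener_busy = 0;
        pthread_cond_signal(&token_free);
        pthread_mutex_unlock(&lock);

        process_packet(id, pkt);         /* ...while we handle the packet */
    }
    return NULL;
}

int
main(void)
{
    pthread_t tids[4];
    int ids[4], i;

    for (i = 0; i < 4; i++) {
        ids[i] = i;
        pthread_create(&tids[i], NULL, hot_thread_worker, &ids[i]);
    }
    sleep(1);                            /* let the workers run briefly */
    return 0;
}

Note that nothing here ties the thread that handled packet N to the one that
handles packet N+1 on the same connection - which is exactly where the
cache-migration cost comes from.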

Secondly, the way we round-robin threads. In effect, we use an LRU queue to
schedule idle threads. If we have five threads A, B, C, D and E, then packet 1
will be handled by A, whilst B becomes the listener. Packet 2 goes to B and C
starts listening, packet 3 to C, packet 4 to D, packet 5 to E, and packet 6
back to A again. On a machine with 128 threads and only 4 cores, there's a lot
of churn here. Pulling the last entry, rather than the first, from the idle
queue would solve this problem - I have a patch for this change, but don't
currently have access to any machines to test it on.
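A toy model (again, not RX code) makes the difference visible. Threads are
just the IDs 0-4; handling a packet pops a thread from the idle queue and
pushes it back when it finishes. With one packet in flight at a time - the
single-user case - pop_head cycles through every thread, whilst pop_tail
keeps reusing one cache-warm thread:

#include <stdio.h>

#define NTHREADS 5

static int queue[NTHREADS];
static int head, count;

static void
reset(void)
{
    int i;

    head = 0;
    count = NTHREADS;
    for (i = 0; i < NTHREADS; i++)
        queue[i] = i;
}

static int
pop_head(void)                  /* current behaviour: LRU thread */
{
    int t = queue[head];

    head = (head + 1) % NTHREADS;
    count--;
    return t;
}

static int
pop_tail(void)                  /* proposed fix: MRU thread */
{
    count--;
    return queue[(head + count) % NTHREADS];
}

static void
push_tail(int t)                /* thread is idle again */
{
    queue[(head + count) % NTHREADS] = t;
    count++;
}

int
main(void)
{
    int p, t;

    reset();
    printf("pop_head: ");
    for (p = 0; p < 6; p++) {
        t = pop_head();
        printf("%c ", 'A' + t); /* A B C D E A - every thread churns */
        push_tail(t);
    }

    reset();
    printf("\npop_tail: ");
    for (p = 0; p < 6; p++) {
        t = pop_tail();
        printf("%c ", 'A' + t); /* E E E E E E - one warm thread */
        push_tail(t);
    }
    printf("\n");
    return 0;
}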

It's worth noting that both of these are likely to be particular issues in the
single-user case. On busier fileservers (where the number of connections
exceeds the number of cores) there will inevitably be churn, so I suspect that
the performance degradation as cores come online will be much less marked.

Cheers,

Simon.

