We recently ran some tests that we thought would be interesting to
share. We used the following setup:
- single client
- 16 servers
- gigabit ethernet
- read/write tests, with 40 GB files
- using reads and writes of 100 MB each
- varying number of processes running concurrently on the client
The test application can be configured to run with multiple processes
and/or multiple client nodes. In this case we kept everything on a
single client to focus on bottlenecks on that side.
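As a rough sketch of the kind of per-process I/O loop this describes
(not the actual test application -- sizes here are scaled way down for
illustration, and the filename is made up):

```python
# Rough sketch of the per-process sequential I/O loop described above.
# NOT the actual test application; the real test used 40 GB files and
# 100 MB requests, scaled down here for illustration.
import os

CHUNK_SIZE = 1 * 1024 * 1024   # 1 MB here; 100 MB in the real test
FILE_SIZE = 8 * CHUNK_SIZE     # 8 MB here; 40 GB in the real test

def sequential_write(path):
    """Write FILE_SIZE bytes in CHUNK_SIZE requests, one at a time."""
    buf = b"\xab" * CHUNK_SIZE
    written = 0
    with open(path, "wb") as f:
        # One request in flight at a time -- the single-process case
        # discussed later gets no pipelining at this level.
        while written < FILE_SIZE:
            f.write(buf)
            written += len(buf)
    return written

if __name__ == "__main__":
    n = sequential_write("testfile.dat")
    print(f"wrote {n} bytes in {n // CHUNK_SIZE} requests")
    os.remove("testfile.dat")
```

Running several copies of a loop like this concurrently is what the
"varying number of processes" dimension of the test exercises.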
We were looking at the kernel buffer settings controlled in
pint-dev-shared.h. By default, PVFS2 uses 5 buffers of 4 MB each. After
experimenting for a while, we made a few observations:
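For reference, the pinned-memory footprint of a given configuration is
just count times size (the constant names below are illustrative, not
the actual macros from pint-dev-shared.h):

```python
# Quick arithmetic on the buffer configurations discussed in this post.
# Names are illustrative, not the actual pint-dev-shared.h macros.
MB = 1024 * 1024

default_count, default_size = 5, 4 * MB   # stock settings: 5 x 4 MB
tuned_count, tuned_size = 2, 32 * MB      # tuned settings: 2 x 32 MB

default_total = default_count * default_size
tuned_total = tuned_count * tuned_size

print(f"default: {default_total // MB} MB pinned")  # 20 MB
print(f"tuned:   {tuned_total // MB} MB pinned")    # 64 MB
```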
- increasing the buffer size helped performance
- using only 2 buffers (rather than 5) was sufficient to saturate the
client when we were running multiple processes; adding more made only a
marginal difference
We found good results using two 32 MB buffers. Here are some comparisons
between the standard settings and the 2 x 32 MB configuration:
results for RHEL4 (2.6 kernel):
------------------------------
5 x 4MB, 1 process: 83.6 MB/s
2 x 32MB, 1 process: 95.5 MB/s
5 x 4MB, 5 processes: 107.4 MB/s
2 x 32MB, 5 processes: 111.2 MB/s
results for RHEL3 (2.4 kernel):
-------------------------------
5 x 4MB, 1 process: 80.5 MB/s
2 x 32MB, 1 process: 90.7 MB/s
5 x 4MB, 5 processes: 91 MB/s
2 x 32MB, 5 processes: 103.5 MB/s
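The relative improvements can be derived directly from the tables above:

```python
# Derive the percentage improvements from the throughput tables above.
results = {
    # (kernel, processes): (5 x 4 MB MB/s, 2 x 32 MB MB/s)
    ("RHEL4", 1): (83.6, 95.5),
    ("RHEL4", 5): (107.4, 111.2),
    ("RHEL3", 1): (80.5, 90.7),
    ("RHEL3", 5): (91.0, 103.5),
}

for (kernel, procs), (old, new) in results.items():
    pct = 100.0 * (new - old) / old
    print(f"{kernel}, {procs} process(es): {pct:+.1f}%")
```

Three of the four cases cluster together, while the RHEL4 5-process
case sits near the network ceiling discussed below.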
A few comments based on those numbers:
- in 3 out of 4 tests, we saw roughly a 13-14% performance improvement
by going to two 32 MB buffers
- the remaining test (5 processes on RHEL4) probably did not see as much
improvement because we maxed out the network. In the past, netpipe has
shown that we can get around 112 MB/s out of these nodes.
- the RHEL3 nodes are on a different switch, so it is hard to say how
much of the difference from RHEL3 to RHEL4 is due to network topology
and how much is due to the kernel version
It is also worth noting that even with this tuning, the single process
tests are about 14% slower than the 5 process tests. I am guessing that
this is due to a lack of pipelining, probably caused by two things:
- the application only submitting one read/write at a time
- the kernel module itself serializing when it breaks reads/writes into
buffer sized chunks
The latter could be addressed either by pipelining the I/O through the
bufmap interface (so that a single read or write could keep multiple
buffers busy) or by going to a system like the one Murali came up with
for memory transfers a while back, which isn't limited by buffer size.
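The double-buffering idea can be sketched in userspace terms: split one
large request into buffer-sized chunks and keep two of them in flight at
once, so the next chunk can be staged while the previous one is still
being serviced. This is only an analogy for the bufmap pipelining
described above, not actual kernel code; the function names are made up.

```python
# Userspace analogy of the double-buffering / pipelining idea: keep
# num_buffers chunks in flight at once instead of serializing them.
# Illustrative only -- not the actual bufmap interface.
from concurrent.futures import ThreadPoolExecutor

BUFFER_SIZE = 4  # stand-in for the 4 MB / 32 MB kernel buffer size

def service_chunk(chunk):
    """Stand-in for handing one buffer-sized chunk to the servers."""
    return len(chunk)

def pipelined_write(data, num_buffers=2):
    """Split data into buffer-sized chunks and overlap their service."""
    chunks = [data[i:i + BUFFER_SIZE]
              for i in range(0, len(data), BUFFER_SIZE)]
    # num_buffers workers ~= num_buffers kernel buffers kept busy,
    # so a single large write no longer proceeds one chunk at a time.
    with ThreadPoolExecutor(max_workers=num_buffers) as pool:
        return sum(pool.map(service_chunk, chunks))

if __name__ == "__main__":
    print(pipelined_write(b"x" * 17))
```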
It would also be nice to have a way to change these buffer settings
without recompiling, either via module options or via pvfs2-client-core
command line options. For the time being we are going to hard-code our
tree to run with the 32 MB buffers. The 64 MB of RAM that this uses
(vs. 20 MB with the old settings) doesn't really matter for our standard
node footprint.
-Phil
_______________________________________________
Pvfs2-developers mailing list
Pvfs2-developers@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers