Smile I/O.  I might have to steal that tag-line.  :-)

On Sep 27, 2009, at 12:21 AM, Milo wrote:

Apologies for another post: The subject should read SMALL I/O. I am apparently in a subconsciously upbeat mood.

On Sep 27, 2009, at 1:19 AM, Milo wrote:

Hi, guys. Milo from CMU here.

I'm looking into small I/O performance on PVFS2. It's actually part of a larger project investigating possible improvements to the performance of cloud computing software, and we're using PVFS2 as a kind of upper bound for performance (e.g. writing a flat file on a parallel filesystem as opposed to updating data in an HBase table).

One barrier I've run into is the small-I/O nature of many of these cloud workloads. For example, the one we're looking at currently does 1 KB I/O requests even when performing sequential writes to generate a file.

On large I/O requests, I've managed to tweak PVFS2 to get close to the performance of the underlying filesystem (115 MB/s or so). But on small I/O requests performance is much lower. It seems I can only get approximately 5,000 I/O operations/second, even for sequential writes, testing against a single-node server (4.7 MB/s with 1 KB sequential writes, 19.0 MB/s with 4 KB sequential writes). The filesystem is mounted through the PVFS2 kernel module. This seems similar to the Bonnie++ rates in ftp://info.mcs.anl.gov/pub/tech_reports/reports/P1010.pdf
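
For concreteness, the access pattern being measured is essentially the loop below. This is only a minimal sketch, not the actual benchmark run; the mount point path, block size, and write count are placeholders. It opens a file on the PVFS2 mount, writes fixed-size blocks sequentially, and reports ops/s and MB/s:

/* Minimal sketch (placeholders: path, block size, count): sequential
 * fixed-size writes to a file on the PVFS2 mount, reporting ops/s and MB/s. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define BLOCK_SIZE 1024      /* 1 KB writes; 4096 for the 4 KB case */
#define COUNT      100000    /* number of sequential writes */

int main(void)
{
    char *buf = malloc(BLOCK_SIZE);
    struct timeval t0, t1;
    double secs;
    int i;

    memset(buf, 'x', BLOCK_SIZE);

    int fd = open("/mnt/pvfs2/testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    gettimeofday(&t0, NULL);
    for (i = 0; i < COUNT; i++)
        if (write(fd, buf, BLOCK_SIZE) != BLOCK_SIZE) { perror("write"); return 1; }
    gettimeofday(&t1, NULL);
    close(fd);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%d x %d-byte writes: %.0f ops/s, %.2f MB/s\n",
           COUNT, BLOCK_SIZE, COUNT / secs,
           (double)COUNT * BLOCK_SIZE / secs / (1024 * 1024));
    free(buf);
    return 0;
}

The same loop pointed at a directory on the underlying ext3 filesystem gives the comparison mentioned in (2) below.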

None of this is unexpected to me, and I'm happy with PVFS2's large I/O performance. But I'd like to get a better handle on where this bottleneck is coming from in the code (and how I could fix it, if I find coding time between research). Here's some experimentation I've done so far:

1) A small pair of C client/server programs that open and close TCP connections in a tight loop, pinging each other with a small amount of data ('Hello World'); a stripped-down sketch follows this list. I see about 10,000 connections/second with this approach. So if each small I/O is opening and closing two TCP connections, this could be the bottleneck. I haven't yet dug into the pvfs2-client code and the library to see if it reuses TCP connections or makes new ones on each request (that's deeper into the flow code than I remember. =;) )

2) I can write to the underlying filesystem with 1 KB sequential writes almost as quickly as with 1 MB writes. So it's not the underlying ext3.

3) The IO ops/s bottleneck is there even with the null-aio TroveMethod, so I doubt it's Trove.

4) atime is getting updated with null-aio, so a metadata barrier is possible. The size of the file gets updated with null-aio as well.
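
Here is a stripped-down sketch of the connection-rate test from (1). It's not the exact program; the port number and iteration count are placeholders and error handling is minimal. One process runs as an echo server; the client opens a new TCP connection for every 'Hello World' round trip and reports connections/second:

/* Sketch of the TCP connect/close rate test from (1).
 * Run "tcping server" on one node and "tcping client <host>" on another.
 * Port, iteration count, and error handling are placeholders. */
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define PORT  9930
#define ITERS 50000

static void server(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    struct sockaddr_in addr;

    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = INADDR_ANY;
    addr.sin_port = htons(PORT);
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 128);
    for (;;) {                       /* accept, echo one message, close */
        int cfd = accept(lfd, NULL, NULL);
        char buf[64];
        ssize_t n = read(cfd, buf, sizeof(buf));
        if (n > 0)
            write(cfd, buf, n);
        close(cfd);
    }
}

static void client(const char *host)
{
    struct hostent *he = gethostbyname(host);
    struct sockaddr_in addr;
    struct timeval t0, t1;
    double secs;
    int i;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    memcpy(&addr.sin_addr, he->h_addr_list[0], he->h_length);
    addr.sin_port = htons(PORT);

    gettimeofday(&t0, NULL);
    for (i = 0; i < ITERS; i++) {    /* new connection for every message */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        char buf[64] = "Hello World";
        connect(fd, (struct sockaddr *)&addr, sizeof(addr));
        write(fd, buf, strlen(buf));
        read(fd, buf, sizeof(buf));
        close(fd);
    }
    gettimeofday(&t1, NULL);
    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%d connections in %.2f s: %.0f connections/s\n",
           ITERS, secs, ITERS / secs);
}

int main(int argc, char **argv)
{
    if (argc >= 2 && strcmp(argv[1], "server") == 0)
        server();
    else if (argc >= 3 && strcmp(argv[1], "client") == 0)
        client(argv[2]);
    else
        fprintf(stderr, "usage: %s server | client <host>\n", argv[0]);
    return 0;
}

If pvfs2-client (or the BMI TCP module) keeps a persistent connection per server rather than reconnecting per request, this particular cost shouldn't apply, which is why I want to check the client code.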


Some configuration information about the filesystem:
* version 2.8.1
* The strip_size is 4194304. Not that this should matter a great deal with one server.
* FlowBufferSizeBytes 4194304
* TroveSyncMeta and TroveSyncData are set to no
* I've applied the patch from http://www.pvfs.org/fisheye/rdiff/PVFS?csid=MAIN:slang:20090421161045&u&N to be sure metadata syncing really is off, though I'm not sure how to check. =:)

Other than seeing a big performance difference with and without the patch, there's not a good way to verify that. You could strace the server and see how many syncs are being performed, but that's not ideal. Getting an idea of the performance without the kernel interface (as Rob mentioned) would help narrow down the problem, but these are IOzone runs, right? Using the mpi-io-test program with the PVFS ROMIO driver might be the easiest way to perform a similar test of small I/Os without the kernel module in the loop.


Thanks.

~Milo

PS: Should I send this to the pvfs2-developers list instead? Apologies if I've used the wrong venue.

Users is the right place. When you send patches for all the performance improvements you're making, those can go to developers. ;-)

-sam


_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
