Hi all,
What performance do you typically see between a single client and a
single server (separate machines), both with 10 Gb/s NICs?
I am using pvfs2-cp to copy a 1 GB file from the client to the
server. The client reads from a tmpfs mount, so no disk is involved
(and I am not swapping). The server's backing store is also tmpfs. I
set FlowBufferSizeBytes to 1 MB. With tweaking, I am seeing about 400
MB/s.
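
That is, the server config file contains something like:

    FlowBufferSizeBytes 1048576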
On the same machine, if I use dd to copy from /dev/zero to
/mnt/tmpfs/zeros using 1 MB blocks, I get 300 MB/s for a 1 GB file.
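
Concretely, the dd invocation was something like:

    dd if=/dev/zero of=/mnt/tmpfs/zeros bs=1M count=1024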
Initially, I used the dumbest possible BMI_meth_memalloc() and
BMI_meth_memfree(), which were simply calls to malloc() and free(),
and I was getting about 300 MB/s. Thinking that this was the problem,
I tinkered with mallopt() to set higher thresholds for trim and mmap.
This added about 50 MB/s.
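
The mallopt() calls were along these lines; the 32 MB thresholds here
are illustrative rather than the exact values I used:

    #include <malloc.h>

    /* Raise the trim and mmap thresholds so glibc keeps large
     * buffers on the heap instead of unmapping or trimming them
     * after every free().  32 MB is illustrative, not the exact
     * value I used. */
    static void tune_malloc(void)
    {
        mallopt(M_TRIM_THRESHOLD, 32 * 1024 * 1024);
        mallopt(M_MMAP_THRESHOLD, 32 * 1024 * 1024);
    }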
Next, I pre-allocated memory at startup and now manage those buffers
on a free list. This added another 50 MB/s, bringing me to 400 MB/s.
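
For reference, here is a minimal sketch of the kind of pool I mean.
The names, the 16-buffer depth, and the fallback policy are mine, not
PVFS2's; the real versions sit behind BMI_meth_memalloc() and
BMI_meth_memfree():

    #include <stdlib.h>
    #include <pthread.h>

    #define POOL_BUFS     16         /* pool depth: my choice */
    #define POOL_BUF_SIZE (1 << 20)  /* 1 MB, matching FlowBufferSizeBytes */

    static void *free_list[POOL_BUFS];
    static int free_count;
    static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Called once at startup: pre-allocate all the buffers. */
    static void pool_init(void)
    {
        for (free_count = 0; free_count < POOL_BUFS; free_count++)
            free_list[free_count] = malloc(POOL_BUF_SIZE);
    }

    /* Hand out a pooled buffer; fall back to malloc() if the
     * request is too big to pool or the pool is empty.  The
     * fallback for small requests is still POOL_BUF_SIZE bytes,
     * so the buffer can be returned to the pool later. */
    static void *pool_alloc(size_t size)
    {
        void *p;
        if (size > POOL_BUF_SIZE)
            return malloc(size);
        pthread_mutex_lock(&pool_lock);
        p = (free_count > 0) ? free_list[--free_count] : NULL;
        pthread_mutex_unlock(&pool_lock);
        return p ? p : malloc(POOL_BUF_SIZE);
    }

    /* Return a buffer to the pool, or free() it if the pool is
     * full or the buffer was an oversized one-off. */
    static void pool_free(void *p, size_t size)
    {
        if (size <= POOL_BUF_SIZE) {
            pthread_mutex_lock(&pool_lock);
            if (free_count < POOL_BUFS) {
                free_list[free_count++] = p;
                p = NULL;
            }
            pthread_mutex_unlock(&pool_lock);
        }
        if (p)
            free(p);
    }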
I tried playing with pvfs2-cp's -b option, but performance never
improved over the default behavior. Interestingly, on the client,
pvfs2-cp only uses two 1 MB buffers (over and over) for the entire 1
GB transfer. Is this intentional? Does this mean that only one buffer
is in flight while the other is being filled? Is there a way to get
pvfs2-cp to use more concurrent messages?
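
To make the question concrete, this is the pattern I am imagining
from the two-buffer behavior. It is purely my guess at what pvfs2-cp
does, sketched here with POSIX AIO on plain file descriptors rather
than the PVFS2 client API (link with -lrt on older glibc):

    #include <aio.h>
    #include <string.h>
    #include <unistd.h>

    #define BSIZE (1 << 20)  /* 1 MB, like pvfs2-cp's two buffers */

    /* Double-buffered copy: read the next chunk into one buffer
     * while the write of the previous chunk is still in flight.
     * At most one write is ever outstanding. */
    static int copy_double_buffered(int src, int dst)
    {
        static char buf[2][BSIZE];
        struct aiocb cb;
        const struct aiocb *list[1];
        off_t off = 0;
        int cur = 0, pending = 0;
        ssize_t n;

        memset(&cb, 0, sizeof(cb));
        list[0] = &cb;

        /* The read below overlaps with the in-flight write on
         * the other buffer. */
        while ((n = read(src, buf[cur], BSIZE)) > 0) {
            if (pending) {   /* reap the previous write first */
                aio_suspend(list, 1, NULL);
                if (aio_error(&cb) != 0 || aio_return(&cb) < 0)
                    return -1;
            }
            cb.aio_fildes = dst;
            cb.aio_buf    = buf[cur];
            cb.aio_nbytes = (size_t)n;
            cb.aio_offset = off;
            off += n;
            if (aio_write(&cb) != 0)
                return -1;
            pending = 1;
            cur = 1 - cur;   /* fill the other buffer next time */
        }
        if (pending) {       /* drain the final write */
            aio_suspend(list, 1, NULL);
            if (aio_error(&cb) != 0 || aio_return(&cb) < 0)
                return -1;
        }
        return (n < 0) ? -1 : 0;
    }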
With Lustre, I see ~675 MB/s from a single client using one thread to
a single server. That test does not go through the entire filesystem,
however; it simply exercises the network layer. By default, though,
Lustre will try to use 8 or 16 threads (depending on a configurable
parameter).
Scott