What would such network filesystems report as their blocksize? I have a
feeling it isn't going to be on the order of a MB. At least for local
filesystems, the ideal transfer block size is going to be quite a bit
larger than the filesystem block size (if the filesystem is even block
oriented; think reiser4 or cramfs). In the case of network
filesystems, they should be performing readahead in the background
between small block copies to keep the pipeline full. As long as the
copy program isn't blocked elsewhere for long periods, say in the write
to the destination, the readahead mechanism should keep the pipeline
full. Up to a point, using larger block sizes saves some CPU by
lowering the number of system calls. Past that point, though, the copy
program can start to waste enough time in the write that the readahead
stops and stalls the pipeline.
If you want really fast copies of large files, then you want to send
down multiple overlapped AIO (real kernel AIO, not the glibc threaded
implementation) O_DIRECT reads and writes, but that gets quite
complicated. Simply using blocking O_DIRECT reads into a memory-mapped
destination file buffer performs nearly as well, provided you use a
decent block size. On my system I have found that buffers of 128 KB or
more are needed to keep the pipeline full, because I'm using a two-disk
RAID 0 with a 64 KB stripe factor; blocks smaller than 128 KB keep only
one disk going at a time. That's probably getting a bit too complicated
for this conversation, though.
If we are talking about the conventional blocking cached read followed
by a blocking cached write, then I think you will find that using a
buffer size of several pages (say 32 or 64 KB) will be MUCH more
efficient than 1024 bytes (the typical local filesystem block size),
so using st_blksize for the size of the read/write buffer is not a
good choice. I think you may be ascribing meaning to st_blksize that
is not there.
Robert Latham wrote:
In local file systems, I'm sure you are correct. If you are working
with a remote file system, however, the optimal size is on the order
of megabytes, not kilobytes. For a specific example, consider the
PVFS2 file system, where the plateau in "blocksize vs. bandwidth" is
two orders of magnitude larger than 64 KB. PVFS2 is a parallel file
system for Linux clusters. I am not nearly as familiar with Lustre,
GPFS, or GFS, but I suspect those filesystems too would benefit from
block sizes larger than 64 KB.
Are you taking umbrage at the idea of using st_blksize to direct how
large the transfer size should be for I/O? I don't know what other
purpose st_blksize should have, nor are there any other fields which
are remotely valid for that purpose.
Thanks for your feedback.
==rob
_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils