On Thu, Nov 22, 2007 at 10:15:25AM -0500, Mark Hahn wrote:
> with that in mind, my opinion is that cluster IO testing should be
> a combination of:
> - parallel streaming IO to separate files - resembling a checkpoint,
>   or an IO-intensive app reading, or an app where the user forgot to
>   turn off debugging.
> - smallish metadata-heavy traffic like time(tar zxf; make; make clean).
The word 'distributed' in the subject is telling... I like to make a distinction between 'distributed', 'cluster', and 'parallel' file systems.

Distributed: uncoordinated access among processes, possibly over the wide area. Total capacity is important, but performance is not.

Cluster: local access only. Maybe homedir-style accesses (lots of metadata operations, lots of small file creation/reading/writing -- unpack a tarball, compile a kernel). Also has uncoordinated access among many processes.

Parallel: a high-performance file system for parallel applications doing large amounts of I/O. Coordinated access, likely via MPI-IO.

This is veering a bit off topic from the original question... I'd like to suggest that I/O to separate files, while certainly a popular I/O workload, should be considered a legacy workload, or at the very least not a high-performance workload. Applications should be encouraged, if at all possible, to do their I/O to a single large file. Supercomputer applications, further, should do all their I/O through either MPI-IO or a high-level library on top of MPI-IO (parallel HDF5, Parallel-NetCDF, etc.). Lots of files complicates the data management problem and eliminates several optimization opportunities for the I/O software stack.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
