On Wed, Mar 3, 2010 at 2:30 PM, Andreas Dilger <adil...@sun.com> wrote:
> On 2010-03-03, at 12:50, Jagga Soorma wrote:
>> I have just deployed a new Lustre FS with 2 MDS servers, 2 active OSS
>> servers (5x2TB OSTs per OSS) and 16 compute nodes.
>
> Does this mean you are using 5 2TB disks in a single RAID-5 OST per OSS
> (i.e. total OST size is 8TB), or are you using 5 separate 2TB OSTs?

No, I am using 5 independent 2TB OSTs per OSS.

>> Attached are our findings from the iozone tests, and it looks like the
>> iozone throughput tests have demonstrated almost linear scalability of
>> Lustre except when WRITING files that exceed 128MB in size. When
>> multiple clients create/write files larger than 128MB, Lustre throughput
>> levels off at approximately ~1GB/s. This behavior has been observed with
>> almost all tested block sizes except for 4KB. I don't have any
>> explanation as to why Lustre performs poorly when writing large files.
>>
>> Has anyone experienced this behaviour? Any comments on our findings?
>
> The default client tunable is max_dirty_mb=32MB per OSC (i.e. the maximum
> amount of unwritten dirty data per OST before blocking the process
> submitting IO). If you have 2 OSTs/OSCs and a stripe count of 2, then you
> can cache up to 64MB on the client without having to wait for any RPCs to
> complete. That is why you see a performance cliff for writes beyond 32MB.

So the true write performance should be measured from the data captured for
files larger than 128MB? If we do see a large number of large files being
created on the Lustre FS, is this something that can be tuned on the client
side? If so, where/how can I get this done, and what would be the
recommended settings?

> It should be clear that the read graphs are meaningless, due to local
> cache of the file. I'd hazard a guess that you are not getting 100GB/s
> from 2 OSS nodes.

Agreed. Is there a way to find out the size of the local cache on the
clients?

> Also, what is the interconnect on the client? If you are using a single
> 10GigE then 1GB/s is as fast as you can possibly write large files to the
> OSTs, regardless of the striping.

I am using InfiniBand (QDR) interconnects for all nodes.

> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
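
P.S. For anyone else following this thread, here is a minimal sketch of how
the per-OSC dirty-cache limit discussed above can be inspected and raised on
a client, assuming a stock Lustre client where the parameter is exposed
through lctl. The value 256 below is only an illustrative choice, not a
recommended setting:

    # Show the current dirty-data limit for every OSC (default 32MB):
    lctl get_param osc.*.max_dirty_mb

    # Raise the limit, e.g. to 256MB per OSC; a larger value lets the
    # client cache more unwritten data per OST before writers block
    # waiting for write RPCs to complete:
    lctl set_param osc.*.max_dirty_mb=256

As far as I know, a value applied with lctl set_param does not survive a
client remount, so it would need to be reapplied (or made persistent by
whatever mechanism your Lustre version supports) after remounting.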
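
On the read-caching question, a sketch of how to see and work around the
client-side cache that is inflating the read graphs, assuming the
llite.*.max_cached_mb parameter is available in your client version:

    # Show how much file data a client is allowed to keep in its read cache:
    lctl get_param llite.*.max_cached_mb

    # For benchmarking, flush cached pages between iozone runs so that
    # reads actually go to the OSTs (run as root on the client):
    sync
    echo 3 > /proc/sys/vm/drop_caches

Alternatively, iozone's -I flag requests direct I/O where possible, which
bypasses the client cache entirely, and using test files well beyond the
client's RAM size also defeats the cache.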
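
Finally, since the 64MB figure above scales with the stripe count, it is
worth checking how the iozone test files are striped. A sketch using the
standard lfs tools (/mnt/lustre/iozone_dir is a hypothetical path for
illustration):

    # Show the striping of an existing file or directory:
    lfs getstripe /mnt/lustre/iozone_dir

    # Stripe new files in this directory across all available OSTs
    # (-c -1 means "use every OST"), which multiplies the per-OST
    # dirty-cache headroom by the number of OSTs:
    lfs setstripe -c -1 /mnt/lustre/iozone_dir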
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss