After doing some more digging, it looks as though a bug was reported on this in 2007:
https://bugzilla.lustre.org/show_bug.cgi?id=12739

We have loaded the patch for Lustre attached to this bug; however, when running the set_param command I am getting the following error:

    lctl set_param llite*.*.stat_blksize=4096
    error: set_param: /proc/{fs,sys}/{lnet,lustre}/llite/lustre*/stat_blksize: No such process

Is this patch still valid for 2.6.9-78.0.22.EL_lustre.1.6.7.2smp?

Thanks again

Rocky

From: Andreas Dilger <andreas.dil...@oracle.com>
To: Ronald K Long <rkl...@usgs.gov>
Cc: "Brian J. Murrell" <brian.murr...@sun.com>, lustre-discuss@lists.lustre.org, lustre-discuss-boun...@lists.lustre.org
Date: 04/14/2010 02:13 PM
Subject: Re: [Lustre-discuss] fseeks on lustre

On 2010-04-14, at 11:08, Ronald K Long wrote:
> We've narrowed down the problem quite a bit.
>
> The problematic code snippet is not actually doing any reads or writes;
> it's just doing a massive number of fseek() operations within a couple
> of nested loops. (Note: the production code is doing some I/O, but this
> snippet was narrowed down to the bare-minimum example that exhibited
> the problem, which is how we discovered that fseek() was the culprit.)
>
> The issue appears to be the behavior of the glibc implementation of
> fseek(). Apparently, a call to fseek() on a buffered file stream causes
> glibc to flush the stream (regardless of whether a flush is actually
> needed). If we modify the snippet to call setvbuf() and disable
> buffering on the file stream before any of the fseek() calls, then it
> finishes more or less instantly, as you would expect.

I'd encourage you to file a bug (preferably with a patch) against glibc to fix this. I've had reasonable success in getting problems like this fixed upstream.

> The problem is that this offending code is actually buried deep within
> a COTS library that we're using to do image processing (the
> Hierarchical Data Format (HDF) library). While we do have access to the
> source code for this library and could conceivably modify it, this is a
> large and complex library, and a change of this nature would require us
> to do a large amount of regression testing to ensure that nothing was
> broken.
>
> So at the end of the day this is really not a "Lustre problem" per se,
> though we would still be interested in any suggestions as to how we can
> minimize the effects of this glibc "flush penalty". This penalty is not
> particularly onerous when reading and writing to local disk, but is
> obviously more of an issue with a distributed filesystem.

Similarly, HDF + Lustre usage is very common, and I expect that the HDF developers would be interested in fixing this if possible.

> On Wed, 2010-04-14 at 07:08 -0500, Ronald K Long wrote:
> >
> > Andreas - Here is a snippet of the strace output.
> >
> > read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2097152) = 2097152
>
> As Andreas suspected, your application is doing 2MB reads every time.
> Does it really need 2MB of data on each read? If not, can you fix your
> application to only read as much data as it actually wants?

Cheers, Andreas
--
Andreas Dilger
Principal Engineer, Lustre Group
Oracle Corporation Canada Inc.
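[Editor's note] For readers hitting the same problem, here is a minimal sketch of the setvbuf() workaround Rocky describes above: with stdio buffering disabled, fseek() has no buffer to flush, so seek-heavy loops stay cheap even on a distributed filesystem. The file name and seek pattern below are hypothetical stand-ins for the HDF access pattern; the only load-bearing call is setvbuf().

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *fp = fopen("testfile.dat", "rb");  /* hypothetical file */
        if (fp == NULL) {
            perror("fopen");
            return EXIT_FAILURE;
        }

        /* Must be called after fopen() and before any other operation on
         * the stream: _IONBF disables buffering, so fseek() has nothing
         * to flush. */
        setvbuf(fp, NULL, _IONBF, 0);

        /* Hypothetical seek-heavy loop standing in for the nested-loop
         * fseek() pattern described in the thread. */
        for (long i = 0; i < 1000000; i++) {
            if (fseek(fp, (i % 1024) * 512L, SEEK_SET) != 0) {
                perror("fseek");
                break;
            }
        }

        fclose(fp);
        return EXIT_SUCCESS;
    }

A middle ground, if the application still benefits from some buffering, is setvbuf(fp, NULL, _IOFBF, 4096), which keeps full buffering but shrinks the buffer that each fseek() may force glibc to flush.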
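[Editor's note] The 2MB reads in the strace output and the stat_blksize=4096 tunable at the top of the thread appear to be two sides of the same behavior: glibc sizes a stream's stdio buffer from the st_blksize that stat(2) reports, and Lustre reports a large value (2097152 here), so every buffer fill becomes a 2MB read. A short sketch for checking what a given mount reports; the path is a placeholder:

    #include <stdio.h>
    #include <sys/stat.h>

    int main(void)
    {
        struct stat st;

        /* Placeholder path: substitute a file on the mount being tested. */
        if (stat("/mnt/lustre/testfile.dat", &st) != 0) {
            perror("stat");
            return 1;
        }

        /* glibc sizes stdio stream buffers from this value; a local ext3
         * filesystem typically reports 4096, while the Lustre mount in
         * this thread would report 2097152. */
        printf("st_blksize = %ld\n", (long)st.st_blksize);
        return 0;
    }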