John,

Have you run the same test a second time against the same file (i.e., 
overwriting the data from the first test so that Lustre does not allocate a new 
file)?  If so, do you see the same behavior both times?  I ask because I am 
wondering whether this could be related to Lustre's lazy allocation of the 
second PFL component.  Lustre allocates OSTs only for the first component when 
the file is created; as soon as you attempt to write into the second component, 
it allocates a set of OSTs for that one.  Maybe some locking happens at that 
point which forces the client to flush its cache?  It's just a guess, but it 
might be worth testing if you haven't already done so.
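
In case it is useful, here is a rough sketch of the rerun I have in mind. It is 
not your actual tool: run_pass and aligned_buffer are made-up helper names, it 
assumes Linux and Python 3 (for os.O_DIRECT), and it assumes the file or its 
directory already carries your PFL layout. The file is opened without O_TRUNC 
and never unlinked, so the second pass overwrites the same objects that the 
first pass instantiated.

```python
import mmap
import os
import sys

MiB = 1 << 20


def aligned_buffer(size, fill=b"x"):
    """Return a page-aligned buffer (anonymous mmap), as O_DIRECT requires."""
    buf = mmap.mmap(-1, size)
    buf.write(fill * size)
    return buf


def run_pass(path, first=512 * MiB, second=512 * MiB, chunk=32 * MiB):
    """Write `first` bytes buffered, then `second` bytes alternating
    direct/buffered, starting with direct (the pattern described below)."""
    buf = aligned_buffer(chunk)
    # No O_TRUNC: an existing file is overwritten in place.
    fd_buf = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    fd_dir = os.open(path, os.O_WRONLY | os.O_DIRECT)
    try:
        for off in range(0, first, chunk):
            if os.pwrite(fd_buf, buf, off) != chunk:
                raise OSError("short buffered write at offset %d" % off)
        for i, off in enumerate(range(first, first + second, chunk)):
            fd = fd_dir if i % 2 == 0 else fd_buf
            if os.pwrite(fd, buf, off) != chunk:
                raise OSError("short write at offset %d" % off)
    finally:
        os.close(fd_dir)
        os.close(fd_buf)
        buf.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    for n in (1, 2):
        # Pass 1 instantiates the second component; pass 2 reuses it.
        print("pass", n)
        run_pass(sys.argv[1])
```

If the cache dump only shows up on pass 1, that would point at the component 
instantiation rather than at the direct write itself.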

--Rick


On 1/15/26, 3:43 PM, "lustre-discuss on behalf of John Bauer via 
lustre-discuss" <[email protected]> wrote:

All,
I am back to trying to emulate Hybrid I/O from user space, doing direct and 
buffered I/O to the same file concurrently. I open a file twice, once with 
O_DIRECT, and once without. Note that you will see 2 different file names 
involved, buffered.dat and direct.dat. direct.dat is a symlink to buffered.dat 
and this is done so my tool can more easily display the direct and non-direct 
I/O differently. The file has striping of 512M@4{100,101,102,103}x32M<ssd-pool 
+ EOF@4{104,105,106,107}x32M<ssd-pool. The application first writes 512M ( 32M 
per write ) to only the first PFL component using non-direct fd. Then the 
application writes 512M ( 32M per write ) alternating between the direct fd and 
non-direct fd. The very first write ( using direct ) into the 2nd component 
triggers the dump of the entire first component from buffer cache. From that 
point on the 2 OSC that handle the non-direct writes accumulate cache. The 2 
OSC that handle the direct writes accumulate no cache. My question: Why does 
Lustre dump the 1st component from buffer cache? The 1st and 2nd component do 
not even share OSCs. Lustre has no problem dealing with direct and 
non-direct I/O in the same component (2nd component in this case). To me it 
would seem that if Lustre can correctly buffer direct and non-direct in the 
same component, it should be able to correctly buffer direct and non-direct in 
multiple components. My ultimate goal is to have the first, and smaller 
component, remain cached, and the remainder of the file use direct I/O, but as 
soon as I do a direct I/O, I lose all my buffer cache.
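
The write pattern and the OST each write should land on can be sketched as 
follows. This is a minimal illustration, not the actual test program: it 
assumes the numbers in braces in the layout are OST indices and that stripes 
within a component are assigned round-robin counting from the component start. 
LAYOUT, ost_for_offset and write_plan are hypothetical names.

```python
MiB = 1 << 20

# (start, end, stripe_size, OST indices); end=None means EOF.
LAYOUT = [
    (0, 512 * MiB, 32 * MiB, [100, 101, 102, 103]),
    (512 * MiB, None, 32 * MiB, [104, 105, 106, 107]),
]


def ost_for_offset(offset, layout=LAYOUT):
    """Map a file offset to an OST index (assumed round-robin striping)."""
    for start, end, stripe_size, osts in layout:
        if offset >= start and (end is None or offset < end):
            stripe = ((offset - start) // stripe_size) % len(osts)
            return osts[stripe]
    raise ValueError("offset outside layout")


def write_plan(chunk=32 * MiB, first=512 * MiB, second=512 * MiB):
    """Return (offset, mode, ost) for every write in the test."""
    # Phase 1: buffered writes covering only the first component.
    plan = [(off, "buffered", ost_for_offset(off))
            for off in range(0, first, chunk)]
    # Phase 2: alternate direct/buffered in the second component, direct first.
    for i, off in enumerate(range(first, first + second, chunk)):
        mode = "direct" if i % 2 == 0 else "buffered"
        plan.append((off, mode, ost_for_offset(off)))
    return plan
```

Under those assumptions the direct writes in the second component always fall 
on two of its four OSTs and the buffered writes on the other two, which is 
consistent with only 2 OSCs accumulating cache there.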
The top frame of the plot is the amount of cache used by each OSC versus time. 
The bottom frame of the plot is the File Position Activity versus time. Next to 
each pwrite64() depicted, I indicate which OSC is being written to. I have also 
colored the pwrite64()s by whether they used the direct fd (green) or 
non-direct fd (red). As soon as the 2nd PFL component is touched by a direct 
write, that write waits until the OSCs of the first PFL component dump all 
their cache.
John


_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
