It is likely the revocation of the layout lock, caused by the MDS allocating the new objects, that is causing the cache to be flushed.
Strictly speaking, the client _shouldn't_ have to flush the cache in this case, because the OST objects in the first component have not changed, but it doesn't know this in advance, so it pre-emptively flushes the cache. Instead of ftruncate(0) you could ftruncate(final_size), or at least to a size large enough to trigger creation of the later component(s), since they need to be created to store the file size from the truncate(size) call (a sketch of this approach is included at the end of this message).

Cheers, Andreas

> On Jan 15, 2026, at 17:41, John Bauer via lustre-discuss <[email protected]> wrote:
>
> Rick,
> You were spot on. I changed the test program to rewrite the file, with an ftruncate(0) in between. It can be seen that the ftruncate(0) caused "cached" for all OSCs to drop to zero at about 1.4 seconds. The subsequent rewrite does not dump the first component when a direct write goes to the 2nd component.
> Thanks much for the insight.
> John
> <split_direct_2.png>
>
> On 1/15/2026 5:31 PM, Mohr, Rick wrote:
>> John,
>>
>> Have you run the same test a second time against the same file (i.e., overwriting data from the first test so that a new file isn't allocated by Lustre)? If so, do you see the same behavior both times? The reason I ask is that I am wondering if this could be related to Lustre's lazy allocation of the second PFL component. Lustre will only allocate OSTs for the first component when the file is created, but as soon as you attempt to write into the second component, Lustre will then allocate a set of OSTs for it. Maybe there is some locking that happens which forces the client to flush its cache? It's just a guess, but it might be worth testing if you haven't already done so.
>>
>> --Rick
>>
>> On 1/15/26, 3:43 PM, "lustre-discuss on behalf of John Bauer via lustre-discuss" <[email protected]> wrote:
>>
>> All,
>> I am back to trying to emulate hybrid I/O from user space, doing direct and buffered I/O to the same file concurrently. I open a file twice, once with O_DIRECT and once without (a sketch of this setup is also included at the end of this message). Note that you will see 2 different file names involved, buffered.dat and direct.dat. direct.dat is a symlink to buffered.dat; this is done so my tool can more easily display the direct and non-direct I/O differently. The file has striping of 512M@4{100,101,102,103}x32M<ssd-pool + EOF@4{104,105,106,107}x32M<ssd-pool.
>> The application first writes 512M (32M per write) to only the first PFL component using the non-direct fd. Then the application writes 512M (32M per write) alternating between the direct fd and the non-direct fd. The very first write (using direct) into the 2nd component triggers the dump of the entire first component from buffer cache. From that point on, the 2 OSCs that handle the non-direct writes accumulate cache. The 2 OSCs that handle the direct writes accumulate no cache.
>> My question: Why does Lustre dump the 1st component from buffer cache? The 1st and 2nd components do not even share OSCs. Lustre has no problem dealing with direct and non-direct I/O in the same component (the 2nd component in this case). To me it would seem that if Lustre can correctly buffer direct and non-direct I/O in the same component, it should be able to correctly buffer direct and non-direct I/O in multiple components. My ultimate goal is to have the first, smaller component remain cached, and the remainder of the file use direct I/O, but as soon as I do a direct I/O, I lose all my buffer cache.
>> The top frame of the plot is the amount of cache used by each OSC versus time. The bottom frame is the File Position Activity versus time. Next to each pwrite64() depicted, I indicate which OSC is being written to. I have also colored the pwrite64()s by whether they used the direct fd (green) or the non-direct fd (red). As soon as the 2nd PFL component is touched by a direct write, that write waits until the OSCs of the first PFL component dump all their cache.
>> John

---
Andreas Dilger
Principal Lustre Architect
[email protected]
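[Editor's sketch] A minimal sketch of the ftruncate(final_size) approach suggested above. The file name and the 1 GiB final size are assumptions for illustration, not taken from the actual test program; the idea is simply that extending the file to a size that reaches the later PFL component(s) should cause their objects to be instantiated up front, rather than at the first direct write.

```c
/*
 * Hedged sketch: instead of ftruncate(fd, 0) between runs, extend the file
 * to its final size so the later PFL component(s) are instantiated up front.
 * File name and final size are illustrative assumptions only.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const off_t final_size = 1024L * 1024 * 1024;   /* assumed 1 GiB final size */
    int fd = open("buffered.dat", O_WRONLY | O_CREAT, 0644);

    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    /* Truncating *up* to final_size (rather than down to 0) requires the
     * later component(s) to exist to hold the new file size, so their
     * objects are allocated now rather than at the first direct write. */
    if (ftruncate(fd, final_size) < 0) {
        perror("ftruncate");
        close(fd);
        return EXIT_FAILURE;
    }

    close(fd);
    return EXIT_SUCCESS;
}
```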

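[Editor's sketch] For context, a hedged sketch of the dual-descriptor setup described in the original message: the same file opened once buffered and once with O_DIRECT, 512M written buffered into the first component, then 512M written alternating direct and buffered in 32M chunks. The file names, the 4096-byte alignment, and the minimal error handling are assumptions; the symlink arrangement and the actual test program are not reproduced here.

```c
/*
 * Hedged sketch of the hybrid-I/O test described above: one buffered fd and
 * one O_DIRECT fd on the same file, writing alternately in 32M chunks.
 * Names, sizes, and alignment are illustrative assumptions only.
 */
#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (32L * 1024 * 1024)   /* 32M per write, as in the test */

int main(void)
{
    int bfd = open("buffered.dat", O_WRONLY | O_CREAT, 0644);
    int dfd = open("direct.dat", O_WRONLY | O_DIRECT);  /* symlink to buffered.dat */
    void *buf = NULL;

    if (bfd < 0 || dfd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    /* O_DIRECT needs suitably aligned buffers; 4096 is an assumed alignment. */
    if (posix_memalign(&buf, 4096, CHUNK) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return EXIT_FAILURE;
    }
    memset(buf, 'x', CHUNK);

    /* First 512M (16 x 32M) buffered only, landing in the first component. */
    for (int i = 0; i < 16; i++)
        pwrite(bfd, buf, CHUNK, (off_t)i * CHUNK);

    /* Next 512M alternating direct / buffered; the first direct write into
     * the second component is where the cache drop was observed. */
    for (int i = 16; i < 32; i++) {
        int fd = (i % 2 == 0) ? dfd : bfd;
        pwrite(fd, buf, CHUNK, (off_t)i * CHUNK);
    }

    free(buf);
    close(bfd);
    close(dfd);
    return EXIT_SUCCESS;
}
```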