It is likely the revocation of the layout lock, caused by the MDS allocating 
the new objects, that is causing the cache to be flushed.

Strictly speaking, the client _shouldn't_ have to flush the cache in this case, 
because the OST objects in the first component have not changed, but it doesn't 
know this in advance, so it pre-emptively flushes the cache.

Instead of ftruncate(0) you could use ftruncate(final_size), or at least a size 
large enough to trigger creation of the later component(s) (they need to be 
created to store the file size from the truncate(size) call).
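
A minimal sketch of the suggested approach, assuming the rewrite pattern from 
John's test; the file name and FINAL_SIZE are illustrative, and error handling 
is trimmed:

#include <fcntl.h>
#include <unistd.h>

/* Illustrative: anything past the 512M boundary of the first component. */
#define FINAL_SIZE ((off_t)1024 * 1024 * 1024)

int main(void)
{
        int fd = open("buffered.dat", O_WRONLY | O_CREAT, 0644);
        if (fd < 0)
                return 1;

        /*
         * Truncate out to the final size instead of ftruncate(fd, 0).
         * This triggers creation of the later component(s) up front, so the
         * layout no longer changes (and the layout lock is not revoked) when
         * the rewrite later crosses into the 2nd component.
         */
        if (ftruncate(fd, FINAL_SIZE) < 0)
                return 1;

        /* ... rewrite the file contents here ... */

        close(fd);
        return 0;
}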

Cheers, Andreas

> On Jan 15, 2026, at 17:41, John Bauer via lustre-discuss 
> <[email protected]> wrote:
> 
> Rick,
> You were spot on.  I changed the test program to rewrite the file, with an 
> ftruncate(0) in between.  It can be seen that the ftruncate(0) causes 
> "cached" for all OSCs to drop to zero at about 1.4 seconds.  The subsequent 
> rewrite does not dump the first component when a direct write goes to the 
> 2nd component.
> Thanks much for the insight.
> John
> <split_direct_2.png>
> On 1/15/2026 5:31 PM, Mohr, Rick wrote:
>> John,
>> 
>> Have you run the same test a second time against the same file (i.e., 
>> overwriting the data from the first test so that a new file isn't allocated 
>> by Lustre)? If so, do you see the same behavior both times? The reason I ask 
>> is that I am wondering if this could be related to Lustre's lazy allocation 
>> of the second PFL component. Lustre will only allocate OSTs for the first 
>> component when the file is created; as soon as you attempt to write into 
>> the second component, Lustre will then allocate a set of OSTs for it. Maybe 
>> there is some locking that happens which forces the client to flush its 
>> cache? It's just a guess, but it might be worth testing if you haven't 
>> already done so.
>> 
>> --Rick
>> 
>> 
>> On 1/15/26, 3:43 PM, "lustre-discuss on behalf of John Bauer via 
>> lustre-discuss" <[email protected]> wrote:
>> 
>> All,
>> I am back to trying to emulate Hybrid I/O from user space, doing direct and 
>> buffered I/O to the same file concurrently. I open a file twice, once with 
>> O_DIRECT and once without. Note that you will see two different file names 
>> involved, buffered.dat and direct.dat; direct.dat is a symlink to 
>> buffered.dat, which is done so my tool can more easily display the direct 
>> and non-direct I/O separately. The file has striping of 
>> 512M@4{100,101,102,103}x32M<ssd-pool + EOF@4{104,105,106,107}x32M<ssd-pool.
>> 
>> The application first writes 512M (32M per write) to only the first PFL 
>> component using the non-direct fd. Then the application writes 512M (32M per 
>> write) alternating between the direct fd and the non-direct fd. The very 
>> first write (using direct) into the 2nd component triggers the dump of the 
>> entire first component from buffer cache. From that point on, the two OSCs 
>> that handle the non-direct writes accumulate cache; the two OSCs that handle 
>> the direct writes accumulate no cache.
>> 
>> My question: why does Lustre dump the 1st component from buffer cache? The 
>> 1st and 2nd components do not even share OSCs. Lustre has no problem dealing 
>> with direct and non-direct I/O in the same component (the 2nd component in 
>> this case). To me it would seem that if Lustre can correctly buffer direct 
>> and non-direct I/O in the same component, it should be able to do so across 
>> multiple components. My ultimate goal is to have the first, smaller 
>> component remain cached and the remainder of the file use direct I/O, but as 
>> soon as I do a direct write, I lose all my buffer cache.
>> The top frame of the plot is the amount of cache used by each OSC versus 
>> time. The bottom frame is the File Position Activity versus time. Next to 
>> each pwrite64() depicted, I indicate which OSC is being written to. I have 
>> also colored the pwrite64()s by whether they used the direct fd (green) or 
>> the non-direct fd (red). As soon as the 2nd PFL component is touched by a 
>> direct write, that write waits until the OSCs of the first PFL component 
>> dump all their cache.
>> John
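
For reference, a minimal sketch of the dual-fd pattern John describes above. 
It assumes the direct.dat symlink to buffered.dat already exists, the 
4096-byte alignment is a guess at the O_DIRECT requirement of the setup, and 
error handling on the writes is trimmed:

#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (32L * 1024 * 1024)       /* 32M per write, as in the test */

int main(void)
{
        /* Two descriptors on the same file: one buffered, one O_DIRECT. */
        int bfd = open("buffered.dat", O_WRONLY | O_CREAT, 0644);
        int dfd = open("direct.dat", O_WRONLY | O_DIRECT);
        if (bfd < 0 || dfd < 0)
                return 1;

        /* O_DIRECT needs an aligned buffer. */
        void *buf;
        if (posix_memalign(&buf, 4096, CHUNK))
                return 1;
        memset(buf, 0, CHUNK);

        /* First 512M buffered into the 1st component ... */
        off_t off = 0;
        for (int i = 0; i < 16; i++, off += CHUNK)
                pwrite(bfd, buf, CHUNK, off);

        /* ... then 512M alternating direct/buffered into the 2nd component,
         * starting with a direct write as in the plot. */
        for (int i = 0; i < 16; i++, off += CHUNK)
                pwrite(i % 2 ? bfd : dfd, buf, CHUNK, off);

        free(buf);
        close(bfd);
        close(dfd);
        return 0;
}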

---
Andreas Dilger
Principal Lustre Architect
[email protected]




_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
