Rick,

I suspected it would be something like what you have described; I am not very familiar with those aspects of Lustre.  Your comment about the Persistent Client Cache is interesting, but I believe it is unavailable on my client nodes.

I am going to throw one more image into the discussion as an argument for a mechanism to bypass Hybrid I/O.  Image 5 (linked below) is the File Position Activity plot from a run during which the other application's checkpoint occurred.  Notice that my job did not stall at all while the checkpoint was in progress.

Should a mechanism to bypass the Hybrid I/O function be considered worth implementing, I would suggest doing it via the PFL mechanism, allowing selected PFL components to be flagged for legacy buffered I/O.  I assume every read or write must already be mapped to the component and OST it belongs to; once Lustre has determined the component, it could also flag the request for the non-hybrid path.
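
Purely to illustrate the idea (the flag, structure, and helper names below are hypothetical and do not exist in Lustre today), the decision could sit right where the client already maps an offset to a component:

/* Hypothetical sketch only -- the flag, struct, and helper names here do
 * not exist in Lustre today; they just illustrate where a per-component
 * "use legacy buffered I/O" decision could live, given that the client
 * already maps each request offset to a PFL component. */

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PFL_COMP_FL_NOHYBRID 0x0400u   /* hypothetical per-component flag */

struct pfl_component {                 /* stand-in for a layout component */
    uint64_t start;                    /* extent start, bytes */
    uint64_t end;                      /* extent end, bytes */
    uint32_t flags;                    /* component flags */
};

/* Find the component covering a file offset (simplified linear scan). */
static const struct pfl_component *
component_for_offset(const struct pfl_component *c, size_t n, uint64_t off)
{
    for (size_t i = 0; i < n; i++)
        if (off >= c[i].start && off < c[i].end)
            return &c[i];
    return NULL;
}

/* Once the component is known, the same lookup could choose the path. */
static bool use_buffered_path(const struct pfl_component *c)
{
    return c != NULL && (c->flags & PFL_COMP_FL_NOHYBRID);
}

int main(void)
{
    struct pfl_component comps[] = {
        { 0, 1ULL << 30, 0 },                            /* first 1 GiB: hybrid   */
        { 1ULL << 30, UINT64_MAX, PFL_COMP_FL_NOHYBRID } /* remainder: buffered   */
    };
    uint64_t off = 2ULL << 30;
    const struct pfl_component *c = component_for_offset(comps, 2, off);

    printf("offset %llu -> %s path\n", (unsigned long long)off,
           use_buffered_path(c) ? "legacy buffered" : "hybrid");
    return 0;
}

The real change would of course live in the client I/O path, not in user code; this is just the shape of the per-component check I have in mind.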

John


Image 5:

https://www.dropbox.com/scl/fi/a6jaf6piq4p7z42x5h4x5/buffered_with_no_cp_affect.png?rlkey=4mbl5ysmm5xokkremnn63f1qk&st=zlhqf2k5&dl=0


On 1/13/2026 3:02 PM, [email protected] wrote:

Date: Tue, 13 Jan 2026 20:01:09 +0000
From: "Mohr, Rick"<[email protected]>
To: John Bauer<[email protected]>,
        "[email protected]" <[email protected]>
Subject: Re: [lustre-discuss] [EXTERNAL] Dramatic loss of performance
        when another application does writing.

John,

I wonder if this could be a credit issue.  Do you know the size of the other
job that is doing the checkpointing?  It sounds like your job is a single-client
job, so it is going to have a limited number of credits (the default used to be
8, but I don't know if that is still the case).  If the other job is using
100 nodes (just as an example), it could have 100x more outstanding I/O requests
than your job can have.  The spike in the server load makes me think that I/O
requests are getting backed up.

Lustre has a limit on peer_credits, which is the number of outstanding I/O
requests per client; this helps prevent any one client from monopolizing a
Lustre server.  But the nodes themselves also have a limit on the total number
of credits, which caps the number of outstanding I/O requests on the server
(I think the number is related to the limitations of the network fabric, but it
can also serve as a way to limit how many requests get queued on the server, to
help keep the server from getting overloaded).  If a large job is
checkpointing, then maybe that job is chewing up the server's credits, so that
your application only gets a small number of I/O requests added to a very large
queue of outstanding requests.  My knowledge of credits may be flawed or out of
date (perhaps someone else on the list can correct me if it is), but it is one
way that contention could exist on a server even if there is no contention on
the OSTs themselves.
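
To put rough numbers on that (using only the example figures above; 8 credits
per client is an assumed default that may well be out of date):

/* Back-of-the-envelope only: uses the example figures from above
 * (8 credits per client, a 100-node checkpointing job, a 1-node job);
 * real defaults depend on the LND and how it is tuned. */
#include <stdio.h>

int main(void)
{
    const int credits_per_client = 8;    /* assumed per-client default    */
    const int checkpoint_clients = 100;  /* example size of the other job */
    const int my_clients = 1;            /* single-client job             */

    int checkpoint_outstanding = checkpoint_clients * credits_per_client;
    int my_outstanding = my_clients * credits_per_client;
    int total = checkpoint_outstanding + my_outstanding;

    printf("checkpoint job: up to %d outstanding requests\n",
           checkpoint_outstanding);
    printf("single-client job: up to %d outstanding requests\n",
           my_outstanding);
    printf("single-client share of the server queue: ~%.1f%%\n",
           100.0 * my_outstanding / total);
    return 0;
}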

If your application is using a single client that has some local SSD storage,
the Persistent Client Cache (PCC) feature might be of some benefit to you
(if it's available on your file system).

--Rick


On 1/12/26, 7:52 PM, "lustre-discuss on behalf of John Bauer via
lustre-discuss" <[email protected]> wrote:


All,
My recent questions have been related to my trying to understand the following
issue.  I have an application that is writing, reading forwards, and reading
backwards, a single file multiple times (as seen in the bottom frame of
Image 1).  The file is striped 4x16M on 4 SSD OSTs on 2 OSSs.  Everything runs
along just fine with transfer rates in the 5 GB/s range.  At some point,
another application triggers approximately 135 GB of writes to each of the
32 HDD OSTs on the 16 OSSs of the file system.  When this happens, my
application's performance drops to 4.8 MB/s, a 99.9% loss of performance for
the 33+ second duration of the other application's writes.  My application is
doing 16 MB preads and pwrites in parallel using 4 pthreads, with O_DIRECT on
the client.

The main question I have is: why do the writes from the other application
affect my application so dramatically?  I am making demands on the 2 OSSs that
are of about the same order of magnitude (about 2.5 GB/s from each of the
2 OSSs) as what the other application is getting from the same 2 OSSs (about
4 GB/s each).  There should be no competition for the OSTs, as I am using SSD
and the other application is using HDD.  If both applications are triggering
direct I/O on the OSSs, I would think there would be minimal competition for
compute resources on the OSSs.  But as seen below in Image 3, there is a huge
spike in CPU load during the other application's writes.

This is not a one-off event; I see it about 2 out of every 3 times I run this
job.  I suspect the other application is one that checkpoints at a regular
interval, but I am a non-root user and have no way to confirm that.  I am using
PCP/pmapi to get the OSS data during my run.  If the images get removed from
the email, I have used alternate text with links to Dropbox for the images.
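
For reference, here is a minimal sketch of the kind of request the application
issues (a single 16 MiB O_DIRECT pread from an aligned buffer); the path and
alignment below are placeholders, not the actual application code:

/* Minimal sketch of the described access pattern: one 16 MiB O_DIRECT
 * pread from a suitably aligned buffer.  The path and alignment are
 * placeholders; the real application issues these in parallel from
 * 4 pthreads, forwards and backwards, and does pwrites as well. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define XFER_SIZE (16UL * 1024 * 1024)   /* 16 MiB per request */
#define ALIGNMENT 4096UL                 /* typical O_DIRECT alignment */

int main(void)
{
    int fd = open("/lustre/scratch/testfile", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    void *buf = NULL;
    if (posix_memalign(&buf, ALIGNMENT, XFER_SIZE) != 0) {
        perror("posix_memalign");
        close(fd);
        return 1;
    }

    ssize_t n = pread(fd, buf, XFER_SIZE, 0);   /* one request at offset 0 */
    if (n < 0)
        perror("pread");
    else
        printf("read %zd bytes\n", n);

    free(buf);
    close(fd);
    return 0;
}
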
Thanks,
John


