Rick,
I suspected it would be something like what you described. I am not very
familiar with those aspects of Lustre. Your comment about the Persistent
Client Cache is interesting, but I believe it is unavailable on my
client nodes.

I'm going to throw one more image into the discussion as an argument for
needing a mechanism to bypass Hybrid I/O. This image is of the File
Position Activity plot for the run that had the checkpoint occur while it
was underway. Notice that my job did not stall at all while the
checkpoint occurred.

Should someone think it reasonable to implement a mechanism to bypass the
Hybrid I/O function, I will suggest that it be done via the PFL mechanism,
allowing selected PFL components to be flagged for legacy buffered I/O. I
assume that every read or write must already be checked to determine which
component and OST it belongs to. Once Lustre has determined the component,
it could also flag the request for the non-hybrid path. A rough sketch of
that check follows.
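Just to make the idea concrete, here is the kind of per-request check I
have in mind. The structures and the COMP_FLAG_BUFFERED flag are entirely
hypothetical, not actual Lustre/llapi interfaces; this only illustrates
mapping a file offset to a PFL component and honoring a per-component
"use the legacy buffered path" flag.

    /* Hypothetical sketch only -- these types and the BUFFERED flag do not
     * exist in Lustre; they just illustrate the per-request component lookup. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define COMP_FLAG_BUFFERED 0x1   /* hypothetical: force legacy buffered I/O */

    struct pfl_component {
        uint64_t ext_start;          /* component extent [start, end) in bytes */
        uint64_t ext_end;            /* UINT64_MAX means "to EOF"               */
        uint32_t flags;
    };

    /* Find the component covering 'offset'; return NULL if none. */
    static const struct pfl_component *
    comp_for_offset(const struct pfl_component *comps, int ncomps, uint64_t offset)
    {
        for (int i = 0; i < ncomps; i++)
            if (offset >= comps[i].ext_start && offset < comps[i].ext_end)
                return &comps[i];
        return NULL;
    }

    /* The check the client would make once per read/write request. */
    static bool use_hybrid_path(const struct pfl_component *comps, int ncomps,
                                uint64_t offset)
    {
        const struct pfl_component *c = comp_for_offset(comps, ncomps, offset);
        return c == NULL || !(c->flags & COMP_FLAG_BUFFERED);
    }

    int main(void)
    {
        /* Example layout: first 64 MiB flagged for buffered I/O, rest hybrid. */
        struct pfl_component layout[] = {
            { 0,           64ULL << 20, COMP_FLAG_BUFFERED },
            { 64ULL << 20, UINT64_MAX,  0 },
        };

        printf("offset 16 MiB  -> hybrid? %d\n",
               use_hybrid_path(layout, 2, 16ULL << 20));
        printf("offset 128 MiB -> hybrid? %d\n",
               use_hybrid_path(layout, 2, 128ULL << 20));
        return 0;
    }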
John
Image 5:
https://www.dropbox.com/scl/fi/a6jaf6piq4p7z42x5h4x5/buffered_with_no_cp_affect.png?rlkey=4mbl5ysmm5xokkremnn63f1qk&st=zlhqf2k5&dl=0
On 1/13/2026 3:02 PM, [email protected] wrote:
Date: Tue, 13 Jan 2026 20:01:09 +0000
From: "Mohr, Rick"<[email protected]>
To: John Bauer<[email protected]>,
"[email protected]" <[email protected]>
Subject: Re: [lustre-discuss] [EXTERNAL] Dramatic loss of performance
when another application does writing.
John,
I wonder if this could be a credit issue. Do you know the size of the other
job that is doing the checkpointing? It sounds like your job is a single-client
job, so it is going to have a limited number of credits (the default used
to be 8, but I don't know if that is still the case). If the other job is using
100 nodes (just as an example), it could have 100x more outstanding IO requests
than your job can have. The spike in the server load makes me think that IO
requests are getting backed up.
Lustre has a limit on peer_credits, which is the number of outstanding IO
requests per client; it helps prevent any one client from monopolizing a
Lustre server. But the nodes themselves also have a limit on the total number
of credits, which helps limit the number of outstanding IO requests on the
server (I think the number is related to the limitations of the network fabric,
but it can also serve as a way to limit the number of requests that get queued
on the server, to help prevent the server from getting overloaded). If a large
job is checkpointing, then maybe that job is chewing up the server's credits, so
that your application is only getting a small number of IO requests added to a
very large queue of outstanding requests. My knowledge of credits may be
flawed/outdated (perhaps someone else on the list can correct me if so),
but it's one way that contention could exist on a server even if there isn't
contention on the OSTs themselves.
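To put rough numbers on that (the credit values below are just assumptions
for illustration, not values from your system), here is a tiny sketch of the
worst case:

    /* Back-of-the-envelope illustration of the credit argument above.
     * All numbers are assumptions for the sake of the example, not values
     * taken from a real Lustre configuration. */
    #include <stdio.h>

    int main(void)
    {
        const int peer_credits       = 8;    /* assumed per-client in-flight limit */
        const int checkpoint_clients = 100;  /* assumed size of the other job      */
        const int single_job_clients = 1;    /* the single-client job              */

        /* Worst case: every client keeps its full allotment in flight. */
        int checkpoint_outstanding = checkpoint_clients * peer_credits;  /* 800 */
        int single_outstanding     = single_job_clients * peer_credits;  /*   8 */

        double share = (double)single_outstanding /
                       (checkpoint_outstanding + single_outstanding);

        printf("checkpoint job can queue up to %d requests\n", checkpoint_outstanding);
        printf("single-client job can queue up to %d requests\n", single_outstanding);
        printf("single-client share of the server's queue: %.1f%%\n", share * 100.0);
        return 0;
    }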
If your application is using a single client which has some local SSD storage,
the Persistent Client Cache (PCC) feature might be of some benefit to you
(if it's available on your file system).
--Rick
On 1/12/26, 7:52 PM, "lustre-discuss on behalf of John Bauer via
lustre-discuss" <[email protected]> wrote:
All,
My recent questions are related to my trying to understand the following issue. I have
an application that is writing, reading forwards, and reading backwards, a single file
multiple times (as seen in the bottom frame of Image 1). The file is striped 4x16M on 4 SSD
OSTs on 2 OSSs. Everything runs along just great, with transfer rates in the 5 GB/s range.
At some point, another application triggers approximately 135 GB of writes to each of the
32 HDD OSTs on the 16 OSSs of the file system. When this happens, my application's
performance drops to 4.8 MB/s, a 99.9% loss of performance for the 33+ second duration of
the other application's writes. My application is doing 16MB preads and pwrites in
parallel using 4 pthreads, with O_DIRECT on the client.

The main question I have is: "Why do the writes from the other application affect my
application so dramatically?" I am making demands of the 2 OSSs of about the same order of
magnitude, 2.5 GB/s each from the 2 OSSs, as the other application is getting from the same
2 OSSs, about 4 GB/s each. There should be no competition for the OSTs, as I am using SSD
and the other application is using HDD. If both applications are triggering Direct I/O on
the OSSs, I would think there would be minimal competition for compute resources on the
OSSs. But as seen below in Image 3, there is a huge spike in CPU load during the other
application's writes.

This is not a one-off event. I see this about 2 out of every 3 times I run this job. I
suspect the other application is one that checkpoints on a regular interval, but I am a
non-root user and have no way to confirm that. I am using PCP/pmapi to get the OSS data
during my run. If the images get removed from the email, I have included alternate text
with Dropbox links for the images.
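For concreteness, the access pattern is roughly the following minimal sketch
(the file path, file size, and buffer alignment are assumptions for
illustration; this is not the actual application code):

    /* Minimal sketch of the I/O pattern described above: 4 pthreads issuing
     * 16 MiB pwrite()/pread() calls on one file opened with O_DIRECT.
     * Build with: cc -o io_sketch io_sketch.c -lpthread */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define XFER      (16UL << 20)   /* 16 MiB per request               */
    #define NTHREADS  4
    #define NXFERS    64             /* 64 x 16 MiB = 1 GiB per thread   */

    struct worker { int fd; int id; };

    static void *do_io(void *arg)
    {
        struct worker *w = arg;
        void *buf;

        /* O_DIRECT requires aligned buffers; 4 KiB covers typical cases. */
        if (posix_memalign(&buf, 4096, XFER) != 0)
            return NULL;
        memset(buf, w->id, XFER);

        /* Write the thread's extents, then read them back (forward here;
         * the real job also reads backwards). */
        for (int i = 0; i < NXFERS; i++) {
            off_t off = ((off_t)w->id * NXFERS + i) * XFER;
            if (pwrite(w->fd, buf, XFER, off) != (ssize_t)XFER)
                perror("pwrite");
        }
        for (int i = 0; i < NXFERS; i++) {
            off_t off = ((off_t)w->id * NXFERS + i) * XFER;
            if (pread(w->fd, buf, XFER, off) != (ssize_t)XFER)
                perror("pread");
        }
        free(buf);
        return NULL;
    }

    int main(void)
    {
        const char *path = "/lustre/scratch/testfile";   /* assumed path */
        int fd = open(path, O_RDWR | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        pthread_t tid[NTHREADS];
        struct worker w[NTHREADS];
        for (int i = 0; i < NTHREADS; i++) {
            w[i] = (struct worker){ .fd = fd, .id = i };
            pthread_create(&tid[i], NULL, do_io, &w[i]);
        }
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);

        close(fd);
        return 0;
    }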
Thanks,
John