Peter,

Do you see any lustre timeouts or client evictions in your logs (either server 
or client) that correlate with these slowdowns?
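In case it helps, a quick way to look for these on an EL/Alma system is 
something like the following (a sketch only; log paths and the exact message 
wording vary by setup):

```shell
# On the servers: scan the kernel log for the usual Lustre eviction/timeout messages
dmesg -T | grep -iE 'evict|timed out|LustreError'
grep -iE 'evict|timed out' /var/log/messages

# On a slow client: the OSC import state shows recent connect/evict transitions
lctl get_param osc.*.import | grep -E 'state|connect'
```

If a client was evicted you would expect to see a corresponding "evicting 
client" line on the server side at roughly the same timestamp as the slowdown.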

--Rick


On 10/28/25, 4:13 PM, "lustre-discuss on behalf of Peter Grandi via 
lustre-discuss" <[email protected] on behalf of 
[email protected]> wrote:

So I have 3 Lustre storage clusters which have recently developed a strange 
issue:

* Each cluster has 1 MDT with 3 "enterprise" SSDs and 8 OSTs each with 3 
"enterprise" SSDs; MDT and OSTs are all done with ZFS, on top of Alma 8.10. 
The Lustre version is 2.15.7. Each server is fairly overspecified (28 cores, 
128GiB), with 100Gb/s cards and switch, and the clients are the same as the 
servers except they run the Lustre 2.16.1 client drivers.

* As an example I will use the Lustre filesystem 'temp01', where the servers 
have addresses 192.168.102.40-48 (.40 is the MDT) and the clients have 
addresses 192.168.102.13-36.

* Reading is quite good for all clients. But since early yesterday afternoon 
the clients .13-36 have, inexplicably, a maximum average write speed of around 
35-40MB/s; yet if I mount 'temp01' on any of the Lustre servers (I usually 
have it mounted on the MDT, .40), write rates are as good as before. 
Mysteriously, today one of the clients (.14) wrote at the previous good speeds 
for a while and then reverted to slow. I was tweaking some 
'/proc/sys/net/ipv4/tcp_*' parameters at the time, but the same parameters on 
.13 did not improve the situation.

* I have collected 'tcpdump' traces on all the 'temp01' servers and a client 
while writing, and examined them with Wireshark's "TCP Stream Graphs" (etc.): 
what is happening is that the clients send at full speed for a little while, 
then pause for around 2-3 seconds, then resume. The servers, when accessing 
'temp01' as clients, do not pause.

* If I use NFS Ganesha with NFSv4-over-TCP on the MDT to export 'temp01', I 
can write to it at high rates (not as high as with native Lustre, of course).

* I have used 'iperf3' to check basic network rates, and for "reasons" they 
are around 25-30Gb/s, but still much higher than the observed *average* write 
speeds.

* The issue persists after rebooting the clients (I have not rebooted all the 
servers of at least one cluster, but I recently rebooted one of the MDTs).

* I have checked the relevant switch logs and ports and there are no obvious 
errors or significant rates of packet issues.

My current guesses are some issue with IP flow control or TCP window size, but 
bare TCP with 'iperf3' and NFSv4-over-TCP both give good rates. So perhaps it 
is something weird in the LNET drivers, with receive pacing in the Lustre 
driver. Please let me know if you have seen something similar, or have other 
suggestions.
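Since bare TCP and NFSv4-over-TCP both look fine, the LNET/socklnd layer is a 
reasonable place to look next. A few read-only checks I would try (a sketch, 
assuming a socklnd/TCP setup on 2.15/2.16; some parameter names and defaults 
differ between releases):

```shell
# LNET view: NI status, credits, and per-peer credits on clients and servers
lnetctl net show -v
lnetctl peer show -v

# socklnd module parameters (e.g. conns_per_peer; defaults changed across releases)
grep -r . /sys/module/ksocklnd/parameters/ 2>/dev/null

# Client-side RPC concurrency and dirty-page limits, which can cap write streaming
lctl get_param osc.*.max_rpcs_in_flight osc.*.max_dirty_mb

# Per-OSC write RPC histograms: stalls often show up as writes stuck at low
# RPCs-in-flight counts
lctl get_param osc.*.rpc_stats
```

Comparing these between a slow client (.13-36) and a server mounting 'temp01' 
as a client might show where the two configurations diverge.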

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
