Hi Bill,

I just ran into a similar issue.  See:
https://jira.whamcloud.com/browse/LU-15468

Lustre definitely caches file data in the page cache and, as far as I've seen, 
metadata in slab.  I'd start by running slabtop on a client machine if you can 
reliably reproduce the OOM situation, or by creating a cronjob that appends 
/proc/meminfo and /proc/vmstat to a file at one-minute intervals (example 
below), so you have a record of the machine's state from just before it goes 
belly up.  If you see a tremendous amount of memory consumed by Lustre slabs 
then it's likely on the inode caching side (the slab name should be indicative 
either way), and you might try a client build with this recent change to see 
if it mitigates the issue:
https://review.whamcloud.com/#/c/39973
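
For the cronjob, something along these lines would do (the log path is just 
an example):

    # /etc/cron.d/memwatch -- snapshot memory state every minute so there's
    # a record from just before the machine falls over
    * * * * * root (date; cat /proc/meminfo /proc/vmstat) >> /var/log/memwatch.log 2>&1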

Note that disabling the Lustre inode cache like this will inherently put 
significantly more pressure on your MDTs, but if it keeps you out of OOM 
territory, it's probably a win.
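
If rebuilding the client isn't an option, you could also try squeezing the 
existing caches with the standard client tunables.  The values here are only 
illustrative; check the manual for your release:

    # cap the file data Lustre keeps in the client page cache
    lctl set_param llite.*.max_cached_mb=8192
    # bound the DLM lock LRU per target, which indirectly limits how much
    # metadata the client pins (0 means the kernel sizes it dynamically)
    lctl set_param ldlm.namespaces.*.lru_size=1200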

In my case it wasn't metadata forcing my clients to OOM, but PTLRPC holding 
onto references to pages (ones the rest of Lustre thought it was done with) 
until my OSTs committed their transactions.  Revising my OST mount options to 
use an explicit commit=5 fixed my problem.
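
Concretely, that's just an extra mount option on the OSS (device and mount 
point here are placeholders):

    # commit=5 makes the backing ldiskfs commit its journal every 5 seconds,
    # so PTLRPC can drop its page references promptly
    mount -t lustre -o commit=5 /dev/mapper/ost0 /mnt/lustre/ost0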

Best,

ellis

-----Original Message-----
From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> On Behalf Of 
bill broadley via lustre-discuss
Sent: Friday, February 18, 2022 4:43 PM
To: lustre-discuss@lists.lustre.org
Subject: [EXTERNAL] [lustre-discuss] Limiting Lustre memory use?


On a cluster I managed (without Lustre), we had many problems with users 
running nodes out of RAM, which often killed the node.  We added cgroup 
support to Slurm and those problems disappeared: nearly 100% of the time we'd 
get a cgroup OOM instead of a kernel OOM, and the nodes stayed up and stable. 
This became doubly important when we started allowing jobs to share nodes and 
didn't want job A to be able to crash job B.
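
The relevant knobs were roughly these (a sketch of Slurm's cgroup settings; 
exact parameters vary by Slurm version):

    # slurm.conf -- route task management through the cgroup plugin
    TaskPlugin=task/cgroup
    ProctrackType=proctrack/cgroup

    # cgroup.conf -- actually enforce the per-job memory limit
    ConstrainRAMSpace=yes
    ConstrainSwapSpace=yes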

I've tried similar on a Lustre-enabled cluster, but it seems like the memory 
used by Lustre isn't constrained (I believe it's in the kernel and outside of 
the job's cgroup).  I think part of the problem is that Lustre caches metadata 
in the Linux page cache, but not data.  I've tried reducing the RAM available 
to Slurm, but I'm still getting kernel OOMs instead of cgroup OOMs.

Anyone have a suggestion for fixing this?  Is there any way to limit Lustre's 
memory use in the kernel?  Or to force that caching into userspace, inside the 
cgroup?  Or possibly out of RAM and onto a client-local NVMe?

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
