Re: [lustre-discuss] Ongoing issues with quota
Hi Dan,

Ah, I see. Sorry, no idea - it's been a few years since I last used ZFS, and I've never used the Lustre ZFS backend.

Regards,
Mark

On Wed, 4 Oct 2023, Daniel Szkola wrote:

Hi Mark,

All nodes are using ZFS. OSTs, MDT, and MGT are all ZFS-based, so there's really no way to fsck them. I could do a scrub, but that's not the same thing. Is there a Lustre/ZFS equivalent of 'tune2fs -O [^]quota' for ZFS?

I'm guessing that at some point, a large number of files was removed and somehow quota accounting missed this. There should be a simple way to reconcile or regenerate what quota has recorded vs what is actually on disk, which I have verified two different ways.

-- Dan

On Wed, 2023-10-04 at 15:01 +0100, Mark Dixon wrote:

Hi Dan,

I think it gets corrected when you umount and fsck the OSTs themselves (not lfsck). At least I recall seeing such messages when fsck'ing on 2.12.

Best,
Mark

On Wed, 4 Oct 2023, Daniel Szkola via lustre-discuss wrote:

No combination of lfsck runs has helped with this.

Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid' found 1796104 files as well. So why is the quota command showing over 3 million inodes used?

There must be a way to force it to recount, or to clear all stale quota data and have it regenerated?

Anyone?

-- Dan Szkola FNAL

On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:

We have a lustre filesystem that we just upgraded to 2.15.3; however, this problem has been going on for some time.

The quota command shows this:

    Disk quotas for grp somegroup (gid 9544):
    Filesystem    used   quota   limit   grace     files    quota    limit    grace
    /lustre1    13.38T     40T     45T       -  3136761*  2621440  3670016  expired

The group is not using nearly that many files. We have robinhood installed and it shows this:

    Using config file '/etc/robinhood.d/lustre1.conf'.
    group, type, count, volume, spc_used, avg_size
    somegroup, symlink, 59071, 5.12 MB, 103.16 MB, 91
    somegroup, dir, 426619, 5.24 GB, 5.24 GB, 12.87 KB
    somegroup, file, 1310414, 16.24 TB, 13.37 TB, 13.00 MB
    Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used: 14704924899840 bytes (13.37 TB)

Any ideas what is wrong here?

-- Dan Szkola FNAL

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] Ongoing issues with quota
Hi Dan,

I think it gets corrected when you umount and fsck the OSTs themselves (not lfsck). At least I recall seeing such messages when fsck'ing on 2.12.

Best,
Mark

On Wed, 4 Oct 2023, Daniel Szkola via lustre-discuss wrote:

No combination of lfsck runs has helped with this.

Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid' found 1796104 files as well. So why is the quota command showing over 3 million inodes used?

There must be a way to force it to recount, or to clear all stale quota data and have it regenerated?

Anyone?

-- Dan Szkola FNAL

On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:

We have a lustre filesystem that we just upgraded to 2.15.3; however, this problem has been going on for some time.

The quota command shows this:

    Disk quotas for grp somegroup (gid 9544):
    Filesystem    used   quota   limit   grace     files    quota    limit    grace
    /lustre1    13.38T     40T     45T       -  3136761*  2621440  3670016  expired

The group is not using nearly that many files. We have robinhood installed and it shows this:

    Using config file '/etc/robinhood.d/lustre1.conf'.
    group, type, count, volume, spc_used, avg_size
    somegroup, symlink, 59071, 5.12 MB, 103.16 MB, 91
    somegroup, dir, 426619, 5.24 GB, 5.24 GB, 12.87 KB
    somegroup, file, 1310414, 16.24 TB, 13.37 TB, 13.00 MB
    Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used: 14704924899840 bytes (13.37 TB)

Any ideas what is wrong here?

-- Dan Szkola FNAL

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
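
For readers following the numbers in this thread, here is a minimal sketch of the arithmetic behind the complaint. It only reuses figures quoted in the messages above (the flattened quota line and the robinhood per-type counts) and does not query a live filesystem. The helper name `parse_inode_fields` and the assumed column order of the quota line are illustrative assumptions, not part of any Lustre tool.

```python
# Figures copied from the thread; nothing here talks to Lustre or robinhood.
QUOTA_LINE = "/lustre1 13.38T 40T 45T - 3136761* 2621440 3670016 expired"
ROBINHOOD_COUNTS = {"symlink": 59071, "dir": 426619, "file": 1310414}


def parse_inode_fields(line):
    """Return (used, soft_limit, hard_limit) inode values from the quota line.

    Assumed column order (as shown in the thread): filesystem, space used,
    space quota, space limit, space grace, files, files quota, files limit,
    files grace. A trailing '*' marks a value that is over quota.
    """
    fields = line.split()
    used = int(fields[5].rstrip("*"))
    soft = int(fields[6])
    hard = int(fields[7])
    return used, soft, hard


used, soft, hard = parse_inode_fields(QUOTA_LINE)
actual = sum(ROBINHOOD_COUNTS.values())  # entries robinhood attributes to the group
excess = used - actual                   # inodes quota counts that robinhood does not

print(f"robinhood total entries: {actual}")
print(f"quota inode count:       {used}")
print(f"unexplained inodes:      {excess}")
print(f"over soft limit per quota accounting: {used > soft}")
print(f"over soft limit per robinhood count:  {actual > soft}")
```

The per-type counts sum to the 1796104 total robinhood reports, so the gap sits between that figure and the quota accounting, not inside the robinhood report itself. With the robinhood count the group would be under its soft inode limit, which is why the stale accounting matters in practice.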
Re: [lustre-discuss] Rocky 9.2/lustre 2.15.3 client questions
Hi Christopher,

Not an exact match, but we've seen problems running Vasp on a 2.15.x client against 2.12.6 servers. It can get in quite a tangle, to the point that other clients cannot "ls -l" the Vasp working directory.

Don't know (yet) if it's also true of 2.12.9.

Best,
Mark

On Fri, 23 Jun 2023, Mountford, Christopher J. (Dr.) via lustre-discuss wrote:

Hi,

I'm building the lustre client/kernel modules for our new HPC cluster and have a couple of questions:

1) Are there any known issues running lustre 2.15.3 clients and lustre 2.12.9 servers? I haven't seen anything showstopping on the mailing list or in JIRA but wondered if anyone had run into problems.

2) Is it possible to get the dkms kernel rpm to work with Rocky/RHEL 9.2? If I try to install the lustre-client-dkms rpm I get the following error:

    error: Failed dependencies:
        /usr/bin/python2 is needed by lustre-client-dkms-2.15.3-1.el9.noarch

Not surprising, as I understand that python2 is not available for Rocky/RHEL 9. I see there is a patch for 2.16 (from LU-16626).

Not a major problem as I can build kmod-lustre-client rpms for our kernel/ofed, but I would prefer to use dkms if possible.

Kind Regards,
Christopher.

Dr. Christopher Mountford,
System Specialist,
RCS, Digital Services.

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org