Re: [lustre-discuss] Ongoing issues with quota

2023-10-04 Thread Mark Dixon via lustre-discuss

Hi Dan,

Ah, I see. Sorry, no idea - it's been a few years since I last used ZFS, 
and I've never used the Lustre ZFS backend.


Regards,

Mark

On Wed, 4 Oct 2023, Daniel Szkola wrote:


Hi Mark,

All nodes are using ZFS. OSTs, MDT, and MGT are all ZFS-based, so there's
really no way to fsck them. I could do a scrub, but that's not the same
thing. Is there a Lustre/ZFS equivalent of 'tune2fs -O ^quota' for ZFS?

I'm guessing that at some point, a large number of files was removed and
somehow quota accounting missed this.

There should be a simple way to reconcile or regenerate what quota has
recorded vs what is actually on disk, which I have verified two different
ways.
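For what it's worth, with a ZFS backend the per-group object accounting lives in ZFS itself, so one cross-check is ZFS's own view of the counts. A rough sketch (pool/dataset names are made up, and the objused column needs the userobj_accounting feature enabled):

```shell
# On the MDS, ask ZFS directly for per-group object counts (names illustrative):
#   zfs groupspace -o name,used,objused mdtpool/mdt0
# and compare the objused figure for the group with 'lfs quota -g somegroup'.
#
# The gap the two views would need to explain, using the numbers in this thread:
quota_inodes=3136761   # inodes per 'lfs quota'
actual_files=1796104   # entries per robinhood / 'lfs find -G'
echo "unaccounted inodes: $((quota_inodes - actual_files))"
```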

--
Dan

On Wed, 2023-10-04 at 15:01 +0100, Mark Dixon wrote:

Hi Dan,

I think it gets corrected when you umount and fsck the OSTs themselves
(not lfsck). At least I recall seeing such messages when fsck'ing on 2.12.

Best,

Mark

On Wed, 4 Oct 2023, Daniel Szkola via lustre-discuss wrote:


No combination of lfsck runs has helped with this.

Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid'
found 1796104 files as well.

So why is the quota command showing over 3 million inodes used?

Surely there must be a way to force a recount, or to clear all stale quota
data and have it regenerated?

Anyone?
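For reference, the slave-side accounting Lustre believes can at least be inspected on each server (2.15.x; parameter names from memory, so check 'lctl list_param -R' if they differ on your build):

```shell
# Dump per-ID accounting as each OST/MDT sees it:
#   lctl get_param osd-*.*.quota_slave.acct_group
# and the quota master's view on the MDS:
#   lctl get_param qmt.*.*.glb-grp
#
# The acct_group output is YAML-ish; pulling one gid's inode count out of a
# captured sample looks like this:
sample='- id:      9544
  usage:  { inodes:  3136761, kbytes:  14368709120 }'
echo "$sample" | awk '/id:[ \t]*9544$/{f=1} f && /inodes/{gsub(/,/,"",$4); print $4; exit}'
```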

—
Dan Szkola
FNAL



On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:

We have a Lustre filesystem that we just upgraded to 2.15.3; however,
this problem has been going on for some time.

The quota command shows this:

Disk quotas for grp somegroup (gid 9544):
     Filesystem    used   quota   limit   grace    files    quota    limit   grace
       /lustre1  13.38T     40T     45T       - 3136761*  2621440  3670016 expired

The group is not using nearly that many files. We have robinhood
installed and it shows this:

Using config file '/etc/robinhood.d/lustre1.conf'.
group,       type,       count,     volume,    spc_used,   avg_size
somegroup,   symlink,     59071,    5.12 MB,  103.16 MB,         91
somegroup,   dir,        426619,    5.24 GB,    5.24 GB,   12.87 KB
somegroup,   file,      1310414,   16.24 TB,   13.37 TB,   13.00 MB

Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB),
space used: 14704924899840 bytes (13.37 TB)

Any ideas what is wrong here?

—
Dan Szkola
FNAL
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org






Re: [lustre-discuss] Rocky 9.2/lustre 2.15.3 client questions

2023-06-26 Thread Mark Dixon via lustre-discuss

Hi Christopher,

Not an exact match, but we've seen problems running VASP on a 2.15.x
client against 2.12.6 servers. It can get into quite a tangle, to the point
that other clients cannot "ls -l" the VASP working directory.


Don't know (yet) if it's also true of 2.12.9.

Best,

Mark

On Fri, 23 Jun 2023, Mountford, Christopher J. (Dr.) via lustre-discuss wrote:


Hi,

I'm building the lustre client/kernel modules for our new HPC cluster and have 
a couple of questions:

1) Are there any known issues running Lustre 2.15.3 clients against Lustre 2.12.9
servers? I haven't seen anything show-stopping on the mailing list or in JIRA,
but wondered if anyone had run into problems.

2) Is it possible to get the dkms kernel rpm to work with Rocky/RHEL 9.2? If I 
try to install the lustre-client-dkms rpm I get the following error:

error: Failed dependencies:
   /usr/bin/python2 is needed by lustre-client-dkms-2.15.3-1.el9.noarch

Not surprising, as I understand python2 is not available for Rocky/RHEL 9.

I see there is a patch for 2.16 (from LU-16626). Not a major problem as I can 
build kmod-lustre-client rpms for our kernel/ofed, but I would prefer to use 
dkms if possible.

Kind Regards,
Christopher.


Dr. Christopher Mountford,
System Specialist,
RCS,
Digital Services.