Re: [lustre-discuss] OST went back in time: no(?) hardware issue
Hi Andreas,

On 10/5/23 02:30, Andreas Dilger wrote:
> On Oct 3, 2023, at 16:22, Thomas Roth via lustre-discuss wrote:
>> Hi all,
>>
>> in our Lustre 2.12.5 system, we have "OST went back in time" after OST hardware replacement:
>> - hardware had reached EOL
>> - we set `max_create_count=0` for these OSTs, searched for and migrated off the files of these OSTs
>> - formatted the new OSTs with `--replace` and the old indices
>> - all OSTs are on ZFS
>> - set the OSTs `active=0` on our 3 MDTs
>> - moved in the new hardware, reused the old NIDs and old OST indices, mounted the OSTs
>> - set the OSTs `active=1`
>> - ran `lfsck` on all servers
>> - set `max_create_count=200` for these OSTs
>>
>> Now the "OST went back in time" messages appeared in the MDS logs.
>>
>> This doesn't quite fit the description in the manual. There were no crashes or power losses, so I cannot see which cache might have been lost. The transaction numbers quoted in the error are both large, e.g. `transno 55841088879 was previously committed, server now claims 4294992012`
>>
>> What should we do? Give `lfsck` another try?
>
> Nothing really to see here, I think. Did you delete LAST_RCVD during the replacement, so that the OST didn't know what transno was assigned to the last RPCs it sent? The still-mounted clients have a record of this transno and are surprised that it was reset. If you unmount and remount the clients, the error should go away.

No, I don't think I deleted anything during the procedure:
- The old OST was emptied (max_create_count=0) during normal Lustre operations; the last transaction should be roughly the last file being migrated away.
- Then the OST was deactivated, but only on the MDS, not on the clients.
- Then the new OST, formatted with `--replace`, was mounted and activated on the MDS. Up to this point, no errors.
- Finally, max_create_count was increased and clients could write.
- Now the MDT throws this error (nothing in the client logs).
According to the manual, what should have happened when I mounted the new OST:

> The MDS and OSS will negotiate the LAST_ID value for the replacement OST.

OK, this is about LAST_ID, wherever that lives on ZFS. About LAST_RCVD, the manual says (even in the case where the configuration files were lost and had to be recreated):

> The last_rcvd file will be recreated when the OST is first mounted using the default parameters.

So, let's see what happens once the clients remount. Eventually, then, should I also restart the MDTs?

Regards,
Thomas

> I'm not sure if the clients might try to preserve the next 55B RPCs in memory until the committed transno on the OST catches up, or if they just accept the new transno and get on with life?
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
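For reference, the replacement steps Thomas lists map roughly onto the following commands. This is a hedged sketch, not the poster's actual commands: the fsname `lustre1`, OST index 0, ZFS pool/dataset `ostpool/ost0`, mount point, and MGS NID `mgs@o2ib` are all placeholders; check the Lustre manual for your version before using any of it.

```shell
# On each MDS: stop new object creation on the OST being retired
lctl set_param osp.lustre1-OST0000-osc-MDT*.max_create_count=0

# On a client: find files with objects on that OST and migrate them off
lfs find /lustre1 --ost lustre1-OST0000_UUID -type f | lfs_migrate -y

# Format the replacement OST with the *old* index, flagged as a replacement
mkfs.lustre --ost --backfstype=zfs --fsname=lustre1 --index=0 --replace \
    --mgsnode=mgs@o2ib ostpool/ost0

# On each MDS: deactivate while swapping hardware, then mount and reactivate
lctl set_param osp.lustre1-OST0000-osc-MDT*.active=0
mount -t lustre ostpool/ost0 /mnt/lustre/ost0
lctl set_param osp.lustre1-OST0000-osc-MDT*.active=1

# Consistency check, then allow object creation again
lctl lfsck_start -A
lctl set_param osp.lustre1-OST0000-osc-MDT*.max_create_count=200
```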
Re: [lustre-discuss] OST went back in time: no(?) hardware issue
On Oct 3, 2023, at 16:22, Thomas Roth via lustre-discuss wrote:
>
> Hi all,
>
> in our Lustre 2.12.5 system, we have "OST went back in time" after OST hardware replacement:
> - hardware had reached EOL
> - we set `max_create_count=0` for these OSTs, searched for and migrated off the files of these OSTs
> - formatted the new OSTs with `--replace` and the old indices
> - all OSTs are on ZFS
> - set the OSTs `active=0` on our 3 MDTs
> - moved in the new hardware, reused the old NIDs and old OST indices, mounted the OSTs
> - set the OSTs `active=1`
> - ran `lfsck` on all servers
> - set `max_create_count=200` for these OSTs
>
> Now the "OST went back in time" messages appeared in the MDS logs.
>
> This doesn't quite fit the description in the manual. There were no crashes or power losses, so I cannot see which cache might have been lost. The transaction numbers quoted in the error are both large, e.g. `transno 55841088879 was previously committed, server now claims 4294992012`
>
> What should we do? Give `lfsck` another try?

Nothing really to see here, I think. Did you delete LAST_RCVD during the replacement, so that the OST didn't know what transno was assigned to the last RPCs it sent? The still-mounted clients have a record of this transno and are surprised that it was reset. If you unmount and remount the clients, the error should go away.

I'm not sure if the clients might try to preserve the next 55B RPCs in memory until the committed transno on the OST catches up, or if they just accept the new transno and get on with life?

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud
Re: [lustre-discuss] Failing build of lustre client on Debian 12
On Oct 4, 2023, at 16:26, Jan Andersen <j...@comind.io> wrote:
> Hi, I've just successfully built the lustre 2.15.3 client on Debian 11 and need to do the same on Debian 12; however, configure fails with:
>
> checking if Linux kernel was built with CONFIG_FHANDLE in or as module... no
> configure: error: Lustre fid handling requires that CONFIG_FHANDLE is enabled in your kernel.
>
> As far as I can see, CONFIG_FHANDLE is in fact enabled - eg:
>
> root@debian12:~/lustre-release# grep CONFIG_FHANDLE /boot/config-6.1.38
> CONFIG_FHANDLE=y
>
> I've tried to figure out how configure checks for this, but the script is rather dense and I haven't penetrated it (yet). It seems to me there is an error in the way it checks. What is the best way forward, considering that I've already invested a lot of time and effort in setting up a slurm cluster with Debian 12?

You could change the AC_MSG_ERROR() to AC_MSG_WARN() or similar, if you think the check is wrong.

It would be worthwhile to check whether a patch has already been submitted to fix this on the master branch. Otherwise, getting a proper patch submitted to fix the check would be better than just ignoring the error and leaving it for the next person to fix.

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud
Re: [lustre-discuss] Ongoing issues with quota
Also, the quotas on the OSTs don't add up to anywhere near 3 million files either:

[root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 0 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                1394853459      0  1913344192      -  132863      0      0      -
[root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 1 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                1411579601      0  1963246413      -  120643      0      0      -
[root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 2 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                1416507527      0  1789950778      -  190687      0      0      -
[root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 3 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                1636465724      0  1926578117      -  195034      0      0      -
[root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 4 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                2202272244      0  3020159313      -  185097      0      0      -
[root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 5 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                1324770165      0  1371244768      -  145347      0      0      -
[root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 6 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                2892027349      0  3221225472      -  169386      0      0      -
[root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 7 /lustre1
Disk quotas for grp somegroup (gid 9544):
    Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
                2076201636      0  2474853207      -  171552      0      0      -

—
Dan Szkola
FNAL

> On Oct 4, 2023, at 8:45 AM, Daniel Szkola via lustre-discuss wrote:
>
> No combination of lfsck runs has helped with this.
>
> Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid' found 1796104 files as well.
>
> So why is the quota command showing over 3 million inodes used?
>
> There must be a way to force it to recount, or to clear all stale quota data and have it regenerated?
>
> Anyone?
>
> —
> Dan Szkola
> FNAL
>
>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:
>>
>> We have a lustre filesystem that we just upgraded to 2.15.3; however, this problem has been going on for some time.
>>
>> The quota command shows this:
>>
>> Disk quotas for grp somegroup (gid 9544):
>>     Filesystem    used   quota   limit   grace    files    quota    limit    grace
>>       /lustre1  13.38T     40T     45T       -  3136761* 2621440  3670016  expired
>>
>> The group is not using nearly that many files. We have robinhood installed and it shows this:
>>
>> Using config file '/etc/robinhood.d/lustre1.conf'.
>>     group,    type,   count,   volume,  spc_used, avg_size
>> somegroup, symlink,   59071,  5.12 MB, 103.16 MB,       91
>> somegroup,     dir,  426619,  5.24 GB,   5.24 GB, 12.87 KB
>> somegroup,    file, 1310414, 16.24 TB,  13.37 TB, 13.00 MB
>>
>> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used: 14704924899840 bytes (13.37 TB)
>>
>> Any ideas what is wrong here?
>>
>> —
>> Dan Szkola
>> FNAL
Re: [lustre-discuss] Ongoing issues with quota
Hi Dan,

Ah, I see. Sorry, no idea - it's been a few years since I last used ZFS, and I've never used the Lustre ZFS backend.

Regards,
Mark

On Wed, 4 Oct 2023, Daniel Szkola wrote:
> [EXTERNAL EMAIL]
>
> Hi Mark,
>
> All nodes are using ZFS. OSTs, MDT, and MGT are all ZFS-based, so there's really no way to fsck them. I could do a scrub, but that's not the same thing. Is there a Lustre/ZFS equivalent of 'tune2fs -O [^]quota' for ZFS?
>
> I'm guessing that at some point a large number of files was removed and somehow quota accounting missed this. There should be a simple way to reconcile or regenerate what quota has recorded vs what is actually on disk, which I have verified two different ways.
>
> --
> Dan
>
> On Wed, 2023-10-04 at 15:01 +0100, Mark Dixon wrote:
>> Hi Dan,
>>
>> I think it gets corrected when you umount and fsck the OSTs themselves (not lfsck). At least I recall seeing such messages when fsck'ing on 2.12.
>>
>> Best,
>>
>> Mark
>>
>> On Wed, 4 Oct 2023, Daniel Szkola via lustre-discuss wrote:
>>> [EXTERNAL EMAIL]
>>>
>>> No combination of lfsck runs has helped with this.
>>>
>>> Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid' found 1796104 files as well.
>>>
>>> So why is the quota command showing over 3 million inodes used?
>>>
>>> There must be a way to force it to recount, or to clear all stale quota data and have it regenerated?
>>>
>>> Anyone?
>>>
>>> —
>>> Dan Szkola
>>> FNAL
>>>
>>>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:
>>>>
>>>> We have a lustre filesystem that we just upgraded to 2.15.3; however, this problem has been going on for some time.
>>>>
>>>> The quota command shows this:
>>>>
>>>> Disk quotas for grp somegroup (gid 9544):
>>>>     Filesystem    used   quota   limit   grace    files    quota    limit    grace
>>>>       /lustre1  13.38T     40T     45T       -  3136761* 2621440  3670016  expired
>>>>
>>>> The group is not using nearly that many files. We have robinhood installed and it shows this:
>>>>
>>>> Using config file '/etc/robinhood.d/lustre1.conf'.
>>>>     group,    type,   count,   volume,  spc_used, avg_size
>>>> somegroup, symlink,   59071,  5.12 MB, 103.16 MB,       91
>>>> somegroup,     dir,  426619,  5.24 GB,   5.24 GB, 12.87 KB
>>>> somegroup,    file, 1310414, 16.24 TB,  13.37 TB, 13.00 MB
>>>>
>>>> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used: 14704924899840 bytes (13.37 TB)
>>>>
>>>> Any ideas what is wrong here?
>>>>
>>>> —
>>>> Dan Szkola
>>>> FNAL
[lustre-discuss] Failing build of lustre client on Debian 12
Hi,

I've just successfully built the lustre 2.15.3 client on Debian 11 and need to do the same on Debian 12; however, configure fails with:

checking if Linux kernel was built with CONFIG_FHANDLE in or as module... no
configure: error: Lustre fid handling requires that CONFIG_FHANDLE is enabled in your kernel.

As far as I can see, CONFIG_FHANDLE is in fact enabled - eg:

root@debian12:~/lustre-release# grep CONFIG_FHANDLE /boot/config-6.1.38
CONFIG_FHANDLE=y

I've tried to figure out how configure checks for this, but the script is rather dense and I haven't penetrated it (yet). It seems to me there is an error in the way it checks. What is the best way forward, considering that I've already invested a lot of time and effort in setting up a slurm cluster with Debian 12?

/jan
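One thing worth checking (a hedged sketch, not a confirmed diagnosis): the configure test boils down to grepping a kernel config for CONFIG_FHANDLE, but the config it consults is the one belonging to the kernel tree it builds against (see `--with-linux` / `--with-linux-config`), which is not necessarily the same file as `/boot/config-*`. The check is simulated below on an inline fragment; on the real system, point grep at the `.config` of the tree configure actually uses, e.g. `/lib/modules/$(uname -r)/build/.config` (path illustrative).

```shell
# Simulate the essence of the configure check on an inline config fragment.
# On a real box, replace the fragment with the build tree's .config, e.g.:
#   grep '^CONFIG_FHANDLE=y' "/lib/modules/$(uname -r)/build/.config"
kcfg='CONFIG_FHANDLE=y'
if printf '%s\n' "$kcfg" | grep -q '^CONFIG_FHANDLE=y$'; then
  echo "CONFIG_FHANDLE enabled"
else
  echo "CONFIG_FHANDLE missing"
fi
```

If the build tree has no `.config` at all, the test fails even though the running kernel was built with the option enabled.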
Re: [lustre-discuss] Ongoing issues with quota
Hi Mark,

All nodes are using ZFS. OSTs, MDT, and MGT are all ZFS-based, so there's really no way to fsck them. I could do a scrub, but that's not the same thing. Is there a Lustre/ZFS equivalent of 'tune2fs -O [^]quota' for ZFS?

I'm guessing that at some point a large number of files was removed and somehow quota accounting missed this. There should be a simple way to reconcile or regenerate what quota has recorded vs what is actually on disk, which I have verified two different ways.

--
Dan

On Wed, 2023-10-04 at 15:01 +0100, Mark Dixon wrote:
> Hi Dan,
>
> I think it gets corrected when you umount and fsck the OSTs themselves (not lfsck). At least I recall seeing such messages when fsck'ing on 2.12.
>
> Best,
>
> Mark
>
> On Wed, 4 Oct 2023, Daniel Szkola via lustre-discuss wrote:
>> [EXTERNAL EMAIL]
>>
>> No combination of lfsck runs has helped with this.
>>
>> Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid' found 1796104 files as well.
>>
>> So why is the quota command showing over 3 million inodes used?
>>
>> There must be a way to force it to recount, or to clear all stale quota data and have it regenerated?
>>
>> Anyone?
>>
>> —
>> Dan Szkola
>> FNAL
>>
>>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:
>>>
>>> We have a lustre filesystem that we just upgraded to 2.15.3; however, this problem has been going on for some time.
>>>
>>> The quota command shows this:
>>>
>>> Disk quotas for grp somegroup (gid 9544):
>>>     Filesystem    used   quota   limit   grace    files    quota    limit    grace
>>>       /lustre1  13.38T     40T     45T       -  3136761* 2621440  3670016  expired
>>>
>>> The group is not using nearly that many files. We have robinhood installed and it shows this:
>>>
>>> Using config file '/etc/robinhood.d/lustre1.conf'.
>>>     group,    type,   count,   volume,  spc_used, avg_size
>>> somegroup, symlink,   59071,  5.12 MB, 103.16 MB,       91
>>> somegroup,     dir,  426619,  5.24 GB,   5.24 GB, 12.87 KB
>>> somegroup,    file, 1310414, 16.24 TB,  13.37 TB, 13.00 MB
>>>
>>> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used: 14704924899840 bytes (13.37 TB)
>>>
>>> Any ideas what is wrong here?
>>>
>>> —
>>> Dan Szkola
>>> FNAL
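On ZFS there is indeed no tune2fs-style quota rebuild, but Lustre's quota slaves can be asked to re-sync with the quota master. One possible avenue (hedged: these `quota_slave` parameter names come from the osd quota-slave interface; verify with `lctl list_param` that they exist on your 2.15.3 servers before relying on them):

```shell
# On each OSS/MDS: inspect what the backend accounting currently reports
# for groups, to see which target carries the inflated count...
lctl get_param osd-zfs.*.quota_slave.acct_group

# ...and ask the quota slave to redo reintegration with the quota master
lctl set_param osd-zfs.*.quota_slave.force_reint=1
```

If the stale count lives in the master's global index rather than the slaves' accounting, this may not be enough, but comparing `acct_group` per target against the robinhood numbers should at least localize the discrepancy.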
Re: [lustre-discuss] Ongoing issues with quota
Hi Dan,

I think it gets corrected when you umount and fsck the OSTs themselves (not lfsck). At least I recall seeing such messages when fsck'ing on 2.12.

Best,

Mark

On Wed, 4 Oct 2023, Daniel Szkola via lustre-discuss wrote:
> [EXTERNAL EMAIL]
>
> No combination of lfsck runs has helped with this.
>
> Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid' found 1796104 files as well.
>
> So why is the quota command showing over 3 million inodes used?
>
> There must be a way to force it to recount, or to clear all stale quota data and have it regenerated?
>
> Anyone?
>
> —
> Dan Szkola
> FNAL
>
>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:
>>
>> We have a lustre filesystem that we just upgraded to 2.15.3; however, this problem has been going on for some time.
>>
>> The quota command shows this:
>>
>> Disk quotas for grp somegroup (gid 9544):
>>     Filesystem    used   quota   limit   grace    files    quota    limit    grace
>>       /lustre1  13.38T     40T     45T       -  3136761* 2621440  3670016  expired
>>
>> The group is not using nearly that many files. We have robinhood installed and it shows this:
>>
>> Using config file '/etc/robinhood.d/lustre1.conf'.
>>     group,    type,   count,   volume,  spc_used, avg_size
>> somegroup, symlink,   59071,  5.12 MB, 103.16 MB,       91
>> somegroup,     dir,  426619,  5.24 GB,   5.24 GB, 12.87 KB
>> somegroup,    file, 1310414, 16.24 TB,  13.37 TB, 13.00 MB
>>
>> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used: 14704924899840 bytes (13.37 TB)
>>
>> Any ideas what is wrong here?
>>
>> —
>> Dan Szkola
>> FNAL
Re: [lustre-discuss] Ongoing issues with quota
No combination of lfsck runs has helped with this.

Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid' found 1796104 files as well.

So why is the quota command showing over 3 million inodes used?

There must be a way to force it to recount, or to clear all stale quota data and have it regenerated?

Anyone?

—
Dan Szkola
FNAL

> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:
>
> We have a lustre filesystem that we just upgraded to 2.15.3; however, this problem has been going on for some time.
>
> The quota command shows this:
>
> Disk quotas for grp somegroup (gid 9544):
>     Filesystem    used   quota   limit   grace    files    quota    limit    grace
>       /lustre1  13.38T     40T     45T       -  3136761* 2621440  3670016  expired
>
> The group is not using nearly that many files. We have robinhood installed and it shows this:
>
> Using config file '/etc/robinhood.d/lustre1.conf'.
>     group,    type,   count,   volume,  spc_used, avg_size
> somegroup, symlink,   59071,  5.12 MB, 103.16 MB,       91
> somegroup,     dir,  426619,  5.24 GB,   5.24 GB, 12.87 KB
> somegroup,    file, 1310414, 16.24 TB,  13.37 TB, 13.00 MB
>
> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used: 14704924899840 bytes (13.37 TB)
>
> Any ideas what is wrong here?
>
> —
> Dan Szkola
> FNAL