Hello Andreas,

lfs df -i reports 19,204,412 inodes used. When I did the full Robinhood scan, it reported scanning 18,673,874 entries, so fairly close.

I don’t have a .lustre directory at the filesystem root.

Another interesting aspect of this particular issue is that I can run lctl lfsck, and every time I get:

layout_repaired: 1468299

But it doesn’t seem to be actually repairing anything, because if I run it again I’ll get the same or a similar number. I run it like this:

lctl lfsck_start -t layout -t namespace -o -M lfsc-MDT0000
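For reference, I’m reading the layout_repaired counter from the standard LFSCK status output on the MDS, along the lines of:

lctl get_param -n mdd.lfsc-MDT0000.lfsck_layout
lctl get_param -n mdd.lfsc-MDT0000.lfsck_namespace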
—
Dan Szkola
FNAL

> On Oct 10, 2023, at 10:47 AM, Andreas Dilger <adil...@whamcloud.com> wrote:
>
> There is a $ROOT/.lustre/lost+found that you could check.
>
> What does "lfs df -i" report for the used inode count? Maybe it is RBH that is reporting the wrong count?
>
> The other alternative would be to mount the MDT filesystem directly as type ZFS and see what df -i and find report?
>
> Cheers, Andreas
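If I end up trying the direct mount, my rough plan is the following, with Lustre stopped on the MDS first. The dataset and mountpoint names here are placeholders for illustration, not our actual ones, and the dataset may need mountpoint=legacy set before mount.zfs will accept it:

mount -t zfs -o ro mdtpool/mdt0 /mnt/mdt0
df -i /mnt/mdt0
find /mnt/mdt0 | wc -l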
>> On Oct 10, 2023, at 22:16, Daniel Szkola via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>>
>> OK, I disabled, waited for a while, then reenabled. I still get the same numbers. The only thing I can think is that somehow the count is correct, despite the huge difference. Robinhood and find show about 1.7M files, dirs, and links. The quota is showing a bit over 3.1M inodes used. We only have one MDS and MGS. Any ideas where the discrepancy may lie? Orphans? Is there a lost+found area in Lustre?
>>
>> —
>> Dan Szkola
>> FNAL
>>
>>> On Oct 10, 2023, at 8:24 AM, Daniel Szkola <dszk...@fnal.gov> wrote:
>>>
>>> Hi Robert,
>>>
>>> Thanks for the response. Do you remember exactly how you did it? Did you bring everything down at any point? I know you can do this:
>>>
>>> lctl conf_param fsname.quota.mdt=none
>>>
>>> but is that all you did? Did you wait, or bring everything down, before reenabling? I’m worried because that allegedly just enables/disables enforcement, while space accounting is always on. Andreas stated that quotas are controlled by ZFS, but quota support has never been enabled on any of the ZFS volumes in our Lustre filesystem.
>>>
>>> —
>>> Dan Szkola
>>> FNAL
>>>
>>>> On Oct 10, 2023, at 2:17 AM, Redl, Robert <robert.r...@lmu.de> wrote:
>>>>
>>>> Dear Dan,
>>>>
>>>> I had a similar problem some time ago. We are also using ZFS for the MDT and OSTs. For us, the used disk space was reported wrong. The problem was fixed by switching quota support off on the MGS and then on again.
>>>>
>>>> Cheers,
>>>> Robert
>>>>
>>>>> On Oct 9, 2023, at 17:55, Daniel Szkola via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>>>>>
>>>>> Thanks, I will look into the ZFS quota, since we are using ZFS for all storage, MDT and OSTs.
>>>>>
>>>>> In our case, there is a single MDS/MDT. I have used Robinhood and lfs find (by group) commands to verify what the numbers should apparently be.
>>>>>
>>>>> —
>>>>> Dan Szkola
>>>>> FNAL
>>>>>
>>>>>> On Oct 9, 2023, at 10:13 AM, Andreas Dilger <adil...@whamcloud.com> wrote:
>>>>>>
>>>>>> The quota accounting is controlled by the backing filesystem of the OSTs and MDTs.
>>>>>>
>>>>>> For ldiskfs/ext4 you could run e2fsck to re-count all of the inode and block usage.
>>>>>>
>>>>>> For ZFS you would have to ask on the ZFS list to see if there is some way to re-count the quota usage.
>>>>>>
>>>>>> The "inode" quota is accounted from the MDTs, while the "block" quota is accounted from the OSTs. You might be able to use "lfs quota -v -g group" to see if there is one particular MDT that is returning too many inodes.
>>>>>>
>>>>>> Possibly, if you have directories that are striped across many MDTs, it would inflate the used inode count. For example, if every one of the 426k directories reported by RBH was striped across 4 MDTs, then you would see the inode count add up to 3.6M.
>>>>>>
>>>>>> If that were the case, then I would really, really advise against striping every directory in the filesystem. That will cause problems far worse than just inflating the inode quota accounting.
>>>>>>
>>>>>> Cheers, Andreas
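Since our MDT is ZFS-backed, the closest thing I’ve found to asking ZFS directly what it has recorded for the group is the groupspace accounting. The dataset name below is a placeholder for our actual MDT dataset, and, if I understand the ZFS docs correctly, the objused column needs the userobj_accounting feature enabled:

zfs groupspace -o name,used,objused mdtpool/mdt0

Incidentally, the per-OST files counts in my Oct 4 message quoted below sum to 1,310,609, which lines up with the ~1.3M regular files Robinhood reports for the group, not the 3.1M the quota shows.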
>>>>>>> On Oct 9, 2023, at 22:33, Daniel Szkola via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>>>>>>>
>>>>>>> Is there really no way to force a recount of files used by the quota? All indications are that we have accounts where files were removed and this is not reflected in the used file count in the quota. The space used seems correct, but the inodes-used numbers are way too high. There must be a way to clear these numbers and have a fresh count done.
>>>>>>>
>>>>>>> —
>>>>>>> Dan Szkola
>>>>>>> FNAL
>>>>>>>
>>>>>>>> On Oct 4, 2023, at 11:37 AM, Daniel Szkola via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>>>>>>>>
>>>>>>>> Also, quotas on the OSTs don’t add up to near 3 million files either:
>>>>>>>>
>>>>>>>> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 0 /lustre1
>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>>>                 1394853459      0  1913344192      -  132863      0      0      -
>>>>>>>> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 1 /lustre1
>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>>>                 1411579601      0  1963246413      -  120643      0      0      -
>>>>>>>> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 2 /lustre1
>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>>>                 1416507527      0  1789950778      -  190687      0      0      -
>>>>>>>> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 3 /lustre1
>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>>>                 1636465724      0  1926578117      -  195034      0      0      -
>>>>>>>> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 4 /lustre1
>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>>>                 2202272244      0  3020159313      -  185097      0      0      -
>>>>>>>> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 5 /lustre1
>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>>>                 1324770165      0  1371244768      -  145347      0      0      -
>>>>>>>> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 6 /lustre1
>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>>>                 2892027349      0  3221225472      -  169386      0      0      -
>>>>>>>> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 7 /lustre1
>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>     Filesystem      kbytes  quota       limit  grace   files  quota  limit  grace
>>>>>>>>                 2076201636      0  2474853207      -  171552      0      0      -
>>>>>>>>
>>>>>>>> —
>>>>>>>> Dan Szkola
>>>>>>>> FNAL
>>>>>>>>
>>>>>>>>> On Oct 4, 2023, at 8:45 AM, Daniel Szkola via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>>>>>>>>>
>>>>>>>>> No combination of runs on the OSS nodes has helped with this.
>>>>>>>>>
>>>>>>>>> Again, Robinhood shows 1796104 files for the group, and an 'lfs find -G gid' found 1796104 files as well.
>>>>>>>>>
>>>>>>>>> So why is the quota command showing over 3 million inodes used?
>>>>>>>>>
>>>>>>>>> There must be a way to force it to recount, or to clear all stale quota data and have it regenerated.
>>>>>>>>>
>>>>>>>>> Anyone?
>>>>>>>>>
>>>>>>>>> —
>>>>>>>>> Dan Szkola
>>>>>>>>> FNAL
>>>>>>>>>
>>>>>>>>>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>>>>>>>>>>
>>>>>>>>>> We have a Lustre filesystem that we just upgraded to 2.15.3; however, this problem has been going on for some time.
>>>>>>>>>>
>>>>>>>>>> The quota command shows this:
>>>>>>>>>>
>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>     Filesystem    used  quota  limit  grace     files    quota    limit    grace
>>>>>>>>>>       /lustre1  13.38T    40T    45T      -  3136761*  2621440  3670016  expired
>>>>>>>>>>
>>>>>>>>>> The group is not using nearly that many files. We have Robinhood installed and it shows this:
>>>>>>>>>>
>>>>>>>>>> Using config file '/etc/robinhood.d/lustre1.conf'.
>>>>>>>>>> group, type, count, volume, spc_used, avg_size
>>>>>>>>>> somegroup, symlink, 59071, 5.12 MB, 103.16 MB, 91
>>>>>>>>>> somegroup, dir, 426619, 5.24 GB, 5.24 GB, 12.87 KB
>>>>>>>>>> somegroup, file, 1310414, 16.24 TB, 13.37 TB, 13.00 MB
>>>>>>>>>>
>>>>>>>>>> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used: 14704924899840 bytes (13.37 TB)
>>>>>>>>>>
>>>>>>>>>> Any ideas what is wrong here?
>>>>>>>>>>
>>>>>>>>>> —
>>>>>>>>>> Dan Szkola
>>>>>>>>>> FNAL
>>>>>
>>>>> _______________________________________________
>>>>> lustre-discuss mailing list
>>>>> lustre-discuss@lists.lustre.org
>>>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org