In this case almost all, if not all, of the files look a lot like this:

-r-------- 1 someuser somegroup 0 Dec 31  1969 '[0x200012392:0xe0ad:0x0]-R-0'

stat shows:

# stat [0x200012392:0xe0ad:0x0]-R-0
  File: [0x200012392:0xe0ad:0x0]-R-0
  Size: 0            Blocks: 1          IO Block: 4194304   regular empty file
Device: a75b4da0h/2807778720d    Inode: 144116440360870061  Links: 1
Access: (0400/-r--------)  Uid: (43667/someuser)   Gid: (9349/somegroup)
Access: 1969-12-31 18:00:00.000000000 -0600
Modify: 1969-12-31 18:00:00.000000000 -0600
Change: 1969-12-31 18:00:00.000000000 -0600
 Birth: 2023-01-11 13:01:40.000000000 -0600

I'm not sure what these were or how they ended up in lost+found. I took this Lustre filesystem over from folks who have moved on, and I'm still trying to wrap my head around some of the finer details. In a normal Linux filesystem the blocks will usually (though not always) hold data; these are all zero-length. My inclination is to see if I can delete them and be done with it, but I'm a bit paranoid.

—
Dan Szkola
FNAL
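For anyone auditing similar objects before deleting them, here is a minimal sketch of the kind of pre-deletion check suggested below, tallying owners and flagging anything non-empty. The mount point /lustre1 and the lost+found/MDT0000 subdirectory are assumptions (the usual single-MDT layout), and GNU find/awk are assumed:

    #!/bin/bash
    # Tally lost+found objects by uid:gid and flag any that are non-empty,
    # so nothing holding real data gets deleted by accident.
    cd /lustre1/.lustre/lost+found/MDT0000 || exit 1

    find . -maxdepth 1 -type f -printf '%U:%G %s\n' |
    awk '{
        count[$1]++                  # objects per uid:gid
        if ($2 > 0) nonzero[$1]++    # objects that actually hold data
    } END {
        printf "%-16s %8s %10s\n", "uid:gid", "objects", "non-empty"
        for (k in count)
            printf "%-16s %8d %10d\n", k, count[k], nonzero[k]
    }'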
> On Oct 17, 2023, at 4:23 PM, Andreas Dilger <adil...@whamcloud.com> wrote:
>
> The files reported in .lustre/lost+found *ARE* the objects on the OSTs (at least when accessed through a Lustre mountpoint, not if accessed directly on the MDT mounted as ldiskfs), so when they are deleted the space on the OSTs will be freed.
>
> As for identification, the OST objects do not have any name information, but they should have UID/GID/PROJID and timestamps that might help identification.
>
> Cheers, Andreas
>
>> On Oct 18, 2023, at 03:42, Daniel Szkola <dszk...@fnal.gov> wrote:
>>
>> OK, so I did find the hidden .lustre directory (thanks, Darby) and there are many, many files in the lost+found directory. I can run 'stat' on them and get some info. Is there anything else I can do to tell what these were? Is it safe to delete them? Is there any way to tell if there are matching files on the OST(s) that also need to be deleted?
>>
>> —
>> Dan Szkola
>> FNAL
>>
>>> On Oct 10, 2023, at 3:44 PM, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] <darby.vicke...@nasa.gov> wrote:
>>>
>>>> I don't have a .lustre directory at the filesystem root.
>>>
>>> It's there, but it doesn't show up even with 'ls -a'. If you cd into it or ls it, it's there. Lustre magic. :)
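A quick illustration of that "Lustre magic", assuming a client mount at /lustre1 (the lost+found/MDT0000 path is the usual single-MDT layout, but is an assumption here):

    # .lustre is not returned by readdir, so it never shows up in ls -a:
    ls -a /lustre1 | grep '^\.lustre$'              # no output

    # ...but it can be opened by name:
    ls -d /lustre1/.lustre                          # /lustre1/.lustre
    ls /lustre1/.lustre/lost+found/MDT0000 | wc -l  # orphan object count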
>>> -----Original Message-----
>>> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of Daniel Szkola via lustre-discuss <lustre-discuss@lists.lustre.org>
>>> Reply-To: Daniel Szkola <dszk...@fnal.gov>
>>> Date: Tuesday, October 10, 2023 at 2:30 PM
>>> To: Andreas Dilger <adil...@whamcloud.com>
>>> Cc: lustre <lustre-discuss@lists.lustre.org>
>>> Subject: [EXTERNAL] [BULK] Re: [lustre-discuss] Ongoing issues with quota
>>>
>>> Hello Andreas,
>>>
>>> lfs df -i reports 19,204,412 inodes used. When I did the full robinhood scan, it reported scanning 18,673,874 entries, so fairly close.
>>>
>>> I don't have a .lustre directory at the filesystem root.
>>>
>>> Another interesting aspect of this particular issue is that I can run lctl lfsck, and every time I get:
>>>
>>> layout_repaired: 1468299
>>>
>>> But it doesn't seem to be actually repairing anything, because if I run it again I'll get the same or a similar number.
>>>
>>> I run it like this:
>>>
>>> lctl lfsck_start -t layout -t namespace -o -M lfsc-MDT0000
>>>
>>> —
>>> Dan Szkola
>>> FNAL
>>>
>>>> On Oct 10, 2023, at 10:47 AM, Andreas Dilger <adil...@whamcloud.com> wrote:
>>>>
>>>> There is a $ROOT/.lustre/lost+found that you could check.
>>>>
>>>> What does "lfs df -i" report for the used inode count? Maybe it is RBH that is reporting the wrong count?
>>>>
>>>> The other alternative would be to mount the MDT filesystem directly as type ZFS and see what df -i and find report.
>>>>
>>>> Cheers, Andreas
>>>>
>>>>> On Oct 10, 2023, at 22:16, Daniel Szkola via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>>>>>
>>>>> OK, I disabled, waited for a while, then reenabled. I still get the same numbers. The only thing I can think is that somehow the count is correct, despite the huge difference. Robinhood and find show about 1.7M files, dirs, and links. The quota is showing a bit over 3.1M inodes used. We only have one MDS and MGS. Any ideas where the discrepancy may lie? Orphans? Is there a lost+found area in Lustre?
>>>>>
>>>>> —
>>>>> Dan Szkola
>>>>> FNAL
>>>>>
>>>>>> On Oct 10, 2023, at 8:24 AM, Daniel Szkola <dszk...@fnal.gov> wrote:
>>>>>>
>>>>>> Hi Robert,
>>>>>>
>>>>>> Thanks for the response. Do you remember exactly how you did it? Did you bring everything down at any point? I know you can do this:
>>>>>>
>>>>>> lctl conf_param fsname.quota.mdt=none
>>>>>>
>>>>>> but is that all you did? Did you wait or bring everything down before reenabling? I'm worried because that allegedly just enables/disables enforcement, and space accounting is always on. Andreas stated that quotas are controlled by ZFS, but there has been no quota support enabled on any of the ZFS volumes in our Lustre filesystem.
>>>>>>
>>>>>> —
>>>>>> Dan Szkola
>>>>>> FNAL
>>>>>>
>>>>>>> On Oct 10, 2023, at 2:17 AM, Redl, Robert <robert.r...@lmu.de> wrote:
>>>>>>>
>>>>>>> Dear Dan,
>>>>>>>
>>>>>>> I had a similar problem some time ago. We are also using ZFS for MDT and OSTs. For us, the used disk space was reported wrong. The problem was fixed by switching quota support off on the MGS and then on again.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Robert
>>>>>>>
>>>>>>>> On Oct 9, 2023, at 17:55, Daniel Szkola via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>>>>>>>>
>>>>>>>> Thanks, I will look into the ZFS quota, since we are using ZFS for all storage, MDT and OSTs.
>>>>>>>>
>>>>>>>> In our case there is a single MDS/MDT. I have used Robinhood and lfs find (by group) commands to verify what the numbers should apparently be.
>>>>>>>>
>>>>>>>> —
>>>>>>>> Dan Szkola
>>>>>>>> FNAL
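ZFS keeps per-group space and object accounting of its own, independent of Lustre's quota enforcement setting, so it can be read directly on the MDS for comparison. A sketch, assuming the MDT dataset is imported as mdtpool/mdt0 (an illustrative name) and the pool has the userobj_accounting feature enabled:

    # Per-group usage and object counts as ZFS itself sees them:
    zfs groupspace -o name,used,objused mdtpool/mdt0

    # Or query one group's object count directly:
    zfs get -H -o value groupobjused@somegroup mdtpool/mdt0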
>>>>>>>>> On Oct 9, 2023, at 10:13 AM, Andreas Dilger <adil...@whamcloud.com> wrote:
>>>>>>>>>
>>>>>>>>> The quota accounting is controlled by the backing filesystem of the OSTs and MDTs.
>>>>>>>>>
>>>>>>>>> For ldiskfs/ext4 you could run e2fsck to re-count all of the inode and block usage.
>>>>>>>>>
>>>>>>>>> For ZFS you would have to ask on the ZFS list to see if there is some way to re-count the quota usage.
>>>>>>>>>
>>>>>>>>> The "inode" quota is accounted from the MDTs, while the "block" quota is accounted from the OSTs. You might be able to use "lfs quota -v -g group" to see if there is one particular MDT that is returning too many inodes.
>>>>>>>>>
>>>>>>>>> Possibly, if you have directories that are striped across many MDTs, it would inflate the used inode count. For example, if every one of the 426k directories reported by RBH was striped across 4 MDTs, then you would see the inode count add up to 3.6M.
>>>>>>>>>
>>>>>>>>> If that were the case, then I would really, really advise against striping every directory in the filesystem. That will cause problems far worse than just inflating the inode quota accounting.
>>>>>>>>>
>>>>>>>>> Cheers, Andreas
>>>>>>>>>
>>>>>>>>>> On Oct 9, 2023, at 22:33, Daniel Szkola via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Is there really no way to force a recount of files used by the quota? All indications are that we have accounts where files were removed and this is not reflected in the used file count in the quota. The space used seems correct, but the inodes-used numbers are way too high. There must be a way to clear these numbers and have a fresh count done.
>>>>>>>>>>
>>>>>>>>>> —
>>>>>>>>>> Dan Szkola
>>>>>>>>>> FNAL
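One way to narrow that down is to total the per-OST "files" counts (as in the queries quoted below) and compare the sum against the aggregate report. A sketch, assuming eight OSTs, a client mount at /lustre1, and the illustrative group name somegroup; the awk line and field position may need adjusting for your lfs version's output layout:

    # Sum the "files" column across all OSTs:
    total=0
    for idx in $(seq 0 7); do
        n=$(lfs quota -g somegroup -I "$idx" /lustre1 |
            awk 'NR==3 {print $5}')   # "files" column; adjust if your output differs
        total=$((total + n))
    done
    echo "files across OSTs: $total"

    # Compare with the aggregate count (inode usage is accounted on the MDT):
    lfs quota -g somegroup /lustre1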
>>>>>>>>>>> On Oct 4, 2023, at 11:37 AM, Daniel Szkola via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Also, quotas on the OSTs don't add up to near 3 million files either:
>>>>>>>>>>>
>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 0 /lustre1
>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>  Filesystem  kbytes      quota  limit       grace  files   quota  limit  grace
>>>>>>>>>>>              1394853459  0      1913344192  -      132863  0      0      -
>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 1 /lustre1
>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>  Filesystem  kbytes      quota  limit       grace  files   quota  limit  grace
>>>>>>>>>>>              1411579601  0      1963246413  -      120643  0      0      -
>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 2 /lustre1
>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>  Filesystem  kbytes      quota  limit       grace  files   quota  limit  grace
>>>>>>>>>>>              1416507527  0      1789950778  -      190687  0      0      -
>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 3 /lustre1
>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>  Filesystem  kbytes      quota  limit       grace  files   quota  limit  grace
>>>>>>>>>>>              1636465724  0      1926578117  -      195034  0      0      -
>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 4 /lustre1
>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>  Filesystem  kbytes      quota  limit       grace  files   quota  limit  grace
>>>>>>>>>>>              2202272244  0      3020159313  -      185097  0      0      -
>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 5 /lustre1
>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>  Filesystem  kbytes      quota  limit       grace  files   quota  limit  grace
>>>>>>>>>>>              1324770165  0      1371244768  -      145347  0      0      -
>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 6 /lustre1
>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>  Filesystem  kbytes      quota  limit       grace  files   quota  limit  grace
>>>>>>>>>>>              2892027349  0      3221225472  -      169386  0      0      -
>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 7 /lustre1
>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>  Filesystem  kbytes      quota  limit       grace  files   quota  limit  grace
>>>>>>>>>>>              2076201636  0      2474853207  -      171552  0      0      -
>>>>>>>>>>>
>>>>>>>>>>> —
>>>>>>>>>>> Dan Szkola
>>>>>>>>>>> FNAL
>>>>>>>>>>>
>>>>>>>>>>>> On Oct 4, 2023, at 8:45 AM, Daniel Szkola via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> No combination of ossnode runs has helped with this.
>>>>>>>>>>>>
>>>>>>>>>>>> Again, robinhood shows 1796104 files for the group, and an 'lfs find -G gid' found 1796104 files as well.
>>>>>>>>>>>>
>>>>>>>>>>>> So why is the quota command showing over 3 million inodes used?
>>>>>>>>>>>>
>>>>>>>>>>>> There must be a way to force it to recount, or to clear all stale quota data and have it regenerated?
>>>>>>>>>>>>
>>>>>>>>>>>> Anyone?
>>>>>>>>>>>>
>>>>>>>>>>>> —
>>>>>>>>>>>> Dan Szkola
>>>>>>>>>>>> FNAL
>>>>>>>>>>>>
>>>>>>>>>>>>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss <lustre-discuss@lists.lustre.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> We have a Lustre filesystem that we just upgraded to 2.15.3; however, this problem has been going on for some time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The quota command shows this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>>>  Filesystem  used    quota  limit  grace  files     quota    limit    grace
>>>>>>>>>>>>>    /lustre1  13.38T  40T    45T    -      3136761*  2621440  3670016  expired
>>>>>>>>>>>>>
>>>>>>>>>>>>> The group is not using nearly that many files. We have robinhood installed, and it shows this:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Using config file '/etc/robinhood.d/lustre1.conf'.
>>>>>>>>>>>>> group,      type,     count,    volume,    spc_used,   avg_size
>>>>>>>>>>>>> somegroup,  symlink,  59071,    5.12 MB,   103.16 MB,  91
>>>>>>>>>>>>> somegroup,  dir,      426619,   5.24 GB,   5.24 GB,    12.87 KB
>>>>>>>>>>>>> somegroup,  file,     1310414,  16.24 TB,  13.37 TB,   13.00 MB
>>>>>>>>>>>>>
>>>>>>>>>>>>> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used: 14704924899840 bytes (13.37 TB)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any ideas what is wrong here?
>>>>>>>>>>>>>
>>>>>>>>>>>>> —
>>>>>>>>>>>>> Dan Szkola
>>>>>>>>>>>>> FNAL
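Since the inode count is accounted on the MDT (as Andreas notes above), querying the single MDT directly and recounting the namespace independently shows whether the stale number lives in the MDT's quota accounting. A sketch, with the gid and mount point as illustrative values:

    # Inode usage as accounted by MDT index 0:
    lfs quota -g somegroup -i 0 /lustre1

    # Independent recount of entries owned by the group (slow on a large tree):
    lfs find /lustre1 -G 9544 | wc -l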
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Whamcloud

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org