Your help on this issue has been much appreciated, thanks. I deleted all the 
zero-length files for the group that was having issues. The robinhood report 
and the quota are now reporting the same number of files. Amazing. Thanks again.

Dan Szkola
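
For anyone hitting the same symptom, here is a minimal sketch of a dry-run scan for the kind of entries described in this thread: zero-length regular files under .lustre/lost+found whose modification time is the Unix epoch. The function name `orphan_candidates` and the `require_epoch_mtime` flag are made up for illustration, and the lost+found path is whatever applies on your own client mount. It only lists candidates and deletes nothing.

```python
import os
import stat
from pathlib import Path
from typing import Iterator


def orphan_candidates(lost_found: str, require_epoch_mtime: bool = True) -> Iterator[Path]:
    """Yield zero-length regular files under lost_found. Nothing is deleted."""
    for dirpath, _dirnames, filenames in os.walk(lost_found):
        for name in filenames:
            path = Path(dirpath) / name
            st = os.lstat(path)  # lstat: do not follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, and so on
            if st.st_size != 0:
                continue  # only empty files are of interest here
            if require_epoch_mtime and int(st.st_mtime) != 0:
                continue  # the entries in the thread all had a 1969/1970 mtime
            yield path
```

Something like `for p in orphan_candidates("/lustre1/.lustre/lost+found"): print(p)` would print the list for review (the path is assumed from the /lustre1 mountpoint mentioned in the thread). Checking owner, group, and timestamps with stat before removing anything still seems prudent.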

> On Oct 18, 2023, at 7:12 AM, Andreas Dilger wrote:
> The zero-length objects are created for the file stripes. If the MDT 
> inodes were deleted but something went wrong with the MDT before the OST 
> objects were deleted, then the objects would be left behind. 
> If the objects are in lost+found with the FID as the filename, then the file 
> itself is almost certainly already deleted, so fid2path would just return the 
> file in lost+found. 
> I don't think there would be any problem deleting them. 
> Cheers, Andreas
>> On Oct 18, 2023, at 08:30, Daniel Szkola wrote:
>> In this case almost all, if not all, of the files look a lot like this:
>> -r-------- 1 someuser   somegroup 0 Dec 31  1969 '[0x200012392:0xe0ad:0x0]-R-0'
>> stat shows:
>> # stat [0x200012392:0xe0ad:0x0]-R-0
>> File: [0x200012392:0xe0ad:0x0]-R-0
>> Size: 0             Blocks: 1          IO Block: 4194304 regular empty file
>> Device: a75b4da0h/2807778720d    Inode: 144116440360870061  Links: 1
>> Access: (0400/-r--------)  Uid: (43667/  someuser)   Gid: ( 9349/somegroup)
>> Access: 1969-12-31 18:00:00.000000000 -0600
>> Modify: 1969-12-31 18:00:00.000000000 -0600
>> Change: 1969-12-31 18:00:00.000000000 -0600
>> Birth: 2023-01-11 13:01:40.000000000 -0600
>> Not sure what these were or how they ended up in lost+found. I took this 
>> Lustre fs over from folks who have moved on and I’m still trying to wrap my 
>> head around some of the finer details. On a normal Linux fs, files in 
>> lost+found usually, though not always, have data in their blocks. These are 
>> all zero-length. My inclination is to see if I can delete them and be done 
>> with it, but I’m a bit paranoid.
>> —
>> Dan Szkola
>>> On Oct 17, 2023, at 4:23 PM, Andreas Dilger wrote:
>>> The files reported in .lustre/lost+found *ARE* the objects on the OSTs (at 
>>> least when accessed through a Lustre mountpoint, not if accessed directly 
>>> on the MDT mounted as ldiskfs), so when they are deleted the space on the 
>>> OSTs will be freed.
>>> As for identification, the OST objects do not have any name information, 
>>> but they should have UID/GID/PROJID and timestamps that might help 
>>> identification.
>>> Cheers, Andreas
>>>>> On Oct 18, 2023, at 03:42, Daniel Szkola wrote:
>>>> OK, so I did find the hidden .lustre directory (thanks Darby) and there 
>>>> are many, many files in the lost+found directory. I can run 'stat' on them 
>>>> and get some info. Is there anything else I can do to tell what these 
>>>> were? Is it safe to delete them? Is there any way to tell if there are 
>>>> matching files on the OST(s) that also need to be deleted?
>>>> —
>>>> Dan Szkola
>>>> FNAL 
>>>>> On Oct 10, 2023, at 3:44 PM, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] wrote:
>>>>>> I don’t have a .lustre directory at the filesystem root.
>>>>> It's there, but doesn't show up even with 'ls -a'.  If you cd into it or 
>>>>> ls it, it's there.  Lustre magic.  :)
>>>>> -----Original Message-----
>>>>> From: lustre-discuss on behalf of Daniel Szkola via lustre-discuss
>>>>> Reply-To: Daniel Szkola
>>>>> Date: Tuesday, October 10, 2023 at 2:30 PM
>>>>> To: Andreas Dilger
>>>>> Cc: lustre
>>>>> Subject: [EXTERNAL] [BULK] Re: [lustre-discuss] Ongoing issues with quota
>>>>> Hello Andreas,
>>>>> lfs df -i reports 19,204,412 inodes used. When I did the full robinhood 
>>>>> scan, it reported scanning 18,673,874 entries, so fairly close.
>>>>> I don’t have a .lustre directory at the filesystem root.
>>>>> Another interesting aspect of this particular issue is I can run lctl 
>>>>> lfsck and every time I get:
>>>>> layout_repaired: 1468299
>>>>> But it doesn’t seem to be actually repairing anything because if I run it 
>>>>> again, I’ll get the same or a similar number.
>>>>> I run it like this:
>>>>> lctl lfsck_start -t layout -t namespace -o -M lfsc-MDT0000
>>>>> —
>>>>> Dan Szkola
>>>>> FNAL
>>>>>> On Oct 10, 2023, at 10:47 AM, Andreas Dilger wrote:
>>>>>> There is a $ROOT/.lustre/lost+found that you could check.
>>>>>> What does "lfs df -i" report for the used inode count? Maybe it is RBH 
>>>>>> that is reporting the wrong count?
>>>>>> The other alternative would be to mount the MDT filesystem directly as 
>>>>>> type ZFS and see what df -i and find report?
>>>>>> Cheers, Andreas
>>>>>>> On Oct 10, 2023, at 22:16, Daniel Szkola via lustre-discuss wrote:
>>>>>>> OK, I disabled, waited for a while, then reenabled. I still get the 
>>>>>>> same numbers. The only thing I can think is somehow the count is 
>>>>>>> correct, despite the huge difference. Robinhood and find show about 
>>>>>>> 1.7M files, dirs, and links. The quota is showing a bit over 3.1M 
>>>>>>> inodes used. We only have one MDS and MGS. Any ideas where the 
>>>>>>> discrepancy may lie? Orphans? Is there a lost+found area in lustre?
>>>>>>> —
>>>>>>> Dan Szkola
>>>>>>> FNAL
>>>>>>>> On Oct 10, 2023, at 8:24 AM, Daniel Szkola wrote:
>>>>>>>> Hi Robert,
>>>>>>>> Thanks for the response. Do you remember exactly how you did it? Did 
>>>>>>>> you bring everything down at any point? I know you can do this:
>>>>>>>> lctl conf_param fsname.quota.mdt=none
>>>>>>>> but is that all you did? Did you wait or bring everything down before 
>>>>>>>> reenabling? I’m worried because that allegedly just enables/disables 
>>>>>>>> enforcement and space accounting is always on. Andreas stated that 
>>>>>>>> quotas are controlled by ZFS, but there has been no quota support 
>>>>>>>> enabled on any of the ZFS volumes in our lustre filesystem.
>>>>>>>> —
>>>>>>>> Dan Szkola
>>>>>>>> FNAL
>>>>>>>>>> On Oct 10, 2023, at 2:17 AM, Redl, Robert wrote:
>>>>>>>>> Dear Dan,
>>>>>>>>> I had a similar problem some time ago. We are also using ZFS for MDT 
>>>>>>>>> and OSTs. For us, the used disk space was reported wrong. The problem 
>>>>>>>>> was fixed by switching quota support off on the MGS and then on again.
>>>>>>>>> Cheers,
>>>>>>>>> Robert
>>>>>>>>>> On Oct 9, 2023, at 17:55, Daniel Szkola via lustre-discuss wrote:
>>>>>>>>>> Thanks, I will look into the ZFS quota since we are using ZFS for 
>>>>>>>>>> all storage, MDT and OSTs.
>>>>>>>>>> In our case, there is a single MDS/MDT. I have used Robinhood and 
>>>>>>>>>> lfs find (by group) commands to verify what the numbers should 
>>>>>>>>>> apparently be.
>>>>>>>>>> —
>>>>>>>>>> Dan Szkola
>>>>>>>>>> FNAL
>>>>>>>>>>> On Oct 9, 2023, at 10:13 AM, Andreas Dilger wrote:
>>>>>>>>>>> The quota accounting is controlled by the backing filesystem of the 
>>>>>>>>>>> OSTs and MDTs.
>>>>>>>>>>> For ldiskfs/ext4 you could run e2fsck to re-count all of the inode 
>>>>>>>>>>> and block usage.
>>>>>>>>>>> For ZFS you would have to ask on the ZFS list to see if there is 
>>>>>>>>>>> some way to re-count the quota usage.
>>>>>>>>>>> The "inode" quota is accounted from the MDTs, while the "block" 
>>>>>>>>>>> quota is accounted from the OSTs. You might be able to use 
>>>>>>>>>>> "lfs quota -v -g group" to see if there is one particular MDT that 
>>>>>>>>>>> is returning too many inodes.
>>>>>>>>>>> Possibly if you have directories that are striped across many MDTs 
>>>>>>>>>>> it would inflate the used inode count. For example, if every one of 
>>>>>>>>>>> the 426k directories reported by RBH was striped across 4 MDTs then 
>>>>>>>>>>> you would see the inode count add up to 3.6M.
>>>>>>>>>>> If that was the case, then I would really, really advise against 
>>>>>>>>>>> striping every directory in the filesystem. That will cause 
>>>>>>>>>>> problems far worse than just inflating the inode quota accounting.
>>>>>>>>>>> Cheers, Andreas
>>>>>>>>>>>> On Oct 9, 2023, at 22:33, Daniel Szkola via lustre-discuss wrote:
>>>>>>>>>>>> Is there really no way to force a recount of files used by the 
>>>>>>>>>>>> quota? All indications are we have accounts where files were 
>>>>>>>>>>>> removed and this is not reflected in the used file count in the 
>>>>>>>>>>>> quota. The space used seems correct but the inodes used numbers 
>>>>>>>>>>>> are way high. There must be a way to clear these numbers and have 
>>>>>>>>>>>> a fresh count done.
>>>>>>>>>>>> —
>>>>>>>>>>>> Dan Szkola
>>>>>>>>>>>> FNAL
>>>>>>>>>>>>> On Oct 4, 2023, at 11:37 AM, Daniel Szkola via lustre-discuss wrote:
>>>>>>>>>>>>> Also, quotas on the OSTs don’t add up to anywhere near 3 million 
>>>>>>>>>>>>> files either:
>>>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 0 /lustre1
>>>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>>> Filesystem kbytes quota limit grace files quota limit grace
>>>>>>>>>>>>> 1394853459 0 1913344192 - 132863 0 0 -
>>>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode0 lfs quota -g somegroup -I 1 /lustre1
>>>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>>> Filesystem kbytes quota limit grace files quota limit grace
>>>>>>>>>>>>> 1411579601 0 1963246413 - 120643 0 0 -
>>>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 2 /lustre1
>>>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>>> Filesystem kbytes quota limit grace files quota limit grace
>>>>>>>>>>>>> 1416507527 0 1789950778 - 190687 0 0 -
>>>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode1 lfs quota -g somegroup -I 3 /lustre1
>>>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>>> Filesystem kbytes quota limit grace files quota limit grace
>>>>>>>>>>>>> 1636465724 0 1926578117 - 195034 0 0 -
>>>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 4 /lustre1
>>>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>>> Filesystem kbytes quota limit grace files quota limit grace
>>>>>>>>>>>>> 2202272244 0 3020159313 - 185097 0 0 -
>>>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode2 lfs quota -g somegroup -I 5 /lustre1
>>>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>>> Filesystem kbytes quota limit grace files quota limit grace
>>>>>>>>>>>>> 1324770165 0 1371244768 - 145347 0 0 -
>>>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 6 /lustre1
>>>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>>> Filesystem kbytes quota limit grace files quota limit grace
>>>>>>>>>>>>> 2892027349 0 3221225472 - 169386 0 0 -
>>>>>>>>>>>>> [root@lustreclient scratch]# ssh ossnode3 lfs quota -g somegroup -I 7 /lustre1
>>>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>>> Filesystem kbytes quota limit grace files quota limit grace
>>>>>>>>>>>>> 2076201636 0 2474853207 - 171552 0 0 -
>>>>>>>>>>>>> —
>>>>>>>>>>>>> Dan Szkola
>>>>>>>>>>>>> FNAL
>>>>>>>>>>>>>>> On Oct 4, 2023, at 8:45 AM, Daniel Szkola via lustre-discuss wrote:
>>>>>>>>>>>>>> No combination of lfsck runs has helped with this.
>>>>>>>>>>>>>> Again, robinhood shows 1796104 files for the group, and 'lfs find 
>>>>>>>>>>>>>> -G gid' found 1796104 files as well.
>>>>>>>>>>>>>> So why is the quota command showing over 3 million inodes used?
>>>>>>>>>>>>>> There must be a way to force it to recount or clear all stale 
>>>>>>>>>>>>>> quota data and have it regenerate it?
>>>>>>>>>>>>>> Anyone?
>>>>>>>>>>>>>> —
>>>>>>>>>>>>>> Dan Szkola
>>>>>>>>>>>>>> FNAL
>>>>>>>>>>>>>>> On Sep 27, 2023, at 9:42 AM, Daniel Szkola via lustre-discuss wrote:
>>>>>>>>>>>>>>> We have a Lustre filesystem that we just upgraded to 2.15.3; 
>>>>>>>>>>>>>>> however, this problem has been going on for some time.
>>>>>>>>>>>>>>> The quota command shows this:
>>>>>>>>>>>>>>> Disk quotas for grp somegroup (gid 9544):
>>>>>>>>>>>>>>> Filesystem used quota limit grace files quota limit grace
>>>>>>>>>>>>>>> /lustre1 13.38T 40T 45T - 3136761* 2621440 3670016 expired
>>>>>>>>>>>>>>> The group is not using nearly that many files. We have 
>>>>>>>>>>>>>>> robinhood installed and it shows this:
>>>>>>>>>>>>>>> Using config file '/etc/robinhood.d/lustre1.conf'.
>>>>>>>>>>>>>>> group, type, count, volume, spc_used, avg_size
>>>>>>>>>>>>>>> somegroup, symlink, 59071, 5.12 MB, 103.16 MB, 91
>>>>>>>>>>>>>>> somegroup, dir, 426619, 5.24 GB, 5.24 GB, 12.87 KB
>>>>>>>>>>>>>>> somegroup, file, 1310414, 16.24 TB, 13.37 TB, 13.00 MB
>>>>>>>>>>>>>>> Total: 1796104 entries, volume: 17866508365925 bytes (16.25 TB), space used: 14704924899840 bytes (13.37 TB)
>>>>>>>>>>>>>>> Any ideas what is wrong here?
>>>>>>>>>>>>>>> —
>>>>>>>>>>>>>>> Dan Szkola
>>>>>>>>>>>>>>> FNAL
>>>>>>>>>> _______________________________________________
>>>>>>>>>> lustre-discuss mailing list
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>> Lustre Principal Architect
>>> Whamcloud
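
The counts quoted in this thread can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers copied from the messages above; the variable names are illustrative. It suggests that the per-OST object counts roughly match the number of regular files Robinhood found, while the inode quota reported for the group carried about 1.34M more entries than the namespace scans could see.

```python
# Figures copied from the thread (group "somegroup" on /lustre1).
rbh_counts = {"symlink": 59071, "dir": 426619, "file": 1310414}
quota_inodes_used = 3136761
# "files" column from the eight per-OST "lfs quota -I <idx>" outputs.
ost_object_counts = [132863, 120643, 190687, 195034,
                     185097, 145347, 169386, 171552]

rbh_total = sum(rbh_counts.values())             # entries Robinhood scanned
unexplained = quota_inodes_used - rbh_total      # inodes the quota counts beyond that
ost_total = sum(ost_object_counts)               # objects charged to the group on OSTs
ost_vs_files = ost_total - rbh_counts["file"]    # gap between OST objects and files

print(f"robinhood total:      {rbh_total}")
print(f"quota minus scan:     {unexplained}")
print(f"OST objects total:    {ost_total}")
print(f"OST objects - files:  {ost_vs_files}")
```

The Robinhood total matches the 1796104 entries reported in the original message, so the discrepancy sits entirely in the inode quota, which is consistent with the orphaned entries later found in .lustre/lost+found.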
