Re: [lustre-discuss] [EXTERNAL] [BULK] Re: Ongoing issues with quota

2023-10-18 Thread Daniel Szkola via lustre-discuss
Your help on this issue has been much appreciated, thanks. I deleted all the 
zero-length files for the group that was having issues. The robinhood report 
and the quota are now reporting the same number of files. Amazing. Thanks again.

—
Dan Szkola
FNAL


> On Oct 18, 2023, at 7:12 AM, Andreas Dilger  wrote:
> 
> The zero-length objects are created for the file stripes, but if the MDT 
> inodes were deleted, but something went wrong with the MDT before the OST 
> objects were deleted, then the objects would be left behind. 
> 
> If the objects are in lost+found with the FID as the filename, then the file 
> itself is almost certainly already deleted, so fid2path would just return the 
> file in lost+found. 
> 
> I don't think there would be any problem to delete them. 
> 
> Cheers, Andreas
> 
>> On Oct 18, 2023, at 08:30, Daniel Szkola  wrote:
>> 
>> In this case almost all, if not all, of the files look a lot like this:
>> 
>> -r 1 someuser   somegroup 0 Dec 31  1969 
>> '[0x200012392:0xe0ad:0x0]-R-0’
>> 
>> stat shows:
>> 
>> # stat [0x200012392:0xe0ad:0x0]-R-0
>> File: [0x200012392:0xe0ad:0x0]-R-0
>> Size: 0 Blocks: 1  IO Block: 4194304 regular empty file
>> Device: a75b4da0h/2807778720dInode: 144116440360870061  Links: 1
>> Access: (0400/-r)  Uid: (43667/  someuser)   Gid: ( 9349/somegroup)
>> Access: 1969-12-31 18:00:00.0 -0600
>> Modify: 1969-12-31 18:00:00.0 -0600
>> Change: 1969-12-31 18:00:00.0 -0600
>> Birth: 2023-01-11 13:01:40.0 -0600
>> 
>> Not sure what these were or how they ended up in lost+found. I took this 
>> lustre fs over from folks who have moved on and I’m still trying to wrap my 
>> head around some of the finer details. In a normal linux fs, usually, not 
>> always, the blocks will have data in them. These are all zero-length. My 
>> inclination is to see if I can delete them and be done with it, but I’m a 
>> bit paranoid.
>> 
>> —
>> Dan Szkola
>> FNAL
>> 
>> 
>> 
>> 
>> 
>>> On Oct 17, 2023, at 4:23 PM, Andreas Dilger  wrote:
>>> 
>>> The files reported in .lustre/lost+found *ARE* the objects on the OSTs (at 
>>> least when accessed through a Lustre mountpoint, not if accessed directly 
>>> on the MDT mounted as ldiskfs), so when they are deleted the space on the 
>>> OSTs will be freed.
>>> 
>>> As for identification, the OST objects do not have any name information, 
>>> but they should have UID/GID/PROJID and timestamps that might help 
>>> identification.
>>> 
>>> Cheers, Andreas
>>> 
>>>>> On Oct 18, 2023, at 03:42, Daniel Szkola  wrote:
>>>> 
>>>> OK, so I did find the hidden .lustre directory (thanks Darby) and there 
>>>> are many, many files in the lost+found directory. I can run ’stat’ on them 
>>>> and get some info. Is there anything else I can do to tell what these 
>>>> were? Is it safe to delete them? Is there anyway to tell if there are 
>>>> matching files on the OST(s) that also need to be deleted?
>>>> 
>>>> —
>>>> Dan Szkola
>>>> FNAL 
>>>> 
>>>>> On Oct 10, 2023, at 3:44 PM, Vicker, Darby J. (JSC-EG111)[Jacobs 
>>>>> Technology, Inc.]  wrote:
>>>>> 
>>>>>> I don’t have a .lustre directory at the filesystem root.
>>>>> 
>>>>> It's there, but doesn't show up even with 'ls -a'.  If you cd into it or 
>>>>> ls it, it's there.  Lustre magic.  :)
>>>>> 
>>>>> -Original Message-
>>>>> From: lustre-discuss >>>> <mailto:lustre-discuss-boun...@lists.lustre.org>> on behalf of Daniel 
>>>>> Szkola via lustre-discuss >>>> <mailto:lustre-discuss@lists.lustre.org>>
>>>>> Reply-To: Daniel Szkola mailto:dszk...@fnal.gov>>
>>>>> Date: Tuesday, October 10, 2023 at 2:30 PM
>>>>> To: Andreas Dilger mailto:adil...@whamcloud.com>>
>>>>> Cc: lustre >>>> <mailto:lustre-discuss@lists.lustre.org>>
>>>>> Subject: [EXTERNAL] [BULK] Re: [lustre-discuss] Ongoing issues with quota
>>>>> 
>>>>> 
>>>>> CAUTION: This email originated from outside of NASA. Please take care 
>>>>> when clicking links or opening attachments. Use the "Report Message" 
>>>>> button to report suspicious messages to the NASA SOC.

Re: [lustre-discuss] [EXTERNAL] [BULK] Re: Ongoing issues with quota

2023-10-17 Thread Daniel Szkola via lustre-discuss
In this case almost all, if not all, of the files look a lot like this:

-r 1 someuser   somegroup 0 Dec 31  1969 '[0x200012392:0xe0ad:0x0]-R-0’

stat shows:

# stat [0x200012392:0xe0ad:0x0]-R-0
  File: [0x200012392:0xe0ad:0x0]-R-0
  Size: 0   Blocks: 1  IO Block: 4194304 regular empty file
Device: a75b4da0h/2807778720d   Inode: 144116440360870061  Links: 1
Access: (0400/-r)  Uid: (43667/  someuser)   Gid: ( 9349/somegroup)
Access: 1969-12-31 18:00:00.0 -0600
Modify: 1969-12-31 18:00:00.0 -0600
Change: 1969-12-31 18:00:00.0 -0600
 Birth: 2023-01-11 13:01:40.0 -0600

Not sure what these were or how they ended up in lost+found. I took this lustre 
fs over from folks who have moved on and I’m still trying to wrap my head 
around some of the finer details. In a normal linux fs, usually, not always, 
the blocks will have data in them. These are all zero-length. My inclination is 
to see if I can delete them and be done with it, but I’m a bit paranoid.

—
Dan Szkola
FNAL



 

> On Oct 17, 2023, at 4:23 PM, Andreas Dilger  wrote:
> 
> The files reported in .lustre/lost+found *ARE* the objects on the OSTs (at 
> least when accessed through a Lustre mountpoint, not if accessed directly on 
> the MDT mounted as ldiskfs), so when they are deleted the space on the OSTs 
> will be freed.
> 
> As for identification, the OST objects do not have any name information, but 
> they should have UID/GID/PROJID and timestamps that might help identification.
> 
> Cheers, Andreas
> 
>> On Oct 18, 2023, at 03:42, Daniel Szkola  wrote:
>> 
>> OK, so I did find the hidden .lustre directory (thanks Darby) and there are 
>> many, many files in the lost+found directory. I can run ’stat’ on them and 
>> get some info. Is there anything else I can do to tell what these were? Is 
>> it safe to delete them? Is there anyway to tell if there are matching files 
>> on the OST(s) that also need to be deleted?
>> 
>> —
>> Dan Szkola
>> FNAL 
>> 
>>> On Oct 10, 2023, at 3:44 PM, Vicker, Darby J. (JSC-EG111)[Jacobs 
>>> Technology, Inc.]  wrote:
>>> 
>>>> I don’t have a .lustre directory at the filesystem root.
>>> 
>>> It's there, but doesn't show up even with 'ls -a'.  If you cd into it or ls 
>>> it, it's there.  Lustre magic.  :)
>>> 
>>> -Original Message-
>>> From: lustre-discuss >> <mailto:lustre-discuss-boun...@lists.lustre.org>> on behalf of Daniel 
>>> Szkola via lustre-discuss >> <mailto:lustre-discuss@lists.lustre.org>>
>>> Reply-To: Daniel Szkola mailto:dszk...@fnal.gov>>
>>> Date: Tuesday, October 10, 2023 at 2:30 PM
>>> To: Andreas Dilger mailto:adil...@whamcloud.com>>
>>> Cc: lustre >> <mailto:lustre-discuss@lists.lustre.org>>
>>> Subject: [EXTERNAL] [BULK] Re: [lustre-discuss] Ongoing issues with quota
>>> 
>>> 
>>> CAUTION: This email originated from outside of NASA. Please take care when 
>>> clicking links or opening attachments. Use the "Report Message" button to 
>>> report suspicious messages to the NASA SOC.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Hello Andreas,
>>> 
>>> 
>>> lfs df -i reports 19,204,412 inodes used. When I did the full robinhood 
>>> scan, it reported scanning 18,673,874 entries, so fairly close.
>>> 
>>> 
>>> I don’t have a .lustre directory at the filesystem root.
>>> 
>>> 
>>> Another interesting aspect of this particular issue is I can run lctl lfsck 
>>> and every time I get:
>>> 
>>> 
>>> layout_repaired: 1468299
>>> 
>>> 
>>> But it doesn’t seem to be actually repairing anything because if I run it 
>>> again, I’ll get the same or a similar number.
>>> 
>>> 
>>> I run it like this:
>>> lctl lfsck_start -t layout -t namespace -o -M lfsc-MDT
>>> 
>>> 
>>> —
>>> Dan Szkola
>>> FNAL
>>> 
>>> 
>>> 
>>> 
>>>> On Oct 10, 2023, at 10:47 AM, Andreas Dilger >>> <mailto:adil...@whamcloud.com>> wrote:
>>>> 
>>>> There is a $ROOT/.lustre/lost+found that you could check.
>>>> 
>>>> What does "lfs df -i" report for the used inode count? Maybe it is RBH 
>>>> that is reporting the wrong count?
>>>> 
>>>> The other alternative would be to mount the MDT filesystem directly as 
>>>>

Re: [lustre-discuss] [EXTERNAL] [BULK] Re: Ongoing issues with quota

2023-10-17 Thread Daniel Szkola via lustre-discuss
OK, so I did find the hidden .lustre directory (thanks Darby) and there are 
many, many files in the lost+found directory. I can run ’stat’ on them and get 
some info. Is there anything else I can do to tell what these were? Is it safe 
to delete them? Is there anyway to tell if there are matching files on the 
OST(s) that also need to be deleted?

—
Dan Szkola
FNAL 

> On Oct 10, 2023, at 3:44 PM, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, 
> Inc.]  wrote:
> 
>> I don’t have a .lustre directory at the filesystem root.
> 
> It's there, but doesn't show up even with 'ls -a'.  If you cd into it or ls 
> it, it's there.  Lustre magic.  :)
> 
> -Original Message-
> From: lustre-discuss  <mailto:lustre-discuss-boun...@lists.lustre.org>> on behalf of Daniel Szkola 
> via lustre-discuss  <mailto:lustre-discuss@lists.lustre.org>>
> Reply-To: Daniel Szkola mailto:dszk...@fnal.gov>>
> Date: Tuesday, October 10, 2023 at 2:30 PM
> To: Andreas Dilger mailto:adil...@whamcloud.com>>
> Cc: lustre  <mailto:lustre-discuss@lists.lustre.org>>
> Subject: [EXTERNAL] [BULK] Re: [lustre-discuss] Ongoing issues with quota
> 
> 
> CAUTION: This email originated from outside of NASA. Please take care when 
> clicking links or opening attachments. Use the "Report Message" button to 
> report suspicious messages to the NASA SOC.
> 
> 
> 
> 
> 
> 
> 
> 
> Hello Andreas,
> 
> 
> lfs df -i reports 19,204,412 inodes used. When I did the full robinhood scan, 
> it reported scanning 18,673,874 entries, so fairly close.
> 
> 
> I don’t have a .lustre directory at the filesystem root.
> 
> 
> Another interesting aspect of this particular issue is I can run lctl lfsck 
> and every time I get:
> 
> 
> layout_repaired: 1468299
> 
> 
> But it doesn’t seem to be actually repairing anything because if I run it 
> again, I’ll get the same or a similar number.
> 
> 
> I run it like this:
> lctl lfsck_start -t layout -t namespace -o -M lfsc-MDT
> 
> 
> —
> Dan Szkola
> FNAL
> 
> 
> 
> 
>> On Oct 10, 2023, at 10:47 AM, Andreas Dilger > <mailto:adil...@whamcloud.com>> wrote:
>> 
>> There is a $ROOT/.lustre/lost+found that you could check.
>> 
>> What does "lfs df -i" report for the used inode count? Maybe it is RBH that 
>> is reporting the wrong count?
>> 
>> The other alternative would be to mount the MDT filesystem directly as type 
>> ZFS and see what df -i and find report?
>> 
>> Cheers, Andreas
>> 
>>> On Oct 10, 2023, at 22:16, Daniel Szkola via lustre-discuss 
>>> mailto:lustre-discuss@lists.lustre.org>> 
>>> wrote:
>>> 
>>> OK, I disabled, waited for a while, then reenabled. I still get the same 
>>> numbers. The only thing I can think is somehow the count is correct, 
>>> despite the huge difference. Robinhood and find show about 1.7M files, 
>>> dirs, and links. The quota is showing a bit over 3.1M inodes used. We only 
>>> have one MDS and MGS. Any ideas where the discrepancy may lie? Orphans? Is 
>>> there a lost+found area in lustre?
>>> 
>>> —
>>> Dan Szkola
>>> FNAL
>>> 
>>> 
>>>> On Oct 10, 2023, at 8:24 AM, Daniel Szkola >>> <mailto:dszk...@fnal.gov>> wrote:
>>>> 
>>>> Hi Robert,
>>>> 
>>>> Thanks for the response. Do you remember exactly how you did it? Did you 
>>>> bring everything down at any point? I know you can do this:
>>>> 
>>>> lctl conf_param fsname.quota.mdt=none
>>>> 
>>>> but is that all you did? Did you wait or bring everything down before 
>>>> reenabling? I’m worried because that allegedly just enables/disables 
>>>> enforcement and space accounting is always on. Andreas stated that quotas 
>>>> are controlled by ZFS, but there has been no quota support enabled on any 
>>>> of the ZFS volumes in our lustre filesystem.
>>>> 
>>>> —
>>>> Dan Szkola
>>>> FNAL
>>>> 
>>>>>> On Oct 10, 2023, at 2:17 AM, Redl, Robert >>>>> <mailto:robert.r...@lmu.de>> wrote:
>>>>> 
>>>>> Dear Dan,
>>>>> 
>>>>> I had a similar problem some time ago. We are also using ZFS for MDT and 
>>>>> OSTs. For us, the used disk space was reported wrong. The problem was 
>>>>> fixed by switching quota support off on the MGS and then on again.
>>>>> 
>>>>> Cheers,
&g

Re: [lustre-discuss] [EXTERNAL] [BULK] Re: Ongoing issues with quota

2023-10-10 Thread Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] via lustre-discuss
> I don’t have a .lustre directory at the filesystem root.

It's there, but doesn't show up even with 'ls -a'.  If you cd into it or ls it, 
it's there.  Lustre magic.  :)

-Original Message-
From: lustre-discuss mailto:lustre-discuss-boun...@lists.lustre.org>> on behalf of Daniel Szkola 
via lustre-discuss mailto:lustre-discuss@lists.lustre.org>>
Reply-To: Daniel Szkola mailto:dszk...@fnal.gov>>
Date: Tuesday, October 10, 2023 at 2:30 PM
To: Andreas Dilger mailto:adil...@whamcloud.com>>
Cc: lustre mailto:lustre-discuss@lists.lustre.org>>
Subject: [EXTERNAL] [BULK] Re: [lustre-discuss] Ongoing issues with quota


CAUTION: This email originated from outside of NASA. Please take care when 
clicking links or opening attachments. Use the "Report Message" button to 
report suspicious messages to the NASA SOC.








Hello Andreas,


lfs df -i reports 19,204,412 inodes used. When I did the full robinhood scan, 
it reported scanning 18,673,874 entries, so fairly close.


I don’t have a .lustre directory at the filesystem root.


Another interesting aspect of this particular issue is I can run lctl lfsck and 
every time I get:


layout_repaired: 1468299


But it doesn’t seem to be actually repairing anything because if I run it 
again, I’ll get the same or a similar number.


I run it like this:
lctl lfsck_start -t layout -t namespace -o -M lfsc-MDT


—
Dan Szkola
FNAL




> On Oct 10, 2023, at 10:47 AM, Andreas Dilger  <mailto:adil...@whamcloud.com>> wrote:
>
> There is a $ROOT/.lustre/lost+found that you could check.
>
> What does "lfs df -i" report for the used inode count? Maybe it is RBH that 
> is reporting the wrong count?
>
> The other alternative would be to mount the MDT filesystem directly as type 
> ZFS and see what df -i and find report?
>
> Cheers, Andreas
>
>> On Oct 10, 2023, at 22:16, Daniel Szkola via lustre-discuss 
>> mailto:lustre-discuss@lists.lustre.org>> 
>> wrote:
>>
>> OK, I disabled, waited for a while, then reenabled. I still get the same 
>> numbers. The only thing I can think is somehow the count is correct, despite 
>> the huge difference. Robinhood and find show about 1.7M files, dirs, and 
>> links. The quota is showing a bit over 3.1M inodes used. We only have one 
>> MDS and MGS. Any ideas where the discrepancy may lie? Orphans? Is there a 
>> lost+found area in lustre?
>>
>> —
>> Dan Szkola
>> FNAL
>>
>>
>>> On Oct 10, 2023, at 8:24 AM, Daniel Szkola >> <mailto:dszk...@fnal.gov>> wrote:
>>>
>>> Hi Robert,
>>>
>>> Thanks for the response. Do you remember exactly how you did it? Did you 
>>> bring everything down at any point? I know you can do this:
>>>
>>> lctl conf_param fsname.quota.mdt=none
>>>
>>> but is that all you did? Did you wait or bring everything down before 
>>> reenabling? I’m worried because that allegedly just enables/disables 
>>> enforcement and space accounting is always on. Andreas stated that quotas 
>>> are controlled by ZFS, but there has been no quota support enabled on any 
>>> of the ZFS volumes in our lustre filesystem.
>>>
>>> —
>>> Dan Szkola
>>> FNAL
>>>
>>>>> On Oct 10, 2023, at 2:17 AM, Redl, Robert >>>> <mailto:robert.r...@lmu.de>> wrote:
>>>>
>>>> Dear Dan,
>>>>
>>>> I had a similar problem some time ago. We are also using ZFS for MDT and 
>>>> OSTs. For us, the used disk space was reported wrong. The problem was 
>>>> fixed by switching quota support off on the MGS and then on again.
>>>>
>>>> Cheers,
>>>> Robert
>>>>
>>>>> Am 09.10.2023 um 17:55 schrieb Daniel Szkola via lustre-discuss 
>>>>> >>>> <mailto:lustre-discuss@lists.lustre.org>>:
>>>>>
>>>>> Thanks, I will look into the ZFS quota since we are using ZFS for all 
>>>>> storage, MDT and OSTs.
>>>>>
>>>>> In our case, there is a single MDS/MDT. I have used Robinhood and lfs 
>>>>> find (by group) commands to verify what the numbers should apparently be.
>>>>>
>>>>> —
>>>>> Dan Szkola
>>>>> FNAL
>>>>>
>>>>>> On Oct 9, 2023, at 10:13 AM, Andreas Dilger >>>>> <mailto:adil...@whamcloud.com>> wrote:
>>>>>>
>>>>>> The quota accounting is controlled by the backing filesystem of the OSTs 
>>>>>> and MDTs.
>>>>>>
>>>&