Re: [lustre-discuss] Quota issue after OST removal

2022-10-26 Thread Daniel Szkola via lustre-discuss
I did show 'lfs quota -g somegroup' output in the original post and yes, each OST
is at the limit that was originally allocated to it, especially after migrating
the files off the two OSTs before removal.

However, I think you may be misreading the issue here. The total quota is
27T and the files on the remaining OSTs add up to just over 21T, because
two OSTs have been removed permanently. The permanently removed OSTs should
not be part of the calculations anymore.
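
(If I am adding up the verbose listing correctly, the per-OST limits on the
remaining OSTs come to roughly 2.876 + 2.611 + 4.794 + 4.587 + 3.21 + 3.515
= 21.59T, which matches both the group's reported usage and the "total
allocated block limit", so the remaining OSTs appear to have been granted no
headroom beyond what is already used.)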

When the two OSTs were removed, shouldn't the quota be split among the
remaining OSTs with each OST given a bigger share of the overall quota? Is
there a way to force this? Will restarting the MDS cause this to happen?

I just changed the soft/hard limits from 27T/30T to 37T/40T, and that does
allocate more space per OST, but putting them back to 27T/30T restores the
original per-OST allocations and the group is again shown as exceeding quota.
Why is setquota still using the removed OSTs? You can see in the listing where
it is still looking for ost4 and ost5:

quotactl ost4 failed.
quotactl ost5 failed.
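
For reference, the limit change and revert described above amount to something
like the following (same group and mount point as in the listing):

# lfs setquota -g somegroup -b 37T -B 40T /lustre1   (per-OST allocations grow)
# lfs setquota -g somegroup -b 27T -B 30T /lustre1   (original allocations and the * return)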

--
Dan Szkola
FNAL

On Wed, 2022-10-26 at 21:00 +0200, Thomas Roth via lustre-discuss wrote:
> Hi Daniel,
> 
> isn't this expected: on your lustrefs-OST0001, usage seems to have hit
> the limit (perhaps if you do 'lfs quota -g somegroup...', it will show
> you by how many bytes).
> 
> If one part of the distributed quota is exceeded, Lustre should report
> that with the * - although the total across the file system is still
> below the limit.
> 
> 
> Obviously your 'somegroup' is at the quota limit on all visible OSTs, so
> my guess is that would be the same on the missing two OSTs.
> So, either have some data removed or increase the limit.
> 
> Best regards
> Thomas
> 
> On 26.10.22 16:52, Daniel Szkola via lustre-discuss wrote:
> > Hello all,
> > 
> > We recently removed an OSS/OST node that was spontaneously shutting down
> > so hardware testing could be performed. I have no idea how long it will
> > be out, so I followed the procedure for permanent removal.
> > 
> > Since then space usage is being calculated correctly, but 'lfs quota' will
> > show groups as exceeding quota, despite being under both soft and hard
> > limits. A verbose listing shows that all OST limits are met and I have no
> > idea how to reset the limits now that the two OSTs on the removed OSS node
> > are not part of the equation.
> > 
> > Due to the heavy usage of the Lustre filesystem, no clients have been
> > unmounted and no MDS or OST nodes have been restarted. The underlying
> > filesystem is ZFS.
> > 
> > Looking for ideas on how to correct this.
> > 
> > Example:
> > 
> > # lfs quota -gh somegroup -v /lustre1
> > Disk quotas for grp somegroup (gid ):
> >      Filesystem    used   quota   limit   grace   files   quota   limit   grace
> >        /lustre1  21.59T*    27T     30T 6d23h39m15s 2250592 2621440 3145728       -
> > lustrefs-MDT0000_UUID
> >                   1.961G       -  1.962G       - 2250592       - 2359296       -
> > lustrefs-OST0000_UUID
> >                   2.876T       -  2.876T       -       -       -       -       -
> > lustrefs-OST0001_UUID
> >                   2.611T*      -  2.611T       -       -       -       -       -
> > lustrefs-OST0002_UUID
> >                   4.794T       -  4.794T       -       -       -       -       -
> > lustrefs-OST0003_UUID
> >                   4.587T       -  4.587T       -       -       -       -       -
> > quotactl ost4 failed.
> > quotactl ost5 failed.
> > lustrefs-OST0006_UUID
> >                    3.21T       -   3.21T       -       -       -       -       -
> > lustrefs-OST0007_UUID
> >                   3.515T       -  3.515T       -       -       -       -       -
> > Total allocated inode limit: 2359296, total allocated block limit: 21.59T
> > Some errors happened when getting quota info. Some devices may be not
> > working or deactivated. The data in "[]" is inaccurate.
> > 
> > --
> > Dan Szkola
> > FNAL
> > ___
> > lustre-discuss mailing list
> > lustre-discuss@lists.lustre.org
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> >  
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>  

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Re: [lustre-discuss] Quota issue after OST removal

2022-10-26 Thread Thomas Roth via lustre-discuss

Hi Daniel,

isn't this expected: on your lustrefs-OST0001, usage seems to have hit the
limit (perhaps if you do 'lfs quota -g somegroup...', it will show
you by how many bytes).
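
E.g. without the -h flag the values are printed in kilobyte blocks instead of
rounded units, so the exact overshoot is visible:

# lfs quota -g somegroup -v /lustre1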

If one part of the distributed quota is exceeded, Lustre should report that with
the * - although the total across the file system is still below the limit.



Obviously your 'somegroup' is at the quota limit on all visible OSTs, so my 
guess is that would be the same on the missing two OSTs.
So, either have some data removed or increase the limit.

Best regards
Thomas

On 26.10.22 16:52, Daniel Szkola via lustre-discuss wrote:

Hello all,

We recently removed an OSS/OST node that was spontaneously shutting down so
hardware testing could be performed. I have no idea how long it will be out,
so I followed the procedure for permanent removal.

Since then space usage is being calculated correctly, but 'lfs quota' will
show groups as exceeding quota, despite being under both soft and hard
limits. A verbose listing shows that all OST limits are met and I have no
idea how to reset the limits now that the two OSTs on the removed OSS node
are not part of the equation.

Due to the heavy usage of the Lustre filesystem, no clients have been
unmounted and no MDS or OST nodes have been restarted. The underlying
filesystem is ZFS.

Looking for ideas on how to correct this.

Example:

# lfs quota -gh somegroup -v /lustre1
Disk quotas for grp somegroup (gid ):
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
       /lustre1  21.59T*    27T     30T 6d23h39m15s 2250592 2621440 3145728       -
lustrefs-MDT0000_UUID
                  1.961G       -  1.962G       - 2250592       - 2359296       -
lustrefs-OST0000_UUID
                  2.876T       -  2.876T       -       -       -       -       -
lustrefs-OST0001_UUID
                  2.611T*      -  2.611T       -       -       -       -       -
lustrefs-OST0002_UUID
                  4.794T       -  4.794T       -       -       -       -       -
lustrefs-OST0003_UUID
                  4.587T       -  4.587T       -       -       -       -       -
quotactl ost4 failed.
quotactl ost5 failed.
lustrefs-OST0006_UUID
                   3.21T       -   3.21T       -       -       -       -       -
lustrefs-OST0007_UUID
                  3.515T       -  3.515T       -       -       -       -       -
Total allocated inode limit: 2359296, total allocated block limit: 21.59T
Some errors happened when getting quota info. Some devices may be not
working or deactivated. The data in "[]" is inaccurate.

--
Dan Szkola
FNAL
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Quota issue after OST removal

2022-10-26 Thread Daniel Szkola via lustre-discuss
Hello all,

We recently removed an OSS/OST node that was spontaneously shutting down so
hardware testing could be performed. I have no idea how long it will be out,
so I followed the procedure for permanent removal.
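
For context, the permanent-removal steps were roughly the documented sequence
(the two OSTs on that node are assumed here to be index 0004 and 0005, matching
the failed quotactl entries in the listing below):

# lfs find --ost lustrefs-OST0004_UUID /lustre1 | lfs_migrate -y
# lfs find --ost lustrefs-OST0005_UUID /lustre1 | lfs_migrate -y
# lctl conf_param lustrefs-OST0004.osc.active=0     (on the MGS)
# lctl conf_param lustrefs-OST0005.osc.active=0     (on the MGS)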

Since then space usage is being calculated correctly, but 'lfs quota' will
show groups as exceeding quota, despite being under both soft and hard
limits. A verbose listing shows that all OST limits are met and I have no
idea how to reset the limits now that the two OSTs on the removed OSS node
are not part of the equation.

Due to the heavy usage of the Lustre filesystem, no clients have been
unmounted and no MDS or OST nodes have been restarted. The underlying
filesystem is ZFS.

Looking for ideas on how to correct this.

Example:

# lfs quota -gh somegroup -v /lustre1
Disk quotas for grp somegroup (gid ):
     Filesystem    used   quota   limit   grace   files   quota   limit   grace
       /lustre1  21.59T*    27T     30T 6d23h39m15s 2250592 2621440 3145728       -
lustrefs-MDT0000_UUID
                  1.961G       -  1.962G       - 2250592       - 2359296       -
lustrefs-OST0000_UUID
                  2.876T       -  2.876T       -       -       -       -       -
lustrefs-OST0001_UUID
                  2.611T*      -  2.611T       -       -       -       -       -
lustrefs-OST0002_UUID
                  4.794T       -  4.794T       -       -       -       -       -
lustrefs-OST0003_UUID
                  4.587T       -  4.587T       -       -       -       -       -
quotactl ost4 failed.
quotactl ost5 failed.
lustrefs-OST0006_UUID
                   3.21T       -   3.21T       -       -       -       -       -
lustrefs-OST0007_UUID
                  3.515T       -  3.515T       -       -       -       -       -
Total allocated inode limit: 2359296, total allocated block limit: 21.59T
Some errors happened when getting quota info. Some devices may be not
working or deactivated. The data in "[]" is inaccurate.

--
Dan Szkola
FNAL
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org