Re: [lustre-discuss] Disk quota exceeded while quota is not filled

2020-08-25 Thread David Cohen
Hi,
Still hoping for a reply...

It seems to me that older groups are more affected by the issue than groups
that were created after the major disk migration.
Quota enforcement appears to rely on a counter separate from the usage
accounting, because the accounting reports the same numbers as du.
If quota is calculated separately from accounting, it is possible that the
enforcement side is broken and still holds values from the removed disks,
while the accounting is correct.
Following that suspicion, I tried to force the file system to recalculate
quota. First I disabled and re-enabled enforcement on the OSTs:
lctl conf_param technion.quota.ost=none
and back to:
lctl conf_param technion.quota.ost=ugp

I also turned the quota feature off on the MDS and on all the OSTs:
tune2fs -O ^quota
and back on again:
tune2fs -O quota
and after each attempt I also ran:
lctl lfsck_start -A -t all -o -e continue

But the problem persists: groups whose usage is below their quota limit are
still blocked with "quota exceeded".
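
A possible further check, not one of the attempts listed above: the Lustre
manual describes "lctl get_param osd-*.*.quota_slave.info", which reports for
each target whether it is connected to the quota master and whether its copies
of the quota indexes are up to date. The sketch below filters that output for
targets that look out of sync. It assumes the "target name:" / "conn to
master:" / "user|group uptodate:" layout shown in the manual, which may differ
between Lustre versions, and the function name check_quota_slaves is made up
for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads quota_slave.info text on stdin and prints the
# targets that do not look in sync with the quota master.
# Assumed input layout (as in the Lustre manual; may vary by version):
#   target name:    technion-OST0015
#   conn to master: setup
#   group uptodate: glb[1],slv[1],reint[0]
check_quota_slaves() {
    awk '
        /^target name:/ { target = $3 }
        /^conn to master:/ {
            state = $0
            sub(/^conn to master:[ \t]*/, "", state)
            if (state != "setup")
                print target ": not connected to quota master (" state ")"
        }
        /^(user|group) uptodate:/ {
            # Assumption: glb[0] or slv[0] means an index copy is stale and
            # reint[1] means reintegration is still running.
            if ($3 ~ /glb\[0\]|slv\[0\]|reint\[1\]/)
                print target ": " $1 " quota index not up to date (" $3 ")"
        }
    '
}

# Usage on the MDS or an OSS, as root:
#   lctl get_param osd-*.*.quota_slave.info | check_quota_slaves
```

No output would mean every target reports "setup" and up-to-date indexes, in
which case the stale values would have to live somewhere else.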

Best,
David



Re: [lustre-discuss] Disk quota exceeded while quota is not filled

2020-08-15 Thread David Cohen
Hi,
Adding some more information.
A few months ago the data on the Lustre file system was migrated to new
physical storage.
After the successful migration the old OSTs were deactivated with active=0
(lctl conf_param technion-OST0001.osc.active=0)

Since then all the clients have been unmounted and remounted.
tunefs.lustre --writeconf was executed on the MGS/MDT and on all the OSTs.
lctl dl no longer shows the old OSTs, but they still appear when querying
the quota.
Since new users are less affected by the "quota exceeded" problem
(blocked from writing although their quota is not filled),
I suspect that the quota calculation is still summing values from the old
OSTs:

lfs quota -g -v md_kaplan /storage/
Disk quotas for grp md_kaplan (gid 10028):
 Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
  /storage/ 4823987000   0 5368709120   -  143596   0   0   -
technion-MDT_UUID
  37028   -   0   -  143596   -   0   -
quotactl ost0 failed.
quotactl ost1 failed.
quotactl ost2 failed.
quotactl ost3 failed.
quotactl ost4 failed.
quotactl ost5 failed.
quotactl ost6 failed.
quotactl ost7 failed.
quotactl ost8 failed.
quotactl ost9 failed.
quotactl ost10 failed.
quotactl ost11 failed.
quotactl ost12 failed.
quotactl ost13 failed.
quotactl ost14 failed.
quotactl ost15 failed.
quotactl ost16 failed.
quotactl ost17 failed.
quotactl ost18 failed.
quotactl ost19 failed.
quotactl ost20 failed.
technion-OST0015_UUID
  114429464*  - 114429464   -   -   -   -   -
technion-OST0016_UUID
  92938588   - 92938592   -   -   -   -   -
technion-OST0017_UUID
  128496468*  - 128496468   -   -   -   -   -
technion-OST0018_UUID
  191478704*  - 191478704   -   -   -   -   -
technion-OST0019_UUID
  107720552   - 107720560   -   -   -   -   -
technion-OST001a_UUID
  165631952*  - 165631952   -   -   -   -   -
technion-OST001b_UUID
  460714156*  - 460714156   -   -   -   -   -
technion-OST001c_UUID
  157182900*  - 157182900   -   -   -   -   -
technion-OST001d_UUID
  102945952*  - 102945952   -   -   -   -   -
technion-OST001e_UUID
  175840980*  - 175840980   -   -   -   -   -
technion-OST001f_UUID
  142666872*  - 142666872   -   -   -   -   -
technion-OST0020_UUID
  188147548*  - 188147548   -   -   -   -   -
technion-OST0021_UUID
  125914240*  - 125914240   -   -   -   -   -
technion-OST0022_UUID
  186390800*  - 186390800   -   -   -   -   -
technion-OST0023_UUID
  115386876   - 115386884   -   -   -   -   -
technion-OST0024_UUID
  127139556*  - 127139556   -   -   -   -   -
technion-OST0025_UUID
  179666580*  - 179666580   -   -   -   -   -
technion-OST0026_UUID
  147837348   - 147837356   -   -   -   -   -
technion-OST0027_UUID
  129823528   - 129823536   -   -   -   -   -
technion-OST0028_UUID
  158270776   - 158270784   -   -   -   -   -
technion-OST0029_UUID
  168762120   - 168763104   -   -   -   -   -
technion-OST002a_UUID
  164235684   - 164235688   -   -   -   -   -
technion-OST002b_UUID
  147512200   - 147512204   -   -   -   -   -
technion-OST002c_UUID
  158046652   - 158046668   -   -   -   -   -
technion-OST002d_UUID
  199314048*  - 199314048   -   -   -   -   -
technion-OST002e_UUID
  209187196*  - 209187196   -   -   -   -   -
technion-OST002f_UUID
  162586732   - 162586764   -   -   -   -   -
technion-OST0030_UUID
  131248812*  - 131248812   -   -   -   -   -
technion-OST0031_UUID
  134665176*  - 134665176   -   -   -   -   -
technion-OST0032_UUID
  149767512*  - 149767512   -   -   -   -   -
Total allocated inode limit: 0, total allocated block limit: 4823951056
Some errors happened when getting quota info. Some devices may be not
working or deactivated. The data in "[]" is inaccurate.
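
The verbose listing above is long, so a small filter can condense it. The
sketch below reads saved "lfs quota -v" output on stdin. It maps each failing
"quotactl ostN" index to the target name it would have (the index is decimal,
the name is hexadecimal, so ost20 is OST0014 and the first answering target,
OST0015, is index 21). It also counts the OSTs flagged with "*" and sums the
per-OST usage and limit columns. The function name summarize_lfs_quota and the
file name quota.txt are made up for illustration, and reading the per-OST
"limit" column as the space granted to that OST is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: condenses "lfs quota -v" output read from stdin.
# $1 is the file system name used to build target names (e.g. technion).
summarize_lfs_quota() {
    awk -v fs="$1" '
        # "quotactl ost20 failed." -> decimal index 20 -> <fs>-OST0014
        /^quotactl ost[0-9]+ failed/ {
            idx = $2
            sub(/^ost/, "", idx)
            printf "no reply: ost%d = %s-OST%04x\n", idx, fs, idx
            failed++
            next
        }
        # The values for an OST are printed on the line after its UUID.
        /-OST[0-9a-f]+_UUID/ { want = 1; next }
        want {
            want = 0
            used = $1
            # A trailing "*" marks usage at or over the limit on that OST.
            if (sub(/\*$/, "", used)) starred++
            used_kb += used
            granted_kb += $3
            osts++
        }
        END {
            printf "OSTs answering: %d, not answering: %d\n", osts, failed
            printf "OSTs at or over their granted limit (*): %d\n", starred
            printf "sum of per-OST usage: %.0f KB, sum of granted limits: %.0f KB\n", used_kb, granted_kb
        }
    '
}

# Usage (quota.txt is a placeholder for the saved command output):
#   lfs quota -g -v md_kaplan /storage/ > quota.txt 2>&1
#   summarize_lfs_quota technion < quota.txt
```

Comparing the summed granted limits with the group's hard limit should show
how much space the quota master believes is still free to hand out.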


lfs quota -g -h md_kaplan /storage/
Disk quotas for grp md_kaplan (gid 10028):
 Filesystem    used   quota   limit   grace   files   quota   limit   grace