Re: [lustre-discuss] [EXTERNAL] Re: Disk quota exceeded while quota is not filled

2020-08-26 Thread Chad DeWitt
Sure, David.

Unfortunately, I do not have experience with project quota, but I would not
expect it to cause any ill effects when no project quotas are defined.

From the manual, this appears to be what you are encountering (25.5. Quota
Allocation):

*Note*
*It is very important to note that the block quota is consumed per OST and
the inode quota per MDS. Therefore, when the quota is consumed on one OST
(resp. MDT), the client may not be able to create files regardless of the
quota available on other OSTs (resp. MDTs).*


(The group, md_kaplan, has hit its quota on 19 OSTs.) I'm not sure there is
a way to "free up" the roughly 500GB still allowed to md_kaplan. Maybe unset
and then reset the group's quota? Maybe restripe any large files owned by
md_kaplan so the data is spread amongst the OSTs? A rough sketch of both
ideas follows.
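
Something along these lines might work; untested, the file paths are just
placeholders, the 5368709120 figure is the hard limit from your lfs quota
output below, and lfs setquota limits are in kbytes when given without a
suffix:

# clear the group's limits entirely, then set the hard block limit back:
lfs setquota -g md_kaplan -b 0 -B 0 -i 0 -I 0 /storage
lfs setquota -g md_kaplan -b 0 -B 5368709120 /storage

# wide-stripe new files created in a directory (-1 = all OSTs):
lfs setstripe -c -1 /storage/some/dir
# restripe an existing large file in place (needs a release with
# "lfs migrate"; older setups have the lfs_migrate script instead):
lfs migrate -c -1 /storage/some/large_file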

Cheers,
Chad

On Wed, Aug 26, 2020 at 11:41 AM David Cohen 
wrote:

> Thank you, Chad, for answering.
> We are using the patched kernel on the MDT/OSS.
> The problem is in the group space quota.
> In any case, I enabled project quota just for future use.
> There are no projects defined; do you think it could still pose a problem?
>
> Best,
> David
>
>
>
>
> On Wed, Aug 26, 2020 at 3:18 PM Chad DeWitt  wrote:
>
>> Hi David,
>>
>> Hope you're doing well.
>>
>> This is a total shot in the dark, but depending on the kernel version you
>> are running, you may need a patched kernel to use project quotas. I'm not
>> sure what the symptoms would be, but it may be worth turning off project
>> quotas and seeing if doing so resolves your issue:
>>
>> lctl conf_param technion.quota.mdt=none
>> lctl conf_param technion.quota.mdt=ug
>> lctl conf_param technion.quota.ost=none
>> lctl conf_param technion.quota.ost=ug
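>>
>> (If you want to sanity-check what each target thinks is enabled after
>> changing these, I believe the servers expose it, along the lines of:
>>
>> lctl get_param osd-*.*.quota_slave.info
>>
>> run on the MDS/OSS nodes; the "quota enabled:" line should change
>> accordingly. I'm going from memory of the manual here.)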
>>
>> (Looks like you have been running project quota on your MDT for a while
>> without issue, so this may be a dead end.)
>>
>> Here's more info concerning when a patched kernel is necessary for
>> project quotas (25.2. Enabling Disk Quotas):
>>
>> http://doc.lustre.org/lustre_manual.xhtml
>>
>>
>> Cheers,
>> Chad
>>
>> 
>>
>> Chad DeWitt, CISSP | University Research Computing
>>
>> UNC Charlotte | Office of OneIT
>>
>> ccdew...@uncc.edu
>>
>> 
>>
>>
>>
>> On Tue, Aug 25, 2020 at 3:04 AM David Cohen <
>> cda...@physics.technion.ac.il> wrote:
>>
>>>
>>> Hi,
>>> Still hoping for a reply...
>>>
>>> It seems to me that old groups are more affected by the issue than new
>>> ones created after a major disk migration.
>>> Quota enforcement seems to be based on a counter other than the
>>> accounting, since the accounting produces the same numbers as du (see
>>> the rough cross-check below).
>>> If quota is tracked separately from accounting, it may be that the
>>> quota records are broken and still hold values from the removed disks,
>>> while the accounting is correct.
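>>>
>>> (For the record, a rough way to cross-check the group accounting
>>> against du; slow on a large filesystem, and the find/du pipeline is
>>> only a sketch:
>>>
>>> lfs quota -g md_kaplan /storage/
>>> find /storage -group md_kaplan -print0 | du -sc --files0-from=- | tail -n 1
>>>
>>> Both report 1K blocks by default, so the two totals should be
>>> comparable.)
>>>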
>>> Following that suspicion, I tried to force the FS to recalculate quota.
>>> First I toggled quota enforcement off:
>>> lctl conf_param technion.quota.ost=none
>>> and back to:
>>> lctl conf_param technion.quota.ost=ugp
>>>
>>> I also tried, on the MDS and all OSTs, turning the quota feature off:
>>> tune2fs -O ^quota
>>> and back on again:
>>> tune2fs -O quota
>>> and after each attempt, also:
>>> lctl lfsck_start -A -t all -o -e continue
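>>>
>>> (To confirm each lfsck run actually completed, I believe the status
>>> can be read on the MDS with something like the following; the
>>> parameter names are from memory and the MDT index is assumed:
>>>
>>> lctl get_param -n mdd.technion-MDT0000.lfsck_layout
>>> lctl get_param -n mdd.technion-MDT0000.lfsck_namespace
>>>
>>> checking for "status: completed".)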
>>>
>>> But the problem persists, and groups still under their quota limit get
>>> blocked with "quota exceeded".
>>>
>>> Best,
>>> David
>>>
>>>
>>> On Sun, Aug 16, 2020 at 8:41 AM David Cohen <
>>> cda...@physics.technion.ac.il> wrote:
>>>
 Hi,
 Adding some more information.
 A few months ago the data on the Lustre fs was migrated to new physical
 storage.
 After the migration completed successfully, the old OSTs were marked as
 inactive:
 (lctl conf_param technion-OST0001.osc.active=0)

 Since then, all the clients have been unmounted and remounted, and
 tunefs.lustre --writeconf was executed on the MGS/MDT and all the OSTs.
 lctl dl no longer shows the old OSTs, but they still appear when querying
 quota.
 Since new users seem less affected by the "quota exceeded" problem
 (blocked from writing while their quota is not filled),
 I suspect that the quota calculation is still summing values from the old
 OSTs:

 lfs quota -g -v md_kaplan /storage/
 Disk quotas for grp md_kaplan (gid 10028):
      Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
       /storage/ 4823987000      0 5368709120      -  143596      0      0      -
 technion-MDT_UUID
                   37028      -      0      -  143596      -      0      -
 quotactl ost0 failed.
 quotactl ost1 failed.
 quotactl ost2 failed.
 quotactl ost3 failed.
 quotactl ost4 failed.
 quotactl ost5 failed.
 quotactl ost6 failed.
 quotactl ost7 failed.
 quotactl ost8 failed.
 quotactl ost9 failed.
 quotactl ost10 failed.
 quotactl ost11 failed.
 quotactl ost12 failed.
 quotactl ost13 failed.
 quotactl ost14 failed.
 quotactl ost15 failed.
 quotactl ost16 failed.
 quotactl ost17 failed.
 quotactl ost18 failed.
 quotactl ost19 failed.
 quotactl ost20 failed.
 technion-OST0015_UUID
               114429464*      - 114429464      -      -      -      -      -
 technion-OST0016_UUID
                92938588      - 92938592      -      -      -      -      -
 technion-OST0017_UUID
               128496468*      - 128496468      -      -      -      -      -
 technion-OST0018_UUID
               191478704*      - 191478704      -      -      -      -      -
 technion-OST0019_UUID
               107720552      - 107720560      -      -      -      -      -
 technion-OST001a_UUID
               165631952*      - 165631952      -      -      -      -      -
 technion-OST001b_UUID
               460714156*      - 460714156      -      -      -      -      -
 technion-OST001c_UUID
               157182900*      - 157182900      -      -      -      -      -
 technion-OST001d_UUID
               102945952*      - 102945952      -      -      -      -      -
 technion-OST001e_UUID
               175840980*      - 175840980      -      -      -      -      -
 technion-OST001f_UUID
               142666872*      - 142666872      -      -      -      -      -
 technion-OST0020_UUID
