Salut Stephane!

Thanks a lot for this, it's exactly the kind of helpful answer I was looking for when I posted. All in all, it seems we will need to find the right value that works for us. I have the impression that changing the settings in the middle of a very high load might not be the best idea, since the queues are already filled: we see the filesystem appear blocked for some minutes after we enable it, but afterwards it seems to work better. Have you also tried enabling it on the LDLM services? I was advised in the past never to enable any kind of throttling on LDLM, so that locks are cancelled as fast as possible; otherwise we would see high CPU and memory usage on the MDS side.

I agree that it would be very useful to know which users have long waiting queues; that could eventually help us create dynamic and more complex throttling rules.

Regards,

Diego
 

On 15.10.21, 09:13, "Stephane Thiell" <sthi...@stanford.edu> wrote:

    Salut Diego!

    Yes, we have been using NRS TBF by UID on our Oak storage system for months now with Lustre 2.12. It's a capacity-oriented, global filesystem, not designed for heavy workloads (unlike our scratch filesystem), but it has many users and is therefore a great candidate for NRS TBF UID. Since enabling NRS, we have seen WAY fewer occurrences of a single user abusing the system (which is always by mistake, so we're helping them too!). We use NRS TBF UID for all Lustre services on the MDS and OSS.
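
    In case it helps, enabling the policy looks roughly like the lines below. This is a sketch rather than a copy of our exact config: it assumes the ost/ost_io services on the OSS and the mdt/mdt_readpage services on the MDS, so adjust the list to whatever services you actually want to cover.

        # OSS: switch the regular and bulk I/O services to TBF by UID
        lctl set_param ost.OSS.ost.nrs_policies="tbf uid"
        lctl set_param ost.OSS.ost_io.nrs_policies="tbf uid"

        # MDS: each metadata service has its own nrs_policies setting
        lctl set_param mds.MDS.mdt.nrs_policies="tbf uid"
        lctl set_param mds.MDS.mdt_readpage.nrs_policies="tbf uid"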

    We have an "exemption" rule "root {0}" at a rate of 10000, and a default rule "default {*}" at a certain value. That value applies per user and per CPT (and it is also per Lustre service on the MDS, e.g. mdt_readpage is a separate service). If you have large servers with many CPTs and set the value to 500, that's 500 req/s per CPT per user, which may still be too high to be useful. The ideal value also probably depends on your default striping and other specifics.
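
    As a rough sketch, those rules translate to something like the following. The rule name and the 500 req/s value are just examples, and the built-in "default" rule is adjusted with "change" here; if your version behaves differently, an explicit catch-all rule ("start everyone uid={*} rate=500") should do the same job.

        # exempt root (uid 0) with a very high rate
        lctl set_param ost.OSS.ost_io.nrs_tbf_rule="start root_exempt uid={0} rate=10000"

        # cap every other user; remember this is per UID and per CPT
        lctl set_param ost.OSS.ost_io.nrs_tbf_rule="change default rate=500"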

    To set the NRS rate values right for the system, our approach is to monitor the active/queued values reported for the 'tbf uid' policy on each OSS with lctl get_param ost.OSS.ost_io.nrs_tbf_rule (and the same on the MDS for each mdt service). We record these instantaneous, gauge-like values every minute, which seems to be enough to see trends. The 'queued' number is the most useful to me, as I can easily see the impact of a rule by looking at the graph. Graphing these metrics over time lets us adjust the rates so that queueing is the exception rather than the norm, while still limiting heavy workloads.
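
    The collector can be as simple as the loop below (the log path and one-minute interval are arbitrary choices; we pull the per-rule 'queued'/'active' counters out of the snapshots in our metrics pipeline rather than parsing them in the shell).

        # append a timestamped snapshot of the TBF rule state every minute
        while true; do
            {
                date '+%Y-%m-%dT%H:%M:%S'
                lctl get_param ost.OSS.ost_io.nrs_tbf_rule
            } >> /var/log/nrs_tbf_ost_io.log
            sleep 60
        done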

    So it's working for us on this system. The one thing we would love to have now is a way to get additional NRS stats out of Lustre, for example the UIDs that have hit the rate limit over a given period.

    Lastly, we tried to implement it on our scratch filesystem, but that is more difficult. If a user has heavy-duty jobs running on compute nodes and hits the rate limit, that user basically cannot transfer anything from a DTN or a login node (and will complain). I've opened LU-14567 to discuss wildcard support for "uid" in NRS TBF policy rules ('tbf' as opposed to 'tbf uid') so that we could mix non-UID TBF rules with UID TBF rules. I don't know how hard that would be to implement.

    Hope that helps,

    Stephane


    > On Oct 14, 2021, at 12:33 PM, Moreno Diego (ID SIS) 
<diego.mor...@id.ethz.ch> wrote:
    > 
    > Hi Lustre friends,
    > 
    > I'm wondering if someone has experience setting NRS TBF (by UID) on the OSTs (ost_io and ost services) in order to avoid congestion of the filesystem's IOPS or bandwidth. All my attempts during the last few months have failed miserably, ending up with something that doesn't look like QoS when the system is under high load: once the system is loaded, not even the TBF UID policy saves us from slow response times for every user. So far I have only set it by UID, so that every user gets a fair share of bandwidth, and I have tried different rate values for the default rule (5'000, 1'000 and 500). We run Lustre 2.12 in our cluster.
    > 
    > Maybe there's some other setting that needs throttling (I see a parameter /sys/module/ptlrpc/parameters/tbf_rate, set to 10'000, that I could not find documented). Is there anything I'm missing about this feature?
    > 
    > Regards,
    > 
    > Diego
    > 
    > 

