Salut Stephane!
Thanks a lot for this. This is exactly the kind of helpful answer I was hoping
for when I posted. All in all, it seems we will need to find the right value
that works for us. I also have the impression that changing the settings in the
middle of a very high load might not be the best idea, since the queues are
already filled. We see the filesystem more or less blocked for a few minutes
after we enable it, but afterwards it seems to work better. Have you also tried
to enable it on LDLM services? I was advised in the past never to enable any
kind of throttling on LDLM, so that locks are cancelled as fast as possible;
otherwise we would have high CPU and memory usage on the MDS side.
I agree that it would be very useful to know which users have long waiting
queues; this could help to create dynamic and more complex rules for
throttling.
Regards,
Diego
On 15.10.21, 09:13, "Stephane Thiell" wrote:
Salut Diego!
Yes, we have been using NRS TBF by UID on our Oak storage system for months
now with Lustre 2.12. It’s a capacity-oriented, global filesystem, not designed
for heavy workloads (unlike our scratch filesystem) but with many users and as
such, a great candidate for NRS TBF UID. Since NRS, we have seen WAY fewer
occurrences of single users abusing the system (which is always by mistake so
we’re helping them too!). We use NRS TBF UID for all Lustre services on MDS and
OSS.
We have an "exemption" rule for "root {0}" at 1, and a default rule
"default {*}" at a certain value. This value is per user and per CPT (it’s also
a value per Lustre service on the MDS; e.g. mdt_readpage is a
separate service). If you have large servers with many CPTs and set the value
to 500, that’s 500 req/s per CPT per user, so perhaps it is still too high to
be useful. The ideal value also probably depends on your default striping or
other specifics.
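
To make the per-CPT arithmetic concrete, here is a minimal sketch in Python.
The CPT count of 8 is purely hypothetical (check your own servers), and the
function name is made up for illustration. The only thing taken from the
discussion above is that the rate applies per user, per CPT and per service.

```python
def per_user_ceiling(rate_per_cpt: int, num_cpts: int) -> int:
    """Upper bound on req/s that one user can reach on one server,
    for a single Lustre service, when the TBF rate is enforced per CPT."""
    if rate_per_cpt < 0 or num_cpts < 1:
        raise ValueError("rate must be >= 0 and num_cpts must be >= 1")
    return rate_per_cpt * num_cpts


if __name__ == "__main__":
    HYPOTHETICAL_CPTS = 8  # assumption, not a value from this thread
    # The three default-rule rates mentioned in the original question.
    for rate in (5000, 1000, 500):
        ceiling = per_user_ceiling(rate, HYPOTHETICAL_CPTS)
        print(f"rate={rate:>5} per CPT -> up to {ceiling} req/s per user")
```

The point is only that the effective per-user ceiling on a server grows with
the number of CPTs, so a rate that looks low can still be too permissive.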
To set the NRS rate values right for the system, our approach is to monitor
the active/queued values taken from the 'tbf uid' policy on each OSS with lctl
get_param ost.OSS.ost_io.nrs_tbf_rule (same thing on MDS for each mdt service).
We record these instant gauge-like values every minute, which seems to be
enough to see trends. The 'queued' number is the most useful to me as I can
easily see the impact of the rule by looking at the graph. Graphing these
metrics over time allows us to adjust the rates so that queueing is not the
norm, but the exception, while limiting heavy workloads.
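
As a rough sketch of the once-a-minute sampling described above, the snippet
below sums the queued/active gauges for one policy from text captured from
lctl get_param. The sample text and its YAML-like name:/queued:/active: layout
are assumptions for illustration, not real output from our servers, so check
what your Lustre version prints before relying on it. The function name is
hypothetical.

```python
import re

# Assumed sample text, for illustration only (not captured from a real server).
SAMPLE = """\
regular_requests:
  - name: fifo
    state: started
    fallback: yes
    queued: 0
    active: 0

  - name: tbf
    state: started
    fallback: no
    queued: 37
    active: 4

high_priority_requests:
  - name: fifo
    state: started
    fallback: yes
    queued: 0
    active: 0

  - name: tbf
    state: started
    fallback: no
    queued: 2
    active: 1
"""

NAME_RE = re.compile(r"\s*-\s*name:\s*(\S+)")
GAUGE_RE = re.compile(r"\s*(queued|active):\s*(\d+)")


def policy_gauges(text: str, policy: str = "tbf") -> dict:
    """Sum 'queued' and 'active' over every block whose name matches policy."""
    totals = {"queued": 0, "active": 0}
    current = None  # name of the policy block being read
    for line in text.splitlines():
        name = NAME_RE.match(line)
        if name:
            current = name.group(1)
            continue
        gauge = GAUGE_RE.match(line)
        if gauge and current == policy:
            totals[gauge.group(1)] += int(gauge.group(2))
    return totals


if __name__ == "__main__":
    print(policy_gauges(SAMPLE))
```

In practice the text would come from running lctl on each server once a
minute, and the two totals would be sent to whatever graphing system you use.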
So it’s working for us on this system, the only thing now is that we would
love to have a way to get additional NRS stats from Lustre, for example, the
UIDs that have reached the rate limit over a period.
Lastly, we tried to implement it on our scratch filesystem, but it’s more
difficult. If a user has heavy-duty jobs running on compute nodes and hits the
rate limit, the user basically cannot transfer anything from a DTN or a login
node (and will complain). I’ve opened LU-14567 to discuss wildcard support for
"uid" in NRS TBF policy ('tbf' and not 'tbf uid') rules so that we could mix
other, non-UID TBF rules with UID TBF rules. I don’t know how hard it is to
implement.
Hope that helps,
Stephane
> On Oct 14, 2021, at 12:33 PM, Moreno Diego (ID SIS) wrote:
>
> Hi Lustre friends,
>
> I'm wondering if someone has experience setting NRS TBF (by UID) on the
> OSTs (ost_io and ost service) in order to avoid congestion of the filesystem
> IOPS or bandwidth. All my attempts during the last months have failed
> miserably, producing something that doesn’t look like QoS when the system has
> a high load. Once the system is under high load, not even the TBF UID policy
> is saving us from slow response times for any user. So far, I have only tried
> setting it by UID so every user has their fair share of bandwidth. I tried
> different rate values for the default rule (5'000, 1'000 or 500). We have
> Lustre 2.12 in our cluster.
>
> Maybe there's some other setting that needs throttling (I see a parameter,
> /sys/module/ptlrpc/parameters/tbf_rate, set to 10'000, that I could not find
> documented). Is there anything I'm missing about this feature?
>
> Regards,
>
> Diego
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org