[lustre-discuss] df shows wrong size of lustre file system (on all nodes).

2021-10-18 Thread Sid Young via lustre-discuss
I have some stability in my Lustre installation after many days of testing;
however, df -h now reports the /home filesystem incorrectly.

After mounting /home I get:
[root@n04 ~]# df -h
10.140.90.42@tcp:/lustre  286T   59T  228T  21% /lustre
10.140.90.42@tcp:/home    191T  153T   38T  81% /home

Doing it again straight after, I get:

[root@n04 ~]# df -h
10.140.90.42@tcp:/lustre  286T   59T  228T  21% /lustre
10.140.90.42@tcp:/home     48T   40T  7.8T  84% /home

The 4 OSTs report as active and present:

[root@n04 ~]# lfs df

UUID                   1K-blocks         Used    Available Use% Mounted on
home-MDT0000_UUID     4473805696     41784064   4432019584   1% /home[MDT:0]
home-OST0000_UUID    51097753600  40560842752  10536908800  80% /home[OST:0]
home-OST0001_UUID    51097896960  42786978816   8310916096  84% /home[OST:1]
home-OST0002_UUID    51097687040  38293322752  12804362240  75% /home[OST:2]
home-OST0003_UUID    51097765888  42293640192   8804123648  83% /home[OST:3]

filesystem_summary: 204391103488 163934784512  40456310784  81% /home

[root@n04 ~]#
[root@n04 ~]# lfs osts
OBDS:
0: lustre-OST0000_UUID ACTIVE
1: lustre-OST0001_UUID ACTIVE
2: lustre-OST0002_UUID ACTIVE
3: lustre-OST0003_UUID ACTIVE
4: lustre-OST0004_UUID ACTIVE
5: lustre-OST0005_UUID ACTIVE
OBDS:
0: home-OST0000_UUID ACTIVE
1: home-OST0001_UUID ACTIVE
2: home-OST0002_UUID ACTIVE
3: home-OST0003_UUID ACTIVE
[root@n04 ~]#

Has anyone seen this before? Reboots and remounts do not appear to change the
reported values. The ZFS pool reports as online and a scrub returns 0 errors.
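
As a rough cross-check (illustrative only), converting the lfs df 1K-block counts
above to TiB suggests the second df -h report corresponds to OST0001 alone, while
the first matches the filesystem_summary (df -h rounds up):

    # illustrative arithmetic: convert the 1K-block counts from lfs df to TiB
    awk 'BEGIN {
        print "filesystem_summary total:", 204391103488 / 2^30, "TiB"  # ~190.4 -> first df shows 191T
        print "OST0001 total:", 51097896960 / 2^30, "TiB"              # ~47.6  -> second df shows 48T
        print "OST0001 available:", 8310916096 / 2^30, "TiB"           # ~7.7   -> second df shows 7.8T
    }'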

Sid Young


Re: [lustre-discuss] NRS TBF by UID and congestion

2021-10-18 Thread Moreno Diego (ID SIS)
Salut Stephane!

Thanks a lot for this. This is the kind of helpful answer I was looking for when
I posted. All in all, it seems we will need to find the right value that works
for us. I also have the impression that changing the settings in the middle of
very high load might not be the best idea, since the queues are already filled:
we see the filesystem appear blocked for a few minutes after we enable it, but
afterwards it seems to work better. Have you also tried enabling it on the LDLM
services? I was advised in the past never to enable any kind of throttling on
LDLM, so that locks are cancelled as fast as possible; otherwise we would see
high CPU and memory usage on the MDS side.
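
A minimal way to check what is currently configured on the LDLM services
(assuming the standard ldlm_canceld and ldlm_cbd service names) would be
something like:

    # show which NRS policies are active on the LDLM services
    # (left at the default fifo here, per the advice above)
    lctl get_param ldlm.services.ldlm_canceld.nrs_policies
    lctl get_param ldlm.services.ldlm_cbd.nrs_policies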

I agree that it would be very useful to know which users have long waiting
queues; this could eventually help us create dynamic and more complex throttling
rules.

Regards,

Diego
 

On 15.10.21, 09:13, "Stephane Thiell"  wrote:

Salut Diego!

Yes, we have been using NRS TBF by UID on our Oak storage system for months now
with Lustre 2.12. It's a capacity-oriented, global filesystem, not designed for
heavy workloads (unlike our scratch filesystem), but it has many users and is, as
such, a great candidate for NRS TBF UID. Since enabling NRS, we have seen WAY
fewer occurrences of single users abusing the system (which is always by mistake,
so we're helping them too!). We use NRS TBF UID for all Lustre services on MDS
and OSS.

We have an "exemption" rule for "root {0}" at 1, and a default rule "default {*}"
at a certain value. This value is per user and per CPT (it is also per Lustre
service on the MDS, e.g. mdt_readpage is a separate service). If you have large
servers with many CPTs and set the value to 500, that's 500 req/s per CPT per
user, so perhaps it is still too high to be useful. The ideal value also probably
depends on your default striping or other specifics.
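
A rough sketch of what such a setup can look like on an OSS, using the TBF
syntax from the Lustre manual (rule names and rate values below are illustrative
only):

    # enable the UID-based TBF policy on the ost_io service
    lctl set_param ost.OSS.ost_io.nrs_policies="tbf uid"
    # "exemption" rule for root (uid 0) with a high rate
    lctl set_param ost.OSS.ost_io.nrs_tbf_rule="start root_exempt uid={0} rate=10000"
    # lower the built-in default rule that applies to everyone else
    lctl set_param ost.OSS.ost_io.nrs_tbf_rule="change default rate=500"
    # the same kind of rules are set per service on the MDS,
    # e.g. under mds.MDS.mdt and mds.MDS.mdt_readpage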

To set the NRS rate values right for the system, our approach is to monitor the
active/queued values taken from the 'tbf uid' policy on each OSS with lctl
get_param ost.OSS.ost_io.nrs_tbf_rule (same thing on MDS for each mdt service).
We record these instant gauge-like values every minute, which seems to be enough
to see trends. The 'queued' number is the most useful to me as I can easily see
the impact of the rule by looking at the graph. Graphing these metrics over time
allows us to adjust the rates so that queueing is not the norm, but the
exception, while limiting heavy workloads.
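
A minimal polling sketch of that approach (the loop, log path, and grep pattern
are illustrative; it assumes the active/queued gauges show up in the
nrs_tbf_rule output as described above):

    # sample the TBF rule state once a minute into a timestamped log
    while sleep 60; do
        echo "== $(date -Is) =="
        lctl get_param ost.OSS.ost_io.nrs_tbf_rule 2>/dev/null | grep -Ei 'queued|active|rate'
    done >> /var/log/nrs_tbf_gauges.log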

So it's working for us on this system; the only thing now is that we would love
to have a way to get additional NRS stats from Lustre, for example the UIDs that
have reached the rate limit over a given period.

Lastly, we tried to implement it on our scratch filesystem, but it's more
difficult. If a user has heavy-duty jobs running on compute nodes and hits the
rate limit, the user basically cannot transfer anything from a DTN or a login
node (and will complain). I've opened LU-14567 to discuss wildcard support for
"uid" in NRS TBF policy ('tbf' and not 'tbf uid') rules, so that we could mix
other, non-UID TBF rules with UID TBF rules. I don't know how hard it is to
implement.

Hope that helps,

Stephane


> On Oct 14, 2021, at 12:33 PM, Moreno Diego (ID SIS)  wrote:
> 
> Hi Lustre friends,
> 
> I'm wondering if someone has experience setting NRS TBF (by UID) on the OSTs
> (ost_io and ost service) in order to avoid congestion of the filesystem IOPS
> or bandwidth. All my attempts during the last months have miserably failed to
> produce something that looks like QoS when the system is under high load: once
> the system is under high load, not even the TBF UID policy saves us from slow
> response times for any user. So far, I have only tried setting it by UID so
> that every user has their fair share of bandwidth. I tried different rate
> values for the default rule (5'000, 1'000 or 500). We have Lustre 2.12 in our
> cluster.
> 
> Maybe there's some other setting that needs throttling (I see a parameter
> /sys/module/ptlrpc/parameters/tbf_rate, set to 10'000, that I could not find
> documented); is there anything I'm missing about this feature?
> 
> Regards,
> 
> Diego