Never mind: when I enable it on a busier directory, I do see new ephemeral pins popping up, just not on the directories I set it on originally. Let's see how that holds up.

On 07/12/2020 13:04, Janek Bevendorff wrote:
Thanks. I tried playing around a bit with mds_export_ephemeral_distributed just now, because it does pretty much the same thing your script does manually. Unfortunately, it seems to have no effect.

I pinned all top-level directories to mds.0 and then enabled ceph.dir.pin.distributed for a few subtrees. Despite mds_export_ephemeral_distributed being set to true, all work is still done by mds.0, and I also don't see any additional pins in ceph tell mds.\* get subtrees.
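
For reference, this is roughly what I did (the paths below are just placeholders for our actual directories):

    # pin a top-level directory explicitly to rank 0
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projects
    # mark one of its subtrees for distributed ephemeral pinning, i.e.
    # hash its immediate children across the active ranks
    setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/projects/scratch
    # the policy is only honoured while this MDS option is enabled
    ceph config set mds mds_export_ephemeral_distributed true
    # afterwards, look for additional subtree assignments per rank
    ceph tell mds.\* get subtrees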

Any ideas why that might be?


On 07/12/2020 10:49, Dan van der Ster wrote:
On Mon, Dec 7, 2020 at 10:39 AM Janek Bevendorff
<janek.bevendo...@uni-weimar.de> wrote:

What exactly do you set to 64k?
We used to set mds_max_caps_per_client to 50000, but once we started
using the tuned caps recall config, we reverted that back to the
default 1M without issue.
mds_max_caps_per_client. As I mentioned, some clients hit this limit
regularly and they aren't entirely idle. I will keep tuning the recall
settings, though.
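
For context, these are roughly the knobs I mean (the values are just what I'm trying at the moment, not a recommendation):

    # cap the number of caps a single client may hold (the 64k mentioned above)
    ceph config set mds mds_max_caps_per_client 65536
    # recall caps from clients more aggressively
    ceph config set mds mds_recall_max_caps 10000
    ceph config set mds mds_recall_max_decay_rate 1.0
    # per-client cap counts show up in the session listing (num_caps)
    ceph tell mds.0 session ls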

This 15k-caps client I mentioned is not related to the max-caps-per-client config. In recent Nautilus releases, the MDS proactively recalls caps from idle clients -- so even a client with just a few caps like this one can provoke the caps recall warnings (if it is buggy, like in this case). The client doesn't cause any real problems, just the annoying warnings.
We hardly ever see the warnings during normal operation. I remember having massive issues with early Nautilus releases, when it was virtually impossible to keep the MDS within the bounds of its memory limit, but thanks to the more aggressive recall behaviour in newer releases, that is fixed. Nowadays, the warnings only appear when the MDS is really stressed. In that situation, the performance of the whole FS is already massively degraded, and the MDSs are likely to fail and end up in the rejoin loop.

Multi-active + pinning definitely increases the overall metadata throughput (once you can get the relevant inodes cached), because, as you know, the MDS is single-threaded and CPU-bound at the limit. We could get something like 4-5k handle_client_requests out of a single MDS, and that really does scale horizontally as you add MDSs (and pin).
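
If you want to watch that while you test, something along these lines gives a rough per-rank request rate (the exact output differs a bit between releases):

    # per-rank activity, including a requests-per-second figure
    ceph fs status
    # or watch the live perf counters of one daemon on its host
    ceph daemonperf mds.<name>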
Okay, I will definitely re-evaluate options for pinning individual directories; perhaps a small script can do it.
There is a new ephemeral pinning option in the latest releases, but we haven't tried it yet.
Here's our script -- it assumes that the parent dir is pinned to rank zero or that the balancer is disabled:

https://github.com/cernceph/ceph-scripts/blob/master/tools/cephfs/cephfs-bal-shard

Too many pins can cause problems -- that said, we have something like 700 pins at the moment and it's fine.
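
If you'd rather keep it simple, a minimal round-robin variant (not exactly what our script does; the path and rank count are placeholders) could look like this:

    #!/bin/bash
    # Spread the immediate children of PARENT across MDS ranks by
    # setting an explicit export pin on each child directory.
    PARENT=/mnt/cephfs/projects   # placeholder
    RANKS=4                       # number of active MDS ranks
    i=0
    for d in "$PARENT"/*/; do
        setfattr -n ceph.dir.pin -v $((i % RANKS)) "$d"
        i=$((i + 1))
    done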

Cheers, Dan



_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io