Re: [ceph-users] MDS Stability with lots of CAPS

2019-10-05 Thread Patrick Donnelly
On Wed, Oct 2, 2019 at 9:48 AM Stefan Kooman  wrote:
> According to [1] there are new parameters in place to make the MDS
> behave more stably. Quoting that blog post: "One of the more recent
> issues we've discovered is that an MDS with a very large cache (64+GB)
> will hang during certain recovery events."
>
> For those of us not (yet) running Nautilus, I wonder what the best
> course of action is to prevent an unstable MDS during recovery
> situations.
>
> Artificially limit "mds_cache_memory_limit" to, say, 32 GB?

Reduce the MDS cache size.
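
mds_cache_memory_limit is specified in bytes, so a 32 GB limit would be
34359738368 (32 * 1024^3). As a sketch (assuming you manage settings via
ceph.conf rather than the centralized config introduced in Mimic), that
would look like:

    [mds]
    # cap the MDS cache at 32 GiB (value is in bytes)
    mds_cache_memory_limit = 34359738368

Note that the actual RSS of the ceph-mds process will sit somewhat above
this limit, so leave headroom on the host.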

The Mimic backport will probably make the next minor release:
https://github.com/ceph/ceph/pull/28452

> I wonder if the number of clients influences whether an MDS gets
> overwhelmed by release messages. Or are a handful of clients (with
> millions of caps) able to overload an MDS?

Just one client with millions of caps could cause issues.
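
If you want to see how the caps are distributed, one way (assuming you
can reach the MDS admin socket) is to list the client sessions:

    ceph daemon mds.<name> session ls

Each session entry should include a per-client "num_caps" count, which
lets you spot the few clients holding millions of caps.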

> Is there a way, other than unmounting CephFS on the clients, to
> decrease the number of caps the MDS has handed out before an upgrade
> to a newer Ceph release is undertaken when running Luminous/Mimic?

Incrementally reduce the cache size using a script.
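
As a rough, untested sketch (assuming the ceph CLI is available and the
MDS name is passed as an argument), a script along these lines would
walk the limit down in steps so the MDS recalls caps in batches rather
than all at once:

    #!/usr/bin/env python3
    # Sketch: gradually lower mds_cache_memory_limit so the MDS trims its
    # cache and recalls client caps in small batches.
    import subprocess
    import sys
    import time

    def set_cache_limit(mds_name, limit_bytes):
        # injectargs applies the new value at runtime; no MDS restart needed.
        subprocess.check_call([
            "ceph", "tell", "mds." + mds_name, "injectargs",
            "--mds_cache_memory_limit=%d" % limit_bytes,
        ])

    def main():
        mds_name = sys.argv[1]            # e.g. "a" for mds.a
        start = 64 * 1024**3              # current limit: 64 GiB
        target = 32 * 1024**3             # desired limit: 32 GiB
        step = 4 * 1024**3                # shrink by 4 GiB per iteration
        limit = start
        while limit > target:
            limit = max(limit - step, target)
            print("setting mds_cache_memory_limit to %d" % limit)
            set_cache_limit(mds_name, limit)
            time.sleep(300)               # give the MDS time to trim and recall caps

    if __name__ == "__main__":
        main()

Adjust the start/target/step values and the sleep interval to taste; the
point is simply not to drop the limit from 64 GB to 32 GB in one step.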

> I'm assuming you need to restart the MDS to make a change to
> "mds_cache_memory_limit" effective, is that correct?

No. It is respected at runtime.
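
For example (assuming an MDS named mds.a), either of the following
should take effect immediately, without a restart:

    ceph tell mds.a injectargs '--mds_cache_memory_limit=34359738368'
    ceph daemon mds.a config set mds_cache_memory_limit 34359738368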

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D



[ceph-users] MDS Stability with lots of CAPS

2019-10-02 Thread Stefan Kooman
Hi,

According to [1] there are new parameters in place to make the MDS
behave more stably. Quoting that blog post: "One of the more recent
issues we've discovered is that an MDS with a very large cache (64+GB)
will hang during certain recovery events."

For those of us not (yet) running Nautilus, I wonder what the best
course of action is to prevent an unstable MDS during recovery
situations.

Artificially limit "mds_cache_memory_limit" to, say, 32 GB?

I wonder if the number of clients influences whether an MDS gets
overwhelmed by release messages. Or are a handful of clients (with
millions of caps) able to overload an MDS?

Is there a way, other than unmounting CephFS on the clients, to
decrease the number of caps the MDS has handed out before an upgrade
to a newer Ceph release is undertaken when running Luminous/Mimic?

I'm assuming you need to restart the MDS to make a change to
"mds_cache_memory_limit" effective, is that correct?

Gr. Stefan

[1]: https://ceph.com/community/nautilus-cephfs/


-- 
| BIT BV   https://www.bit.nl/   Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl