Hi Eugen,

Thanks for the update.

The message still appears in the logs these days. The client_oc_size option in my cluster has been 100 MB from the start. I have configured mds_cache_memory_limit to 4 GB, and since then the messages have become less frequent. What I noticed is that the MDS process reserves 6 GB of memory (in top) while "cache status" reports close to 4 GB in my environment. I'll keep looking into this.
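For reference, this is roughly how the numbers can be compared (a minimal sketch; "mds.<MDS>" is a placeholder for your MDS daemon name, and it assumes a single ceph-mds process on the host):

# configured MDS cache limit, in bytes
$ ceph daemon mds.<MDS> config get mds_cache_memory_limit

# memory the MDS itself reports for its cache
$ ceph daemon mds.<MDS> cache status

# RES column: memory the ceph-mds process actually reserves
$ top -b -n 1 -p $(pidof ceph-mds)

As far as I understand, mds_cache_memory_limit only bounds the MDS cache itself, not the total memory of the ceph-mds process, so top showing somewhat more than the configured limit is to be expected.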
Thanks

Eugen Block <ebl...@nde.ag> wrote on Thu, Sep 6, 2018 at 11:01 PM:

> Hi,
>
> I would like to update this thread for others struggling with cache
> pressure.
>
> The last time we hit that message was more than three weeks ago
> (workload has not changed), so it seems as our current configuration
> is fitting our workload.
> Reducing client_oc_size to 100 MB (from default 200 MB) seems to be
> the trick here, just increasing the cache size was not enough, at
> least not if you are limited in memory. Currently we have set
> mds_cache_memory_limit to 4 GB.
>
> Another note on MDS cache size:
> I had configured the mds_cache_memory_limit (4 GB) and client_oc_size
> (100 MB) in version 12.2.5. Comparing the real usage with "ceph daemon
> mds.<MDS> cache status" and the reserved memory with "top" I noticed a
> huge difference, the reserved memory was almost 8 GB while "cache
> status" was at nearly 4 GB.
> After upgrading to 12.2.7 the reserved memory size in top is still
> only about 5 GB after one week. Obviously there have been improvements
> regarding memory consumption of MDS, which is nice. :-)
>
> Regards,
> Eugen
>
>
> Quoting Eugen Block <ebl...@nde.ag>:
>
> > Hi,
> >
> >> I think it does have positive effect on the messages. Cause I get fewer
> >> messages than before.
> >
> > that's nice. I also receive definitely less cache pressure messages
> > than before.
> > I also started to play around with the client side cache
> > configuration. I halved the client object cache size from 200 MB to
> > 100 MB:
> >
> > ceph@host1:~ $ ceph daemon mds.host1 config set client_oc_size 104857600
> >
> > Although I still encountered one pressure message recently the total
> > amount of these messages has decreased significantly.
> >
> > Regards,
> > Eugen
> >
> >
> > Quoting Zhenshi Zhou <deader...@gmail.com>:
> >
> >> Hi Eugen,
> >> I think it does have positive effect on the messages. Cause I get fewer
> >> messages than before.
> >>
> >> Eugen Block <ebl...@nde.ag> wrote on Mon, Aug 20, 2018 at 9:29 PM:
> >>
> >>> Update: we are getting these messages again.
> >>>
> >>> So the search continues...
> >>>
> >>>
> >>> Quoting Eugen Block <ebl...@nde.ag>:
> >>>
> >>>> Hi,
> >>>>
> >>>> Depending on your kernel (memory leaks with CephFS) increasing the
> >>>> mds_cache_memory_limit could be of help. What is your current
> >>>> setting now?
> >>>>
> >>>> ceph:~ # ceph daemon mds.<MDS> config show | grep mds_cache_memory_limit
> >>>>
> >>>> We had these messages for months, almost every day.
> >>>> It would occur when hourly backup jobs ran and the MDS had to serve
> >>>> an additional client (searching the whole CephFS for changes)
> >>>> besides the existing CephFS clients. First we updated all clients to
> >>>> a more recent kernel version, but the warnings didn't stop. Then we
> >>>> doubled the cache size from 2 GB to 4 GB last week and since then I
> >>>> haven't seen this warning again (for now).
> >>>>
> >>>> Try playing with the cache size to find a setting fitting your
> >>>> needs, but don't forget to monitor your MDS in case something goes
> >>>> wrong.
> >>>>
> >>>> Regards,
> >>>> Eugen
> >>>>
> >>>>
> >>>> Quoting Wido den Hollander <w...@42on.com>:
> >>>>
> >>>>> On 08/13/2018 01:22 PM, Zhenshi Zhou wrote:
> >>>>>> Hi,
> >>>>>> Recently, the cluster runs healthy, but I get warning messages everyday:
> >>>>>>
> >>>>>
> >>>>> Which version of Ceph? Which version of clients?
> >>>>>
> >>>>> Can you post:
> >>>>>
> >>>>> $ ceph versions
> >>>>> $ ceph features
> >>>>> $ ceph fs status
> >>>>>
> >>>>> Wido
> >>>>>
> >>>>>> 2018-08-13 17:39:23.682213 [INF] Cluster is now healthy
> >>>>>> 2018-08-13 17:39:23.682144 [INF] Health check cleared: MDS_CLIENT_RECALL (was: 6 clients failing to respond to cache pressure)
> >>>>>> 2018-08-13 17:39:23.052022 [INF] MDS health message cleared (mds.0): Client docker38:docker failing to respond to cache pressure
> >>>>>> 2018-08-13 17:39:23.051979 [INF] MDS health message cleared (mds.0): Client docker73:docker failing to respond to cache pressure
> >>>>>> 2018-08-13 17:39:23.051934 [INF] MDS health message cleared (mds.0): Client docker74:docker failing to respond to cache pressure
> >>>>>> 2018-08-13 17:39:23.051853 [INF] MDS health message cleared (mds.0): Client docker75:docker failing to respond to cache pressure
> >>>>>> 2018-08-13 17:39:23.051815 [INF] MDS health message cleared (mds.0): Client docker27:docker failing to respond to cache pressure
> >>>>>> 2018-08-13 17:39:23.051753 [INF] MDS health message cleared (mds.0): Client docker27 failing to respond to cache pressure
> >>>>>> 2018-08-13 17:38:11.100331 [WRN] Health check update: 6 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:37:39.570014 [WRN] Health check update: 5 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:37:31.099418 [WRN] Health check update: 3 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:36:34.564345 [WRN] Health check update: 1 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:36:27.121891 [WRN] Health check update: 3 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:36:11.967531 [WRN] Health check update: 5 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:35:59.870055 [WRN] Health check update: 6 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:35:47.787323 [WRN] Health check update: 3 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:34:59.435933 [WRN] Health check failed: 1 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:34:59.045510 [WRN] MDS health message (mds.0): Client docker75:docker failing to respond to cache pressure
> >>>>>>
> >>>>>> How can I fix it?
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com