Hi Eugen,

Thanks for the update.

The message still appears in the logs these days. The client_oc_size option in my cluster has been 100 MB from the start. I have configured mds_cache_memory_limit to 4 GB, and since then the messages have become less frequent. What I noticed is that the MDS process reserves 6 GB of memory (in top) while "cache status" reports close to 4 GB in my environment. I'll keep looking into this.
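For reference, this is roughly how the numbers can be compared (a minimal sketch; "mds.<MDS>" is a placeholder for your MDS daemon name, and it assumes a single ceph-mds process on the host):

# configured MDS cache limit, in bytes
$ ceph daemon mds.<MDS> config get mds_cache_memory_limit

# memory the MDS itself reports for its cache
$ ceph daemon mds.<MDS> cache status

# RES column: memory the ceph-mds process actually reserves
$ top -b -n 1 -p $(pidof ceph-mds)

As far as I understand, mds_cache_memory_limit only bounds the MDS cache itself, not the total memory of the ceph-mds process, so top showing somewhat more than the configured limit is to be expected.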
Thanks

Eugen Block <ebl...@nde.ag> wrote on Thu, Sep 6, 2018 at 11:01 PM:

> Hi,
>
> I would like to update this thread for others struggling with cache
> pressure.
>
> The last time we hit that message was more than three weeks ago
> (workload has not changed), so it seems as our current configuration
> is fitting our workload.
> Reducing client_oc_size to 100 MB (from default 200 MB) seems to be
> the trick here, just increasing the cache size was not enough, at
> least not if you are limited in memory. Currently we have set
> mds_cache_memory_limit to 4 GB.
>
> Another note on MDS cache size:
> I had configured the mds_cache_memory_limit (4 GB) and client_oc_size
> (100 MB) in version 12.2.5. Comparing the real usage with "ceph daemon
> mds.<MDS> cache status" and the reserved memory with "top" I noticed a
> huge difference, the reserved memory was almost 8 GB while "cache
> status" was at nearly 4 GB.
> After upgrading to 12.2.7 the reserved memory size in top is still
> only about 5 GB after one week. Obviously there have been improvements
> regarding memory consumption of MDS, which is nice. :-)
>
> Regards,
> Eugen
>
>
> Quoting Eugen Block <ebl...@nde.ag>:
>
> > Hi,
> >
> >> I think it does have positive effect on the messages. Cause I get fewer
> >> messages than before.
> >
> > that's nice. I also receive definitely less cache pressure messages
> > than before.
> > I also started to play around with the client side cache
> > configuration. I halved the client object cache size from 200 MB to
> > 100 MB:
> >
> > ceph@host1:~ $ ceph daemon mds.host1 config set client_oc_size 104857600
> >
> > Although I still encountered one pressure message recently the total
> > amount of these messages has decreased significantly.
> >
> > Regards,
> > Eugen
> >
> >
> > Quoting Zhenshi Zhou <deader...@gmail.com>:
> >
> >> Hi Eugen,
> >> I think it does have positive effect on the messages. Cause I get fewer
> >> messages than before.
> >>
> >> Eugen Block <ebl...@nde.ag> wrote on Mon, Aug 20, 2018 at 9:29 PM:
> >>
> >>> Update: we are getting these messages again.
> >>>
> >>> So the search continues...
> >>>
> >>>
> >>> Quoting Eugen Block <ebl...@nde.ag>:
> >>>
> >>>> Hi,
> >>>>
> >>>> Depending on your kernel (memory leaks with CephFS) increasing the
> >>>> mds_cache_memory_limit could be of help. What is your current
> >>>> setting now?
> >>>>
> >>>> ceph:~ # ceph daemon mds.<MDS> config show | grep mds_cache_memory_limit
> >>>>
> >>>> We had these messages for months, almost every day.
> >>>> It would occur when hourly backup jobs ran and the MDS had to serve
> >>>> an additional client (searching the whole CephFS for changes)
> >>>> besides the existing CephFS clients. First we updated all clients to
> >>>> a more recent kernel version, but the warnings didn't stop. Then we
> >>>> doubled the cache size from 2 GB to 4 GB last week and since then I
> >>>> haven't seen this warning again (for now).
> >>>>
> >>>> Try playing with the cache size to find a setting fitting your
> >>>> needs, but don't forget to monitor your MDS in case something goes
> >>>> wrong.
> >>>>
> >>>> Regards,
> >>>> Eugen
> >>>>
> >>>>
> >>>> Quoting Wido den Hollander <w...@42on.com>:
> >>>>
> >>>>> On 08/13/2018 01:22 PM, Zhenshi Zhou wrote:
> >>>>>> Hi,
> >>>>>> Recently, the cluster runs healthy, but I get warning messages everyday:
> >>>>>>
> >>>>>
> >>>>> Which version of Ceph? Which version of clients?
> >>>>>
> >>>>> Can you post:
> >>>>>
> >>>>> $ ceph versions
> >>>>> $ ceph features
> >>>>> $ ceph fs status
> >>>>>
> >>>>> Wido
> >>>>>
> >>>>>> 2018-08-13 17:39:23.682213 [INF] Cluster is now healthy
> >>>>>> 2018-08-13 17:39:23.682144 [INF] Health check cleared: MDS_CLIENT_RECALL (was: 6 clients failing to respond to cache pressure)
> >>>>>> 2018-08-13 17:39:23.052022 [INF] MDS health message cleared (mds.0): Client docker38:docker failing to respond to cache pressure
> >>>>>> 2018-08-13 17:39:23.051979 [INF] MDS health message cleared (mds.0): Client docker73:docker failing to respond to cache pressure
> >>>>>> 2018-08-13 17:39:23.051934 [INF] MDS health message cleared (mds.0): Client docker74:docker failing to respond to cache pressure
> >>>>>> 2018-08-13 17:39:23.051853 [INF] MDS health message cleared (mds.0): Client docker75:docker failing to respond to cache pressure
> >>>>>> 2018-08-13 17:39:23.051815 [INF] MDS health message cleared (mds.0): Client docker27:docker failing to respond to cache pressure
> >>>>>> 2018-08-13 17:39:23.051753 [INF] MDS health message cleared (mds.0): Client docker27 failing to respond to cache pressure
> >>>>>> 2018-08-13 17:38:11.100331 [WRN] Health check update: 6 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:37:39.570014 [WRN] Health check update: 5 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:37:31.099418 [WRN] Health check update: 3 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:36:34.564345 [WRN] Health check update: 1 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:36:27.121891 [WRN] Health check update: 3 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:36:11.967531 [WRN] Health check update: 5 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:35:59.870055 [WRN] Health check update: 6 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:35:47.787323 [WRN] Health check update: 3 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:34:59.435933 [WRN] Health check failed: 1 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
> >>>>>> 2018-08-13 17:34:59.045510 [WRN] MDS health message (mds.0): Client docker75:docker failing to respond to cache pressure
> >>>>>>
> >>>>>> How can I fix it?
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com