It seems that this might be interesting - unfortunately this cannot be changed dynamically, as the injectargs attempt below shows:
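Because the value is reported as "not observed" at runtime, a persistent change would instead go into the cluster configuration and take effect on OSD restart. A minimal sketch (assuming the standard /etc/ceph/ceph.conf layout; the value 0.025 is the one Paul suggests below):

```ini
[osd]
# sleep (in seconds) between snap-trim operations, to throttle trimming
osd_snap_trim_sleep = 0.025
```

followed by restarting the OSDs one at a time (e.g. `systemctl restart ceph-osd@0` on systemd-managed clusters), waiting for the cluster to return to HEALTH_OK in between. The runtime attempt and its "not observed" response: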
# ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.025'
osd.0: osd_snap_trim_sleep = '0.025000' (not observed, change may require restart)
osd.1: osd_snap_trim_sleep = '0.025000' (not observed, change may require restart)
osd.2: osd_snap_trim_sleep = '0.025000' (not observed, change may require restart)

On 29.06.2018 at 17:36, Paul Emmerich wrote:
> It's usually the snapshot deletion that triggers slowness. Are you
> also deleting/rotating old snapshots when creating new ones?
>
> In this case: try to increase osd_snap_trim_sleep a little bit. Even
> 0.025 can help a lot with a lot of concurrent snapshot deletions.
> (That's what we set as default for exactly this reason - users see
> snapshot deletion as instant and cheap, but it can be quite expensive)
>
> Paul
>
>
> 2018-06-29 17:28 GMT+02:00 Marc Schöchlin <m...@256bit.org>:
>
> Hi Gregory,
>
> thanks for the link - very interesting talk.
> You mentioned the following settings in your talk, but I was not
> able to find any documentation in the OSD config reference
> (http://docs.ceph.com/docs/luminous/rados/configuration/osd-config-ref/).
>
> My cluster's settings look like this (luminous/12.2.5):
>
> osd_snap_trim_cost = 1048576
> osd_snap_trim_priority = 5
> osd_snap_trim_sleep = 0.000000
> mon_osd_snap_trim_queue_warn_on = 32768
>
> I currently experience messages like this:
>
> 2018-06-29 12:17:47.230028 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534846 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 22 slow requests are blocked > 32 sec)
> 2018-06-29 12:17:47.230069 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534847 : cluster [INF] Cluster is now healthy
> 2018-06-29 12:18:03.287947 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534876 : cluster [WRN] Health check failed: 24 slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:08.307626 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534882 : cluster [WRN] Health check update: 70 slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:14.325471 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534889 : cluster [WRN] Health check update: 79 slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:24.502586 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534900 : cluster [WRN] Health check update: 84 slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:34.489700 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534911 : cluster [WRN] Health check update: 17 slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:39.489982 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534917 : cluster [WRN] Health check update: 19 slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:44.490274 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534923 : cluster [WRN] Health check update: 40 slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:52.620025 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534932 : cluster [WRN] Health check update: 92 slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:58.641621 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534939 : cluster [WRN] Health check update: 32 slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2018-06-29 12:19:02.653015 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534948 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 32 slow requests are blocked > 32 sec)
> 2018-06-29 12:19:02.653048 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534949 : cluster [INF] Cluster is now healthy
> 2018-06-29 12:19:08.674106 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534956 : cluster [WRN] Health check failed: 15 slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2018-06-29 12:19:14.491798 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534963 : cluster [WRN] Health check update: 14 slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2018-06-29 12:19:19.492129 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534969 : cluster [WRN] Health check update: 32 slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2018-06-29 12:19:22.726667 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534973 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 32 slow requests are blocked > 32 sec)
> 2018-06-29 12:19:22.726697 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1534974 : cluster [INF] Cluster is now healthy
> 2018-06-29 13:00:00.000121 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 1537844 : cluster [INF] overall HEALTH_OK
>
> Is that related to snap trimming?
>
> I am currently migrating 250 virtual machines to my new and shiny
> 2448 PGs, 72 OSD (48 HDD, 24 SSD, 5 OSD nodes) cluster, and these
> messages appear with some delay after the daily rbd snapshot
> creation....
>
> Regards
>
> Marc
>
>
> On 29.06.2018 at 04:27, Gregory Farnum wrote:
>> You may find my talk at OpenStack Boston's Ceph day last year to
>> be useful: https://www.youtube.com/watch?v=rY0OWtllkn8
>> -Greg
>> On Wed, Jun 27, 2018 at 9:06 AM Marc Schöchlin <m...@256bit.org> wrote:
>>
>> Hello list,
>>
>> I currently hold 3 snapshots per rbd image for my virtual
>> systems.
>>
>> What I miss in the current documentation:
>>
>> * details about the implementation of snapshots
>>   o implementation details
>>   o which scenarios create high overhead per snapshot
>>   o what causes the really short performance degradation on snapshot
>>     creation/deletion
>>   o why do I not see a significant rbd performance degradation if
>>     there are numerous snapshots
>>   o ....
>> * details and recommendations about the overhead of snapshots
>>   o what performance penalty do I have to expect for a write/read IOP
>>   o what are the edge cases of the implementation
>>   o how many snapshots per image (i.e. virtual machine) might be a
>>     good idea
>>   o ...
>>
>> Regards
>> Marc
>>
>>
>> On 27.06.2018 at 15:37, Brian :: wrote:
>> > Hi John
>> >
>> > Have you looked at the Ceph documentation?
>> >
>> > RBD: http://docs.ceph.com/docs/luminous/rbd/rbd-snapshot/
>> >
>> > The Ceph project documentation is really good for most areas. Have a
>> > look at what you can find, then come back with more specific questions!
>> >
>> > Thanks
>> > Brian
>> >
>> >
>> > On Wed, Jun 27, 2018 at 2:24 PM, John Molefe <john.mol...@nwu.ac.za> wrote:
>> >> Hi everyone
>> >>
>> >> I would like some advice and insight into how Ceph snapshots work
>> >> and how they can be set up.
>> >>
>> >> Responses will be much appreciated.
>> >>
>> >> Thanks
>> >> John
>> >>
>> >> Vrywaringsklousule / Disclaimer:
>> >> http://www.nwu.ac.za/it/gov-man/disclaimer.html
>
>
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
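For intuition on why even the small sleep Paul suggests throttles deletion so noticeably: the sleep is inserted between successive snap-trim operations, so it caps the per-OSD trim rate. A back-of-the-envelope sketch (illustrative arithmetic only; actual throughput also depends on osd_snap_trim_cost, queue priorities, and disk speed):

```python
def max_trim_ops_per_sec(snap_trim_sleep: float, op_time: float = 0.0) -> float:
    """Rough upper bound on snap-trim operations per second for one OSD.

    Each trim cycle takes at least op_time + snap_trim_sleep seconds,
    where snap_trim_sleep is the configured osd_snap_trim_sleep value.
    """
    return 1.0 / (op_time + snap_trim_sleep)

# With the suggested 0.025 s sleep, an OSD performs at most ~40 trim
# operations per second, even if the trim work itself were free:
print(round(max_trim_ops_per_sec(0.025)))  # 40
```

So a sleep of 0.025 leaves trimming fast enough to keep up with daily snapshot rotation while preventing trim work from saturating the op queues and causing REQUEST_SLOW warnings.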
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com