It seems that this might be interesting - unfortunately this setting
cannot be changed dynamically:

# ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.025'
osd.0: osd_snap_trim_sleep = '0.025000' (not observed, change may require restart)
osd.1: osd_snap_trim_sleep = '0.025000' (not observed, change may require restart)
osd.2: osd_snap_trim_sleep = '0.025000' (not observed, change may require restart)
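Since injectargs reports the value as "not observed" on this release, a
persistent way to apply it (a sketch only - the section placement is
standard, but the restart procedure is site-specific) is to set it in
ceph.conf and restart the OSDs one at a time:

```ini
; /etc/ceph/ceph.conf on each OSD host; after editing, restart OSDs
; one by one (e.g. systemctl restart ceph-osd@<id>) and wait for
; HEALTH_OK between restarts so only one OSD is down at a time.
[osd]
osd_snap_trim_sleep = 0.025
```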


On 29.06.2018 at 17:36, Paul Emmerich wrote:
> It's usually the snapshot deletion that triggers slowness. Are you
> also deleting/rotating old snapshots when creating new ones?
>
> In this case: try increasing osd_snap_trim_sleep a little bit. Even
> a value of 0.025 can help a lot when there are many concurrent snapshot
> deletions. (That's what we set as default for exactly this reason -
> users see snapshot deletion as instant and cheap, but it can be quite
> expensive.)
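A rough back-of-the-envelope (my own sketch, not from the thread) of why
even a small sleep helps: if the trimmer waits osd_snap_trim_sleep
seconds between trim operations, the sleep alone bounds how fast an OSD
can burn through its trim queue, leaving the disk free for client I/O in
between:

```python
def max_trims_per_sec(snap_trim_sleep: float) -> float:
    """Upper bound on trim operations per second imposed by the sleep
    alone; the real rate is lower because each trim also costs I/O."""
    return 1.0 / snap_trim_sleep

print(max_trims_per_sec(0.025))  # 40.0 trims/s per OSD at most
```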
>
> Paul
>
>
> 2018-06-29 17:28 GMT+02:00 Marc Schöchlin <m...@256bit.org>:
>
>     Hi Gregory,
>
>     thanks for the link - very interesting talk.
>     You mentioned the following settings in your talk, but I was not
>     able to find any documentation on them in the OSD config reference
>     (http://docs.ceph.com/docs/luminous/rados/configuration/osd-config-ref/):
>
>     My clusters settings look like this (luminous/12.2.5):
>
>     osd_snap_trim_cost = 1048576
>     osd_snap_trim_priority = 5
>     osd_snap_trim_sleep = 0.000000
>     mon_osd_snap_trim_queue_warn_on = 32768
>
>     I currently experience messages like this:
>
>     2018-06-29 12:17:47.230028 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534846 : cluster [INF] Health check cleared:
>     REQUEST_SLOW (was: 22 slow requests are blocked > 32 sec)
>     2018-06-29 12:17:47.230069 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534847 : cluster [INF] Cluster is now healthy
>     2018-06-29 12:18:03.287947 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534876 : cluster [WRN] Health check failed:
>     24 slow requests are blocked > 32 sec (REQUEST_SLOW)
>     2018-06-29 12:18:08.307626 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534882 : cluster [WRN] Health check update:
>     70 slow requests are blocked > 32 sec (REQUEST_SLOW)
>     2018-06-29 12:18:14.325471 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534889 : cluster [WRN] Health check update:
>     79 slow requests are blocked > 32 sec (REQUEST_SLOW)
>     2018-06-29 12:18:24.502586 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534900 : cluster [WRN] Health check update:
>     84 slow requests are blocked > 32 sec (REQUEST_SLOW)
>     2018-06-29 12:18:34.489700 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534911 : cluster [WRN] Health check update:
>     17 slow requests are blocked > 32 sec (REQUEST_SLOW)
>     2018-06-29 12:18:39.489982 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534917 : cluster [WRN] Health check update:
>     19 slow requests are blocked > 32 sec (REQUEST_SLOW)
>     2018-06-29 12:18:44.490274 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534923 : cluster [WRN] Health check update:
>     40 slow requests are blocked > 32 sec (REQUEST_SLOW)
>     2018-06-29 12:18:52.620025 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534932 : cluster [WRN] Health check update:
>     92 slow requests are blocked > 32 sec (REQUEST_SLOW)
>     2018-06-29 12:18:58.641621 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534939 : cluster [WRN] Health check update:
>     32 slow requests are blocked > 32 sec (REQUEST_SLOW)
>     2018-06-29 12:19:02.653015 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534948 : cluster [INF] Health check cleared:
>     REQUEST_SLOW (was: 32 slow requests are blocked > 32 sec)
>     2018-06-29 12:19:02.653048 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534949 : cluster [INF] Cluster is now healthy
>     2018-06-29 12:19:08.674106 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534956 : cluster [WRN] Health check failed:
>     15 slow requests are blocked > 32 sec (REQUEST_SLOW)
>     2018-06-29 12:19:14.491798 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534963 : cluster [WRN] Health check update:
>     14 slow requests are blocked > 32 sec (REQUEST_SLOW)
>     2018-06-29 12:19:19.492129 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534969 : cluster [WRN] Health check update:
>     32 slow requests are blocked > 32 sec (REQUEST_SLOW)
>     2018-06-29 12:19:22.726667 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534973 : cluster [INF] Health check cleared:
>     REQUEST_SLOW (was: 32 slow requests are blocked > 32 sec)
>     2018-06-29 12:19:22.726697 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1534974 : cluster [INF] Cluster is now healthy
>     2018-06-29 13:00:00.000121 mon.ceph-mon-s43 mon.0
>     10.23.27.153:6789/0 1537844 : cluster [INF] overall HEALTH_OK
>
>     Is that related to snap trimming?
>
>     I am currently migrating 250 virtual machines to my new and shiny
>     cluster (2448 PGs, 72 OSDs: 48 HDD and 24 SSD across 5 OSD nodes),
>     and these messages appear with some delay after the daily rbd
>     snapshot creation...
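To check whether the slow requests actually coincide with snap trimming,
one approach (a sketch only - the exact JSON layout of `ceph pg dump`
varies between releases, so treat the field names as assumptions) is to
count the PGs whose state contains `snaptrim`:

```python
import json
import subprocess

def trimming_pgs(pg_stats):
    """Return the PGs whose state includes snaptrim (actively trimming
    or waiting to trim)."""
    return [pg for pg in pg_stats if "snaptrim" in pg.get("state", "")]

def count_trimming():
    """Query the cluster; assumes a top-level "pg_stats" list in the
    JSON dump produced by this Luminous release."""
    dump = json.loads(
        subprocess.check_output(["ceph", "pg", "dump", "--format", "json"]))
    return len(trimming_pgs(dump["pg_stats"]))

# On a live cluster: print(count_trimming())
```

If that count spikes right after the daily snapshot rotation, the slow
requests are very likely trim-related.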
>
>     Regards
>
>     Marc
>
>
>     On 29.06.2018 at 04:27, Gregory Farnum wrote:
>>     You may find my talk at OpenStack Boston’s Ceph day last year to
>>     be useful: https://www.youtube.com/watch?v=rY0OWtllkn8
>>     -Greg
>>     On Wed, Jun 27, 2018 at 9:06 AM Marc Schöchlin <m...@256bit.org> wrote:
>>
>>         Hello list,
>>
>>         I currently hold 3 snapshots per rbd image for my virtual
>>         systems.
>>
>>         What I miss in the current documentation:
>>
>>           * details about the implementation of snapshots
>>               o implementation details
>>               o which scenarios create high overhead per snapshot
>>               o what causes the really short performance degradation
>>                 on snapshot creation/deletion
>>               o why I do not see a significant rbd performance
>>                 degradation if there are numerous snapshots
>>               o ...
>>           * details and recommendations about the overhead of snapshots
>>               o what performance penalty do I have to expect for a
>>                 write/read IOP
>>               o what are the edge cases of the implementation
>>               o how many snapshots per image (i.e. virtual machine)
>>                 might be a good idea
>>               o ...
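On the write-penalty question, one cost worth knowing (the numbers below
are my own illustration, not an authoritative answer): the first write
to a RADOS object after a snapshot copy-on-writes the object - under
FileStore-style full-object cloning the whole object is copied, while
BlueStore can clone more cheaply - so a small VM write can move far more
data than it submits. With the default 4 MiB rbd object size:

```python
OBJECT_SIZE = 4 * 1024 * 1024  # default rbd object size: 4 MiB
WRITE_SIZE = 4 * 1024          # a typical small VM write: 4 KiB

# Worst case: the first write to each object after a snapshot clones the
# whole object before applying the 4 KiB write; later writes to the same
# object in the same snapshot epoch pay no extra copy.
first_write_amplification = OBJECT_SIZE / WRITE_SIZE
print(first_write_amplification)  # 1024.0
```

This is why the degradation is short-lived: once the hot objects have
been cloned, steady-state writes see little extra cost.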
>>
>>         Regards
>>         Marc
>>
>>
>>         On 27.06.2018 at 15:37, Brian :: wrote:
>>         > Hi John
>>         >
>>         > Have you looked at the Ceph documentation?
>>         >
>>         > RBD: http://docs.ceph.com/docs/luminous/rbd/rbd-snapshot/
>>         >
>>         > The Ceph project documentation is really good for most
>>         > areas. Have a look at what you can find, then come back with
>>         > more specific questions!
>>         >
>>         > Thanks
>>         > Brian
>>         >
>>         >
>>         >
>>         >
>>         > On Wed, Jun 27, 2018 at 2:24 PM, John Molefe
>>         > <john.mol...@nwu.ac.za> wrote:
>>         >> Hi everyone
>>         >>
>>         >> I would like some advice and insight into how Ceph
>>         >> snapshots work and how they can be set up.
>>         >>
>>         >> Responses will be much appreciated.
>>         >>
>>         >> Thanks
>>         >> John
>>         >>
>>         >> Vrywaringsklousule / Disclaimer:
>>         >> http://www.nwu.ac.za/it/gov-man/disclaimer.html
>>         >>
>>         >>
>>         >> _______________________________________________
>>         >> ceph-users mailing list
>>         >> ceph-users@lists.ceph.com
>>         >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>         >>
>>
>>
>>
>
>
>
>
>
>
> -- 
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
