Re: [ceph-users] Ceph snapshots

2018-06-29 Thread Paul Emmerich
IIRC it can be changed at runtime and takes effect immediately. The
message is only an implementation detail: there is no observer registered
that explicitly takes action when the value changes, but the value is
re-read anyway. It has been a while since I last changed this value at
runtime, but I'm pretty sure it worked (same with the recovery sleep).
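
If in doubt, the value an OSD is actually running with can be read back
through its admin socket - a quick check, run on the node hosting the
OSD (osd.0 is just an example id):

# ceph daemon osd.0 config get osd_snap_trim_sleep
# ceph daemon osd.0 config show | grep snap_trim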


Paul


Re: [ceph-users] Ceph snapshots

2018-06-29 Thread Marc Schöchlin
It seems that this might be interesting - unfortunately this cannot be
changed dynamically:

# ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.025'
osd.0: osd_snap_trim_sleep = '0.025000' (not observed, change may
require restart)
osd.1: osd_snap_trim_sleep = '0.025000' (not observed, change may
require restart)
osd.2: osd_snap_trim_sleep = '0.025000' (not observed, change may
require restart)
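
If a restart really were required, the value could at least be pinned in
ceph.conf beforehand so that the OSDs come back up with it (a minimal
sketch, using the value from this thread, not a general recommendation):

[osd]
osd_snap_trim_sleep = 0.025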


On 29.06.2018 at 17:36, Paul Emmerich wrote:
> It's usually the snapshot deletion that triggers slowness. Are you
> also deleting/rotating old snapshots when creating new ones?
>
> In this case: try to increase osd_snap_trim_sleep a little bit. Even a
> value of 0.025 can help a lot with many concurrent snapshot deletions.
> (That's the default we set for exactly this reason - users see
> snapshot deletion as instant and cheap, but it can be quite expensive.)
>
> Paul

Re: [ceph-users] Ceph snapshots

2018-06-29 Thread Paul Emmerich
It's usually the snapshot deletion that triggers slowness. Are you also
deleting/rotating old snapshots when creating new ones?

In this case: try to increase osd_snap_trim_sleep a little bit. Even a
value of 0.025 can help a lot with many concurrent snapshot deletions.
(That's the default we set for exactly this reason - users see snapshot
deletion as instant and cheap, but it can be quite expensive.)
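
Whether trimming is really what hurts can usually be seen from the PG
states while the slow requests pile up - a rough check (state names as of
Luminous: PGs actively trimming report snaptrim, queued ones
snaptrim_wait):

# ceph -s
# ceph pg dump pgs_brief 2>/dev/null | grep -c snaptrim

If that count rises and falls together with the REQUEST_SLOW warnings,
osd_snap_trim_sleep (and osd_snap_trim_priority) are the knobs to look at.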

Paul


2018-06-29 17:28 GMT+02:00 Marc Schöchlin :

> Hi Gregory,
>
> Thanks for the link - very interesting talk.
> You mentioned the following settings in your talk, but I was not able to
> find any documentation for them in the osd config reference:
> (http://docs.ceph.com/docs/luminous/rados/configuration/osd-config-ref/)
>
> My cluster's settings look like this (luminous/12.2.5):
>
> osd_snap_trim_cost = 1048576
> osd_snap_trim_priority = 5
> osd_snap_trim_sleep = 0.00
> mon_osd_snap_trim_queue_warn_on = 32768
>
> I currently experience messages like this:
>
> 2018-06-29 12:17:47.230028 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534846 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 22 slow
> requests are blocked > 32 sec)
> 2018-06-29 12:17:47.230069 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534847 : cluster [INF] Cluster is now healthy
> 2018-06-29 12:18:03.287947 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534876 : cluster [WRN] Health check failed: 24 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:08.307626 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534882 : cluster [WRN] Health check update: 70 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:14.325471 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534889 : cluster [WRN] Health check update: 79 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:24.502586 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534900 : cluster [WRN] Health check update: 84 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:34.489700 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534911 : cluster [WRN] Health check update: 17 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:39.489982 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534917 : cluster [WRN] Health check update: 19 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:44.490274 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534923 : cluster [WRN] Health check update: 40 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:52.620025 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534932 : cluster [WRN] Health check update: 92 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-06-29 12:18:58.641621 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534939 : cluster [WRN] Health check update: 32 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-06-29 12:19:02.653015 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534948 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 32 slow
> requests are blocked > 32 sec)
> 2018-06-29 12:19:02.653048 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534949 : cluster [INF] Cluster is now healthy
> 2018-06-29 12:19:08.674106 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534956 : cluster [WRN] Health check failed: 15 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-06-29 12:19:14.491798 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534963 : cluster [WRN] Health check update: 14 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-06-29 12:19:19.492129 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534969 : cluster [WRN] Health check update: 32 slow requests are blocked >
> 32 sec (REQUEST_SLOW)
> 2018-06-29 12:19:22.726667 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534973 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 32 slow
> requests are blocked > 32 sec)
> 2018-06-29 12:19:22.726697 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1534974 : cluster [INF] Cluster is now healthy
> 2018-06-29 13:00:00.000121 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 1537844 : cluster [INF] overall HEALTH_OK
>
> Is that related to snap trimming?
>
> I am currently migrating 250 virtual machines to my new and shiny cluster
> (2448 PGs, 72 OSDs: 48 HDD, 24 SSD, 5 OSD nodes), and these messages
> appear with some delay after the daily rbd snapshot creation.
>
> Regards
>
> Marc
>
> On 29.06.2018 at 04:27, Gregory Farnum wrote:
>
> You may find my talk at OpenStack Boston’s Ceph day last year to be
> useful: https://www.youtube.com/watch?v=rY0OWtllkn8
> -Greg

Re: [ceph-users] Ceph snapshots

2018-06-28 Thread Gregory Farnum
You may find my talk at OpenStack Boston’s Ceph day last year to be useful:
https://www.youtube.com/watch?v=rY0OWtllkn8
-Greg


Re: [ceph-users] Ceph snapshots

2018-06-27 Thread Marc Schöchlin
Hello list,

I currently hold 3 snapshots per rbd image for my virtual systems.

What I miss in the current documentation:

  * details about the implementation of snapshots
      o implementation details
      o which scenarios create high overhead per snapshot
      o what causes the really short performance degradation on snapshot
        creation/deletion
      o why do I not see a significant rbd performance degradation if
        there are numerous snapshots
      o ...
  * details and recommendations about the overhead of snapshots
      o what performance penalty do I have to expect for a write/read IOP
      o what are the edge cases of the implementation
      o how many snapshots per image (i.e. virtual machine) might be a
        good idea
      o ...

Regards
Marc




Re: [ceph-users] Ceph snapshots

2018-06-27 Thread Brian :
Hi John

Have you looked at the Ceph documentation?

RBD: http://docs.ceph.com/docs/luminous/rbd/rbd-snapshot/

The Ceph project documentation is really good for most areas. Have a
look at what you can find, then come back with more specific questions!
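
For a quick hands-on start, the basic snapshot lifecycle from that page
boils down to a handful of rbd commands (a sketch - the pool, image and
snapshot names below are placeholders):

# rbd snap create rbd/myimage@mysnap
# rbd snap ls rbd/myimage
# rbd snap rollback rbd/myimage@mysnap
# rbd snap rm rbd/myimage@mysnap

Snapshots can also be protected and cloned into new images (rbd snap
protect / rbd clone), which is how copy-on-write clones of VM images are
usually built.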

Thanks
Brian




On Wed, Jun 27, 2018 at 2:24 PM, John Molefe  wrote:
> Hi everyone
>
> I would like some advice and insight into how Ceph snapshots work and how
> they can be set up.
>
> Responses will be much appreciated.
>
> Thanks
> John
>
> Vrywaringsklousule / Disclaimer:
> http://www.nwu.ac.za/it/gov-man/disclaimer.html
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com