Unfortunately, decreasing the osd_scrub_max_interval to 6 days isn’t going to fix it.
There is a quirk in the way the deep scrub is initiated: a deep scrub is not
triggered until a regular scrub is about to start. So with
osd_scrub_max_interval set to 1 week and a high load, the next possible scrub
or deep scrub is 1 week from the last REGULAR scrub, even if the last deep
scrub was more than 7 days ago. The longest possible wait between deep scrubs
is therefore osd_scrub_max_interval + osd_deep_scrub_interval.

For example, a deep scrub happens on Jan 1. Each day after that, for six days,
a regular scrub happens under low load. After those 6 regular scrubs, ending
on Jan 7, the load goes high. With the load high, no scrub can start until
Jan 14, because osd_scrub_max_interval must elapse since the last regular
scrub on Jan 7. At that point it will be a deep scrub, because more than
7 days have passed since the last deep scrub on Jan 1. (A rough sketch of
this logic follows at the end of this mail.)

See also http://tracker.ceph.com/issues/6735

There may be a need for more clarification in the documentation here, or a
change to the behavior.

David Zafman
Senior Developer
http://www.inktank.com
http://www.redhat.com

On Jun 23, 2014, at 11:10 PM, Christian Balzer <ch...@gol.com> wrote:

>
>
> Hello,
>
> On Mon, 23 Jun 2014 21:50:50 -0700 David Zafman wrote:
>
>>
>> By default osd_scrub_max_interval and osd_deep_scrub_interval are 1 week
>> (604800 seconds = 60*60*24*7) and osd_scrub_min_interval is 1 day
>> (86400 seconds). As long as osd_scrub_max_interval <=
>> osd_deep_scrub_interval, the load won't impact when deep scrub
>> occurs. I suggest that osd_scrub_min_interval <=
>> osd_scrub_max_interval <= osd_deep_scrub_interval.
>>
>> I'd like to know how you have those 3 values set, so I can confirm that
>> this explains the issue.
>>
> They are and were, unsurprisingly, set to the default values.
>
> Now to provide some more information: shortly after the inception of this
> cluster I initiated a deep scrub on all OSDs at 00:30 on a Sunday morning
> (the things we do for Ceph; a scheduler with a variety of rules would be
> nice, but I digress).
> This took until 05:30 despite the cluster being idle and holding close to
> no data. In retrospect it seems clear to me that this was already
> influenced by the load threshold (a scrub I initiated with the new
> threshold value of 1.5 finished in just 30 minutes last night).
> Consequently all the normal scrubs happened in the same time frame until
> this weekend on the 21st (normal scrub).
> The deep scrub on the 22nd clearly ran into the load threshold.
>
> So if I understand you correctly, setting osd_scrub_max_interval to 6 days
> should have deep scrubs ignore the load threshold, as per the documentation?
>
> Regards,
>
> Christian
>
>>
>> David Zafman
>> Senior Developer
>> http://www.inktank.com
>> http://www.redhat.com
>>
>> On Jun 23, 2014, at 7:01 PM, Christian Balzer <ch...@gol.com> wrote:
>>
>>>
>>> Hello,
>>>
>>> On Mon, 23 Jun 2014 14:20:37 -0400 Gregory Farnum wrote:
>>>
>>>> Looks like it's a doc error (at least on master), but it might have
>>>> changed over time. If you're running Dumpling we should change the
>>>> docs.
>>>
>>> Nope, I'm running 0.80.1 currently.
>>>
>>> Christian
>>>
>>>> -Greg
>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>
>>>>
>>>> On Sun, Jun 22, 2014 at 10:18 PM, Christian Balzer <ch...@gol.com>
>>>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> This weekend I noticed that the deep scrubbing took a lot longer than
>>>>> usual (long periods without a scrub running/finishing), even though
>>>>> the cluster wasn't all that busy.
>>>>> It was, however, busier than in the past, and the load average was
>>>>> frequently above 0.5.
>>>>>
>>>>> Now, according to the documentation, "osd scrub load threshold" is
>>>>> ignored when it comes to deep scrubs.
>>>>>
>>>>> However, after setting it to 1.5 and restarting the OSDs, the
>>>>> floodgates opened and all those deep scrubs are now running at full
>>>>> speed.
>>>>>
>>>>> Documentation error, or did I "unstick" something by restarting the
>>>>> OSDs?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Christian
>>>>> --
>>>>> Christian Balzer        Network/Systems Engineer
>>>>> ch...@gol.com           Global OnLine Japan/Fusion Communications
>>>>> http://www.gol.com/
>>>>
>>>
>>>
>>> --
>>> Christian Balzer        Network/Systems Engineer
>>> ch...@gol.com           Global OnLine Japan/Fusion Communications
>>> http://www.gol.com/
>>
>
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
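As promised above, here is a minimal Python sketch of the scheduling
behavior I described at the top of this mail. It is not the actual OSD
implementation (that lives in the C++ daemon); the option names match
ceph.conf, but the function and the example dates are purely illustrative.

    from datetime import datetime, timedelta

    # Defaults as discussed in this thread.
    OSD_SCRUB_MIN_INTERVAL   = timedelta(seconds=86400)    # 1 day
    OSD_SCRUB_MAX_INTERVAL   = timedelta(seconds=604800)   # 1 week
    OSD_DEEP_SCRUB_INTERVAL  = timedelta(seconds=604800)   # 1 week
    OSD_SCRUB_LOAD_THRESHOLD = 0.5

    def maybe_scrub(now, last_scrub, last_deep_scrub, load):
        """Return 'none', 'scrub' or 'deep-scrub' for one scheduling pass."""
        since_scrub = now - last_scrub
        since_deep  = now - last_deep_scrub

        # Gate only on the *regular* scrub intervals; the age of the last
        # deep scrub is not looked at until a regular scrub is cleared to run.
        if since_scrub < OSD_SCRUB_MIN_INTERVAL:
            return 'none'
        if load > OSD_SCRUB_LOAD_THRESHOLD and since_scrub < OSD_SCRUB_MAX_INTERVAL:
            return 'none'

        # A scrub is about to start; only now can it be promoted to a deep scrub.
        if since_deep >= OSD_DEEP_SCRUB_INTERVAL:
            return 'deep-scrub'
        return 'scrub'

    # The Jan 1 / Jan 7 / Jan 14 example: deep scrub on Jan 1, last regular
    # scrub on Jan 7, load stays high afterwards.
    print(maybe_scrub(datetime(2014, 1, 13), datetime(2014, 1, 7),
                      datetime(2014, 1, 1), load=1.2))   # -> none
    print(maybe_scrub(datetime(2014, 1, 14), datetime(2014, 1, 7),
                      datetime(2014, 1, 1), load=1.2))   # -> deep-scrub

To see what a running OSD actually has configured, something like
"ceph daemon osd.0 config show | grep scrub" on the OSD host (or injecting
new values with "ceph tell osd.* injectargs") should do, adjusted to
whatever your release supports.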
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com