On Mon, Mar 5, 2018 at 9:56 AM Jonathan D. Proulx <j...@csail.mit.edu> wrote:
> Hi All,
>
> I've recently noticed my deep scrubs are EXTREMELY poorly
> distributed. They are starting within the 18->06 local time start/stop
> window but are not distributed over enough days, or well distributed
> over the range of days they have.
>
> root@ceph-mon0:~# for date in `ceph pg dump | awk '/active/{print $20}'`;
> do date +%D -d $date; done | sort | uniq -c
> dumped all
>       1 03/01/18
>       6 03/03/18
>    8358 03/04/18
>    1875 03/05/18
>
> So very nearly all 10240 pgs scrubbed last night/this morning. I've
> been kicking this around for a while since I noticed poor distribution
> over a 7 day range when I was really pretty sure I'd changed that from
> the 7d default to 28d.
>
> Tried kicking it out to 42 days about a week ago with:
>
> ceph tell osd.* injectargs '--osd_deep_scrub_interval 3628800'
>
> There were many errors suggesting it could not reread the change and I'd
> need to restart the OSDs, but 'ceph daemon osd.0 config show |grep
> osd_deep_scrub_interval' showed the right value, so I let it roll for a
> week, but the scrubs did not spread out.
>
> So Friday I set that value in ceph.conf and did rolling restarts of
> all OSDs. Then double-checked the running value on all daemons.
> Checking Sunday, the nightly deep scrubs (based on LAST_DEEP_SCRUB
> voodoo above) showed near enough 1/42nd of PGs had been scrubbed
> Saturday night that I thought this was working.
>
> This morning I checked again and got the results above.
>
> I would expect after changing to a 42d scrub cycle I'd see approx 1/42
> of the PGs deep scrub each night until there was a roughly even
> distribution over the past 42 days.
>
> So which thing is broken, my config or my expectations?

Sadly, changing the interval settings does not directly change the
scheduling of deep scrubs. Instead, it merely influences whether a PG will
get queued for scrub when it is examined as a candidate, based on how
out-of-date its scrub is.
(That is, nothing holistically goes "I need to scrub 1/n of these PGs
every night"; there's a simple task that says "is this PG's last scrub
more than n days old?")

Users have shared various scripts on the list for setting up a more even
scrub distribution by fiddling with the settings and poking at specific PGs
to try and smear them out over the whole time period; I'd check the archives
or Google for those. :)

-Greg
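For what it's worth, the difference between the per-PG age check and a "1/n per night" scheduler could be sketched roughly as below. This is only an illustration, not one of the scripts from the list and not how the OSD code is written: is_due(), nightly_batch(), and the PG IDs are hypothetical names made up here, and the timestamps would have to come from your own parsing of 'ceph pg dump'. The only real command it refers to is 'ceph pg deep-scrub <pgid>', which it just prints.

```python
import math
from datetime import datetime, timedelta


def is_due(last_deep_scrub, now, interval_days):
    """Per-PG age check: is the last deep scrub older than the interval?"""
    return now - last_deep_scrub > timedelta(days=interval_days)


def nightly_batch(last_deep_scrub_by_pg, cycle_days):
    """Pick the ceil(N / cycle_days) PGs with the oldest deep-scrub stamps.

    Deep scrubbing one such batch per night would smear N PGs over roughly
    cycle_days nights, regardless of how clumped the stamps are today.
    """
    batch_size = math.ceil(len(last_deep_scrub_by_pg) / cycle_days)
    oldest_first = sorted(last_deep_scrub_by_pg, key=last_deep_scrub_by_pg.get)
    return oldest_first[:batch_size]


if __name__ == "__main__":
    # Hypothetical data: 10240 PGs that were all deep scrubbed the same night.
    now = datetime(2018, 3, 5, 9, 0)
    stamps = {f"1.{i:x}": datetime(2018, 3, 4, 22, 0) for i in range(10240)}

    # With only the age check, no PG is due now, and they will all cross
    # the 42-day threshold at the same moment later on.
    print(sum(is_due(stamp, now, 42) for stamp in stamps.values()))

    # Print (not run) the commands for the first few PGs of tonight's batch.
    for pgid in nightly_batch(stamps, 42)[:3]:
        print(f"ceph pg deep-scrub {pgid}")
```

Run from cron inside the scrub window, something along these lines would hand the cluster about 1/42 of the PGs each night, oldest first, instead of waiting for them all to age out together.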
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com