Craig,
I've struggled with the same issue for quite a while. If your I/O is
similar to mine, I believe you are on the right track. For the past
month or so, I have been running this cron job:
* * * * * for strPg in `ceph pg dump | grep -E '^[0-9]+\.[0-9a-f]{1,4}' | sort -k20 | awk '{ print $1 }' | head -2`; do ceph pg deep-scrub $strPg; done
At two PGs per minute, that roughly keeps pace with my 20672 PGs, which
are set to be deep-scrubbed every 7 days. Your script may be a bit
better, but this quick-and-dirty method has helped my cluster behave
more consistently.
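A quick check of that rate: the cron entry fires once a minute and starts two deep-scrubs each time. The variable names below are only for illustration.

```shell
#!/bin/bash
# Rough capacity check for the cron job above: PGs deep-scrubbed per week
# when the job runs every minute and starts two deep-scrubs per run.
pgs_per_run=2
runs_per_day=$(( 60 * 24 ))                  # one run per minute
pgs_per_week=$(( pgs_per_run * runs_per_day * 7 ))
echo "PGs scrubbed per week: ${pgs_per_week}"

total_pgs=20672
# Days needed to touch every PG once, in hundredths of a day (integer math)
days_x100=$(( total_pgs * 100 / (pgs_per_run * runs_per_day) ))
echo "Days to cover ${total_pgs} PGs: $(( days_x100 / 100 )).$(printf '%02d' $(( days_x100 % 100 )))"
```

That works out to 20160 PGs per week, so a full pass over 20672 PGs takes a little over 7 days, which matches the "roughly" above.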
The real key for me is avoiding the "clumpiness" I observed before this
hack: concurrent deep-scrubs would sit at zero for a long time (despite
PGs that were months overdue for a deep-scrub), then suddenly spike into
the teens and stay there for hours, killing client writes/second.
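One way to see the clumpiness directly is to count how many PGs were last deep-scrubbed on each day. A minimal sketch, assuming the deep-scrub date sits in column 20 of `ceph pg dump` (as in the commands in this thread; the column differs between Ceph versions) and that PG rows are the ones whose first field looks like `pool.hex`. The function name `deep_scrub_histogram` is hypothetical.

```shell
#!/bin/bash
# Count PGs by the date of their last deep-scrub.
# Reads `ceph pg dump` text on stdin. DEEP_DATE_COL is an assumption:
# it is 20 in the output used in this thread but varies between versions.
DEEP_DATE_COL=${DEEP_DATE_COL:-20}

deep_scrub_histogram() {
    # Only rows whose first field looks like a PG id (e.g. 3.1f) are counted.
    awk -v col="$DEEP_DATE_COL" \
        '$1 ~ /^[0-9]+\.[0-9a-f]+$/ { count[$col]++ }
         END { for (d in count) print d, count[d] }' | sort
}
```

On a live cluster this would be fed with `ceph pg dump 2>/dev/null | deep_scrub_histogram`. A steady schedule should show similar counts per day; a clumped one shows a few days holding most of the PGs.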
The scrubbing behavior table[0] indicates that a periodic tick initiates
scrubs on a per-PG basis. Perhaps the timing of the ticks isn't
sufficiently randomized when you restart lots of OSDs concurrently (for
instance via pdsh).
On my cluster, client writes/second take a significant hit once more
than about four or five PGs are deep-scrubbing concurrently. When
concurrent deep-scrubs get into the teens, client writes/second drop
massively.
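Since the drag appears once more than four or five PGs are deep-scrubbing at once, a cron job could check the current count before starting anything. A minimal sketch, assuming a deep-scrubbing PG shows `scrubbing+deep` in its `ceph pg dump` row, the deep-scrub date and time are in columns 20 and 21 (version-dependent), and a hypothetical limit of 4 concurrent deep-scrubs. All function and variable names are illustrative.

```shell
#!/bin/bash
# Start deep-scrubs on the most overdue PGs only while fewer than
# MAX_DEEP_SCRUBS are already running. Names and defaults are hypothetical.
MAX_DEEP_SCRUBS=${MAX_DEEP_SCRUBS:-4}
DEEP_DATE_COL=${DEEP_DATE_COL:-20}

# Number of PG rows (pg dump on stdin) currently in deep-scrub.
count_deep_scrubs() {
    awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ && /scrubbing\+deep/ { n++ } END { print n+0 }'
}

# The $1 PGs (pg dump on stdin) with the oldest deep-scrub stamp,
# skipping PGs that are already scrubbing.
oldest_pgs() {
    awk -v col="$DEEP_DATE_COL" \
        '$1 ~ /^[0-9]+\.[0-9a-f]+$/ && !/scrubbing/ { print $col, $(col + 1), $1 }' |
        sort | head -n "$1" | awk '{ print $3 }'
}

# One scheduling pass: top up to MAX_DEEP_SCRUBS concurrent deep-scrubs.
schedule_deep_scrubs() {
    local dump running free pg
    dump=$(ceph pg dump 2>/dev/null)
    running=$(count_deep_scrubs <<< "$dump")
    free=$(( MAX_DEEP_SCRUBS - running ))
    (( free > 0 )) || return 0
    for pg in $(oldest_pgs "$free" <<< "$dump"); do
        ceph pg deep-scrub "$pg"
    done
}
```

Called once a minute from cron, `schedule_deep_scrubs` aims to hold the number of concurrent deep-scrubs near the chosen limit rather than letting it sit at zero or climb into the teens. Skipping rows already in a scrubbing state keeps the oldest PG from being requested again while it is still being scrubbed.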
Greg, is there locking involved when a PG enters deep-scrub? If so, is
the entire PG locked for the duration, or is each object inside the PG
locked as it is processed? Some of my PGs stay in deep-scrub for minutes
at a time.
[0] http://ceph.com/docs/master/dev/osd_internals/scrub/
Thanks,
Mike Dawson
On 6/9/2014 6:22 PM, Craig Lewis wrote:
I've correlated a large deep scrubbing operation to cluster stability
problems.
My primary cluster does a small number of deep scrubs all the time,
spread out over the whole week. It has no stability problems.
My secondary cluster doesn't spread them out. It saves them up and
tries to do all of the deep scrubs over the weekend. The secondary
starts losing OSDs about an hour after these deep scrubs start.
To avoid this, I'm thinking of writing a script that continuously scrubs
the oldest outstanding PG. In pseudo-bash:
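# Sort by the deep-scrub timestamp, taking the single oldest PG.
# Columns $20/$21 are the deep-scrub date and time in this cluster's
# `ceph pg dump` output; they differ between Ceph versions.
# Process substitution keeps `read` in the current shell, so ${pg}
# is still set inside the loop body (piping into `read` would not).
while read -r date time pg < <(ceph pg dump 2>/dev/null |
    awk '$1 ~ /^[0-9a-f]+\.[0-9a-f]+$/ {print $20, $21, $1}' |
    sort | head -1)
do
    ceph pg deep-scrub "${pg}"
    # Wait while any PG in the cluster is still deep-scrubbing
    while ceph status | grep -q scrubbing+deep
    do
        sleep 5
    done
    sleep 30
done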
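One caveat with a loop like this: `ceph pg deep-scrub` only instructs the PG's primary OSD to deep-scrub it, so right after the command returns the PG may not show `scrubbing+deep` yet, and the inner wait can end before the scrub has started. A variant is to wait until the chosen PG's own deep-scrub stamp changes. A minimal sketch, assuming the same column positions as above (version-dependent); `deep_stamp_of`, `deep_scrub_and_wait` and `POLL_SECONDS` are hypothetical names.

```shell
#!/bin/bash
# Deep-scrub one PG and block until its deep-scrub stamp advances.
# Column numbers are an assumption (they vary between Ceph versions).
DEEP_DATE_COL=${DEEP_DATE_COL:-20}
POLL_SECONDS=${POLL_SECONDS:-5}

# Print "date time" of the last deep-scrub of PG $1, reading a pg dump on stdin.
deep_stamp_of() {
    awk -v pg="$1" -v col="$DEEP_DATE_COL" '$1 == pg { print $col, $(col + 1) }'
}

deep_scrub_and_wait() {
    local pg=$1 before
    before=$(ceph pg dump 2>/dev/null | deep_stamp_of "$pg")
    ceph pg deep-scrub "$pg"
    # Poll until the stamp differs from the one recorded before the request.
    while [ "$(ceph pg dump 2>/dev/null | deep_stamp_of "$pg")" = "$before" ]; do
        sleep "$POLL_SECONDS"
    done
}
```

This waits indefinitely if the scrub never runs, so a real script would want a timeout.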
Does anybody think this will solve my problem?
I'm also considering disabling deep-scrubbing until the secondary
finishes replicating from the primary. Once it has caught up, the write
load should drop enough for opportunistic deep scrubs to get a chance to
run. That should take only another week or two.
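Pausing deep-scrubs cluster-wide maps onto the `nodeep-scrub` OSD flag, set with `ceph osd set nodeep-scrub` and cleared with `ceph osd unset nodeep-scrub`. A minimal sketch wrapping those two commands, plus a check that assumes the active flags are listed on a line starting with `flags` in `ceph osd dump` output; the function names are hypothetical.

```shell
#!/bin/bash
# Pause and resume cluster-wide deep-scrubbing with the nodeep-scrub flag.
pause_deep_scrubs()  { ceph osd set nodeep-scrub; }
resume_deep_scrubs() { ceph osd unset nodeep-scrub; }

# Succeeds if nodeep-scrub appears on the "flags" line of the
# `ceph osd dump` output read from stdin (line format is an assumption).
deep_scrubs_paused() {
    grep '^flags' | grep -qw 'nodeep-scrub'
}
```

Usage would be `pause_deep_scrubs` now, then `resume_deep_scrubs` once the secondary has caught up, with `ceph osd dump | deep_scrubs_paused` as a check in between.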
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com