Re: Deep-Scrub and High Read Latency with QEMU/RBD

2013-09-11 Thread Mike Dawson
I created Issue #6278 (http://tracker.ceph.com/issues/6278) to track
this problem.


Thanks,
Mike Dawson


On 8/30/2013 1:52 PM, Andrey Korolyov wrote:

On Fri, Aug 30, 2013 at 9:44 PM, Mike Dawson mike.daw...@cloudapt.com wrote:

Andrey,

I use all the defaults:

# ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show | grep scrub
   osd_scrub_thread_timeout: 60,
   osd_scrub_finalize_thread_timeout: 600,
   osd_max_scrubs: 1,


This one. I would suggest increasing max_interval and writing a small
script that triggers per-PG scrubs at low intensity, so that at most one
PG is scrubbing at any given time and you can wait a while before
scrubbing the next one; that way they will not all start scrubbing at
once when max_interval expires. I discussed throttling mechanisms for
scrubbing here or on ceph-devel some months ago, but there is still no
such implementation (it is ultimately a low-priority task, since it can
be handled by something as simple as the proposal above).


   osd_scrub_load_threshold: 0.5,
   osd_scrub_min_interval: 86400,
   osd_scrub_max_interval: 604800,
   osd_scrub_chunk_min: 5,
   osd_scrub_chunk_max: 25,
   osd_deep_scrub_interval: 604800,
   osd_deep_scrub_stride: 524288,

Which value are you referring to?


Does anyone know exactly how osd_scrub_load_threshold works? The manual
states: "The maximum CPU load. Ceph will not scrub when the CPU load is
higher than this number. Default is 50%." So on a system with multiple
processors and cores, what happens? Is the threshold a load of 0.5
(meaning half a core), or 50% of the maximum load, meaning anything less
than 8 if you have 16 cores?

Thanks,
Mike Dawson


On 8/30/2013 1:34 PM, Andrey Korolyov wrote:


You may want to reduce the number of scrubbing PGs per OSD to 1 using
the config option and check the results.

On Fri, Aug 30, 2013 at 8:03 PM, Mike Dawson mike.daw...@cloudapt.com
wrote:


We've been struggling with spikes of high I/O latency on our qemu/rbd
guests. As we've been chasing this bug, we've greatly improved the
methods we use to monitor our infrastructure.

It appears that our RBD performance chokes in two situations:

- Deep-Scrub
- Backfill/recovery

In this email, I want to focus on deep-scrub. Graphing '%util' from
'iostat -x' on my hosts with OSDs, I can see deep-scrub take my disks
from around 10% utilization to complete saturation during a scrub.

RBD writeback cache appears to cover the issue nicely, though it
occasionally suffers drops in performance (presumably when it flushes).
Reads, however, appear to suffer greatly, with multiple seconds in which
0 B/s of reads are accomplished (see the log fragment below). Assuming
deep-scrub isn't intended to create massive spindle contention, this
looks like a problem. What should happen here?

Looking at the settings around deep-scrub, I don't see an obvious way
to say "don't saturate my drives". Are there any settings in Ceph or
elsewhere (readahead?) that might lower the burden of deep-scrub?

If not, perhaps reads could be remapped to avoid waiting on saturated
disks during scrub.

Any ideas?

2013-08-30 15:47:20.166149 mon.0 [INF] pgmap v9853931: 20672 pgs: 20665
active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 5058KB/s wr, 217op/s
2013-08-30 15:47:21.945948 mon.0 [INF] pgmap v9853932: 20672 pgs: 20665
active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 5553KB/s wr, 229op/s
2013-08-30 15:47:23.205843 mon.0 [INF] pgmap v9853933: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 6580KB/s wr, 246op/s
2013-08-30 15:47:24.843308 mon.0 [INF] pgmap v9853934: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 3795KB/s wr, 224op/s
2013-08-30 15:47:25.862722 mon.0 [INF] pgmap v9853935: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 1414B/s rd, 3799KB/s wr, 181op/s
2013-08-30 15:47:26.887516 mon.0 [INF] pgmap v9853936: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 1541B/s rd, 8138KB/s wr, 160op/s
2013-08-30 15:47:27.933629 mon.0 [INF] pgmap v9853937: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 14458KB/s wr, 304op/s
2013-08-30 15:47:29.127847 mon.0 [INF] pgmap v9853938: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 15300KB/s wr, 345op/s
2013-08-30 15:47:30.344837 mon.0 [INF] pgmap v9853939: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 13128KB/s wr, 218op/s
2013-08-30 15:47:31.380089 mon.0 [INF] pgmap v9853940: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,

Deep-Scrub and High Read Latency with QEMU/RBD

2013-08-30 Thread Mike Dawson
We've been struggling with spikes of high I/O latency on our qemu/rbd
guests. As we've been chasing this bug, we've greatly improved the
methods we use to monitor our infrastructure.


It appears that our RBD performance chokes in two situations:

- Deep-Scrub
- Backfill/recovery

In this email, I want to focus on deep-scrub. Graphing '%util' from
'iostat -x' on my hosts with OSDs, I can see deep-scrub take my disks
from around 10% utilization to complete saturation during a scrub.


RBD writeback cache appears to cover the issue nicely, though it
occasionally suffers drops in performance (presumably when it flushes).
Reads, however, appear to suffer greatly, with multiple seconds in which
0 B/s of reads are accomplished (see the log fragment below). Assuming
deep-scrub isn't intended to create massive spindle contention, this
looks like a problem. What should happen here?


Looking at the settings around deep-scrub, I don't see an obvious way
to say "don't saturate my drives". Are there any settings in Ceph or
elsewhere (readahead?) that might lower the burden of deep-scrub?


If not, perhaps reads could be remapped to avoid waiting on saturated 
disks during scrub.
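
For reference, these are the scrub-related options I can find, sketched
as they would sit in ceph.conf; the values below are purely illustrative,
not a tested recommendation:

[osd]
    # concurrent scrubs per OSD (1 is already the default)
    osd max scrubs = 1
    # scrub fewer objects per chunk (default is 25)
    osd scrub chunk max = 5
    # read smaller strides during deep-scrub (default is 524288 bytes)
    osd deep scrub stride = 131072
    # spread deep-scrubs over a longer window (default is 604800 s, one week)
    osd deep scrub interval = 1209600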


Any ideas?

2013-08-30 15:47:20.166149 mon.0 [INF] pgmap v9853931: 20672 pgs: 20665 
active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 5058KB/s wr, 217op/s
2013-08-30 15:47:21.945948 mon.0 [INF] pgmap v9853932: 20672 pgs: 20665 
active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 5553KB/s wr, 229op/s
2013-08-30 15:47:23.205843 mon.0 [INF] pgmap v9853933: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 6580KB/s wr, 246op/s
2013-08-30 15:47:24.843308 mon.0 [INF] pgmap v9853934: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 3795KB/s wr, 224op/s
2013-08-30 15:47:25.862722 mon.0 [INF] pgmap v9853935: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 1414B/s rd, 3799KB/s wr, 181op/s
2013-08-30 15:47:26.887516 mon.0 [INF] pgmap v9853936: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 1541B/s rd, 8138KB/s wr, 160op/s
2013-08-30 15:47:27.933629 mon.0 [INF] pgmap v9853937: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 14458KB/s wr, 304op/s
2013-08-30 15:47:29.127847 mon.0 [INF] pgmap v9853938: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 15300KB/s wr, 345op/s
2013-08-30 15:47:30.344837 mon.0 [INF] pgmap v9853939: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 13128KB/s wr, 218op/s
2013-08-30 15:47:31.380089 mon.0 [INF] pgmap v9853940: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 0B/s rd, 13299KB/s wr, 241op/s
2013-08-30 15:47:32.388303 mon.0 [INF] pgmap v9853941: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 4951B/s rd, 8147KB/s wr, 192op/s
2013-08-30 15:47:33.858382 mon.0 [INF] pgmap v9853942: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64556 GB / 174 TB avail; 7029B/s rd, 3254KB/s wr, 190op/s
2013-08-30 15:47:35.279691 mon.0 [INF] pgmap v9853943: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64555 GB / 174 TB avail; 1651B/s rd, 2476KB/s wr, 207op/s
2013-08-30 15:47:36.309078 mon.0 [INF] pgmap v9853944: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64555 GB / 174 TB avail; 0B/s rd, 3788KB/s wr, 239op/s
2013-08-30 15:47:38.120343 mon.0 [INF] pgmap v9853945: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64555 GB / 174 TB avail; 0B/s rd, 4671KB/s wr, 239op/s
2013-08-30 15:47:39.546980 mon.0 [INF] pgmap v9853946: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64555 GB / 174 TB avail; 0B/s rd, 13487KB/s wr, 444op/s
2013-08-30 15:47:40.561203 mon.0 [INF] pgmap v9853947: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64555 GB / 174 TB avail; 0B/s rd, 15265KB/s wr, 489op/s
2013-08-30 15:47:41.794355 mon.0 [INF] pgmap v9853948: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used, 
64555 GB / 174 TB avail; 0B/s rd, 7157KB/s wr, 240op/s
2013-08-30 15:47:44.661000 mon.0 [INF] pgmap v9853949: 20672 pgs: 20664 
active+clean, 8 active+clean+scrubbing+deep; 38136 GB 

Re: Deep-Scrub and High Read Latency with QEMU/RBD

2013-08-30 Thread Andrey Korolyov
You may want to reduce the number of scrubbing PGs per OSD to 1 using
the config option and check the results.

On Fri, Aug 30, 2013 at 8:03 PM, Mike Dawson mike.daw...@cloudapt.com wrote:
 We've been struggling with spikes of high I/O latency on our qemu/rbd
 guests. As we've been chasing this bug, we've greatly improved the
 methods we use to monitor our infrastructure.

 It appears that our RBD performance chokes in two situations:

 - Deep-Scrub
 - Backfill/recovery

 In this email, I want to focus on deep-scrub. Graphing '%util' from
 'iostat -x' on my hosts with OSDs, I can see deep-scrub take my disks
 from around 10% utilization to complete saturation during a scrub.

 RBD writeback cache appears to cover the issue nicely, though it
 occasionally suffers drops in performance (presumably when it flushes).
 Reads, however, appear to suffer greatly, with multiple seconds in which
 0 B/s of reads are accomplished (see the log fragment below). Assuming
 deep-scrub isn't intended to create massive spindle contention, this
 looks like a problem. What should happen here?

 Looking at the settings around deep-scrub, I don't see an obvious way
 to say "don't saturate my drives". Are there any settings in Ceph or
 elsewhere (readahead?) that might lower the burden of deep-scrub?

 If not, perhaps reads could be remapped to avoid waiting on saturated disks
 during scrub.

 Any ideas?

 2013-08-30 15:47:20.166149 mon.0 [INF] pgmap v9853931: 20672 pgs: 20665
 active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 5058KB/s wr, 217op/s
 2013-08-30 15:47:21.945948 mon.0 [INF] pgmap v9853932: 20672 pgs: 20665
 active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 5553KB/s wr, 229op/s
 2013-08-30 15:47:23.205843 mon.0 [INF] pgmap v9853933: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 6580KB/s wr, 246op/s
 2013-08-30 15:47:24.843308 mon.0 [INF] pgmap v9853934: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 3795KB/s wr, 224op/s
 2013-08-30 15:47:25.862722 mon.0 [INF] pgmap v9853935: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 1414B/s rd, 3799KB/s wr, 181op/s
 2013-08-30 15:47:26.887516 mon.0 [INF] pgmap v9853936: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 1541B/s rd, 8138KB/s wr, 160op/s
 2013-08-30 15:47:27.933629 mon.0 [INF] pgmap v9853937: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 14458KB/s wr, 304op/s
 2013-08-30 15:47:29.127847 mon.0 [INF] pgmap v9853938: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 15300KB/s wr, 345op/s
 2013-08-30 15:47:30.344837 mon.0 [INF] pgmap v9853939: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 13128KB/s wr, 218op/s
 2013-08-30 15:47:31.380089 mon.0 [INF] pgmap v9853940: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 13299KB/s wr, 241op/s
 2013-08-30 15:47:32.388303 mon.0 [INF] pgmap v9853941: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 4951B/s rd, 8147KB/s wr, 192op/s
 2013-08-30 15:47:33.858382 mon.0 [INF] pgmap v9853942: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 7029B/s rd, 3254KB/s wr, 190op/s
 2013-08-30 15:47:35.279691 mon.0 [INF] pgmap v9853943: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64555 GB / 174 TB avail; 1651B/s rd, 2476KB/s wr, 207op/s
 2013-08-30 15:47:36.309078 mon.0 [INF] pgmap v9853944: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64555 GB / 174 TB avail; 0B/s rd, 3788KB/s wr, 239op/s
 2013-08-30 15:47:38.120343 mon.0 [INF] pgmap v9853945: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64555 GB / 174 TB avail; 0B/s rd, 4671KB/s wr, 239op/s
 2013-08-30 15:47:39.546980 mon.0 [INF] pgmap v9853946: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64555 GB / 174 TB avail; 0B/s rd, 13487KB/s wr, 444op/s
 2013-08-30 15:47:40.561203 mon.0 [INF] pgmap v9853947: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64555 GB / 174 TB avail; 0B/s rd, 15265KB/s wr, 489op/s
 2013-08-30 15:47:41.794355 mon.0 [INF] pgmap v9853948: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 

Re: Deep-Scrub and High Read Latency with QEMU/RBD

2013-08-30 Thread Mike Dawson

Andrey,

I use all the defaults:

# ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show | grep scrub
  osd_scrub_thread_timeout: 60,
  osd_scrub_finalize_thread_timeout: 600,
  osd_max_scrubs: 1,
  osd_scrub_load_threshold: 0.5,
  osd_scrub_min_interval: 86400,
  osd_scrub_max_interval: 604800,
  osd_scrub_chunk_min: 5,
  osd_scrub_chunk_max: 25,
  osd_deep_scrub_interval: 604800,
  osd_deep_scrub_stride: 524288,

Which value are you referring to?


Does anyone know exactly how osd_scrub_load_threshold works? The manual
states: "The maximum CPU load. Ceph will not scrub when the CPU load is
higher than this number. Default is 50%." So on a system with multiple
processors and cores, what happens? Is the threshold a load of 0.5
(meaning half a core), or 50% of the maximum load, meaning anything less
than 8 if you have 16 cores?
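
If the check simply compares the threshold against the first value of
getloadavg(), i.e. the raw 1-minute system load average rather than a
per-core percentage (that interpretation is only my assumption), then
this little test shows the number a host would be judged against:

# hedged sketch: assumes osd_scrub_load_threshold is compared against the
# raw 1-minute load average, not a per-core percentage
awk -v t=0.5 '{
    if ($1 < t)
        print "1-min load", $1, "is below", t, "-> scrubbing would be allowed"
    else
        print "1-min load", $1, "is at or above", t, "-> scrubbing would be deferred"
}' /proc/loadavg

Either way, as far as I can tell this only gates whether a scheduled
scrub starts; it does not throttle a scrub that is already running.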


Thanks,
Mike Dawson

On 8/30/2013 1:34 PM, Andrey Korolyov wrote:

You may want to reduce the number of scrubbing PGs per OSD to 1 using
the config option and check the results.

On Fri, Aug 30, 2013 at 8:03 PM, Mike Dawson mike.daw...@cloudapt.com wrote:

We've been struggling with spikes of high I/O latency on our qemu/rbd
guests. As we've been chasing this bug, we've greatly improved the
methods we use to monitor our infrastructure.

It appears that our RBD performance chokes in two situations:

- Deep-Scrub
- Backfill/recovery

In this email, I want to focus on deep-scrub. Graphing '%util' from
'iostat -x' on my hosts with OSDs, I can see deep-scrub take my disks
from around 10% utilization to complete saturation during a scrub.

RBD writeback cache appears to cover the issue nicely, though it
occasionally suffers drops in performance (presumably when it flushes).
Reads, however, appear to suffer greatly, with multiple seconds in which
0 B/s of reads are accomplished (see the log fragment below). Assuming
deep-scrub isn't intended to create massive spindle contention, this
looks like a problem. What should happen here?

Looking at the settings around deep-scrub, I don't see an obvious way
to say "don't saturate my drives". Are there any settings in Ceph or
elsewhere (readahead?) that might lower the burden of deep-scrub?

If not, perhaps reads could be remapped to avoid waiting on saturated disks
during scrub.

Any ideas?

2013-08-30 15:47:20.166149 mon.0 [INF] pgmap v9853931: 20672 pgs: 20665
active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 5058KB/s wr, 217op/s
2013-08-30 15:47:21.945948 mon.0 [INF] pgmap v9853932: 20672 pgs: 20665
active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 5553KB/s wr, 229op/s
2013-08-30 15:47:23.205843 mon.0 [INF] pgmap v9853933: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 6580KB/s wr, 246op/s
2013-08-30 15:47:24.843308 mon.0 [INF] pgmap v9853934: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 3795KB/s wr, 224op/s
2013-08-30 15:47:25.862722 mon.0 [INF] pgmap v9853935: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 1414B/s rd, 3799KB/s wr, 181op/s
2013-08-30 15:47:26.887516 mon.0 [INF] pgmap v9853936: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 1541B/s rd, 8138KB/s wr, 160op/s
2013-08-30 15:47:27.933629 mon.0 [INF] pgmap v9853937: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 14458KB/s wr, 304op/s
2013-08-30 15:47:29.127847 mon.0 [INF] pgmap v9853938: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 15300KB/s wr, 345op/s
2013-08-30 15:47:30.344837 mon.0 [INF] pgmap v9853939: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 13128KB/s wr, 218op/s
2013-08-30 15:47:31.380089 mon.0 [INF] pgmap v9853940: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 0B/s rd, 13299KB/s wr, 241op/s
2013-08-30 15:47:32.388303 mon.0 [INF] pgmap v9853941: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 4951B/s rd, 8147KB/s wr, 192op/s
2013-08-30 15:47:33.858382 mon.0 [INF] pgmap v9853942: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64556 GB / 174 TB avail; 7029B/s rd, 3254KB/s wr, 190op/s
2013-08-30 15:47:35.279691 mon.0 [INF] pgmap v9853943: 20672 pgs: 20664
active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
64555 GB / 174 TB avail; 1651B/s rd, 2476KB/s wr, 207op/s
2013-08-30 15:47:36.309078 mon.0 [INF] pgmap v9853944: 20672 pgs: 20664
active+clean, 8 

Re: Deep-Scrub and High Read Latency with QEMU/RBD

2013-08-30 Thread Andrey Korolyov
On Fri, Aug 30, 2013 at 9:44 PM, Mike Dawson mike.daw...@cloudapt.com wrote:
 Andrey,

 I use all the defaults:

 # ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok config show | grep scrub
   osd_scrub_thread_timeout: 60,
   osd_scrub_finalize_thread_timeout: 600,
   osd_max_scrubs: 1,

This one. I would suggest increasing max_interval and writing a small
script that triggers per-PG scrubs at low intensity, so that at most one
PG is scrubbing at any given time and you can wait a while before
scrubbing the next one; that way they will not all start scrubbing at
once when max_interval expires. I discussed throttling mechanisms for
scrubbing here or on ceph-devel some months ago, but there is still no
such implementation (it is ultimately a low-priority task, since it can
be handled by something as simple as the proposal above).
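
Roughly along these lines (an untested sketch; the pause length and the
pgid filter are placeholders you would want to adjust):

#!/bin/bash
# Untested sketch: request a deep scrub of every PG, one at a time,
# pausing between requests so that at most ~one PG is scrubbing at once.
PAUSE=300   # seconds to wait before kicking the next PG (placeholder)

ceph pg dump 2>/dev/null | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ { print $1 }' |
while read -r pgid; do
    echo "deep-scrubbing ${pgid}"
    ceph pg deep-scrub "${pgid}"
    sleep "${PAUSE}"
done

A smarter version would poll 'ceph pg dump' and wait until no PG is in
the scrubbing+deep state before moving on, instead of sleeping for a
fixed time.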

   osd_scrub_load_threshold: 0.5,
   osd_scrub_min_interval: 86400,
   osd_scrub_max_interval: 604800,
   osd_scrub_chunk_min: 5,
   osd_scrub_chunk_max: 25,
   osd_deep_scrub_interval: 604800,
   osd_deep_scrub_stride: 524288,

 Which value are you referring to?


 Does anyone know exactly how osd_scrub_load_threshold works? The manual
 states: "The maximum CPU load. Ceph will not scrub when the CPU load is
 higher than this number. Default is 50%." So on a system with multiple
 processors and cores, what happens? Is the threshold a load of 0.5
 (meaning half a core), or 50% of the maximum load, meaning anything less
 than 8 if you have 16 cores?

 Thanks,
 Mike Dawson


 On 8/30/2013 1:34 PM, Andrey Korolyov wrote:

 You may want to reduce the number of scrubbing PGs per OSD to 1 using
 the config option and check the results.

 On Fri, Aug 30, 2013 at 8:03 PM, Mike Dawson mike.daw...@cloudapt.com
 wrote:

 We've been struggling with spikes of high I/O latency on our qemu/rbd
 guests. As we've been chasing this bug, we've greatly improved the
 methods we use to monitor our infrastructure.

 It appears that our RBD performance chokes in two situations:

 - Deep-Scrub
 - Backfill/recovery

 In this email, I want to focus on deep-scrub. Graphing '%util' from
 'iostat -x' on my hosts with OSDs, I can see deep-scrub take my disks
 from around 10% utilization to complete saturation during a scrub.

 RBD writeback cache appears to cover the issue nicely, though it
 occasionally suffers drops in performance (presumably when it flushes).
 Reads, however, appear to suffer greatly, with multiple seconds in which
 0 B/s of reads are accomplished (see the log fragment below). Assuming
 deep-scrub isn't intended to create massive spindle contention, this
 looks like a problem. What should happen here?

 Looking at the settings around deep-scrub, I don't see an obvious way
 to say "don't saturate my drives". Are there any settings in Ceph or
 elsewhere (readahead?) that might lower the burden of deep-scrub?

 If not, perhaps reads could be remapped to avoid waiting on saturated
 disks during scrub.

 Any ideas?

 2013-08-30 15:47:20.166149 mon.0 [INF] pgmap v9853931: 20672 pgs: 20665
 active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 5058KB/s wr, 217op/s
 2013-08-30 15:47:21.945948 mon.0 [INF] pgmap v9853932: 20672 pgs: 20665
 active+clean, 7 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 5553KB/s wr, 229op/s
 2013-08-30 15:47:23.205843 mon.0 [INF] pgmap v9853933: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 6580KB/s wr, 246op/s
 2013-08-30 15:47:24.843308 mon.0 [INF] pgmap v9853934: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 3795KB/s wr, 224op/s
 2013-08-30 15:47:25.862722 mon.0 [INF] pgmap v9853935: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 1414B/s rd, 3799KB/s wr, 181op/s
 2013-08-30 15:47:26.887516 mon.0 [INF] pgmap v9853936: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 1541B/s rd, 8138KB/s wr, 160op/s
 2013-08-30 15:47:27.933629 mon.0 [INF] pgmap v9853937: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 14458KB/s wr, 304op/s
 2013-08-30 15:47:29.127847 mon.0 [INF] pgmap v9853938: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 15300KB/s wr, 345op/s
 2013-08-30 15:47:30.344837 mon.0 [INF] pgmap v9853939: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 13128KB/s wr, 218op/s
 2013-08-30 15:47:31.380089 mon.0 [INF] pgmap v9853940: 20672 pgs: 20664
 active+clean, 8 active+clean+scrubbing+deep; 38136 GB data, 111 TB used,
 64556 GB / 174 TB avail; 0B/s rd, 13299KB/s wr, 241op/s
 2013-08-30 15:47:32.388303