Hi JC, 

I am running Ceph 0.87.1 on an Ubuntu 12.04 LTS server with the latest 
patches. I am, however, running kernel version 3.19.3 rather than the stock 
distro kernel. 
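
For completeness, this is how I confirmed the versions on the OSD nodes 
(nothing Ceph-specific beyond the first command): 

# installed Ceph release 
ceph --version 

# distro release and running kernel 
lsb_release -d 
uname -r 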

I am running cfq on all spindles and noop on all SSDs (used for journals). 
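
In case it is useful, this is roughly how I check and set the elevator per 
device (just a sketch; sdb/sdc are example device names here, and an echo 
like this does not persist across reboots unless it goes into rc.local or a 
udev rule): 

# show the available schedulers; the active one is shown in brackets 
cat /sys/block/sdb/queue/scheduler 

# spinning OSD disk: cfq (as far as I know, the osd_disk_thread_ioprio_* 
# options only take effect with cfq) 
echo cfq > /sys/block/sdb/queue/scheduler 

# SSD journal device: noop 
echo noop > /sys/block/sdc/queue/scheduler 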

I haven't set any scrub-specific options, but I will try your suggestions 
and see if they make a difference. 
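
When I do, the plan is to add something along these lines to the [osd] 
section of ceph.conf (values are only a starting point based on your 
suggestions, not something I have tested yet): 

[osd] 
# scrub in smaller chunks so client I/O gets a look-in 
osd scrub chunk min = 1 
osd scrub chunk max = 5 
# throttle the scrubber with a short sleep between chunks 
osd scrub sleep = 0.5 
# optionally read smaller stripes during deep scrub; needs tuning to the disks 
#osd deep scrub stride = 131072 

I will probably inject the values at runtime first, to see the effect 
without restarting the OSDs, e.g.: 

ceph tell osd.* injectargs '--osd_scrub_sleep 0.5 --osd_scrub_chunk_max 5' 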

Thanks for your feedback 

Andrei 

----- Original Message -----

> From: "LOPEZ Jean-Charles" <jelo...@redhat.com>
> To: "Andrei Mikhailovsky" <and...@arhont.com>
> Cc: "LOPEZ Jean-Charles" <jelo...@redhat.com>,
> ceph-users@lists.ceph.com
> Sent: Saturday, 11 April, 2015 7:54:18 PM
> Subject: Re: [ceph-users] deep scrubbing causes osd down

> Hi Andrei,

> 1) what ceph version are you running?
> 2) what distro and version are you running?
> 3) have you checked the disk elevator for the OSD devices to be set
> to cfq?
> 4) Have you considered exploring the following parameters for
> further tuning:
> - osd_scrub_chunk_min: lower the default value of 5, e.g. to 1
> - osd_scrub_chunk_max: lower the default value of 25, e.g. to 5

> - osd_deep_scrub_stride: if you have lowered the parameters above,
> you can play with this one to best fit your physical disk behaviour.
> - osd_scrub_sleep: introduce a sleep between two scrub chunks, e.g.
> = 0.5 to start with a half-second delay

> Cheers
> JC

> > On 10 Apr 2015, at 12:01, Andrei Mikhailovsky < and...@arhont.com >
> > wrote:
> 

> > Hi guys,
> 

> > I was wondering if anyone has noticed that the deep scrubbing
> > process causes some OSDs to go down?
> 

> > I have been keeping an eye on a few remaining stability issues in
> > my test cluster. One of the unsolved issues is the occasional
> > reporting of OSDs going down and coming back up after about 20-30
> > seconds. This happens to various OSDs throughout the cluster. I
> > have a small cluster of just 2 OSD servers with 9 OSDs each.
> 

> > The common trend that I see week after week is that whenever there
> > is long deep scrubbing activity on the cluster, it triggers one or
> > more OSDs to go down for a short period of time. After the OSD is
> > marked down, it comes back up after about 20 seconds. Obviously a
> > repair process then kicks in, which causes more load on the
> > cluster. Looking at the logs, I've not seen the OSDs being marked
> > down when the cluster is not deep scrubbing. It _always_ happens
> > when there is deep scrub activity. I am seeing the reports of OSDs
> > going down about 3-4 times a week.
> 

> > The latest happened just recently with the following log entries:
> 

> > 2015-04-10 19:32:48.330430 mon.0 192.168.168.13:6789/0 3441533 :
> > cluster [INF] pgmap v50849466: 8508 pgs: 8506 active+clean, 2
> > active+clean+scrubbing+deep; 13213 GB data, 26896 GB used, 23310 GB
> > / 50206 GB avail; 1005 B/s rd, 1005
> 
> > B/s wr, 0 op/s
> 
> > 2015-04-10 19:32:52.950633 mon.0 192.168.168.13:6789/0 3441542 :
> > cluster [INF] osd.6 192.168.168.200:6816/3738 failed (5 reports
> > from
> > 5 peers after 60.747890 >= grace 46.701350)
> 
> > 2015-04-10 19:32:53.121904 mon.0 192.168.168.13:6789/0 3441544 :
> > cluster [INF] osdmap e74309: 18 osds: 17 up, 18 in
> 
> > 2015-04-10 19:32:53.231730 mon.0 192.168.168.13:6789/0 3441545 :
> > cluster [INF] pgmap v50849467: 8508 pgs: 599 stale+active+clean,
> > 7907 active+clean, 1 stale+active+clean+scrubbing+deep, 1
> > active+clean+scrubbing+deep; 13213 GB data, 26896 GB used, 23310 GB
> > / 50206 GB avail; 375 B/s rd, 0 op/s
> 

> > osd.6 logs around the same time are:
> 

> > 2015-04-10 19:16:29.110617 7fad6d5ec700 0 log_channel(default) log
> > [INF] : 5.3d7 deep-scrub ok
> 
> > 2015-04-10 19:27:47.561389 7fad6bde9700 0 log_channel(default) log
> > [INF] : 5.276 deep-scrub ok
> 
> > 2015-04-10 19:31:11.611321 7fad6d5ec700 0 log_channel(default) log
> > [INF] : 5.287 deep-scrub ok
> 
> > 2015-04-10 19:31:53.339881 7fad7ce0b700 1 heartbeat_map is_healthy
> > 'OSD::osd_op_tp thread 0x7fad735f8700' had timed out after 15
> 
> > 2015-04-10 19:31:53.339887 7fad7ce0b700 1 heartbeat_map is_healthy
> > 'OSD::osd_op_tp thread 0x7fad745fa700' had timed out after 15
> 
> > 2015-04-10 19:31:53.339890 7fad7ce0b700 1 heartbeat_map is_healthy
> > 'OSD::osd_op_tp thread 0x7fad705f2700' had timed out after 15
> 
> > 2015-04-10 19:31:53.340050 7fad7e60e700 1 heartbeat_map is_healthy
> > 'OSD::osd_op_tp thread 0x7fad735f8700' had timed out after 15
> 
> > 2015-04-10 19:31:53.340053 7fad7e60e700 1 heartbeat_map is_healthy
> > 'OSD::osd_op_tp thread 0x7fad745fa700' had timed out after 15
> 

> > [.....]
> 

> > 2015-04-10 19:32:53.010609 7fad7e60e700 1 heartbeat_map is_healthy
> > 'FileStore::op_tp thread 0x7fad86132700' had timed out after 60
> 
> > 2015-04-10 19:32:53.010611 7fad7e60e700 1 heartbeat_map is_healthy
> > 'FileStore::op_tp thread 0x7fad88937700' had timed out after 60
> 
> > 2015-04-10 19:32:53.111470 7fad66ed2700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6837/4409
> > pipe(0x2b793b80 sd=179 :6817 s=2 pgs=5 cs=1 l=0 c=0x21e8b420).fault
> > with nothing to send, going to standby
> 
> > 2015-04-10 19:32:53.111496 7fad6329d700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6827/4208
> > pipe(0x2b793600 sd=172 :6817 s=2 pgs=7 cs=1 l=0 c=0x1791ab00).fault
> > with nothing to send, going to standby
> 
> > 2015-04-10 19:32:53.111463 7fad55bd0700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6822/3910
> > pipe(0x2cb55dc0 sd=262 :6817 s=2 pgs=8 cs=1 l=0 c=0xe7802c0).fault
> > with nothing to send, going to standby
> 
> > 2015-04-10 19:32:53.121815 7fad6218c700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6807/3575
> > pipe(0x2cf8e080 sd=294 :6817 s=2 pgs=4 cs=1 l=0 c=0x138669a0).fault
> > with nothing to send, going to standby
> 
> > 2015-04-10 19:32:53.121856 7fad67bdf700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6842/4442
> > pipe(0x2b792580 sd=190 :6817 s=2 pgs=9 cs=1 l=0 c=0x138922c0).fault
> > with nothing to send, going to standby
> 
> > 2015-04-10 19:32:53.123545 7fad651bc700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6801/3053
> > pipe(0x15e538c0 sd=260 :6817 s=2 pgs=1 cs=1 l=0 c=0x16bf09a0).fault
> > with nothing to send, going to standby
> 
> > 2015-04-10 19:32:53.128729 7fad53eb3700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6832/4257
> > pipe(0x37dcb80 sd=311 :6817 s=2 pgs=3 cs=1 l=0 c=0x1131f420).fault
> > with nothing to send, going to standby
> 
> > 2015-04-10 19:32:53.132691 7fad53fb4700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6812/3674
> > pipe(0x37dbdc0 sd=417 :6817 s=2 pgs=3 cs=1 l=0 c=0x1c0badc0).fault
> > with nothing to send, going to standby
> 
> > 2015-04-10 19:32:53.139856 7fad65cc7700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6817/3864
> > pipe(0x2cf8eb80 sd=365 :6817 s=2 pgs=5 cs=1 l=0 c=0xbd1f2c0).fault
> > with nothing to send, going to standby
> 

> > [followed by about 500 or so lines which look similar to:]
> 
> > 2015-04-10 19:33:09.021252 7fad65cc7700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6842/4442
> > pipe(0x2b792580 sd=24 :45523 s=2 pgs=1721 cs=3371 l=0
> > c=0x138922c0).fault, initiating reconnect
> 
> > 2015-04-10 19:33:09.021280 7fad53eb3700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6807/3575
> > pipe(0x2cf8e080 sd=37 :51859 s=2 pgs=1755 cs=3449 l=0
> > c=0x138669a0).fault, initiating reconnect
> 
> > 2015-04-10 19:33:09.021443 7fad66ed2700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6812/3674
> > pipe(0x37dbdc0 sd=18 :46453 s=2 pgs=1782 cs=3489 l=0
> > c=0x1c0badc0).fault, initiating reconnect
> 
> > 2015-04-10 19:33:09.021702 7fad6218c700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6837/4409
> > pipe(0x2b793b80 sd=38 :53724 s=2 pgs=1757 cs=3449 l=0
> > c=0x21e8b420).fault, initiating reconnect
> 

> > [followed by:]
> 

> > 2015-04-10 19:33:09.021830 7fad8e142700 0 log_channel(default) log
> > [WRN] : 6 slow requests, 6 included below; oldest blocked for >
> > 94.284115 secs
> 
> > 2015-04-10 19:33:09.021834 7fad67bdf700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6817/3864
> > pipe(0x2cf8eb80 sd=29 :50115 s=2 pgs=1717 cs=3395 l=0
> > c=0xbd1f2c0).fault, initiating reconnect
> 
> > 2015-04-10 19:33:09.021840 7fad8e142700 0 log_channel(default) log
> > [WRN] : slow request 62.902554 seconds old, received at 2015-04-10
> > 19:32:06.119164: osd_op(client.64965167.0:1562644
> > rbd_data.1f264b2ae8944a.0000000000000400 [set-alloc-hint
> > object_size
> > 4194304 write_size 4194304,write 1155072~4096] 5.f4cd34bb
> > ack+ondisk+write+known_if_redirected e74308) currently waiting for
> > subops from 13
> 
> > 2015-04-10 19:33:09.021846 7fad8e142700 0 log_channel(default) log
> > [WRN] : slow request 63.004012 seconds old, received at 2015-04-10
> > 19:32:06.017706: osd_op(client.69637439.0:278581863
> > rbd_data.265f967a5f7514.0000000000064e2e [set-alloc-hint
> > object_size
> > 4194304 write_size 4194304,write 1040384~4096] 5.bf3d2652 snapc
> > ac=[ac] ack+ondisk+write+known_if_redirected e74308) currently
> > waiting for subops from 11
> 
> > 2015-04-10 19:33:09.021851 7fad8e142700 0 log_channel(default) log
> > [WRN] : slow request 94.284115 seconds old, received at 2015-04-10
> > 19:31:34.737603: osd_sub_op(unknown.0.0:0 5.299 0//0//-1
> > [scrub-map]
> > v 0'0 snapset=0=[]:[] snapc=0=[]) currently started
> 
> > 2015-04-10 19:33:09.021856 7fad8e142700 0 log_channel(default) log
> > [WRN] : slow request 92.669022 seconds old, received at 2015-04-10
> > 19:31:36.352696: osd_op(client.69645050.0:2750479
> > rbd_data.1b704a8f51a837.000000000000023d [stat,set-alloc-hint
> > object_size 4194304 write_size 4194304,write 1449984~4096]
> > 5.44a257d7 snapc 9cc=[9cc,9c1,9b5,9a9,99d,991,984,972,953,2f8]
> > ack+ondisk+write+known_if_redirected e74308) currently waiting for
> > subops from 17
> 
> > 2015-04-10 19:33:09.021861 7fad8e142700 0 log_channel(default) log
> > [WRN] : slow request 76.733358 seconds old, received at 2015-04-10
> > 19:31:52.288360: osd_op(client.64987185.0:26635606
> > rbd_data.1b652a2ae8944a.0000000000000623 [set-alloc-hint
> > object_size
> > 4194304 write_size 4194304,write 1282048~4096] 5.64d42a36
> > ack+ondisk+write+known_if_redirected e74308) currently waiting for
> > subops from 17
> 

> > [and finally]
> 

> > 2015-04-10 19:33:09.024361 7fad5c83b700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6832/4257
> > pipe(0x37dcb80 sd=35 :55060 s=2 pgs=1719 cs=3359 l=0
> > c=0x1131f420).fault, initiating reconnect
> 
> > 2015-04-10 19:33:09.024418 7fad82616700 0 log_channel(default) log
> > [WRN] : map e74309 wrongly marked me down
> 
> > 2015-04-10 19:33:09.024466 7fad53fb4700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6827/4208
> > pipe(0x2b793600 sd=33 :36954 s=2 pgs=1731 cs=3403 l=0
> > c=0x1791ab00).fault, initiating reconnect
> 
> > 2015-04-10 19:33:09.024476 7fad67bdf700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6817/3864
> > pipe(0x2cf8eb80 sd=24 :50133 s=2 pgs=1720 cs=3401 l=0
> > c=0xbd1f2c0).fault, initiating reconnect
> 
> > 2015-04-10 19:33:09.024553 7fad65cc7700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6842/4442
> > pipe(0x2b792580 sd=29 :45548 s=2 pgs=1725 cs=3379 l=0
> > c=0x138922c0).fault, initiating reconnect
> 
> > 2015-04-10 19:33:09.024629 7fad53eb3700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6807/3575
> > pipe(0x2cf8e080 sd=37 :51884 s=2 pgs=1759 cs=3457 l=0
> > c=0x138669a0).fault, initiating reconnect
> 
> > 2015-04-10 19:33:09.024783 7fad650bb700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6842/4442
> > pipe(0x2b792580 sd=29 :45548 s=1 pgs=1725 cs=3380 l=0
> > c=0x138922c0).fault
> 
> > 2015-04-10 19:33:09.025056 7fad6218c700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6837/4409
> > pipe(0x2b793b80 sd=38 :53747 s=2 pgs=1761 cs=3457 l=0
> > c=0x21e8b420).fault, initiating reconnect
> 
> > 2015-04-10 19:33:09.025062 7fad61580700 0 --
> > 192.168.168.200:6817/3738 >> 192.168.168.201:6817/3864
> > pipe(0x2cf8eb80 sd=24 :50140 s=4 pgs=1720 cs=3402 l=0
> > c=0xbd1f2c0).connect got RESETSESSION but no longer connecting
> 
> > 2015-04-10 19:33:16.999741 7fad5f964700 0 --
> > 192.168.168.200:6846/1003738 >> 192.168.168.200:6837/4266
> > pipe(0xa711580 sd=257 :6846 s=0 pgs=0 cs=0 l=0 c=0x1afbf840).accept
> > connect_seq 0 vs existing 0 state connecting
> 
> > 2015-04-10 19:33:16.999883 7fad5f863700 0 --
> > 192.168.168.200:6846/1003738 >> 192.168.168.200:6827/3938
> > pipe(0x2c48a8c0 sd=260 :6846 s=0 pgs=0 cs=0 l=0
> > c=0x18a85580).accept
> > connect_seq 0 vs existing 0 state connecting
> 
> > 2015-04-10 19:36:26.247402 7fad6bde9700 0 log_channel(default) log
> > [INF] : 5.299 deep-scrub ok
> 
> > 2015-04-10 19:40:18.577423 7fad725f6700 0 <cls>
> > cls/rgw/cls_rgw.cc:1458: gc_iterate_entries
> > end_key=1_01428691218.577419000
> 

> > 2015-04-10 19:42:22.709291 7fad6edef700 0 log_channel(default) log
> > [INF] : 5.2ae deep-scrub ok
> 
> > 2015-04-10 19:45:51.790224 7fad6bde9700 0 log_channel(default) log
> > [INF] : 5.2bb deep-scrub ok
> 
> > 2015-04-10 19:48:17.022413 7fad5f863700 0 --
> > 192.168.168.200:6846/1003738 >> 192.168.168.200:6827/3938
> > pipe(0x2c48a8c0 sd=260 :6846 s=2 pgs=63 cs=1 l=0 c=0x3f81160).fault
> > with nothing to send, going to standby
> 

> > In the past I have been asked to verify that the networking is
> > working okay. I've run days and days of various tests and verified
> > that the network is not the cause of the problems. The network is
> > running perfectly well, with no errors reported on the interfaces
> > and no delays. I ran a multitude of TCP and UDP tests, including
> > firing hundreds to thousands of connections per second for weeks,
> > with 0% packet loss and no other network-related errors.
> 

> > Does anyone have any idea what could be causing the OSDs to be
> > marked down during the deep scrub activity?
> 

> > I have enabled the following tuning options in the ceph.conf:
> 

> > # change to decrease rebuild/recovery priority (should make guest
> > # VMs more responsive) and decrease scrubbing priority
> 
> > osd recovery max chunk = 8388608
> 
> > osd recovery op priority = 1
> 
> > osd max backfills = 1
> 
> > osd recovery max active = 1
> 
> > osd recovery threads = 1
> 
> > osd_disk_thread_ioprio_priority = 7
> 
> > osd_disk_thread_ioprio_class = idle
> 

> > # reporting of down OSDs
> 
> > mon osd min down reporters = 6
> 
> > mon osd report timeout = 1800
> 
> > mon osd min down reports = 7
> 
> > osd heartbeat grace = 60
> 

> > Many thanks for your help
> 

> > Andrei
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
