Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-21 Thread Charles Alva
Got it. Thanks for the explanation, Jason!

Kind regards,

Charles Alva
Sent from Gmail Mobile


On Tue, May 21, 2019 at 5:16 PM Jason Dillaman  wrote:

> On Tue, May 21, 2019 at 12:03 PM Charles Alva 
> wrote:
> >
> > Hi Jason,
> >
> > Should we disable fstrim services inside VMs which run on top of RBD?
>
> It has the potential to be a thundering herd issue if you have lots of
> VMs all issuing discards at the same time and your RBD images do
> not have object-map enabled. With object-map enabled, the discards
> will just get ignored if the backing objects do not already exist. You
> could hit a similar issue if you have hundreds or thousands of VMs all
> running a scheduled IO-heavy task at the same time (e.g. a yum/apt
> update every week at midnight), so it's not really tied to discard (or
> even Ceph/RBD) but more of a peak IOPS capacity issue.
>
> > I recall Ubuntu has a weekly fstrim cron job enabled by default, while
> we have to enable the fstrim service manually on Debian and CentOS.
> >
> >
> > Kind regards,
> >
> > Charles Alva
> > Sent from Gmail Mobile
> >
> > On Tue, May 21, 2019, 4:49 AM Jason Dillaman 
> wrote:
> >>
> >> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin  wrote:
> >> >
> >> > Hello cephers,
> >> >
> >> > we have a few systems which utilize an rbd-nbd map/mount to get access
> to an RBD volume.
> >> > (This problem seems to be related to "[ceph-users] Slow requests from
> bluestore osds" (the original thread))
> >> >
> >> > Unfortunately, the rbd-nbd device of one system has crashed on three
> Mondays in a row at ~00:00 when the systemd fstrim timer executes "fstrim -av"
> >> > (which runs in parallel to deep scrub operations)
> >>
> >> That's probably not a good practice if you have lots of VMs doing this
> >> at the same time *and* you are not using object-map. The reason is
> >> that "fstrim" could discard huge extents that result around a thousand
> >> concurrent remove/truncate/zero ops per image being thrown at your
> >> cluster.
> >>
> >> > After that, the device constantly reports I/O errors every time the
> filesystem is accessed.
> >> > Unmounting, remapping and mounting helped to get the
> filesystem/device back into business :-)
> >>
> >> If the cluster was being DDoSed by the fstrims, the VM OSes might
> >> have timed out, thinking there was a controller failure.
> >>
> >> > Manual 30-minute stress tests using the following fio command did not
> produce any problems on the client side
> >> > (Ceph storage reported some slow requests while testing).
> >> >
> >> > fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
> --name=test --filename=test --bs=4k --iodepth=64 --size=4G
> --readwrite=randrw --rwmixread=50 --numjobs=50 --loops=10
> >> >
> >> > It seems that others also experienced this problem:
> https://ceph-users.ceph.narkive.com/2FIfyx1U/rbd-nbd-timeout-and-crash
> >> > The change for setting device timeouts does not yet seem to have been
> merged into Luminous.
> >> > Experiments setting the timeout manually after mapping using
> https://github.com/OnApp/nbd-kernel_mod/blob/master/nbd_set_timeout.c
> haven't changed the situation.
> >> >
> >> > Do you have suggestions how to analyze/solve the situation?
> >> >
> >> > Regards
> >> > Marc
> >> > 
> >> >
> >> >
> >> >
> >> > The client kernel throws messages like this:
> >> >
> >> > May 19 23:59:01 int-nfs-001 CRON[836295]: (root) CMD (command -v
> debian-sa1 > /dev/null && debian-sa1 60 2)
> >> > May 20 00:00:30 int-nfs-001 systemd[1]: Starting Discard unused
> blocks...
> >> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623582] block nbd0:
> Connection timed out
> >> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623613] block nbd0:
> shutting down sockets
> >> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623617] print_req_error:
> I/O error, dev nbd0, sector 84082280
> >> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623632] block nbd0:
> Connection timed out
> >> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623636] print_req_error:
> I/O error, dev nbd0, sector 92470887
> >> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623642] block nbd0:
> Connection timed out
> >> >
> >> > Ceph throws messages like this:
> >> >
> >> > 2019-05-20 00:00:00.000124 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173572 : cluster [INF] overall HEALTH_OK
> >> > 2019-05-20 00:00:54.249998 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173586 : cluster [WRN] Health check failed: 644 slow requests are blocked >
> 32 sec. Implicated osds 51 (REQUEST_SLOW)
> >> > 2019-05-20 00:01:00.330566 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173587 : cluster [WRN] Health check update: 594 slow requests are blocked >
> 32 sec. Implicated osds 51 (REQUEST_SLOW)
> >> > 2019-05-20 00:01:09.768476 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173591 : cluster [WRN] Health check update: 505 slow requests are blocked >
> 32 sec. Implicated osds 51 (REQUEST_SLOW)
> >> > 2019-05-20 00:01:14.768769 mon.ceph-mon-s43 mon.0 

Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-21 Thread Jason Dillaman
On Tue, May 21, 2019 at 12:03 PM Charles Alva  wrote:
>
> Hi Jason,
>
> Should we disable fstrim services inside VMs which run on top of RBD?

It has the potential to be a thundering herd issue if you have lots of
VMs all issuing discards at the same time and your RBD images do not
have object-map enabled. With object-map enabled, the discards will
just get ignored if the backing objects do not already exist. You
could hit a similar issue if you have hundreds or thousands of VMs all
running a scheduled IO-heavy task at the same time (e.g. a yum/apt
update every week at midnight), so it's not really tied to discard (or
even Ceph/RBD) but more of a peak IOPS capacity issue.
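
For reference, enabling object-map on an existing image looks roughly like
this (a sketch; pool/image names are placeholders, and object-map requires
the exclusive-lock feature):

  rbd feature enable mypool/myimage exclusive-lock       # if not already enabled
  rbd feature enable mypool/myimage object-map fast-diff
  rbd object-map rebuild mypool/myimage                  # populate the map for existing data
  rbd info mypool/myimage                                # verify the feature list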

> I recall Ubuntu has a weekly fstrim cron job enabled by default, while we
> have to enable the fstrim service manually on Debian and CentOS.
>
>
> Kind regards,
>
> Charles Alva
> Sent from Gmail Mobile
>
> On Tue, May 21, 2019, 4:49 AM Jason Dillaman  wrote:
>>
>> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin  wrote:
>> >
>> > Hello cephers,
>> >
>> > we have a few systems which utilize an rbd-nbd map/mount to get access to an
>> > RBD volume.
>> > (This problem seems to be related to "[ceph-users] Slow requests from 
>> > bluestore osds" (the original thread))
>> >
>> > Unfortunately, the rbd-nbd device of one system has crashed on three Mondays
>> > in a row at ~00:00 when the systemd fstrim timer executes "fstrim -av"
>> > (which runs in parallel to deep scrub operations)
>>
>> That's probably not a good practice if you have lots of VMs doing this
>> at the same time *and* you are not using object-map. The reason is
>> that "fstrim" could discard huge extents that result around a thousand
>> concurrent remove/truncate/zero ops per image being thrown at your
>> cluster.
>>
>> > After that, the device constantly reports I/O errors every time the
>> > filesystem is accessed.
>> > Unmounting, remapping and mounting helped to get the filesystem/device 
>> > back into business :-)
>>
>> If the cluster was being DDoSed by the fstrims, the VM OSes might
>> have timed out, thinking there was a controller failure.
>>
>> > Manual 30-minute stress tests using the following fio command did not
>> > produce any problems on the client side
>> > (Ceph storage reported some slow requests while testing).
>> >
>> > fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 
>> > --name=test --filename=test --bs=4k --iodepth=64 --size=4G 
>> > --readwrite=randrw --rwmixread=50 --numjobs=50 --loops=10
>> >
>> > It seems that others also experienced this problem: 
>> > https://ceph-users.ceph.narkive.com/2FIfyx1U/rbd-nbd-timeout-and-crash
>> > The change for setting device timeouts does not yet seem to have been
>> > merged into Luminous.
>> > Experiments setting the timeout manually after mapping using
>> > https://github.com/OnApp/nbd-kernel_mod/blob/master/nbd_set_timeout.c
>> > haven't changed the situation.
>> >
>> > Do you have suggestions how to analyze/solve the situation?
>> >
>> > Regards
>> > Marc
>> > 
>> >
>> >
>> >
>> > The client kernel throws messages like this:
>> >
>> > May 19 23:59:01 int-nfs-001 CRON[836295]: (root) CMD (command -v 
>> > debian-sa1 > /dev/null && debian-sa1 60 2)
>> > May 20 00:00:30 int-nfs-001 systemd[1]: Starting Discard unused blocks...
>> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623582] block nbd0: 
>> > Connection timed out
>> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623613] block nbd0: shutting 
>> > down sockets
>> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623617] print_req_error: I/O 
>> > error, dev nbd0, sector 84082280
>> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623632] block nbd0: 
>> > Connection timed out
>> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623636] print_req_error: I/O 
>> > error, dev nbd0, sector 92470887
>> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623642] block nbd0: 
>> > Connection timed out
>> >
>> > Ceph throws messages like this:
>> >
>> > 2019-05-20 00:00:00.000124 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 
>> > 173572 : cluster [INF] overall HEALTH_OK
>> > 2019-05-20 00:00:54.249998 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 
>> > 173586 : cluster [WRN] Health check failed: 644 slow requests are blocked 
>> > > 32 sec. Implicated osds 51 (REQUEST_SLOW)
>> > 2019-05-20 00:01:00.330566 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 
>> > 173587 : cluster [WRN] Health check update: 594 slow requests are blocked 
>> > > 32 sec. Implicated osds 51 (REQUEST_SLOW)
>> > 2019-05-20 00:01:09.768476 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 
>> > 173591 : cluster [WRN] Health check update: 505 slow requests are blocked 
>> > > 32 sec. Implicated osds 51 (REQUEST_SLOW)
>> > 2019-05-20 00:01:14.768769 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 
>> > 173592 : cluster [WRN] Health check update: 497 slow requests are blocked 
>> > > 32 sec. Implicated osds 51 (REQUEST_SLOW)
>> > 2019-05-20 00:01:20.610398 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 
>> > 

Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-21 Thread Charles Alva
Hi Jason,

Should we disable fstrim services inside VMs which run on top of RBD?

I recall Ubuntu has a weekly fstrim cron job enabled by default, while we
have to enable the fstrim service manually on Debian and CentOS.
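
On systemd-based Debian/CentOS hosts it should just be the timer unit that
needs enabling (a sketch, assuming util-linux ships fstrim.timer there):

  systemctl list-timers fstrim.timer     # check whether and when it is scheduled
  systemctl enable --now fstrim.timer    # enable the weekly run
  systemctl cat fstrim.timer             # inspect OnCalendar and any delay settings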


Kind regards,

Charles Alva
Sent from Gmail Mobile

On Tue, May 21, 2019, 4:49 AM Jason Dillaman  wrote:

> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin  wrote:
> >
> > Hello cephers,
> >
> > we have a few systems which utilize an rbd-nbd map/mount to get access to
> an RBD volume.
> > (This problem seems to be related to "[ceph-users] Slow requests from
> bluestore osds" (the original thread))
> >
> > Unfortunately, the rbd-nbd device of one system has crashed on three Mondays
> in a row at ~00:00 when the systemd fstrim timer executes "fstrim -av"
> > (which runs in parallel to deep scrub operations)
>
> That's probably not a good practice if you have lots of VMs doing this
> at the same time *and* you are not using object-map. The reason is
> that "fstrim" could discard huge extents that result around a thousand
> concurrent remove/truncate/zero ops per image being thrown at your
> cluster.
>
> > After that, the device constantly reports I/O errors every time the
> filesystem is accessed.
> > Unmounting, remapping and mounting helped to get the filesystem/device
> back into business :-)
>
> If the cluster was being DDoSed by the fstrims, the VM OSes might
> have timed out, thinking there was a controller failure.
>
> > Manual 30-minute stress tests using the following fio command did not
> produce any problems on the client side
> > (Ceph storage reported some slow requests while testing).
> >
> > fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
> --name=test --filename=test --bs=4k --iodepth=64 --size=4G
> --readwrite=randrw --rwmixread=50 --numjobs=50 --loops=10
> >
> > It seems that others also experienced this problem:
> https://ceph-users.ceph.narkive.com/2FIfyx1U/rbd-nbd-timeout-and-crash
> > The change for setting device timeouts does not yet seem to have been
> merged into Luminous.
> > Experiments setting the timeout manually after mapping using
> https://github.com/OnApp/nbd-kernel_mod/blob/master/nbd_set_timeout.c
> haven't changed the situation.
> >
> > Do you have suggestions how to analyze/solve the situation?
> >
> > Regards
> > Marc
> > 
> >
> >
> >
> > The client kernel throws messages like this:
> >
> > May 19 23:59:01 int-nfs-001 CRON[836295]: (root) CMD (command -v
> debian-sa1 > /dev/null && debian-sa1 60 2)
> > May 20 00:00:30 int-nfs-001 systemd[1]: Starting Discard unused blocks...
> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623582] block nbd0:
> Connection timed out
> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623613] block nbd0:
> shutting down sockets
> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623617] print_req_error:
> I/O error, dev nbd0, sector 84082280
> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623632] block nbd0:
> Connection timed out
> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623636] print_req_error:
> I/O error, dev nbd0, sector 92470887
> > May 20 00:01:02 int-nfs-001 kernel: [1077851.623642] block nbd0:
> Connection timed out
> >
> > Ceph throws messages like this:
> >
> > 2019-05-20 00:00:00.000124 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173572 : cluster [INF] overall HEALTH_OK
> > 2019-05-20 00:00:54.249998 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173586 : cluster [WRN] Health check failed: 644 slow requests are blocked >
> 32 sec. Implicated osds 51 (REQUEST_SLOW)
> > 2019-05-20 00:01:00.330566 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173587 : cluster [WRN] Health check update: 594 slow requests are blocked >
> 32 sec. Implicated osds 51 (REQUEST_SLOW)
> > 2019-05-20 00:01:09.768476 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173591 : cluster [WRN] Health check update: 505 slow requests are blocked >
> 32 sec. Implicated osds 51 (REQUEST_SLOW)
> > 2019-05-20 00:01:14.768769 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173592 : cluster [WRN] Health check update: 497 slow requests are blocked >
> 32 sec. Implicated osds 51 (REQUEST_SLOW)
> > 2019-05-20 00:01:20.610398 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173593 : cluster [WRN] Health check update: 509 slow requests are blocked >
> 32 sec. Implicated osds 51 (REQUEST_SLOW)
> > 2019-05-20 00:01:28.721891 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173594 : cluster [WRN] Health check update: 501 slow requests are blocked >
> 32 sec. Implicated osds 51 (REQUEST_SLOW)
> > 2019-05-20 00:01:34.909842 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173596 : cluster [WRN] Health check update: 494 slow requests are blocked >
> 32 sec. Implicated osds 51 (REQUEST_SLOW)
> > 2019-05-20 00:01:44.770330 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173597 : cluster [WRN] Health check update: 500 slow requests are blocked >
> 32 sec. Implicated osds 51 (REQUEST_SLOW)
> > 2019-05-20 00:01:49.770625 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0
> 173599 : 

Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-21 Thread Jason Dillaman
On Tue, May 21, 2019 at 11:28 AM Marc Schöchlin  wrote:
>
> Hello Jason,
>
> Am 20.05.19 um 23:49 schrieb Jason Dillaman:
>
> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin  wrote:
>
> Hello cephers,
>
> we have a few systems which utilize an rbd-nbd map/mount to get access to an RBD
> volume.
> (This problem seems to be related to "[ceph-users] Slow requests from 
> bluestore osds" (the original thread))
>
> Unfortunately, the rbd-nbd device of one system has crashed on three Mondays in a
> row at ~00:00 when the systemd fstrim timer executes "fstrim -av"
> (which runs in parallel to deep scrub operations)
>
> That's probably not a good practice if you have lots of VMs doing this
> at the same time *and* you are not using object-map. The reason is
> that "fstrim" could discard huge extents that result around a thousand
> concurrent remove/truncate/zero ops per image being thrown at your
> cluster.
>
> Sure, currently we do not have lots of VMs which are capable of running fstrim on
> RBD volumes.
> But the RBD images already involved are multi-TB images with a high
> write/deletion rate.
> Therefore I am already in the process of distributing the fstrims by adding random
> delays
>
> After that, the device constantly reports I/O errors every time the
> filesystem is accessed.
> Unmounting, remapping and mounting helped to get the filesystem/device back 
> into business :-)
>
> If the cluster was being DDoSed by the fstrims, the VM OSes might
> have timed out, thinking there was a controller failure.
>
>
> Yes and no :-) Probably my problem is related to the kernel release, kernel 
> setting or the operating system release.
> Why?
>
> we run ~800 RBD images on that ceph cluster with rbd-nbd 12.2.5 in our xen 
> cluster as dom0-storage repository device without any timeout problems
> (kernel 4.4.0+10, centos 7)
> we run some 35 TB kRBD images with multiples of the write/read/deletion load of
> the crashed rbd-nbd without any timeout problems
> the timeout problem appears on two vms (ubuntu 18.04, ubuntu 16.04) which 
> utilize the described settings
>
> From my point of view, the error behavior is currently reproducible with a 
> good probability.
> Do you have suggestions how to find the root cause of this problem?

Can you provide any logs/backtraces/core dumps from the rbd-nbd process?
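
For example, something along these lines would capture them (a sketch; paths,
pool/image names, and debug levels are only illustrative):

  # in ceph.conf on the client, e.g.:
  #   [client]
  #   debug rbd = 20
  #   debug objecter = 20
  #   log file = /var/log/ceph/rbd-nbd.log
  ulimit -c unlimited          # allow core dumps in the shell that maps the device
  rbd-nbd map mypool/myimage   # remap with the debug settings in place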

> Regards
> Marc


-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-21 Thread Marc Schöchlin
Hello Jason,

Am 20.05.19 um 23:49 schrieb Jason Dillaman:
> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin  wrote:
>> Hello cephers,
>>
>> we have a few systems which utilize an rbd-nbd map/mount to get access to an
>> RBD volume.
>> (This problem seems to be related to "[ceph-users] Slow requests from 
>> bluestore osds" (the original thread))
>>
>> Unfortunately, the rbd-nbd device of one system has crashed on three Mondays in a
>> row at ~00:00 when the systemd fstrim timer executes "fstrim -av".
>> (which runs in parallel to deep scrub operations)
> That's probably not a good practice if you have lots of VMs doing this
> at the same time *and* you are not using object-map. The reason is
> that "fstrim" could discard huge extents that result around a thousand
> concurrent remove/truncate/zero ops per image being thrown at your
> cluster.
Sure, currently we do not have lots of VMs which are capable of running fstrim on
RBD volumes.
But the RBD images already involved are multi-TB images with a high
write/deletion rate.
Therefore I am already in the process of distributing the fstrims by adding random delays.
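
For example, a drop-in for the fstrim timer could look like this (a sketch;
the delay value is arbitrary, and RandomizedDelaySec needs a reasonably
recent systemd):

  mkdir -p /etc/systemd/system/fstrim.timer.d
  printf '[Timer]\nRandomizedDelaySec=6h\n' > /etc/systemd/system/fstrim.timer.d/random-delay.conf
  systemctl daemon-reload
  systemctl list-timers fstrim.timer   # the next run now gets a random offset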
>
>> After that, the device constantly reports I/O errors every time the
>> filesystem is accessed.
>> Unmounting, remapping and mounting helped to get the filesystem/device back 
>> into business :-)
> If the cluster was being DDoSed by the fstrims, the VM OSes might
> have timed out, thinking there was a controller failure.

Yes and no :-) Probably my problem is related to the kernel release, kernel 
setting or the operating system release.
Why?

  * we run ~800 RBD images on that ceph cluster with rbd-nbd 12.2.5 in our xen 
cluster as dom0-storage repository device without any timeout problems
(kernel 4.4.0+10, centos 7)
  * we run some 35 TB kRBD images with multiples of the write/read/deletion load
of the crashed rbd-nbd without any timeout problems
  * the timeout problem appears on two vms (ubuntu 18.04, ubuntu 16.04) which 
utilize the described settings

From my point of view, the error behavior is currently reproducible with a good 
probability.
Do you have suggestions how to find the root cause of this problem?

Regards
Marc

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-20 Thread Jason Dillaman
On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin  wrote:
>
> Hello cephers,
>
> we have a few systems which utilize an rbd-nbd map/mount to get access to an RBD
> volume.
> (This problem seems to be related to "[ceph-users] Slow requests from 
> bluestore osds" (the original thread))
>
> Unfortunately, the rbd-nbd device of one system has crashed on three Mondays in a
> row at ~00:00 when the systemd fstrim timer executes "fstrim -av".
> (which runs in parallel to deep scrub operations)

That's probably not a good practice if you have lots of VMs doing this
at the same time *and* you are not using object-map. The reason is
that "fstrim" could discard huge extents that result around a thousand
concurrent remove/truncate/zero ops per image being thrown at your
cluster.

> After that, the device constantly reports I/O errors every time the
> filesystem is accessed.
> Unmounting, remapping and mounting helped to get the filesystem/device back 
> into business :-)

If the cluster was being DDoSed by the fstrims, the VM OSes might
have timed out, thinking there was a controller failure.

> Manual 30-minute stress tests using the following fio command did not produce
> any problems on the client side
> (Ceph storage reported some slow requests while testing).
>
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test 
> --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw 
> --rwmixread=50 --numjobs=50 --loops=10
>
> It seems that others also experienced this problem: 
> https://ceph-users.ceph.narkive.com/2FIfyx1U/rbd-nbd-timeout-and-crash
> The change for setting device timeouts does not yet seem to have been merged into
> Luminous.
> Experiments setting the timeout manually after mapping using
> https://github.com/OnApp/nbd-kernel_mod/blob/master/nbd_set_timeout.c haven't
> changed the situation.
>
> Do you have suggestions how to analyze/solve the situation?
>
> Regards
> Marc
> 
>
>
>
> The client kernel throws messages like this:
>
> May 19 23:59:01 int-nfs-001 CRON[836295]: (root) CMD (command -v debian-sa1 > 
> /dev/null && debian-sa1 60 2)
> May 20 00:00:30 int-nfs-001 systemd[1]: Starting Discard unused blocks...
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623582] block nbd0: Connection 
> timed out
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623613] block nbd0: shutting 
> down sockets
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623617] print_req_error: I/O 
> error, dev nbd0, sector 84082280
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623632] block nbd0: Connection 
> timed out
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623636] print_req_error: I/O 
> error, dev nbd0, sector 92470887
> May 20 00:01:02 int-nfs-001 kernel: [1077851.623642] block nbd0: Connection 
> timed out
>
> Ceph throws messages like this:
>
> 2019-05-20 00:00:00.000124 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173572 
> : cluster [INF] overall HEALTH_OK
> 2019-05-20 00:00:54.249998 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173586 
> : cluster [WRN] Health check failed: 644 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:00.330566 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173587 
> : cluster [WRN] Health check update: 594 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:09.768476 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173591 
> : cluster [WRN] Health check update: 505 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:14.768769 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173592 
> : cluster [WRN] Health check update: 497 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:20.610398 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173593 
> : cluster [WRN] Health check update: 509 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:28.721891 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173594 
> : cluster [WRN] Health check update: 501 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:34.909842 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173596 
> : cluster [WRN] Health check update: 494 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:44.770330 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173597 
> : cluster [WRN] Health check update: 500 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:49.770625 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173599 
> : cluster [WRN] Health check update: 608 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:01:55.073734 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173600 
> : cluster [WRN] Health check update: 593 slow requests are blocked > 32 sec. 
> Implicated osds 51 (REQUEST_SLOW)
> 2019-05-20 00:02:04.771432 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173607 
> : cluster [WRN] Health 

Re: [ceph-users] Slow requests from bluestore osds / crashing rbd-nbd

2019-05-20 Thread Marc Schöchlin
Hello cephers,

we have a few systems which utilize an rbd-nbd map/mount to get access to an RBD
volume.
(This problem seems to be related to "[ceph-users] Slow requests from bluestore 
osds" (the original thread))

Unfortunately, the rbd-nbd device of one system has crashed on three Mondays in a
row at ~00:00 when the systemd fstrim timer executes "fstrim -av".
(which runs in parallel to deep scrub operations)

After that, the device constantly reports I/O errors every time the filesystem is
accessed.
Unmounting, remapping and mounting helped to get the filesystem/device back 
into business :-)

Manual 30-minute stress tests using the following fio command did not produce
any problems on the client side
(Ceph storage reported some slow requests while testing).

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test 
--filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw 
--rwmixread=50 --numjobs=50 --loops=10
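
(A trim-oriented fio run would be closer to what fstrim issues than the randrw
test above; a sketch only, the device name is a placeholder, and it is
destructive, so it must only be pointed at a scratch device, never at one
carrying a filesystem:)

fio --name=trimtest --filename=/dev/nbdX --rw=randtrim --bs=4M --runtime=300 --time_based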

It seems that others also experienced this problem: 
https://ceph-users.ceph.narkive.com/2FIfyx1U/rbd-nbd-timeout-and-crash
The change for setting device timeouts does not yet seem to have been merged into
Luminous.
Experiments setting the timeout manually after mapping using
https://github.com/OnApp/nbd-kernel_mod/blob/master/nbd_set_timeout.c haven't
changed the situation.

Do you have suggestions how to analyze/solve the situation?

Regards
Marc
--



The client kernel throws messages like this:

May 19 23:59:01 int-nfs-001 CRON[836295]: (root) CMD (command -v debian-sa1 > 
/dev/null && debian-sa1 60 2)
May 20 00:00:30 int-nfs-001 systemd[1]: Starting Discard unused blocks...
May 20 00:01:02 int-nfs-001 kernel: [1077851.623582] block nbd0: Connection 
timed out
May 20 00:01:02 int-nfs-001 kernel: [1077851.623613] block nbd0: shutting down 
sockets
May 20 00:01:02 int-nfs-001 kernel: [1077851.623617] print_req_error: I/O 
error, dev nbd0, sector 84082280
May 20 00:01:02 int-nfs-001 kernel: [1077851.623632] block nbd0: Connection 
timed out
May 20 00:01:02 int-nfs-001 kernel: [1077851.623636] print_req_error: I/O 
error, dev nbd0, sector 92470887
May 20 00:01:02 int-nfs-001 kernel: [1077851.623642] block nbd0: Connection 
timed out

Ceph throws messages like this:

2019-05-20 00:00:00.000124 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173572 : 
cluster [INF] overall HEALTH_OK
2019-05-20 00:00:54.249998 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173586 : 
cluster [WRN] Health check failed: 644 slow requests are blocked > 32 sec. 
Implicated osds 51 (REQUEST_SLOW)
2019-05-20 00:01:00.330566 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173587 : 
cluster [WRN] Health check update: 594 slow requests are blocked > 32 sec. 
Implicated osds 51 (REQUEST_SLOW)
2019-05-20 00:01:09.768476 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173591 : 
cluster [WRN] Health check update: 505 slow requests are blocked > 32 sec. 
Implicated osds 51 (REQUEST_SLOW)
2019-05-20 00:01:14.768769 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173592 : 
cluster [WRN] Health check update: 497 slow requests are blocked > 32 sec. 
Implicated osds 51 (REQUEST_SLOW)
2019-05-20 00:01:20.610398 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173593 : 
cluster [WRN] Health check update: 509 slow requests are blocked > 32 sec. 
Implicated osds 51 (REQUEST_SLOW)
2019-05-20 00:01:28.721891 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173594 : 
cluster [WRN] Health check update: 501 slow requests are blocked > 32 sec. 
Implicated osds 51 (REQUEST_SLOW)
2019-05-20 00:01:34.909842 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173596 : 
cluster [WRN] Health check update: 494 slow requests are blocked > 32 sec. 
Implicated osds 51 (REQUEST_SLOW)
2019-05-20 00:01:44.770330 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173597 : 
cluster [WRN] Health check update: 500 slow requests are blocked > 32 sec. 
Implicated osds 51 (REQUEST_SLOW)
2019-05-20 00:01:49.770625 mon.ceph-mon-s43 mon.0 10.23.27.153:6789/0 173599 : 
cluster [WRN] Health check update: 608 slow requests are blocked > 32