Hello Jason,

Am 18.07.19 um 20:10 schrieb Jason Dillaman:
> On Thu, Jul 18, 2019 at 1:47 PM Marc Schöchlin <m...@256bit.org> wrote:
>> Hello cephers,
>>
>> rbd-nbd crashes in a reproducible way here.
> I don't see a crash report in the log below. Is it really crashing or
> is it shutting down? If it is crashing and it's reproducable, can you
> install the debuginfo packages, attach gdb, and get a full backtrace
> of the crash?

I do not get a crash report of rbd-nbd.
I seems that "rbd-nbd" just terminates, and crashes the xfs filesystem because 
the nbd device is not available anymore.
("rbd nbd ls" shows no mapped device anymore)

>
> It seems like your cluster cannot keep up w/ the load and the nbd
> kernel driver is timing out the IO and shutting down. There is a
> "--timeout" option on "rbd-nbd" that you can use to increase the
> kernel IO timeout for nbd.
>
I have also a 36TB XFS (non_ec) volume on this virtual system mapped by krbd 
which is under really heavy read/write usage.
I never experienced problems like this on this system with similar usage 
patterns.

The volume which is involved in the problem only handles a really low load and 
i was capable to create the error situation by using the simple "find . -type f 
-name "*.sql" -exec ionice -c3 nice -n 20 gzip -v {} \;" command.
I copied and read ~1.5 TB of data to this volume without a problem - it seems 
that the gzip command provokes a io pattern which leads to the error situation.

As described i use a luminous "12.2.11" client which does not support that 
"--timeout" option (btw. a backport would be nice).
Our ceph system runs with a heavy write load, therefore we already set a 60 
seconds timeout using the following code:
(https://github.com/OnApp/nbd-kernel_mod/blob/master/nbd_set_timeout.c)

We have ~500 heavy load rbd-nbd devices in our xen cluster (rbd-nbd 12.2.5, 
kernel 4.4.0+10, centos clone) and ~20 high load krbd devices (kernel 
4.15.0-45, ubuntu 16.04) - we never experienced problems like this.
We only experience problems like this with rbd-nbd > 12.2.5 on ubuntu 16.04 
(kernel 4.15) or ubuntu 18.04 (kernel 4.15) with erasure encoding or without.

Regards
Marc


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Reply via email to