On 07/19/2019 02:42 AM, Marc Schöchlin wrote:
> We have ~500 heavy load rbd-nbd devices in our xen cluster (rbd-nbd 12.2.5, 
> kernel 4.4.0+10, centos clone) and ~20 high load krbd devices (kernel 
> 4.15.0-45, ubuntu 16.04) - we never experienced problems like this.

For this setup, do you have 257 or more rbd-nbd devices running on a
single system?

If so, then you are hitting another bug where newer kernels only support
256 devices. It looks like a regression was introduced when mq and netlink
support were added upstream. You can create more than 256 devices, but
some of them will not be able to execute any IO. Commands sent to those
rbd-nbd devices will always time out, and you will see the errors
in your log.
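
To check quickly whether a host is over that count, something like the
rough sketch below might help. It assumes that a connected nbd device
exposes a "pid" file under /sys/block/nbdN, which is worth verifying on
your kernel. The function names and the sys_block parameter are made up
for illustration and are not part of rbd-nbd.

```python
import os
import re

NBD_NAME = re.compile(r"^nbd(\d+)$")
SUSPECTED_LIMIT = 256  # device count mentioned in this thread


def connected_nbd_devices(sys_block="/sys/block"):
    """Return the sorted indices of nbd devices that look connected.

    Assumption: the nbd driver creates /sys/block/nbdN/pid only while
    the device is connected.
    """
    indices = []
    for name in os.listdir(sys_block):
        match = NBD_NAME.match(name)
        if match and os.path.exists(os.path.join(sys_block, name, "pid")):
            indices.append(int(match.group(1)))
    return sorted(indices)


def exceeds_limit(indices, limit=SUSPECTED_LIMIT):
    """True when more devices are connected than the suspected limit."""
    return len(indices) > limit


if __name__ == "__main__":
    devices = connected_nbd_devices()
    print(f"{len(devices)} connected nbd devices")
    if exceeds_limit(devices):
        print(f"more than {SUSPECTED_LIMIT} devices: some may time out on IO")
```

This only counts devices. It does not tell you which ones are affected.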

I am testing some patches for that right now.
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
