Hi,

I have a Ceph cluster with 4 disk servers, 14 OSDs and a replica size
of 3. A number of KVM virtual machines use RBD as their only storage
device. Whenever some OSDs (always on a single server) have slow
requests, most virtual machines remount their disks read-only and need
to be rebooted. I believe the slow requests are caused by flaky
hardware or, on one occasion, by a S.M.A.R.T. command that crashed the
system disk of one of the disk servers.
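
For context, when this happens the cluster health reports the slow
requests along these lines (the counts and OSD id are illustrative,
not copied from my logs):

  $ ceph health detail
  HEALTH_WARN 30 requests are blocked > 32 sec; 1 osds have slow requests
  30 ops are blocked > 32.768 sec on osd.7
  1 osds have slow requests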

One of the virtual machines still has Debian 6 installed, and it never
crashes. It also has an ext3 filesystem, unlike some other machines,
which have ext4. ext3 does crash on Debian 7 systems, but there it is
mounted with different flags, such as "barrier" and "data=ordered". I
suspect (but haven't tested) that mounting with "nobarrier" may solve
the problem, though that doesn't seem like an ideal solution.
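
To illustrate what I mean (the device name and exact options are just
examples, not copied from the actual guests), the difference comes
down to fstab lines like:

  # Debian 6 guest, ext3 (barriers off by default):
  /dev/vda1  /  ext3  errors=remount-ro  0  1

  # Debian 7 guest, ext4 (barriers on by default):
  /dev/vda1  /  ext4  errors=remount-ro,barrier,data=ordered  0  1

and the untested workaround would be something like:

  /dev/vda1  /  ext4  errors=remount-ro,nobarrier  0  1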

Most of those machines have Debian 7 or Ubuntu 12.04, but two of them
have Ubuntu 14.04 (and thus a more recent kernel) and they also remount
read-only.

I searched the mailing list and found a couple of relevant messages. One
person seemed to have the same problem[1], but someone else replied that
it didn't happen in his case ("I've had multiple VMs hang for hours at a
time when I broke a Ceph cluster and after fixing it the VMs would start
working again"). The other message[2] is not very informative.

Are other people experiencing this problem? Is there a file system or
kernel version recommended for KVM guests that would prevent it? Or
does this problem indicate that something else is wrong and should be
fixed? I did configure all machines to use "cache=writeback", but I
never investigated whether that makes a difference or even whether it
is actually working.
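
For reference, the disks are defined in libvirt more or less like this
(pool, image and monitor names below are placeholders, not the real
ones):

  <disk type='network' device='disk'>
    <driver name='qemu' type='raw' cache='writeback'/>
    <source protocol='rbd' name='rbd/vm-disk'>
      <host name='mon1' port='6789'/>
    </source>
    <target dev='vda' bus='virtio'/>
  </disk>

Running "virsh dumpxml <vm> | grep cache" at least confirms the cache
attribute is set, though not whether RBD writeback caching is actually
in effect.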

Thanks,
Paulo Almeida
Instituto Gulbenkian de Ciência, Oeiras, Portugal


[1] http://thread.gmane.org/gmane.comp.file-systems.ceph.user/8011
[2] http://thread.gmane.org/gmane.comp.file-systems.ceph.user/1742
