Hi,

As a Ceph consultant I get numerous calls throughout the year to help people 
get their broken Ceph clusters back online.

The causes of downtime vary widely, but one of the biggest is 2x replication: 
size = 2, min_size = 1.
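
If you are not sure what your pools are currently set to, you can check from 
any node with a client keyring, roughly like this ('rbd' is just an example 
pool name):

    ceph osd pool ls detail
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size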

In 2016 the number of cases I handled where data was lost due to these settings 
grew rapidly.

Usually a disk fails, recovery kicks in, and while that recovery is still 
running a second disk fails, causing PGs to become incomplete.
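
When that happens the cluster health will show it; for example (output 
shortened, the PG ID and OSD are illustrative):

    ceph health detail
    ...
    pg 1.2f is incomplete, acting [12]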

There have been too many times where I had to run xfs_repair on broken disks and 
use ceph-objectstore-tool to export/import PGs.
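
For those who have never had to do this, the export/import looks roughly like 
the sketch below. The OSD IDs, PG ID and file path are made-up examples, and 
the OSDs involved have to be stopped first:

    # on an OSD that still holds a (partial) copy of the PG, with the OSD stopped
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --pgid 1.2f --op export --file /root/pg-1.2f.export

    # on the OSD that should receive the PG, also stopped
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
        --journal-path /var/lib/ceph/osd/ceph-7/journal \
        --op import --file /root/pg-1.2f.export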

I really don't like these cases, mainly because they can be prevented easily by 
using size = 3 and min_size = 2 for all pools.
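
Fixing the settings is a one-liner per pool ('rbd' below is just an example 
pool name), and it can be scripted for all pools at once:

    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2

    # or for every pool in the cluster
    for pool in $(ceph osd pool ls); do
        ceph osd pool set "$pool" size 3
        ceph osd pool set "$pool" min_size 2
    done

Keep in mind that raising size from 2 to 3 triggers backfill to create the 
third copy, so the cluster needs the free space and will see some recovery 
traffic for a while.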

With size = 2 you go into the danger zone as soon as a single disk/daemon 
fails. With size = 3 a single failure still leaves two copies, keeping 
your data safe(r).

If you are running CephFS, at least consider running the 'metadata' pool with 
size = 3 to keep the MDS happy.
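
For example, assuming the metadata pool is called 'cephfs_metadata' (check 
'ceph fs ls' for the actual name in your cluster):

    ceph osd pool set cephfs_metadata size 3
    ceph osd pool set cephfs_metadata min_size 2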

Please, let this be a big warning to everybody who is running with size = 2. 
The downtime and problems caused by missing objects/replicas are usually 
severe, and it takes days to recover from them. Very often data is also lost 
and/or corrupted, which causes even more problems.

I can't stress this enough. Running with size = 2 in production is a SERIOUS 
hazard and should not be done imho.

To anyone out there running with size = 2, please reconsider this!

Thanks,

Wido
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
