[ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Reduncancy, all PGs degraded, undersized, not scrubbed in time

Hans van den Bogert Mon, 16 Nov 2020 04:00:23 -0800

I think we're deviating from the original thread quite a bit, and I would never argue that in a production environment with plenty of OSDs you should go for R=2 or K+1, so my example cluster, which happens to be 2+1, is a bit unlucky.


However, I'm interested in the following:


On 11/16/20 11:31 AM, Janne Johansson wrote:
> So while one could always say "one more drive is better than your
> amount", there are people losing data with repl=2 or K+1 because some
> more normal operation was in flight and _then_ a single surprise
> happens.  So you can have a weird reboot, causing those PGs needing
> backfill later, and if one of the uptodate hosts have any single
> surprise during the recovery, the cluster will lack some of the current
> data even if two disks were never down at the same time.

I'm not sure I follow. From a logical perspective they *are* down at the same time, right? In your scenario one up-to-date replica was left, but even that had a surprise. Okay, well, that's the risk you take with R=2, but it's not intrinsically different from R=3.
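
To make the scenario concrete, here is a minimal sketch in Python, not anything Ceph itself runs. It assumes writes kept being accepted while the rebooted replica was down, and it simply counts how many replicas still hold the latest data after one further failure. The function name and its parameters are made up for illustration.

```python
def current_copies(replicas, rebooted, failed):
    """Count replicas that still hold the latest writes.

    replicas: total replica count (R)
    rebooted: replicas that missed writes while down and still await backfill
    failed:   up-to-date replicas lost during the recovery window
    """
    up_to_date = replicas - rebooted
    return max(up_to_date - failed, 0)


# One stale replica after a reboot, then one surprise on an up-to-date replica.
for r in (2, 3):
    left = current_copies(replicas=r, rebooted=1, failed=1)
    print(f"R={r}: {left} up-to-date copies left")
```

In this toy model a stale replica counts the same as a down one as far as current data is concerned, which is the "logically down at the same time" point. The only difference between R=2 and R=3 is how many such events it takes to reach zero.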

