To throw in my 5 cents: choosing m in k+m EC replication is not arbitrary, and
the counter-argument that anyone with a larger m could always call a lower m
wrong does not hold either.

Why are people recommending m>=2 for production (or R>=3 replicas)?

It's very simple. What is forgotten below is maintenance. Whenever you do 
maintenance on Ceph, there will be longer episodes of degraded redundancy 
while OSDs are down. However, on production storage systems, writes *always* 
need to go to redundant storage. Hence, minimum redundancy under maintenance 
is the keyword here.
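
To make the write-availability point concrete, here is a minimal sketch (plain
Python, not Ceph code) of the usual min_size logic; the default min_size of
k+1 for EC pools is an assumption here, so check it against your own pools:

    # Rough model of a single PG: 'up' is the number of shards/copies
    # currently available. Below min_size, Ceph blocks I/O on the PG;
    # "redundant" means at least one shard/copy of headroom is left.
    def pg_state(k, m, up, min_size=None):
        if min_size is None:
            min_size = k + 1      # common default for EC pools (assumption)
        writable = up >= min_size
        redundant = up > k
        return writable, redundant

    # k=2, m=1 profile, one OSD down for maintenance -> up = 2:
    print(pg_state(2, 1, up=2))              # (False, False): I/O blocked
    print(pg_state(2, 1, up=2, min_size=2))  # (True, False): non-redundant writes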

With m=1 (R=2) one could never do any maintenance without downtime, as 
shutting down just 1 OSD would imply writes to non-redundant storage, which 
in turn would mean data loss in case a disk dies during maintenance.

Basically, with m parity shards you can do maintenance on m-1 failure domains 
at the same time without downtime or non-redundant writes. With R copies you 
can do maintenance on R-2 failure domains without downtime.
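
As a quick sanity check of that arithmetic, a small sketch (Python, purely
illustrative):

    # Failure domains you can take down for maintenance while writes still
    # land on redundant storage (at least one shard/copy of headroom left).
    def maintenance_headroom_ec(m):      # EC profile k+m
        return m - 1                     # keep >= k+1 shards up at all times

    def maintenance_headroom_repl(r):    # replicated pool, size r
        return r - 2                     # keep >= 2 copies up at all times

    print(maintenance_headroom_ec(2))    # e.g. 4+2 -> 1 failure domain at a time
    print(maintenance_headroom_repl(3))  # R=3     -> 1 failure domain at a time
    print(maintenance_headroom_ec(3))    # e.g. 8+3 -> 2 failure domains at a time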

If your SLAs require higher minimum redundancy at all times, m (R) needs to be 
large enough to still allow maintenance, unless you accept downtime. However, 
the latter would be odd, because one of the key features of Ceph is its 
ability to provide infinite uptime while hardware gets renewed all the time.
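
Turned around, a minimal sketch of how you would size m (or R) from such an
SLA (again just arithmetic, not a Ceph tool; the redundancy target and the
number of domains in maintenance are numbers you plug in yourself):

    # Smallest m / R that keeps 'required' shards/copies of redundancy
    # while 'maint' failure domains are down for maintenance.
    def min_m(required, maint):
        return required + maint          # EC: m - maint >= required

    def min_r(required, maint):
        return required + maint + 1      # replication: (R - 1) - maint >= required

    # SLA: always one level of redundancy, one domain under maintenance:
    print(min_m(1, 1))   # m >= 2
    print(min_r(1, 1))   # R >= 3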

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Hans van den Bogert <hansbog...@gmail.com>
Sent: 16 November 2020 12:59:31
Cc: ceph-users
Subject: [ceph-users] Re: (Ceph Octopus) Repairing a neglected Ceph cluster - 
Degraded Data Reduncancy, all PGs degraded, undersized, not scrubbed in time

I think we're deviating from the original thread quite a bit, and I would
never argue that in a production environment with plenty of OSDs you should
go for R=2 or K+1, so my example cluster, which happens to be 2+1, is a
bit unlucky.

However, I'm interested in the following:

On 11/16/20 11:31 AM, Janne Johansson wrote:
 > So while one could always say "one more drive is better than your
 > amount", there are people losing data with repl=2 or K+1 because some
 > more normal operation was in flight and _then_ a single surprise
 > happens.  So you can have a weird reboot, causing those PGs needing
 > backfill later, and if one of the up-to-date hosts has any single
 > surprise during the recovery, the cluster will lack some of the current
 > data even if two disks were never down at the same time.

I'm not sure I follow; from a logical perspective they *are* down at the
same time, right? In your scenario 1 up-to-date replica was left, but
even that had a surprise. Okay, well, that's the risk you take with R=2,
but it's not intrinsically different from R=3.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io