Oliver, Here's the commit info:
https://github.com/ceph/ceph/commit/48e40fcde7b19bab98821ab8d604eab920591284

Caspar

2018-02-27 14:28 GMT+01:00 Oliver Freyermuth <freyerm...@physik.uni-bonn.de>:

> On 27.02.2018 at 14:16, Caspar Smit wrote:
> > Oliver,
> >
> > Be aware that for k=4,m=2 the min_size will be 5 (k+1), so after a node
> > failure the min_size is already reached. Any OSD failure beyond the node
> > failure will probably result in some PGs becoming incomplete (I/O freeze)
> > until the incomplete PGs' data is recovered to another OSD in that node.
> >
> > So please reconsider your statement "one host + x safety", as the x
> > safety (with I/O freeze) is probably not what you want.
> >
> > Forcing to run with min_size=4 could also be dangerous for other reasons.
> > (There's a reason why min_size = k+1.)
>
> Thanks for pointing this out!
> Yes, indeed, in case we need to take down a host for a longer period (we
> would hope this never has to happen for > 24 hours... but you never know),
> and in case disks start to fail, we would indeed have to degrade to
> min_size=4 to keep running.
>
> What exactly are the implications?
> It should still be possible to ensure the data is not corrupt (with the
> checksums), and recovery to k+1 copies should start automatically once a
> disk fails - so what is the actual implication?
> Of course pg repair cannot work in that case (if a PG for which the
> additional disk failed is corrupted), but in general, when there is a need
> to reinstall a host, we would try to bring it back with the OSD data
> intact - which should then allow us to postpone the repair until that
> point.
>
> Is there a danger I miss in my reasoning?
>
> Cheers and many thanks!
> Oliver
>
> > Caspar
> >
> > 2018-02-27 0:17 GMT+01:00 Oliver Freyermuth <freyerm...@physik.uni-bonn.de>:
> >
> > On 27.02.2018 at 00:10, Gregory Farnum wrote:
> > > On Mon, Feb 26, 2018 at 2:59 PM Oliver Freyermuth
> > > <freyerm...@physik.uni-bonn.de> wrote:
> > >
> > > > > > Does this match expectations?
> > > > >
> > > > > Can you get the output of eg "ceph pg 2.7cd query"? Want to make
> > > > > sure the backfilling versus acting sets and things are correct.
> > > >
> > > > You'll find attached:
> > > > query_allwell) Output of "ceph pg 2.7cd query" when all OSDs are up
> > > > and everything is healthy.
> > > > query_one_host_out) Output of "ceph pg 2.7cd query" when OSDs
> > > > 164-195 (one host) are down and out.
> > >
> > > Yep, that's what we want to see. So when everything's well, we have
> > > OSDs 91, 63, 33, 163, 192, 103. That corresponds to chassis 3, 2, 1,
> > > 5, 6, 4.
> > >
> > > When marking out a host, we have OSDs 91, 63, 33, 163, 123, UNMAPPED.
> > > That corresponds to chassis 3, 2, 1, 5, 4, UNMAPPED.
> > >
> > > So what's happened is that with the new map, when choosing the home
> > > for shard 4, we selected host 4 instead of host 6 (which is gone). And
> > > now shard 5 can't map properly. But of course we still have shard 5
> > > available on host 4, so host 4 is going to end up properly owning
> > > shard 4, but also just carrying that shard 5 around as a remapped
> > > location.
> > >
> > > So this is as we expect. Whew.
> > > -Greg
> >
> > Understood. Thanks for explaining step by step :-).
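The up-versus-acting comparison Greg walks through can be read straight out of
the query output. A minimal sketch, assuming jq is available and that (as in
recent releases) the query JSON carries top-level "up" and "acting" arrays:

    # Compare what CRUSH computed ("up") with who currently serves the PG
    # ("acting"); for an EC pool the position in each array is the shard id,
    # and 2147483647 marks a shard CRUSH could not map.
    ceph pg 2.7cd query | jq '{state: .state, up: .up, acting: .acting}'
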
> > It's of course a bit weird that this happens, since in the end this
> > really means data is moved (or rather, a shard is recreated) and takes
> > up space without increasing redundancy (well, it might, if it lands on a
> > different OSD than shard 5, but that's not really ensured). I'm unsure
> > whether this can be solved "better" in any way.
> >
> > Anyway, it seems this would be another reason why running with
> > k+m = number of hosts should not be a general recommendation. For us it
> > is fine for now, especially since we want to keep the cluster open for
> > later extension with more OSDs, and we now know the gotchas - and I
> > don't see a better EC configuration at the moment which would
> > accommodate our wishes (one host + x safety, don't reduce space too
> > much).
> >
> > So thanks again!
> >
> > Cheers,
> > Oliver
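For reference, the setup being debated boils down to a handful of commands; a
rough sketch with placeholder profile, pool, and PG-count values (k=4, m=2,
failure domain "host", and min_size ending up at k+1 = 5 as Caspar describes):

    # EC profile and pool along the lines discussed in the thread
    # (profile/pool names and PG counts are placeholders).
    ceph osd erasure-code-profile set ec_k4m2 k=4 m=2 crush-failure-domain=host
    ceph osd pool create ecpool 1024 1024 erasure ec_k4m2
    ceph osd pool get ecpool min_size    # expected: min_size: 5, i.e. k+1
    # Dropping to min_size = k keeps I/O flowing after "one host down plus a
    # further OSD failure", but leaves no shard to spare - the trade-off
    # Caspar warns about:
    #   ceph osd pool set ecpool min_size 4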