On Mon, Nov 26, 2018 at 3:30 AM Janne Johansson <icepic...@gmail.com> wrote:
> On Sun, 25 Nov 2018 at 22:10, Stefan Kooman <ste...@bit.nl> wrote:
> >
> > Hi List,
> >
> > Another interesting and unexpected thing we observed during cluster
> > expansion is the following. After we added extra disks to the cluster,
> > while the "norebalance" flag was set, we put the new OSDs "IN". As soon
> > as we did that, a couple of hundred objects became degraded. During
> > that time no OSD crashed or restarted. Every "ceph osd crush add $osd
> > weight host=$storage-node" would cause extra degraded objects.
> >
> > I don't expect objects to become degraded when extra OSDs are added.
> > Misplaced, yes. Degraded, no.
> >
> > Does someone have an explanation for this?
>
> Yes, when you add a drive (or 10), some PGs decide they should have one
> or more replicas on the new drives. A new, empty PG replica is created
> there, and _then_ that replica puts the PG into "degraded" mode: if it
> had 3 fine active+clean replicas before, it now has 2 active+clean and
> one needing backfill to get into shape.
>
> It is a slight mistake to report this the same way as an error, even if
> it looks to the cluster just as if it were in error and needed fixing.
> This gives new Ceph admins a sense of urgency or danger, whereas adding
> space to a cluster should be perfectly normal. Also, Ceph could have
> chosen to add a fourth replica in a repl=3 PG and fill the new empty PG
> from the one going out, keeping 3 working replicas throughout, but it
> chooses to first discard one replica and then backfill into the empty
> one, leading to this kind of "error" report.

See, that's the thing: Ceph is designed *not* to reduce data reliability this way; it shouldn't do that, and so far as I've been able to establish, it doesn't actually do that. Which makes these degraded object reports a bit perplexing.
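For reference, the expansion procedure Stefan describes amounts to roughly the following sketch. The OSD id, weight, and host bucket names here are placeholders, not taken from his cluster, and this fragment assumes a live cluster with admin credentials:

```shell
# Prevent data movement while the new disks are being brought in.
ceph osd set norebalance

# Add each new OSD to the CRUSH map under its host bucket, then mark it in.
# It was each of these steps that produced a burst of degraded objects.
ceph osd crush add osd.42 1.0 host=storage-node-01
ceph osd in osd.42

# Once all new OSDs are placed, let rebalancing proceed.
ceph osd unset norebalance
```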
What we have worked out is that objects can sometimes be degraded because log-based recovery takes a while after the primary juggles PG set membership, and I suspect that's what is turning up here. The exact cause still eludes me a bit, but I assume it's a consequence of the backfill and recovery throttling we've added over the years. If a whole PG were missing, you'd expect to see very large degraded object counts (as opposed to the 2 that Marco reported).
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com