Re: [ceph-users] Behaviour of ceph pg repair on different replication levels
On 26.06.2014 02:08, Gregory Farnum wrote:
> It's a good idea, and in fact there was a discussion yesterday during
> the Ceph Developer Summit about making scrub repair significantly more
> powerful; they're keeping that use case in mind in addition to very
> fine-grained ones like specifying a particular replica for every
> object.

+1. This would be very cool.

> Yeah, it's got nothing and is relying on the local filesystem to barf
> if that happens. Unfortunately, neither xfs nor ext4 provide that
> checking functionality (which is one of the reasons we continue to
> look to btrfs as our long-term goal).

At petabyte scale, bit rot is going to happen as a matter of course. So I think Ceph should be prepared for it, at least when there are more than 2 replicas.

Regards

Christian

-- 
Dipl.-Inf. Christian Kauhaus <>< · k...@gocept.com · systems administration
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · tel +49 345 219401-11
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Behaviour of ceph pg repair on different replication levels
On Wed, Jun 25, 2014 at 12:22 AM, Christian Kauhaus wrote:
> On 23.06.2014 20:24, Gregory Farnum wrote:
>> Well, actually it always takes the primary copy, unless the primary
>> has some way of locally telling that its version is corrupt. (This
>> might happen if the primary thinks it should have an object, but it
>> doesn't exist on disk.) But there's not a voting or anything at this
>> time.
>
> Thanks Greg for the clarification. I wonder if some sort of voting during
> recovery would be feasible to implement. Having this available would make a 3x
> replica scheme immensely more useful.

It's a good idea, and in fact there was a discussion yesterday during the Ceph Developer Summit about making scrub repair significantly more powerful; they're keeping that use case in mind in addition to very fine-grained ones like specifying a particular replica for every object.

> In my current understanding Ceph has no guards against local bit rot (e.g.,
> when a local disk returns incorrect data).

Yeah, it's got nothing and is relying on the local filesystem to barf if that happens. Unfortunately, neither xfs nor ext4 provide that checking functionality (which is one of the reasons we continue to look to btrfs as our long-term goal).
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

> Or is there already a voting scheme
> in place during deep scrub?
>
> Regards
>
> Christian
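
As a rough illustration of the local check that, per the message above, xfs and ext4 do not provide: the sketch below stores a checksum next to the data at write time and verifies it on read, so a single copy can tell that it has rotted without consulting any other replica. The helper names `store` and `is_intact` and the choice of CRC-32 are assumptions made for this example; it does not describe how Ceph or any particular filesystem implements this.

```python
import zlib
from typing import Tuple


def store(data: bytes) -> Tuple[bytes, int]:
    """Return the data together with a CRC-32 computed at write time.

    Hypothetical helper: a real system would persist the checksum
    separately from the data it protects.
    """
    return data, zlib.crc32(data)


def is_intact(data: bytes, checksum: int) -> bool:
    """Recompute the CRC-32 on read and compare it with the stored value."""
    return zlib.crc32(data) == checksum


data, checksum = store(b"payload")

# Simulate bit rot: flip the lowest bit of the first byte.
rotten = bytes([data[0] ^ 0x01]) + data[1:]

print("clean copy intact: ", is_intact(data, checksum))
print("rotten copy intact:", is_intact(rotten, checksum))
```

With such a check the primary would have "some way of locally telling that its version is corrupt", which is the condition Greg names for not blindly trusting the primary copy.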
Re: [ceph-users] Behaviour of ceph pg repair on different replication levels
On 23.06.2014 20:24, Gregory Farnum wrote:
> Well, actually it always takes the primary copy, unless the primary
> has some way of locally telling that its version is corrupt. (This
> might happen if the primary thinks it should have an object, but it
> doesn't exist on disk.) But there's not a voting or anything at this
> time.

Thanks Greg for the clarification. I wonder if some sort of voting during recovery would be feasible to implement. Having this available would make a 3x replica scheme immensely more useful.

In my current understanding Ceph has no guards against local bit rot (e.g., when a local disk returns incorrect data). Or is there already a voting scheme in place during deep scrub?

Regards

Christian
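
A minimal sketch of the voting idea raised in this message, assuming replicas can be compared by a content hash. The function name `majority_copy` and the use of SHA-256 are illustrative assumptions; the thread states that no such voting exists in Ceph at this time, so this is only a toy model of what it could look like.

```python
import hashlib
from collections import Counter
from typing import List, Optional


def majority_copy(replicas: List[bytes]) -> Optional[bytes]:
    """Return the content held by a strict majority of replicas.

    Returns None when no content has more than half of the votes,
    which is always the case for two replicas that disagree.
    """
    if not replicas:
        return None
    digests = [hashlib.sha256(r).hexdigest() for r in replicas]
    digest, votes = Counter(digests).most_common(1)[0]
    if votes * 2 <= len(replicas):
        return None  # tie or no majority: cannot decide automatically
    return replicas[digests.index(digest)]


good, rotten = b"payload", b"pay\x00oad"

# 3x replication: the two matching copies outvote the corrupt one,
# even when the corrupt copy is the first (primary) one.
print(majority_copy([rotten, good, good]))

# 2x replication: one vote each, so no automatic decision is possible.
print(majority_copy([good, rotten]))
```

Note that voting only helps from three replicas upward; with two copies it can detect a mismatch but not resolve it.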
Re: [ceph-users] Behaviour of ceph pg repair on different replication levels
On Mon, Jun 23, 2014 at 4:54 AM, Christian Eichelmann wrote:
> Hi ceph users,
>
> since our cluster has had a few inconsistent pgs recently, I was
> wondering what ceph pg repair does, depending on the replication level.
> So I just wanted to check if my assumptions are correct:
>
> Replication 2x
> Since the cluster cannot decide which version is the correct one, it would
> just copy the primary copy (the active one) over the secondary copy.
> That gives a 50/50 chance of getting the correct version.
>
> Replication 3x or more
> Now the cluster has a quorum and a ceph pg repair will replace the
> corrupt replica with one of the correct ones. No manual intervention needed.

Well, actually it always takes the primary copy, unless the primary has some way of locally telling that its version is corrupt. (This might happen if the primary thinks it should have an object, but it doesn't exist on disk.) But there's not a voting or anything at this time.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
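
The behaviour described in this reply can be modelled roughly as follows. This is a toy model, not Ceph's code: the function name `repair_primary_wins` is made up for the example, a missing object is represented as `None`, and falling back to the first replica that still has the object is an assumption, since the thread does not say which copy is used when the primary knows its own version is bad.

```python
from typing import List, Optional


def repair_primary_wins(
    replicas: List[Optional[bytes]], primary: int = 0
) -> List[Optional[bytes]]:
    """Overwrite every copy with the primary's version, without voting.

    replicas[i] is the content held by copy i, or None if the object is
    missing on disk (the one case in which the primary can tell locally
    that its own version is unusable).
    """
    source = replicas[primary]
    if source is None:
        # Assumed fallback: take the first replica that still has the object.
        source = next((r for r in replicas if r is not None), None)
    return [source for _ in replicas]


good, rotten = b"payload", b"pay\x00oad"

# Corruption on a secondary is overwritten with the good primary copy.
print(repair_primary_wins([good, rotten, good]))

# Silent corruption on the primary is propagated to all copies, even at 3x.
print(repair_primary_wins([rotten, good, good]))

# A missing object on the primary is detectable, so a replica is used instead.
print(repair_primary_wins([None, good, good]))
```

The second case is the important one: under this policy a third replica adds no protection against undetected corruption of the primary.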
[ceph-users] Behaviour of ceph pg repair on different replication levels
Hi ceph users,

since our cluster has had a few inconsistent pgs recently, I was wondering what ceph pg repair does, depending on the replication level. So I just wanted to check if my assumptions are correct:

Replication 2x
Since the cluster cannot decide which version is the correct one, it would just copy the primary copy (the active one) over the secondary copy. That gives a 50/50 chance of getting the correct version.

Replication 3x or more
Now the cluster has a quorum and a ceph pg repair will replace the corrupt replica with one of the correct ones. No manual intervention needed.

Am I on the right track?

Regards,
Christian