Re: [ceph-users] Behaviour of ceph pg repair on different replication levels

2014-06-26 Thread Christian Kauhaus
Am 26.06.2014 02:08, schrieb Gregory Farnum:
> It's a good idea, and in fact there was a discussion yesterday during
> the Ceph Developer Summit about making scrub repair significantly more
> powerful; they're keeping that use case in mind in addition to very
> fine-grained ones like specifying a particular replica for every
> object.

+1

This would be very cool.

> Yeah, it's got nothing and is relying on the local filesystem to barf
> if that happens. Unfortunately, neither xfs nor ext4 provide that
> checking functionality (which is one of the reasons we continue to
> look to btrfs as our long-term goal).

When thinking at petabyte scale, bit rot is going to happen as a matter of
course. So I think Ceph should be prepared for it, at least when there are
more than 2 replicas.

Regards

Christian

-- 
Dipl.-Inf. Christian Kauhaus <>< · k...@gocept.com · systems administration
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · tel +49 345 219401-11
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations





Re: [ceph-users] Behaviour of ceph pg repair on different replication levels

2014-06-25 Thread Gregory Farnum
On Wed, Jun 25, 2014 at 12:22 AM, Christian Kauhaus  wrote:
> Am 23.06.2014 20:24, schrieb Gregory Farnum:
>> Well, actually it always takes the primary copy, unless the primary
>> has some way of locally telling that its version is corrupt. (This
>> might happen if the primary thinks it should have an object, but it
> doesn't exist on disk.) But there's no voting or anything like that at
> this time.
>
> Thanks Greg for the clarification. I wonder if some sort of voting during
> recovery would be feasible to implement. Having this available would make a 3x
> replica scheme immensely more useful.

It's a good idea, and in fact there was a discussion yesterday during
the Ceph Developer Summit about making scrub repair significantly more
powerful; they're keeping that use case in mind in addition to very
fine-grained ones like specifying a particular replica for every
object.

>
> In my current understanding Ceph has no guards against local bit rot (e.g.,
> when a local disk returns incorrect data).

Yeah, it's got nothing and is relying on the local filesystem to barf
if that happens. Unfortunately, neither xfs nor ext4 provide that
checking functionality (which is one of the reasons we continue to
look to btrfs as our long-term goal).
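
To illustrate the kind of checking btrfs gives us and xfs/ext4 don't, here is
a toy Python sketch (not Ceph code; file layout and function names are made up
for the example): keep a checksum next to the data and verify it on every
read, so a disk that silently returns bad bytes gets caught instead of
trusted.

    import hashlib

    def write_with_checksum(path, data):
        # Persist the data plus a sha256 of it so later corruption is detectable.
        with open(path, "wb") as f:
            f.write(data)
        with open(path + ".sha256", "w") as f:
            f.write(hashlib.sha256(data).hexdigest())

    def read_verified(path):
        # Re-hash on read; a mismatch means the disk returned bad data.
        with open(path, "rb") as f:
            data = f.read()
        with open(path + ".sha256") as f:
            expected = f.read().strip()
        if hashlib.sha256(data).hexdigest() != expected:
            raise IOError("bit rot detected in %s" % path)
        return data
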
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

> Or is there already a voting scheme
> in place during deep scrub?
>
> Regards
>
> Christian
>
> --
> Dipl.-Inf. Christian Kauhaus <>< · k...@gocept.com · systems administration
> gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
> http://gocept.com · tel +49 345 219401-11
> Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
>


Re: [ceph-users] Behaviour of ceph pg repair on different replication levels

2014-06-25 Thread Christian Kauhaus
Am 23.06.2014 20:24, schrieb Gregory Farnum:
> Well, actually it always takes the primary copy, unless the primary
> has some way of locally telling that its version is corrupt. (This
> might happen if the primary thinks it should have an object, but it
> doesn't exist on disk.) But there's no voting or anything like that at
> this time.

Thanks Greg for the clarification. I wonder if some sort of voting during
recovery would be feasible to implement. Having this available would make a 3x
replica scheme immensely more useful.
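
To make concrete what I mean by voting, here is a toy sketch (plain Python,
nothing to do with the actual OSD internals): given the data of all replicas
of an object, pick the version that the majority of the copies agree on.

    import hashlib
    from collections import Counter

    def majority_replica(replica_blobs):
        # replica_blobs: one byte string per replica of the same object.
        digests = [hashlib.sha256(b).hexdigest() for b in replica_blobs]
        digest, votes = Counter(digests).most_common(1)[0]
        if votes <= len(replica_blobs) // 2:
            return None  # no clear majority (e.g. 2 replicas that disagree)
        return replica_blobs[digests.index(digest)]

With 3 or more replicas this would let a repair pick the good copies
automatically; with only 2 replicas it would at least refuse to guess.
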

In my current understanding Ceph has no guards against local bit rot (e.g.,
when a local disk returns incorrect data). Or is there already a voting scheme
in place during deep scrub?

Regards

Christian

-- 
Dipl.-Inf. Christian Kauhaus <>< · k...@gocept.com · systems administration
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · tel +49 345 219401-11
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations





Re: [ceph-users] Behaviour of ceph pg repair on different replication levels

2014-06-23 Thread Gregory Farnum
On Mon, Jun 23, 2014 at 4:54 AM, Christian Eichelmann  wrote:
> Hi ceph users,
>
> since our cluster has had a few inconsistent PGs recently, I was
> wondering what ceph pg repair does depending on the replication level.
> So I just wanted to check whether my assumptions are correct:
>
> Replication 2x
> Since the cluster cannot decide which version is the correct one, it would
> just copy the primary copy (the active one) over the secondary copy,
> which gives a 50/50 chance of ending up with the correct version.
>
> Replication 3x or more
> Now the cluster has a quorum, and a ceph pg repair will replace the
> corrupt replica with one of the correct ones. No manual intervention needed.

Well, actually it always takes the primary copy, unless the primary
has some way of locally telling that its version is corrupt. (This
might happen if the primary thinks it should have an object, but it
doesn't exist on disk.) But there's no voting or anything like that at
this time.
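
In rough pseudocode (just an illustration, not the actual OSD code, and the
method names are made up), the current repair decision amounts to this:

    # Toy sketch of current "ceph pg repair" behaviour, not actual Ceph code.
    def choose_authoritative(primary, replicas):
        # The primary wins unless it can tell locally that its own copy is
        # bad, e.g. the object is expected but missing on disk.
        if primary.object_missing_or_unreadable():
            return next(r for r in replicas
                        if not r.object_missing_or_unreadable())
        return primary

    # Repair then copies the authoritative version over the other replicas;
    # there is no comparison or voting between the copies.
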
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


[ceph-users] Behaviour of ceph pg repair on different replication levels

2014-06-23 Thread Christian Eichelmann
Hi ceph users,

since our cluster has had a few inconsistent PGs recently, I was
wondering what ceph pg repair does depending on the replication level.
So I just wanted to check whether my assumptions are correct:

Replication 2x
Since the cluster cannot decide which version is the correct one, it would
just copy the primary copy (the active one) over the secondary copy,
which gives a 50/50 chance of ending up with the correct version.

Replication 3x or more
Now the cluster has a quorum, and a ceph pg repair will replace the
corrupt replica with one of the correct ones. No manual intervention needed.

Am I on the right track?

Regards,
Christian