Personally, I would not just run this command automatically because, as
you stated, it only copies the primary PGs to the replicas, and if the
primary is corrupt, you will corrupt your secondaries. I think the
monitor log shows which OSD has the problem, so if it is not your
primary, just issue the repair command.
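
For reference, roughly what that check looks like (the PG id 2.5 and
the log path are just placeholders; adjust for your cluster):

    # find the inconsistent PG and its acting set (primary listed first)
    ceph health detail | grep inconsistent
    # e.g. "pg 2.5 is active+clean+inconsistent, acting [12,4,31]"

    # the cluster log on the monitor names the shard that failed the scrub
    grep ERR /var/log/ceph/ceph.log | grep '2.5'

    # if the bad shard is NOT the primary, repair just overwrites it
    # from the primary, which is what you want here
    ceph pg repair 2.5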

There was talk of, and I believe work toward, Ceph storing a hash of
each object so that it can be smarter about which replica has the
correct data and automatically replicate the good data no matter where
it lives. I think the first part, creating and storing the hash, was
included in Hammer. I'm not an authority on this, so take it with a
grain of salt.

Right now our procedure is to find the PG files on the OSDs, run an MD5
over all of them, and overwrite the one that doesn't match: either by
issuing the PG repair command, or by removing the bad PG files, rsyncing
them from a good replica with the -X argument (to preserve Ceph's
extended attributes), and then instructing a deep-scrub on the PG to
clear it up in Ceph.
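
A rough sketch of that, assuming the default FileStore layout under
/var/lib/ceph/osd/ (the OSD ids, hostnames, and PG id 2.5 below are
placeholders):

    # on each host holding a replica of the PG, checksum its object files
    find /var/lib/ceph/osd/ceph-*/current/2.5_head -type f \
        -exec md5sum {} + | sort -k2 > /tmp/pg-2.5-$(hostname).md5
    # diff the lists across hosts; the odd one out is the bad replica

    # to fix by hand instead of "ceph pg repair": stop the bad OSD,
    # remove the bad files, then copy from a good replica; -X keeps the
    # xattrs Ceph stores on every object file
    rsync -aX goodhost:/var/lib/ceph/osd/ceph-12/current/2.5_head/ \
        /var/lib/ceph/osd/ceph-31/current/2.5_head/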

I've only tested this on an idle cluster, so I don't know how well it will
work on an active cluster. Since we issue a deep-scrub, if the PGs of the
replicas change during the rsync, it should come up with an error. The idea
is to keep rsyncing until the deep-scrub is clean. Be warned that you may
be aiming your gun at your foot with this!
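
To make that loop concrete (purely illustrative, same placeholder PG):

    # re-issue the deep-scrub and wait for the PG to go clean; if it
    # stays inconsistent, a replica changed under the rsync, sync again
    ceph pg deep-scrub 2.5
    sleep 300    # let the scrub finish; tune for your PG size
    ceph health detail | grep '2.5'   # repeat rsync + deep-scrub until clean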

----------------
Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Mon, May 11, 2015 at 2:09 AM, Christian Eichelmann <
christian.eichelm...@1und1.de> wrote:

> Hi all!
>
> We are experiencing approximately 1 scrub error / inconsistent pg every
> two days. As far as I know, to fix this you can issue a "ceph pg
> repair", which works fine for us. I have a few qestions regarding the
> behavior of the ceph cluster in such a case:
>
> 1. After ceph detects the scrub error, the pg is marked as inconsistent.
> Does that mean that any IO to this pg is blocked until it is repaired?
>
> 2. Is this number of scrub errors normal? We currently have only 150TB
> in our cluster, distributed over 720 2TB disks.
>
> 3. As far as I know, a "ceph pg repair" just copies the content of the
> primary pg to all replicas. Is this still the case? What if the primary
> copy is the one with the errors? We have a 4x replication level, and it
> would be cool if ceph recovered from one of the pgs whose checksum
> matches the majority.
>
> 4. Some of these errors happen at night. Since ceph reports this
> as a critical error, our on-call shift is woken up just to issue a
> single command. Do you see any problem with triggering this command
> automatically via a monitoring event? Is there a reason why ceph isn't
> resolving these errors itself when it has enough replicas to do so?
>
> Regards,
> Christian