Nick,

Yes, as you would expect, a copy that returned a read error would not be used as a source for repair, no matter which OSD(s) are getting read errors.


David

On 2/21/17 12:38 AM, Nick Fisk wrote:
-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Gregory Farnum
Sent: 20 February 2017 22:13
To: Nick Fisk <n...@fisk.me.uk>; David Zafman <dzaf...@redhat.com>
Cc: ceph-users <ceph-us...@ceph.com>
Subject: Re: [ceph-users] How safe is ceph pg repair these days?

On Sat, Feb 18, 2017 at 12:39 AM, Nick Fisk <n...@fisk.me.uk> wrote:
From what I understand, in Jewel+ Ceph has the concept of an authoritative shard, so in the case of a 3x replica pool it will notice that two replicas match and one doesn't, and use one of the good replicas. However, in a 2x pool you're out of luck.
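
(For what it's worth, you can double-check what a pool is actually set to; the pool name here is just an example:)

    ceph osd pool get rbd size       # replica count, e.g. "size: 3"
    ceph osd pool get rbd min_size   # minimum replicas needed to serve I/O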

However, if someone could confirm my suspicions that would be good as
well.

Hmm, I went digging in and sadly this isn't quite right. The code has a lot of internal plumbing to allow more smarts than were previously feasible, and the erasure-coded pools make use of them for noticing stuff like local corruption. Replicated pools make an attempt, but it's not as reliable as one would like, and it still doesn't involve any kind of voting mechanism.
A self-inconsistent replicated primary won't get chosen. A primary is self-inconsistent when its digest doesn't match the data, which happens when:
1) the object hasn't been written since it was last scrubbed, or
2) the object was written in full, or
3) the object has only been appended to since the last time its digest was recorded, or
4) something has gone terribly wrong in/under LevelDB and the omap entries don't match what the digest says should be there.

Thanks for the correction, Greg. So I'm guessing that the probability of overwriting with an incorrect primary is reduced in later releases, but it can still happen.

Quick question, and maybe this is a #5 for your list: what about objects that are marked inconsistent on the primary due to a read error? I would say 90% of my inconsistent PGs are caused by a read error and an associated smartctl error.

"rados list-inconsistent-obj" shows that it knows the primary had a read error, so I assume a "pg repair" wouldn't try to read from the primary again?
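
For reference, this is roughly how I look at it before deciding anything (the PG id is just a placeholder, and the exact JSON fields vary a bit between releases):

    # Which PGs in the pool have inconsistencies?
    rados list-inconsistent-pg rbd
    # Per-object, per-shard detail for one of them.
    rados list-inconsistent-obj 2.1f --format=json-pretty
    # In the JSON, the shard on the failing OSD carries "read_error" in its
    # errors list, while the healthy shards report no errors.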

David knows more and can correct me if I'm missing something. He's also working on interfaces for scrub that are more friendly in general and allow administrators to make more fine-grained decisions about recovery in ways that cooperate with RADOS.
-Greg

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Tracy Reed
Sent: 18 February 2017 03:06
To: Shinobu Kinjo <ski...@redhat.com>
Cc: ceph-users <ceph-us...@ceph.com>
Subject: Re: [ceph-users] How safe is ceph pg repair these days?

Well, that's the question... is that safe? Because the linked mailing list post (possibly outdated) says that what you just suggested is definitely NOT safe. Is the mailing list post wrong? Has the situation changed? Exactly what does ceph repair do now? I suppose I could go dig into the code, but I'm not an expert and would hate to get it wrong and post possibly bogus info to the list for other newbies to find and worry about and possibly lose their data.

On Fri, Feb 17, 2017 at 06:08:39PM PST, Shinobu Kinjo spake thusly:
if ``ceph pg deep-scrub <pg id>`` does not work then
   do
     ``ceph pg repair <pg id>``
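
In other words, something like this (the PG id is only an example):

    # Re-run a deep scrub on the PG and wait for it to complete.
    ceph pg deep-scrub 2.1f
    # Check whether the PG is still flagged inconsistent.
    ceph health detail | grep 2.1f
    # If it is, ask Ceph to repair it.
    ceph pg repair 2.1f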


On Sat, Feb 18, 2017 at 10:02 AM, Tracy Reed
<tr...@ultraviolet.org>
wrote:
I have a 3-replica cluster. A couple of times I have run into inconsistent PGs. I googled it, and the Ceph docs and various blogs say to run a repair first. But a couple of people on IRC and a mailing list thread from 2015 say that ceph blindly copies the primary over the secondaries and calls it good.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001370.html

I sure hope that isn't the case. If so, it would seem highly irresponsible to implement such a naive command called "repair".
I have recently learned how to properly analyze the OSD logs and
manually fix these things but not before having run repair on a
dozen inconsistent PGs. Now I'm worried about what sort of
corruption I may have introduced. Repairing things by hand is a
simple heuristic based on comparing the size or checksum (as
indicated by the logs) for each of the 3 copies and figuring out
which is correct. Presumably matching two out of three should win
and the odd object out should be deleted since having the exact
same kind of error on two different OSDs is highly improbable. I
don't understand why ceph repair wouldn't have done this all along.
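
For what it's worth, the by-hand procedure I ended up with looks roughly like this on FileStore OSDs (a sketch only; the PG id, OSD id and object name below are placeholders, and the object name itself comes out of the OSD log):

    # 1. Which OSDs hold this PG?
    ceph pg map 2.1f
    # 2. On each of those OSD hosts, locate the on-disk copy of the object.
    find /var/lib/ceph/osd/ceph-4/current/2.1f_head/ -name '*broken-object*' -ls
    # 3. Checksum every copy; with three replicas, the two that agree win and
    #    the odd one out is the bad copy.
    md5sum <path printed by find, on each replica in turn>
    # 4. Stop the OSD holding the bad copy, move that file out of the way,
    #    start the OSD again, and then let Ceph repair the PG from the good copies.
    ceph pg repair 2.1f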

What is the current best practice in the use of ceph repair?

Thanks!

--
Tracy Reed

--
Tracy Reed
