I've tried a few more things to get a deep-scrub going on my PG. I tried
instructing the involved OSDs to scrub all of their PGs, and it looks like
that didn't do it.
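
For reference, that was something along these lines, repeated for each OSD
in the PG's acting set (osd id illustrative):

$ ceph osd deep-scrub 12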

Do you have any documentation on ceph-objectstore-tool? What I've found
online covers filestore rather than bluestore.

On 6 April 2018 at 09:27, David Turner <drakonst...@gmail.com> wrote:

> I'm running into this exact same situation.  I'm running 12.2.2 and I have
> an EC PG with a scrub error.  It has the same output for [1] rados
> list-inconsistent-obj as mentioned before.  This is the [2] full health
> detail.  This is the [3] excerpt from the log from the deep-scrub that
> marked the PG inconsistent.  The scrub happened while the PG was starting up
> after using ceph-objectstore-tool to split its filestore subfolders, via a
> script that I've used for months without any side effects.
>
> I have tried quite a few things to get this PG to deep-scrub or repair,
> but to no avail; it will not do anything.  I set osd_max_scrubs to 0 on
> every osd in the cluster, waited for all scrubbing and deep-scrubbing to
> finish, then raised it back to 1 on the 11 OSDs in this PG's acting set
> before issuing a deep-scrub.  It will then sit there for over an hour
> without deep-scrubbing.  My current test is to set osd_max_scrubs to 1 on
> all osds, raise it to 4 on the OSDs for this PG, and then issue the
> repair... but similarly nothing happens.  Each time I issue the deep-scrub
> or repair, the output correctly says 'instructing pg 145.2e3 on osd.234 to
> repair', but nothing shows up in that OSD's log and the PG state stays
> 'active+clean+inconsistent'.
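>
> For reference, the sequence was roughly the following (commands
> reconstructed from memory, so treat this as a sketch rather than a
> transcript):
>
> $ ceph tell osd.* injectargs '--osd_max_scrubs 0'
> # ...wait for all scrubbing/deep-scrubbing to stop, then for each OSD
> # in the acting set:
> $ ceph tell osd.234 injectargs '--osd_max_scrubs 1'
> $ ceph pg deep-scrub 145.2e3
> $ ceph pg repair 145.2e3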
>
> My next step, unless anyone has a better idea, is to find the PG copy that
> is missing the object, use ceph-objectstore-tool to back up that copy of
> the PG, and then remove it.  Starting the OSD back up should then backfill
> a full copy of the PG and get it healthy again.
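>
> Roughly this, I think (untested sketch: osd id, shard and paths are
> illustrative, and the exact flags should be checked against
> ceph-objectstore-tool --help first; newer versions may also want --force
> on the remove, and filestore may want --journal-path as well):
>
> $ systemctl stop ceph-osd@234
> $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-234 \
>     --pgid 145.2e3s0 --op export --file /root/145.2e3s0.export
> $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-234 \
>     --pgid 145.2e3s0 --op remove
> $ systemctl start ceph-osd@234
>
> The same invocation should work against a bluestore OSD, since the tool
> operates on whatever object store it finds under --data-path.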
>
>
>
> [1] $ rados list-inconsistent-obj 145.2e3
> No scrub information available for pg 145.2e3
> error 2: (2) No such file or directory
>
> [2] $ ceph health detail
> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
> OSD_SCRUB_ERRORS 1 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
>     pg 145.2e3 is active+clean+inconsistent, acting
> [234,132,33,331,278,217,55,358,79,3,24]
>
> [3] 2018-04-04 15:24:53.603380 7f54d1820700  0 log_channel(cluster) log
> [DBG] : 145.2e3 deep-scrub starts
> 2018-04-04 17:32:37.916853 7f54d1820700 -1 log_channel(cluster) log [ERR]
> : 145.2e3s0 deep-scrub 1 missing, 0 inconsistent objects
> 2018-04-04 17:32:37.916865 7f54d1820700 -1 log_channel(cluster) log [ERR]
> : 145.2e3 deep-scrub 1 errors
>
> On Mon, Apr 2, 2018 at 4:51 PM Michael Sudnick <michael.sudn...@gmail.com>
> wrote:
>
>> Hi Kjetil,
>>
>> I've tried to get the pg scrubbing/deep-scrubbing and nothing seems to be
>> happening. I've tried a few times over the last few days. My cluster is
>> recovering from a failed disk (which was probably the reason for the
>> inconsistency); do I need to wait for the cluster to heal before a
>> repair/deep-scrub will work?
>>
>> -Michael
>>
>> On 2 April 2018 at 14:13, Kjetil Joergensen <kje...@medallia.com> wrote:
>>
>>> Hi,
>>>
>>> Scrub or deep-scrub the pg; that should in theory get
>>> list-inconsistent-obj spitting out what's wrong again, and then you can
>>> mail that info to the list.
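>>>
>>> Something like this (PG id taken from your earlier mail):
>>>
>>> $ ceph pg deep-scrub 49.11c
>>> # and once the deep-scrub has actually completed:
>>> $ rados list-inconsistent-obj 49.11c --format=json-pretty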
>>>
>>> -KJ
>>>
>>> On Sun, Apr 1, 2018 at 9:17 AM, Michael Sudnick <
>>> michael.sudn...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a small cluster with an inconsistent pg. I've tried ceph pg
>>>> repair multiple times with no luck. rados list-inconsistent-obj 49.11c
>>>> returns:
>>>>
>>>> # rados list-inconsistent-obj 49.11c
>>>> No scrub information available for pg 49.11c
>>>> error 2: (2) No such file or directory
>>>>
>>>> I'm a bit at a loss here as to what to do to recover. That pg is part of
>>>> a cephfs_data pool with compression set to force/snappy.
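>>>>
>>>> For completeness, compression on that pool was enabled with something
>>>> like the following (exact invocation from memory):
>>>>
>>>> $ ceph osd pool set cephfs_data compression_mode force
>>>> $ ceph osd pool set cephfs_data compression_algorithm snappy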
>>>>
>>>> Does anyone have any suggestions?
>>>>
>>>> -Michael
>>>>
>>>
>>>
>>> --
>>> Kjetil Joergensen <kje...@medallia.com>
>>> SRE, Medallia Inc
>>>
>>
>