Okay, now I see your attachment; the PG is in this state:

"state":
"active+undersized+degraded+remapped+inconsistent+backfill_toofull",

The reason it can't scrub or repair is that it's degraded, and further, the
cluster doesn't seem to have the space to let that recovery happen (hence
the "backfill_toofull" state). This may clear on its own as other PGs
recover and this PG is ultimately able to recover. Other options are to
remove data or add capacity. How full is your cluster? Is your cluster
currently backfilling actively?
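A few read-only commands (a sketch; nothing here changes cluster state) will show how close the cluster and individual OSDs are to the backfillfull threshold:

```shell
# Overall and per-pool usage:
ceph df

# Per-OSD fullness; look for OSDs near the backfillfull ratio:
ceph osd df tree

# The configured thresholds (backfillfull_ratio defaults to 0.90):
ceph osd dump | grep -i ratio

# Is backfill actually making progress right now?
ceph status
```

An OSD at or above backfillfull_ratio refuses to accept backfill data, which is what pins a PG in "backfill_toofull".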

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Fri, Nov 19, 2021 at 10:57 AM J-P Methot <jp.met...@planethoster.info>
wrote:

> We stopped deep scrubbing a while ago. However, forcing a deep scrub by
> doing "ceph pg deep-scrub 6.180" doesn't do anything. The deep scrub doesn't
> run at all. Could the deep scrub process be stuck elsewhere?
> On 11/18/21 3:29 PM, Wesley Dillingham wrote:
>
> That response is typically indicative of a PG whose OSD set has changed
> since it was last scrubbed (typically after a disk failure).
>
> Are you sure it's actually getting scrubbed when you issue the scrub? For
> example, you can issue "ceph pg <pg_id> query" and look for
> "last_deep_scrub_stamp", which will tell you when it was last deep
> scrubbed.
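Checking that stamp can be scripted; a minimal sketch in Python, assuming the "ceph pg <pg_id> query" output has been saved as JSON (the info.stats.last_deep_scrub_stamp path matches typical query output, but verify on your release):

```python
import json
from datetime import datetime

def last_deep_scrub(pg_query_json: str) -> datetime:
    """Extract last_deep_scrub_stamp from 'ceph pg <pg_id> query' JSON."""
    stats = json.loads(pg_query_json)["info"]["stats"]
    stamp = stats["last_deep_scrub_stamp"]
    # Stamps look like "2021-11-18 14:02:11.123456" (some releases append a
    # timezone offset); parse just the second-resolution prefix.
    return datetime.strptime(stamp[:19], "%Y-%m-%d %H:%M:%S")

# Abbreviated, hypothetical query output for illustration:
sample = json.dumps(
    {"info": {"stats": {"last_deep_scrub_stamp": "2021-11-18 14:02:11.123456"}}}
)
print(last_deep_scrub(sample))  # 2021-11-18 14:02:11
```

If the stamp predates your manual scrub request, the scrub never actually ran.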
>
> Further, in sufficiently recent versions of Ceph (introduced in
> 14.2.something iirc), setting the flag "nodeep-scrub" will cause all
> in-flight deep scrubs to stop immediately. You may have a scheduling issue
> where your deep scrubs or repairs aren't getting scheduled.
>
> Set the nodeep-scrub flag ("ceph osd set nodeep-scrub") and wait for all
> current deep scrubs to complete, then manually re-issue the deep scrub
> ("ceph pg deep-scrub <pg_id>"). At this point your scrub should start
> near-immediately, and "rados list-inconsistent-obj 6.180
> --format=json-pretty" should return something of value.
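Put together, the sequence above looks roughly like this (a sketch; substitute your own PG id, and remember to re-enable scheduled deep scrubs afterwards):

```shell
# Stop all in-flight deep scrubs (behavior described above, 14.2+):
ceph osd set nodeep-scrub

# Wait until "deep" disappears from all PG states:
ceph pg dump pgs_brief | grep deep

# Some releases also block manually issued deep scrubs while the flag is
# set; unset it first if the scrub below never starts:
ceph osd unset nodeep-scrub

# Re-issue the deep scrub; it should start near-immediately:
ceph pg deep-scrub 6.180

# Once it completes, this should return something of value:
rados list-inconsistent-obj 6.180 --format=json-pretty
```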
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Thu, Nov 18, 2021 at 2:38 PM J-P Methot <jp.met...@planethoster.info>
> wrote:
>
>> Hi,
>>
>> We currently have a PG stuck in an inconsistent state on an erasure
>> coded pool. The pool's K and M values are 33 and 3.  The command rados
>> list-inconsistent-obj 6.180 --format=json-pretty results in the
>> following error:
>>
>> No scrub information available for pg 6.180 error 2: (2) No such file or
>> directory
>>
>> Forcing a deep scrub of the PG does not fix this, and "ceph pg repair
>> 6.180" doesn't seem to do anything. Is there a known bug explaining this
>> behavior? I am attaching information regarding the PG in question.
>>
>> --
>> Jean-Philippe Méthot
>> Senior Openstack system administrator
>> Administrateur système Openstack sénior
>> PlanetHoster inc.
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
> --
> Jean-Philippe Méthot
> Senior Openstack system administrator
> Administrateur système Openstack sénior
> PlanetHoster inc.
>
>