You may also be able to use an upmap (or the upmap balancer) to help make
room on the OSD which is too full.
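
A minimal sketch of the commands involved (the OSD IDs 12 and 27 below are
hypothetical placeholders, and the balancer's upmap mode assumes your
clients are Luminous or newer):

    # Let the balancer compute upmaps for you:
    ceph balancer mode upmap
    ceph balancer on

    # Or move one of the PG's shards off the too-full OSD by hand:
    ceph osd pg-upmap-items 6.180 12 27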

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Fri, Nov 19, 2021 at 1:14 PM Wesley Dillingham <w...@wesdillingham.com>
wrote:

> Okay, now I see your attachment. The PG is in state:
>
> "state":
> "active+undersized+degraded+remapped+inconsistent+backfill_toofull",
>
> The reason it can't scrub or repair is that it's degraded, and further it
> seems that the cluster doesn't have the space to make that recovery happen
> (hence the "backfill_toofull" state). This may clear on its own as other
> PGs recover and this PG is ultimately able to recover. Other options are to
> remove data or add capacity. How full is your cluster? Is your cluster
> currently backfilling actively?
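>
> A quick sketch of how to check both (standard Ceph CLI; exact output
> varies by release):
>
>     ceph df                      # overall and per-pool usage
>     ceph osd df tree             # per-OSD fullness, to spot the full OSD(s)
>     ceph status                  # shows whether backfill/recovery is running
>     ceph pg ls backfill_toofull  # lists PGs currently stuck in that state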
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Fri, Nov 19, 2021 at 10:57 AM J-P Methot <jp.met...@planethoster.info>
> wrote:
>
>> We stopped deep scrubbing a while ago. However, forcing a deep scrub by
>> running "ceph pg deep-scrub 6.180" doesn't do anything; the deep scrub
>> doesn't run at all. Could the deep scrubbing process be stuck elsewhere?
>> On 11/18/21 3:29 PM, Wesley Dillingham wrote:
>>
>> That response is typically indicative of a PG whose OSD set has changed
>> since it was last scrubbed (typically from a disk failing).
>>
>> Are you sure it's actually getting scrubbed when you issue the scrub? For
>> example, you can issue "ceph pg <pg_id> query" and look for
>> "last_deep_scrub_stamp", which will tell you when it was last deep
>> scrubbed.
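>>
>> For instance (a sketch, using the PG ID from this thread):
>>
>>     ceph pg 6.180 query | grep last_deep_scrub_stamp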
>>
>> Further, in sufficiently recent versions of Ceph (introduced in
>> 14.2.something iirc), setting the flag "nodeep-scrub" will cause all
>> in-flight deep-scrubs to stop immediately. You may have a scheduling issue
>> where your deep-scrubs or repairs aren't getting scheduled.
>>
>> Set the nodeep-scrub flag ("ceph osd set nodeep-scrub") and wait for all
>> current deep-scrubs to complete, then manually re-issue the deep scrub
>> ("ceph pg deep-scrub <pg_id>"). At that point your scrub should start
>> almost immediately, and "rados list-inconsistent-obj 6.180
>> --format=json-pretty" should return something of value.
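>>
>> Put together, roughly (a sketch; 6.180 is the PG from this thread, and
>> the wait step is just re-running "ceph -s" until no PGs report deep
>> scrubbing):
>>
>>     ceph osd set nodeep-scrub
>>     ceph -s                       # wait for in-flight deep-scrubs to drain
>>     ceph pg deep-scrub 6.180
>>     rados list-inconsistent-obj 6.180 --format=json-pretty
>>     ceph osd unset nodeep-scrub   # re-enable scheduled deep scrubs after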
>>
>> Respectfully,
>>
>> *Wes Dillingham*
>> w...@wesdillingham.com
>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>
>>
>> On Thu, Nov 18, 2021 at 2:38 PM J-P Methot <jp.met...@planethoster.info>
>> wrote:
>>
>>> Hi,
>>>
>>> We currently have a PG stuck in an inconsistent state on an
>>> erasure-coded pool. The pool's K and M values are 33 and 3. The command
>>> rados list-inconsistent-obj 6.180 --format=json-pretty results in the
>>> following error:
>>>
>>> No scrub information available for pg 6.180 error 2: (2) No such file or
>>> directory
>>>
>>> Forcing a deep scrub of the PG does not fix this. Doing a "ceph pg repair
>>> 6.180" doesn't seem to do anything. Is there a known bug explaining this
>>> behavior? I am attaching information regarding the PG in question.
>>>
>>> --
>>> Jean-Philippe Méthot
>>> Senior Openstack system administrator
>>> Administrateur système Openstack sénior
>>> PlanetHoster inc.
>>>
>> --
>> Jean-Philippe Méthot
>> Senior Openstack system administrator
>> Administrateur système Openstack sénior
>> PlanetHoster inc.
>>
>>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
