I think you may have hit the same bug we ran into. Cory submitted a fix;
see if it matches what you've encountered:

https://github.com/ceph/ceph/pull/46727 (backport to Pacific here:
https://github.com/ceph/ceph/pull/46877 )
https://tracker.ceph.com/issues/54172
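
In case it is useful for checking your own clusters, something along these
lines should show whether any PGs are sitting in a deep scrub and for how
long (the SINCE column). Rough sketch only; the exact output columns vary
a bit between releases:

    # list PGs whose state includes deep scrubbing, along with how long
    # they have been in that state
    ceph pg ls | grep 'scrubbing+deep'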

On Fri, Jul 15, 2022 at 8:52 AM Wesley Dillingham <w...@wesdillingham.com>
wrote:

> We have two clusters: one went 14.2.22 -> 16.2.7 -> 16.2.9.
>
> The other went 16.2.7 -> 16.2.9.
>
> Both are multi-device setups (spinner block / SSD block.db), both serve
> CephFS, and each has around 600 OSDs with a combination of rep-3 and
> 8+3 EC data pools. We see examples of stuck-scrubbing PGs from all of
> the pools.
>
> They have generally been behind on scrubbing, which we attributed to
> simply being large disks (10TB) with a heavy write load and the OSDs
> just having trouble keeping up. On closer inspection it appears we have
> many PGs that have been lodged in a deep scrubbing state, on one cluster
> for 2 weeks and on the other for 7 weeks. Wondering if others have been
> experiencing anything similar. The only example of stuck-scrubbing PGs I
> have seen in the past was related to the snaptrim PG state, but we
> aren't doing anything with snapshots in these new clusters.
>
> Granted, my cluster has been warning me with "pgs not deep-scrubbed in
> time" and it's on me for not looking more closely into why. Perhaps a
> separate warning of "PG Stuck Scrubbing for greater than 24 hours" or
> similar might be helpful to an operator.
>
> In any case, I was able to get scrubs proceeding again by restarting the
> primary OSD daemon for the PGs that were stuck. I will monitor closely
> for additional stuck scrubs.
>
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
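For anyone else hitting this before the fix is deployed, the workaround Wes
describes above amounts to bouncing the acting primary of each stuck PG.
A rough sketch, with placeholder PG and OSD ids:

    # the first OSD listed in "acting [...]" is the primary to restart
    ceph pg map 8.1f2

    # restart that OSD on its host (traditional systemd deployment)
    systemctl restart ceph-osd@123

    # or, on a cephadm-managed cluster
    ceph orch daemon restart osd.123
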
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
