Hi,

In Quincy (17.2.X) mclock is the default scheduler, and there have been issues with snaptrimming under mclock that have been resolved in the latest releases. You might be facing exactly that (stuck snaptrim). I'd recommend either upgrading to the latest Reef or Squid, which should improve the mclock scheduler, or reverting to the previous default "wpq" and restarting all OSDs. First, check what you're currently running:

ceph config get osd osd_op_queue

If you are already running wpq, we should take a closer look; if not, change it to wpq:

ceph config set osd osd_op_queue wpq
ceph orch restart osd

The latter will restart all OSDs (not simultaneously). After that you should see the snaptrim queues start to move. If you're patient and have time to dig in, you could check whether the stuck snaptrim operations are due to the average object size in the affected PGs compared to osd_snap_trim_cost; a rough sketch of that check follows below.
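
For example, something along these lines gives a rough comparison (only a sketch; the awk part assumes the usual 'ceph pg dump pgs' header with PG_STAT, OBJECTS and BYTES columns, so double-check the column names on your release):

ceph config get osd osd_snap_trim_cost

ceph pg dump pgs 2>/dev/null | awk '
    /^PG_STAT/ { for (i = 1; i <= NF; i++) col[$i] = i; next }
    col["PG_STAT"] && /snaptrim/ && $col["OBJECTS"] > 0 {
        printf "%s avg_obj_size=%.0f bytes\n", $col["PG_STAT"], $col["BYTES"] / $col["OBJECTS"]
    }'

If the average object size in the affected PGs is much larger than osd_snap_trim_cost, that mismatch could be part of the problem.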

The deep-scrubs should then also continue; I think they're blocked by the snaptrims.
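
Once things start moving again you can watch the backlog shrink by simply counting the warning lines (assuming the usual "pg ... not (deep-)scrubbed since ..." wording in the health detail output):

ceph health detail | grep -c 'not deep-scrubbed since'
ceph health detail | grep -c 'not scrubbed since'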

Regards,
Eugen

Quoting Gustavo Garcia Rondina <[email protected]>:

Hi Ceph community,

We have a Ceph cluster with 6 OSD nodes and 168 OSDs (28 per node, each with an 18 TB data disk), running Quincy 17.2.6. The cluster was not properly maintained for a while and was not in the greatest shape. After some work, we got here:

[ceph: root@ceph-admin1 /]# ceph -s
  cluster:
    id:     26315dca-383a-11ee-9d49-xxxxxxxxxxxx
    health: HEALTH_WARN
            257 pgs not deep-scrubbed in time
            259 pgs not scrubbed in time

  services:
    mon: 5 daemons, quorum ceph-admin1,ceph-admin2,ceph-osd1,ceph-osd2,ceph-osd3 (age 6M)
    mgr: ceph-admin2.sipadf(active, since 1y), standbys: ceph-admin1.nwaovh
    mds: 2/2 daemons up, 2 standby
    osd: 168 osds: 168 up (since 6w), 168 in (since 6w)

  data:
    volumes: 2/2 healthy
    pools:   9 pools, 2273 pgs
    objects: 699.66M objects, 1.5 PiB
    usage:   2.2 PiB used, 529 TiB / 2.7 PiB avail
    pgs:     1986 active+clean
             125  active+clean+snaptrim_wait
             110  active+clean+scrubbing+deep
             47   active+clean+snaptrim
             5    active+clean+scrubbing


The snaptrim_wait and snaptrim numbers have been stuck at these values for weeks, if not months. We also had almost all PGs not (deep-)scrubbed in time, but after tweaking a few parameters the numbers came down to what you see above. However, they now seem to have stopped dropping, or are dropping very slowly (a couple every 2-3 days). Dumping all PGs and filtering on the SCRUB_SCHEDULING column, I can see that some PGs have been scrubbing for a long time:

[root@ceph-admin1 ~]# ceph pg dump | grep -e 'scrubbing for'  -e SCHED  | awk '{print $1,$(NF-2)}' | sort -n -k2 | tail
5.624 4788773s
5.5f2 4790148s
5.4b 4791924s
5.686 4795526s
5.538 4796221s
5.5c8 4796573s
5.722 4796937s
5.483 4797551s
5.81 4799856s
5.233 10052851s

Recent ones look reasonable, though, so new scrubs are starting:

[root@ceph-admin1 ~]# ceph pg dump | grep -e 'scrubbing for'  -e SCHED  | awk '{print $1,$(NF-2)}' | sort -n -k2 | head
PG_STAT SCRUB_SCHEDULING
5.14b 53s
5.382 71s
5.ff 187s
6.9 265s
5.1c3 354s
5.70a 364s
5.596 367s
5.1fc 897s
5.3c8 1420s

Looking closer at PG 5.233, it has a long snaptrimq length, but it doesn't seem to be blocked:

[root@ceph-admin1 ~]# ceph pg 5.233 query | grep -E 'snaptrimq_len|blocked_by|waiting_on'
            "snaptrimq_len": 2682,
            "blocked_by": [],

However, it does seem to be waiting on someone:

[root@ceph-admin1 ~]# ceph pg 5.233 query | grep -A5 'waiting_on_whom'
        "waiting_on_whom": [
            "164(4)"
        ],
        "schedule": "scrubbing"

The primary OSD for PG 5.233 is osd.136 on node ceph-osd4, and its logs do not show anything remarkable. The HDD for osd.136 is in good shape, no SMART errors, no I/O errors anywhere on ceph-osd4.
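
Presumably the next place to check would be osd.164, i.e. the "164(4)" shard shown in waiting_on_whom above, for blocked or in-flight ops, something along these lines:

ceph osd find 164
ceph tell osd.164 dump_blocked_ops
ceph tell osd.164 dump_ops_in_flight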

Is it the scrubbing that's blocking the snaptrim, or the other way around?

Also, looking at historic ops:

[root@ceph-admin1 ~]# ceph tell osd.136 dump_historic_ops_by_duration
{
    "size": 20,
    "duration": 600,
     ...
    "ops": [
        {
            "description": "rep_scrubmap(6.9 e14586 from shard 151)",
            "initiated_at": "2025-09-25T09:57:46.150913-0500",
            "age": 356.3390488,
            "duration": 0.12764220300000001,
            "type_data": {
                "flag_point": "started",
                "events": [
                    ...             
                    {
                        "event": "header_read",
                        "time": "2025-09-25T09:57:46.150912-0500",
                        "duration": 4294967295.9999995
                    },

This last duration looks odd, as 4294967295.9999995 is essentially 2^32, which might indicate a tiny negative duration wrapping around an unsigned value. Not sure what it means, but it seemed strange.

Any clues on any of this?

Thank you,
Gustavo