Hi Ceph community,

We have a Ceph cluster with 6 OSD nodes and 168 OSDs (28 per node, each
with an 18 TB data disk), running Quincy 17.2.6. The cluster was not
properly maintained for a while and was not in the greatest shape. After
some work, we got here:

[ceph: root@ceph-admin1 /]# ceph -s
  cluster:
    id:     26315dca-383a-11ee-9d49-xxxxxxxxxxxx
    health: HEALTH_WARN
            257 pgs not deep-scrubbed in time
            259 pgs not scrubbed in time

  services:
    mon: 5 daemons, quorum ceph-admin1,ceph-admin2,ceph-osd1,ceph-osd2,ceph-osd3 (age 6M)
    mgr: ceph-admin2.sipadf(active, since 1y), standbys: ceph-admin1.nwaovh
    mds: 2/2 daemons up, 2 standby
    osd: 168 osds: 168 up (since 6w), 168 in (since 6w)

  data:
    volumes: 2/2 healthy
    pools:   9 pools, 2273 pgs
    objects: 699.66M objects, 1.5 PiB
    usage:   2.2 PiB used, 529 TiB / 2.7 PiB avail
    pgs:     1986 active+clean
             125  active+clean+snaptrim_wait
             110  active+clean+scrubbing+deep
             47   active+clean+snaptrim
             5    active+clean+scrubbing


The snaptrim_wait and snaptrim numbers have been stuck at these values for 
weeks, if not months. We also had almost all PGs not (deep-)scrubbed in 
time; after tweaking a few parameters, those numbers came down to what you 
see above. However, they now seem to have stopped dropping, or are dropping 
very slowly (a couple every 2-3 days). Dumping all PGs and filtering on the 
SCRUB_SCHEDULING column, I can see that some PGs have been scrubbing for a 
very long time:

[root@ceph-admin1 ~]# ceph pg dump | grep -e 'scrubbing for'  -e SCHED  | awk 
'{print $1,$(NF-2)}' | sort -n -k2 | tail
5.624 4788773s
5.5f2 4790148s
5.4b 4791924s
5.686 4795526s
5.538 4796221s
5.5c8 4796573s
5.722 4796937s
5.483 4797551s
5.81 4799856s
5.233 10052851s

The most recent ones look reasonable, though, so new scrubs are at least starting:

[root@ceph-admin1 ~]# ceph pg dump | grep -e 'scrubbing for'  -e SCHED  | awk 
'{print $1,$(NF-2)}' | sort -n -k2 | head
PG_STAT SCRUB_SCHEDULING
5.14b 53s
5.382 71s
5.ff 187s
6.9 265s
5.1c3 354s
5.70a 364s
5.596 367s
5.1fc 897s
5.3c8 1420s
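
I still need to cross-check whether the long-running ones all involve a 
common OSD; the plan is something along these lines (PG ids taken from the 
list above, no conclusions from it yet):

for pg in 5.624 5.5f2 5.4b 5.686 5.538 5.5c8 5.722 5.483 5.81 5.233; do
    ceph pg map $pg   # shows the up/acting OSD set for each PG
done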

Looking closer at PG 5.233, it has a long snaptrim queue, but it doesn't 
seem to be blocked:

[root@ceph-admin1 ~]# ceph pg 5.233 query | grep -E 
'snaptrimq_len|blocked_by|waiting_on'
            "snaptrimq_len": 2682,
            "blocked_by": [],

However, it does seem to be waiting on someone:

[root@ceph-admin1 ~]# ceph pg 5.233 query | grep -A5 'waiting_on_whom'
        "waiting_on_whom": [
            "164(4)"
        ],
        "schedule": "scrubbing"

The primary OSD for PG 5.233 is osd.136 on node ceph-osd4, and its logs do 
not show anything remarkable. The HDD behind osd.136 is in good shape: no 
SMART errors, and no I/O errors anywhere on ceph-osd4.
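
I have not dug into the replica side yet. Since the scrub reports waiting 
on shard 164(4), i.e. osd.164, my next step is to look at that OSD with 
something like:

ceph osd find 164
ceph tell osd.164 dump_blocked_ops
ceph tell osd.164 dump_ops_in_flight
ceph tell osd.164 dump_historic_ops_by_duration

If nothing obvious shows up there, would it be reasonable to try
'ceph pg repeer 5.233' or restarting osd.164, or is that likely to make 
things worse?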

Is it the scrubbing that's blocking the snaptrim, or the other way around?
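
For reference, these are the kinds of scrub- and snaptrim-related options I 
have been poking at (not necessarily an exhaustive or exact list; I can 
post our actual values if that would help):

ceph config get osd osd_max_scrubs
ceph config get osd osd_scrub_sleep
ceph config get osd osd_deep_scrub_interval
ceph config get osd osd_scrub_max_interval
ceph config get osd osd_snap_trim_sleep_hdd
ceph config get osd osd_pg_max_concurrent_snap_trims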

Also, looking at the historic ops on osd.136:

[root@ceph-admin1 ~]# ceph tell osd.136 dump_historic_ops_by_duration
{
    "size": 20,
    "duration": 600,
     ...
    "ops": [
        {
            "description": "rep_scrubmap(6.9 e14586 from shard 151)",
            "initiated_at": "2025-09-25T09:57:46.150913-0500",
            "age": 356.3390488,
            "duration": 0.12764220300000001,
            "type_data": {
                "flag_point": "started",
                "events": [
                    ...             
                    {
                        "event": "header_read",
                        "time": "2025-09-25T09:57:46.150912-0500",
                        "duration": 4294967295.9999995
                    },

This last duration looks odd: 4294967295.9999995 is essentially 2^32 
(perhaps a tiny negative duration wrapping around as an unsigned 32-bit 
value?). Not sure what it means, but it seemed strange enough to mention.
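
For completeness, I have also been keeping an eye on the per-PG snaptrim 
queue lengths with something along these lines (the jq path assumes the 
pg dump JSON layout on our Quincy cluster; adjust if yours differs):

ceph pg dump --format json 2>/dev/null \
    | jq -r '.pg_map.pg_stats[] | select(.snaptrimq_len > 0) | "\(.pgid) \(.snaptrimq_len)"' \
    | sort -rn -k2 | head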

Any clues on any of this?

Thank you,
Gustavo