Hi,
in Quincy (17.2.X) mclock is the default scheduler. There have been
issues with snaptrimming under mclock which have been resolved in
later releases, so you might be facing exactly that (stuck snaptrims).
I'd recommend either upgrading to the latest Reef or Squid, which
should improve the mclock behaviour, or reverting to the previous
default "wpq" and restarting all OSDs. You can check the current
setting with:
ceph config get osd osd_op_queue
If you are already running wpq, we should take a closer look; if not,
change it to wpq:
ceph config set osd osd_op_queue wpq
ceph orch restart osd
The latter will restart all OSDs (not simultaneously). Then you should
see changes in the snaptrim queues.
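To confirm the setting took effect and to watch the queues drain,
something along these lines should work (osd.0 is just an arbitrary
example):
ceph tell osd.0 config get osd_op_queue
ceph pg dump pgs_brief 2>/dev/null | grep -c snaptrim
The second command simply counts PGs whose state still contains
"snaptrim"; that number should shrink over time.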
If you're patient and have time to dig in, you could check whether
the stuck snaptrim operations are related to the average object size
in the affected PGs compared to the osd_snap_trim_cost.
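A rough way to check that (just a sketch): compare the configured
cost with the average object size of the pool the affected PGs
belong to, e.g. stored bytes divided by object count as reported by
ceph df:
ceph config get osd osd_snap_trim_cost
ceph df detail
Dividing STORED by OBJECTS for that pool gives an approximate average
object size to compare against the cost value.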
The deep-scrubs should then also continue; I believe they are
currently blocked by the snaptrims.
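Once the snaptrims drain, you can keep an eye on which PGs are still
overdue with e.g.:
ceph health detail | grep -E 'not (deep-)?scrubbed'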
Regards,
Eugen
Quoting Gustavo Garcia Rondina <[email protected]>:
Hi Ceph community,
We have a Ceph cluster with 6 OSD nodes and 168 OSDs (28 per node,
each with an 18 TB data disk). We are running Quincy 17.2.6. The
cluster was not properly maintained for a while and was not in the
greatest shape. After some work, we got here:
[ceph: root@ceph-admin1 /]# ceph -s
  cluster:
    id:     26315dca-383a-11ee-9d49-xxxxxxxxxxxx
    health: HEALTH_WARN
            257 pgs not deep-scrubbed in time
            259 pgs not scrubbed in time

  services:
    mon: 5 daemons, quorum ceph-admin1,ceph-admin2,ceph-osd1,ceph-osd2,ceph-osd3 (age 6M)
    mgr: ceph-admin2.sipadf(active, since 1y), standbys: ceph-admin1.nwaovh
    mds: 2/2 daemons up, 2 standby
    osd: 168 osds: 168 up (since 6w), 168 in (since 6w)

  data:
    volumes: 2/2 healthy
    pools:   9 pools, 2273 pgs
    objects: 699.66M objects, 1.5 PiB
    usage:   2.2 PiB used, 529 TiB / 2.7 PiB avail
    pgs:     1986 active+clean
             125  active+clean+snaptrim_wait
             110  active+clean+scrubbing+deep
             47   active+clean+snaptrim
             5    active+clean+scrubbing
The snaptrim_wait and snaptrim numbers are stuck on these values for
weeks, if not months. We also had almost all PGs not (deep-)scrubbed
in time, but after tweaking a few parameters the numbers got down to
what you see above. However, they now hardly seem to be dropping, or
only very slowly (a couple every 2-3 days). Dumping all PGs and
filtering for the SCRUB_SCHEDULING column, I can see that some PGs
have been scrubbing for a long time:
[root@ceph-admin1 ~]# ceph pg dump | grep -e 'scrubbing for' -e SCHED | awk '{print $1,$(NF-2)}' | sort -n -k2 | tail
5.624 4788773s
5.5f2 4790148s
5.4b 4791924s
5.686 4795526s
5.538 4796221s
5.5c8 4796573s
5.722 4796937s
5.483 4797551s
5.81 4799856s
5.233 10052851s
More recent ones look reasonable, so new scrubs are still starting:
[root@ceph-admin1 ~]# ceph pg dump | grep -e 'scrubbing for' -e SCHED | awk '{print $1,$(NF-2)}' | sort -n -k2 | head
PG_STAT SCRUB_SCHEDULING
5.14b 53s
5.382 71s
5.ff 187s
6.9 265s
5.1c3 354s
5.70a 364s
5.596 367s
5.1fc 897s
5.3c8 1420s
Looking closer at PG 5.233, it has a long snaptrimq length, but it
doesn't seem to be blocked:
[root@ceph-admin1 ~]# ceph pg 5.233 query | grep -E 'snaptrimq_len|blocked_by|waiting_on'
"snaptrimq_len": 2682,
"blocked_by": [],
However, it does seem to be waiting on someone:
[root@ceph-admin1 ~]# ceph pg 5.233 query | grep -A5 'waiting_on_whom'
"waiting_on_whom": [
"164(4)"
],
"schedule": "scrubbing"
The primary OSD for PG 5.233 is osd.136 on node ceph-osd4, and its
logs do not show anything remarkable. The HDD for osd.136 is in good
shape, no SMART errors, no I/O errors anywhere on ceph-osd4.
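Since "waiting_on_whom" points at shard 164(4), i.e. osd.164, it
might also be worth dumping that OSD's in-flight and blocked ops to
see what it is stuck on (a sketch, assuming the usual tell commands):
ceph tell osd.164 dump_ops_in_flight
ceph tell osd.164 dump_blocked_ops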
Is it the scrubbing that's blocking the snaptrim, or the other way?
Also, looking at historic ops:
[root@ceph-admin1 ~]# ceph tell osd.136 dump_historic_ops_by_duration
{
    "size": 20,
    "duration": 600,
    ...
    "ops": [
        {
            "description": "rep_scrubmap(6.9 e14586 from shard 151)",
            "initiated_at": "2025-09-25T09:57:46.150913-0500",
            "age": 356.3390488,
            "duration": 0.12764220300000001,
            "type_data": {
                "flag_point": "started",
                "events": [
                    ...
                    {
                        "event": "header_read",
                        "time": "2025-09-25T09:57:46.150912-0500",
                        "duration": 4294967295.9999995
                    },
This last duration looks odd, as 4294967295.9999995 is essentially
2^32. Not sure what it means, but it seemed strange.
Any clues on any of this?
Thank you,
Gustavo
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]