Thank you everyone for your individual replies. The autoscaler was outputting nothing, so the note here was relevant:
https://docs.ceph.com/en/latest/rados/operations/placement-groups/#viewing-pg-scaling-recommendations

The .mgr pool was on the default CRUSH root by itself; after fixing that, recovery began as expected.
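For reference, the fix was roughly along these lines (the rule and device-class names below are only examples, not the exact ones from my cluster):

  # list each pool's crush_rule to spot the pool sitting on a different root
  ceph osd pool ls detail
  # create a replicated rule rooted in the same subtree/device class as the data pools
  # ("replicated_hdd" and "hdd" are placeholder names)
  ceph osd crush rule create-replicated replicated_hdd default host hdd
  # move .mgr onto that rule so every pool shares one CRUSH root
  ceph osd pool set .mgr crush_rule replicated_hdd
  # the autoscaler should report status again
  ceph osd pool autoscale-status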
On Tue, Oct 7, 2025 at 2:38 AM Eugen Block <[email protected]> wrote:

> Hi,
>
> just to not get confused about the wording:
>
> There's the autoscaler, which can change the pg_num of pools if enabled.
> Maybe it was in "warn" mode? You can add the output of:
>
> ceph osd pool autoscale-status
>
> And then there's the balancer, which moves PGs around to keep the OSD
> utilization evenly balanced. The balancer only kicks in when there's
> less than 5% misplaced objects. Since your cluster currently shows
> 22%, it won't do anything, so you'll indeed have to wait for the
> misplaced objects to drop. You should also see the respective message
> in:
>
> ceph balancer status
>
> Regards,
> Eugen
>
> Zitat von Ryan Sleeth <[email protected]>:
>
> > Hello, I am new to Ceph and still learning. I set up a test cluster and
> > copied a lot of 'torture' data to it (20 TB with many files under 4 KB).
> > I copied this data to different pools configured for different
> > compression plugins (snappy vs. lz4 vs. zstd), required ratios (default
> > 87 vs. 97), and EC plugins (jerasure vs. isa). All totaled, the pools
> > add up to ~440 TB (out of 1.3 PB), all using EC 4+2.
> >
> > Perhaps the autobalancer is misbehaving because many OSDs were 'stuck'
> > at 1 PG until I manually specified them to be 256--now they are catching
> > up. What is not showing any progress is: 1) I have one HDD OSD at 87%
> > full, the next highest at 69%, and the lowest at 12%, and 2) a huge
> > number of PGs not deep-scrubbed in time; I can probably ignore this as
> > it's a result of the dramatic PG changes.
> >
> > The cluster is 9x Dell PowerEdge R730xd, each with 8x 20 TB HDDs and
> > 2x 2 TB enterprise NVMes configured for db; set up on Ubuntu 22 with
> > Tentacle 20.1.0 RC0 upgraded from Squid and running with Docker/cephadm.
> >
> > Perhaps I am being impatient waiting for it to rebalance? Does the
> > community have any suggestions about how to approach this? The PG
> > autobalancer seems fine.
> >
> >   cluster:
> >     id:     mypool
> >     health: HEALTH_WARN
> >             1 nearfull osd(s)
> >             1392 pgs not deep-scrubbed in time
> >             217 pgs not scrubbed in time
> >             21 pool(s) nearfull
> >
> >   services:
> >     mon: 5 daemons, quorum ceph01,ceph02,ceph03,ceph06,ceph05 (age 2w) [leader: ceph01]
> >     mgr: ceph01.qfwdee(active, since 2w), standbys: ceph02.imozkz
> >     mds: 2/2 daemons up, 1 standby
> >     osd: 90 osds: 90 up (since 2w), 90 in (since 3w); 863 remapped pgs
> >
> >   data:
> >     volumes: 2/2 healthy
> >     pools:   25 pools, 3051 pgs
> >     objects: 396.39M objects, 326 TiB
> >     usage:   440 TiB used, 896 TiB / 1.3 PiB avail
> >     pgs:     428953812/1897506717 objects misplaced (22.606%)
> >              2159 active+clean
> >              861  active+remapped+backfill_wait
> >              16   active+clean+scrubbing+deep
> >              13   active+clean+scrubbing
> >              2    active+remapped+backfilling
> >
> >   io:
> >     recovery: 29 MiB/s, 21 objects/s
> >
> > --
> > Ryan Sleeth

--
Ryan Sleeth
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
