Hi,

just to avoid confusion about the wording:

There's the autoscaler, which can change the pg_num of pools if it's enabled. Maybe it was in "warn" mode? You could add the output of:

ceph osd pool autoscale-status
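
If it turns out the autoscaler did change (or only warned about) pg_num, you can control that per pool. Just as a sketch (the pool name is only a placeholder):

ceph osd pool get <pool-name> pg_autoscale_mode
ceph osd pool set <pool-name> pg_autoscale_mode warn

In "warn" mode the autoscaler only reports its recommendation instead of applying it.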

And then there's the balancer, which moves PGs around so that OSD utilization is evenly balanced. The balancer only kicks in when less than 5% of the objects are misplaced. Since your cluster currently shows 22%, it won't do anything, so you'll indeed have to wait for the misplaced objects to drop. You should also see the corresponding message in:

ceph balancer status
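
The 5% threshold I mentioned is the mgr option target_max_misplaced_ratio (0.05 by default). If you want to double-check what your cluster uses:

ceph config get mgr target_max_misplaced_ratio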

Regards,
Eugen

Quoting Ryan Sleeth <[email protected]>:

Hello, I am new to Ceph and still learning. I set up a test cluster and
copied a lot of 'torture' data to it (20 TB with many files under 4 KB). I
copied this data to different pools configured for different compression
plugins (snappy vs. lz4 vs. zstd), required ratios (default 87 vs. 97), and
EC plugins (jerasure vs. isa). All told, the pools add up to ~440 TB
(out of 1.3 PB), all using EC 4+2.

Perhaps the autobalancer is misbehaving because many OSDs were 'stuck' at 1
PG until I manually specified them to be 256; now they are catching up.
What is not showing any progress: 1) I have one HDD OSD at 87% full, the
next highest at 69%, and the lowest at 12%, and 2) a huge number of PGs not
deep-scrubbed in time; I can probably ignore this as it's a result of the
dramatic PG changes.

The cluster is 9x Dell PowerEdge R730xd, each with 8x 20 TB HDDs and 2x 2 TB
enterprise NVMes configured for db; set up on Ubuntu 22 with Tentacle 20.1.0
RC0 (upgraded from Squid) and running with Docker/cephadm.

Perhaps I am being impatient waiting for it to rebalance? Does the
community have any suggestions on how to approach this? The PG autobalancer
seems fine.

  cluster:
    id:     mypool
    health: HEALTH_WARN
            1 nearfull osd(s)
            1392 pgs not deep-scrubbed in time
            217 pgs not scrubbed in time
            21 pool(s) nearfull

  services:
    mon: 5 daemons, quorum ceph01,ceph02,ceph03,ceph06,ceph05 (age 2w) [leader: ceph01]
    mgr: ceph01.qfwdee(active, since 2w), standbys: ceph02.imozkz
    mds: 2/2 daemons up, 1 standby
    osd: 90 osds: 90 up (since 2w), 90 in (since 3w); 863 remapped pgs

  data:
    volumes: 2/2 healthy
    pools:   25 pools, 3051 pgs
    objects: 396.39M objects, 326 TiB
    usage:   440 TiB used, 896 TiB / 1.3 PiB avail
    pgs:     428953812/1897506717 objects misplaced (22.606%)
             2159 active+clean
             861  active+remapped+backfill_wait
             16   active+clean+scrubbing+deep
             13   active+clean+scrubbing
             2    active+remapped+backfilling

  io:
    recovery: 29 MiB/s, 21 objects/s


--
Ryan Sleeth

_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
