Thank you everyone for your individual replies. The autoscaler was outputting nothing, so the note here was relevant:
https://docs.ceph.com/en/latest/rados/operations/placement-groups/#viewing-pg-scaling-recommendations

The .mgr pool was on the default CRUSH root by itself; after fixing that, recovery began as expected.
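For reference, the fix was roughly along these lines (the rule and device-class names below are only examples, not the exact ones from my cluster):

  # list each pool's crush_rule to spot the pool sitting on a different root
  ceph osd pool ls detail
  # create a replicated rule rooted in the same subtree/device class as the data pools
  # ("replicated_hdd" and "hdd" are placeholder names)
  ceph osd crush rule create-replicated replicated_hdd default host hdd
  # move .mgr onto that rule so every pool shares one CRUSH root
  ceph osd pool set .mgr crush_rule replicated_hdd
  # the autoscaler should report status again
  ceph osd pool autoscale-status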
On Tue, Oct 7, 2025 at 2:38 AM Eugen Block <[email protected]> wrote:

> Hi,
>
> just to not get confused about the wording:
>
> There's the autoscaler, which can change the pg_num of pools if enabled.
> Maybe it was in "warn" mode? You can add the output of:
>
> ceph osd pool autoscale-status
>
> And then there's the balancer, which moves PGs around to keep the OSD
> utilization evenly balanced. The balancer only kicks in when there's
> less than 5% misplaced objects. Since your cluster currently shows
> 22%, it won't do anything, so you'll indeed have to wait for the
> misplaced objects to drop. You should also see the respective message
> in:
>
> ceph balancer status
>
> Regards,
> Eugen
>
> Zitat von Ryan Sleeth <[email protected]>:
>
> > Hello, I am new to Ceph and still learning. I set up a test cluster and
> > copied a lot of 'torture' data to it (20 TB with many files under 4 KB).
> > I copied this data to different pools configured for different
> > compression plugins (snappy vs. lz4 vs. zstd), required ratios (default
> > 87 vs. 97), and EC plugins (jerasure vs. isa). All totaled, the pools
> > add up to ~440 TB (out of 1.3 PB), all using EC 4+2.
> >
> > Perhaps the autobalancer is misbehaving because many OSDs were 'stuck'
> > at 1 PG until I manually specified them to be 256--now they are catching
> > up. What is not showing any progress is: 1) I have one HDD OSD at 87%
> > full, the next highest at 69%, and the lowest at 12%, and 2) a huge
> > number of PGs not deep-scrubbed in time; I can probably ignore this as
> > it's a result of the dramatic PG changes.
> >
> > The cluster is 9x Dell PowerEdge R730xd, each with 8x 20 TB HDDs and
> > 2x 2 TB enterprise NVMes configured for db; set up on Ubuntu 22 with
> > Tentacle 20.1.0 RC0 upgraded from Squid and running with Docker/cephadm.
> >
> > Perhaps I am being impatient waiting for it to rebalance? Does the
> > community have any suggestions about how to approach this? The PG
> > autobalancer seems fine.
> >
> >   cluster:
> >     id:     mypool
> >     health: HEALTH_WARN
> >             1 nearfull osd(s)
> >             1392 pgs not deep-scrubbed in time
> >             217 pgs not scrubbed in time
> >             21 pool(s) nearfull
> >
> >   services:
> >     mon: 5 daemons, quorum ceph01,ceph02,ceph03,ceph06,ceph05 (age 2w) [leader: ceph01]
> >     mgr: ceph01.qfwdee(active, since 2w), standbys: ceph02.imozkz
> >     mds: 2/2 daemons up, 1 standby
> >     osd: 90 osds: 90 up (since 2w), 90 in (since 3w); 863 remapped pgs
> >
> >   data:
> >     volumes: 2/2 healthy
> >     pools:   25 pools, 3051 pgs
> >     objects: 396.39M objects, 326 TiB
> >     usage:   440 TiB used, 896 TiB / 1.3 PiB avail
> >     pgs:     428953812/1897506717 objects misplaced (22.606%)
> >              2159 active+clean
> >              861  active+remapped+backfill_wait
> >              16   active+clean+scrubbing+deep
> >              13   active+clean+scrubbing
> >              2    active+remapped+backfilling
> >
> >   io:
> >     recovery: 29 MiB/s, 21 objects/s
> >
> > --
> > Ryan Sleeth

--
Ryan Sleeth
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
