Hi,
over the years we have often seen strange behavior and surprising PG targets from pg_autoscaler.
That's why we disable it globally.
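A minimal sketch of how that global disable can be done, assuming a recent Ceph release; `<pool>` is a placeholder for an existing pool name:

```shell
# Disable the autoscaler as the default for newly created pools
ceph config set global osd_pool_default_pg_autoscale_mode off

# Disable it on an existing pool (replace <pool> with the pool name)
ceph osd pool set <pool> pg_autoscale_mode off

# Verify the per-pool autoscale mode
ceph osd pool autoscale-status
```

Note that the global default only affects pools created afterwards, which is why the per-pool command is still needed for existing pools.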

The commands:
ceph osd reweight-by-utilization
ceph osd test-reweight-by-utilization
date from before the upmap balancer was introduced and, in our experience, do not solve the problem in the long run on an active cluster.
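For comparison, a sketch of enabling the upmap balancer instead of the old reweight commands; this assumes all clients are Luminous or newer:

```shell
# Upmap requires clients that understand pg-upmap entries
ceph osd set-require-min-compat-client luminous

# Switch the balancer module to upmap mode and turn it on
ceph balancer mode upmap
ceph balancer on

# Inspect what the balancer is currently doing
ceph balancer status
```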

I have seen the balancer skip a pool during rebalancing more than once.

Why pg_autoscaler behaves this way would have to be analyzed in more detail. As mentioned above, we normally turn it off.
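As a starting point for such an analysis, the autoscaler's current view of the pools can be inspected; a sketch, using the pool name from the report below:

```shell
# Per-pool sizes, target PG counts, and autoscale mode as the autoscaler sees them
ceph osd pool autoscale-status

# Current pg_num / pgp_num of the affected pool, to track the reduction
ceph osd pool get default.rgw.buckets.data pg_num
ceph osd pool get default.rgw.buckets.data pgp_num
```

Comparing `pg_num` and `pgp_num` over time shows whether the merge is actually progressing or repeatedly restarting.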

Eugen's idea would help once the pool rebalancing is done.

Joachim

___________________________________
ceph ambassador DACH
ceph consultant since 2012

Clyso GmbH - Premier Ceph Foundation Member

https://www.clyso.com/

On 22.09.23 at 11:22, b...@sanger.ac.uk wrote:
Hi Folks,

We are currently running with one nearfull OSD and 15 nearfull pools. The most
full OSD is about 86% full, while the average is 58%. The balancer is skipping
a pool (default.rgw.buckets.data) on which the autoscaler is trying to complete
a pg_num reduction from 131,072 to 32,768. The autoscaler has been working on
this for the last 20 days: it works through a list of misplaced objects, but
when it gets close to the end, more objects are added to the list.

This morning I observed the list get down to c. 7,000 objects misplaced with 2 
PGs active+remapped+backfilling, one PG completed the backfilling then the list 
shot up to c. 70,000 objects misplaced with 3 PGs active+remapped+backfilling.

Has anyone come across this behaviour before? If so, what was your remediation?

Thanks in advance for sharing.
Bruno
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
