Hi Nicola,

it's not noise. Even though the modules seem disabled and the pool flags are set to
false, they still linger in the background and interfere. See the recent
thread
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/WST6K5A4UQGGISBFGJEZS4HFL2VVWW32/

With all the settings you already have, the last one left to change would be

ceph config set mgr target_max_misplaced_ratio 1

With that, the balancer and autoscaler modules will just do what you tell them,
assuming you know what you are doing. This is how I restored the default
behaviour of applying changes instantly, and I don't have any problems with it.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Josh Baergen <jbaer...@digitalocean.com>
Sent: 07 October 2022 17:16:49
To: Nicola Mori
Cc: ceph-users
Subject: [ceph-users] Re: Infinite backfill loop + number of pgp groups stuck at wrong value

As of Nautilus+, when you set pg_num, it actually internally sets
pg(p)_num_target, and then slowly increases (or decreases, if you're
merging) pg_num and then pgp_num until it reaches the target. The
amount of backfill scheduled into the system is controlled by
target_max_misplaced_ratio.
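
To make the stepping concrete, here is a toy Python sketch of the behaviour
described above. It is not the manager's actual code: the function name
pgp_num_steps, the step-size rule (the share of pg_num that still fits into the
misplaced budget, but at least one PG per round) and the 3% misplaced level at
which each round starts are all assumptions chosen for illustration. The pool
numbers (pg_num 128, pgp_num 123) and the 5% limit are the ones from Nicola's
report.

```python
def pgp_num_steps(current, target, pg_num, max_misplaced_ratio, misplaced_ratio):
    """Toy model: list the pgp_num values visited on the way to target.

    Assumption (not taken from the Ceph source): every round begins when
    the cluster reports `misplaced_ratio` misplaced objects, and the round
    may move at most the fraction of PGs that still fits under
    `max_misplaced_ratio`, but never fewer than one PG.
    """
    room = max_misplaced_ratio - misplaced_ratio
    if room <= 0:
        raise ValueError("no misplaced budget left; a real cluster would wait")
    step = max(1, int(pg_num * room))
    visited = []
    while current != target:
        if current < target:
            current = min(target, current + step)  # step up toward the target
        else:
            current = max(target, current - step)  # step down toward the target
        visited.append(current)
    return visited


# 5% budget (the limit visible in the balancer output), with each round
# starting at a hypothetical 3% of objects still misplaced.
print(pgp_num_steps(123, 128, 128, 0.05, 0.03))  # [125, 127, 128]
# Budget of 100%: the target is reached in a single step.
print(pgp_num_steps(123, 128, 128, 1.0, 0.03))   # [128]
```

With a small budget the target is approached over several rounds, each one
adding a fresh batch of misplaced objects, which is consistent with the
repeated jumps in the misplaced percentage reported below.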

Josh

On Fri, Oct 7, 2022 at 3:50 AM Nicola Mori <m...@fi.infn.it> wrote:
>
> The situation resolved itself, since there was probably no error in the
> first place. I manually increased the number of PGs and PGPs to 128 some
> days ago, and the PGP count was being updated step by step. In fact, after
> a bump from 5% to 7% in the fraction of misplaced objects I noticed that
> the number of PGPs had been updated to 126, and after one last bump it is
> now at 128, with ~4% of objects misplaced and decreasing.
> Sorry for the noise,
>
> Nicola
>
> On 07/10/22 09:15, Nicola Mori wrote:
> > Dear Ceph users,
> >
> > my cluster has been stuck for several days with some PGs backfilling. The
> > number of misplaced objects slowly decreases down to 5%, and at that
> > point jumps up again to about 7%, and so on. I found several possible
> > reasons for this behavior. One is related to the balancer, which however
> > I think is not operating:
> >
> > # ceph balancer status
> > {
> >      "active": false,
> >      "last_optimize_duration": "0:00:00.000938",
> >      "last_optimize_started": "Thu Oct  6 16:19:59 2022",
> >      "mode": "upmap",
> >      "optimize_result": "Too many objects (0.071539 > 0.050000) are
> > misplaced; try again later",
> >      "plans": []
> > }
> >
> > (the last optimize result is from yesterday, when I disabled it, and
> > since then the backfill loop has happened several times).
> > Another possible reason seems to be a mismatch between the PG and PGP
> > numbers. Indeed, I found such a mismatch on one of my pools:
> >
> > # ceph osd pool get wizard_data pg_num
> > pg_num: 128
> > # ceph osd pool get wizard_data pgp_num
> > pgp_num: 123
> >
> > but I cannot fix it:
> > # ceph osd pool set wizard_data pgp_num 128
> > set pool 3 pgp_num to 128
> > # ceph osd pool get wizard_data pgp_num
> > pgp_num: 123
> >
> > The autoscaler is off for that pool:
> >
> > POOL         SIZE   TARGET SIZE  RATE                RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
> > wizard_data  8951G               1.3333333730697632  152.8T        0.0763                                 1.0   128                 off        False
> >
> > so I don't understand why the PGP number is stuck at 123.
> > Thanks in advance for any help,
> >
> > Nicola
>
> --
> Nicola Mori, Ph.D.
> INFN sezione di Firenze
> Via Bruno Rossi 1, 50019 Sesto F.no (Italy)
> +390554572660
> m...@fi.infn.it
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io