Hi Eugen,

thanks, that was a great hint! I have a strong déjà vu feeling; didn't we discuss 
this before when increasing pg_num? I just set target_max_misplaced_ratio to 1 and 
it did exactly what I wanted. It's the same number of PGs backfilling, but with 
pgp_num=1024, so while the rebalancing load is the same, I got rid of any 
redundant data movement and I can actually see the progress of the merge just 
with ceph status.
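
For the record, in case someone finds this thread later, this is roughly what I 
ran (syntax from memory; I set the option at the mgr level, the exact config 
target may differ between releases):

    ceph config set mgr target_max_misplaced_ratio 1
    ceph osd pool set con-fs2-meta2 pg_num 512   # was already issued, now no longer throttled
    ceph status                                  # shows the merge progressing in one step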

Related to that, I have set mon_max_pg_per_osd=300 and do have OSDs with more 
than 400 PGs. Still, I don't see the promised health warning in ceph status. Is 
this a known issue?
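
For reference, this is how I set and check it; the warning I would expect is 
TOO_MANY_PGS (treat this as a sketch, names and thresholds may differ between 
releases):

    ceph config set global mon_max_pg_per_osd 300
    ceph osd df          # the PGS column shows the per-OSD placement group count
    ceph health detail   # where I would expect a TOO_MANY_PGS warning to appear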

Opinion part.

Returning to the above setting, I have to say that the assignment of which 
parameter influences what seems a bit unintuitive, if not inconsistent. The 
parameter target_max_misplaced_ratio belongs to the balancer module, but 
merging PGs is clearly a task of the pg_autoscaler module. I'm not balancing, 
I'm scaling PG numbers. Such cross-dependencies make it really hard to find 
the relevant information in the section of the documentation where one would 
be looking for it. The information ends up scattered all over the place.

If it's not possible to have such things separated and specific tasks 
consistently explained in a single section, there could at least be a hint in 
the description of target_max_misplaced_ratio that also covers the case of PG 
merging/splitting, so that a search for these terms brings up that page. There 
should also be a cross-reference from "ceph osd pool set pg[p]_num" to 
target_max_misplaced_ratio. Well, it's now here in this message for Google to 
reveal.

I have to add that, while I understand the motivation behind adding these 
babysitting modules, I would actually appreciate it if one could disable them. I 
personally find them really annoying, especially in emergency situations, but 
also in normal operations. I would consider them a nice-to-have and not enforce 
them on people who want to be in charge.

For example, in my current situation, I'm halving the PG count of a pool. Doing 
the merge in one go or letting target_max_misplaced_ratio "help" me leads to 
exactly the same number of PGs backfilling at any time. This means that both 
cases, target_max_misplaced_ratio=0.05 and =1, lead to exactly the same 
interference of rebalancing IO with user IO. The difference is that with 
target_max_misplaced_ratio=0.05 this phase of reduced performance will take 
longer, because every time the module decides to change pgp_num it will 
inevitably also rebalance objects that have already been moved before. I find 
it difficult to consider this an improvement. I prefer to avoid any redundant 
writes at all costs for the benefit of disk lifetime. If I really need to 
reduce the impact of recovery IO, I can set recovery_sleep.
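
For completeness, that would be something along these lines (the values are 
just placeholders, not a recommendation; there are separate variants for HDD 
and SSD):

    ceph config set osd osd_recovery_sleep_hdd 0.1
    ceph config set osd osd_recovery_sleep_ssd 0.01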

That was my personal opinion, shared here with the user group.

Thanks for your help and have a nice evening!

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <ebl...@nde.ag>
Sent: 11 October 2022 14:13:45
To: ceph-users@ceph.io
Subject: [ceph-users] Re: How to force PG merging in one step?

Hi Frank,

I don't think it's the autoscaler interfering here but the default 5%
target_max_misplaced_ratio. I haven't tested the impact of increasing
that to a much higher value, so be careful.

Regards,
Eugen


Zitat von Frank Schilder <fr...@dtu.dk>:

> Hi all,
>
> I need to reduce the number of PGs in a pool from 2048 to 512 and
> would really like to do that in a single step. I executed the set
> pg_num 512 command, but the PGs are not all merged. Instead I get
> this intermediate state:
>
> pool 13 'con-fs2-meta2' replicated size 4 min_size 2 crush_rule 3
> object_hash rjenkins pg_num 2048 pgp_num 1946 pg_num_target 512
> pgp_num_target 512 autoscale_mode off last_change 916710 lfor
> 0/0/618995 flags hashpspool,nodelete,selfmanaged_snaps max_bytes
> 107374182400 stripe_width 0 compression_mode none application cephfs
>
> This is really annoying, because it will not only lead to repeated
> redundant data movements, but I also need to rebalance this pool
> onto fewer OSDs, which cannot hold the 1946 PGs it would intermittently
> be merged to. How can I override the autoscaler interfering with
> admin operations in such tight corners?
>
> As you can see, we disabled the autoscaler on all pools and also
> globally. Still, it interferes with admin commands in an unsolicited
> way. I would like the PG merge to happen on the fly as the data moves
> to the new OSDs.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14



_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
