[ceph-users] Re: Stop Rebalancing

2022-04-13 Thread Neha Ojha
For the moment, Dan's workaround sounds good to me, but I'd like to
understand how we got here, in terms of the decisions that were made
by the autoscaler.
We have a config option called "target_max_misplaced_ratio" (default
value is 0.05), which is supposed to limit the number of misplaced
objects in the cluster to 5% of the total. Ray, in your case, does
that seem to have worked, given that you have ~1.3 billion misplaced
objects?
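
For reference, one way to check (and, if desired, tighten) that throttle is
sketched below; this assumes the option is read from the mgr section on your
release, so verify before changing anything:

  # current misplaced-object throttle used by the balancer / pg_autoscaler
  ceph config get mgr target_max_misplaced_ratio
  # illustration only: lower it to 3% to slow how fast PGs are remapped
  ceph config set mgr target_max_misplaced_ratio 0.03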

In any case, let's use https://tracker.ceph.com/issues/55303 to
capture some more debug data that can help us understand the actions
of the autoscaler. To start with, it would be helpful if you could
attach the cluster and audit logs, output of ceph -s, ceph df along
with the output of ceph osd pool autoscale-status and ceph osd pool ls
detail. Junior (Kamoltat), is there anything else that will be useful
to capture to get to the bottom of this?
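
A minimal collection pass for that tracker could look like the following
sketch (the log paths are the usual defaults on a monitor host and will
differ for containerized deployments):

  ceph -s                        > ceph_status.txt
  ceph df                        > ceph_df.txt
  ceph osd pool autoscale-status > autoscale_status.txt
  ceph osd pool ls detail        > pool_ls_detail.txt
  # cluster and audit logs, default locations on a mon host
  cp /var/log/ceph/ceph.log /var/log/ceph/ceph.audit.log .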

Just for future reference, 16.2.8 and Quincy will include a
"noautoscale" cluster-wide flag, which can be used to disable
autoscaling across all pools during maintenance periods.
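
The expected usage is roughly as follows (hedging here, since the flag only
lands in 16.2.8/Quincy; check the release notes for the final syntax):

  ceph osd pool set noautoscale     # pause autoscaling for all pools
  ceph osd pool get noautoscale     # confirm the flag is set
  ceph osd pool unset noautoscale   # re-enable after maintenance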

Thanks,
Neha


On Wed, Apr 13, 2022 at 1:58 PM Ray Cunningham
 wrote:
>
> We've done that, I'll update with what happens overnight. Thanks everyone!
>
>
> Thank you,
>
> Ray
>
> 
> From: Anthony D'Atri 
> Sent: Wednesday, April 13, 2022 4:49 PM
> To: Ceph Users 
> Subject: [ceph-users] Re: Stop Rebalancing
>
>
>
> > In any case, isn't this still the best approach to make all PGs go
> > active+clean ASAP in this scenario?
> >
> > 1. turn off the autoscaler (for those pools, or fully)
> > 2. for any pool with pg_num_target or pgp_num_target values, get the
> > current pgp_num X and use it to `ceph osd pool set <pool> pg_num X`.
> >
> > Can someone confirm that or recommend something different?
>
> FWIW that’s what I would do.
>



[ceph-users] Re: Stop Rebalancing

2022-04-13 Thread Ray Cunningham
We've done that, I'll update with what happens overnight. Thanks everyone!


Thank you,

Ray


From: Anthony D'Atri 
Sent: Wednesday, April 13, 2022 4:49 PM
To: Ceph Users 
Subject: [ceph-users] Re: Stop Rebalancing



> In any case, isn't this still the best approach to make all PGs go
> active+clean ASAP in this scenario?
>
> 1. turn off the autoscaler (for those pools, or fully)
> 2. for any pool with pg_num_target or pgp_num_target values, get the
> current pgp_num X and use it to `ceph osd pool set <pool> pg_num X`.
>
> Can someone confirm that or recommend something different?

FWIW that’s what I would do.



[ceph-users] Re: Stop Rebalancing

2022-04-13 Thread Anthony D'Atri


> In any case, isn't this still the best approach to make all PGs go
> active+clean ASAP in this scenario?
> 
> 1. turn off the autoscaler (for those pools, or fully)
> 2. for any pool with pg_num_target or pgp_num_target values, get the
> current pgp_num X and use it to `ceph osd pool set <pool> pg_num X`.
> 
> Can someone confirm that or recommend something different?

FWIW that’s what I would do.



[ceph-users] Re: Stop Rebalancing

2022-04-13 Thread Dan van der Ster
On Wed, Apr 13, 2022 at 7:07 PM Gregory Farnum  wrote:
>
> On Wed, Apr 13, 2022 at 10:01 AM Dan van der Ster  wrote:
> >
> > I would set the pg_num, not pgp_num. In older versions of ceph you could
> > manipulate these things separately, but in pacific I'm not confident about
> > what setting pgp_num directly will do in this exact scenario.
> >
> > To understand, the difference between these two depends on if you're
> > splitting or merging.
> > First, definitions: pg_num is the number of PGs and pgp_num is the number
> > used for placing objects.
> >
> > So if pgp_num < pg_num, then at steady state only pgp_num pgs actually
> > store data, and the other pg_num-pgp_num PGs are sitting empty.
>
> Wait, what? That's not right! pgp_num is pg *placement* number; it
> controls how we map PGs to OSDs. But the full pg still exists as its
> own thing on the OSD and has its own data structures and objects. If
> currently the cluster has reduced pgp_num it has changed the locations
> of PGs, but it hasn't merged any PGs together. Changing the pg_num and
> causing merges will invoke a whole new workload which can be pretty
> substantial.

Eek, yes, I got this wrong. Somehow I imagined some orthogonal
implementation based on how it appears to work in practice.

In any case, isn't this still the best approach to make all PGs go
active+clean ASAP in this scenario?

1. turn off the autoscaler (for those pools, or fully)
2. for any pool with pg_num_target or pgp_num_target values, get the
current pgp_num X and use it to `ceph osd pool set <pool> pg_num X`.

Can someone confirm that or recommend something different?
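
For illustration, those two steps could be scripted roughly as below. The
JSON field names (pool_name, pg_placement_num, pg_num_target,
pg_placement_num_target) are assumed from recent releases; verify them on
your version before trusting the loop:

  ceph osd pool ls detail -f json |
    jq -r '.[] | select(.pg_num != .pg_num_target or
                        .pg_placement_num != .pg_placement_num_target) |
           "\(.pool_name) \(.pg_placement_num)"' |
  while read pool pgp; do
    ceph osd pool set "$pool" pg_autoscale_mode off   # step 1: stop the autoscaler
    ceph osd pool set "$pool" pg_num "$pgp"           # step 2: freeze pg_num at the current pgp_num
  done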

Cheers, Dan



> -Greg
>
> >
> > To merge PGs, Ceph decreases pgp_num to squeeze the objects into fewer pgs,
> > then decreases pg_num as the PGs are emptied to actually delete the now
> > empty PGs.
> >
> > Splitting is similar but in reverse: first, Ceph creates new empty PGs by
> > increasing pg_num. Then it gradually increases pgp_num to start sending
> > data to the new PGs.
> >
> > That's the general idea, anyway.
> >
> > Long story short, set pg_num to something close to the current
> > pgp_num_target.
> >
> > .. Dan
> >
> >
> > On Wed., Apr. 13, 2022, 18:43 Ray Cunningham, 
> > 
> > wrote:
> >
> > > Thank you so much, Dan!
> > >
> > > Can you confirm for me that for pool7, which has 2048/2048 for pg_num and
> > > 883/2048 for pgp_num, we should change pg_num or pgp_num? And can they be
> > > different for a single pool, or does pg_num and pgp_num have to always be
> > > the same?
> > >
> > > IF we just set pgp_num to 890 we will have pg_num at 2048 and pgp_num at
> > > 890, is that ok? Because if we reduce the pg_num by 1200 it will just 
> > > start
> > > a whole new load of misplaced object rebalancing. Won't it?
> > >
> > > Thank you,
> > > Ray
> > >
> > >
> > > -Original Message-
> > > From: Dan van der Ster 
> > > Sent: Wednesday, April 13, 2022 11:11 AM
> > > To: Ray Cunningham 
> > > Cc: ceph-users@ceph.io
> > > Subject: Re: [ceph-users] Stop Rebalancing
> > >
> > > Hi, Thanks.
> > >
> > > norebalance/nobackfill are useful to pause ongoing backfilling, but aren't
> > > the best option now to get the PGs to go active+clean and let the mon db
> > > come back under control. Unset those before continuing.
> > >
> > > I think you need to set the pg_num for pool1 to something close to but
> > > less than 926. (Or whatever the pg_num_target is when you run the command
> > > below).
> > > The idea is to let a few more merges complete successfully but then once
> > > all PGs are active+clean to take a decision about the other interventions
> > > you want to carry out.
> > > So this ought to be good:
> > > ceph osd pool set pool1 pg_num 920
> > >
> > > Then for pool7 this looks like splitting is ongoing. You should be able to
> > > pause that by setting the pg_num to something just above 883.
> > > I would do:
> > > ceph osd pool set pool7 pg_num 890
> > >
> > > It may even be fastest to just set those pg_num values to exactly what the
> > > current pgp_num_target is. You can try it.
> > >
> > > Once your cluster is stable again, then you should set those to the
> > > nearest power of two.
> > > Personally I would wait for #53729 to be fixed before embarking on future
> > > pg_num changes.
> > > (You'll have to mute a warning in the meantime -- check the docs after the
> > > warning appears).
> > >
> > > Cheers, dan
> > >
> > > On Wed, Apr 13, 2022 at 5:16 PM Ray Cunningham <
> > > ray.cunning...@keepertech.com> wrote:
> > > >
> > > > Perfect timing, I was just about to reply. We have disabled autoscaler
> > > on all pools now.
> > > >
> > > > Unfortunately, I can't just copy and paste from this system...
> > > >
> > > > `ceph osd pool ls detail` only 2 pools have any difference.
> > > pool1:  pgnum 940, pgnum target 256, pgpnum 926, pgpnum target 256
> > > pool7:  pgnum 2048, pgnum target 2048, pgpnum 883, pgpnum target 2048
> > > >
> > > > ` ceph osd pool autoscale-status`
> > > > Size is 

[ceph-users] Re: Stop Rebalancing

2022-04-13 Thread Ray Cunningham
Ok, so in our situation with high pg_num and a low pgp_num, is there any way we 
can make it stop backfilling temporarily? The system is already operating with 
different pg and pgp numbers, so I'm thinking it won't kill the cluster if we 
just set the pgp_num and make it stop splitting for the moment.. 

Thank you,
Ray 

-Original Message-
From: Gregory Farnum  
Sent: Wednesday, April 13, 2022 12:07 PM
To: Dan van der Ster 
Cc: Ray Cunningham ; Ceph Users 

Subject: Re: [ceph-users] Re: Stop Rebalancing

On Wed, Apr 13, 2022 at 10:01 AM Dan van der Ster  wrote:
>
> I would set the pg_num, not pgp_num. In older versions of ceph you 
> could manipulate these things separately, but in pacific I'm not 
> confident about what setting pgp_num directly will do in this exact scenario.
>
> To understand, the difference between these two depends on if you're 
> splitting or merging.
> First, definitions: pg_num is the number of PGs and pgp_num is the 
> number used for placing objects.
>
> So if pgp_num < pg_num, then at steady state only pgp_num pgs actually 
> store data, and the other pg_num-pgp_num PGs are sitting empty.

Wait, what? That's not right! pgp_num is pg *placement* number; it controls how 
we map PGs to OSDs. But the full pg still exists as its own thing on the OSD 
and has its own data structures and objects. If currently the cluster has 
reduced pgp_num it has changed the locations of PGs, but it hasn't merged any 
PGs together. Changing the pg_num and causing merges will invoke a whole new 
workload which can be pretty substantial.
-Greg

>
> To merge PGs, Ceph decreases pgp_num to squeeze the objects into fewer 
> pgs, then decreases pg_num as the PGs are emptied to actually delete 
> the now empty PGs.
>
> Splitting is similar but in reverse: first, Ceph creates new empty PGs 
> by increasing pg_num. Then it gradually increases pgp_num to start 
> sending data to the new PGs.
>
> That's the general idea, anyway.
>
> Long story short, set pg_num to something close to the current 
> pgp_num_target.
>
> .. Dan
>
>
> On Wed., Apr. 13, 2022, 18:43 Ray Cunningham, 
> 
> wrote:
>
> > Thank you so much, Dan!
> >
> > Can you confirm for me that for pool7, which has 2048/2048 for 
> > pg_num and
> > 883/2048 for pgp_num, we should change pg_num or pgp_num? And can 
> > they be different for a single pool, or does pg_num and pgp_num have 
> > to always be the same?
> >
> > IF we just set pgp_num to 890 we will have pg_num at 2048 and 
> > pgp_num at 890, is that ok? Because if we reduce the pg_num by 1200 
> > it will just start a whole new load of misplaced object rebalancing. Won't 
> > it?
> >
> > Thank you,
> > Ray
> >
> >
> > -Original Message-
> > From: Dan van der Ster 
> > Sent: Wednesday, April 13, 2022 11:11 AM
> > To: Ray Cunningham 
> > Cc: ceph-users@ceph.io
> > Subject: Re: [ceph-users] Stop Rebalancing
> >
> > Hi, Thanks.
> >
> > norebalance/nobackfill are useful to pause ongoing backfilling, but 
> > aren't the best option now to get the PGs to go active+clean and let 
> > the mon db come back under control. Unset those before continuing.
> >
> > I think you need to set the pg_num for pool1 to something close to 
> > but less than 926. (Or whatever the pg_num_target is when you run 
> > the command below).
> > The idea is to let a few more merges complete successfully but then 
> > once all PGs are active+clean to take a decision about the other 
> > interventions you want to carry out.
> > So this ought to be good:
> > ceph osd pool set pool1 pg_num 920
> >
> > Then for pool7 this looks like splitting is ongoing. You should be 
> > able to pause that by setting the pg_num to something just above 883.
> > I would do:
> > ceph osd pool set pool7 pg_num 890
> >
> > It may even be fastest to just set those pg_num values to exactly 
> > what the current pgp_num_target is. You can try it.
> >
> > Once your cluster is stable again, then you should set those to the 
> > nearest power of two.
> > Personally I would wait for #53729 to be fixed before embarking on 
> > future pg_num changes.
> > (You'll have to mute a warning in the meantime -- check the docs 
> > after the warning appears).
> >
> > Cheers, dan
> >
> > On Wed, Apr 13, 2022 at 5:16 PM Ray Cunningham < 
> > ray.cunning...@keepertech.com> wrote:
> > >
> > > Perfect timing, I was just about to reply. We have disabled 
> > > autoscaler
> > on all pools now.
> > >
> > > Unfortunately, I can't just copy and paste from

[ceph-users] Re: Stop Rebalancing

2022-04-13 Thread Gregory Farnum
On Wed, Apr 13, 2022 at 10:01 AM Dan van der Ster  wrote:
>
> I would set the pg_num, not pgp_num. In older versions of ceph you could
> manipulate these things separately, but in pacific I'm not confident about
> what setting pgp_num directly will do in this exact scenario.
>
> To understand, the difference between these two depends on if you're
> splitting or merging.
> First, definitions: pg_num is the number of PGs and pgp_num is the number
> used for placing objects.
>
> So if pgp_num < pg_num, then at steady state only pgp_num pgs actually
> store data, and the other pg_num-pgp_num PGs are sitting empty.

Wait, what? That's not right! pgp_num is pg *placement* number; it
controls how we map PGs to OSDs. But the full pg still exists as its
own thing on the OSD and has its own data structures and objects. If
currently the cluster has reduced pgp_num it has changed the locations
of PGs, but it hasn't merged any PGs together. Changing the pg_num and
causing merges will invoke a whole new workload which can be pretty
substantial.
-Greg

>
> To merge PGs, Ceph decreases pgp_num to squeeze the objects into fewer pgs,
> then decreases pg_num as the PGs are emptied to actually delete the now
> empty PGs.
>
> Splitting is similar but in reverse: first, Ceph creates new empty PGs by
> increasing pg_num. Then it gradually increases pgp_num to start sending
> data to the new PGs.
>
> That's the general idea, anyway.
>
> Long story short, set pg_num to something close to the current
> pgp_num_target.
>
> .. Dan
>
>
> On Wed., Apr. 13, 2022, 18:43 Ray Cunningham, 
> wrote:
>
> > Thank you so much, Dan!
> >
> > Can you confirm for me that for pool7, which has 2048/2048 for pg_num and
> > 883/2048 for pgp_num, we should change pg_num or pgp_num? And can they be
> > different for a single pool, or does pg_num and pgp_num have to always be
> > the same?
> >
> > IF we just set pgp_num to 890 we will have pg_num at 2048 and pgp_num at
> > 890, is that ok? Because if we reduce the pg_num by 1200 it will just start
> > a whole new load of misplaced object rebalancing. Won't it?
> >
> > Thank you,
> > Ray
> >
> >
> > -Original Message-
> > From: Dan van der Ster 
> > Sent: Wednesday, April 13, 2022 11:11 AM
> > To: Ray Cunningham 
> > Cc: ceph-users@ceph.io
> > Subject: Re: [ceph-users] Stop Rebalancing
> >
> > Hi, Thanks.
> >
> > norebalance/nobackfill are useful to pause ongoing backfilling, but aren't
> > the best option now to get the PGs to go active+clean and let the mon db
> > come back under control. Unset those before continuing.
> >
> > I think you need to set the pg_num for pool1 to something close to but
> > less than 926. (Or whatever the pg_num_target is when you run the command
> > below).
> > The idea is to let a few more merges complete successfully but then once
> > all PGs are active+clean to take a decision about the other interventions
> > you want to carry out.
> > So this ought to be good:
> > ceph osd pool set pool1 pg_num 920
> >
> > Then for pool7 this looks like splitting is ongoing. You should be able to
> > pause that by setting the pg_num to something just above 883.
> > I would do:
> > ceph osd pool set pool7 pg_num 890
> >
> > It may even be fastest to just set those pg_num values to exactly what the
> > current pgp_num_target is. You can try it.
> >
> > Once your cluster is stable again, then you should set those to the
> > nearest power of two.
> > Personally I would wait for #53729 to be fixed before embarking on future
> > pg_num changes.
> > (You'll have to mute a warning in the meantime -- check the docs after the
> > warning appears).
> >
> > Cheers, dan
> >
> > On Wed, Apr 13, 2022 at 5:16 PM Ray Cunningham <
> > ray.cunning...@keepertech.com> wrote:
> > >
> > > Perfect timing, I was just about to reply. We have disabled autoscaler
> > on all pools now.
> > >
> > > Unfortunately, I can't just copy and paste from this system...
> > >
> > > `ceph osd pool ls detail` only 2 pools have any difference.
> > > pool1:  pgnum 940, pgnum target 256, pgpnum 926, pgpnum target 256
> > > pool7:  pgnum 2048, pgnum target 2048, pgpnum 883, pgpnum target 2048
> > >
> > > ` ceph osd pool autoscale-status`
> > > Size is defined
> > > target size is empty
> > > Rate is 7 for all pools except pool7, which is 1.333730697632
> > > Raw capacity is defined
> > > Ratio for pool1 is .0177, pool7 is .4200 and all others is 0
> > > Target and Effective Ratio is empty
> > > Bias is 1.0 for all
> > > PG_NUM: pool1 is 256, pool7 is 2048 and all others are 32.
> > > New PG_NUM is empty
> > > Autoscale is now off for all
> > > Profile is scale-up
> > >
> > >
> > > We have set norebalance and nobackfill and are watching to see what
> > happens.
> > >
> > > Thank you,
> > > Ray
> > >
> > > -Original Message-
> > > From: Dan van der Ster 
> > > Sent: Wednesday, April 13, 2022 10:00 AM
> > > To: Ray Cunningham 
> > > Cc: ceph-users@ceph.io
> > > Subject: Re: [ceph-users] Stop Rebalancing
> > >
> > > One 

[ceph-users] Re: Stop Rebalancing

2022-04-13 Thread Dan van der Ster
I would set the pg_num, not pgp_num. In older versions of ceph you could
manipulate these things separately, but in pacific I'm not confident about
what setting pgp_num directly will do in this exact scenario.

To understand, the difference between these two depends on if you're
splitting or merging.
First, definitions: pg_num is the number of PGs and pgp_num is the number
used for placing objects.

So if pgp_num < pg_num, then at steady state only pgp_num pgs actually
store data, and the other pg_num-pgp_num PGs are sitting empty.

To merge PGs, Ceph decreases pgp_num to squeeze the objects into fewer pgs,
then decreases pg_num as the PGs are emptied to actually delete the now
empty PGs.

Splitting is similar but in reverse: first, Ceph creates new empty PGs by
increasing pg_num. Then it gradually increases pgp_num to start sending
data to the new PGs.

That's the general idea, anyway.

Long story short, set pg_num to something close to the current
pgp_num_target.
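
As a small illustrative check, plain `ceph osd pool get` is enough to watch
the two values while a split or merge is in flight:

  ceph osd pool get <pool> pg_num
  ceph osd pool get <pool> pgp_num

`ceph osd pool ls detail` additionally shows pg_num_target and pgp_num_target,
i.e. the values the mgr is still working toward.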

.. Dan


On Wed., Apr. 13, 2022, 18:43 Ray Cunningham, 
wrote:

> Thank you so much, Dan!
>
> Can you confirm for me that for pool7, which has 2048/2048 for pg_num and
> 883/2048 for pgp_num, we should change pg_num or pgp_num? And can they be
> different for a single pool, or does pg_num and pgp_num have to always be
> the same?
>
> IF we just set pgp_num to 890 we will have pg_num at 2048 and pgp_num at
> 890, is that ok? Because if we reduce the pg_num by 1200 it will just start
> a whole new load of misplaced object rebalancing. Won't it?
>
> Thank you,
> Ray
>
>
> -Original Message-
> From: Dan van der Ster 
> Sent: Wednesday, April 13, 2022 11:11 AM
> To: Ray Cunningham 
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Stop Rebalancing
>
> Hi, Thanks.
>
> norebalance/nobackfill are useful to pause ongoing backfilling, but aren't
> the best option now to get the PGs to go active+clean and let the mon db
> come back under control. Unset those before continuing.
>
> I think you need to set the pg_num for pool1 to something close to but
> less than 926. (Or whatever the pg_num_target is when you run the command
> below).
> The idea is to let a few more merges complete successfully but then once
> all PGs are active+clean to take a decision about the other interventions
> you want to carry out.
> So this ought to be good:
> ceph osd pool set pool1 pg_num 920
>
> Then for pool7 this looks like splitting is ongoing. You should be able to
> pause that by setting the pg_num to something just above 883.
> I would do:
> ceph osd pool set pool7 pg_num 890
>
> It may even be fastest to just set those pg_num values to exactly what the
> current pgp_num_target is. You can try it.
>
> Once your cluster is stable again, then you should set those to the
> nearest power of two.
> Personally I would wait for #53729 to be fixed before embarking on future
> pg_num changes.
> (You'll have to mute a warning in the meantime -- check the docs after the
> warning appears).
>
> Cheers, dan
>
> On Wed, Apr 13, 2022 at 5:16 PM Ray Cunningham <
> ray.cunning...@keepertech.com> wrote:
> >
> > Perfect timing, I was just about to reply. We have disabled autoscaler
> on all pools now.
> >
> > Unfortunately, I can't just copy and paste from this system...
> >
> > `ceph osd pool ls detail` only 2 pools have any difference.
> > pool1:  pgnum 940, pgnum target 256, pgpnum 926, pgpnum target 256
> > pool7:  pgnum 2048, pgnum target 2048, pgpnum 883, pgpnum target 2048
> >
> > ` ceph osd pool autoscale-status`
> > Size is defined
> > target size is empty
> > Rate is 7 for all pools except pool7, which is 1.333730697632
> > Raw capacity is defined
> > Ratio for pool1 is .0177, pool7 is .4200 and all others is 0
> > Target and Effective Ratio is empty
> > Bias is 1.0 for all
> > PG_NUM: pool1 is 256, pool7 is 2048 and all others are 32.
> > New PG_NUM is empty
> > Autoscale is now off for all
> > Profile is scale-up
> >
> >
> > We have set norebalance and nobackfill and are watching to see what
> happens.
> >
> > Thank you,
> > Ray
> >
> > -Original Message-
> > From: Dan van der Ster 
> > Sent: Wednesday, April 13, 2022 10:00 AM
> > To: Ray Cunningham 
> > Cc: ceph-users@ceph.io
> > Subject: Re: [ceph-users] Stop Rebalancing
> >
> > One more thing, could you please also share the `ceph osd pool
> autoscale-status` ?
> >
> >
> > On Tue, Apr 12, 2022 at 9:50 PM Ray Cunningham <
> ray.cunning...@keepertech.com> wrote:
> > >
> > > Thank you Dan! I will definitely disable autoscaler on the rest of our
> pools. I can't get the PG numbers today, but I will try to get them
> tomorrow. We definitely want to get this under control.
> > >
> > > Thank you,
> > > Ray
> > >
> > >
> > > -Original Message-
> > > From: Dan van der Ster 
> > > Sent: Tuesday, April 12, 2022 2:46 PM
> > > To: Ray Cunningham 
> > > Cc: ceph-users@ceph.io
> > > Subject: Re: [ceph-users] Stop Rebalancing
> > >
> > > Hi Ray,
> > >
> > > Disabling the autoscaler on all pools is probably a 

[ceph-users] Re: Stop Rebalancing

2022-04-13 Thread Ray Cunningham
Thank you so much, Dan! 

Can you confirm for me that for pool7, which has 2048/2048 for pg_num and 
883/2048 for pgp_num, we should change pg_num or pgp_num? And can they be 
different for a single pool, or does pg_num and pgp_num have to always be the 
same? 

IF we just set pgp_num to 890 we will have pg_num at 2048 and pgp_num at 890, 
is that ok? Because if we reduce the pg_num by 1200 it will just start a whole 
new load of misplaced object rebalancing. Won't it? 

Thank you,
Ray 
 

-Original Message-
From: Dan van der Ster  
Sent: Wednesday, April 13, 2022 11:11 AM
To: Ray Cunningham 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Stop Rebalancing

Hi, Thanks.

norebalance/nobackfill are useful to pause ongoing backfilling, but aren't the 
best option now to get the PGs to go active+clean and let the mon db come back 
under control. Unset those before continuing.

I think you need to set the pg_num for pool1 to something close to but less 
than 926. (Or whatever the pg_num_target is when you run the command below).
The idea is to let a few more merges complete successfully but then once all 
PGs are active+clean to take a decision about the other interventions you want 
to carry out.
So this ought to be good:
ceph osd pool set pool1 pg_num 920

Then for pool7 this looks like splitting is ongoing. You should be able to 
pause that by setting the pg_num to something just above 883.
I would do:
ceph osd pool set pool7 pg_num 890

It may even be fastest to just set those pg_num values to exactly what the 
current pgp_num_target is. You can try it.

Once your cluster is stable again, then you should set those to the nearest 
power of two.
Personally I would wait for #53729 to be fixed before embarking on future 
pg_num changes.
(You'll have to mute a warning in the meantime -- check the docs after the 
warning appears).

Cheers, dan

On Wed, Apr 13, 2022 at 5:16 PM Ray Cunningham  
wrote:
>
> Perfect timing, I was just about to reply. We have disabled autoscaler on all 
> pools now.
>
> Unfortunately, I can't just copy and paste from this system...
>
> `ceph osd pool ls detail` only 2 pools have any difference.
> pool1:  pgnum 940, pgnum target 256, pgpnum 926, pgpnum target 256
> pool7:  pgnum 2048, pgnum target 2048, pgpnum 883, pgpnum target 2048
>
> ` ceph osd pool autoscale-status`
> Size is defined
> target size is empty
> Rate is 7 for all pools except pool7, which is 1.333730697632
> Raw capacity is defined
> Ratio for pool1 is .0177, pool7 is .4200 and all others is 0
> Target and Effective Ratio is empty
> Bias is 1.0 for all
> PG_NUM: pool1 is 256, pool7 is 2048 and all others are 32.
> New PG_NUM is empty
> Autoscale is now off for all
> Profile is scale-up
>
>
> We have set norebalance and nobackfill and are watching to see what happens.
>
> Thank you,
> Ray
>
> -Original Message-
> From: Dan van der Ster 
> Sent: Wednesday, April 13, 2022 10:00 AM
> To: Ray Cunningham 
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Stop Rebalancing
>
> One more thing, could you please also share the `ceph osd pool 
> autoscale-status` ?
>
>
> On Tue, Apr 12, 2022 at 9:50 PM Ray Cunningham 
>  wrote:
> >
> > Thank you Dan! I will definitely disable autoscaler on the rest of our 
> > pools. I can't get the PG numbers today, but I will try to get them 
> > tomorrow. We definitely want to get this under control.
> >
> > Thank you,
> > Ray
> >
> >
> > -Original Message-
> > From: Dan van der Ster 
> > Sent: Tuesday, April 12, 2022 2:46 PM
> > To: Ray Cunningham 
> > Cc: ceph-users@ceph.io
> > Subject: Re: [ceph-users] Stop Rebalancing
> >
> > Hi Ray,
> >
> > Disabling the autoscaler on all pools is probably a good idea. At least 
> > until https://tracker.ceph.com/issues/53729 is fixed. (You are likely not 
> > susceptible to that -- but better safe than sorry).
> >
> > To pause the ongoing PG merges, you can indeed set the pg_num to the 
> > current value. This will allow the ongoing merge complete and prevent 
> > further merges from starting.
> > From `ceph osd pool ls detail` you'll see pg_num, pgp_num, pg_num_target, 
> > pgp_num_target... If you share the current values of those we can help 
> > advise what you need to set the pg_num to to effectively pause things where 
> > they are.
> >
> > BTW -- I'm going to create a request in the tracker that we improve the pg 
> > autoscaler heuristic. IMHO the autoscaler should estimate the time to carry 
> > out a split/merge operation and avoid taking one-way decisions without 
> > permission from the administrator. The autoscaler is meant to be helpful, 
> > not degrade a cluster for 100 days!
> >
> > Cheers, Dan
> >
> >
> >
> > On Tue, Apr 12, 2022 at 9:04 PM Ray Cunningham 
> >  wrote:
> > >
> > > Hi Everyone,
> > >
> > > We just upgraded our 640 OSD cluster to Ceph 16.2.7 and the resulting 
> > > rebalancing of misplaced objects is overwhelming the cluster and 
> > > impacting MON DB compaction, deep scrub repairs and 

[ceph-users] Re: Stop Rebalancing

2022-04-13 Thread Dan van der Ster
Hi, Thanks.

norebalance/nobackfill are useful to pause ongoing backfilling, but
aren't the best option now to get the PGs to go active+clean and let
the mon db come back under control. Unset those before continuing.
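
For reference, clearing those flags is simply:

  ceph osd unset norebalance
  ceph osd unset nobackfill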

I think you need to set the pg_num for pool1 to something close to but
less than 926. (Or whatever the pg_num_target is when you run the
command below).
The idea is to let a few more merges complete successfully but then
once all PGs are active+clean to take a decision about the other
interventions you want to carry out.
So this ought to be good:
ceph osd pool set pool1 pg_num 920

Then for pool7 this looks like splitting is ongoing. You should be
able to pause that by setting the pg_num to something just above 883.
I would do:
ceph osd pool set pool7 pg_num 890

It may even be fastest to just set those pg_num values to exactly what
the current pgp_num_target is. You can try it.

Once your cluster is stable again, then you should set those to the
nearest power of two.
Personally I would wait for #53729 to be fixed before embarking on
future pg_num changes.
(You'll have to mute a warning in the meantime -- check the docs after
the warning appears).
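
If the warning in question is the non-power-of-two one, the mute would look
roughly like this; the health code here is an assumption, so copy the exact
code from `ceph health detail` once it actually fires:

  ceph health mute POOL_PG_NUM_NOT_POWER_OF_TWO 4w
  ceph health unmute POOL_PG_NUM_NOT_POWER_OF_TWO   # after pg_num is tidied up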

Cheers, dan

On Wed, Apr 13, 2022 at 5:16 PM Ray Cunningham
 wrote:
>
> Perfect timing, I was just about to reply. We have disabled autoscaler on all 
> pools now.
>
> Unfortunately, I can't just copy and paste from this system...
>
> `ceph osd pool ls detail` only 2 pools have any difference.
> pool1:  pgnum 940, pgnum target 256, pgpnum 926, pgpnum target 256
> pool7:  pgnum 2048, pgnum target 2048, pgpnum 883, pgpnum target 2048
>
> ` ceph osd pool autoscale-status`
> Size is defined
> target size is empty
> Rate is 7 for all pools except pool7, which is 1.333730697632
> Raw capacity is defined
> Ratio for pool1 is .0177, pool7 is .4200 and all others is 0
> Target and Effective Ratio is empty
> Bias is 1.0 for all
> PG_NUM: pool1 is 256, pool7 is 2048 and all others are 32.
> New PG_NUM is empty
> Autoscale is now off for all
> Profile is scale-up
>
>
> We have set norebalance and nobackfill and are watching to see what happens.
>
> Thank you,
> Ray
>
> -Original Message-
> From: Dan van der Ster 
> Sent: Wednesday, April 13, 2022 10:00 AM
> To: Ray Cunningham 
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Stop Rebalancing
>
> One more thing, could you please also share the `ceph osd pool 
> autoscale-status` ?
>
>
> On Tue, Apr 12, 2022 at 9:50 PM Ray Cunningham 
>  wrote:
> >
> > Thank you Dan! I will definitely disable autoscaler on the rest of our 
> > pools. I can't get the PG numbers today, but I will try to get them 
> > tomorrow. We definitely want to get this under control.
> >
> > Thank you,
> > Ray
> >
> >
> > -Original Message-
> > From: Dan van der Ster 
> > Sent: Tuesday, April 12, 2022 2:46 PM
> > To: Ray Cunningham 
> > Cc: ceph-users@ceph.io
> > Subject: Re: [ceph-users] Stop Rebalancing
> >
> > Hi Ray,
> >
> > Disabling the autoscaler on all pools is probably a good idea. At least 
> > until https://tracker.ceph.com/issues/53729 is fixed. (You are likely not 
> > susceptible to that -- but better safe than sorry).
> >
> > To pause the ongoing PG merges, you can indeed set the pg_num to the 
> > current value. This will allow the ongoing merge complete and prevent 
> > further merges from starting.
> > From `ceph osd pool ls detail` you'll see pg_num, pgp_num, pg_num_target, 
> > pgp_num_target... If you share the current values of those we can help 
> > advise what you need to set the pg_num to to effectively pause things where 
> > they are.
> >
> > BTW -- I'm going to create a request in the tracker that we improve the pg 
> > autoscaler heuristic. IMHO the autoscaler should estimate the time to carry 
> > out a split/merge operation and avoid taking one-way decisions without 
> > permission from the administrator. The autoscaler is meant to be helpful, 
> > not degrade a cluster for 100 days!
> >
> > Cheers, Dan
> >
> >
> >
> > On Tue, Apr 12, 2022 at 9:04 PM Ray Cunningham 
> >  wrote:
> > >
> > > Hi Everyone,
> > >
> > > We just upgraded our 640 OSD cluster to Ceph 16.2.7 and the resulting 
> > > rebalancing of misplaced objects is overwhelming the cluster and 
> > > impacting MON DB compaction, deep scrub repairs and us upgrading legacy 
> > > bluestore OSDs. We have to pause the rebalancing of misplaced objects or
> > > we're going to fall over.
> > >
> > > Autoscaler-status tells us that we are reducing our PGs by 700'ish which 
> > > will take us over 100 days to complete at our current recovery speed. We 
> > > disabled autoscaler on our biggest pool, but I'm concerned that it's 
> > > already on the path to the lower PG count and won't stop adding to our 
> > > misplaced count after we drop below 5%. What can we do to stop the cluster
> > > from finding more misplaced objects to rebalance? Should we set the PG 
> > > num manually to what our current count is? Or will that cause even 

[ceph-users] Re: Stop Rebalancing

2022-04-13 Thread Ray Cunningham
No repair IO and misplaced objects increasing with norebalance and nobackfill 
set.


Thank you,

Ray


From: Ray Cunningham 
Sent: Wednesday, April 13, 2022 10:38:29 AM
To: Dan van der Ster 
Cc: ceph-users@ceph.io 
Subject: Re: [ceph-users] Stop Rebalancing

All pools have gone backfillfull.


Thank you,

Ray Cunningham



Systems Engineering and Services Manager

keepertechnology

(571) 223-7242


From: Ray Cunningham
Sent: Wednesday, April 13, 2022 10:15:56 AM
To: Dan van der Ster 
Cc: ceph-users@ceph.io 
Subject: RE: [ceph-users] Stop Rebalancing

Perfect timing, I was just about to reply. We have disabled autoscaler on all 
pools now.

Unfortunately, I can't just copy and paste from this system...

`ceph osd pool ls detail` only 2 pools have any difference.
pool1:  pgnum 940, pgnum target 256, pgpnum 926, pgpnum target 256
pool7:  pgnum 2048, pgnum target 2048, pgpnum 883, pgpnum target 2048

` ceph osd pool autoscale-status`
Size is defined
target size is empty
Rate is 7 for all pools except pool7, which is 1.333730697632
Raw capacity is defined
Ratio for pool1 is .0177, pool7 is .4200 and all others is 0
Target and Effective Ratio is empty
Bias is 1.0 for all
PG_NUM: pool1 is 256, pool7 is 2048 and all others are 32.
New PG_NUM is empty
Autoscale is now off for all
Profile is scale-up


We have set norebalance and nobackfill and are watching to see what happens.

Thank you,
Ray

-Original Message-
From: Dan van der Ster 
Sent: Wednesday, April 13, 2022 10:00 AM
To: Ray Cunningham 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Stop Rebalancing

One more thing, could you please also share the `ceph osd pool 
autoscale-status` ?


On Tue, Apr 12, 2022 at 9:50 PM Ray Cunningham  
wrote:
>
> Thank you Dan! I will definitely disable autoscaler on the rest of our pools. 
> I can't get the PG numbers today, but I will try to get them tomorrow. We 
> definitely want to get this under control.
>
> Thank you,
> Ray
>
>
> -Original Message-
> From: Dan van der Ster 
> Sent: Tuesday, April 12, 2022 2:46 PM
> To: Ray Cunningham 
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Stop Rebalancing
>
> Hi Ray,
>
> Disabling the autoscaler on all pools is probably a good idea. At least until 
> https://tracker.ceph.com/issues/53729 is fixed. (You are likely not 
> susceptible to that -- but better safe than sorry).
>
> To pause the ongoing PG merges, you can indeed set the pg_num to the current 
> value. This will allow the ongoing merge complete and prevent further merges 
> from starting.
> From `ceph osd pool ls detail` you'll see pg_num, pgp_num, pg_num_target, 
> pgp_num_target... If you share the current values of those we can help advise 
> what you need to set the pg_num to to effectively pause things where they are.
>
> BTW -- I'm going to create a request in the tracker that we improve the pg 
> autoscaler heuristic. IMHO the autoscaler should estimate the time to carry 
> out a split/merge operation and avoid taking one-way decisions without 
> permission from the administrator. The autoscaler is meant to be helpful, not 
> degrade a cluster for 100 days!
>
> Cheers, Dan
>
>
>
> On Tue, Apr 12, 2022 at 9:04 PM Ray Cunningham 
>  wrote:
> >
> > Hi Everyone,
> >
> > We just upgraded our 640 OSD cluster to Ceph 16.2.7 and the resulting 
> > rebalancing of misplaced objects is overwhelming the cluster and impacting 
> > MON DB compaction, deep scrub repairs and us upgrading legacy bluestore 
> > OSDs. We have to pause the rebalancing of misplaced objects or we're going
> > to fall over.
> >
> > Autoscaler-status tells us that we are reducing our PGs by 700'ish which 
> > will take us over 100 days to complete at our current recovery speed. We 
> > disabled autoscaler on our biggest pool, but I'm concerned that it's 
> > already on the path to the lower PG count and won't stop adding to our 
> > misplaced count after we drop below 5%. What can we do to stop the cluster
> > from finding more misplaced objects to rebalance? Should we set the PG num 
> > manually to what our current count is? Or will that cause even more havoc?
> >
> > Any other thoughts or ideas? My goals are to stop the rebalancing 
> > temporarily so we can deep scrub and repair inconsistencies, upgrade legacy 
> > bluestore OSDs and compact our MON DBs (supposedly MON DBs don't compact 
> > when you aren't 100% active+clean).
> >
> > Thank you,
> > Ray
> >


[ceph-users] Re: Stop Rebalancing

2022-04-13 Thread Ray Cunningham
All pools have gone backfillfull.


Thank you,

Ray Cunningham



Systems Engineering and Services Manager

keepertechnology

(571) 223-7242


From: Ray Cunningham
Sent: Wednesday, April 13, 2022 10:15:56 AM
To: Dan van der Ster 
Cc: ceph-users@ceph.io 
Subject: RE: [ceph-users] Stop Rebalancing

Perfect timing, I was just about to reply. We have disabled autoscaler on all 
pools now.

Unfortunately, I can't just copy and paste from this system...

`ceph osd pool ls detail` only 2 pools have any difference.
pool1:  pgnum 940, pgnum target 256, pgpnum 926, pgpnum target 256
pool7:  pgnum 2048, pgnum target 2048, pgpnum 883, pgpnum target 2048

` ceph osd pool autoscale-status`
Size is defined
target size is empty
Rate is 7 for all pools except pool7, which is 1.333730697632
Raw capacity is defined
Ratio for pool1 is .0177, pool7 is .4200 and all others is 0
Target and Effective Ratio is empty
Bias is 1.0 for all
PG_NUM: pool1 is 256, pool7 is 2048 and all others are 32.
New PG_NUM is empty
Autoscale is now off for all
Profile is scale-up


We have set norebalance and nobackfill and are watching to see what happens.

Thank you,
Ray

-Original Message-
From: Dan van der Ster 
Sent: Wednesday, April 13, 2022 10:00 AM
To: Ray Cunningham 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Stop Rebalancing

One more thing, could you please also share the `ceph osd pool 
autoscale-status` ?


On Tue, Apr 12, 2022 at 9:50 PM Ray Cunningham  
wrote:
>
> Thank you Dan! I will definitely disable autoscaler on the rest of our pools. 
> I can't get the PG numbers today, but I will try to get them tomorrow. We 
> definitely want to get this under control.
>
> Thank you,
> Ray
>
>
> -Original Message-
> From: Dan van der Ster 
> Sent: Tuesday, April 12, 2022 2:46 PM
> To: Ray Cunningham 
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Stop Rebalancing
>
> Hi Ray,
>
> Disabling the autoscaler on all pools is probably a good idea. At least until 
> https://tracker.ceph.com/issues/53729 is fixed. (You are likely not 
> susceptible to that -- but better safe than sorry).
>
> To pause the ongoing PG merges, you can indeed set the pg_num to the current 
> value. This will allow the ongoing merge complete and prevent further merges 
> from starting.
> From `ceph osd pool ls detail` you'll see pg_num, pgp_num, pg_num_target, 
> pgp_num_target... If you share the current values of those we can help advise 
> what you need to set the pg_num to to effectively pause things where they are.
>
> BTW -- I'm going to create a request in the tracker that we improve the pg 
> autoscaler heuristic. IMHO the autoscaler should estimate the time to carry 
> out a split/merge operation and avoid taking one-way decisions without 
> permission from the administrator. The autoscaler is meant to be helpful, not 
> degrade a cluster for 100 days!
>
> Cheers, Dan
>
>
>
> On Tue, Apr 12, 2022 at 9:04 PM Ray Cunningham 
>  wrote:
> >
> > Hi Everyone,
> >
> > We just upgraded our 640 OSD cluster to Ceph 16.2.7 and the resulting 
> > rebalancing of misplaced objects is overwhelming the cluster and impacting 
> > MON DB compaction, deep scrub repairs and us upgrading legacy bluestore 
> > OSDs. We have to pause the rebalancing of misplaced objects or we're going
> > to fall over.
> >
> > Autoscaler-status tells us that we are reducing our PGs by 700'ish which 
> > will take us over 100 days to complete at our current recovery speed. We 
> > disabled autoscaler on our biggest pool, but I'm concerned that it's 
> > already on the path to the lower PG count and won't stop adding to our 
> > misplaced count after we drop below 5%. What can we do to stop the cluster
> > from finding more misplaced objects to rebalance? Should we set the PG num 
> > manually to what our current count is? Or will that cause even more havoc?
> >
> > Any other thoughts or ideas? My goals are to stop the rebalancing 
> > temporarily so we can deep scrub and repair inconsistencies, upgrade legacy 
> > bluestore OSDs and compact our MON DBs (supposedly MON DBs don't compact 
> > when you aren't 100% active+clean).
> >
> > Thank you,
> > Ray
> >


[ceph-users] Re: Stop Rebalancing

2022-04-13 Thread Ray Cunningham
Perfect timing, I was just about to reply. We have disabled autoscaler on all 
pools now. 

Unfortunately, I can't just copy and paste from this system... 

`ceph osd pool ls detail` only 2 pools have any difference. 
pool1:  pgnum 940, pgnum target 256, pgpnum 926, pgpnum target 256
pool7:  pgnum 2048, pgnum target 2048, pgpnum 883, pgpnum target 2048

` ceph osd pool autoscale-status`
Size is defined
target size is empty
Rate is 7 for all pools except pool7, which is 1.333730697632
Raw capacity is defined
Ratio for pool1 is .0177, pool7 is .4200 and all others is 0
Target and Effective Ratio is empty
Bias is 1.0 for all
PG_NUM: pool1 is 256, pool7 is 2048 and all others are 32. 
New PG_NUM is empty
Autoscale is now off for all
Profile is scale-up


We have set norebalance and nobackfill and are watching to see what happens. 

Thank you,
Ray 

-Original Message-
From: Dan van der Ster  
Sent: Wednesday, April 13, 2022 10:00 AM
To: Ray Cunningham 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Stop Rebalancing

One more thing, could you please also share the `ceph osd pool 
autoscale-status` ?


On Tue, Apr 12, 2022 at 9:50 PM Ray Cunningham  
wrote:
>
> Thank you Dan! I will definitely disable autoscaler on the rest of our pools. 
> I can't get the PG numbers today, but I will try to get them tomorrow. We 
> definitely want to get this under control.
>
> Thank you,
> Ray
>
>
> -Original Message-
> From: Dan van der Ster 
> Sent: Tuesday, April 12, 2022 2:46 PM
> To: Ray Cunningham 
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Stop Rebalancing
>
> Hi Ray,
>
> Disabling the autoscaler on all pools is probably a good idea. At least until 
> https://tracker.ceph.com/issues/53729 is fixed. (You are likely not 
> susceptible to that -- but better safe than sorry).
>
> To pause the ongoing PG merges, you can indeed set the pg_num to the current 
> value. This will allow the ongoing merge complete and prevent further merges 
> from starting.
> From `ceph osd pool ls detail` you'll see pg_num, pgp_num, pg_num_target, 
> pgp_num_target... If you share the current values of those we can help advise 
> what you need to set the pg_num to to effectively pause things where they are.
>
> BTW -- I'm going to create a request in the tracker that we improve the pg 
> autoscaler heuristic. IMHO the autoscaler should estimate the time to carry 
> out a split/merge operation and avoid taking one-way decisions without 
> permission from the administrator. The autoscaler is meant to be helpful, not 
> degrade a cluster for 100 days!
>
> Cheers, Dan
>
>
>
> On Tue, Apr 12, 2022 at 9:04 PM Ray Cunningham 
>  wrote:
> >
> > Hi Everyone,
> >
> > We just upgraded our 640 OSD cluster to Ceph 16.2.7 and the resulting 
> > rebalancing of misplaced objects is overwhelming the cluster and impacting 
> > MON DB compaction, deep scrub repairs and us upgrading legacy bluestore 
> > OSDs. We have to pause the rebalancing of misplaced objects or we're going
> > to fall over.
> >
> > Autoscaler-status tells us that we are reducing our PGs by 700'ish which 
> > will take us over 100 days to complete at our current recovery speed. We 
> > disabled autoscaler on our biggest pool, but I'm concerned that it's 
> > already on the path to the lower PG count and won't stop adding to our 
> > misplaced count after we drop below 5%. What can we do to stop the cluster
> > from finding more misplaced objects to rebalance? Should we set the PG num 
> > manually to what our current count is? Or will that cause even more havoc?
> >
> > Any other thoughts or ideas? My goals are to stop the rebalancing 
> > temporarily so we can deep scrub and repair inconsistencies, upgrade legacy 
> > bluestore OSDs and compact our MON DBs (supposedly MON DBs don't compact 
> > when you aren't 100% active+clean).
> >
> > Thank you,
> > Ray
> >


[ceph-users] Re: Stop Rebalancing

2022-04-13 Thread Dan van der Ster
One more thing, could you please also share the `ceph osd pool
autoscale-status` ?


On Tue, Apr 12, 2022 at 9:50 PM Ray Cunningham
 wrote:
>
> Thank you Dan! I will definitely disable autoscaler on the rest of our pools. 
> I can't get the PG numbers today, but I will try to get them tomorrow. We 
> definitely want to get this under control.
>
> Thank you,
> Ray
>
>
> -Original Message-
> From: Dan van der Ster 
> Sent: Tuesday, April 12, 2022 2:46 PM
> To: Ray Cunningham 
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Stop Rebalancing
>
> Hi Ray,
>
> Disabling the autoscaler on all pools is probably a good idea. At least until 
> https://tracker.ceph.com/issues/53729 is fixed. (You are likely not 
> susceptible to that -- but better safe than sorry).
>
> To pause the ongoing PG merges, you can indeed set the pg_num to the current 
> value. This will allow the ongoing merge complete and prevent further merges 
> from starting.
> From `ceph osd pool ls detail` you'll see pg_num, pgp_num, pg_num_target, 
> pgp_num_target... If you share the current values of those we can help advise 
> what you need to set the pg_num to to effectively pause things where they are.
>
> BTW -- I'm going to create a request in the tracker that we improve the pg 
> autoscaler heuristic. IMHO the autoscaler should estimate the time to carry 
> out a split/merge operation and avoid taking one-way decisions without 
> permission from the administrator. The autoscaler is meant to be helpful, not 
> degrade a cluster for 100 days!
>
> Cheers, Dan
>
>
>
> On Tue, Apr 12, 2022 at 9:04 PM Ray Cunningham 
>  wrote:
> >
> > Hi Everyone,
> >
> > We just upgraded our 640 OSD cluster to Ceph 16.2.7 and the resulting 
> > rebalancing of misplaced objects is overwhelming the cluster and impacting 
> > MON DB compaction, deep scrub repairs and us upgrading legacy bluestore 
> > OSDs. We have to pause the rebalancing of misplaced objects or we're going
> > to fall over.
> >
> > Autoscaler-status tells us that we are reducing our PGs by 700'ish which 
> > will take us over 100 days to complete at our current recovery speed. We 
> > disabled autoscaler on our biggest pool, but I'm concerned that it's 
> > already on the path to the lower PG count and won't stop adding to our 
> > misplaced count after we drop below 5%. What can we do to stop the cluster
> > from finding more misplaced objects to rebalance? Should we set the PG num 
> > manually to what our current count is? Or will that cause even more havoc?
> >
> > Any other thoughts or ideas? My goals are to stop the rebalancing 
> > temporarily so we can deep scrub and repair inconsistencies, upgrade legacy 
> > bluestore OSDs and compact our MON DBs (supposedly MON DBs don't compact 
> > when you aren't 100% active+clean).
> >
> > Thank you,
> > Ray
> >


[ceph-users] Re: Stop Rebalancing

2022-04-12 Thread Dan van der Ster
OK -- here's the tracker for what I mentioned:
https://tracker.ceph.com/issues/55303

On Tue, Apr 12, 2022 at 9:50 PM Ray Cunningham
 wrote:
>
> Thank you Dan! I will definitely disable autoscaler on the rest of our pools. 
> I can't get the PG numbers today, but I will try to get them tomorrow. We 
> definitely want to get this under control.
>
> Thank you,
> Ray
>
>
> -Original Message-
> From: Dan van der Ster 
> Sent: Tuesday, April 12, 2022 2:46 PM
> To: Ray Cunningham 
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Stop Rebalancing
>
> Hi Ray,
>
> Disabling the autoscaler on all pools is probably a good idea. At least until 
> https://tracker.ceph.com/issues/53729 is fixed. (You are likely not 
> susceptible to that -- but better safe than sorry).
>
> To pause the ongoing PG merges, you can indeed set the pg_num to the current 
> value. This will allow the ongoing merge complete and prevent further merges 
> from starting.
> From `ceph osd pool ls detail` you'll see pg_num, pgp_num, pg_num_target, 
> pgp_num_target... If you share the current values of those we can help advise 
> what you need to set the pg_num to to effectively pause things where they are.
>
> BTW -- I'm going to create a request in the tracker that we improve the pg 
> autoscaler heuristic. IMHO the autoscaler should estimate the time to carry 
> out a split/merge operation and avoid taking one-way decisions without 
> permission from the administrator. The autoscaler is meant to be helpful, not 
> degrade a cluster for 100 days!
>
> Cheers, Dan
>
>
>
> On Tue, Apr 12, 2022 at 9:04 PM Ray Cunningham 
>  wrote:
> >
> > Hi Everyone,
> >
> > We just upgraded our 640 OSD cluster to Ceph 16.2.7 and the resulting 
> > rebalancing of misplaced objects is overwhelming the cluster and impacting 
> > MON DB compaction, deep scrub repairs and us upgrading legacy bluestore 
> > OSDs. We have to pause the rebalancing of misplaced objects or we're going
> > to fall over.
> >
> > Autoscaler-status tells us that we are reducing our PGs by 700'ish which 
> > will take us over 100 days to complete at our current recovery speed. We 
> > disabled autoscaler on our biggest pool, but I'm concerned that it's 
> > already on the path to the lower PG count and won't stop adding to our 
> > misplaced count after we drop below 5%. What can we do to stop the cluster
> > from finding more misplaced objects to rebalance? Should we set the PG num 
> > manually to what our current count is? Or will that cause even more havoc?
> >
> > Any other thoughts or ideas? My goals are to stop the rebalancing 
> > temporarily so we can deep scrub and repair inconsistencies, upgrade legacy 
> > bluestore OSDs and compact our MON DBs (supposedly MON DBs don't compact 
> > when you aren't 100% active+clean).
> >
> > Thank you,
> > Ray
> >


[ceph-users] Re: Stop Rebalancing

2022-04-12 Thread Ray Cunningham
Thank you Dan! I will definitely disable autoscaler on the rest of our pools. I 
can't get the PG numbers today, but I will try to get them tomorrow. We 
definitely want to get this under control. 

Thank you,
Ray 
 

-Original Message-
From: Dan van der Ster  
Sent: Tuesday, April 12, 2022 2:46 PM
To: Ray Cunningham 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Stop Rebalancing

Hi Ray,

Disabling the autoscaler on all pools is probably a good idea. At least until 
https://tracker.ceph.com/issues/53729 is fixed. (You are likely not susceptible 
to that -- but better safe than sorry).

To pause the ongoing PG merges, you can indeed set the pg_num to the current 
value. This will allow the ongoing merge complete and prevent further merges 
from starting.
From `ceph osd pool ls detail` you'll see pg_num, pgp_num, pg_num_target, 
pgp_num_target... If you share the current values of those we can help advise 
what you need to set the pg_num to to effectively pause things where they are.

BTW -- I'm going to create a request in the tracker that we improve the pg 
autoscaler heuristic. IMHO the autoscaler should estimate the time to carry out 
a split/merge operation and avoid taking one-way decisions without permission 
from the administrator. The autoscaler is meant to be helpful, not degrade a 
cluster for 100 days!

Cheers, Dan



On Tue, Apr 12, 2022 at 9:04 PM Ray Cunningham  
wrote:
>
> Hi Everyone,
>
> We just upgraded our 640 OSD cluster to Ceph 16.2.7 and the resulting 
> rebalancing of misplaced objects is overwhelming the cluster and impacting 
> MON DB compaction, deep scrub repairs and us upgrading legacy bluestore OSDs. 
> We have to pause the rebalancing of misplaced objects or we're going to fall
> over.
>
> Autoscaler-status tells us that we are reducing our PGs by 700'ish which will 
> take us over 100 days to complete at our current recovery speed. We disabled 
> autoscaler on our biggest pool, but I'm concerned that it's already on the 
> path to the lower PG count and won't stop adding to our misplaced count after 
> we drop below 5%. What can we do to stop the cluster from finding more misplaced
> objects to rebalance? Should we set the PG num manually to what our current 
> count is? Or will that cause even more havoc?
>
> Any other thoughts or ideas? My goals are to stop the rebalancing temporarily 
> so we can deep scrub and repair inconsistencies, upgrade legacy bluestore 
> OSDs and compact our MON DBs (supposedly MON DBs don't compact when you 
> aren't 100% active+clean).
>
> Thank you,
> Ray
>


[ceph-users] Re: Stop Rebalancing

2022-04-12 Thread Ray Cunningham
Thanks Matt! I didn't know about nobackfill and norebalance! That could be a 
good stop gap, as long as there's no issue having it set for weeks. We estimate 
our legacy bluestore cleanup to take about 3-4 weeks. 

You are correct, I don't want to cancel it; we just need to catch up on other 
maintenance items. First and foremost is our MON db growing like crazy. If this 
doesn't make them compact, I'll send out another email. But I'm pretty sure 
docs say that it won't compact while PGs are not active+clean. 

When the PG drop happens we do see PG states that include the term premerge, so
you are probably right about this being caused by a PG merge.

Thank you,
Ray 
 

-Original Message-
From: Matt Vandermeulen  
Sent: Tuesday, April 12, 2022 2:39 PM
To: Ray Cunningham 
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Stop Rebalancing

It sounds like this is from a PG merge, so I'm going to _guess_ that you don't 
want to straight up cancel the current backfill and instead pause it to catch 
your breath.

You can set `nobackfill` and/or `norebalance` which should pause the backfill.  
Alternatively, use `ceph config set osd.* osd_max_backfills 0` to stop all OSDs 
from allowing backfill to continue.  You could use this to throttle it on an 
OSD cadence, though that's a bit messy.  
Consider the recovery sleep options for that, too.

However, if you want to fully cancel the rebalance, you might want to set the 
PG count back to where you were (if that's what you want), and unless you had a 
bunch of upmaps already, your cluster should be mostly balanced, minus the data 
that has already PG-merged.

I don't think you can do something like use `pgremapper cancel-backfill --yes` 
(see Github) for this because of the PG merge (though maybe you can, I haven't 
tried it), which will add upmaps for ongoing remapped PGs to stop them from 
happening.

Others can chime in with other options; I'm always interested in new ways to 
rein in lots of backfill.


On 2022-04-12 16:03, Ray Cunningham wrote:
> Hi Everyone,
> 
> We just upgraded our 640 OSD cluster to Ceph 16.2.7 and the resulting
> rebalancing of misplaced objects is overwhelming the cluster and
> impacting MON DB compaction, deep scrub repairs and us upgrading
> legacy bluestore OSDs. We have to pause the rebalancing of misplaced
> objects or we're going to fall over.
> 
> Autoscaler-status tells us that we are reducing our PGs by 700-ish,
> which will take us over 100 days to complete at our current recovery
> speed. We disabled the autoscaler on our biggest pool, but I'm concerned
> that it's already on the path to the lower PG count and won't stop
> adding to our misplaced count after we drop below 5%. What can we do to
> stop the cluster from finding more misplaced objects to rebalance?
> Should we set the PG num manually to what our current count is? Or
> will that cause even more havoc?
> 
> Any other thoughts or ideas? My goals are to stop the rebalancing
> temporarily so we can deep scrub and repair inconsistencies, upgrade
> legacy bluestore OSDs and compact our MON DBs (supposedly MON DBs
> don't compact when you aren't 100% active+clean).
> 
> Thank you,
> Ray
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stop Rebalancing

2022-04-12 Thread Dan van der Ster
Hi Ray,

Disabling the autoscaler on all pools is probably a good idea. At
least until https://tracker.ceph.com/issues/53729 is fixed. (You are
likely not susceptible to that -- but better safe than sorry).

To pause the ongoing PG merges, you can indeed set the pg_num to the
current value. This will allow the ongoing merge to complete and prevent
further merges from starting.
From `ceph osd pool ls detail` you'll see pg_num, pgp_num,
pg_num_target, pgp_num_target... If you share the current values of
those we can advise what to set pg_num to in order to pause things
where they are.

BTW -- I'm going to create a request in the tracker that we improve
the pg autoscaler heuristic. IMHO the autoscaler should estimate the
time to carry out a split/merge operation and avoid taking one-way
decisions without permission from the administrator. The autoscaler is
meant to be helpful, not degrade a cluster for 100 days!

Cheers, Dan



On Tue, Apr 12, 2022 at 9:04 PM Ray Cunningham
 wrote:
>
> Hi Everyone,
>
> We just upgraded our 640 OSD cluster to Ceph 16.2.7 and the resulting 
> rebalancing of misplaced objects is overwhelming the cluster and impacting 
> MON DB compaction, deep scrub repairs and us upgrading legacy bluestore OSDs. 
> We have to pause the rebalancing of misplaced objects or we're going to fall 
> over.
>
> Autoscaler-status tells us that we are reducing our PGs by 700-ish, which will 
> take us over 100 days to complete at our current recovery speed. We disabled 
> the autoscaler on our biggest pool, but I'm concerned that it's already on the 
> path to the lower PG count and won't stop adding to our misplaced count after 
> we drop below 5%. What can we do to stop the cluster from finding more misplaced 
> objects to rebalance? Should we set the PG num manually to what our current 
> count is? Or will that cause even more havoc?
>
> Any other thoughts or ideas? My goals are to stop the rebalancing temporarily 
> so we can deep scrub and repair inconsistencies, upgrade legacy bluestore 
> OSDs and compact our MON DBs (supposedly MON DBs don't compact when you 
> aren't 100% active+clean).
>
> Thank you,
> Ray
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Stop Rebalancing

2022-04-12 Thread Matt Vandermeulen
It sounds like this is from a PG merge, so I'm going to _guess_ that you 
don't want to straight up cancel the current backfill and instead pause 
it to catch your breath.


You can set `nobackfill` and/or `norebalance` which should pause the 
backfill.  Alternatively, use `ceph config set osd.* osd_max_backfills 
0` to stop all OSDs from allowing backfill to continue.  You could use 
this to throttle it on an OSD cadence, though that's a bit messy.  
Consider the recovery sleep options for that, too.
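
A hedged sketch of that throttling (the values are purely illustrative, not 
recommendations; these settings can be changed at runtime via ceph config):

    # Stop new backfill reservations entirely:
    ceph config set osd osd_max_backfills 0
    # ...or throttle instead of stopping: one backfill per OSD plus a longer
    # sleep between recovery ops on spinning disks:
    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_sleep_hdd 0.5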


However, if you want to fully cancel the rebalance, you might want to 
set the PG count back to where you were (if that's what you want), and 
unless you had a bunch of upmaps already, your cluster should be mostly 
balanced, minus the data that has already PG-merged.


I don't think you can do something like use `pgremapper cancel-backfill 
--yes` (see Github) for this because of the PG merge (though maybe you 
can, I haven't tried it), which will add upmaps for ongoing remapped PGs 
to stop them from happening.
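
For completeness, a rough sketch of how pgremapper tends to be driven (this 
assumes the digitalocean/pgremapper tool from GitHub, and my reading that 
omitting --yes performs a dry run; treat both as assumptions):

    # Dry run: print the upmap commands it would issue for currently
    # remapped PGs.
    pgremapper cancel-backfill
    # Apply them (Matt's caveat about in-flight PG merges applies here):
    pgremapper cancel-backfill --yes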


Others can chime in with other options; I'm always interested in new 
ways to rein in lots of backfill.



On 2022-04-12 16:03, Ray Cunningham wrote:

Hi Everyone,

We just upgraded our 640 OSD cluster to Ceph 16.2.7 and the resulting
rebalancing of misplaced objects is overwhelming the cluster and
impacting MON DB compaction, deep scrub repairs and us upgrading
legacy bluestore OSDs. We have to pause the rebalancing of misplaced
objects or we're going to fall over.

Autoscaler-status tells us that we are reducing our PGs by 700-ish,
which will take us over 100 days to complete at our current recovery
speed. We disabled the autoscaler on our biggest pool, but I'm concerned
that it's already on the path to the lower PG count and won't stop
adding to our misplaced count after we drop below 5%. What can we do to
stop the cluster from finding more misplaced objects to rebalance?
Should we set the PG num manually to what our current count is? Or
will that cause even more havoc?

Any other thoughts or ideas? My goals are to stop the rebalancing
temporarily so we can deep scrub and repair inconsistencies, upgrade
legacy bluestore OSDs and compact our MON DBs (supposedly MON DBs
don't compact when you aren't 100% active+clean).

Thank you,
Ray

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io