[ceph-users] Re: Tool to cancel pending backfills

2021-10-04 Thread Peter Lieven

On 01.10.21 at 16:52, Josh Baergen wrote:

> Hi Peter,
>
>> When I checked for circles I found that running the upmap balancer alone
>> never seems to create any kind of circle in the graph
>
> By a circle, do you mean something like this?
> pg 1.a: 1->2 (upmap to put a chunk on 2 instead of 1)
> pg 1.b: 2->3
> pg 1.c: 3->1



Exactly. As far as I understand, the upmap balancer tries to remove
existing upmap entries first, so I would expect that there will never be a
circle like 1->2, 2->1, but I don't see why it would not accidentally
create a circle with more nodes involved.




> If so, then it's not surprising that the upmap balancer wouldn't
> create this situation by itself, since there's no reason for this set
> of upmaps to exist purely for balance reasons. I don't think the
> balancer needs any explicit code to avoid the situation because of
> this.
>
>> Running pgremapper + balancer created circles with sometimes several dozen
>> nodes. I would update the docs of the pgremapper
>> to warn about this fact and guide the users to use undo-upmap to slowly
>> remove the upmaps created by cancel-backfill.
>
> This is again not surprising, since cancel-backfill will do whatever's
> necessary to undo a set of CRUSH changes (and some CRUSH changes
> regularly lead to movement cycles like this), and then using the upmap
> balancer will only make enough changes to achieve balance, not undo
> everything that's there.
>
>> It might be a nice addition to pgremapper to add an option to optimize the
>> upmap table.
>
> What I'm still missing here is the value in this. Are there
> demonstrable problems presented by a large upmap exception table (e.g.
> performance or operational)?



I have no evidence of how expensive upmap table entries are. They need to
be synced to every client, and there they carry a certain overhead.

Maybe someone with more knowledge of the internals can give some insight
here. The whole idea of CRUSH is to not carry a table of exact mappings
around, and upmap entries are the exact opposite of this idea. Bottom line:
every cyclic upmap definition is unnecessary overhead, and I personally
think it should be avoided.


Best,

Peter



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Tool to cancel pending backfills

2021-10-01 Thread Josh Baergen
Hi Peter,

> When I checked for circles I found that running the upmap balancer alone
> never seems to create any kind of circle in the graph

By a circle, do you mean something like this?
pg 1.a: 1->2 (upmap to put a chunk on 2 instead of 1)
pg 1.b: 2->3
pg 1.c: 3->1

If so, then it's not surprising that the upmap balancer wouldn't
create this situation by itself, since there's no reason for this set
of upmaps to exist purely for balance reasons. I don't think the
balancer needs any explicit code to avoid the situation because of
this.

> Running pgremapper + balancer created circles with sometimes several dozen
> nodes. I would update the docs of the pgremapper
> to warn about this fact and guide the users to use undo-upmap to slowly
> remove the upmaps created by cancel-backfill.

This is again not surprising, since cancel-backfill will do whatever's
necessary to undo a set of CRUSH changes (and some CRUSH changes
regularly lead to movement cycles like this), and then using the upmap
balancer will only make enough changes to achieve balance, not undo
everything that's there.

> It might be a nice addition to pgremapper to add an option to optimize the
> upmap table.

What I'm still missing here is the value in this. Are there
demonstrable problems presented by a large upmap exception table (e.g.
performance or operational)?

Josh


[ceph-users] Re: Tool to cancel pending backfills

2021-10-01 Thread Peter Lieven
On 27.09.21 at 22:38, Josh Baergen wrote:
>> I have a question regarding the last step. It seems to me that the ceph 
>> balancer is not able to remove the upmaps
>> created by pgremapper, but instead creates new upmaps to balance the pgs 
>> among osds.
> The balancer will prefer to remove existing upmaps[1], but it's not
> guaranteed. The upmap has to exist between the source and target OSD
> already decided on by the balancer in order for this to happen. The
> reality, though, is that the upmap balancer will need to create many
> upmap exception table entries to balance any sizable system.
>
> Like you, I would prefer to have as few upmap exception table entries
> as possible in a system (fewer surprises when an OSD fails), but we
> regularly run systems that have thousands of entries without any
> discernible impact and haven't had any major operational issues that
> result from it, except for on really old systems that are just awful
> to work with in the first place.


Hi Josh,


Thanks for the update. With the pgremapper run first and then the upmap
balancer enabled, I ended up with an upmap entry for almost every placement
group in a pool. So I decided to write a tool which analyzes the upmap
entries and puts them in a digraph. When I checked for circles, I found that
running the upmap balancer alone never seems to create any kind of circle in
the graph (I will have to check the code you mentioned [1] to see if there
is any mechanism to avoid that).

Running pgremapper + balancer created circles with sometimes several dozen
nodes. I would update the docs of pgremapper to warn about this fact and
guide users to use undo-upmap to slowly remove the upmaps created by
cancel-backfill.

It might be a nice addition to pgremapper to add an option to optimize the
upmap table. When searching for circles, you might want to limit the depth
of the DFS; otherwise the runtime will be crazy.
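
To make the idea concrete, here is a minimal sketch of such an analysis. It
is not the tool I actually wrote; the function names and the
(pgid, from_osd, to_osd) tuple format are assumptions made for illustration.
Each upmap entry becomes a directed edge from the source OSD to the target
OSD, and a depth-limited DFS reports every circle once, starting from its
lowest OSD id.

```python
from collections import defaultdict


def build_graph(upmap_items):
    """Build a digraph from (pgid, from_osd, to_osd) tuples (assumed format)."""
    graph = defaultdict(list)
    for pgid, src, dst in upmap_items:
        graph[src].append((dst, pgid))
    return graph


def find_cycles(graph, max_depth=6):
    """Depth-limited DFS; returns circles as lists of (pgid, from, to) edges.

    A circle is only explored from its smallest OSD id, so each one is
    reported exactly once. Circles longer than max_depth edges are skipped.
    """
    cycles = []

    def dfs(start, node, path, visited):
        if len(path) >= max_depth:
            return
        for dst, pgid in graph.get(node, ()):
            edge = (pgid, node, dst)
            if dst == start:
                cycles.append(path + [edge])
            elif dst > start and dst not in visited:
                visited.add(dst)
                dfs(start, dst, path + [edge], visited)
                visited.remove(dst)

    for start in sorted(graph):
        dfs(start, start, [], {start})
    return cycles


if __name__ == "__main__":
    upmaps = [("1.a", 1, 2), ("1.b", 2, 3), ("1.c", 3, 1), ("1.d", 4, 5)]
    for cycle in find_cycles(build_graph(upmaps)):
        print(" , ".join(f"pg {pg}: {a}->{b}" for pg, a, b in cycle))
```

The max_depth parameter is the DFS depth limit mentioned above. Without it,
enumerating all simple circles in a dense graph can take exponential time.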


Thanks,

Peter





[ceph-users] Re: Tool to cancel pending backfills

2021-09-27 Thread Josh Baergen
> I have a question regarding the last step. It seems to me that the ceph 
> balancer is not able to remove the upmaps
> created by pgremapper, but instead creates new upmaps to balance the pgs 
> among osds.

The balancer will prefer to remove existing upmaps[1], but it's not
guaranteed. The upmap has to exist between the source and target OSD
already decided on by the balancer in order for this to happen. The
reality, though, is that the upmap balancer will need to create many
upmap exception table entries to balance any sizable system.

Like you, I would prefer to have as few upmap exception table entries
as possible in a system (fewer surprises when an OSD fails), but we
regularly run systems that have thousands of entries without any
discernible impact and haven't had any major operational issues that
result from it, except for on really old systems that are just awful
to work with in the first place.

Josh

[1] I think this is the implementation:
https://github.com/ceph/ceph/blob/bc8c846b36288ff7ac65005087b0dda0e4b857f4/src/osd/OSDMap.cc#L4794-L4832


[ceph-users] Re: Tool to cancel pending backfills

2021-09-27 Thread Peter Lieven
On 26.09.21 at 19:08, Alexandre Marangone wrote:
> Thanks for the feedback Alex! If you have any issue or ideas for
> improvements please do submit them on the GH repo:
> https://github.com/digitalocean/pgremapper/
>
> Last Thursday I did a Ceph at DO tech talk where I talked about how we use
> pgremapper to do augments on HDD clusters. The recording is not
> available yet but the gist is:
>  - set nobackfill/norebalance
>  - create all your osds -> PGs are in backfill state but no data movement
>  - cancel all backfills with pgremapper -> PGs are back to active+clean
>  - unset nobackfill/norebalance -> nothing happens
>  - turn on ceph balancer or use pgremapper undo-upmap to do your
> augment in a controlled way


Thanks, everyone, for the pointer. Great tool!


I have a question regarding the last step. It seems to me that the ceph
balancer is not able to remove the upmaps created by pgremapper, but instead
creates new upmaps to balance the pgs among osds.

Is this expected? The ideal way would be to try to undo existing upmaps to
achieve balance rather than creating more and more upmaps.

This way it might happen that we have an upmap for a pg from OSD A to OSD B
and an upmap for another pg from OSD B to OSD A, whereas it would be enough
to have no upmap at all.
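
A small sketch of why such a pair is redundant from a balance point of view:
counting one PG chunk leaving the source OSD and one arriving at the target
OSD for each upmap, opposing upmaps cancel out. The function name and the
(from_osd, to_osd) tuple format are made up for illustration, and the sketch
assumes every PG chunk counts equally (it ignores PG sizes and pools).

```python
from collections import Counter


def net_pg_change(upmap_items):
    """Net change in PG chunk count per OSD for (from_osd, to_osd) pairs.

    OSDs whose gains and losses cancel out are omitted from the result.
    """
    delta = Counter()
    for src, dst in upmap_items:
        delta[src] -= 1
        delta[dst] += 1
    return {osd: change for osd, change in delta.items() if change != 0}


if __name__ == "__main__":
    # One pg remapped A->B and another remapped B->A: no net effect.
    print(net_pg_change([("A", "B"), ("B", "A")]))
    # A chain A->B->C only changes the counts on A and C.
    print(net_pg_change([("A", "B"), ("B", "C")]))
```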



Thanks,

Peter





[ceph-users] Re: Tool to cancel pending backfills

2021-09-26 Thread Alexandre Marangone
Thanks for the feedback Alex! If you have any issue or ideas for
improvements please do submit them on the GH repo:
https://github.com/digitalocean/pgremapper/

Last Thursday I did a Ceph at DO tech talk where I talked about how we use
pgremapper to do augments on HDD clusters. The recording is not
available yet but the gist is:
 - set nobackfill/norebalance
 - create all your osds -> PGs are in backfill state but no data movement
 - cancel all backfills with pgremapper -> PGs are back to active+clean
 - unset nobackfill/norebalance -> nothing happens
 - turn on ceph balancer or use pgremapper undo-upmap to do your
augment in a controlled way
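
The steps above can be sketched as an ordered command plan. This is only an
illustration, not our actual tooling: augment_plan is a made-up helper, it
only builds and prints the commands rather than running them, and the
pgremapper invocation is shown without options (check the tool's help for
the flags needed to actually apply changes).

```python
def augment_plan(use_balancer=True):
    """Return the augment steps as (description, command) pairs.

    A command of None marks a manual step with no single CLI call here.
    """
    steps = [
        ("pause data movement", ["ceph", "osd", "set", "nobackfill"]),
        ("pause rebalancing", ["ceph", "osd", "set", "norebalance"]),
        ("create the new OSDs (PGs enter backfill state, no data moves)", None),
        ("cancel all pending backfills", ["pgremapper", "cancel-backfill"]),
        ("allow backfill again", ["ceph", "osd", "unset", "nobackfill"]),
        ("allow rebalancing again", ["ceph", "osd", "unset", "norebalance"]),
    ]
    if use_balancer:
        steps.append(("let the balancer move data", ["ceph", "balancer", "on"]))
    else:
        steps.append(("undo upmaps in controlled batches with pgremapper", None))
    return steps


if __name__ == "__main__":
    for description, command in augment_plan():
        shown = " ".join(command) if command else "(manual step)"
        print(f"{description}: {shown}")
```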

Our main motivation to do it this way is that on HDD clusters flapping
is a fact of life and creates recovery PGs that are blocked by the
backfill reservations from the augment. As more flapping occurs, the
number of degraded objects increases which is always uncomfortable.
Doing an augment this way allows us to have N backfills at a time,
wait for completion -> let recovery happen -> undo N more upmaps ->
etc. which dramatically lowers the amount of time a cluster is
degraded.
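
The batching loop described above can be sketched roughly like this. The
callbacks undo, wait_for_backfill and wait_for_recovery are hypothetical
placeholders for whatever removes an upmap and polls cluster state in your
environment; they are not pgremapper or Ceph APIs.

```python
def undo_in_batches(upmaps, n, undo, wait_for_backfill, wait_for_recovery):
    """Undo upmaps n at a time, letting recovery run between batches.

    Returns the number of batches processed.
    """
    if n < 1:
        raise ValueError("batch size must be at least 1")
    batches_done = 0
    for i in range(0, len(upmaps), n):
        for item in upmaps[i:i + n]:
            undo(item)           # starts at most n backfills in this round
        wait_for_backfill()      # block until this batch has completed
        wait_for_recovery()      # let recovery PGs from flapping finish
        batches_done += 1
    return batches_done


if __name__ == "__main__":
    log = []
    count = undo_in_batches(
        ["1.a", "1.b", "1.c"],
        2,
        undo=lambda pg: log.append(f"undo {pg}"),
        wait_for_backfill=lambda: log.append("backfill done"),
        wait_for_recovery=lambda: log.append("recovery done"),
    )
    print(count, log)
```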



On Sat, Sep 25, 2021 at 10:15 AM Alex Gorbachev wrote:
>
> Hi Ceph community,
>
> I think this is so important operationally that it bears repeating (
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/GJ35EL73A4LV6NPA74M6H6IN7BXMMHYA/
> )
>
> Digital Ocean has released the pgremapper tool, with which one can cancel
> pending backfills (in case bad decisions were made by balancer, or other
> tools) - in my case this was a necessity to reweight many OSDs back to 1.
> This tool saved many days of waiting for an unneeded rebalance.
>
> I found the tool at https://golangrepo.com/repo/digitalocean-pgremapper
> --
> Alex Gorbachev
> https://alextelescope.blogspot.com