[ceph-users] Re: Best way to change bucket hierarchy

Frank Schilder Thu, 04 Jun 2020 20:24:12 -0700

It's hard to tell without knowing what the diff is, but from your description I 
take it that you changed the failure domain for every(?) pool from host to 
chassis. I don't know what a chassis is in your architecture, but if each 
chassis contains several host buckets, then yes, I would expect almost every PG 
to be affected.
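
One way to produce such a diff is to decompile the saved and the current crush map and compare the text. This is only a sketch: the function name diff_crushmaps and both file arguments are hypothetical, and it assumes both maps were saved in binary form with "ceph osd getcrushmap -o <file>".

```shell
# Sketch: show a unified diff between two binary crush maps.
# Hypothetical helper; $1 = old map, $2 = new map (both binary).
diff_crushmaps() {
    old_txt="$(mktemp)" || return 1
    new_txt="$(mktemp)" || return 1
    # crushtool -d decompiles a binary crush map into editable text.
    if crushtool -d "$1" -o "$old_txt" && crushtool -d "$2" -o "$new_txt"; then
        diff -u "$old_txt" "$new_txt"
        status=$?
    else
        status=2
    fi
    rm -f "$old_txt" "$new_txt"
    # 0 = identical, 1 = maps differ, 2 = decompiling failed
    return "$status"
}
```

Called as "diff_crushmaps crush-before.bin crush-after.bin" (file names are placeholders), the output should show exactly which buckets and rule steps changed.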

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Kyriazis, George <george.kyria...@intel.com>
Sent: 05 June 2020 00:28:43
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Hmm,

So I tried all that, and I got almost all of my PGs being remapped.  Crush map 
looks correct.  Is that normal?

Thanks,

George

On Jun 4, 2020, at 2:33 PM, Frank Schilder <fr...@dtu.dk> wrote:

Hi George,

you don't need to worry about that too much. The EC profile contains two types 
of information: one part describes the actual EC encoding and the other holds 
crush parameters. Unfortunately, only part of this information can be changed 
after pool creation, and any such change happens outside of the profile. You 
can change the failure domain in the crush map without issues, but the profile 
won't reflect that change. That's an inconsistency we currently have to live 
with. It would have been better to separate mutable data (like failure domain) 
from immutable data (like k and m), or to provide a meaningful interface to 
maintain consistency of the mutable information.

In short, don't believe everything the EC profile tells you. Some information 
might be out of date, like the failure domain or the device class (basically 
everything starting with crush-). If you remember that, you are out of trouble. 
Always dump the crush rule of an EC pool explicitly to see the true parameters 
in action.
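
A small sketch of that check, assuming the pool name is passed as the only argument. The function name show_pool_rule is hypothetical, and the parsing assumes that "ceph osd pool get <pool> crush_rule" prints a line of the form "crush_rule: <name>".

```shell
# Sketch: dump the crush rule a pool actually uses.
# Hypothetical helper; $1 = pool name.
show_pool_rule() {
    # Assumed output format: "crush_rule: <rule name>"
    rule="$(ceph osd pool get "$1" crush_rule | awk '{print $2}')"
    [ -n "$rule" ] || return 1
    # The dump lists the real steps, including the failure domain type.
    ceph osd crush rule dump "$rule"
}
```

The "type" in the chooseleaf step of the dump is the failure domain in effect, whatever crush-failure-domain says in the profile.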

Having said that, to change the failure domain for an EC pool, change the crush 
rule that was created from the EC profile. I did this too and it works just 
fine. By default, the crush rule has the same name as the pool. I'm afraid you 
will have to edit the crush rule manually here, as Wido explained. There is no 
other way, at least not currently.
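
The manual edit could look roughly like the sketch below. It rewrites the chooseleaf step of one named rule in a decompiled (text) crush map and prints the result. The function name set_rule_failure_domain, the rule name "ecpool" and all file names are hypothetical. It only handles rules with a single chooseleaf step; anything more elaborate is better edited in a text editor.

```shell
# Sketch: change the failure domain of one rule in a decompiled crush map.
# Hypothetical helper; $1 = text crush map, $2 = rule name, $3 = new bucket type.
# Prints the edited map on stdout; the input file is not modified.
set_rule_failure_domain() {
    awk -v rule="$2" -v domain="$3" '
        $1 == "rule" && $2 == rule { in_rule = 1 }
        in_rule && $1 == "step" && $2 == "chooseleaf" && $(NF-1) == "type" {
            sub(/type [A-Za-z0-9_-]+[ \t]*$/, "type " domain)
        }
        in_rule && $1 == "}" { in_rule = 0 }
        { print }
    ' "$1"
}

# Possible round trip (placeholders throughout, check each step by hand):
#   ceph osd getcrushmap -o crush.bin
#   crushtool -d crush.bin -o crush.txt
#   set_rule_failure_domain crush.txt ecpool chassis > crush-new.txt
#   crushtool -c crush-new.txt -o crush-new.bin
#   crushtool -i crush-new.bin --test --rule <rule id> --num-rep <k+m> --show-mappings
#   ceph osd setcrushmap -i crush-new.bin
```

The crushtool --test run is there to look at the resulting mappings offline before the new map is injected into the cluster.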

You can ask in this list for confirmation that your change is doing what you 
want.

Do not try to touch an EC profile; they are read-only anyway. The crush 
parameters are only used at pool creation and never looked at again. You can 
override these by editing the crush rule as explained above.

Best regards and good luck,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Kyriazis, George <george.kyria...@intel.com>
Sent: 04 June 2020 20:56:38
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Thanks Frank,

Interesting info about the EC profile.  I do have an EC pool, but I noticed the 
following when I dumped the profile:

# ceph osd erasure-code-profile get ec22
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=2
m=2
plugin=jerasure
technique=reed_sol_van
w=8
#

Which says that the failure domain of the EC profile is also set to host.  
Looks like I need to change the EC profile, too, but since it is associated 
with the pool, maybe I can’t do that after pool creation?  Or, since the 
property is named “crush-failure-domain”, is it automatically inherited from 
the crush profile, so I don’t have to do anything?

Thanks,

George

On Jun 4, 2020, at 1:51 AM, Frank Schilder <fr...@dtu.dk> wrote:

Hi George,

for replicated rules you can simply create a new crush rule with the new 
failure domain set to chassis and change any pool's crush rule to this new one. 
If you have EC pools, then the chooseleaf step needs to be edited by hand. I 
did this before as well. (A really unfortunate side effect is that the EC 
profile attached to the pool goes out of sync with the crush map, and there is 
nothing one can do about that. This is annoying yet harmless.)
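
For the replicated case, the two steps might be wrapped up as follows. This is a sketch: the function name switch_replicated_rule and all argument values are placeholders, and it assumes a release that has "ceph osd crush rule create-replicated" (Luminous or later).

```shell
# Sketch: create a replicated rule with a new failure domain and
# point a pool at it.
# Hypothetical helper; $1 = pool, $2 = new rule name, $3 = crush root,
# $4 = failure domain bucket type (e.g. chassis).
switch_replicated_rule() {
    pool="$1"; rule="$2"; root="$3"; domain="$4"
    ceph osd crush rule create-replicated "$rule" "$root" "$domain" || return 1
    ceph osd pool set "$pool" crush_rule "$rule"
}
```

For example, "switch_replicated_rule mypool replicated_chassis default chassis", where the pool and rule names are made up for illustration.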

The intent of making these changes while norebalance is set is

- to avoid unnecessary data movement due to successive changes happening step 
by step and
- to make sure peering is successful before starting to move data.

I believe OSDs peer a bit faster with norebalance set, so there is a shorter 
interruption to ongoing I/O (no I/O happens to a PG during peering).

Yes, if you save the old crush map, you can undo everything. It is a good idea 
to have a backup also just for reference and to compare before and after.
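
A minimal sketch of such a backup and restore, with hypothetical function names and a file prefix chosen by the caller:

```shell
# Sketch: save the current crush map in binary and text form.
# Hypothetical helper; $1 = file prefix, e.g. crush-backup
backup_crushmap() {
    prefix="$1"
    ceph osd getcrushmap -o "${prefix}.bin" || return 1
    # The text form is only for reading and comparing.
    crushtool -d "${prefix}.bin" -o "${prefix}.txt"
}

# Sketch: inject a previously saved binary crush map again.
# Hypothetical helper; $1 = binary crush map file
restore_crushmap() {
    ceph osd setcrushmap -i "$1"
}
```

Restoring the old map moves PGs back to their previous mapping, so data movement is to be expected in that direction as well.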

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Kyriazis, George <george.kyria...@intel.com>
Sent: 04 June 2020 00:58:20
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Thanks Frank,

I don’t have too much experience editing crush rules, but I assume the 
chooseleaf step would also have to change to:

      step chooseleaf firstn 0 type chassis

Correct?  Is that the only other change that is needed?  It looks like the rule 
change can happen both inside and outside the “norebalance” setting (again with 
CLI commands), but is it safer to do it inside (ie. while not rebalancing)?

If I keep a backup of the crush rule map (with “ceph osd getcrushmap”), I 
assume I can restore the old map if something goes bad?

Thanks again!

George

On Jun 3, 2020, at 5:24 PM, Frank Schilder <fr...@dtu.dk> wrote:

You can use the command-line without editing the crush map. Look at the 
documentation of commands like

ceph osd crush add-bucket ...
ceph osd crush move ...

Before starting this, run "ceph osd set norebalance", and unset the flag once 
you are happy with the crush tree. Let everything peer. You should see 
misplaced objects and remapped PGs, but no degraded objects or PGs.

Do this only when the cluster is HEALTH_OK, otherwise things can get really 
complicated.
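
Put together, the sequence might look like the sketch below. It places each host into its own new chassis bucket, which matches the one-host-per-chassis starting point described in the original question. The function name add_chassis_level and the "chassis-<host>" naming scheme are made up for illustration, and the root is assumed to be passed in (for example "default").

```shell
# Sketch: add a chassis level above each host while rebalancing is paused.
# Hypothetical helper; $1 = crush root, remaining args = host bucket names.
add_chassis_level() {
    root="$1"; shift
    ceph osd set norebalance || return 1
    for host in "$@"; do
        chassis="chassis-${host}"   # made-up naming scheme
        ceph osd crush add-bucket "$chassis" chassis || return 1
        ceph osd crush move "$chassis" "root=${root}" || return 1
        ceph osd crush move "$host" "chassis=${chassis}" || return 1
    done
    # Print the resulting hierarchy for review; the flag stays set on purpose.
    ceph osd crush tree
}
```

After checking the tree and "ceph status" (peering finished, nothing degraded), "ceph osd unset norebalance" lets the data movement start. The function leaves the flag set so that this last step remains a deliberate decision.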

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Kyriazis, George <george.kyria...@intel.com>
Sent: 03 June 2020 22:45:11
To: ceph-users
Subject: [ceph-users] Best way to change bucket hierarchy

Hello,

I have a live ceph cluster and need to modify the bucket hierarchy.  I am 
currently using the default crush rule (i.e. keep each replica on a different 
host).  I need to add a “chassis” level and keep replicas on a per-chassis 
level.

From what I read in the documentation, I would have to edit the crush file 
manually, however this sounds kinda scary for a live cluster.

Are there any “best known methods” to achieve that goal without messing things 
up?

In my current scenario, I have one host per chassis, and I plan to add nodes 
later so that there would be more than one host per chassis. It looks like “in 
theory” there wouldn’t be a need for any data movement after the crush map 
changes.  Will reality match theory? Anything else I need to watch out for?

Thank you!

George

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io