https://github.com/ceph/ceph/pull/869

has a bunch of pending changes to CRUSH to support the erasure coding 
work in firefly.

The main item is that the behavior of 'choose indep' has changed 
significantly.  This is strictly speaking a change in behavior, but nobody 
should be using indep mode in a normal ceph cluster (unless they went 
manually fiddling with their crush map).

The new and improved indep does a breadth-first mapping instead of 
depth-first, which means fewer items shifting around when there are 
failures.  It also drops some of the cruft that fell out of the combined 
code from before.  As a bonus, the old method is now firstn-only and I was 
able to strip out a bunch of crap in the process.  Yay!
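
As a rough illustration of why the breadth-first approach shifts fewer 
items, here is a toy, flat (single-bucket) sketch in Python.  It is not the 
CRUSH code: pick() is a made-up stand-in for the real hash and bucket 
selection, and toy_firstn / toy_indep / moved are hypothetical names.  It 
only shows that retrying each position round by round keeps the healthy 
slots where they were, while a sequential firstn-style fill lets one 
failure ripple into the later slots.

```python
def pick(x, r, n_items):
    # Made-up deterministic stand-in for the CRUSH hash (an assumption).
    return (x * 31 + r * 7) % n_items


def toy_firstn(x, n, n_items, failed, max_tries=50):
    # Fill slots one after another; each slot retries until it finds an
    # item that is neither failed nor already chosen.
    out = []
    for rep in range(n):
        for attempt in range(max_tries):
            item = pick(x, rep + attempt, n_items)
            if item not in failed and item not in out:
                out.append(item)
                break
        # If every try fails, nothing is appended and the result is short.
    return out


def toy_indep(x, n, n_items, failed, max_tries=50):
    # Positional result: try every unfilled slot once per round, so a slot
    # that already holds a healthy item is never touched again.
    out = [None] * n
    for attempt in range(max_tries):
        for rep in range(n):
            if out[rep] is not None:
                continue
            item = pick(x, rep + n * attempt, n_items)
            if item not in failed and item not in out:
                out[rep] = item
        if None not in out:
            break
    return out


def moved(before, after):
    # Number of positions whose item changed.
    return sum(1 for a, b in zip(before, after) if a != b)


firstn_ok = toy_firstn(1, 4, 10, set())   # [1, 8, 5, 2]
firstn_bad = toy_firstn(1, 4, 10, {8})    # [1, 5, 2, 9]
indep_ok = toy_indep(1, 4, 10, set())     # [1, 8, 5, 2]
indep_bad = toy_indep(1, 4, 10, {8})      # [1, 6, 5, 2]

print("firstn moved:", moved(firstn_ok, firstn_bad))  # 3
print("indep moved:", moved(indep_ok, indep_bad))     # 1
```

In this toy, marking item 8 as failed changes three of the four firstn 
slots but only one indep slot.  For erasure pools the position matters, 
which is why keeping the other slots stable is the interesting property.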

There are a few other things:

- The 'osd crush rule create-simple ..' command now takes an optional mode 
  (firstn or indep) so that it can be used for erasure pools.

- There is an 'erasure' pg pool type (existing types were 'rep' (default) 
  and 'raid4' (never used or implemented)).

- New rule commands:

 step set_choose_tries N

This overrides the tunable total_tries (default is 50) for the current 
rule only.

 step set_chooseleaf_tries M

This overrides the recursive behavior when using chooseleaf.  By default, 
for indep mode, we try exactly once with the recursive call, as this 
maintains the same bound on computational complexity.  However, increasing 
this a bit (say, to 5) improves the stability of the mapping when there 
are devices marked out.  This lets you set it for *just* the current 
rule.

Note that for the 'firstn' mode, the default (legacy) behavior is to try 
total_tries times in the recursive call, which makes the computational 
complexity proportional to total_tries^2 (in the extreme).  If the 
'descend_once' tunable is set (now the default), then the recursive call 
makes only one attempt, but only when it hits a reject; unfortunately, 
not in the case of a collision (dup).  We can't change that without 
breaking compatibility for existing rules.  To "fix" that, we can add a 
set_chooseleaf_tries 1 command to firstn rules.  It's a bit muddled, 
though.  :(

- CrushWrapper has a helper to detect if any of these rule commands are in 
  use, and OSDMap sets the required features accordingly.

- There is a small fix for OSDMap CACHEPOOL feature detection.
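
To put numbers on the retry bounds discussed for set_chooseleaf_tries, 
here is a back-of-the-envelope sketch in Python.  The function name is 
hypothetical, and the model is deliberately crude: it assumes every outer 
try can trigger a full set of recursive tries, which is the extreme case 
rather than anything typical.

```python
def worst_case_attempts(total_tries, chooseleaf_tries):
    # Crude upper bound: each outer try may spend all of its recursive
    # (chooseleaf) tries before giving up.
    return total_tries * chooseleaf_tries


total_tries = 50  # default value of the total_tries tunable

# Legacy firstn: the recursive call also tries total_tries times.
print(worst_case_attempts(total_tries, total_tries))  # 2500

# One recursive attempt (the indep default, or set_chooseleaf_tries 1).
print(worst_case_attempts(total_tries, 1))            # 50

# A modest bump, e.g. set_chooseleaf_tries 5.
print(worst_case_attempts(total_tries, 5))            # 250
```

So going from 1 to 5 recursive tries costs at most a factor of 5 in the 
extreme case, well short of the total_tries^2 bound of the legacy firstn 
behavior.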

Long story short: if any of this new stuff is used (and it will be 
needed for erasure pools), the new feature bit will be required and old 
clients won't be able to connect.  I think the new behavior is good.  My 
main concern is the weird interplay of the 'descend_once' tunable, which 
unfortunately wasn't implemented to mean the same as chooseleaf_tries = 1.  
I'm not sure if it's worth fixing that via _another_ tunable or not; if 
so, we could (yay) end up where set_chooseleaf_tries actually works for 
firstn the same way it does for indep, and the tunable just makes it 
default to 1 (as it does with indep).

sage