I messed up a crush map the other day, mixing components of different
types in a single rule.  The crushmap compiler didn't complain, but mons
and osds would crash when applying those rules.  I had to use this patch
to recover the cluster.  Only the second hunk was relevant to my case, but I
figured any BUG_ON that stops you from fixing the problem is best avoided ;-)

--- Begin Message ---
It's very hard to recover from an invalid crushmap if mons fail
assertions while processing the map, and osds crash while advancing
past an already-fixed map.  Skip such broken rules instead of
aborting.

Signed-off-by: Alexandre Oliva <ol...@lsd.ic.unicamp.br>
---
 src/crush/mapper.c |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/crush/mapper.c b/src/crush/mapper.c
index 1e475b40..6ce4c97 100644
--- a/src/crush/mapper.c
+++ b/src/crush/mapper.c
@@ -354,7 +354,11 @@ static int crush_choose(const struct crush_map *map,
                                        item = bucket_perm_choose(in, x, r);
                                else
                                        item = crush_bucket_choose(in, x, r);
-                               BUG_ON(item >= map->max_devices);
+                               if (item >= map->max_devices) {
+                                       dprintk("  bad item %d\n", item);
+                                       skip_rep = 1;
+                                       break;
+                               }
 
                                /* desired type? */
                                if (item < 0)
@@ -365,8 +369,12 @@ static int crush_choose(const struct crush_map *map,
 
                                /* keep going? */
                                if (itemtype != type) {
-                                       BUG_ON(item >= 0 ||
-                                              (-1-item) >= map->max_buckets);
+                                       if (item >= 0 ||
+                                           (-1-item) >= map->max_buckets) {
+                                               dprintk("  bad item type %d\n", type);
+                                               skip_rep = 1;
+                                               break;
+                                       }
                                        in = map->buckets[-1-item];
                                        retry_bucket = 1;
                                        continue;
-- 
1.7.7.6


--- End Message ---
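Both hunks depend on the CRUSH id convention. Devices have ids 0 through max_devices-1 (crush_choose() gives them itemtype 0). Buckets have negative ids, and bucket `item` lives at map->buckets[-1-item]. A device of the wrong type therefore cannot be descended into, and the second hunk guards against exactly that case. The C sketch below restates the two checks as a standalone function over a stripped-down map. It is only an illustration: struct toy_map, classify() and the STEP_* values are hypothetical names, not Ceph code. The real crush_choose() sets skip_rep and breaks out of its retry loop instead of returning a value.

```c
/* Hypothetical sketch of the patch's checks; not the actual Ceph code. */

/* Stripped-down stand-in for struct crush_map: only the two bounds the
 * patch checks. */
struct toy_map {
	int max_devices;  /* devices have ids 0 .. max_devices-1 */
	int max_buckets;  /* buckets have ids -1 .. -max_buckets */
};

enum step_result {
	STEP_ACCEPT,   /* item already has the wanted type */
	STEP_DESCEND,  /* valid bucket of another type: descend into it */
	STEP_SKIP      /* broken map: skip this replica instead of BUG_ON */
};

/* item: id returned by the bucket choose function.
 * itemtype: type of that item (0 for devices).
 * type: the type the rule step is looking for. */
enum step_result classify(const struct toy_map *map, int item,
			  int itemtype, int type)
{
	/* First hunk: an id past the device table is never valid. */
	if (item >= map->max_devices)
		return STEP_SKIP;

	if (itemtype == type)
		return STEP_ACCEPT;

	/* Second hunk: to keep going we must descend into a bucket, so the
	 * item must be a negative id whose index -1-item is in range.  A
	 * device (item >= 0) of the wrong type cannot be descended into. */
	if (item >= 0 || -1 - item >= map->max_buckets)
		return STEP_SKIP;

	return STEP_DESCEND;
}
```

For example, with max_devices = 4 and max_buckets = 2, classify(&m, 2, 0, 1) returns STEP_SKIP. Device 2 turned up where a type-1 bucket was wanted, and the unpatched code would have hit BUG_ON at that point.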

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
