[Cluster-devel] [PATCH 1/2] Add some randomisation to the GFS2 resource group allocator

Mark Syms Thu, 20 Sep 2018 08:14:53 -0700

From: Tim Smith <tim.sm...@citrix.com>

When a number of files are being grown on the same cluster node from
different threads (e.g. fio with 20 or so jobs), all of those threads pile
into gfs2_inplace_reserve() independently, each looking to claim a new
resource group. After a while they synchronise and get through the
gfs2_rgrp_used_recently()/gfs2_rgrp_congested() check together.


When this happens, write performance drops to about 1/5 of its previous
level on a single-node cluster, and on multi-node clusters it drops to near
zero on some nodes. The output of "glocktop -r -H -d 1" then begins to show
many processes stuck in gfs2_inplace_reserve(), waiting on a resource
group lock.

This commit introduces a module parameter, skippy_rgrp_alloc, which
defaults to 0 (no change in behaviour). When set to 1, it adds random
jitter to the first two passes of gfs2_inplace_reserve() when trying to
lock a new resource group: the candidate is skipped in favour of the next
one 1/2 the time at first, and every skip taken makes a further skip
progressively less likely.

Signed-off-by: Tim Smith <tim.sm...@citrix.com>
---
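
For reviewers, the skip decision described above can be modelled in user
space. The following is a minimal sketch, not part of the patch: the helper
names skip_hits(), skip_probability() and count_skips() are hypothetical,
uint8_t stands in for the kernel's u8, the jitter bytes are supplied by the
caller instead of prandom_bytes(), and the saturation guard on randskip is
an addition that the patch itself does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* How many of the 256 possible u8 jitter values satisfy v % sides == 0. */
static unsigned int skip_hits(uint8_t sides)
{
	assert(sides != 0);	/* a zero modulus would divide by zero */
	return 255u / sides + 1u;
}

/* Chance that a single roll of the <sides>-sided die causes a skip. */
static double skip_probability(uint8_t sides)
{
	return skip_hits(sides) / 256.0;
}

/*
 * Replay the skip decision over a caller-supplied sequence of jitter
 * bytes and return how many candidates would have been skipped.
 * 'skippy' stands in for the skippy_rgrp_alloc module parameter.
 */
static unsigned int count_skips(const uint8_t *jitter, size_t n, int skippy)
{
	uint8_t randskip;
	unsigned int skips = 0;
	size_t i;

	if (!skippy)
		return 0;	/* parameter off: the die is never rolled */

	assert(skippy > 0 && skippy < 255);
	randskip = (uint8_t)(skippy + 1);

	for (i = 0; i < n; i++) {
		if ((jitter[i] % randskip) == 0) {
			/* Each skip adds a side, making the next one less likely. */
			if (randskip < UINT8_MAX)	/* guard not in the patch */
				randskip++;
			skips++;
		}
	}
	return skips;
}
```

With the parameter set to 1, skip_probability() gives 128/256 for the first
roll, 86/256 after one skip and 64/256 after two, so the die is only
approximately fair once the number of sides stops dividing 256.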
 fs/gfs2/rgrp.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 1ad3256..994eb7f 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -19,6 +19,7 @@
 #include <linux/blkdev.h>
 #include <linux/rbtree.h>
 #include <linux/random.h>
+#include <linux/module.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -49,6 +50,11 @@
 #define LBITSKIP00 (0x0000000000000000UL)
 #endif
 
+static int gfs2_skippy_rgrp_alloc;
+
+module_param_named(skippy_rgrp_alloc, gfs2_skippy_rgrp_alloc, int, 0644);
+MODULE_PARM_DESC(skippy_rgrp_alloc, "Set skippiness of resource group allocator, 0|1. Where 1 will cause resource groups to be randomly skipped with the likelihood of skipping progressively decreasing after a skip has occurred.");
+
 /*
  * These routines are used by the resource group routines (rgrp.c)
  * to keep track of block allocation.  Each block is represented by two
@@ -2016,6 +2022,11 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
        u64 last_unlinked = NO_BLOCK;
        int loops = 0;
        u32 free_blocks, skip = 0;
+       /*
+        * gfs2_skippy_rgrp_alloc provides our initial skippiness.
+        * randskip will thus be 2-255 if we want it to do anything.
+        */
+       u8 randskip = gfs2_skippy_rgrp_alloc + 1;
 
        if (sdp->sd_args.ar_rgrplvb)
                flags |= GL_SKIP;
@@ -2046,10 +2057,30 @@ int gfs2_inplace_reserve(struct gfs2_inode *ip, struct gfs2_alloc_parms *ap)
                                if (loops == 0 &&
                                    !fast_to_acquire(rs->rs_rbm.rgd))
                                        goto next_rgrp;
-                               if ((loops < 2) &&
-                                   gfs2_rgrp_used_recently(rs, 1000) &&
-                                   gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
-                                       goto next_rgrp;
+                               if (loops < 2) {
+                                       /*
+                                        * If resource group allocation is requested to be skippy,
+                                        * roll a hypothetical dice of <randskip> sides and skip
+                                        * straight to the next resource group anyway if it comes
+                                        * up 1.
+                                        */
+                                       if (gfs2_skippy_rgrp_alloc) {
+                                               u8 jitter;
+
+                                               prandom_bytes(&jitter, sizeof(jitter));
+                                               if ((jitter % randskip) == 0) {
+                                                       /*
+                                                        * If we are choosing to skip, bump randskip to make it
+                                                        * successively less likely that we will skip again
+                                                        */
+                                                       randskip++;
+                                                       goto next_rgrp;
+                                               }
+                                       }
+                                       if (gfs2_rgrp_used_recently(rs, 1000) &&
+                                               gfs2_rgrp_congested(rs->rs_rbm.rgd, loops))
+                                               goto next_rgrp;
+                               }
                        }
                        error = gfs2_glock_nq_init(rs->rs_rbm.rgd->rd_gl,
                                                   LM_ST_EXCLUSIVE, flags,
-- 
1.8.3.1
