From: Hong Zhiguo <zhiguoh...@tencent.com>

Why
====
Pointed out by Vivek: Tokens generated during idle period should
be trimmed. Otherwise a huge bio may be permited immediately.
Overlimit behaviour may be observed during short I/O throughput
test.

Vivek also pointed out: We should not over-trim for hierarchical
groups. Suppose a subtree of groups are idle when a big bio comes.
The token of the child group is trimmed and not enough. So the bio is
queued on the child group. After some jiffies the child group reserved
enough tokens and the bio climbs up. If we trim the parent group at
this time again, this bio will wait too much time than expected.

Analysis
========
When the bio is queued on child group, all it's ancestor groups
becomes non-idle. They should start to reserve tokens for that
bio from this moment. And their reserved tokens before should be
trimmed at this moment.

How
====
service_queue now has a new member nr_queued_tree[2], to represent
the the number of bios waiting on the subtree rooted by this sq.

When a bio is queued on the hierarchy first time, nr_queued_tree
of all ancestors and the child group itself are increased. When a
bio climbs up, nr_queued_tree of the child group is decreased.

When nr_queued_tree turns from zero to one, the tokens reserved
before are trimmed. And after this switch, this group will never
be trimmed to reserve tokens for the bio waiting on it's descendant
group.

Signed-off-by: Hong Zhiguo <zhiguoh...@tencent.com>
---
 block/blk-throttle.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 45d4d91..d10a5e1 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -63,6 +63,10 @@ struct throtl_service_queue {
         */
        struct list_head        queued[2];      /* throtl_qnode [READ/WRITE] */
        unsigned int            nr_queued[2];   /* number of queued bios */
+       /*
+        * number of bios queued on this subtree
+        */
+       unsigned int            nr_queued_tree[2];
 
        /*
         * RB tree of active children throtl_grp's, which are sorted by
@@ -699,6 +703,11 @@ static void tg_update_token(struct throtl_grp *tg, bool rw)
        do_div(token, HZ);
        tg->bytes_token[rw] += token;
 
+       /* trim token if the whole subtree is idle */
+       if (!tg->service_queue.nr_queued_tree[rw] &&
+           tg->bytes_token[rw] > THROTL_BURST_BYTES)
+               token = THROTL_BURST_BYTES;
+
        tg->t_c[rw] = jiffies;
 }
 
@@ -843,6 +852,28 @@ static void throtl_charge_bio(struct throtl_grp *tg, 
struct bio *bio)
 }
 
 /**
+ * tg_update_nr_queue_tree - update nr_queued_tree of all ancestors
+ * @sq: the target service_queue
+ *
+ * Traverse from @sq to the top throtl_grp, increase their
+ * nr_queued_tree. If one sq has zero nr_queued_tree and now turns to
+ * one, trim the tokens generated before. From now on it will never
+ * get trimmed until same thing happens next time.
+ */
+static void tg_update_nr_queue_tree(struct throtl_service_queue *sq, bool rw)
+{
+       struct throtl_grp *tg;
+
+       while (sq && (tg = sq_to_tg(sq))) {
+               if (!sq->nr_queued_tree[rw])
+                       tg_update_token(tg, rw);
+
+               sq->nr_queued_tree[rw]++;
+               sq = sq->parent_sq;
+       }
+}
+
+/**
  * throtl_add_bio_tg - add a bio to the specified throtl_grp
  * @bio: bio to add
  * @qn: qnode to use
@@ -916,6 +947,9 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool 
rw)
        bio = throtl_pop_queued(&sq->queued[rw], &tg_to_put);
        sq->nr_queued[rw]--;
 
+       /* Here's the only place nr_queued_tree get decreased */
+       sq->nr_queued_tree[rw]--;
+
        throtl_charge_bio(tg, bio);
 
        /*
@@ -1375,6 +1409,10 @@ bool blk_throtl_bio(struct request_queue *q, struct bio 
*bio)
 
        bio_associate_current(bio);
        tg->td->nr_queued[rw]++;
+
+       /* Here's the only place nr_queued_tree get increased */
+       tg_update_nr_queue_tree(sq, rw);
+
        throtl_add_bio_tg(bio, qn, tg);
        throttled = true;
 
-- 
1.8.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to