[PATCH tip/core/rcu 1/2] rcu: Tag callback lists with corresponding grace-period number
From: "Paul E. McKenney" Currently, callbacks are advanced each time the corresponding CPU notices a change in its leaf rcu_node structure's ->completed value (this value counts grace-period completions). This approach has worked quite well, but with the advent of RCU_FAST_NO_HZ, we cannot count on a given CPU seeing all the grace-period completions. When a CPU misses a grace-period completion that occurs while it is in dyntick-idle mode, this will delay invocation of its callbacks. In addition, acceleration of callbacks (when RCU realizes that a given callback need only wait until the end of the next grace period, rather than having to wait for a partial grace period followed by a full grace period) must be carried out extremely carefully. Insufficient acceleration will result in unnecessarily long grace-period latencies, while excessive acceleration will result in premature callback invocation. Changes that involve this tradeoff are therefore among the most nerve-wracking changes to RCU. This commit therefore explicitly tags groups of callbacks with the number of the grace period that they are waiting for. This means that callback-advancement and callback-acceleration functions are idempotent, so that excessive acceleration will merely waste a few CPU cycles. This also allows a CPU to take full advantage of any grace periods that have elapsed while it has been in dyntick-idle mode. It should also enable simulataneous simplifications to and optimizations of RCU_FAST_NO_HZ. Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney --- kernel/rcutree.c | 195 ++ kernel/rcutree.h |2 + 2 files changed, 169 insertions(+), 28 deletions(-) diff --git a/kernel/rcutree.c b/kernel/rcutree.c index e441b77..ac6a75d 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -305,17 +305,27 @@ cpu_has_callbacks_ready_to_invoke(struct rcu_data *rdp) } /* - * Does the current CPU require a yet-as-unscheduled grace period? + * Does the current CPU require a not-yet-started grace period? + * The caller must have disabled interrupts to prevent races with + * normal callback registry. */ static int cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp) { - struct rcu_head **ntp; + int i; - ntp = rdp->nxttail[RCU_DONE_TAIL + - (ACCESS_ONCE(rsp->completed) != rdp->completed)]; - return rdp->nxttail[RCU_DONE_TAIL] && ntp && *ntp && - !rcu_gp_in_progress(rsp); + if (rcu_gp_in_progress(rsp)) + return 0; /* No, a grace period is already in progress. */ + if (!rdp->nxttail[RCU_NEXT_TAIL]) + return 0; /* No, this is a no-CBs (or offline) CPU. */ + if (*rdp->nxttail[RCU_NEXT_READY_TAIL]) + return 1; /* Yes, this CPU has newly registered callbacks. */ + for (i = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++) + if (rdp->nxttail[i - 1] != rdp->nxttail[i] && + ULONG_CMP_LT(ACCESS_ONCE(rsp->completed), +rdp->nxtcompleted[i])) + return 1; /* Yes, CBs for future grace period. */ + return 0; /* No grace period needed. */ } /* @@ -1071,6 +1081,139 @@ static void init_callback_list(struct rcu_data *rdp) } /* + * Determine the value that ->completed will have at the end of the + * next subsequent grace period. This is used to tag callbacks so that + * a CPU can invoke callbacks in a timely fashion even if that CPU has + * been dyntick-idle for an extended period with callbacks under the + * influence of RCU_FAST_NO_HZ. + * + * The caller must hold rnp->lock with interrupts disabled. + */ +static unsigned long rcu_cbs_completed(struct rcu_state *rsp, + struct rcu_node *rnp) +{ + /* +* If RCU is idle, we just wait for the next grace period. +* But we can only be sure that RCU is idle if we are looking +* at the root rcu_node structure -- otherwise, a new grace +* period might have started, but just not yet gotten around +* to initializing the current non-root rcu_node structure. +*/ + if (rcu_get_root(rsp) == rnp && rnp->gpnum == rnp->completed) + return rnp->completed + 1; + + /* +* Otherwise, wait for a possible partial grace period and +* then the subsequent full grace period. +*/ + return rnp->completed + 2; +} + +/* + * If there is room, assign a ->completed number to any callbacks on + * this CPU that have not already been assigned. Also accelerate any + * callbacks that were previously assigned a ->completed number that has + * since proven to be too conservative, which can happen if callbacks get + * assigned a ->completed number while RCU is idle, but with reference to + * a non-root rcu_node structure. This function is idempotent, so it does + * not hurt to call it
[PATCH tip/core/rcu 1/2] rcu: Tag callback lists with corresponding grace-period number
From: Paul E. McKenney paul...@linux.vnet.ibm.com Currently, callbacks are advanced each time the corresponding CPU notices a change in its leaf rcu_node structure's -completed value (this value counts grace-period completions). This approach has worked quite well, but with the advent of RCU_FAST_NO_HZ, we cannot count on a given CPU seeing all the grace-period completions. When a CPU misses a grace-period completion that occurs while it is in dyntick-idle mode, this will delay invocation of its callbacks. In addition, acceleration of callbacks (when RCU realizes that a given callback need only wait until the end of the next grace period, rather than having to wait for a partial grace period followed by a full grace period) must be carried out extremely carefully. Insufficient acceleration will result in unnecessarily long grace-period latencies, while excessive acceleration will result in premature callback invocation. Changes that involve this tradeoff are therefore among the most nerve-wracking changes to RCU. This commit therefore explicitly tags groups of callbacks with the number of the grace period that they are waiting for. This means that callback-advancement and callback-acceleration functions are idempotent, so that excessive acceleration will merely waste a few CPU cycles. This also allows a CPU to take full advantage of any grace periods that have elapsed while it has been in dyntick-idle mode. It should also enable simulataneous simplifications to and optimizations of RCU_FAST_NO_HZ. Signed-off-by: Paul E. McKenney paul.mcken...@linaro.org Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com --- kernel/rcutree.c | 195 ++ kernel/rcutree.h |2 + 2 files changed, 169 insertions(+), 28 deletions(-) diff --git a/kernel/rcutree.c b/kernel/rcutree.c index e441b77..ac6a75d 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -305,17 +305,27 @@ cpu_has_callbacks_ready_to_invoke(struct rcu_data *rdp) } /* - * Does the current CPU require a yet-as-unscheduled grace period? + * Does the current CPU require a not-yet-started grace period? + * The caller must have disabled interrupts to prevent races with + * normal callback registry. */ static int cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp) { - struct rcu_head **ntp; + int i; - ntp = rdp-nxttail[RCU_DONE_TAIL + - (ACCESS_ONCE(rsp-completed) != rdp-completed)]; - return rdp-nxttail[RCU_DONE_TAIL] ntp *ntp - !rcu_gp_in_progress(rsp); + if (rcu_gp_in_progress(rsp)) + return 0; /* No, a grace period is already in progress. */ + if (!rdp-nxttail[RCU_NEXT_TAIL]) + return 0; /* No, this is a no-CBs (or offline) CPU. */ + if (*rdp-nxttail[RCU_NEXT_READY_TAIL]) + return 1; /* Yes, this CPU has newly registered callbacks. */ + for (i = RCU_WAIT_TAIL; i RCU_NEXT_TAIL; i++) + if (rdp-nxttail[i - 1] != rdp-nxttail[i] + ULONG_CMP_LT(ACCESS_ONCE(rsp-completed), +rdp-nxtcompleted[i])) + return 1; /* Yes, CBs for future grace period. */ + return 0; /* No grace period needed. */ } /* @@ -1071,6 +1081,139 @@ static void init_callback_list(struct rcu_data *rdp) } /* + * Determine the value that -completed will have at the end of the + * next subsequent grace period. This is used to tag callbacks so that + * a CPU can invoke callbacks in a timely fashion even if that CPU has + * been dyntick-idle for an extended period with callbacks under the + * influence of RCU_FAST_NO_HZ. + * + * The caller must hold rnp-lock with interrupts disabled. + */ +static unsigned long rcu_cbs_completed(struct rcu_state *rsp, + struct rcu_node *rnp) +{ + /* +* If RCU is idle, we just wait for the next grace period. +* But we can only be sure that RCU is idle if we are looking +* at the root rcu_node structure -- otherwise, a new grace +* period might have started, but just not yet gotten around +* to initializing the current non-root rcu_node structure. +*/ + if (rcu_get_root(rsp) == rnp rnp-gpnum == rnp-completed) + return rnp-completed + 1; + + /* +* Otherwise, wait for a possible partial grace period and +* then the subsequent full grace period. +*/ + return rnp-completed + 2; +} + +/* + * If there is room, assign a -completed number to any callbacks on + * this CPU that have not already been assigned. Also accelerate any + * callbacks that were previously assigned a -completed number that has + * since proven to be too conservative, which can happen if callbacks get + * assigned a -completed number while RCU is idle, but with reference to + * a non-root rcu_node structure. This function is idempotent,