[PATCH tip/core/rcu 1/2] rcu: Tag callback lists with corresponding grace-period number

2013-01-26 Thread Paul E. McKenney
From: "Paul E. McKenney" 

Currently, callbacks are advanced each time the corresponding CPU
notices a change in its leaf rcu_node structure's ->completed value
(this value counts grace-period completions).  This approach has worked
quite well, but with the advent of RCU_FAST_NO_HZ, we cannot count on
a given CPU seeing all the grace-period completions.  When a CPU misses
a grace-period completion that occurs while it is in dyntick-idle mode,
this will delay invocation of its callbacks.

In addition, acceleration of callbacks (when RCU realizes that a given
callback need only wait until the end of the next grace period, rather
than having to wait for a partial grace period followed by a full
grace period) must be carried out extremely carefully.  Insufficient
acceleration will result in unnecessarily long grace-period latencies,
while excessive acceleration will result in premature callback invocation.
Changes that involve this tradeoff are therefore among the most
nerve-wracking changes to RCU.

This commit therefore explicitly tags groups of callbacks with the
number of the grace period that they are waiting for.  This means that
callback-advancement and callback-acceleration functions are idempotent,
so that excessive acceleration will merely waste a few CPU cycles.  This
also allows a CPU to take full advantage of any grace periods that have
elapsed while it has been in dyntick-idle mode.  It should also enable
simulataneous simplifications to and optimizations of RCU_FAST_NO_HZ.

Signed-off-by: Paul E. McKenney 
Signed-off-by: Paul E. McKenney 
---
 kernel/rcutree.c |  195 ++
 kernel/rcutree.h |2 +
 2 files changed, 169 insertions(+), 28 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index e441b77..ac6a75d 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -305,17 +305,27 @@ cpu_has_callbacks_ready_to_invoke(struct rcu_data *rdp)
 }
 
 /*
- * Does the current CPU require a yet-as-unscheduled grace period?
+ * Does the current CPU require a not-yet-started grace period?
+ * The caller must have disabled interrupts to prevent races with
+ * normal callback registry.
  */
 static int
 cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp)
 {
-   struct rcu_head **ntp;
+   int i;
 
-   ntp = rdp->nxttail[RCU_DONE_TAIL +
-  (ACCESS_ONCE(rsp->completed) != rdp->completed)];
-   return rdp->nxttail[RCU_DONE_TAIL] && ntp && *ntp &&
-  !rcu_gp_in_progress(rsp);
+   if (rcu_gp_in_progress(rsp))
+   return 0;  /* No, a grace period is already in progress. */
+   if (!rdp->nxttail[RCU_NEXT_TAIL])
+   return 0;  /* No, this is a no-CBs (or offline) CPU. */
+   if (*rdp->nxttail[RCU_NEXT_READY_TAIL])
+   return 1;  /* Yes, this CPU has newly registered callbacks. */
+   for (i = RCU_WAIT_TAIL; i < RCU_NEXT_TAIL; i++)
+   if (rdp->nxttail[i - 1] != rdp->nxttail[i] &&
+   ULONG_CMP_LT(ACCESS_ONCE(rsp->completed),
+rdp->nxtcompleted[i]))
+   return 1;  /* Yes, CBs for future grace period. */
+   return 0; /* No grace period needed. */
 }
 
 /*
@@ -1071,6 +1081,139 @@ static void init_callback_list(struct rcu_data *rdp)
 }
 
 /*
+ * Determine the value that ->completed will have at the end of the
+ * next subsequent grace period.  This is used to tag callbacks so that
+ * a CPU can invoke callbacks in a timely fashion even if that CPU has
+ * been dyntick-idle for an extended period with callbacks under the
+ * influence of RCU_FAST_NO_HZ.
+ *
+ * The caller must hold rnp->lock with interrupts disabled.
+ */
+static unsigned long rcu_cbs_completed(struct rcu_state *rsp,
+  struct rcu_node *rnp)
+{
+   /*
+* If RCU is idle, we just wait for the next grace period.
+* But we can only be sure that RCU is idle if we are looking
+* at the root rcu_node structure -- otherwise, a new grace
+* period might have started, but just not yet gotten around
+* to initializing the current non-root rcu_node structure.
+*/
+   if (rcu_get_root(rsp) == rnp && rnp->gpnum == rnp->completed)
+   return rnp->completed + 1;
+
+   /*
+* Otherwise, wait for a possible partial grace period and
+* then the subsequent full grace period.
+*/
+   return rnp->completed + 2;
+}
+
+/*
+ * If there is room, assign a ->completed number to any callbacks on
+ * this CPU that have not already been assigned.  Also accelerate any
+ * callbacks that were previously assigned a ->completed number that has
+ * since proven to be too conservative, which can happen if callbacks get
+ * assigned a ->completed number while RCU is idle, but with reference to
+ * a non-root rcu_node structure.  This function is idempotent, so it does
+ * not hurt to call it 

[PATCH tip/core/rcu 1/2] rcu: Tag callback lists with corresponding grace-period number

2013-01-26 Thread Paul E. McKenney
From: Paul E. McKenney paul...@linux.vnet.ibm.com

Currently, callbacks are advanced each time the corresponding CPU
notices a change in its leaf rcu_node structure's -completed value
(this value counts grace-period completions).  This approach has worked
quite well, but with the advent of RCU_FAST_NO_HZ, we cannot count on
a given CPU seeing all the grace-period completions.  When a CPU misses
a grace-period completion that occurs while it is in dyntick-idle mode,
this will delay invocation of its callbacks.

In addition, acceleration of callbacks (when RCU realizes that a given
callback need only wait until the end of the next grace period, rather
than having to wait for a partial grace period followed by a full
grace period) must be carried out extremely carefully.  Insufficient
acceleration will result in unnecessarily long grace-period latencies,
while excessive acceleration will result in premature callback invocation.
Changes that involve this tradeoff are therefore among the most
nerve-wracking changes to RCU.

This commit therefore explicitly tags groups of callbacks with the
number of the grace period that they are waiting for.  This means that
callback-advancement and callback-acceleration functions are idempotent,
so that excessive acceleration will merely waste a few CPU cycles.  This
also allows a CPU to take full advantage of any grace periods that have
elapsed while it has been in dyntick-idle mode.  It should also enable
simulataneous simplifications to and optimizations of RCU_FAST_NO_HZ.

Signed-off-by: Paul E. McKenney paul.mcken...@linaro.org
Signed-off-by: Paul E. McKenney paul...@linux.vnet.ibm.com
---
 kernel/rcutree.c |  195 ++
 kernel/rcutree.h |2 +
 2 files changed, 169 insertions(+), 28 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index e441b77..ac6a75d 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -305,17 +305,27 @@ cpu_has_callbacks_ready_to_invoke(struct rcu_data *rdp)
 }
 
 /*
- * Does the current CPU require a yet-as-unscheduled grace period?
+ * Does the current CPU require a not-yet-started grace period?
+ * The caller must have disabled interrupts to prevent races with
+ * normal callback registry.
  */
 static int
 cpu_needs_another_gp(struct rcu_state *rsp, struct rcu_data *rdp)
 {
-   struct rcu_head **ntp;
+   int i;
 
-   ntp = rdp-nxttail[RCU_DONE_TAIL +
-  (ACCESS_ONCE(rsp-completed) != rdp-completed)];
-   return rdp-nxttail[RCU_DONE_TAIL]  ntp  *ntp 
-  !rcu_gp_in_progress(rsp);
+   if (rcu_gp_in_progress(rsp))
+   return 0;  /* No, a grace period is already in progress. */
+   if (!rdp-nxttail[RCU_NEXT_TAIL])
+   return 0;  /* No, this is a no-CBs (or offline) CPU. */
+   if (*rdp-nxttail[RCU_NEXT_READY_TAIL])
+   return 1;  /* Yes, this CPU has newly registered callbacks. */
+   for (i = RCU_WAIT_TAIL; i  RCU_NEXT_TAIL; i++)
+   if (rdp-nxttail[i - 1] != rdp-nxttail[i] 
+   ULONG_CMP_LT(ACCESS_ONCE(rsp-completed),
+rdp-nxtcompleted[i]))
+   return 1;  /* Yes, CBs for future grace period. */
+   return 0; /* No grace period needed. */
 }
 
 /*
@@ -1071,6 +1081,139 @@ static void init_callback_list(struct rcu_data *rdp)
 }
 
 /*
+ * Determine the value that -completed will have at the end of the
+ * next subsequent grace period.  This is used to tag callbacks so that
+ * a CPU can invoke callbacks in a timely fashion even if that CPU has
+ * been dyntick-idle for an extended period with callbacks under the
+ * influence of RCU_FAST_NO_HZ.
+ *
+ * The caller must hold rnp-lock with interrupts disabled.
+ */
+static unsigned long rcu_cbs_completed(struct rcu_state *rsp,
+  struct rcu_node *rnp)
+{
+   /*
+* If RCU is idle, we just wait for the next grace period.
+* But we can only be sure that RCU is idle if we are looking
+* at the root rcu_node structure -- otherwise, a new grace
+* period might have started, but just not yet gotten around
+* to initializing the current non-root rcu_node structure.
+*/
+   if (rcu_get_root(rsp) == rnp  rnp-gpnum == rnp-completed)
+   return rnp-completed + 1;
+
+   /*
+* Otherwise, wait for a possible partial grace period and
+* then the subsequent full grace period.
+*/
+   return rnp-completed + 2;
+}
+
+/*
+ * If there is room, assign a -completed number to any callbacks on
+ * this CPU that have not already been assigned.  Also accelerate any
+ * callbacks that were previously assigned a -completed number that has
+ * since proven to be too conservative, which can happen if callbacks get
+ * assigned a -completed number while RCU is idle, but with reference to
+ * a non-root rcu_node structure.  This function is idempotent,