From: Joel Fernandes <j...@joelfernandes.org>

Please consider this is an RFC for discussion only. Just want to discuss
why the GP_REPLAY state is needed at all.

Here's the intention AFAICS:
When rcu_sync_exit() has happened, the gp_state changes to GP_EXIT while
we wait for a grace period before transitioning to GP_IDLE. In the
meanwhile, if we receive another rcu_sync_exit(), then we want to wait
for another GP to account for that.

Drawing some timing diagrams, it looks like this:

Legend:
rse = rcu_sync_enter
rsx = rcu_sync_exit
i = GP_IDLE
x = GP_EXIT
r = GP_REPLAY
e = GP_ENTER
p = GP_PASSED
rx = GP_REPLAY changes to GP_EXIT

GP num = The GP we are one.

note: A GP passes between the states:
  e and p
  x and i
  x and rx
  rx and i

In a simple case, the timing and states look like:
time
---------------------->
GP num         1111111    2222222
GP state  i    e     p    x     i
CPU0 :         rse        rsx

However we can enter the replay state like this:
time
---------------------->
GP num         1111111    2222222222222222222223333333
GP state  i    e     p    x              r     rx    i
CPU0 :         rse        rsx
CPU1 :                         rse     rsx

Due to the second rse + rsx, we had to wait for another GP.

I believe the rationale is, if another rsx happens, another GP has to
happen.

But this is not always true if you consider the following events:

time
---------------------->
GP num         111111     22222222222222222222222222222222233333333
GP state  i    e     p    x                 r              rx     i
CPU0 :         rse        rsx
CPU1 :                         rse     rsx
CPU2 :                                         rse     rsx

Here, we had 3 grace periods that elapsed, 1 for the rcu_sync_enter(),
and 2 for the rcu_sync_exit(s).

However, we had 3 rcu_sync_exit()s, not 2. In other words, the
rcu_sync_exit() got batched.

So my point here is, rcu_sync_exit() does not really always cause a new
GP to happen and we can end up having the rcu_sync_exit()s as batched
and sharing the same grace period.

Then what is the point of the GP_REPLAY state at all if it does not
always wait for a new GP?  Taking a step back, why did we intend to have
to wait for a new GP if another rcu_sync_exit() comes while one is still
in progress?

Cc: bris...@redhat.com
Cc: pet...@infradead.org
Cc: o...@redhat.com
Cc: paul...@kernel.org
Cc: r...@vger.kernel.org
Signed-off-by: Joel Fernandes (Google) <j...@joelfernandes.org>
---
 kernel/rcu/sync.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
index d4558ab7a07d..4f3aad67992c 100644
--- a/kernel/rcu/sync.c
+++ b/kernel/rcu/sync.c
@@ -10,7 +10,7 @@
 #include <linux/rcu_sync.h>
 #include <linux/sched.h>
 
-enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };
+enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT };
 
 #define        rss_lock        gp_wait.lock
 
@@ -85,13 +85,6 @@ static void rcu_sync_func(struct rcu_head *rhp)
                 */
                WRITE_ONCE(rsp->gp_state, GP_PASSED);
                wake_up_locked(&rsp->gp_wait);
-       } else if (rsp->gp_state == GP_REPLAY) {
-               /*
-                * A new rcu_sync_exit() has happened; requeue the callback to
-                * catch a later GP.
-                */
-               WRITE_ONCE(rsp->gp_state, GP_EXIT);
-               rcu_sync_call(rsp);
        } else {
                /*
                 * We're at least a GP after the last rcu_sync_exit(); eveybody
@@ -167,16 +160,13 @@ void rcu_sync_enter(struct rcu_sync *rsp)
  */
 void rcu_sync_exit(struct rcu_sync *rsp)
 {
-       WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
-       WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
+       WARN_ON_ONCE(READ_ONCE(rsp->gp_state) < GP_PASSED);
 
        spin_lock_irq(&rsp->rss_lock);
        if (!--rsp->gp_count) {
                if (rsp->gp_state == GP_PASSED) {
                        WRITE_ONCE(rsp->gp_state, GP_EXIT);
                        rcu_sync_call(rsp);
-               } else if (rsp->gp_state == GP_EXIT) {
-                       WRITE_ONCE(rsp->gp_state, GP_REPLAY);
                }
        }
        spin_unlock_irq(&rsp->rss_lock);
-- 
2.23.0.581.g78d2f28ef7-goog

Reply via email to