On Fri, Oct 04, 2019 at 10:57:41AM -0400, Joel Fernandes (Google) wrote:
> From: Joel Fernandes <j...@joelfernandes.org>
>
> Please consider this an RFC for discussion only. I just want to discuss
> why the GP_REPLAY state is needed at all.

And I messed up the subject prefix, but this is *really* an RFC and for
discussion purposes only :)

thanks,

 - Joel

> Here's the intention AFAICS:
> When rcu_sync_exit() has happened, the gp_state changes to GP_EXIT while
> we wait for a grace period before transitioning to GP_IDLE. In the
> meanwhile, if we receive another rcu_sync_exit(), then we want to wait
> for another GP to account for that.
>
> Drawing some timing diagrams, it looks like this:
>
> Legend:
> rse = rcu_sync_enter
> rsx = rcu_sync_exit
> i = GP_IDLE
> x = GP_EXIT
> r = GP_REPLAY
> e = GP_ENTER
> p = GP_PASSED
> rx = GP_REPLAY changes to GP_EXIT
>
> GP num = The GP we are on.
>
> note: A GP passes between the states:
> e and p
> x and i
> x and rx
> rx and i
>
> In a simple case, the timing and states look like:
> time
> ---------------------->
> GP num        1111111   2222222
> GP state  i   e     p   x     i
> CPU0 :        rse       rsx
>
> However we can enter the replay state like this:
> time
> ---------------------->
> GP num        1111111   22222222222222222222233333333
> GP state  i   e     p   x         r          rx     i
> CPU0 :        rse       rsx
> CPU1 :                      rse   rsx
>
> Due to the second rse + rsx, we had to wait for another GP.
>
> I believe the rationale is, if another rsx happens, another GP has to
> happen.
>
> But this is not always true if you consider the following events:
>
> time
> ---------------------->
> GP num        1111111   222222222222222222222222222222333333333
> GP state  i   e     p   x         r                   rx      i
> CPU0 :        rse       rsx
> CPU1 :                      rse   rsx
> CPU2 :                                  rse   rsx
>
> Here, we had 3 grace periods that elapsed, 1 for the rcu_sync_enter(),
> and 2 for the rcu_sync_exit()s.
>
> However, we had 3 rcu_sync_exit()s, not 2. In other words, the
> rcu_sync_exit()s got batched.
>
> So my point here is, rcu_sync_exit() does not really always cause a new
> GP to happen and we can end up having the rcu_sync_exit()s batched
> and sharing the same grace period.
>
> Then what is the point of the GP_REPLAY state at all if it does not
> always wait for a new GP?
> Taking a step back, why did we intend to have
> to wait for a new GP if another rcu_sync_exit() comes while one is still
> in progress?
>
> Cc: bris...@redhat.com
> Cc: pet...@infradead.org
> Cc: o...@redhat.com
> Cc: paul...@kernel.org
> Cc: r...@vger.kernel.org
> Signed-off-by: Joel Fernandes (Google) <j...@joelfernandes.org>
> ---
>  kernel/rcu/sync.c | 14 ++------------
>  1 file changed, 2 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> index d4558ab7a07d..4f3aad67992c 100644
> --- a/kernel/rcu/sync.c
> +++ b/kernel/rcu/sync.c
> @@ -10,7 +10,7 @@
>  #include <linux/rcu_sync.h>
>  #include <linux/sched.h>
>  
> -enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };
> +enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT };
>  
>  #define rss_lock gp_wait.lock
>  
> @@ -85,13 +85,6 @@ static void rcu_sync_func(struct rcu_head *rhp)
>  		 */
>  		WRITE_ONCE(rsp->gp_state, GP_PASSED);
>  		wake_up_locked(&rsp->gp_wait);
> -	} else if (rsp->gp_state == GP_REPLAY) {
> -		/*
> -		 * A new rcu_sync_exit() has happened; requeue the callback to
> -		 * catch a later GP.
> -		 */
> -		WRITE_ONCE(rsp->gp_state, GP_EXIT);
> -		rcu_sync_call(rsp);
>  	} else {
>  		/*
>  		 * We're at least a GP after the last rcu_sync_exit(); eveybody
> @@ -167,16 +160,13 @@ void rcu_sync_enter(struct rcu_sync *rsp)
>   */
>  void rcu_sync_exit(struct rcu_sync *rsp)
>  {
> -	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
> -	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
> +	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) < GP_PASSED);
>  
>  	spin_lock_irq(&rsp->rss_lock);
>  	if (!--rsp->gp_count) {
>  		if (rsp->gp_state == GP_PASSED) {
>  			WRITE_ONCE(rsp->gp_state, GP_EXIT);
>  			rcu_sync_call(rsp);
> -		} else if (rsp->gp_state == GP_EXIT) {
> -			WRITE_ONCE(rsp->gp_state, GP_REPLAY);
>  		}
>  	}
>  	spin_unlock_irq(&rsp->rss_lock);
> -- 
> 2.23.0.581.g78d2f28ef7-goog
>
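For anyone who wants to replay the GP accounting in the diagrams above, here
is a minimal sketch: a hypothetical, single-threaded user-space model of the
rcu_sync state machine, not kernel code. The names model_gp_end(),
model_enter(), model_exit(), count_gps() and the replay flag are made up for
this sketch. It assumes a grace period ends only when model_gp_end() runs
with a callback queued, replaces the sleep in rcu_sync_enter() with ending
the pending GP immediately, and follows the rcu_sync_func() and
rcu_sync_exit() transitions shown in the diff; replay = false approximates
the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model of the rcu_sync state machine; not
 * kernel code. A "grace period" ends whenever model_gp_end() runs while a
 * callback is queued, and blocking waits are replaced by ending the GP.
 */
enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };

struct model {
	int gp_state;
	int gp_count;
	bool cb_queued;	/* stands in for the callback queued by rcu_sync_call() */
	bool replay;	/* false approximates the patch: no GP_REPLAY */
	int gps;	/* grace periods consumed so far */
};

/* Follows rcu_sync_func(): invoked when the queued grace period ends. */
void model_gp_end(struct model *m)
{
	if (!m->cb_queued)
		return;
	m->cb_queued = false;
	m->gps++;
	if (m->gp_count) {
		m->gp_state = GP_PASSED;
	} else if (m->gp_state == GP_REPLAY) {
		m->gp_state = GP_EXIT;	/* requeue to catch a later GP */
		m->cb_queued = true;
	} else {
		m->gp_state = GP_IDLE;
	}
}

/* Follows rcu_sync_enter(): only an enter from GP_IDLE pays for a GP. */
void model_enter(struct model *m)
{
	if (m->gp_state == GP_IDLE) {
		m->gp_state = GP_ENTER;
		m->cb_queued = true;
	}
	m->gp_count++;
	/* The kernel sleeps until gp_state >= GP_PASSED; here we end the GP. */
	while (m->gp_state < GP_PASSED)
		model_gp_end(m);
}

/* Follows rcu_sync_exit(), with or without the GP_REPLAY branch. */
void model_exit(struct model *m)
{
	assert(m->gp_count > 0);	/* like WARN_ON_ONCE(gp_count == 0) */
	if (--m->gp_count)
		return;
	if (m->gp_state == GP_PASSED) {
		m->gp_state = GP_EXIT;
		m->cb_queued = true;
	} else if (m->replay && m->gp_state == GP_EXIT) {
		m->gp_state = GP_REPLAY;
	}
}

/*
 * CPU0 does rse + rsx, then (pairs - 1) more CPUs each do rse + rsx while
 * CPU0's exit grace period is still pending, as in the diagrams. Returns
 * the number of grace periods consumed before the model is idle again.
 */
int count_gps(bool replay, int pairs)
{
	struct model m = { .gp_state = GP_IDLE, .replay = replay };

	for (int i = 0; i < pairs; i++) {
		model_enter(&m);
		model_exit(&m);
	}
	while (m.cb_queued)	/* let every queued grace period finish */
		model_gp_end(&m);
	return m.gps;
}
```

count_gps(true, 1), count_gps(true, 2) and count_gps(true, 3) return 2, 3
and 3, matching the three diagrams: in the last one, the exits from CPU1 and
CPU2 are both covered by the single replayed GP 3. With replay set to false,
count_gps(false, 2) and count_gps(false, 3) both return 2, because an
rcu_sync_exit() that lands in GP_EXIT no longer extends the pending grace
period. The model only counts grace periods and does not model readers, so
by itself it cannot say whether the dropped GP is needed for correctness.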