Re: [PATCH tip/core/rcu 1/3] rcu: Provide polling interfaces for Tree RCU grace periods

2021-03-19 Thread Frederic Weisbecker
On Fri, Mar 19, 2021 at 04:38:48PM -0700, Paul E. McKenney wrote:
> > I didn't even think that far.
> > My scenario was:
> > 
> > 1.  cookie = start_poll_synchronize_rcu()
> >  
> >  
> > 2.  cond_synchronize_rcu() checks the cookie and sees that the
> > grace period has not yet expired. So it calls synchronize_rcu()
> > which queues a callback.
> > 
> > 3.  The grace period for the cookie eventually completes.
> > 
> > 4.  The callback queued in 2. gets assigned a new grace period number.
> > That new grace period starts.
> > 
> > 5.  The new grace period completes and synchronize_rcu() returns.
> > 
> > 
> > But I think this is due to some deep misunderstanding from my end.
> 
> You mean like this?
> 
>   oldstate = start_poll_synchronize_rcu();
>   // Why wait?  Beat the rush!!!
>   cond_synchronize_rcu(oldstate);
> 
> This would be a bit silly (why not just call synchronize_rcu()?),
> and yes, this would unconditionally get you an extra RCU grace period.
> Then again, any call to cond_synchronize_rcu() before the desired grace
> period has expired will get you an extra grace period, and maybe more.
> 
> So a given use case either needs to not care about the added latency
> or have a high probability of invoking cond_synchronize_rcu() after
> the desired grace period has expired.

Fair point!

Thanks.


Re: [PATCH tip/core/rcu 1/3] rcu: Provide polling interfaces for Tree RCU grace periods

2021-03-19 Thread Paul E. McKenney
On Fri, Mar 19, 2021 at 11:10:40PM +0100, Frederic Weisbecker wrote:
> On Fri, Mar 19, 2021 at 10:51:16AM -0700, Paul E. McKenney wrote:
> > On Fri, Mar 19, 2021 at 02:58:54PM +0100, Frederic Weisbecker wrote:
> > > It's all a matter of personal taste but if I may suggest some namespace
> > > modifications:
> > > 
> > > get_state_synchronize_rcu() -> synchronize_rcu_poll_start_raw()
> > > start_poll_synchronize_rcu() -> synchronize_rcu_poll_start()
> > > poll_state_synchronize_rcu() -> synchronize_rcu_poll()
> > > cond_synchronize_rcu() -> synchronize_rcu_cond()
> > > 
> > > But it's up to you really.
> > 
> > I am concerned about starting anything "synchronize_rcu" if that
> > thing doesn't unconditionally wait for a grace period.  "What do
> > you mean that there was no grace period?  Don't you see that call to
> > synchronize_rcu_poll_start_raw()???"
> 
> I see, that could indeed be confusing.
> 
> > This objection doesn't apply to cond_synchronize_rcu(), but it is
> > already in use, so any name change should be worked with the users.
> > All two of them.  ;-)
> 
> Probably not worth it. We have cond_resched() as a schedule() counterpart
> for a reference after all.

Good point!

> > > >  /**
> > > > + * start_poll_state_synchronize_rcu - Snapshot and start RCU grace 
> > > > period
> > > > + *
> > > > + * Returns a cookie that is used by a later call to 
> > > > cond_synchronize_rcu()
> > > 
> > > It may be worth noting that calling start_poll_synchronize_rcu() and then
> > > pass the cookie to cond_synchronize_rcu() soon after may end up waiting 
> > > for
> > > one more grace period.
> > 
> > You mean this sequence of events?
> > 
> > 1.  cookie = start_poll_synchronize_rcu()
> > 
> > 2.  The grace period corresponding to cookie is almost over...
> > 
> > 3.  cond_synchronize_rcu() checks the cookie and sees that the
> > grace period has not yet expired.
> > 
> > 4.  The grace period corresponding to cookie completes.
> > 
> > 5.  Someone else starts a grace period.
> > 
> > 6.  cond_synchronize_rcu() invokes synchronize_rcu(), which waits
> > for the just-started grace period plus another grace period.
> > Thus, there has been no fewer than three full grace periods
> > between the call to start_poll_synchronize_rcu() and the
> > return from cond_synchronize_rcu().
> > 
> > Yes, this can happen!  And it can be worse, for example, it is quite
> > possible that cond_synchronize_rcu() would be preempted for multiple
> > grace periods at step 5, in which case it would still wait for almost
> > two additional grace periods.
> > 
> > Or are you thinking of something else?
> 
> I didn't even think that far.
> My scenario was:
> 
> 1.cookie = start_poll_synchronize_rcu()
>  
>  
> 2.cond_synchronize_rcu() checks the cookie and sees that the
>   grace period has not yet expired. So it calls synchronize_rcu()
>   which queues a callback.
> 
> 3.The grace period for the cookie eventually completes.
> 
> 4.The callback queued in 2. gets assigned a new grace period number.
>   That new grace period starts.
> 
> 5.The new grace period completes and synchronize_rcu() returns.
> 
> 
> But I think this is due to some deep misunderstanding from my end.

You mean like this?

oldstate = start_poll_synchronize_rcu();
// Why wait?  Beat the rush!!!
cond_synchronize_rcu(oldstate);

This would be a bit silly (why not just call synchronize_rcu()?),
and yes, this would unconditionally get you an extra RCU grace period.
Then again, any call to cond_synchronize_rcu() before the desired grace
period has expired will get you an extra grace period, and maybe more.

So a given use case either needs to not care about the added latency
or have a high probability of invoking cond_synchronize_rcu() after
the desired grace period has expired.

> > > > + * If a full RCU grace period has elapsed since the earlier call from
> > > > + * which oldstate was obtained, return @true, otherwise return @false.
> > > > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> > > 
> > > Rephrase suggestion for the last sentence:
> > > 
> > > "In case of failure, it's up to the caller to try polling again later or
> > > invoke synchronize_rcu() to wait for a new full grace period to complete."
> > 
> > How about like this?
> > 
> > /**
> >  * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> >  *
> >  * @oldstate: return from call to get_state_synchronize_rcu() or 
> > start_poll_synchronize_rcu()
> >  *
> >  * If a full RCU grace period has elapsed since the earlier call from
> >  * which oldstate was obtained, return @true, otherwise return @false.
> >  * If @false is returned, it is the caller's responsibilty to invoke this
> >  * function later on until it does return @true.  Alternatively, the caller
> >  * can explicitly wait for a grace period, for example, by passing @oldstate
> >  * to cond_synchronize_rcu() or by directly 

Re: [PATCH tip/core/rcu 1/3] rcu: Provide polling interfaces for Tree RCU grace periods

2021-03-19 Thread Frederic Weisbecker
On Fri, Mar 19, 2021 at 10:51:16AM -0700, Paul E. McKenney wrote:
> On Fri, Mar 19, 2021 at 02:58:54PM +0100, Frederic Weisbecker wrote:
> > It's all a matter of personal taste but if I may suggest some namespace
> > modifications:
> > 
> > get_state_synchronize_rcu() -> synchronize_rcu_poll_start_raw()
> > start_poll_synchronize_rcu() -> synchronize_rcu_poll_start()
> > poll_state_synchronize_rcu() -> synchronize_rcu_poll()
> > cond_synchronize_rcu() -> synchronize_rcu_cond()
> > 
> > But it's up to you really.
> 
> I am concerned about starting anything "synchronize_rcu" if that
> thing doesn't unconditionally wait for a grace period.  "What do
> you mean that there was no grace period?  Don't you see that call to
> synchronize_rcu_poll_start_raw()???"

I see, that could indeed be confusing.

> 
> This objection doesn't apply to cond_synchronize_rcu(), but it is
> already in use, so any name change should be worked with the users.
> All two of them.  ;-)

Probably not worth it. We have cond_resched() as a schedule() counterpart
for a reference after all.

> > >  /**
> > > + * start_poll_state_synchronize_rcu - Snapshot and start RCU grace period
> > > + *
> > > + * Returns a cookie that is used by a later call to 
> > > cond_synchronize_rcu()
> > 
> > It may be worth noting that calling start_poll_synchronize_rcu() and then
> > pass the cookie to cond_synchronize_rcu() soon after may end up waiting for
> > one more grace period.
> 
> You mean this sequence of events?
> 
> 1.cookie = start_poll_synchronize_rcu()
> 
> 2.The grace period corresponding to cookie is almost over...
> 
> 3.cond_synchronize_rcu() checks the cookie and sees that the
>   grace period has not yet expired.
> 
> 4.The grace period corresponding to cookie completes.
> 
> 5.Someone else starts a grace period.
> 
> 6.cond_synchronize_rcu() invokes synchronize_rcu(), which waits
>   for the just-started grace period plus another grace period.
>   Thus, there has been no fewer than three full grace periods
>   between the call to start_poll_synchronize_rcu() and the
>   return from cond_synchronize_rcu().
> 
> Yes, this can happen!  And it can be worse, for example, it is quite
> possible that cond_synchronize_rcu() would be preempted for multiple
> grace periods at step 5, in which case it would still wait for almost
> two additional grace periods.
> 
> Or are you thinking of something else?

I didn't even think that far.
My scenario was:

1.  cookie = start_poll_synchronize_rcu()
 
 
2.  cond_synchronize_rcu() checks the cookie and sees that the
grace period has not yet expired. So it calls synchronize_rcu()
which queues a callback.

3.  The grace period for the cookie eventually completes.

4.  The callback queued in 2. gets assigned a new grace period number.
That new grace period starts.

5.  The new grace period completes and synchronize_rcu() returns.


But I think this is due to some deep misunderstanding from my end.


> > > + * If a full RCU grace period has elapsed since the earlier call from
> > > + * which oldstate was obtained, return @true, otherwise return @false.
> > > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> > 
> > Rephrase suggestion for the last sentence:
> > 
> > "In case of failure, it's up to the caller to try polling again later or
> > invoke synchronize_rcu() to wait for a new full grace period to complete."
> 
> How about like this?
> 
> /**
>  * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
>  *
>  * @oldstate: return from call to get_state_synchronize_rcu() or 
> start_poll_synchronize_rcu()
>  *
>  * If a full RCU grace period has elapsed since the earlier call from
>  * which oldstate was obtained, return @true, otherwise return @false.
>  * If @false is returned, it is the caller's responsibilty to invoke this
>  * function later on until it does return @true.  Alternatively, the caller
>  * can explicitly wait for a grace period, for example, by passing @oldstate
>  * to cond_synchronize_rcu() or by directly invoking synchronize_rcu().

Yes very nice!

Thanks!


Re: [PATCH tip/core/rcu 1/3] rcu: Provide polling interfaces for Tree RCU grace periods

2021-03-19 Thread Paul E. McKenney
On Fri, Mar 19, 2021 at 02:58:54PM +0100, Frederic Weisbecker wrote:
> On Wed, Mar 03, 2021 at 04:26:30PM -0800, paul...@kernel.org wrote:
> > From: "Paul E. McKenney" 
> > 
> > There is a need for a non-blocking polling interface for RCU grace
> > periods, so this commit supplies start_poll_synchronize_rcu() and
> > poll_state_synchronize_rcu() for this purpose.  Note that the existing
> > get_state_synchronize_rcu() may be used if future grace periods are
> > inevitable (perhaps due to a later call_rcu() invocation).  The new
> > start_poll_synchronize_rcu() is to be used if future grace periods
> > might not otherwise happen.  Finally, poll_state_synchronize_rcu()
> > provides a lockless check for a grace period having elapsed since
> > the corresponding call to either of the get_state_synchronize_rcu()
> > or start_poll_synchronize_rcu().
> > 
> > As with get_state_synchronize_rcu(), the return value from either
> > get_state_synchronize_rcu() or start_poll_synchronize_rcu() is passed in
> > to a later call to either poll_state_synchronize_rcu() or the existing
> > (might_sleep) cond_synchronize_rcu().
> 
> It's all a matter of personal taste but if I may suggest some namespace
> modifications:
> 
> get_state_synchronize_rcu() -> synchronize_rcu_poll_start_raw()
> start_poll_synchronize_rcu() -> synchronize_rcu_poll_start()
> poll_state_synchronize_rcu() -> synchronize_rcu_poll()
> cond_synchronize_rcu() -> synchronize_rcu_cond()
> 
> But it's up to you really.

I am concerned about starting anything "synchronize_rcu" if that
thing doesn't unconditionally wait for a grace period.  "What do
you mean that there was no grace period?  Don't you see that call to
synchronize_rcu_poll_start_raw()???"

This objection doesn't apply to cond_synchronize_rcu(), but it is
already in use, so any name change should be worked with the users.
All two of them.  ;-)

> >  /**
> > + * start_poll_state_synchronize_rcu - Snapshot and start RCU grace period
> > + *
> > + * Returns a cookie that is used by a later call to cond_synchronize_rcu()
> 
> It may be worth noting that calling start_poll_synchronize_rcu() and then
> pass the cookie to cond_synchronize_rcu() soon after may end up waiting for
> one more grace period.

You mean this sequence of events?

1.  cookie = start_poll_synchronize_rcu()

2.  The grace period corresponding to cookie is almost over...

3.  cond_synchronize_rcu() checks the cookie and sees that the
grace period has not yet expired.

4.  The grace period corresponding to cookie completes.

5.  Someone else starts a grace period.

6.  cond_synchronize_rcu() invokes synchronize_rcu(), which waits
for the just-started grace period plus another grace period.
Thus, there has been no fewer than three full grace periods
between the call to start_poll_synchronize_rcu() and the
return from cond_synchronize_rcu().

Yes, this can happen!  And it can be worse, for example, it is quite
possible that cond_synchronize_rcu() would be preempted for multiple
grace periods at step 5, in which case it would still wait for almost
two additional grace periods.

Or are you thinking of something else?

> > + * or poll_state_synchronize_rcu() to determine whether or not a full
> > + * grace period has elapsed in the meantime.  If the needed grace period
> > + * is not already slated to start, notifies RCU core of the need for that
> > + * grace period.
> > + *
> > + * Interrupts must be enabled for the case where it is necessary to awaken
> > + * the grace-period kthread.
> > + */
> > +unsigned long start_poll_synchronize_rcu(void)
> > +{
> > +   unsigned long flags;
> > +   unsigned long gp_seq = get_state_synchronize_rcu();
> > +   bool needwake;
> > +   struct rcu_data *rdp;
> > +   struct rcu_node *rnp;
> [...]
> > +
> > +/**
> > + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> > + *
> > + * @oldstate: return from call to get_state_synchronize_rcu() or 
> > start_poll_synchronize_rcu()
> > + *
> > + * If a full RCU grace period has elapsed since the earlier call from
> > + * which oldstate was obtained, return @true, otherwise return @false.
> > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> 
> Rephrase suggestion for the last sentence:
> 
> "In case of failure, it's up to the caller to try polling again later or
> invoke synchronize_rcu() to wait for a new full grace period to complete."

How about like this?

/**
 * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
 *
 * @oldstate: return from call to get_state_synchronize_rcu() or 
start_poll_synchronize_rcu()
 *
 * If a full RCU grace period has elapsed since the earlier call from
 * which oldstate was obtained, return @true, otherwise return @false.
 * If @false is returned, it is the caller's responsibilty to invoke this
 * function later on until it does return @true.  Alternatively, the caller
 * can explicitly wait for a 

Re: [PATCH tip/core/rcu 1/3] rcu: Provide polling interfaces for Tree RCU grace periods

2021-03-19 Thread Frederic Weisbecker
On Wed, Mar 03, 2021 at 04:26:30PM -0800, paul...@kernel.org wrote:
> From: "Paul E. McKenney" 
> 
> There is a need for a non-blocking polling interface for RCU grace
> periods, so this commit supplies start_poll_synchronize_rcu() and
> poll_state_synchronize_rcu() for this purpose.  Note that the existing
> get_state_synchronize_rcu() may be used if future grace periods are
> inevitable (perhaps due to a later call_rcu() invocation).  The new
> start_poll_synchronize_rcu() is to be used if future grace periods
> might not otherwise happen.  Finally, poll_state_synchronize_rcu()
> provides a lockless check for a grace period having elapsed since
> the corresponding call to either of the get_state_synchronize_rcu()
> or start_poll_synchronize_rcu().
> 
> As with get_state_synchronize_rcu(), the return value from either
> get_state_synchronize_rcu() or start_poll_synchronize_rcu() is passed in
> to a later call to either poll_state_synchronize_rcu() or the existing
> (might_sleep) cond_synchronize_rcu().

It's all a matter of personal taste but if I may suggest some namespace
modifications:

get_state_synchronize_rcu() -> synchronize_rcu_poll_start_raw()
start_poll_synchronize_rcu() -> synchronize_rcu_poll_start()
poll_state_synchronize_rcu() -> synchronize_rcu_poll()
cond_synchronize_rcu() -> synchronize_rcu_cond()

But it's up to you really.

>  /**
> + * start_poll_state_synchronize_rcu - Snapshot and start RCU grace period
> + *
> + * Returns a cookie that is used by a later call to cond_synchronize_rcu()

It may be worth noting that calling start_poll_synchronize_rcu() and then
pass the cookie to cond_synchronize_rcu() soon after may end up waiting for
one more grace period.

> + * or poll_state_synchronize_rcu() to determine whether or not a full
> + * grace period has elapsed in the meantime.  If the needed grace period
> + * is not already slated to start, notifies RCU core of the need for that
> + * grace period.
> + *
> + * Interrupts must be enabled for the case where it is necessary to awaken
> + * the grace-period kthread.
> + */
> +unsigned long start_poll_synchronize_rcu(void)
> +{
> + unsigned long flags;
> + unsigned long gp_seq = get_state_synchronize_rcu();
> + bool needwake;
> + struct rcu_data *rdp;
> + struct rcu_node *rnp;
[...]
> +
> +/**
> + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> + *
> + * @oldstate: return from call to get_state_synchronize_rcu() or 
> start_poll_synchronize_rcu()
> + *
> + * If a full RCU grace period has elapsed since the earlier call from
> + * which oldstate was obtained, return @true, otherwise return @false.
> + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.

Rephrase suggestion for the last sentence:

"In case of failure, it's up to the caller to try polling again later or
invoke synchronize_rcu() to wait for a new full grace period to complete."


In any case: Reviewed-by: Frederic Weisbecker 

Thanks!


Re: [PATCH tip/core/rcu 1/3] rcu: Provide polling interfaces for Tree RCU grace periods

2021-03-18 Thread Paul E. McKenney
On Thu, Mar 18, 2021 at 03:59:52PM +0100, Frederic Weisbecker wrote:
> On Tue, Mar 16, 2021 at 09:51:01AM -0700, Paul E. McKenney wrote:
> > On Tue, Mar 16, 2021 at 04:17:50PM +0100, Frederic Weisbecker wrote:
> > > On Wed, Mar 03, 2021 at 04:26:30PM -0800, paul...@kernel.org wrote:
> > > > +/**
> > > > + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace 
> > > > period
> > > > + *
> > > > + * @oldstate: return from call to get_state_synchronize_rcu() or 
> > > > start_poll_synchronize_rcu()
> > > > + *
> > > > + * If a full RCU grace period has elapsed since the earlier call from
> > > > + * which oldstate was obtained, return @true, otherwise return @false.
> > > > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> > > > + *
> > > > + * Yes, this function does not take counter wrap into account.
> > > > + * But counter wrap is harmless.  If the counter wraps, we have waited 
> > > > for
> > > > + * more than 2 billion grace periods (and way more on a 64-bit 
> > > > system!).
> > > > + * Those needing to keep oldstate values for very long time periods
> > > > + * (many hours even on 32-bit systems) should check them occasionally
> > > > + * and either refresh them or set a flag indicating that the grace 
> > > > period
> > > > + * has completed.
> > > > + */
> > > > +bool poll_state_synchronize_rcu(unsigned long oldstate)
> > > > +{
> > > > +   if (rcu_seq_done(_state.gp_seq, oldstate)) {
> > > > +   smp_mb(); /* Ensure GP ends before subsequent accesses. 
> > > > */
> > > 
> > > Also as usual I'm a bit lost with the reason behind those memory barriers.
> > > So this is ordering the read on rcu_state.gp_seq against something (why 
> > > not an
> > > smp_rmb() btw?). And what does it pair with?
> > 
> > Because it needs to order subsequent writes as well as reads.
> > 
> > It is ordering whatever the RCU user wishes to put after the call to
> > poll_state_synchronize_rcu() with whatever the RCU user put before
> > whatever started the grace period that just now completed.  Please
> > see the synchronize_rcu() comment header for the statement of the
> > guarantee.  Or that of call_rcu().
> 
> I see. OTOH the update side's CPU had to report a quiescent state for the
> requested grace period to complete. As the quiescent state propagated along
> with full ordering up to the root rnp, everything that happened before
> rcu_seq_done() should appear before and everything that happened after
> rcu_seq_done() should appear after.
> 
> Now in the case the update side's CPU is not the last CPU that reported
> a quiescent state (and thus not the one that propagated every subsequent
> CPUs QS to the final "rcu_state.gp_seq"), the full barrier after 
> rcu_seq_done()
> is necessary to order against all the CPUs that reported a QS after the
> update side's CPU.
> 
> Is that right?

That is the way I see it.  ;-)

> > For more detail on how these guarantees are implemented, please see
> > Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> > and its many diagrams.
> 
> Indeed, very useful documentation!

Glad you like it!

> > There are a lot of memory barriers that pair and form larger cycles to
> > implement this guarantee.  Pretty much all of the calls to the infamous
> > smp_mb__after_unlock_lock() macro form cycles involving this barrier,
> > for example.
> > 
> > Please do not hesitate to ask more questions.  This underpins RCU.
> 
> Careful what you wish! ;-)

;-) ;-) ;-)

Thanx, Paul


Re: [PATCH tip/core/rcu 1/3] rcu: Provide polling interfaces for Tree RCU grace periods

2021-03-18 Thread Frederic Weisbecker
On Tue, Mar 16, 2021 at 09:51:01AM -0700, Paul E. McKenney wrote:
> On Tue, Mar 16, 2021 at 04:17:50PM +0100, Frederic Weisbecker wrote:
> > On Wed, Mar 03, 2021 at 04:26:30PM -0800, paul...@kernel.org wrote:
> > > +/**
> > > + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace 
> > > period
> > > + *
> > > + * @oldstate: return from call to get_state_synchronize_rcu() or 
> > > start_poll_synchronize_rcu()
> > > + *
> > > + * If a full RCU grace period has elapsed since the earlier call from
> > > + * which oldstate was obtained, return @true, otherwise return @false.
> > > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> > > + *
> > > + * Yes, this function does not take counter wrap into account.
> > > + * But counter wrap is harmless.  If the counter wraps, we have waited 
> > > for
> > > + * more than 2 billion grace periods (and way more on a 64-bit system!).
> > > + * Those needing to keep oldstate values for very long time periods
> > > + * (many hours even on 32-bit systems) should check them occasionally
> > > + * and either refresh them or set a flag indicating that the grace period
> > > + * has completed.
> > > + */
> > > +bool poll_state_synchronize_rcu(unsigned long oldstate)
> > > +{
> > > + if (rcu_seq_done(_state.gp_seq, oldstate)) {
> > > + smp_mb(); /* Ensure GP ends before subsequent accesses. */
> > 
> > Also as usual I'm a bit lost with the reason behind those memory barriers.
> > So this is ordering the read on rcu_state.gp_seq against something (why not 
> > an
> > smp_rmb() btw?). And what does it pair with?
> 
> Because it needs to order subsequent writes as well as reads.
> 
> It is ordering whatever the RCU user wishes to put after the call to
> poll_state_synchronize_rcu() with whatever the RCU user put before
> whatever started the grace period that just now completed.  Please
> see the synchronize_rcu() comment header for the statement of the
> guarantee.  Or that of call_rcu().

I see. OTOH the update side's CPU had to report a quiescent state for the
requested grace period to complete. As the quiescent state propagated along
with full ordering up to the root rnp, everything that happened before
rcu_seq_done() should appear before and everything that happened after
rcu_seq_done() should appear after.

Now in the case the update side's CPU is not the last CPU that reported
a quiescent state (and thus not the one that propagated every subsequent
CPUs QS to the final "rcu_state.gp_seq"), the full barrier after rcu_seq_done()
is necessary to order against all the CPUs that reported a QS after the
update side's CPU.

Is that right?


> 
> For more detail on how these guarantees are implemented, please see
> Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> and its many diagrams.

Indeed, very useful documentation!

> 
> There are a lot of memory barriers that pair and form larger cycles to
> implement this guarantee.  Pretty much all of the calls to the infamous
> smp_mb__after_unlock_lock() macro form cycles involving this barrier,
> for example.
> 
> Please do not hesitate to ask more questions.  This underpins RCU.

Careful what you wish! ;-)

Thanks.


Re: [PATCH tip/core/rcu 1/3] rcu: Provide polling interfaces for Tree RCU grace periods

2021-03-16 Thread Paul E. McKenney
On Tue, Mar 16, 2021 at 04:17:50PM +0100, Frederic Weisbecker wrote:
> On Wed, Mar 03, 2021 at 04:26:30PM -0800, paul...@kernel.org wrote:
> > +/**
> > + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> > + *
> > + * @oldstate: return from call to get_state_synchronize_rcu() or 
> > start_poll_synchronize_rcu()
> > + *
> > + * If a full RCU grace period has elapsed since the earlier call from
> > + * which oldstate was obtained, return @true, otherwise return @false.
> > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> > + *
> > + * Yes, this function does not take counter wrap into account.
> > + * But counter wrap is harmless.  If the counter wraps, we have waited for
> > + * more than 2 billion grace periods (and way more on a 64-bit system!).
> > + * Those needing to keep oldstate values for very long time periods
> > + * (many hours even on 32-bit systems) should check them occasionally
> > + * and either refresh them or set a flag indicating that the grace period
> > + * has completed.
> > + */
> > +bool poll_state_synchronize_rcu(unsigned long oldstate)
> > +{
> > +   if (rcu_seq_done(_state.gp_seq, oldstate)) {
> > +   smp_mb(); /* Ensure GP ends before subsequent accesses. */
> 
> Also as usual I'm a bit lost with the reason behind those memory barriers.
> So this is ordering the read on rcu_state.gp_seq against something (why not an
> smp_rmb() btw?). And what does it pair with?

Because it needs to order subsequent writes as well as reads.

It is ordering whatever the RCU user wishes to put after the call to
poll_state_synchronize_rcu() with whatever the RCU user put before
whatever started the grace period that just now completed.  Please
see the synchronize_rcu() comment header for the statement of the
guarantee.  Or that of call_rcu().

For more detail on how these guarantees are implemented, please see
Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
and its many diagrams.

There are a lot of memory barriers that pair and form larger cycles to
implement this guarantee.  Pretty much all of the calls to the infamous
smp_mb__after_unlock_lock() macro form cycles involving this barrier,
for example.

Please do not hesitate to ask more questions.  This underpins RCU.

Thanx, Paul


Re: [PATCH tip/core/rcu 1/3] rcu: Provide polling interfaces for Tree RCU grace periods

2021-03-16 Thread Paul E. McKenney
On Tue, Mar 16, 2021 at 03:47:22PM +0100, Frederic Weisbecker wrote:
> On Wed, Mar 03, 2021 at 04:26:30PM -0800, paul...@kernel.org wrote:
> > +/**
> > + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> > + *
> > + * @oldstate: return from call to get_state_synchronize_rcu() or 
> > start_poll_synchronize_rcu()
> > + *
> > + * If a full RCU grace period has elapsed since the earlier call from
> > + * which oldstate was obtained, return @true, otherwise return @false.
> > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> > + *
> > + * Yes, this function does not take counter wrap into account.
> > + * But counter wrap is harmless.  If the counter wraps, we have waited for
> > + * more than 2 billion grace periods (and way more on a 64-bit system!).
> > + * Those needing to keep oldstate values for very long time periods
> > + * (many hours even on 32-bit systems) should check them occasionally
> > + * and either refresh them or set a flag indicating that the grace period
> > + * has completed.
> > + */
> > +bool poll_state_synchronize_rcu(unsigned long oldstate)
> > +{
> > +   if (rcu_seq_done(_state.gp_seq, oldstate)) {
> > +   smp_mb(); /* Ensure GP ends before subsequent accesses. */
> > +   return true;
> > +   }
> > +   return false;
> > +}
> > +EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
> > +
> > +/**
> >   * cond_synchronize_rcu - Conditionally wait for an RCU grace period
> >   *
> >   * @oldstate: return value from earlier call to get_state_synchronize_rcu()
> >   *
> >   * If a full RCU grace period has elapsed since the earlier call to
> > - * get_state_synchronize_rcu(), just return.  Otherwise, invoke
> > - * synchronize_rcu() to wait for a full grace period.
> > + * get_state_synchronize_rcu() or start_poll_synchronize_rcu(), just 
> > return.
> > + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> >   *
> >   * Yes, this function does not take counter wrap into account.  But
> >   * counter wrap is harmless.  If the counter wraps, we have waited for
> > @@ -3804,7 +3864,7 @@ EXPORT_SYMBOL_GPL(get_state_synchronize_rcu);
> >   */
> >  void cond_synchronize_rcu(unsigned long oldstate)
> >  {
> > -   if (!rcu_seq_done(_state.gp_seq, oldstate))
> > +   if (!poll_state_synchronize_rcu(oldstate))
> > synchronize_rcu();
> > else
> > smp_mb(); /* Ensure GP ends before subsequent accesses. */
> 
> poll_state_synchronize_rcu() already does the full barrier.

Right you are!  Good catch, thank you!!!

I will fold removing this smp_mb() in with attribution.

Thanx, Paul


Re: [PATCH tip/core/rcu 1/3] rcu: Provide polling interfaces for Tree RCU grace periods

2021-03-16 Thread Frederic Weisbecker
On Wed, Mar 03, 2021 at 04:26:30PM -0800, paul...@kernel.org wrote:
> +/**
> + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> + *
> + * @oldstate: return from call to get_state_synchronize_rcu() or 
> start_poll_synchronize_rcu()
> + *
> + * If a full RCU grace period has elapsed since the earlier call from
> + * which oldstate was obtained, return @true, otherwise return @false.
> + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> + *
> + * Yes, this function does not take counter wrap into account.
> + * But counter wrap is harmless.  If the counter wraps, we have waited for
> + * more than 2 billion grace periods (and way more on a 64-bit system!).
> + * Those needing to keep oldstate values for very long time periods
> + * (many hours even on 32-bit systems) should check them occasionally
> + * and either refresh them or set a flag indicating that the grace period
> + * has completed.
> + */
> +bool poll_state_synchronize_rcu(unsigned long oldstate)
> +{
> + if (rcu_seq_done(_state.gp_seq, oldstate)) {
> + smp_mb(); /* Ensure GP ends before subsequent accesses. */

Also as usual I'm a bit lost with the reason behind those memory barriers.
So this is ordering the read on rcu_state.gp_seq against something (why not an
smp_rmb() btw?). And what does it pair with?

Thanks.


Re: [PATCH tip/core/rcu 1/3] rcu: Provide polling interfaces for Tree RCU grace periods

2021-03-16 Thread Frederic Weisbecker
On Wed, Mar 03, 2021 at 04:26:30PM -0800, paul...@kernel.org wrote:
> +/**
> + * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
> + *
> + * @oldstate: return from call to get_state_synchronize_rcu() or 
> start_poll_synchronize_rcu()
> + *
> + * If a full RCU grace period has elapsed since the earlier call from
> + * which oldstate was obtained, return @true, otherwise return @false.
> + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
> + *
> + * Yes, this function does not take counter wrap into account.
> + * But counter wrap is harmless.  If the counter wraps, we have waited for
> + * more than 2 billion grace periods (and way more on a 64-bit system!).
> + * Those needing to keep oldstate values for very long time periods
> + * (many hours even on 32-bit systems) should check them occasionally
> + * and either refresh them or set a flag indicating that the grace period
> + * has completed.
> + */
> +bool poll_state_synchronize_rcu(unsigned long oldstate)
> +{
> + if (rcu_seq_done(_state.gp_seq, oldstate)) {
> + smp_mb(); /* Ensure GP ends before subsequent accesses. */
> + return true;
> + }
> + return false;
> +}
> +EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
> +
> +/**
>   * cond_synchronize_rcu - Conditionally wait for an RCU grace period
>   *
>   * @oldstate: return value from earlier call to get_state_synchronize_rcu()
>   *
>   * If a full RCU grace period has elapsed since the earlier call to
> - * get_state_synchronize_rcu(), just return.  Otherwise, invoke
> - * synchronize_rcu() to wait for a full grace period.
> + * get_state_synchronize_rcu() or start_poll_synchronize_rcu(), just return.
> + * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
>   *
>   * Yes, this function does not take counter wrap into account.  But
>   * counter wrap is harmless.  If the counter wraps, we have waited for
> @@ -3804,7 +3864,7 @@ EXPORT_SYMBOL_GPL(get_state_synchronize_rcu);
>   */
>  void cond_synchronize_rcu(unsigned long oldstate)
>  {
> - if (!rcu_seq_done(_state.gp_seq, oldstate))
> + if (!poll_state_synchronize_rcu(oldstate))
>   synchronize_rcu();
>   else
>   smp_mb(); /* Ensure GP ends before subsequent accesses. */

poll_state_synchronize_rcu() already does the full barrier.

> -- 
> 2.9.5
> 


Re: [PATCH tip/core/rcu 1/3] rcu: Provide polling interfaces for Tree RCU grace periods

2021-03-15 Thread Paul E. McKenney
On Fri, Mar 12, 2021 at 01:26:47PM +0100, Frederic Weisbecker wrote:
> On Wed, Mar 03, 2021 at 04:26:30PM -0800, paul...@kernel.org wrote:
> >  /**
> > + * start_poll_state_synchronize_rcu - Snapshot and start RCU grace period
> > + *
> > + * Returns a cookie that is used by a later call to cond_synchronize_rcu()
> > + * or poll_state_synchronize_rcu() to determine whether or not a full
> > + * grace period has elapsed in the meantime.  If the needed grace period
> > + * is not already slated to start, notifies RCU core of the need for that
> > + * grace period.
> > + *
> > + * Interrupts must be enabled for the case where it is necessary to awaken
> > + * the grace-period kthread.
> > + */
> > +unsigned long start_poll_synchronize_rcu(void)
> > +{
> > +   unsigned long flags;
> > +   unsigned long gp_seq = get_state_synchronize_rcu();
> 
> Ah! It's using rcu_seq_snap() and not rcu_seq_current() and therefore it's
> waiting for a safe future grace period, right?

Exactly!  ;-)

Thanx, Paul

> If so, please discard my previous email.
> 
> Thanks.


Re: [PATCH tip/core/rcu 1/3] rcu: Provide polling interfaces for Tree RCU grace periods

2021-03-12 Thread Frederic Weisbecker
On Wed, Mar 03, 2021 at 04:26:30PM -0800, paul...@kernel.org wrote:
>  /**
> + * start_poll_state_synchronize_rcu - Snapshot and start RCU grace period
> + *
> + * Returns a cookie that is used by a later call to cond_synchronize_rcu()
> + * or poll_state_synchronize_rcu() to determine whether or not a full
> + * grace period has elapsed in the meantime.  If the needed grace period
> + * is not already slated to start, notifies RCU core of the need for that
> + * grace period.
> + *
> + * Interrupts must be enabled for the case where it is necessary to awaken
> + * the grace-period kthread.
> + */
> +unsigned long start_poll_synchronize_rcu(void)
> +{
> + unsigned long flags;
> + unsigned long gp_seq = get_state_synchronize_rcu();

Ah! It's using rcu_seq_snap() and not rcu_seq_current() and therefore it's
waiting for a safe future grace period, right?

If so, please discard my previous email.

Thanks.


Re: [PATCH tip/core/rcu 1/3] rcu: Provide polling interfaces for Tree RCU grace periods

2021-03-12 Thread Frederic Weisbecker
On Wed, Mar 03, 2021 at 04:26:30PM -0800, paul...@kernel.org wrote:
> From: "Paul E. McKenney" 
> 
> There is a need for a non-blocking polling interface for RCU grace
> periods, so this commit supplies start_poll_synchronize_rcu() and
> poll_state_synchronize_rcu() for this purpose.  Note that the existing
> get_state_synchronize_rcu() may be used if future grace periods are
> inevitable (perhaps due to a later call_rcu() invocation).  The new
> start_poll_synchronize_rcu() is to be used if future grace periods
> might not otherwise happen.

By future grace period, you mean if a grace period has been started right
_before_ we start polling, right?


> Finally, poll_state_synchronize_rcu()
> provides a lockless check for a grace period having elapsed since
> the corresponding call to either of the get_state_synchronize_rcu()
> or start_poll_synchronize_rcu().
> 
> As with get_state_synchronize_rcu(), the return value from either
> get_state_synchronize_rcu() or start_poll_synchronize_rcu() is passed in
> to a later call to either poll_state_synchronize_rcu() or the existing
> (might_sleep) cond_synchronize_rcu().
> 
> Signed-off-by: Paul E. McKenney 
[...]
>  /**
> + * start_poll_state_synchronize_rcu - Snapshot and start RCU grace period
> + *
> + * Returns a cookie that is used by a later call to cond_synchronize_rcu()
> + * or poll_state_synchronize_rcu() to determine whether or not a full
> + * grace period has elapsed in the meantime.  If the needed grace period
> + * is not already slated to start, notifies RCU core of the need for that
> + * grace period.
> + *
> + * Interrupts must be enabled for the case where it is necessary to awaken
> + * the grace-period kthread.
> + */
> +unsigned long start_poll_synchronize_rcu(void)
> +{
> + unsigned long flags;
> + unsigned long gp_seq = get_state_synchronize_rcu();
> + bool needwake;
> + struct rcu_data *rdp;
> + struct rcu_node *rnp;
> +
> + lockdep_assert_irqs_enabled();
> + local_irq_save(flags);
> + rdp = this_cpu_ptr(_data);
> + rnp = rdp->mynode;
> + raw_spin_lock_rcu_node(rnp); // irqs already disabled.
> + needwake = rcu_start_this_gp(rnp, rdp, gp_seq);

I'm a bit surprised we don't start a new grace period instead of snapshotting
the current one.

So if we do this:

   //start grace period gp_num=5

   old = p;
   rcu_assign_pointer(p, new);

   num = start_poll_synchronize_rcu(); // num = 5

   //grace period ends, start new gp_num=6

   poll_state_synchronize_rcu(num); // rcu seq is done

   kfree(old);

Isn't there a risk that other CPUs still see the old pointer?

Of course I know I'm missing something obvious :-)

Thanks.


[PATCH tip/core/rcu 1/3] rcu: Provide polling interfaces for Tree RCU grace periods

2021-03-03 Thread paulmck
From: "Paul E. McKenney" 

There is a need for a non-blocking polling interface for RCU grace
periods, so this commit supplies start_poll_synchronize_rcu() and
poll_state_synchronize_rcu() for this purpose.  Note that the existing
get_state_synchronize_rcu() may be used if future grace periods are
inevitable (perhaps due to a later call_rcu() invocation).  The new
start_poll_synchronize_rcu() is to be used if future grace periods
might not otherwise happen.  Finally, poll_state_synchronize_rcu()
provides a lockless check for a grace period having elapsed since
the corresponding call to either of the get_state_synchronize_rcu()
or start_poll_synchronize_rcu().

As with get_state_synchronize_rcu(), the return value from either
get_state_synchronize_rcu() or start_poll_synchronize_rcu() is passed in
to a later call to either poll_state_synchronize_rcu() or the existing
(might_sleep) cond_synchronize_rcu().

Signed-off-by: Paul E. McKenney 
---
 include/linux/rcutree.h |  2 ++
 kernel/rcu/tree.c   | 70 +
 2 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index df578b7..b89b541 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -41,6 +41,8 @@ void rcu_momentary_dyntick_idle(void);
 void kfree_rcu_scheduler_running(void);
 bool rcu_gp_might_be_stalled(void);
 unsigned long get_state_synchronize_rcu(void);
+unsigned long start_poll_synchronize_rcu(void);
+bool poll_state_synchronize_rcu(unsigned long oldstate);
 void cond_synchronize_rcu(unsigned long oldstate);
 
 void rcu_idle_enter(void);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index da6f521..8e0a140 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3774,8 +3774,8 @@ EXPORT_SYMBOL_GPL(synchronize_rcu);
  * get_state_synchronize_rcu - Snapshot current RCU state
  *
  * Returns a cookie that is used by a later call to cond_synchronize_rcu()
- * to determine whether or not a full grace period has elapsed in the
- * meantime.
+ * or poll_state_synchronize_rcu() to determine whether or not a full
+ * grace period has elapsed in the meantime.
  */
 unsigned long get_state_synchronize_rcu(void)
 {
@@ -3789,13 +3789,73 @@ unsigned long get_state_synchronize_rcu(void)
 EXPORT_SYMBOL_GPL(get_state_synchronize_rcu);
 
 /**
+ * start_poll_state_synchronize_rcu - Snapshot and start RCU grace period
+ *
+ * Returns a cookie that is used by a later call to cond_synchronize_rcu()
+ * or poll_state_synchronize_rcu() to determine whether or not a full
+ * grace period has elapsed in the meantime.  If the needed grace period
+ * is not already slated to start, notifies RCU core of the need for that
+ * grace period.
+ *
+ * Interrupts must be enabled for the case where it is necessary to awaken
+ * the grace-period kthread.
+ */
+unsigned long start_poll_synchronize_rcu(void)
+{
+   unsigned long flags;
+   unsigned long gp_seq = get_state_synchronize_rcu();
+   bool needwake;
+   struct rcu_data *rdp;
+   struct rcu_node *rnp;
+
+   lockdep_assert_irqs_enabled();
+   local_irq_save(flags);
+   rdp = this_cpu_ptr(_data);
+   rnp = rdp->mynode;
+   raw_spin_lock_rcu_node(rnp); // irqs already disabled.
+   needwake = rcu_start_this_gp(rnp, rdp, gp_seq);
+   raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+   if (needwake)
+   rcu_gp_kthread_wake();
+   return gp_seq;
+}
+EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu);
+
+/**
+ * poll_state_synchronize_rcu - Conditionally wait for an RCU grace period
+ *
+ * @oldstate: return from call to get_state_synchronize_rcu() or 
start_poll_synchronize_rcu()
+ *
+ * If a full RCU grace period has elapsed since the earlier call from
+ * which oldstate was obtained, return @true, otherwise return @false.
+ * Otherwise, invoke synchronize_rcu() to wait for a full grace period.
+ *
+ * Yes, this function does not take counter wrap into account.
+ * But counter wrap is harmless.  If the counter wraps, we have waited for
+ * more than 2 billion grace periods (and way more on a 64-bit system!).
+ * Those needing to keep oldstate values for very long time periods
+ * (many hours even on 32-bit systems) should check them occasionally
+ * and either refresh them or set a flag indicating that the grace period
+ * has completed.
+ */
+bool poll_state_synchronize_rcu(unsigned long oldstate)
+{
+   if (rcu_seq_done(_state.gp_seq, oldstate)) {
+   smp_mb(); /* Ensure GP ends before subsequent accesses. */
+   return true;
+   }
+   return false;
+}
+EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
+
+/**
  * cond_synchronize_rcu - Conditionally wait for an RCU grace period
  *
  * @oldstate: return value from earlier call to get_state_synchronize_rcu()
  *
  * If a full RCU grace period has elapsed since the earlier call to
- * get_state_synchronize_rcu(), just return.  Otherwise, invoke
- *