Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Mathieu Desnoyers
- On Apr 27, 2018, at 2:11 PM, Joel Fernandes joe...@google.com wrote:

> On Fri, Apr 27, 2018 at 9:37 AM, Steven Rostedt  wrote:
>> On Fri, 27 Apr 2018 09:30:05 -0700
>> Joel Fernandes  wrote:
>>
>>> On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt  wrote:
>>> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
>>> > Mathieu Desnoyers  wrote:
>>> >
>>> >> The general approach and the implementation look fine, except for
>>> >> one small detail: I would be tempted to explicitly disable preemption
>>> >> around the call to the tracepoint callback for the rcuidle variant,
>>> >> unless we plan to audit every tracer right away to remove any assumption
>>> >> that preemption is disabled in the callback implementation.
>>> >
>>> > I'm thinking that we do that audit. There shouldn't be many instances
>>> > of it. I like the idea that a tracepoint callback gets called with
>>> > preemption enabled.
>>>
>>> Here is the list of all callers of the _rcuidle :
>>
>> I was thinking of auditing who registers callbacks to any tracepoints.
> 
> Ok. If you feel strongly about this, I think for now I could also just
> wrap the callback execution with preempt_disable_notrace. And, when/if
> we get to doing the blocking callbacks work, we can considering
> keeping preempts on.

My main point here is to introduce the minimal change (keeping preemption
disabled) needed for the rcuidle variant, and only tackle the work of
dealing with preemptible callbacks when we really need it and when we can
properly test it (e.g. by using it for syscall entry/exit tracing).

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Mathieu Desnoyers
- On Apr 27, 2018, at 2:11 PM, Joel Fernandes joe...@google.com wrote:

> On Fri, Apr 27, 2018 at 9:37 AM, Steven Rostedt  wrote:
>> On Fri, 27 Apr 2018 09:30:05 -0700
>> Joel Fernandes  wrote:
>>
>>> On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt  wrote:
>>> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
>>> > Mathieu Desnoyers  wrote:
>>> >
>>> >> The general approach and the implementation look fine, except for
>>> >> one small detail: I would be tempted to explicitly disable preemption
>>> >> around the call to the tracepoint callback for the rcuidle variant,
>>> >> unless we plan to audit every tracer right away to remove any assumption
>>> >> that preemption is disabled in the callback implementation.
>>> >
>>> > I'm thinking that we do that audit. There shouldn't be many instances
>>> > of it. I like the idea that a tracepoint callback gets called with
>>> > preemption enabled.
>>>
>>> Here is the list of all callers of the _rcuidle :
>>
>> I was thinking of auditing who registers callbacks to any tracepoints.
> 
> Ok. If you feel strongly about this, I think for now I could also just
> wrap the callback execution with preempt_disable_notrace. And, when/if
> we get to doing the blocking callbacks work, we can considering
> keeping preempts on.

My main point here is to introduce the minimal change (keeping preemption
disabled) needed for the rcuidle variant, and only tackle the work of
dealing with preemptible callbacks when we really need it and when we can
properly test it (e.g. by using it for syscall entry/exit tracing).

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Joel Fernandes
On Fri, Apr 27, 2018 at 9:37 AM, Steven Rostedt  wrote:
> On Fri, 27 Apr 2018 09:30:05 -0700
> Joel Fernandes  wrote:
>
>> On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt  wrote:
>> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
>> > Mathieu Desnoyers  wrote:
>> >
>> >> The general approach and the implementation look fine, except for
>> >> one small detail: I would be tempted to explicitly disable preemption
>> >> around the call to the tracepoint callback for the rcuidle variant,
>> >> unless we plan to audit every tracer right away to remove any assumption
>> >> that preemption is disabled in the callback implementation.
>> >
>> > I'm thinking that we do that audit. There shouldn't be many instances
>> > of it. I like the idea that a tracepoint callback gets called with
>> > preemption enabled.
>>
>> Here is the list of all callers of the _rcuidle :
>
> I was thinking of auditing who registers callbacks to any tracepoints.

Ok. If you feel strongly about this, I think for now I could also just
wrap the callback execution with preempt_disable_notrace. And, when/if
we get to doing the blocking callbacks work, we can considering
keeping preempts on.

thanks,

- Joel


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Joel Fernandes
On Fri, Apr 27, 2018 at 9:37 AM, Steven Rostedt  wrote:
> On Fri, 27 Apr 2018 09:30:05 -0700
> Joel Fernandes  wrote:
>
>> On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt  wrote:
>> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
>> > Mathieu Desnoyers  wrote:
>> >
>> >> The general approach and the implementation look fine, except for
>> >> one small detail: I would be tempted to explicitly disable preemption
>> >> around the call to the tracepoint callback for the rcuidle variant,
>> >> unless we plan to audit every tracer right away to remove any assumption
>> >> that preemption is disabled in the callback implementation.
>> >
>> > I'm thinking that we do that audit. There shouldn't be many instances
>> > of it. I like the idea that a tracepoint callback gets called with
>> > preemption enabled.
>>
>> Here is the list of all callers of the _rcuidle :
>
> I was thinking of auditing who registers callbacks to any tracepoints.

Ok. If you feel strongly about this, I think for now I could also just
wrap the callback execution with preempt_disable_notrace. And, when/if
we get to doing the blocking callbacks work, we can considering
keeping preempts on.

thanks,

- Joel


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Paul E. McKenney
On Fri, Apr 27, 2018 at 10:00:48AM -0700, Paul E. McKenney wrote:
> On Fri, Apr 27, 2018 at 12:46:41PM -0400, Steven Rostedt wrote:
> > On Fri, 27 Apr 2018 09:45:54 -0700
> > "Paul E. McKenney"  wrote:
> > 
> > > > > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > > > > preempt_disable which needs to be a notrace, but for the srcu one,
> > > > > since we don't do that, I think it should be fine.  
> > > > 
> > > > Actually, I think I may agree here too. Because the _notrace is for
> > > > function tracing, and it shouldn't affect it. If people don't want it
> > > > traced, they could add those functions to the list in the notrace file. 
> > > >  
> > > 
> > > OK, feel free to ignore my notrace srcu_read_lock() patch, then.  ;-)
> > 
> > Of course I wasn't thinking about the lockdep tracepoints that Joel
> > mentioned, which happens to be the reason for all this discussion in
> > the first place :-)  Now I think we do need it. (OK, I can keep
> > changing my mind, can't I?).
> 
> You can, but at some point I start applying heavy-duty hysteresis.  ;-)
> 
> So the current thought (as of your having sent the above email) is that
> we need notrace versions of srcu_read_lock() and srcu_read_unlock(),
> but not for __srcu_read_lock() and __srcu_read_unlock(), correct?

And Joel noted offline that I messed up srcu_read_unlock_notrace(),
so here is an updated patch with that fixed.

Thanx, Paul



diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 91494d7e8e41..3e72a291c401 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -195,6 +195,16 @@ static inline int srcu_read_lock(struct srcu_struct *sp) 
__acquires(sp)
return retval;
 }
 
+/* Used by tracing, cannot be traced and cannot invoke lockdep. */
+static inline notrace int
+srcu_read_lock_notrace(struct srcu_struct *sp) __acquires(sp)
+{
+   int retval;
+
+   retval = __srcu_read_lock(sp);
+   return retval;
+}
+
 /**
  * srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
  * @sp: srcu_struct in which to unregister the old reader.
@@ -209,6 +219,13 @@ static inline void srcu_read_unlock(struct srcu_struct 
*sp, int idx)
__srcu_read_unlock(sp, idx);
 }
 
+/* Used by tracing, cannot be traced and cannot call lockdep. */
+static inline notrace void
+srcu_read_unlock_notrace(struct srcu_struct *sp, int idx) __releases(sp)
+{
+   __srcu_read_unlock(sp, idx);
+}
+
 /**
  * smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock
  *



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Paul E. McKenney
On Fri, Apr 27, 2018 at 10:00:48AM -0700, Paul E. McKenney wrote:
> On Fri, Apr 27, 2018 at 12:46:41PM -0400, Steven Rostedt wrote:
> > On Fri, 27 Apr 2018 09:45:54 -0700
> > "Paul E. McKenney"  wrote:
> > 
> > > > > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > > > > preempt_disable which needs to be a notrace, but for the srcu one,
> > > > > since we don't do that, I think it should be fine.  
> > > > 
> > > > Actually, I think I may agree here too. Because the _notrace is for
> > > > function tracing, and it shouldn't affect it. If people don't want it
> > > > traced, they could add those functions to the list in the notrace file. 
> > > >  
> > > 
> > > OK, feel free to ignore my notrace srcu_read_lock() patch, then.  ;-)
> > 
> > Of course I wasn't thinking about the lockdep tracepoints that Joel
> > mentioned, which happens to be the reason for all this discussion in
> > the first place :-)  Now I think we do need it. (OK, I can keep
> > changing my mind, can't I?).
> 
> You can, but at some point I start applying heavy-duty hysteresis.  ;-)
> 
> So the current thought (as of your having sent the above email) is that
> we need notrace versions of srcu_read_lock() and srcu_read_unlock(),
> but not for __srcu_read_lock() and __srcu_read_unlock(), correct?

And Joel noted offline that I messed up srcu_read_unlock_notrace(),
so here is an updated patch with that fixed.

Thanx, Paul



diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 91494d7e8e41..3e72a291c401 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -195,6 +195,16 @@ static inline int srcu_read_lock(struct srcu_struct *sp) 
__acquires(sp)
return retval;
 }
 
+/* Used by tracing, cannot be traced and cannot invoke lockdep. */
+static inline notrace int
+srcu_read_lock_notrace(struct srcu_struct *sp) __acquires(sp)
+{
+   int retval;
+
+   retval = __srcu_read_lock(sp);
+   return retval;
+}
+
 /**
  * srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
  * @sp: srcu_struct in which to unregister the old reader.
@@ -209,6 +219,13 @@ static inline void srcu_read_unlock(struct srcu_struct 
*sp, int idx)
__srcu_read_unlock(sp, idx);
 }
 
+/* Used by tracing, cannot be traced and cannot call lockdep. */
+static inline notrace void
+srcu_read_unlock_notrace(struct srcu_struct *sp, int idx) __releases(sp)
+{
+   __srcu_read_unlock(sp, idx);
+}
+
 /**
  * smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock
  *



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Paul E. McKenney
On Fri, Apr 27, 2018 at 12:46:41PM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 09:45:54 -0700
> "Paul E. McKenney"  wrote:
> 
> > > > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > > > preempt_disable which needs to be a notrace, but for the srcu one,
> > > > since we don't do that, I think it should be fine.  
> > > 
> > > Actually, I think I may agree here too. Because the _notrace is for
> > > function tracing, and it shouldn't affect it. If people don't want it
> > > traced, they could add those functions to the list in the notrace file.  
> > 
> > OK, feel free to ignore my notrace srcu_read_lock() patch, then.  ;-)
> 
> Of course I wasn't thinking about the lockdep tracepoints that Joel
> mentioned, which happens to be the reason for all this discussion in
> the first place :-)  Now I think we do need it. (OK, I can keep
> changing my mind, can't I?).

You can, but at some point I start applying heavy-duty hysteresis.  ;-)

So the current thought (as of your having sent the above email) is that
we need notrace versions of srcu_read_lock() and srcu_read_unlock(),
but not for __srcu_read_lock() and __srcu_read_unlock(), correct?

Thanx, Paul



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Paul E. McKenney
On Fri, Apr 27, 2018 at 12:46:41PM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 09:45:54 -0700
> "Paul E. McKenney"  wrote:
> 
> > > > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > > > preempt_disable which needs to be a notrace, but for the srcu one,
> > > > since we don't do that, I think it should be fine.  
> > > 
> > > Actually, I think I may agree here too. Because the _notrace is for
> > > function tracing, and it shouldn't affect it. If people don't want it
> > > traced, they could add those functions to the list in the notrace file.  
> > 
> > OK, feel free to ignore my notrace srcu_read_lock() patch, then.  ;-)
> 
> Of course I wasn't thinking about the lockdep tracepoints that Joel
> mentioned, which happens to be the reason for all this discussion in
> the first place :-)  Now I think we do need it. (OK, I can keep
> changing my mind, can't I?).

You can, but at some point I start applying heavy-duty hysteresis.  ;-)

So the current thought (as of your having sent the above email) is that
we need notrace versions of srcu_read_lock() and srcu_read_unlock(),
but not for __srcu_read_lock() and __srcu_read_unlock(), correct?

Thanx, Paul



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 09:45:54 -0700
"Paul E. McKenney"  wrote:

> > > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > > preempt_disable which needs to be a notrace, but for the srcu one,
> > > since we don't do that, I think it should be fine.  
> > 
> > Actually, I think I may agree here too. Because the _notrace is for
> > function tracing, and it shouldn't affect it. If people don't want it
> > traced, they could add those functions to the list in the notrace file.  
> 
> OK, feel free to ignore my notrace srcu_read_lock() patch, then.  ;-)

Of course I wasn't thinking about the lockdep tracepoints that Joel
mentioned, which happens to be the reason for all this discussion in
the first place :-)  Now I think we do need it. (OK, I can keep
changing my mind, can't I?).

-- Steve


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 09:45:54 -0700
"Paul E. McKenney"  wrote:

> > > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > > preempt_disable which needs to be a notrace, but for the srcu one,
> > > since we don't do that, I think it should be fine.  
> > 
> > Actually, I think I may agree here too. Because the _notrace is for
> > function tracing, and it shouldn't affect it. If people don't want it
> > traced, they could add those functions to the list in the notrace file.  
> 
> OK, feel free to ignore my notrace srcu_read_lock() patch, then.  ;-)

Of course I wasn't thinking about the lockdep tracepoints that Joel
mentioned, which happens to be the reason for all this discussion in
the first place :-)  Now I think we do need it. (OK, I can keep
changing my mind, can't I?).

-- Steve


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Paul E. McKenney
On Fri, Apr 27, 2018 at 12:22:01PM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 09:14:30 -0700
> Joel Fernandes  wrote:
> 
> > > Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
> > > and srcu_read_unlock()?  
> > 
> > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > preempt_disable which needs to be a notrace, but for the srcu one,
> > since we don't do that, I think it should be fine.
> 
> Actually, I think I may agree here too. Because the _notrace is for
> function tracing, and it shouldn't affect it. If people don't want it
> traced, they could add those functions to the list in the notrace file.

OK, feel free to ignore my notrace srcu_read_lock() patch, then.  ;-)

Thanx, Paul



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Paul E. McKenney
On Fri, Apr 27, 2018 at 12:22:01PM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 09:14:30 -0700
> Joel Fernandes  wrote:
> 
> > > Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
> > > and srcu_read_unlock()?  
> > 
> > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > preempt_disable which needs to be a notrace, but for the srcu one,
> > since we don't do that, I think it should be fine.
> 
> Actually, I think I may agree here too. Because the _notrace is for
> function tracing, and it shouldn't affect it. If people don't want it
> traced, they could add those functions to the list in the notrace file.

OK, feel free to ignore my notrace srcu_read_lock() patch, then.  ;-)

Thanx, Paul



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Paul E. McKenney
On Fri, Apr 27, 2018 at 12:13:30PM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 08:57:01 -0700
> "Paul E. McKenney"  wrote:
> 
> > > + if (preempt_on) {   \
> > > + WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \  
> > 
> > Very good on this check, thank you!
> 
> I think you need to return and not call the read lock.

Works for me either way, at least assuming that the splat actually gets
printed.  ;-)

>   if (WARN_ON_ONCE(in_nmi()))
>   return;
> 
> > 
> > > + idx = srcu_read_lock(_srcu); \  
> > 
> > Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
> > and srcu_read_unlock()?
> 
> I think so.

OK, please see the (untested) patch below.  Of course,
srcu_read_lock_notrace() invokes __srcu_read_lock(), which looks as
follows:

int __srcu_read_lock(struct srcu_struct *sp)
{
int idx;

idx = READ_ONCE(sp->srcu_idx) & 0x1;
this_cpu_inc(sp->sda->srcu_lock_count[idx]);
smp_mb(); /* B */  /* Avoid leaking the critical section. */
return idx;
}

Do I also need to make a notrace version of __srcu_read_lock()?
Same question for __srcu_read_unlock(), which is similar.  If so,
assuming that there is no need for a notrace variant of this_cpu_inc()
and smp_mb(), I suppose I could simply macro-ize the internals in both
cases, but perhaps you have a better approach.

Thanx, Paul



diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 91494d7e8e41..e2e2cf05a6eb 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -195,6 +195,16 @@ static inline int srcu_read_lock(struct srcu_struct *sp) 
__acquires(sp)
return retval;
 }
 
+/* Used by tracing, cannot be traced and cannot invoke lockdep. */
+static inline notrace int
+srcu_read_lock_notrace(struct srcu_struct *sp) __acquires(sp)
+{
+   int retval;
+
+   retval = __srcu_read_lock(sp);
+   return retval;
+}
+
 /**
  * srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
  * @sp: srcu_struct in which to unregister the old reader.
@@ -205,6 +215,13 @@ static inline int srcu_read_lock(struct srcu_struct *sp) 
__acquires(sp)
 static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
__releases(sp)
 {
+   __srcu_read_unlock(sp, idx);
+}
+
+/* Used by tracing, cannot be traced and cannot call lockdep. */
+static inline notrace void
+srcu_read_unlock_notrace(struct srcu_struct *sp, int idx) __releases(sp)
+{
rcu_lock_release(&(sp)->dep_map);
__srcu_read_unlock(sp, idx);
 }



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Paul E. McKenney
On Fri, Apr 27, 2018 at 12:13:30PM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 08:57:01 -0700
> "Paul E. McKenney"  wrote:
> 
> > > + if (preempt_on) {   \
> > > + WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \  
> > 
> > Very good on this check, thank you!
> 
> I think you need to return and not call the read lock.

Works for me either way, at least assuming that the splat actually gets
printed.  ;-)

>   if (WARN_ON_ONCE(in_nmi()))
>   return;
> 
> > 
> > > + idx = srcu_read_lock(_srcu); \  
> > 
> > Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
> > and srcu_read_unlock()?
> 
> I think so.

OK, please see the (untested) patch below.  Of course,
srcu_read_lock_notrace() invokes __srcu_read_lock(), which looks as
follows:

int __srcu_read_lock(struct srcu_struct *sp)
{
int idx;

idx = READ_ONCE(sp->srcu_idx) & 0x1;
this_cpu_inc(sp->sda->srcu_lock_count[idx]);
smp_mb(); /* B */  /* Avoid leaking the critical section. */
return idx;
}

Do I also need to make a notrace version of __srcu_read_lock()?
Same question for __srcu_read_unlock(), which is similar.  If so,
assuming that there is no need for a notrace variant of this_cpu_inc()
and smp_mb(), I suppose I could simply macro-ize the internals in both
cases, but perhaps you have a better approach.

Thanx, Paul



diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 91494d7e8e41..e2e2cf05a6eb 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -195,6 +195,16 @@ static inline int srcu_read_lock(struct srcu_struct *sp) 
__acquires(sp)
return retval;
 }
 
+/* Used by tracing, cannot be traced and cannot invoke lockdep. */
+static inline notrace int
+srcu_read_lock_notrace(struct srcu_struct *sp) __acquires(sp)
+{
+   int retval;
+
+   retval = __srcu_read_lock(sp);
+   return retval;
+}
+
 /**
  * srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
  * @sp: srcu_struct in which to unregister the old reader.
@@ -205,6 +215,13 @@ static inline int srcu_read_lock(struct srcu_struct *sp) 
__acquires(sp)
 static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
__releases(sp)
 {
+   __srcu_read_unlock(sp, idx);
+}
+
+/* Used by tracing, cannot be traced and cannot call lockdep. */
+static inline notrace void
+srcu_read_unlock_notrace(struct srcu_struct *sp, int idx) __releases(sp)
+{
rcu_lock_release(&(sp)->dep_map);
__srcu_read_unlock(sp, idx);
 }



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 09:30:05 -0700
Joel Fernandes  wrote:

> On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt  wrote:
> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > Mathieu Desnoyers  wrote:
> >  
> >> The general approach and the implementation look fine, except for
> >> one small detail: I would be tempted to explicitly disable preemption
> >> around the call to the tracepoint callback for the rcuidle variant,
> >> unless we plan to audit every tracer right away to remove any assumption
> >> that preemption is disabled in the callback implementation.  
> >
> > I'm thinking that we do that audit. There shouldn't be many instances
> > of it. I like the idea that a tracepoint callback gets called with
> > preemption enabled.  
> 
> Here is the list of all callers of the _rcuidle :

I was thinking of auditing who registers callbacks to any tracepoints.

-- Steve

> 
> trace_clk_disable_complete_rcuidle
> trace_clk_disable_rcuidle
> trace_clk_enable_complete_rcuidle
> trace_clk_enable_rcuidle
> trace_console_rcuidle
> trace_cpu_idle_rcuidle
> trace_ipi_entry_rcuidle
> trace_ipi_exit_rcuidle
> trace_ipi_raise_rcuidle
> trace_irq_disable_rcuidle
> trace_irq_enable_rcuidle
> trace_power_domain_target_rcuidle
> trace_preempt_disable_rcuidle
> trace_preempt_enable_rcuidle
> trace_rpm_idle_rcuidle
> trace_rpm_resume_rcuidle
> trace_rpm_return_int_rcuidle
> trace_rpm_suspend_rcuidle
> trace_tlb_flush_rcuidle
> 
> All of these are either called from irqs or preemption disabled
> already. So I think it should be fine to keep preemption on. But I'm
> Ok with disabling it before callback execution if we agree that we
> want that.
> 
> (and the ring buffer code is able to cope anyway Steven pointed).
> 
> thanks,
> 
>  - Joel



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 09:30:05 -0700
Joel Fernandes  wrote:

> On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt  wrote:
> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > Mathieu Desnoyers  wrote:
> >  
> >> The general approach and the implementation look fine, except for
> >> one small detail: I would be tempted to explicitly disable preemption
> >> around the call to the tracepoint callback for the rcuidle variant,
> >> unless we plan to audit every tracer right away to remove any assumption
> >> that preemption is disabled in the callback implementation.  
> >
> > I'm thinking that we do that audit. There shouldn't be many instances
> > of it. I like the idea that a tracepoint callback gets called with
> > preemption enabled.  
> 
> Here is the list of all callers of the _rcuidle :

I was thinking of auditing who registers callbacks to any tracepoints.

-- Steve

> 
> trace_clk_disable_complete_rcuidle
> trace_clk_disable_rcuidle
> trace_clk_enable_complete_rcuidle
> trace_clk_enable_rcuidle
> trace_console_rcuidle
> trace_cpu_idle_rcuidle
> trace_ipi_entry_rcuidle
> trace_ipi_exit_rcuidle
> trace_ipi_raise_rcuidle
> trace_irq_disable_rcuidle
> trace_irq_enable_rcuidle
> trace_power_domain_target_rcuidle
> trace_preempt_disable_rcuidle
> trace_preempt_enable_rcuidle
> trace_rpm_idle_rcuidle
> trace_rpm_resume_rcuidle
> trace_rpm_return_int_rcuidle
> trace_rpm_suspend_rcuidle
> trace_tlb_flush_rcuidle
> 
> All of these are either called from irqs or preemption disabled
> already. So I think it should be fine to keep preemption on. But I'm
> Ok with disabling it before callback execution if we agree that we
> want that.
> 
> (and the ring buffer code is able to cope anyway Steven pointed).
> 
> thanks,
> 
>  - Joel



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Joel Fernandes
On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt  wrote:
> On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> Mathieu Desnoyers  wrote:
>
>> The general approach and the implementation look fine, except for
>> one small detail: I would be tempted to explicitly disable preemption
>> around the call to the tracepoint callback for the rcuidle variant,
>> unless we plan to audit every tracer right away to remove any assumption
>> that preemption is disabled in the callback implementation.
>
> I'm thinking that we do that audit. There shouldn't be many instances
> of it. I like the idea that a tracepoint callback gets called with
> preemption enabled.

Here is the list of all callers of the _rcuidle :

trace_clk_disable_complete_rcuidle
trace_clk_disable_rcuidle
trace_clk_enable_complete_rcuidle
trace_clk_enable_rcuidle
trace_console_rcuidle
trace_cpu_idle_rcuidle
trace_ipi_entry_rcuidle
trace_ipi_exit_rcuidle
trace_ipi_raise_rcuidle
trace_irq_disable_rcuidle
trace_irq_enable_rcuidle
trace_power_domain_target_rcuidle
trace_preempt_disable_rcuidle
trace_preempt_enable_rcuidle
trace_rpm_idle_rcuidle
trace_rpm_resume_rcuidle
trace_rpm_return_int_rcuidle
trace_rpm_suspend_rcuidle
trace_tlb_flush_rcuidle

All of these are either called from irqs or preemption disabled
already. So I think it should be fine to keep preemption on. But I'm
Ok with disabling it before callback execution if we agree that we
want that.

(and the ring buffer code is able to cope anyway Steven pointed).

thanks,

 - Joel


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Joel Fernandes
On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt  wrote:
> On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> Mathieu Desnoyers  wrote:
>
>> The general approach and the implementation look fine, except for
>> one small detail: I would be tempted to explicitly disable preemption
>> around the call to the tracepoint callback for the rcuidle variant,
>> unless we plan to audit every tracer right away to remove any assumption
>> that preemption is disabled in the callback implementation.
>
> I'm thinking that we do that audit. There shouldn't be many instances
> of it. I like the idea that a tracepoint callback gets called with
> preemption enabled.

Here is the list of all callers of the _rcuidle :

trace_clk_disable_complete_rcuidle
trace_clk_disable_rcuidle
trace_clk_enable_complete_rcuidle
trace_clk_enable_rcuidle
trace_console_rcuidle
trace_cpu_idle_rcuidle
trace_ipi_entry_rcuidle
trace_ipi_exit_rcuidle
trace_ipi_raise_rcuidle
trace_irq_disable_rcuidle
trace_irq_enable_rcuidle
trace_power_domain_target_rcuidle
trace_preempt_disable_rcuidle
trace_preempt_enable_rcuidle
trace_rpm_idle_rcuidle
trace_rpm_resume_rcuidle
trace_rpm_return_int_rcuidle
trace_rpm_suspend_rcuidle
trace_tlb_flush_rcuidle

All of these are either called from irqs or preemption disabled
already. So I think it should be fine to keep preemption on. But I'm
Ok with disabling it before callback execution if we agree that we
want that.

(and the ring buffer code is able to cope anyway Steven pointed).

thanks,

 - Joel


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Joel Fernandes
On Fri, Apr 27, 2018 at 9:13 AM, Steven Rostedt  wrote:
> On Fri, 27 Apr 2018 08:57:01 -0700
> "Paul E. McKenney"  wrote:
>
>> > +   if (preempt_on) {   \
>> > +   WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \
>>
>> Very good on this check, thank you!
>
> I think you need to return and not call the read lock.
>
> if (WARN_ON_ONCE(in_nmi()))
> return;

Cool, I'll do that.

>>
>> > +   idx = srcu_read_lock(_srcu); \
>>
>> Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
>> and srcu_read_unlock()?
>
> I think so.

Oh yes, since otherwise we call into lockdep.

Paul, then I think that's true we'd need srcu _notrace variants that
don't do the rcu_lock_acquire. Sorry for my earlier email saying its
not needed. Thanks,

- Joel


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Joel Fernandes
On Fri, Apr 27, 2018 at 9:13 AM, Steven Rostedt  wrote:
> On Fri, 27 Apr 2018 08:57:01 -0700
> "Paul E. McKenney"  wrote:
>
>> > +   if (preempt_on) {   \
>> > +   WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \
>>
>> Very good on this check, thank you!
>
> I think you need to return and not call the read lock.
>
> if (WARN_ON_ONCE(in_nmi()))
> return;

Cool, I'll do that.

>>
>> > +   idx = srcu_read_lock(_srcu); \
>>
>> Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
>> and srcu_read_unlock()?
>
> I think so.

Oh yes, since otherwise we call into lockdep.

Paul, then I think that's true we'd need srcu _notrace variants that
don't do the rcu_lock_acquire. Sorry for my earlier email saying its
not needed. Thanks,

- Joel


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 09:14:30 -0700
Joel Fernandes  wrote:

> > Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
> > and srcu_read_unlock()?  
> 
> That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> preempt_disable which needs to be a notrace, but for the srcu one,
> since we don't do that, I think it should be fine.

Actually, I think I may agree here too. Because the _notrace is for
function tracing, and it shouldn't affect it. If people don't want it
traced, they could add those functions to the list in the notrace file.

-- Steve


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 09:14:30 -0700
Joel Fernandes  wrote:

> > Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
> > and srcu_read_unlock()?  
> 
> That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> preempt_disable which needs to be a notrace, but for the srcu one,
> since we don't do that, I think it should be fine.

Actually, I think I may agree here too. Because the _notrace is for
function tracing, and it shouldn't affect it. If people don't want it
traced, they could add those functions to the list in the notrace file.

-- Steve


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Joel Fernandes
Hi Paul,

On Fri, Apr 27, 2018 at 8:57 AM, Paul E. McKenney
 wrote:
> On Thu, Apr 26, 2018 at 09:26:56PM -0700, Joel Fernandes wrote:
>> In recent tests with IRQ on/off tracepoints, a large performance
>> overhead ~10% is noticed when running hackbench. This is root caused to
>> calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
>> tracepoint code. Following a long discussion on the list [1] about this,
>> we concluded that srcu is a better alternative for use during rcu idle.
>> Although it does involve extra barriers, its lighter than the sched-rcu
>> version which has to do additional RCU calls to notify RCU idle about
>> entry into RCU sections.
>>
>> In this patch, we change the underlying implementation of the
>> trace_*_rcuidle API to use SRCU. This has shown to improve performance
>> alot for the high frequency irq enable/disable tracepoints.
>>
>> In the future, we can add a new may_sleep API which can use this
>> infrastructure for callbacks that actually can sleep which will support
>> Mathieu's usecase of blocking probes.
>>
>> Test: Tested idle and preempt/irq tracepoints.
>
> Looks good overall!  One question and a few comments below.
>
> Thanx, Paul
>
>> [1] https://patchwork.kernel.org/patch/10344297/
>>
>> Cc: Steven Rostedt 
>> Cc: Peter Zilstra 
>> Cc: Ingo Molnar 
>> Cc: Mathieu Desnoyers 
>> Cc: Tom Zanussi 
>> Cc: Namhyung Kim 
>> Cc: Thomas Glexiner 
>> Cc: Boqun Feng 
>> Cc: Paul McKenney 
>> Cc: Frederic Weisbecker 
>> Cc: Randy Dunlap 
>> Cc: Masami Hiramatsu 
>> Cc: Fenguang Wu 
>> Cc: Baohong Liu 
>> Cc: Vedang Patel 
>> Cc: kernel-t...@android.com
>> Signed-off-by: Joel Fernandes 
>> ---
>>  include/linux/tracepoint.h | 37 +
>>  kernel/tracepoint.c| 10 +-
>>  2 files changed, 38 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
>> index c94f466d57ef..a1c1987de423 100644
>> --- a/include/linux/tracepoint.h
>> +++ b/include/linux/tracepoint.h
>> @@ -15,6 +15,7 @@
>>   */
>>
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -33,6 +34,8 @@ struct trace_eval_map {
>>
>>  #define TRACEPOINT_DEFAULT_PRIO  10
>>
>> +extern struct srcu_struct tracepoint_srcu;
>> +
>>  extern int
>>  tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
>>  extern int
>> @@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct 
>> notifier_block *nb)
>>   */
>>  static inline void tracepoint_synchronize_unregister(void)
>>  {
>> + synchronize_srcu(_srcu);
>>   synchronize_sched();
>>  }
>>
>> @@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
>>   * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
>>   * "void *data", where as the DECLARE_TRACE() will pass in "void *data, 
>> proto".
>>   */
>> -#define __DO_TRACE(tp, proto, args, cond, rcucheck)  \
>> +#define __DO_TRACE(tp, proto, args, cond, preempt_on)   
>>  \
>>   do {\
>>   struct tracepoint_func *it_func_ptr;\
>>   void *it_func;  \
>>   void *__data;   \
>> + int __maybe_unused idx = 0; \
>>   \
>>   if (!(cond))\
>>   return; \
>> - if (rcucheck)   \
>> - rcu_irq_enter_irqson(); \
>> - rcu_read_lock_sched_notrace();  \
>> - it_func_ptr = rcu_dereference_sched((tp)->funcs);   \
>> + if (preempt_on) {   \
>> + WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \
>
> Very good on this check, thank you!

Sure thing :-)

>
>> + idx = srcu_read_lock(_srcu); \
>
> Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
> and srcu_read_unlock()?

That shouldn't be needed. For the rcu_read_lock_sched case, there is a
preempt_disable which needs to be a notrace, but for the srcu one,
since we don't do that, I think it should be fine.

Thanks!

- Joel


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Joel Fernandes
Hi Paul,

On Fri, Apr 27, 2018 at 8:57 AM, Paul E. McKenney
 wrote:
> On Thu, Apr 26, 2018 at 09:26:56PM -0700, Joel Fernandes wrote:
>> In recent tests with IRQ on/off tracepoints, a large performance
>> overhead ~10% is noticed when running hackbench. This is root caused to
>> calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
>> tracepoint code. Following a long discussion on the list [1] about this,
>> we concluded that srcu is a better alternative for use during rcu idle.
>> Although it does involve extra barriers, its lighter than the sched-rcu
>> version which has to do additional RCU calls to notify RCU idle about
>> entry into RCU sections.
>>
>> In this patch, we change the underlying implementation of the
>> trace_*_rcuidle API to use SRCU. This has shown to improve performance
>> alot for the high frequency irq enable/disable tracepoints.
>>
>> In the future, we can add a new may_sleep API which can use this
>> infrastructure for callbacks that actually can sleep which will support
>> Mathieu's usecase of blocking probes.
>>
>> Test: Tested idle and preempt/irq tracepoints.
>
> Looks good overall!  One question and a few comments below.
>
> Thanx, Paul
>
>> [1] https://patchwork.kernel.org/patch/10344297/
>>
>> Cc: Steven Rostedt 
>> Cc: Peter Zilstra 
>> Cc: Ingo Molnar 
>> Cc: Mathieu Desnoyers 
>> Cc: Tom Zanussi 
>> Cc: Namhyung Kim 
>> Cc: Thomas Glexiner 
>> Cc: Boqun Feng 
>> Cc: Paul McKenney 
>> Cc: Frederic Weisbecker 
>> Cc: Randy Dunlap 
>> Cc: Masami Hiramatsu 
>> Cc: Fenguang Wu 
>> Cc: Baohong Liu 
>> Cc: Vedang Patel 
>> Cc: kernel-t...@android.com
>> Signed-off-by: Joel Fernandes 
>> ---
>>  include/linux/tracepoint.h | 37 +
>>  kernel/tracepoint.c| 10 +-
>>  2 files changed, 38 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
>> index c94f466d57ef..a1c1987de423 100644
>> --- a/include/linux/tracepoint.h
>> +++ b/include/linux/tracepoint.h
>> @@ -15,6 +15,7 @@
>>   */
>>
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -33,6 +34,8 @@ struct trace_eval_map {
>>
>>  #define TRACEPOINT_DEFAULT_PRIO  10
>>
>> +extern struct srcu_struct tracepoint_srcu;
>> +
>>  extern int
>>  tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
>>  extern int
>> @@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct 
>> notifier_block *nb)
>>   */
>>  static inline void tracepoint_synchronize_unregister(void)
>>  {
>> + synchronize_srcu(_srcu);
>>   synchronize_sched();
>>  }
>>
>> @@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
>>   * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
>>   * "void *data", where as the DECLARE_TRACE() will pass in "void *data, 
>> proto".
>>   */
>> -#define __DO_TRACE(tp, proto, args, cond, rcucheck)  \
>> +#define __DO_TRACE(tp, proto, args, cond, preempt_on)   
>>  \
>>   do {\
>>   struct tracepoint_func *it_func_ptr;\
>>   void *it_func;  \
>>   void *__data;   \
>> + int __maybe_unused idx = 0; \
>>   \
>>   if (!(cond))\
>>   return; \
>> - if (rcucheck)   \
>> - rcu_irq_enter_irqson(); \
>> - rcu_read_lock_sched_notrace();  \
>> - it_func_ptr = rcu_dereference_sched((tp)->funcs);   \
>> + if (preempt_on) {   \
>> + WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \
>
> Very good on this check, thank you!

Sure thing :-)

>
>> + idx = srcu_read_lock(_srcu); \
>
> Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
> and srcu_read_unlock()?

That shouldn't be needed. For the rcu_read_lock_sched case, there is a
preempt_disable which needs to be a notrace, but for the srcu one,
since we don't do that, I think it should be fine.

Thanks!

- Joel


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 08:57:01 -0700
"Paul E. McKenney"  wrote:

> > +   if (preempt_on) {   \
> > +   WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \  
> 
> Very good on this check, thank you!

I think you need to return and not call the read lock.

if (WARN_ON_ONCE(in_nmi()))
return;

> 
> > +   idx = srcu_read_lock(_srcu); \  
> 
> Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
> and srcu_read_unlock()?

I think so.

-- Steve

> 
> > +   it_func_ptr = srcu_dereference((tp)->funcs, \
> > +   _srcu);  \
> > +   } else {\
> > +   rcu_read_lock_sched_notrace();  \
> > +   it_func_ptr =   \
> > +   rcu_dereference_sched((tp)->funcs); \
> > +   }   \
> > +   \


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 08:57:01 -0700
"Paul E. McKenney"  wrote:

> > +   if (preempt_on) {   \
> > +   WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \  
> 
> Very good on this check, thank you!

I think you need to return and not call the read lock.

if (WARN_ON_ONCE(in_nmi()))
return;

> 
> > +   idx = srcu_read_lock(_srcu); \  
> 
> Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
> and srcu_read_unlock()?

I think so.

-- Steve

> 
> > +   it_func_ptr = srcu_dereference((tp)->funcs, \
> > +   _srcu);  \
> > +   } else {\
> > +   rcu_read_lock_sched_notrace();  \
> > +   it_func_ptr =   \
> > +   rcu_dereference_sched((tp)->funcs); \
> > +   }   \
> > +   \


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 11:43:41 -0400 (EDT)
Mathieu Desnoyers  wrote:

> It does so by disabling preemption in the callbacks, even when it's
> redundant with the guarantees already provided by tracepoint-sched-rcu
> and by kprobes. It's not that great for a fast-path.

Really, preempt_disable() is not bad for a fast path. It's far better
than a local_irq_disable().

-- Steve


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 11:43:41 -0400 (EDT)
Mathieu Desnoyers  wrote:

> It does so by disabling preemption in the callbacks, even when it's
> redundant with the guarantees already provided by tracepoint-sched-rcu
> and by kprobes. It's not that great for a fast-path.

Really, preempt_disable() is not bad for a fast path. It's far better
than a local_irq_disable().

-- Steve


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 11:42:15 -0400 (EDT)
Mathieu Desnoyers  wrote:

> - On Apr 27, 2018, at 10:47 AM, rostedt rost...@goodmis.org wrote:
> 
> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > Mathieu Desnoyers  wrote:
> >   
> >> The general approach and the implementation look fine, except for
> >> one small detail: I would be tempted to explicitly disable preemption
> >> around the call to the tracepoint callback for the rcuidle variant,
> >> unless we plan to audit every tracer right away to remove any assumption
> >> that preemption is disabled in the callback implementation.  
> > 
> > I'm thinking that we do that audit. There shouldn't be many instances
> > of it. I like the idea that a tracepoint callback gets called with
> > preemption enabled.  
> 
> I see that ftrace explicitly disables preemption in its ring buffer
> code. FWIW, this is redundant when called from sched-rcu tracepoints
> and from kprobes which adds unnecessary performance overhead.

Sure, but that code is called from other locations that do not have
preemption disabled. Calling preempt_disable() is far from the biggest
overhead of that code path.

> 
> LTTng expects preemption to be disabled when invoked. I can adapt on my
> side as needed, but would prefer not to have redundant preemption disabling
> for probes hooking on sched-rcu tracepoints (which is the common case).

Why not? Really, preempt_disable is simply a per cpu counter, with only
need of adding compiler barriers.

> 
> Do perf callbacks expect preemption to be disabled ?

I'll have to look, but wouldn't be hard to change.

-- Steve



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 11:42:15 -0400 (EDT)
Mathieu Desnoyers  wrote:

> - On Apr 27, 2018, at 10:47 AM, rostedt rost...@goodmis.org wrote:
> 
> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > Mathieu Desnoyers  wrote:
> >   
> >> The general approach and the implementation look fine, except for
> >> one small detail: I would be tempted to explicitly disable preemption
> >> around the call to the tracepoint callback for the rcuidle variant,
> >> unless we plan to audit every tracer right away to remove any assumption
> >> that preemption is disabled in the callback implementation.  
> > 
> > I'm thinking that we do that audit. There shouldn't be many instances
> > of it. I like the idea that a tracepoint callback gets called with
> > preemption enabled.  
> 
> I see that ftrace explicitly disables preemption in its ring buffer
> code. FWIW, this is redundant when called from sched-rcu tracepoints
> and from kprobes which adds unnecessary performance overhead.

Sure, but that code is called from other locations that do not have
preemption disabled. Calling preempt_disable() is far from the biggest
overhead of that code path.

> 
> LTTng expects preemption to be disabled when invoked. I can adapt on my
> side as needed, but would prefer not to have redundant preemption disabling
> for probes hooking on sched-rcu tracepoints (which is the common case).

Why not? Really, preempt_disable is simply a per cpu counter, with only
need of adding compiler barriers.

> 
> Do perf callbacks expect preemption to be disabled ?

I'll have to look, but wouldn't be hard to change.

-- Steve



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Paul E. McKenney
On Fri, Apr 27, 2018 at 11:40:05AM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 08:38:26 -0700
> "Paul E. McKenney"  wrote:
> 
> > On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
> > > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > > Mathieu Desnoyers  wrote:
> > >   
> > > > The general approach and the implementation look fine, except for
> > > > one small detail: I would be tempted to explicitly disable preemption
> > > > around the call to the tracepoint callback for the rcuidle variant,
> > > > unless we plan to audit every tracer right away to remove any assumption
> > > > that preemption is disabled in the callback implementation.  
> > > 
> > > I'm thinking that we do that audit. There shouldn't be many instances
> > > of it. I like the idea that a tracepoint callback gets called with
> > > preemption enabled.  
> > 
> > Are you really sure you want to increase your state space that much?
> 
> Why not? The code I have in callbacks already deals with all sorts of
> context - normal, softirq, irq, NMI, preemption disabled, irq
> disabled.

But why?  Do people really expect good real-time response on systems
instrumented with lots of tracing?

Thanx, Paul



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Paul E. McKenney
On Fri, Apr 27, 2018 at 11:40:05AM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 08:38:26 -0700
> "Paul E. McKenney"  wrote:
> 
> > On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
> > > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > > Mathieu Desnoyers  wrote:
> > >   
> > > > The general approach and the implementation look fine, except for
> > > > one small detail: I would be tempted to explicitly disable preemption
> > > > around the call to the tracepoint callback for the rcuidle variant,
> > > > unless we plan to audit every tracer right away to remove any assumption
> > > > that preemption is disabled in the callback implementation.  
> > > 
> > > I'm thinking that we do that audit. There shouldn't be many instances
> > > of it. I like the idea that a tracepoint callback gets called with
> > > preemption enabled.  
> > 
> > Are you really sure you want to increase your state space that much?
> 
> Why not? The code I have in callbacks already deals with all sorts of
> context - normal, softirq, irq, NMI, preemption disabled, irq
> disabled.

But why?  Do people really expect good real-time response on systems
instrumented with lots of tracing?

Thanx, Paul



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Paul E. McKenney
On Thu, Apr 26, 2018 at 09:26:56PM -0700, Joel Fernandes wrote:
> In recent tests with IRQ on/off tracepoints, a large performance
> overhead ~10% is noticed when running hackbench. This is root caused to
> calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
> tracepoint code. Following a long discussion on the list [1] about this,
> we concluded that srcu is a better alternative for use during rcu idle.
> Although it does involve extra barriers, its lighter than the sched-rcu
> version which has to do additional RCU calls to notify RCU idle about
> entry into RCU sections.
> 
> In this patch, we change the underlying implementation of the
> trace_*_rcuidle API to use SRCU. This has shown to improve performance
> alot for the high frequency irq enable/disable tracepoints.
> 
> In the future, we can add a new may_sleep API which can use this
> infrastructure for callbacks that actually can sleep which will support
> Mathieu's usecase of blocking probes.
> 
> Test: Tested idle and preempt/irq tracepoints.

Looks good overall!  One question and a few comments below.

Thanx, Paul

> [1] https://patchwork.kernel.org/patch/10344297/
> 
> Cc: Steven Rostedt 
> Cc: Peter Zilstra 
> Cc: Ingo Molnar 
> Cc: Mathieu Desnoyers 
> Cc: Tom Zanussi 
> Cc: Namhyung Kim 
> Cc: Thomas Glexiner 
> Cc: Boqun Feng 
> Cc: Paul McKenney 
> Cc: Frederic Weisbecker 
> Cc: Randy Dunlap 
> Cc: Masami Hiramatsu 
> Cc: Fenguang Wu 
> Cc: Baohong Liu 
> Cc: Vedang Patel 
> Cc: kernel-t...@android.com
> Signed-off-by: Joel Fernandes 
> ---
>  include/linux/tracepoint.h | 37 +
>  kernel/tracepoint.c| 10 +-
>  2 files changed, 38 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index c94f466d57ef..a1c1987de423 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -15,6 +15,7 @@
>   */
> 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -33,6 +34,8 @@ struct trace_eval_map {
> 
>  #define TRACEPOINT_DEFAULT_PRIO  10
> 
> +extern struct srcu_struct tracepoint_srcu;
> +
>  extern int
>  tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
>  extern int
> @@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct 
> notifier_block *nb)
>   */
>  static inline void tracepoint_synchronize_unregister(void)
>  {
> + synchronize_srcu(_srcu);
>   synchronize_sched();
>  }
> 
> @@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
>   * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
>   * "void *data", where as the DECLARE_TRACE() will pass in "void *data, 
> proto".
>   */
> -#define __DO_TRACE(tp, proto, args, cond, rcucheck)  \
> +#define __DO_TRACE(tp, proto, args, cond, preempt_on)
> \
>   do {\
>   struct tracepoint_func *it_func_ptr;\
>   void *it_func;  \
>   void *__data;   \
> + int __maybe_unused idx = 0; \
>   \
>   if (!(cond))\
>   return; \
> - if (rcucheck)   \
> - rcu_irq_enter_irqson(); \
> - rcu_read_lock_sched_notrace();  \
> - it_func_ptr = rcu_dereference_sched((tp)->funcs);   \
> + if (preempt_on) {   \
> + WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \

Very good on this check, thank you!

> + idx = srcu_read_lock(_srcu); \

Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
and srcu_read_unlock()?

> + it_func_ptr = srcu_dereference((tp)->funcs, \
> + _srcu);  \
> + } else {\
> + rcu_read_lock_sched_notrace();  \
> + it_func_ptr =   \
> + rcu_dereference_sched((tp)->funcs); \
> + }   

Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Paul E. McKenney
On Thu, Apr 26, 2018 at 09:26:56PM -0700, Joel Fernandes wrote:
> In recent tests with IRQ on/off tracepoints, a large performance
> overhead ~10% is noticed when running hackbench. This is root caused to
> calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
> tracepoint code. Following a long discussion on the list [1] about this,
> we concluded that srcu is a better alternative for use during rcu idle.
> Although it does involve extra barriers, its lighter than the sched-rcu
> version which has to do additional RCU calls to notify RCU idle about
> entry into RCU sections.
> 
> In this patch, we change the underlying implementation of the
> trace_*_rcuidle API to use SRCU. This has shown to improve performance
> alot for the high frequency irq enable/disable tracepoints.
> 
> In the future, we can add a new may_sleep API which can use this
> infrastructure for callbacks that actually can sleep which will support
> Mathieu's usecase of blocking probes.
> 
> Test: Tested idle and preempt/irq tracepoints.

Looks good overall!  One question and a few comments below.

Thanx, Paul

> [1] https://patchwork.kernel.org/patch/10344297/
> 
> Cc: Steven Rostedt 
> Cc: Peter Zilstra 
> Cc: Ingo Molnar 
> Cc: Mathieu Desnoyers 
> Cc: Tom Zanussi 
> Cc: Namhyung Kim 
> Cc: Thomas Glexiner 
> Cc: Boqun Feng 
> Cc: Paul McKenney 
> Cc: Frederic Weisbecker 
> Cc: Randy Dunlap 
> Cc: Masami Hiramatsu 
> Cc: Fenguang Wu 
> Cc: Baohong Liu 
> Cc: Vedang Patel 
> Cc: kernel-t...@android.com
> Signed-off-by: Joel Fernandes 
> ---
>  include/linux/tracepoint.h | 37 +
>  kernel/tracepoint.c| 10 +-
>  2 files changed, 38 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index c94f466d57ef..a1c1987de423 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -15,6 +15,7 @@
>   */
> 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -33,6 +34,8 @@ struct trace_eval_map {
> 
>  #define TRACEPOINT_DEFAULT_PRIO  10
> 
> +extern struct srcu_struct tracepoint_srcu;
> +
>  extern int
>  tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
>  extern int
> @@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct 
> notifier_block *nb)
>   */
>  static inline void tracepoint_synchronize_unregister(void)
>  {
> + synchronize_srcu(_srcu);
>   synchronize_sched();
>  }
> 
> @@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
>   * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
>   * "void *data", where as the DECLARE_TRACE() will pass in "void *data, 
> proto".
>   */
> -#define __DO_TRACE(tp, proto, args, cond, rcucheck)  \
> +#define __DO_TRACE(tp, proto, args, cond, preempt_on)
> \
>   do {\
>   struct tracepoint_func *it_func_ptr;\
>   void *it_func;  \
>   void *__data;   \
> + int __maybe_unused idx = 0; \
>   \
>   if (!(cond))\
>   return; \
> - if (rcucheck)   \
> - rcu_irq_enter_irqson(); \
> - rcu_read_lock_sched_notrace();  \
> - it_func_ptr = rcu_dereference_sched((tp)->funcs);   \
> + if (preempt_on) {   \
> + WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \

Very good on this check, thank you!

> + idx = srcu_read_lock(_srcu); \

Hmmm...  Do I need to create a _notrace variant of srcu_read_lock()
and srcu_read_unlock()?

> + it_func_ptr = srcu_dereference((tp)->funcs, \
> + _srcu);  \
> + } else {\
> + rcu_read_lock_sched_notrace();  \
> + it_func_ptr =   \
> + rcu_dereference_sched((tp)->funcs); \
> + }   \
> + \
>   if (it_func_ptr) {  \
>   do {\
>   it_func = (it_func_ptr)->func;  \
> @@ -148,12 +160,21 @@ extern void 

Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Mathieu Desnoyers
- On Apr 27, 2018, at 11:40 AM, rostedt rost...@goodmis.org wrote:

> On Fri, 27 Apr 2018 08:38:26 -0700
> "Paul E. McKenney"  wrote:
> 
>> On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
>> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
>> > Mathieu Desnoyers  wrote:
>> >   
>> > > The general approach and the implementation look fine, except for
>> > > one small detail: I would be tempted to explicitly disable preemption
>> > > around the call to the tracepoint callback for the rcuidle variant,
>> > > unless we plan to audit every tracer right away to remove any assumption
>> > > that preemption is disabled in the callback implementation.
>> > 
>> > I'm thinking that we do that audit. There shouldn't be many instances
>> > of it. I like the idea that a tracepoint callback gets called with
>> > preemption enabled.
>> 
>> Are you really sure you want to increase your state space that much?
> 
> Why not? The code I have in callbacks already deals with all sorts of
> context - normal, softirq, irq, NMI, preemption disabled, irq
> disabled.

It does so by disabling preemption in the callbacks, even when it's
redundant with the guarantees already provided by tracepoint-sched-rcu
and by kprobes. It's not that great for a fast-path.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Mathieu Desnoyers
- On Apr 27, 2018, at 11:40 AM, rostedt rost...@goodmis.org wrote:

> On Fri, 27 Apr 2018 08:38:26 -0700
> "Paul E. McKenney"  wrote:
> 
>> On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
>> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
>> > Mathieu Desnoyers  wrote:
>> >   
>> > > The general approach and the implementation look fine, except for
>> > > one small detail: I would be tempted to explicitly disable preemption
>> > > around the call to the tracepoint callback for the rcuidle variant,
>> > > unless we plan to audit every tracer right away to remove any assumption
>> > > that preemption is disabled in the callback implementation.
>> > 
>> > I'm thinking that we do that audit. There shouldn't be many instances
>> > of it. I like the idea that a tracepoint callback gets called with
>> > preemption enabled.
>> 
>> Are you really sure you want to increase your state space that much?
> 
> Why not? The code I have in callbacks already deals with all sorts of
> context - normal, softirq, irq, NMI, preemption disabled, irq
> disabled.

It does so by disabling preemption in the callbacks, even when it's
redundant with the guarantees already provided by tracepoint-sched-rcu
and by kprobes. It's not that great for a fast-path.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Mathieu Desnoyers
- On Apr 27, 2018, at 10:47 AM, rostedt rost...@goodmis.org wrote:

> On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> Mathieu Desnoyers  wrote:
> 
>> The general approach and the implementation look fine, except for
>> one small detail: I would be tempted to explicitly disable preemption
>> around the call to the tracepoint callback for the rcuidle variant,
>> unless we plan to audit every tracer right away to remove any assumption
>> that preemption is disabled in the callback implementation.
> 
> I'm thinking that we do that audit. There shouldn't be many instances
> of it. I like the idea that a tracepoint callback gets called with
> preemption enabled.

I see that ftrace explicitly disables preemption in its ring buffer
code. FWIW, this is redundant when called from sched-rcu tracepoints
and from kprobes which adds unnecessary performance overhead.

LTTng expects preemption to be disabled when invoked. I can adapt on my
side as needed, but would prefer not to have redundant preemption disabling
for probes hooking on sched-rcu tracepoints (which is the common case).

Do perf callbacks expect preemption to be disabled ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Mathieu Desnoyers
- On Apr 27, 2018, at 10:47 AM, rostedt rost...@goodmis.org wrote:

> On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> Mathieu Desnoyers  wrote:
> 
>> The general approach and the implementation look fine, except for
>> one small detail: I would be tempted to explicitly disable preemption
>> around the call to the tracepoint callback for the rcuidle variant,
>> unless we plan to audit every tracer right away to remove any assumption
>> that preemption is disabled in the callback implementation.
> 
> I'm thinking that we do that audit. There shouldn't be many instances
> of it. I like the idea that a tracepoint callback gets called with
> preemption enabled.

I see that ftrace explicitly disables preemption in its ring buffer
code. FWIW, this is redundant when called from sched-rcu tracepoints
and from kprobes which adds unnecessary performance overhead.

LTTng expects preemption to be disabled when invoked. I can adapt on my
side as needed, but would prefer not to have redundant preemption disabling
for probes hooking on sched-rcu tracepoints (which is the common case).

Do perf callbacks expect preemption to be disabled ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 08:38:26 -0700
"Paul E. McKenney"  wrote:

> On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > Mathieu Desnoyers  wrote:
> >   
> > > The general approach and the implementation look fine, except for
> > > one small detail: I would be tempted to explicitly disable preemption
> > > around the call to the tracepoint callback for the rcuidle variant,
> > > unless we plan to audit every tracer right away to remove any assumption
> > > that preemption is disabled in the callback implementation.  
> > 
> > I'm thinking that we do that audit. There shouldn't be many instances
> > of it. I like the idea that a tracepoint callback gets called with
> > preemption enabled.  
> 
> Are you really sure you want to increase your state space that much?

Why not? The code I have in callbacks already deals with all sorts of
context - normal, softirq, irq, NMI, preemption disabled, irq
disabled.

-- Steve


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 08:38:26 -0700
"Paul E. McKenney"  wrote:

> On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > Mathieu Desnoyers  wrote:
> >   
> > > The general approach and the implementation look fine, except for
> > > one small detail: I would be tempted to explicitly disable preemption
> > > around the call to the tracepoint callback for the rcuidle variant,
> > > unless we plan to audit every tracer right away to remove any assumption
> > > that preemption is disabled in the callback implementation.  
> > 
> > I'm thinking that we do that audit. There shouldn't be many instances
> > of it. I like the idea that a tracepoint callback gets called with
> > preemption enabled.  
> 
> Are you really sure you want to increase your state space that much?

Why not? The code I have in callbacks already deals with all sorts of
context - normal, softirq, irq, NMI, preemption disabled, irq
disabled.

-- Steve


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Paul E. McKenney
On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> Mathieu Desnoyers  wrote:
> 
> > The general approach and the implementation look fine, except for
> > one small detail: I would be tempted to explicitly disable preemption
> > around the call to the tracepoint callback for the rcuidle variant,
> > unless we plan to audit every tracer right away to remove any assumption
> > that preemption is disabled in the callback implementation.
> 
> I'm thinking that we do that audit. There shouldn't be many instances
> of it. I like the idea that a tracepoint callback gets called with
> preemption enabled.

Are you really sure you want to increase your state space that much?

Thanx, Paul



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Paul E. McKenney
On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> Mathieu Desnoyers  wrote:
> 
> > The general approach and the implementation look fine, except for
> > one small detail: I would be tempted to explicitly disable preemption
> > around the call to the tracepoint callback for the rcuidle variant,
> > unless we plan to audit every tracer right away to remove any assumption
> > that preemption is disabled in the callback implementation.
> 
> I'm thinking that we do that audit. There shouldn't be many instances
> of it. I like the idea that a tracepoint callback gets called with
> preemption enabled.

Are you really sure you want to increase your state space that much?

Thanx, Paul



Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
Mathieu Desnoyers  wrote:

> The general approach and the implementation look fine, except for
> one small detail: I would be tempted to explicitly disable preemption
> around the call to the tracepoint callback for the rcuidle variant,
> unless we plan to audit every tracer right away to remove any assumption
> that preemption is disabled in the callback implementation.

I'm thinking that we do that audit. There shouldn't be many instances
of it. I like the idea that a tracepoint callback gets called with
preemption enabled.

-- Steve


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Steven Rostedt
On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
Mathieu Desnoyers  wrote:

> The general approach and the implementation look fine, except for
> one small detail: I would be tempted to explicitly disable preemption
> around the call to the tracepoint callback for the rcuidle variant,
> unless we plan to audit every tracer right away to remove any assumption
> that preemption is disabled in the callback implementation.

I'm thinking that we do that audit. There shouldn't be many instances
of it. I like the idea that a tracepoint callback gets called with
preemption enabled.

-- Steve


Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Mathieu Desnoyers
- On Apr 27, 2018, at 12:26 AM, Joel Fernandes joe...@google.com wrote:

> In recent tests with IRQ on/off tracepoints, a large performance
> overhead ~10% is noticed when running hackbench. This is root caused to
> calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
> tracepoint code. Following a long discussion on the list [1] about this,
> we concluded that srcu is a better alternative for use during rcu idle.
> Although it does involve extra barriers, its lighter than the sched-rcu
> version which has to do additional RCU calls to notify RCU idle about
> entry into RCU sections.
> 
> In this patch, we change the underlying implementation of the
> trace_*_rcuidle API to use SRCU. This has shown to improve performance
> alot for the high frequency irq enable/disable tracepoints.

The general approach and the implementation look fine, except for
one small detail: I would be tempted to explicitly disable preemption
around the call to the tracepoint callback for the rcuidle variant,
unless we plan to audit every tracer right away to remove any assumption
that preemption is disabled in the callback implementation.

That would be the main difference between an eventual "may_sleep" tracepoint
and a rcuidle tracepoint: "may_sleep" would use SRCU and leave preemption
enabled when invoking the callback. rcuidle uses SRCU too, but would
disable preemption when invoking the callback.

Thoughts ?

Thanks,

Mathieu


> 
> In the future, we can add a new may_sleep API which can use this
> infrastructure for callbacks that actually can sleep which will support
> Mathieu's usecase of blocking probes.
> 
> Test: Tested idle and preempt/irq tracepoints.
> 
> [1] https://patchwork.kernel.org/patch/10344297/
> 
> Cc: Steven Rostedt 
> Cc: Peter Zilstra 
> Cc: Ingo Molnar 
> Cc: Mathieu Desnoyers 
> Cc: Tom Zanussi 
> Cc: Namhyung Kim 
> Cc: Thomas Glexiner 
> Cc: Boqun Feng 
> Cc: Paul McKenney 
> Cc: Frederic Weisbecker 
> Cc: Randy Dunlap 
> Cc: Masami Hiramatsu 
> Cc: Fenguang Wu 
> Cc: Baohong Liu 
> Cc: Vedang Patel 
> Cc: kernel-t...@android.com
> Signed-off-by: Joel Fernandes 
> ---
> include/linux/tracepoint.h | 37 +
> kernel/tracepoint.c| 10 +-
> 2 files changed, 38 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index c94f466d57ef..a1c1987de423 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -15,6 +15,7 @@
>  */
> 
> #include 
> +#include 
> #include 
> #include 
> #include 
> @@ -33,6 +34,8 @@ struct trace_eval_map {
> 
> #define TRACEPOINT_DEFAULT_PRIO   10
> 
> +extern struct srcu_struct tracepoint_srcu;
> +
> extern int
> tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
> extern int
> @@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct
> notifier_block *nb)
>  */
> static inline void tracepoint_synchronize_unregister(void)
> {
> + synchronize_srcu(_srcu);
>   synchronize_sched();
> }
> 
> @@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
>  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
>  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, 
> proto".
>  */
> -#define __DO_TRACE(tp, proto, args, cond, rcucheck)  \
> +#define __DO_TRACE(tp, proto, args, cond, preempt_on)
> \
>   do {\
>   struct tracepoint_func *it_func_ptr;\
>   void *it_func;  \
>   void *__data;   \
> + int __maybe_unused idx = 0; \
>   \
>   if (!(cond))\
>   return; \
> - if (rcucheck)   \
> - rcu_irq_enter_irqson(); \
> - rcu_read_lock_sched_notrace();  \
> - it_func_ptr = rcu_dereference_sched((tp)->funcs);   \
> + if (preempt_on) {   \
> + WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \
> + idx = srcu_read_lock(_srcu); \
> + it_func_ptr = srcu_dereference((tp)->funcs, \
> +

Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-27 Thread Mathieu Desnoyers
- On Apr 27, 2018, at 12:26 AM, Joel Fernandes joe...@google.com wrote:

> In recent tests with IRQ on/off tracepoints, a large performance
> overhead ~10% is noticed when running hackbench. This is root caused to
> calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
> tracepoint code. Following a long discussion on the list [1] about this,
> we concluded that srcu is a better alternative for use during rcu idle.
> Although it does involve extra barriers, its lighter than the sched-rcu
> version which has to do additional RCU calls to notify RCU idle about
> entry into RCU sections.
> 
> In this patch, we change the underlying implementation of the
> trace_*_rcuidle API to use SRCU. This has shown to improve performance
> alot for the high frequency irq enable/disable tracepoints.

The general approach and the implementation look fine, except for
one small detail: I would be tempted to explicitly disable preemption
around the call to the tracepoint callback for the rcuidle variant,
unless we plan to audit every tracer right away to remove any assumption
that preemption is disabled in the callback implementation.

That would be the main difference between an eventual "may_sleep" tracepoint
and a rcuidle tracepoint: "may_sleep" would use SRCU and leave preemption
enabled when invoking the callback. rcuidle uses SRCU too, but would
disable preemption when invoking the callback.

Thoughts ?

Thanks,

Mathieu


> 
> In the future, we can add a new may_sleep API which can use this
> infrastructure for callbacks that actually can sleep which will support
> Mathieu's usecase of blocking probes.
> 
> Test: Tested idle and preempt/irq tracepoints.
> 
> [1] https://patchwork.kernel.org/patch/10344297/
> 
> Cc: Steven Rostedt 
> Cc: Peter Zilstra 
> Cc: Ingo Molnar 
> Cc: Mathieu Desnoyers 
> Cc: Tom Zanussi 
> Cc: Namhyung Kim 
> Cc: Thomas Glexiner 
> Cc: Boqun Feng 
> Cc: Paul McKenney 
> Cc: Frederic Weisbecker 
> Cc: Randy Dunlap 
> Cc: Masami Hiramatsu 
> Cc: Fenguang Wu 
> Cc: Baohong Liu 
> Cc: Vedang Patel 
> Cc: kernel-t...@android.com
> Signed-off-by: Joel Fernandes 
> ---
> include/linux/tracepoint.h | 37 +
> kernel/tracepoint.c| 10 +-
> 2 files changed, 38 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index c94f466d57ef..a1c1987de423 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -15,6 +15,7 @@
>  */
> 
> #include 
> +#include 
> #include 
> #include 
> #include 
> @@ -33,6 +34,8 @@ struct trace_eval_map {
> 
> #define TRACEPOINT_DEFAULT_PRIO   10
> 
> +extern struct srcu_struct tracepoint_srcu;
> +
> extern int
> tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
> extern int
> @@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct
> notifier_block *nb)
>  */
> static inline void tracepoint_synchronize_unregister(void)
> {
> + synchronize_srcu(_srcu);
>   synchronize_sched();
> }
> 
> @@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
>  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
>  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, 
> proto".
>  */
> -#define __DO_TRACE(tp, proto, args, cond, rcucheck)  \
> +#define __DO_TRACE(tp, proto, args, cond, preempt_on)
> \
>   do {\
>   struct tracepoint_func *it_func_ptr;\
>   void *it_func;  \
>   void *__data;   \
> + int __maybe_unused idx = 0; \
>   \
>   if (!(cond))\
>   return; \
> - if (rcucheck)   \
> - rcu_irq_enter_irqson(); \
> - rcu_read_lock_sched_notrace();  \
> - it_func_ptr = rcu_dereference_sched((tp)->funcs);   \
> + if (preempt_on) {   \
> + WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \
> + idx = srcu_read_lock(_srcu); \
> + it_func_ptr = srcu_dereference((tp)->funcs, \
> + _srcu);  \
> + } else {\
> + rcu_read_lock_sched_notrace();  \
> + it_func_ptr =   \
> + rcu_dereference_sched((tp)->funcs); \
> + }  

[PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-26 Thread Joel Fernandes
In recent tests with IRQ on/off tracepoints, a large performance
overhead ~10% is noticed when running hackbench. This is root caused to
calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
tracepoint code. Following a long discussion on the list [1] about this,
we concluded that srcu is a better alternative for use during rcu idle.
Although it does involve extra barriers, its lighter than the sched-rcu
version which has to do additional RCU calls to notify RCU idle about
entry into RCU sections.

In this patch, we change the underlying implementation of the
trace_*_rcuidle API to use SRCU. This has shown to improve performance
alot for the high frequency irq enable/disable tracepoints.

In the future, we can add a new may_sleep API which can use this
infrastructure for callbacks that actually can sleep which will support
Mathieu's usecase of blocking probes.

Test: Tested idle and preempt/irq tracepoints.

[1] https://patchwork.kernel.org/patch/10344297/

Cc: Steven Rostedt 
Cc: Peter Zilstra 
Cc: Ingo Molnar 
Cc: Mathieu Desnoyers 
Cc: Tom Zanussi 
Cc: Namhyung Kim 
Cc: Thomas Glexiner 
Cc: Boqun Feng 
Cc: Paul McKenney 
Cc: Frederic Weisbecker 
Cc: Randy Dunlap 
Cc: Masami Hiramatsu 
Cc: Fenguang Wu 
Cc: Baohong Liu 
Cc: Vedang Patel 
Cc: kernel-t...@android.com
Signed-off-by: Joel Fernandes 
---
 include/linux/tracepoint.h | 37 +
 kernel/tracepoint.c| 10 +-
 2 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index c94f466d57ef..a1c1987de423 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -15,6 +15,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -33,6 +34,8 @@ struct trace_eval_map {
 
 #define TRACEPOINT_DEFAULT_PRIO10
 
+extern struct srcu_struct tracepoint_srcu;
+
 extern int
 tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
 extern int
@@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct 
notifier_block *nb)
  */
 static inline void tracepoint_synchronize_unregister(void)
 {
+   synchronize_srcu(_srcu);
synchronize_sched();
 }
 
@@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
  */
-#define __DO_TRACE(tp, proto, args, cond, rcucheck)\
+#define __DO_TRACE(tp, proto, args, cond, preempt_on)  \
do {\
struct tracepoint_func *it_func_ptr;\
void *it_func;  \
void *__data;   \
+   int __maybe_unused idx = 0; \
\
if (!(cond))\
return; \
-   if (rcucheck)   \
-   rcu_irq_enter_irqson(); \
-   rcu_read_lock_sched_notrace();  \
-   it_func_ptr = rcu_dereference_sched((tp)->funcs);   \
+   if (preempt_on) {   \
+   WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \
+   idx = srcu_read_lock(_srcu); \
+   it_func_ptr = srcu_dereference((tp)->funcs, \
+   _srcu);  \
+   } else {\
+   rcu_read_lock_sched_notrace();  \
+   it_func_ptr =   \
+   rcu_dereference_sched((tp)->funcs); \
+   }   \
+   \
if (it_func_ptr) {  \
do {\
it_func = (it_func_ptr)->func;  \
@@ -148,12 +160,21 @@ extern void syscall_unregfunc(void);
((void(*)(proto))(it_func))(args);  \
   

[PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

2018-04-26 Thread Joel Fernandes
In recent tests with IRQ on/off tracepoints, a large performance
overhead ~10% is noticed when running hackbench. This is root caused to
calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
tracepoint code. Following a long discussion on the list [1] about this,
we concluded that srcu is a better alternative for use during rcu idle.
Although it does involve extra barriers, its lighter than the sched-rcu
version which has to do additional RCU calls to notify RCU idle about
entry into RCU sections.

In this patch, we change the underlying implementation of the
trace_*_rcuidle API to use SRCU. This has shown to improve performance
alot for the high frequency irq enable/disable tracepoints.

In the future, we can add a new may_sleep API which can use this
infrastructure for callbacks that actually can sleep which will support
Mathieu's usecase of blocking probes.

Test: Tested idle and preempt/irq tracepoints.

[1] https://patchwork.kernel.org/patch/10344297/

Cc: Steven Rostedt 
Cc: Peter Zilstra 
Cc: Ingo Molnar 
Cc: Mathieu Desnoyers 
Cc: Tom Zanussi 
Cc: Namhyung Kim 
Cc: Thomas Glexiner 
Cc: Boqun Feng 
Cc: Paul McKenney 
Cc: Frederic Weisbecker 
Cc: Randy Dunlap 
Cc: Masami Hiramatsu 
Cc: Fenguang Wu 
Cc: Baohong Liu 
Cc: Vedang Patel 
Cc: kernel-t...@android.com
Signed-off-by: Joel Fernandes 
---
 include/linux/tracepoint.h | 37 +
 kernel/tracepoint.c| 10 +-
 2 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index c94f466d57ef..a1c1987de423 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -15,6 +15,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -33,6 +34,8 @@ struct trace_eval_map {
 
 #define TRACEPOINT_DEFAULT_PRIO10
 
+extern struct srcu_struct tracepoint_srcu;
+
 extern int
 tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
 extern int
@@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct 
notifier_block *nb)
  */
 static inline void tracepoint_synchronize_unregister(void)
 {
+   synchronize_srcu(_srcu);
synchronize_sched();
 }
 
@@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
  * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
  * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
  */
-#define __DO_TRACE(tp, proto, args, cond, rcucheck)\
+#define __DO_TRACE(tp, proto, args, cond, preempt_on)  \
do {\
struct tracepoint_func *it_func_ptr;\
void *it_func;  \
void *__data;   \
+   int __maybe_unused idx = 0; \
\
if (!(cond))\
return; \
-   if (rcucheck)   \
-   rcu_irq_enter_irqson(); \
-   rcu_read_lock_sched_notrace();  \
-   it_func_ptr = rcu_dereference_sched((tp)->funcs);   \
+   if (preempt_on) {   \
+   WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */  \
+   idx = srcu_read_lock(_srcu); \
+   it_func_ptr = srcu_dereference((tp)->funcs, \
+   _srcu);  \
+   } else {\
+   rcu_read_lock_sched_notrace();  \
+   it_func_ptr =   \
+   rcu_dereference_sched((tp)->funcs); \
+   }   \
+   \
if (it_func_ptr) {  \
do {\
it_func = (it_func_ptr)->func;  \
@@ -148,12 +160,21 @@ extern void syscall_unregfunc(void);
((void(*)(proto))(it_func))(args);  \
} while ((++it_func_ptr)->func);\
}   \
-   rcu_read_unlock_sched_notrace();\
-   if (rcucheck)   \
-   rcu_irq_exit_irqson();  \
+