Re: [tip:core/rcu] rcutorture: Make initrd/init execute in userspace

2018-12-05 Thread Josh Triplett
On Wed, Dec 05, 2018 at 04:08:09PM -0800, Paul E. McKenney wrote:
> On Wed, Dec 05, 2018 at 02:25:24PM -0800, Josh Triplett wrote:
> > On Tue, Dec 04, 2018 at 03:04:23PM -0800, Paul E. McKenney wrote:
> > > On Tue, Dec 04, 2018 at 02:24:13PM -0800, Josh Triplett wrote:
> > > > On Tue, Dec 04, 2018 at 02:09:42PM -0800, tip-bot for Paul E. McKenney wrote:
> > > > > --- a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
> > > > > +++ b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
> > > > > @@ -39,9 +39,22 @@ mkdir $T
> > > > >  
> > > > >  cat > $T/init << '__EOF___'
> > > > >  #!/bin/sh
> > > > > +# Run in userspace a few milliseconds every second.  This helps to
> > > > > +# exercise the NO_HZ_FULL portions of RCU.
> > > > >  while :
> > > > >  do
> > > > > - sleep 100
> > > > > + q=
> > > > > + for i in \
> > > > > + a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
> > > > > + a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
> > > > > + a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
> > > > > + a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
> > > > > + a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
> > > > > + a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
> > > > 
> > > > Ow. If there's no better way to do this, please do at least comment how many 'a's this is. (And why 186, exactly?)
> > > 
> > > Yeah, that is admittedly a bit strange.  The reason for 186 occurrences of
> > > "a" is one-time calibration, measuring a few milliseconds' worth of delay.
> > > 
> > > > Please also consider calibrating the delay loop as you do in the C code.
> > > 
> > > Good point.  And a quick web search finds me "date '+%s%N'", which gives
> > > me nanoseconds since the epoch.  I probably don't want to do a 2038 to
> > > myself (after all, I might still be alive then), so I should probably try
> > > to make something work with "date '+%N'".  Or use something like this:
> > > 
> > >   $ date '+%4N'; date '+%4N';date '+%4N'; date '+%4N'
> > >   6660
> > >   6685
> > >   6697
> > >   6710
> > > 
> > > Ah, but that means I need to add the "date" command to my initrd, doesn't
> > > it?  And calculation requires either bash or the "test" command.  And it
> > > would be quite good to restrict this to what can be done with Bourne shell
> > > built-in commands, since a big point of this is to maintain a small-sized
> > > initrd.  :-/
> > 
> > Sure, and I'm not suggesting adding commands to the initrd, hence my
> > mention of "If there's no better way".
> > 
> > > So how about the following patch, which attempts to explain the situation?
> > 
> > That would help, but please also consider consolidating with something
> > like a10="a a a a a a a a a a" to make it more readable (and perhaps
> > rounding up to 200 for simplicity).
> 
> How about powers of four and one factor of three for 192, as shown below?

Perfect, thanks. That's much better.

Reviewed-by: Josh Triplett 

>   Thanx, Paul
> 
> 
> 
> commit 4f8f751961b536f77c8f82394963e8e2d26efd84
> Author: Paul E. McKenney 
> Date:   Tue Dec 4 14:59:12 2018 -0800
> 
> torture: Explain and simplify odd "for" loop in mkinitrd.sh
> 
> Why a Bourne-shell "for" loop?  And why 192 instances of "a"?  This commit
> adds a shell comment to present the answer to these mysteries.  It also
> uses a series of factor-of-four Bourne-shell assignments to make it
> easy to see how many instances there are, replacing the earlier wall of
> 'a' characters.
> 
> Reported-by: Josh Triplett 
> Signed-off-by: Paul E. McKenney 
> 
> diff --git a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
> index da298394daa2..ff69190604ea 100755
> --- a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
> +++ b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
> @@ -40,17 +40,24 @@ mkdir $T
>  cat > $T/init << '__EOF___'
>  #!/bin/sh
>  # Run in userspace a few milliseconds every second.  This helps to
> -# exercise the NO_HZ_FULL portions of RCU.
> +# exercise the NO_HZ_FULL portions of RCU.  The 192 instances of "a" was
> +# empirically shown to give a nice multi-millisecond burst of user-mode
> +# execution on a 2GHz CPU, as desired.  Modern CPUs will vary from a
> +# couple of milliseconds up to perhaps 100 milliseconds, which is an
> +# acceptable range.
> +#
> +# Why not calibrate an exact delay?  Because within this initrd, we
> +# are restricted to Bourne-shell builtins, which as far as I know do not
> +# provide any means of obtaining a fine-grained timestamp.
> +
> +a4="a a a a"
> +a16="$a4 $a4 $a4 $a4"
> 

Re: [PATCH v2] binder: fix use-after-free due to fdget() optimization

2018-12-05 Thread Todd Kjos
On Wed, Dec 5, 2018 at 2:00 PM Al Viro  wrote:
>
> On Wed, Dec 05, 2018 at 01:16:01PM -0800, Todd Kjos wrote:
> > 44d8047f1d87a ("binder: use standard functions to allocate fds")
> > exposed a pre-existing issue in the binder driver.
> >
> > fdget() is used in ksys_ioctl() as a performance optimization.
> > One of the rules associated with fdget() is that ksys_close() must
> > not be called between the fdget() and the fdput(). There is a case
> > where this requirement is not met in the binder driver (and possibly
> > other drivers) which results in the reference count dropping to 0
> > when the device is still in use. This can result in use-after-free
> > or other issues.
> >
> > This was observed with the following sequence of events:
> >
> > Task A and task B are connected via binder; task A has /dev/binder open at
> > file descriptor number X. Both tasks are single-threaded.
> >
> > 1. task B sends a binder message with a file descriptor array
> >(BINDER_TYPE_FDA) containing one file descriptor to task A
> > 2. task A reads the binder message with the translated file
> >descriptor number Y
> > 3. task A uses dup2(X, Y) to overwrite file descriptor Y with
> >the /dev/binder file
> > 4. task A unmaps the userspace binder memory mapping; the reference
> >count on task A's /dev/binder is now 2
> > 5. task A closes file descriptor X; the reference count on task
> >A's /dev/binder is now 1
> > 6. task A forks off a child, task C, duplicating the file descriptor
> >table; the reference count on task A's /dev/binder is now 2
> > 7. task A invokes the BC_FREE_BUFFER command on file descriptor X
> >to release the incoming binder message
> > 8. fdget() in ksys_ioctl() suppresses the reference count increment,
> >since the file descriptor table is not shared
> > 9. the BC_FREE_BUFFER handler removes the file descriptor table
> >entry for X and decrements the reference count of task A's
> >/dev/binder file to 1
> > 10.task C calls close(X), which drops the reference count of
> >task A's /dev/binder to 0 and frees it
> > 11.task A continues processing of the ioctl and accesses some
> >property of e.g. the binder_proc => KASAN-detectable UAF
> >
> > Fixed by using get_file() / fput() in binder_ioctl().
>
> Note that this patch does *not* remove the nasty trap caused by the garbage
> in question - struct file can be freed before we even return from
> ->unlocked_ioctl().  Could you describe in details the desired behaviour
> of this interface?

The ioctl(BC_FREE_BUFFER) frees the buffer memory associated with a
transaction that has completed processing in userspace. If the buffer
contains an FDA object (file-descriptor array), then it closes all of
the fds passed in the transaction using ksys_close(). In the case at
issue, the fd associated with the binder driver itself had been passed
in the array. Since the fdget() optimization didn't take a reference,
this leaves us vulnerable to the UAF described above: the rules for
fdget() are being violated (ksys_close() between fdget() and fdput()).
This change does prevent the final close during the handling of
BC_FREE_BUFFER but, as you point out, may still result in the final
close being processed prematurely, right after the new fput() (no
negative side-effects observed so far, but agreed this could be an
issue).
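
Schematically, the v2 fix amounts to the following (a sketch only, with
a hypothetical inner-handler name; not the literal patch):

    static long binder_ioctl(struct file *filp, unsigned int cmd,
                             unsigned long arg)
    {
            long ret;

            /*
             * Pin filp so that a ksys_close() performed while handling
             * BC_FREE_BUFFER cannot drop the last reference to this
             * struct file while the ioctl is still executing.
             */
            get_file(filp);
            ret = __binder_ioctl(filp, cmd, arg);   /* hypothetical */
            fput(filp);
            return ret;
    }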

>
> How about grabbing the references to all victims (*before* screwing with
> ksys_close()), sticking them into a structure with embedded callback_head
> and using task_work_add() on it, the callback doing those fput()?
>
> The callback would trigger before the return to userland, so observable
> timing of the final close wouldn't be changed.  And it would avoid the
> kludges like this.

I'll rework it according to your suggestion. I had hoped to do this in a way
that doesn't require adding calls to non-exported functions since we are
trying to clean up binder (I hear you snickering) to be a better citizen and
not rely on internal functions that drivers shouldn't be using. I presume
there are no plans to export task_work_add()...
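
For reference, a minimal sketch of the pattern Al describes, assuming
the 2018-era task_work_add() signature and using illustrative names:

    struct binder_deferred_fput {
            struct callback_head cb;
            struct file *file;      /* a real version would hold all victims */
    };

    static void binder_deferred_fput_fn(struct callback_head *cb)
    {
            struct binder_deferred_fput *w =
                    container_of(cb, struct binder_deferred_fput, cb);

            /* runs before the return to userland, preserving close timing */
            fput(w->file);
            kfree(w);
    }

    /* in the BC_FREE_BUFFER path, before any ksys_close(): */
    w->file = get_file(file);
    init_task_work(&w->cb, binder_deferred_fput_fn);
    task_work_add(current, &w->cb, true);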

>
> Of course, the proper fix would require TARDIS and set of instruments for
> treating severe case of retrocranial inversion, so that this "ABI" would've
> never existed, but...

There are indeed many things about the binder interface we'd do differently
if we had the chance to start over...

-Todd


Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread David Rientjes
On Wed, 5 Dec 2018, Andrea Arcangeli wrote:

> __GFP_COMPACT_ONLY gave some hope that it could provide a middle
> ground, but it shows awful compaction results: it basically destroys
> compaction effectiveness, and we know why (COMPACT_SKIPPED must call
> reclaim, or compaction can't succeed because there's not enough free
> memory in the node). If somebody uses MADV_HUGEPAGE, compaction should
> still work and not fail like that. Compaction would fail to be
> effective even in the local node where __GFP_THISNODE didn't fail.
> Worst of all, it would fail even on non-NUMA systems (though that would
> be easy to fix by making the HPAGE_PMD_ORDER check conditional on NUMA
> being enabled at runtime).
> 

Note that in addition to COMPACT_SKIPPED, which you mention, compaction
can fail with COMPACT_COMPLETE, meaning the full scan has finished
without freeing a hugepage, or COMPACT_DEFERRED, meaning that doing
another scan is unlikely to produce a different result. For
COMPACT_SKIPPED it makes sense to do reclaim, provided the freed memory
becomes accessible to isolate_freepages() and another allocator does not
grab the newly freed pages before compaction can scan the zone again.
For COMPACT_COMPLETE and COMPACT_DEFERRED, reclaim is unlikely to ever
help.
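
Schematically, the policy this implies (the helper name is illustrative,
not an existing kernel function):

    /* Reclaim is only worth retrying when compaction was skipped. */
    static bool reclaim_may_help(enum compact_result result)
    {
            switch (result) {
            case COMPACT_SKIPPED:   /* too few free pages to even try */
                    return true;
            case COMPACT_COMPLETE:  /* full scan freed no hugepage */
            case COMPACT_DEFERRED:  /* recent failures; rescan is pointless */
            default:
                    return false;
            }
    }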


Re: [PATCH 0/5] Add new features for SC27XX fuel gauge driver

2018-12-05 Thread Sebastian Reichel
Hi,

On Wed, Nov 14, 2018 at 05:07:03PM +0800, Baolin Wang wrote:
> This patch set adds some new features for SC27XX fuel gauge driver.
> 
> 1. Read calibration data from eFuse device to calibrate fuel gauge.
> 2. Add low voltage alarm to adjust the battery capacity in lower
> voltage stage.
> 3. Add power management interfaces
> 4. Save last optimized battery capacity to be used as the initial
> battery capacity if system is not first power-on.
> 
> Baolin Wang (2):
>   dt-bindings: power: supply: Add nvmem properties to calibrate FGU
>   power: supply: sc27xx: Add fuel gauge calibration
> 
> Yuanjiang Yu (3):
>   power: supply: sc27xx: Add fuel gauge low voltage alarm
>   power: supply: sc27xx: Add suspend/resume interfaces
>   power: supply: sc27xx: Save last battery capacity
> 
>  .../devicetree/bindings/power/supply/sc27xx-fg.txt |4 +
>  drivers/power/supply/sc27xx_fuel_gauge.c   |  453 +++-
>  2 files changed, 444 insertions(+), 13 deletions(-)

I applied patches 1-4 and skipped patch 5 due to pending changes.

-- Sebastian


Re: [RFC PATCH 02/14] mm/hms: heterogenenous memory system (HMS) documentation

2018-12-05 Thread Dan Williams
On Wed, Dec 5, 2018 at 3:27 PM Jerome Glisse  wrote:
>
> On Wed, Dec 05, 2018 at 04:23:42PM -0700, Logan Gunthorpe wrote:
> >
> >
> > On 2018-12-05 4:20 p.m., Jerome Glisse wrote:
> > > > And my proposal is under /sys/bus, with symlinks to all the existing
> > > > devices it aggregates in there.
> >
> > That's so not the point. Use the existing buses; don't invent some
> > virtual tree. I don't know how many times I have to say this or in how
> > many ways. I'm not responding anymore.
>
> And how do I express interaction with different buses? I just do not
> see how to do that in the existing scheme. It would be like teaching
> each bus about all the other buses, versus having each bus register
> itself under a common framework and having all the interaction between
> buses mediated through that common framework, avoiding code duplication
> across buses.
>
> >
> > > So you agree with my proposal? A sysfs directory describing all the
> > > buses, how they are connected to each other, and what is connected
> > > to each of them (device, CPU, memory).
> >
> > I'm fine with the motivation. What I'm arguing against is the
> > implementation and the fact you have to create a whole grand new
> > userspace API and hierarchy to accomplish it.

Right, GPUs show up in /sys today. Don't register a whole new
hierarchy as an alias to what already exists, add a new attribute
scheme to the existing hierarchy. This is what the HMAT enabling is
doing, this is what p2pdma is doing.
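
Concretely, that means hanging new attributes off the device nodes that
already exist, roughly like this (names and values illustrative only):

    static ssize_t initiator_bandwidth_show(struct device *dev,
                                            struct device_attribute *attr,
                                            char *buf)
    {
            /* placeholder; a real driver would report HMAT/measured data */
            return sprintf(buf, "%u\n", 0);
    }
    static DEVICE_ATTR_RO(initiator_bandwidth);

    /* attach to the device's existing sysfs directory */
    ret = device_create_file(dev, &dev_attr_initiator_bandwidth);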


Re: [tip:core/rcu] rcutorture: Make initrd/init execute in userspace

2018-12-05 Thread Paul E. McKenney
On Wed, Dec 05, 2018 at 02:25:24PM -0800, Josh Triplett wrote:
> On Tue, Dec 04, 2018 at 03:04:23PM -0800, Paul E. McKenney wrote:
> > On Tue, Dec 04, 2018 at 02:24:13PM -0800, Josh Triplett wrote:
> > > On Tue, Dec 04, 2018 at 02:09:42PM -0800, tip-bot for Paul E. McKenney wrote:
> > > > --- a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
> > > > +++ b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
> > > > @@ -39,9 +39,22 @@ mkdir $T
> > > >  
> > > >  cat > $T/init << '__EOF___'
> > > >  #!/bin/sh
> > > > +# Run in userspace a few milliseconds every second.  This helps to
> > > > +# exercise the NO_HZ_FULL portions of RCU.
> > > >  while :
> > > >  do
> > > > -   sleep 100
> > > > +   q=
> > > > +   for i in \
> > > > +   a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
> > > > +   a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
> > > > +   a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
> > > > +   a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
> > > > +   a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
> > > > +   a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
> > > 
> > > Ow. If there's no better way to do this, please do at least comment how many 'a's this is. (And why 186, exactly?)
> > 
> > Yeah, that is admittedly a bit strange.  The reason for 186 occurrences of
> > "a" is one-time calibration, measuring a few milliseconds' worth of delay.
> > 
> > > Please also consider calibrating the delay loop as you do in the C code.
> > 
> > Good point.  And a quick web search finds me "date '+%s%N'", which gives
> > me nanoseconds since the epoch.  I probably don't want to do a 2038 to
> > myself (after all, I might still be alive then), so I should probably try
> > to make something work with "date '+%N'".  Or use something like this:
> > 
> > $ date '+%4N'; date '+%4N';date '+%4N'; date '+%4N'
> > 6660
> > 6685
> > 6697
> > 6710
> > 
> > Ah, but that means I need to add the "date" command to my initrd, doesn't
> > it?  And calculation requires either bash or the "test" command.  And it
> > would be quite good to restrict this to what can be done with Bourne shell
> > built-in commands, since a big point of this is to maintain a small-sized
> > initrd.  :-/
> 
> Sure, and I'm not suggesting adding commands to the initrd, hence my
> mention of "If there's no better way".
> 
> > So how about the following patch, which attempts to explain the situation?
> 
> That would help, but please also consider consolidating with something
> like a10="a a a a a a a a a a" to make it more readable (and perhaps
> rounding up to 200 for simplicity).

How about powers of four and one factor of three for 192, as shown below?

Thanx, Paul



commit 4f8f751961b536f77c8f82394963e8e2d26efd84
Author: Paul E. McKenney 
Date:   Tue Dec 4 14:59:12 2018 -0800

torture: Explain and simplify odd "for" loop in mkinitrd.sh

Why a Bourne-shell "for" loop?  And why 192 instances of "a"?  This commit
adds a shell comment to present the answer to these mysteries.  It also
uses a series of factor-of-four Bourne-shell assignments to make it
easy to see how many instances there are, replacing the earlier wall of
'a' characters.

Reported-by: Josh Triplett 
Signed-off-by: Paul E. McKenney 

diff --git a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
index da298394daa2..ff69190604ea 100755
--- a/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
+++ b/tools/testing/selftests/rcutorture/bin/mkinitrd.sh
@@ -40,17 +40,24 @@ mkdir $T
 cat > $T/init << '__EOF___'
 #!/bin/sh
 # Run in userspace a few milliseconds every second.  This helps to
-# exercise the NO_HZ_FULL portions of RCU.
+# exercise the NO_HZ_FULL portions of RCU.  The 192 instances of "a" was
+# empirically shown to give a nice multi-millisecond burst of user-mode
+# execution on a 2GHz CPU, as desired.  Modern CPUs will vary from a
+# couple of milliseconds up to perhaps 100 milliseconds, which is an
+# acceptable range.
+#
+# Why not calibrate an exact delay?  Because within this initrd, we
+# are restricted to Bourne-shell builtins, which as far as I know do not
+# provide any means of obtaining a fine-grained timestamp.
+
+a4="a a a a"
+a16="$a4 $a4 $a4 $a4"
+a64="$a16 $a16 $a16 $a16"
+a192="$a64 $a64 $a64"
 while :
 do
q=
-   for i in \
-   a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
-   a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \
-   a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a \

Re: [PATCH v7 08/14] x86/ftrace: Use text_poke_*() infrastructure

2018-12-05 Thread Nadav Amit
> On Dec 4, 2018, at 5:34 PM, Nadav Amit  wrote:
> 
> A following patch is going to make module allocated memory
> non-executable. This requires to modify ftrace and make the memory
> executable again after it is configured.
> 
> In addition, this patch makes ftrace use the general text poking
> infrastructure instead ftrace's homegrown text patching. This provides
> the advantages of having slightly "safer" code patching and avoiding
> races with module removal or other mechanisms that patch the kernel
> code.
> 
> Cc: Steven Rostedt 
> Signed-off-by: Nadav Amit 
> ---
> arch/x86/kernel/ftrace.c | 74 +---
> 1 file changed, 23 insertions(+), 51 deletions(-)

Steven Rostedt pointed out that using text_poke() instead of
probe_kernel_write() would introduce considerable overhead. Running:

  # time { echo function > current_tracer; } 

takes 0.24s without this patch and 0.7s with. I don’t know whether to
consider it “so bad”. Obviously we can introduce a batching mechanism and/or
do some micro-optimization (the latter will not buy us much though).

Anyhow, in the meantime Steven asked that we leave out the changes in
this patch set, except for the set_memory_x() that we need after calling
module_alloc(), and revisit them later.



[PATCH v3 0/2] Update AMBA driver for enhanced component ID spec.

2018-12-05 Thread Mike Leach
The latest ARM CoreSight specification updates the component identification
requirements for all components attached to an AMBA bus. (ARM IHI 0029E)

This specification defines bits 15:12 in the ComponentID (CID) value as the
device class. Identification requirements now depend on this class.
Class 0xF: Traditional components identified by Peripheral ID (PID) only.
Class 0x9: CoreSight components may be identified by a Universal Component
Identifier (UCI) consisting of the PID plus CoreSight DevType and DevArch
values.

Current and future ARM CoreSight IP will now use the same PID for
components on the same function - e.g. the ETM, CTI, PMU and Debug elements
associated with a core. The first core to use this UCI method is the A35,
which currently has binding entries in the ETMv4 driver.

This patchset prepares for the addition of the upcoming CTI driver, which
will need to correctly bind with A35 and future hardware, while overcoming
the limitation of binding by PID alone, which cannot now work.

The patchset updates the current AMBA identification mechanism, which
was already differentiating between 0xF and 0x9 CIDs, to add additional
UCI-compliant tests for the 0x9 device class.

Additional UCI structures are provided and added to the ETMv4 driver as
appropriate.

Changes since v2:
Simplification of amba_cs_uci_id_match().
Fix CID class bitfield comments.
Dropped RFC tag on patchset.

Mike Leach (2):
  drivers: amba: Updates to component identification for driver
matching.
  coresight: etmv4: Update ID register table to add  UCI support

 drivers/amba/bus.c| 45 +++
 drivers/hwtracing/coresight/coresight-etm4x.c | 18 +++-
 include/linux/amba/bus.h  | 32 +
 3 files changed, 86 insertions(+), 9 deletions(-)

-- 
2.19.1



[PATCH v3 2/2] coresight: etmv4: Update ID register table to add UCI support

2018-12-05 Thread Mike Leach
Updates the ID register tables to contain a UCI entry for the A35 ETM
device, to allow correct matching of the driver in the AMBA bus code.

Signed-off-by: Mike Leach 
---
 drivers/hwtracing/coresight/coresight-etm4x.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
index 53e2fb6e86f6..2fb8054e43ab 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -1073,12 +1073,28 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id)
.mask   = 0x000f,   \
}
 
+static struct amba_cs_uci_id uci_id_etm4[] = {
+   {
+   /*  ETMv4 UCI data */
+   .devarch= 0x47704a13,
+   .devarch_mask   = 0xfff0,
+   .devtype= 0x0013,
+   }
+};
+
+#define ETM4x_AMBA_UCI_ID(pid) \
+   {   \
+   .id = pid,  \
+   .mask   = 0x000f,   \
+   .data   = uci_id_etm4,  \
+   }
+
 static const struct amba_id etm4_ids[] = {
ETM4x_AMBA_ID(0x000bb95d),  /* Cortex-A53 */
ETM4x_AMBA_ID(0x000bb95e),  /* Cortex-A57 */
ETM4x_AMBA_ID(0x000bb95a),  /* Cortex-A72 */
ETM4x_AMBA_ID(0x000bb959),  /* Cortex-A73 */
-   ETM4x_AMBA_ID(0x000bb9da),  /* Cortex-A35 */
+   ETM4x_AMBA_UCI_ID(0x000bb9da),  /* Cortex-A35 */
{},
 };
 
-- 
2.19.1



[PATCH v3 1/2] drivers: amba: Updates to component identification for driver matching.

2018-12-05 Thread Mike Leach
The CoreSight specification (ARM IHI 0029E), updates the ID register
requirements for components on an AMBA bus, to cover both traditional
ARM Primecell type devices, and newer CoreSight and other components.

The Peripheral ID (PID) / Component ID (CID) pair is extended in certain
cases to uniquely identify components. CoreSight components related to
a single function can share Peripheral ID values, and must be further
identified using a Unique Component Identifier (UCI). e.g. the ETM, CTI,
PMU and Debug hardware of the A35 all share the same PID.

Bits 15:12 of the CID are defined to be the device class.
Class 0xF remains for PrimeCell and legacy components.
Class 0x9 defines the component as CoreSight (CORESIGHT_CID above)
Class 0x0, 0x1, 0xB, 0xE define components that do not have driver support
at present.
Class 0x2-0x8,0xA and 0xC-0xD are presently reserved.

The specification further defines which classes of device use the standard
CID/PID pair, and when additional ID registers are required.

The patches provide an update of amba_device and matching code to handle
the additional registers required for the Class 0x9 (CoreSight) UCI.
The *data pointer in the amba_id is used by the driver to provide extended
ID register values for matching.

CoreSight components where PID/CID pair is currently sufficient for
unique identification need not provide this additional information.

Signed-off-by: Mike Leach 
---
 drivers/amba/bus.c   | 45 +---
 include/linux/amba/bus.h | 32 
 2 files changed, 69 insertions(+), 8 deletions(-)

diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index 41b706403ef7..524296a0eba0 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -26,19 +26,36 @@
 
 #define to_amba_driver(d)  container_of(d, struct amba_driver, drv)
 
-static const struct amba_id *
-amba_lookup(const struct amba_id *table, struct amba_device *dev)
+/* called on periphid match and class 0x9 coresight device. */
+static int
+amba_cs_uci_id_match(const struct amba_id *table, struct amba_device *dev)
 {
int ret = 0;
+   struct amba_cs_uci_id *uci;
+
+   uci = table->data;
 
+   /* no table data - return match on periphid */
+   if (!uci)
+   return 1;
+
+   /* test against read devtype and masked devarch value */
+   ret = (dev->uci.devtype == uci->devtype) &&
+   ((dev->uci.devarch & uci->devarch_mask) == uci->devarch);
+   return ret;
+}
+
+static const struct amba_id *
+amba_lookup(const struct amba_id *table, struct amba_device *dev)
+{
while (table->mask) {
-   ret = (dev->periphid & table->mask) == table->id;
-   if (ret)
-   break;
+   if (((dev->periphid & table->mask) == table->id) &&
+   ((dev->cid != CORESIGHT_CID) ||
+    (amba_cs_uci_id_match(table, dev))))
+   return table;
table++;
}
-
-   return ret ? table : NULL;
+   return NULL;
 }
 
 static int amba_match(struct device *dev, struct device_driver *drv)
@@ -399,10 +416,22 @@ static int amba_device_try_add(struct amba_device *dev, struct resource *parent)
cid |= (readl(tmp + size - 0x10 + 4 * i) & 255) <<
(i * 8);
 
+   if (cid == CORESIGHT_CID) {
+   /* set the base to the start of the last 4k block */
+   void __iomem *csbase = tmp + size - 4096;
+
+   dev->uci.devarch =
+   readl(csbase + UCI_REG_DEVARCH_OFFSET);
+   dev->uci.devtype =
+   readl(csbase + UCI_REG_DEVTYPE_OFFSET) & 0xff;
+   }
+
amba_put_disable_pclk(dev);
 
-   if (cid == AMBA_CID || cid == CORESIGHT_CID)
+   if (cid == AMBA_CID || cid == CORESIGHT_CID) {
dev->periphid = pid;
+   dev->cid = cid;
+   }
 
if (!dev->periphid)
ret = -ENODEV;
diff --git a/include/linux/amba/bus.h b/include/linux/amba/bus.h
index d143c13bed26..8c0f392e4da2 100644
--- a/include/linux/amba/bus.h
+++ b/include/linux/amba/bus.h
@@ -25,6 +25,36 @@
 #define AMBA_CID   0xb105f00d
 #define CORESIGHT_CID  0xb105900d
 
+/*
+ * CoreSight Architecture specification updates the ID specification
+ * for components on the AMBA bus. (ARM IHI 0029E)
+ *
+ * Bits 15:12 of the CID are the device class.
+ *
+ * Class 0xF remains for PrimeCell and legacy components. (AMBA_CID above)
+ * Class 0x9 defines the component as CoreSight (CORESIGHT_CID above)
+ * Class 0x0, 0x1, 0xB, 0xE define components that do not have driver support
+ * at present.
+ * Class 0x2-0x8,0xA and 0xC-0xD are presently reserved.
+ *
+ * Remaining CID bits stay as 0xb105-00d
+ */
+
+/*
+ * Class 0x9 

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread Andrea Arcangeli
Hello,

On Wed, Dec 05, 2018 at 01:59:32PM -0800, David Rientjes wrote:
> [..] and the kernel test robot has reported, [..]

Just for completeness you may have missed one email:
https://lkml.kernel.org/r/87tvk1yjkp@yhuang-dev.intel.com

'So I think the report should have been a "performance
improvement" instead of "performance regression".'


Re: [PATCH v2 3/3] clk: qcom: gcc-msm8998: Add clkref clocks

2018-12-05 Thread Stephen Boyd
Quoting Bjorn Andersson (2018-12-03 10:33:30)
> Add clkref clocks for usb3, hdmi, ufs, pcie, and usb2. They are all
> sourced off CXO_IN, so parent them off "xo" until a proper link to the
> rpmcc can be described in DT.
> 
> Signed-off-by: Bjorn Andersson 
> ---

Applied to clk-next



Re: [PATCH v2 2/3] clk: qcom: gcc-msm8998: Disable halt check of UFS clocks

2018-12-05 Thread Stephen Boyd
Quoting Bjorn Andersson (2018-12-03 10:33:29)
> Drop the halt check of the UFS symbol clocks, in accordance with other
> platforms. This makes clk_disable_unused() happy and makes it possible
> to turn the clocks on again without an error.
> 
> Signed-off-by: Bjorn Andersson 
> ---

Applied to clk-next



Re: [PATCH v2 1/3] clk: qcom: gcc-msm8998: Drop hmss_dvm and lpass_at

2018-12-05 Thread Stephen Boyd
Quoting Bjorn Andersson (2018-12-03 10:33:28)
> Disabling gcc_hmss_dvm_bus_clk and gcc_lpass_at_clk causes the board to
> lock up, thereby preventing the kernel from booting without
> clk_ignore_unused.
> 
> gcc_hmss_dvm_bus_clk is marked always-on downstream, but not referenced,
> and gcc_lpass_at_clk isn't mentioned. So let's remove them until they
> are needed by some client.
> 
> Signed-off-by: Bjorn Andersson 
> ---

Applied to clk-next



[PATCH v2] Fonts: New Terminus large console font

2018-12-05 Thread Amanoel Dawod
This patch adds an option to compile-in a high resolution
and large Terminus (ter16x32) bitmap console font for use with
HiDPI and Retina screens.

The font was converted from the standard Terminus ter-i32b.psf
(size 16x32) with the help of psftools and minor hand editing
to delete unneeded characters.

This patch is non-intrusive: no options are enabled by default, so most
users won't notice a thing.

I am placing my changes under the GPL 2.0, the same license as the source Terminus font.

Signed-off-by: Amanoel Dawod 
---
Changes in v2:
- modified commit message
- fixed trailing whitespace errors

 include/linux/font.h  |4 +-
 lib/fonts/Kconfig |   10 +
 lib/fonts/Makefile|1 +
 lib/fonts/font_ter16x32.c | 2072 +
 lib/fonts/fonts.c |4 +
 5 files changed, 2090 insertions(+), 1 deletion(-)
 create mode 100644 lib/fonts/font_ter16x32.c

diff --git a/include/linux/font.h b/include/linux/font.h
index d6821769dd1e..51b91c8b69d5 100644
--- a/include/linux/font.h
+++ b/include/linux/font.h
@@ -32,6 +32,7 @@ struct font_desc {
 #define ACORN8x8_IDX   8
 #defineMINI4x6_IDX 9
 #define FONT6x10_IDX   10
+#define TER16x32_IDX   11
 
 extern const struct font_desc  font_vga_8x8,
font_vga_8x16,
@@ -43,7 +44,8 @@ extern const struct font_desc font_vga_8x8,
font_sun_12x22,
font_acorn_8x8,
font_mini_4x6,
-   font_6x10;
+   font_6x10,
+   font_ter_16x32;
 
 /* Find a font with a specific name */
 
diff --git a/lib/fonts/Kconfig b/lib/fonts/Kconfig
index 8fa0791e8a1e..3ecdd5204ec5 100644
--- a/lib/fonts/Kconfig
+++ b/lib/fonts/Kconfig
@@ -109,6 +109,15 @@ config FONT_SUN12x22
  big letters (like the letters used in the SPARC PROM). If the
  standard font is unreadable for you, say Y, otherwise say N.
 
+config FONT_TER16x32
+   bool "Terminus 16x32 font (not supported by all drivers)"
+   depends on FRAMEBUFFER_CONSOLE && (!SPARC && FONTS || SPARC)
+   help
+ Terminus Font is a clean, fixed width bitmap font, designed
+ for long (8 and more hours per day) work with computers.
+ This is the high resolution, large version for use with HiDPI screens.
+ If the standard font is unreadable for you, say Y, otherwise say N.
+
 config FONT_AUTOSELECT
def_bool y
depends on !FONT_8x8
@@ -121,6 +130,7 @@ config FONT_AUTOSELECT
depends on !FONT_SUN8x16
depends on !FONT_SUN12x22
depends on !FONT_10x18
+   depends on !FONT_TER16x32
select FONT_8x16
 
 endif # FONT_SUPPORT
diff --git a/lib/fonts/Makefile b/lib/fonts/Makefile
index d56f02dea83a..ed95070860de 100644
--- a/lib/fonts/Makefile
+++ b/lib/fonts/Makefile
@@ -14,6 +14,7 @@ font-objs-$(CONFIG_FONT_PEARL_8x8) += font_pearl_8x8.o
 font-objs-$(CONFIG_FONT_ACORN_8x8) += font_acorn_8x8.o
 font-objs-$(CONFIG_FONT_MINI_4x6)  += font_mini_4x6.o
 font-objs-$(CONFIG_FONT_6x10)  += font_6x10.o
+font-objs-$(CONFIG_FONT_TER16x32)  += font_ter16x32.o
 
 font-objs += $(font-objs-y)
 
diff --git a/lib/fonts/font_ter16x32.c b/lib/fonts/font_ter16x32.c
new file mode 100644
index ..3f0cf1ccdf3a
--- /dev/null
+++ b/lib/fonts/font_ter16x32.c
@@ -0,0 +1,2072 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/font.h>
+#include <linux/module.h>
+
+#define FONTDATAMAX 16384
+
+static const unsigned char fontdata_ter16x32[FONTDATAMAX] = {
+
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x7f, 0xfc, 0x7f, 0xfc,
+   0x70, 0x1c, 0x70, 0x1c, 0x70, 0x1c, 0x70, 0x1c,
+   0x70, 0x1c, 0x70, 0x1c, 0x70, 0x1c, 0x70, 0x1c,
+   0x70, 0x1c, 0x70, 0x1c, 0x70, 0x1c, 0x70, 0x1c,
+   0x70, 0x1c, 0x70, 0x1c, 0x70, 0x1c, 0x70, 0x1c,
+   0x7f, 0xfc, 0x7f, 0xfc, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 0 */
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x3f, 0xf8, 0x7f, 0xfc,
+   0xf0, 0x1e, 0xe0, 0x0e, 0xe0, 0x0e, 0xe0, 0x0e,
+   0xee, 0xee, 0xee, 0xee, 0xe0, 0x0e, 0xe0, 0x0e,
+   0xe0, 0x0e, 0xe0, 0x0e, 0xef, 0xee, 0xe7, 0xce,
+   0xe0, 0x0e, 0xe0, 0x0e, 0xe0, 0x0e, 0xf0, 0x1e,
+   0x7f, 0xfc, 0x3f, 0xf8, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 1 */
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x3f, 0xf8, 0x7f, 0xfc,
+   0xff, 0xfe, 0xff, 0xfe, 0xff, 0xfe, 0xff, 0xfe,
+   0xe3, 0x8e, 0xe3, 0x8e, 0xff, 0xfe, 0xff, 0xfe,
+   0xff, 0xfe, 0xff, 0xfe, 0xe0, 0x0e, 0xf0, 0x1e,
+   0xf8, 0x3e, 0xff, 0xfe, 0xff, 0xfe, 0xff, 0xfe,
+   0x7f, 0xfc, 0x3f, 0xf8, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, /* 2 */
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
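
For context on how a compiled-in font like this is picked up at runtime:
console code looks fonts up through the lib/fonts API declared in
<linux/font.h>. A minimal sketch follows; the registered name "TER16x32"
and the fallback arguments are assumptions for illustration, not taken
from the patch.

#include <linux/font.h>

static const struct font_desc *pick_console_font(void)
{
	/* find_font() returns NULL if the font was not compiled in;
	 * "TER16x32" is assumed to be this font's registered name */
	const struct font_desc *font = find_font("TER16x32");

	if (!font)	/* any font that fits; -1 masks allow all sizes */
		font = get_default_font(1920, 1080, -1, -1);
	return font;
}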

Re: [PATCH] clk: qcom: Enumerate remaining msm8998 resets

2018-12-05 Thread Stephen Boyd
Quoting Jeffrey Hugo (2018-12-04 07:13:22)
> The current list of defined resets is incomplete compared to what the
> hardware implements.  Enumerate the remaining resets according to the
> hardware documentation.
> 
> Signed-off-by: Jeffrey Hugo 
> ---

Applied to clk-next



Re: [PATCH 4.9 00/50] 4.9.143-stable review

2018-12-05 Thread shuah

On 12/4/18 3:49 AM, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 4.9.143 release.
There are 50 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Thu Dec  6 10:36:59 UTC 2018.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:

https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.143-rc1.gz
or in the git tree and branch at:

git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
linux-4.9.y
and the diffstat can be found below.

thanks,

greg k-h



Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah



Re: [PATCH 4.14 000/146] 4.14.86-stable review

2018-12-05 Thread shuah

On 12/4/18 3:48 AM, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 4.14.86 release.
There are 146 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Thu Dec  6 10:36:52 UTC 2018.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:

https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.86-rc1.gz
or in the git tree and branch at:

git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
linux-4.14.y
and the diffstat can be found below.

thanks,

greg k-h



Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah



[PATCH v2 3/4] dt-bindings: input: touchscreen: goodix: Add GT5663 compatible

2018-12-05 Thread Jagan Teki
GT5663 is a capacitive touch controller with customized smart
wakeup gestures. It supports chip data similar to that of the
existing GT1151, and requires an AVDD28 supply on some boards.

Document the compatible string for it.

Signed-off-by: Jagan Teki 
---
Changes for v2:
- drop example node 

 Documentation/devicetree/bindings/input/touchscreen/goodix.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/input/touchscreen/goodix.txt 
b/Documentation/devicetree/bindings/input/touchscreen/goodix.txt
index c4622c983e08..59c89276e6bb 100644
--- a/Documentation/devicetree/bindings/input/touchscreen/goodix.txt
+++ b/Documentation/devicetree/bindings/input/touchscreen/goodix.txt
@@ -3,6 +3,7 @@ Device tree bindings for Goodix GT9xx series touchscreen 
controller
 Required properties:
 
  - compatible  : Should be "goodix,gt1151"
+or "goodix,gt5663"
 or "goodix,gt911"
 or "goodix,gt9110"
 or "goodix,gt912"
-- 
2.18.0.321.gffc6fa0e3



Re: [PATCH 4.19 000/139] 4.19.7-stable review

2018-12-05 Thread shuah

On 12/4/18 3:48 AM, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 4.19.7 release.
There are 139 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Thu Dec  6 10:36:22 UTC 2018.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:

https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.7-rc1.gz
or in the git tree and branch at:

git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
linux-4.19.y
and the diffstat can be found below.

thanks,

greg k-h



Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah


[PATCH v2 1/4] dt-bindings: input: touchscreen: goodix: Document AVDD28-supply property

2018-12-05 Thread Jagan Teki
Most Goodix CTP controllers are supplied through an AVDD28 pin,
which needs to be powered for controllers like the GT5663 on some
boards.

So, document the supply property so that boards using the GT5663
can enable it via the device tree.

Signed-off-by: Jagan Teki 
---
Changes for v2:
- Rename vcc-supply to AVDD28-supply

 Documentation/devicetree/bindings/input/touchscreen/goodix.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/input/touchscreen/goodix.txt 
b/Documentation/devicetree/bindings/input/touchscreen/goodix.txt
index f7e95c52f3c7..c4622c983e08 100644
--- a/Documentation/devicetree/bindings/input/touchscreen/goodix.txt
+++ b/Documentation/devicetree/bindings/input/touchscreen/goodix.txt
@@ -23,6 +23,7 @@ Optional properties:
  - touchscreen-inverted-y  : Y axis is inverted (boolean)
  - touchscreen-swapped-x-y : X and Y axis are swapped (boolean)
  (swapping is done after inverting the axis)
+ - AVDD28-supply   : Analog power supply regulator on AVDD28 pin
 
 Example:
 
-- 
2.18.0.321.gffc6fa0e3
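
Worth noting for driver authors: because AVDD28-supply is optional, the
regulator core hands back a dummy regulator when the property is absent,
so one code path can serve both kinds of boards. A minimal sketch,
assuming the driver-side lookup used in patch 2/4 below; the helper name
is hypothetical.

#include <linux/err.h>
#include <linux/regulator/consumer.h>

static int get_optional_avdd28(struct device *dev, struct regulator **out)
{
	struct regulator *reg = devm_regulator_get(dev, "AVDD28");

	if (IS_ERR(reg))		/* e.g. -EPROBE_DEFER */
		return PTR_ERR(reg);

	*out = reg;			/* dummy supply if the property is absent */
	return 0;
}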



[PATCH v2 2/4] Input: goodix - Add AVDD28-supply regulator support

2018-12-05 Thread Jagan Teki
Goodix CTP controllers have an AVDD28 pin connected to a voltage
regulator which may not be turned on by default, as on GT5663 boards.

Add support for such boards by adding voltage regulator handling
code to the goodix CTP driver.

Signed-off-by: Jagan Teki 
---
Changes for v2:
- disable regulator in remove
- fix to setup regulator in probe code

 drivers/input/touchscreen/goodix.c | 33 +-
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/drivers/input/touchscreen/goodix.c 
b/drivers/input/touchscreen/goodix.c
index f2d9c2c41885..7371f6946098 100644
--- a/drivers/input/touchscreen/goodix.c
+++ b/drivers/input/touchscreen/goodix.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include <linux/regulator/consumer.h>
 #include 
 #include 
 #include 
@@ -47,6 +48,7 @@ struct goodix_ts_data {
struct touchscreen_properties prop;
unsigned int max_touch_num;
unsigned int int_trigger_type;
+   struct regulator *avdd28;
struct gpio_desc *gpiod_int;
struct gpio_desc *gpiod_rst;
u16 id;
@@ -786,25 +788,41 @@ static int goodix_ts_probe(struct i2c_client *client,
if (error)
return error;
 
+   ts->avdd28 = devm_regulator_get(&client->dev, "AVDD28");
+   if (IS_ERR(ts->avdd28)) {
+   error = PTR_ERR(ts->avdd28);
+   if (error != -EPROBE_DEFER)
+   dev_err(&client->dev,
+   "Failed to get AVDD28 regulator: %d\n", error);
+   return error;
+   }
+
+   /* power the controller */
+   error = regulator_enable(ts->avdd28);
+   if (error) {
+   dev_err(>dev, "Controller fail to enable AVDD28\n");
+   return error;
+   }
+
if (ts->gpiod_int && ts->gpiod_rst) {
/* reset the controller */
error = goodix_reset(ts);
if (error) {
dev_err(>dev, "Controller reset failed.\n");
-   return error;
+   goto error;
}
}
 
error = goodix_i2c_test(client);
if (error) {
dev_err(>dev, "I2C communication failure: %d\n", error);
-   return error;
+   goto error;
}
 
error = goodix_read_version(ts);
if (error) {
dev_err(>dev, "Read version failed.\n");
-   return error;
+   goto error;
}
 
ts->chip = goodix_get_chip_data(ts->id);
@@ -823,23 +841,28 @@ static int goodix_ts_probe(struct i2c_client *client,
dev_err(&client->dev,
"Failed to invoke firmware loader: %d\n",
error);
-   return error;
+   goto error;
}
 
return 0;
} else {
error = goodix_configure_dev(ts);
if (error)
-   return error;
+   goto error;
}
 
return 0;
+
+error:
+   regulator_disable(ts->avdd28);
+   return error;
 }
 
 static int goodix_ts_remove(struct i2c_client *client)
 {
struct goodix_ts_data *ts = i2c_get_clientdata(client);
 
+   regulator_disable(ts->avdd28);
if (ts->gpiod_int && ts->gpiod_rst)
wait_for_completion(&ts->firmware_loading_complete);
 
-- 
2.18.0.321.gffc6fa0e3
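
A design note on the error handling above: an alternative to the explicit
regulator_disable() calls in the failure paths and in remove() is to
register a devm action right after enabling, so the unwind runs
automatically. A sketch under that assumption; the two helper names are
invented here, only the devm/regulator calls are real kernel API.

#include <linux/device.h>
#include <linux/regulator/consumer.h>

static void goodix_regulator_off(void *data)
{
	regulator_disable(data);	/* data is the struct regulator * */
}

static int goodix_power_on(struct device *dev, struct regulator *avdd28)
{
	int error = regulator_enable(avdd28);

	if (error)
		return error;

	/* disables AVDD28 automatically on probe failure and on unbind */
	return devm_add_action_or_reset(dev, goodix_regulator_off, avdd28);
}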



[PATCH v2 4/4] Input: goodix - Add GT5663 CTP support

2018-12-05 Thread Jagan Teki
GT5663 is a capacitive touch controller with customized smart
wakeup gestures.

Add support for it by adding compatible and supported chip data.

The chip data on the GT5663 is similar to that of the GT1151:
- the config data register is at address 0x8050
- the config data register max length is 240
- the config data checksum is 16-bit

Signed-off-by: Jagan Teki 
---
Changes for v2:
- add chipdata

 drivers/input/touchscreen/goodix.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/input/touchscreen/goodix.c 
b/drivers/input/touchscreen/goodix.c
index 7371f6946098..735ab8e246b6 100644
--- a/drivers/input/touchscreen/goodix.c
+++ b/drivers/input/touchscreen/goodix.c
@@ -218,6 +218,7 @@ static const struct goodix_chip_data 
*goodix_get_chip_data(u16 id)
 {
switch (id) {
case 1151:
+   case 5663:
return &gt1x_chip_data;
 
case 911:
@@ -965,6 +966,7 @@ MODULE_DEVICE_TABLE(acpi, goodix_acpi_match);
 #ifdef CONFIG_OF
 static const struct of_device_id goodix_of_match[] = {
{ .compatible = "goodix,gt1151" },
+   { .compatible = "goodix,gt5663" },
{ .compatible = "goodix,gt911" },
{ .compatible = "goodix,gt9110" },
{ .compatible = "goodix,gt912" },
-- 
2.18.0.321.gffc6fa0e3
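
To make the "similar chip data" bullet points above concrete, this is the
shape of per-chip description the commit message refers to. Field and
constant names here are illustrative stand-ins, not copied from the
driver.

struct chip_cfg {
	unsigned short config_addr;	/* config data register: 0x8050 */
	int config_len;			/* max config length: 240 bytes */
	int checksum_bits;		/* 16-bit config checksum */
};

static const struct chip_cfg gt5663_cfg = {
	.config_addr	= 0x8050,
	.config_len	= 240,
	.checksum_bits	= 16,
};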



Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread Linus Torvalds
On Wed, Dec 5, 2018 at 3:36 PM Andrea Arcangeli  wrote:
>
> Like said earlier still better to apply __GFP_COMPACT_ONLY or David's
> patch than to return to v4.18 though.

Ok, I've applied David's latest patch.

I'm not at all objecting to tweaking this further, I just didn't want
to have this regression stand.

   Linus


[for-next][PATCH 02/30] tracing: Do not line wrap short line in function_graph_enter()

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Commit 588ca1786f2dd ("function_graph: Use new curr_ret_depth to manage
depth instead of curr_ret_stack") removed a parameter from the call
ftrace_push_return_trace() that made it so that the entire call was under 80
characters, but it did not remove the line break. There's no reason to break
that line up, so make it a single line.

Link: 
http://lkml.kernel.org/r/20181122100322.gn2...@hirez.programming.kicks-ass.net

Reported-by: Peter Zijlstra 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_functions_graph.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index 086af4f5c3e8..0d235e44d08e 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -188,8 +188,7 @@ int function_graph_enter(unsigned long ret, unsigned long 
func,
trace.func = func;
trace.depth = ++current->curr_ret_depth;
 
-   if (ftrace_push_return_trace(ret, func,
-frame_pointer, retp))
+   if (ftrace_push_return_trace(ret, func, frame_pointer, retp))
goto out;
 
/* Only trace if the calling function expects to */
-- 
2.19.1





[for-next][PATCH 01/30] function_graph: Remove unused task_curr_ret_stack()

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

The static inline function task_curr_ret_stack() is unused, remove it.

Reviewed-by: Joel Fernandes (Google) 
Reviewed-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h | 10 --
 1 file changed, 10 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index dd16e8218db3..10bd46434908 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -809,11 +809,6 @@ extern void ftrace_graph_init_task(struct task_struct *t);
 extern void ftrace_graph_exit_task(struct task_struct *t);
 extern void ftrace_graph_init_idle_task(struct task_struct *t, int cpu);
 
-static inline int task_curr_ret_stack(struct task_struct *t)
-{
-   return t->curr_ret_stack;
-}
-
 static inline void pause_graph_tracing(void)
 {
atomic_inc(&current->tracing_graph_pause);
@@ -838,11 +833,6 @@ static inline int 
register_ftrace_graph(trace_func_graph_ret_t retfunc,
 }
 static inline void unregister_ftrace_graph(void) { }
 
-static inline int task_curr_ret_stack(struct task_struct *tsk)
-{
-   return -1;
-}
-
 static inline unsigned long
 ftrace_graph_ret_addr(struct task_struct *task, int *idx, unsigned long ret,
  unsigned long *retp)
-- 
2.19.1




[for-next][PATCH 00/30] tracing: Updates for the next merge window

2018-12-05 Thread Steven Rostedt
Note, I still have more in my queue that need to go through testing.

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
for-next

Head SHA1: e007f5165a2e366579324062a69e56236a97fad3


Dan Carpenter (1):
  tracing: Have trace_stack nr_entries compare not be so subtle

Joe Lawrence (1):
  scripts/recordmcount.{c,pl}: support -ffunction-sections .text.* section 
names

Masami Hiramatsu (11):
  tracing/uprobes: Add busy check when cleanup all uprobes
  tracing: Lock event_mutex before synth_event_mutex
  tracing: Simplify creation and deletion of synthetic events
  tracing: Integrate similar probe argument parsers
  tracing: Add unified dynamic event framework
  tracing/kprobes: Use dyn_event framework for kprobe events
  tracing/uprobes: Use dyn_event framework for uprobe events
  tracing: Use dyn_event framework for synthetic events
  tracing: Remove unneeded synth_event_mutex
  tracing: Add generic event-name based remove event method
  selftests/ftrace: Add testcases for dynamic event

Steven Rostedt (VMware) (17):
  function_graph: Remove unused task_curr_ret_stack()
  tracing: Do not line wrap short line in function_graph_enter()
  fgraph: Create a fgraph.c file to store function graph infrastructure
  fgraph: Have set_graph_notrace only affect function_graph tracer
  arm64: function_graph: Remove use of FTRACE_NOTRACE_DEPTH
  function_graph: Remove the use of FTRACE_NOTRACE_DEPTH
  ftrace: Create new ftrace_internal.h header
  function_graph: Do not expose the graph_time option when profiler is not 
configured
  fgraph: Move function graph specific code into fgraph.c
  tracing: Rearrange functions in trace_sched_wakeup.c
  fgraph: Add new fgraph_ops structure to enable function graph hooks
  function_graph: Move ftrace_graph_ret_addr() to fgraph.c
  function_graph: Have profiler use new helper ftrace_graph_get_ret_stack()
  ring-buffer: Add percentage of ring buffer full to wake up reader
  tracing: Add tracefs file buffer_percentage
  tracing: Change default buffer_percent to 50
  tracing: Consolidate trace_add/remove_event_call back to the nolock 
functions


 Documentation/trace/kprobetrace.rst|   3 +
 Documentation/trace/uprobetracer.rst   |   4 +
 arch/arm64/kernel/stacktrace.c |   3 -
 include/linux/ftrace.h |  35 +-
 include/linux/ring_buffer.h|   4 +-
 kernel/trace/Kconfig   |   6 +
 kernel/trace/Makefile  |   2 +
 kernel/trace/fgraph.c  | 615 +
 kernel/trace/ftrace.c  | 471 ++--
 kernel/trace/ftrace_internal.h |  75 +++
 kernel/trace/ring_buffer.c |  94 +++-
 kernel/trace/trace.c   |  72 ++-
 kernel/trace/trace.h   |  13 +
 kernel/trace/trace_dynevent.c  | 217 
 kernel/trace/trace_dynevent.h  | 119 
 kernel/trace/trace_events.c|   8 +-
 kernel/trace/trace_events_hist.c   | 316 ++-
 kernel/trace/trace_functions_graph.c   | 334 ++-
 kernel/trace/trace_irqsoff.c   |  18 +-
 kernel/trace/trace_kprobe.c| 353 ++--
 kernel/trace/trace_probe.c |  74 ++-
 kernel/trace/trace_probe.h |   9 +-
 kernel/trace/trace_sched_wakeup.c  | 270 +
 kernel/trace/trace_selftest.c  |   8 +-
 kernel/trace/trace_stack.c |   2 +-
 kernel/trace/trace_uprobe.c| 301 +-
 scripts/recordmcount.c |   2 +-
 scripts/recordmcount.pl|  13 +
 .../ftrace/test.d/dynevent/add_remove_kprobe.tc|  30 +
 .../ftrace/test.d/dynevent/add_remove_synth.tc |  27 +
 .../ftrace/test.d/dynevent/clear_select_events.tc  |  50 ++
 .../ftrace/test.d/dynevent/generic_clear_event.tc  |  49 ++
 32 files changed, 2176 insertions(+), 1421 deletions(-)
 create mode 100644 kernel/trace/fgraph.c
 create mode 100644 kernel/trace/ftrace_internal.h
 create mode 100644 kernel/trace/trace_dynevent.c
 create mode 100644 kernel/trace/trace_dynevent.h
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/add_remove_kprobe.tc
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/add_remove_synth.tc
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/clear_select_events.tc
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/generic_clear_event.tc


[for-next][PATCH 03/30] fgraph: Create a fgraph.c file to store function graph infrastructure

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

As the function graph infrastructure can be used by things other than
tracing, moving the code to its own file out of the trace_functions_graph.c
code makes more sense.

The fgraph.c file will only contain the infrastructure required to hook into
functions and their return code.

Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/Makefile|   1 +
 kernel/trace/fgraph.c| 232 +++
 kernel/trace/trace_functions_graph.c | 220 -
 3 files changed, 233 insertions(+), 220 deletions(-)
 create mode 100644 kernel/trace/fgraph.c

diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index f81dadbc7c4a..c7ade7965464 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_MMIOTRACE) += trace_mmiotrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += trace_functions_graph.o
 obj-$(CONFIG_TRACE_BRANCH_PROFILING) += trace_branch.o
 obj-$(CONFIG_BLK_DEV_IO_TRACE) += blktrace.o
+obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += fgraph.o
 ifeq ($(CONFIG_BLOCK),y)
 obj-$(CONFIG_EVENT_TRACING) += blktrace.o
 endif
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
new file mode 100644
index ..5ad9c0e88b80
--- /dev/null
+++ b/kernel/trace/fgraph.c
@@ -0,0 +1,232 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Infrastructure to hook into function calls and returns.
+ * Copyright (c) 2008-2009 Frederic Weisbecker 
+ * Mostly borrowed from function tracer which
+ * is Copyright (c) Steven Rostedt 
+ *
+ * Highly modified by Steven Rostedt (VMware).
+ */
+#include 
+
+#include "trace.h"
+
+static bool kill_ftrace_graph;
+
+/**
+ * ftrace_graph_is_dead - returns true if ftrace_graph_stop() was called
+ *
+ * ftrace_graph_stop() is called when a severe error is detected in
+ * the function graph tracing. This function is called by the critical
+ * paths of function graph to keep those paths from doing any more harm.
+ */
+bool ftrace_graph_is_dead(void)
+{
+   return kill_ftrace_graph;
+}
+
+/**
+ * ftrace_graph_stop - set to permanently disable function graph tracing
+ *
+ * In case of an error in function graph tracing, this is called
+ * to try to keep function graph tracing from causing any more harm.
+ * Usually this is pretty severe and this is called to try to at least
+ * get a warning out to the user.
+ */
+void ftrace_graph_stop(void)
+{
+   kill_ftrace_graph = true;
+}
+
+/* Add a function return address to the trace stack on thread info.*/
+static int
+ftrace_push_return_trace(unsigned long ret, unsigned long func,
+unsigned long frame_pointer, unsigned long *retp)
+{
+   unsigned long long calltime;
+   int index;
+
+   if (unlikely(ftrace_graph_is_dead()))
+   return -EBUSY;
+
+   if (!current->ret_stack)
+   return -EBUSY;
+
+   /*
+* We must make sure the ret_stack is tested before we read
+* anything else.
+*/
+   smp_rmb();
+
+   /* The return trace stack is full */
+   if (current->curr_ret_stack == FTRACE_RETFUNC_DEPTH - 1) {
+   atomic_inc(&current->trace_overrun);
+   return -EBUSY;
+   }
+
+   /*
+* The curr_ret_stack is an index to ftrace return stack of
+* current task.  Its value should be in [0, FTRACE_RETFUNC_
+* DEPTH) when the function graph tracer is used.  To support
+* filtering out specific functions, it makes the index
+* negative by subtracting huge value (FTRACE_NOTRACE_DEPTH)
+* so when it sees a negative index the ftrace will ignore
+* the record.  And the index gets recovered when returning
+* from the filtered function by adding the FTRACE_NOTRACE_
+* DEPTH and then it'll continue to record functions normally.
+*
+* The curr_ret_stack is initialized to -1 and get increased
+* in this function.  So it can be less than -1 only if it was
+* filtered out via ftrace_graph_notrace_addr() which can be
+* set from set_graph_notrace file in tracefs by user.
+*/
+   if (current->curr_ret_stack < -1)
+   return -EBUSY;
+
+   calltime = trace_clock_local();
+
+   index = ++current->curr_ret_stack;
+   if (ftrace_graph_notrace_addr(func))
+   current->curr_ret_stack -= FTRACE_NOTRACE_DEPTH;
+   barrier();
+   current->ret_stack[index].ret = ret;
+   current->ret_stack[index].func = func;
+   current->ret_stack[index].calltime = calltime;
+#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
+   current->ret_stack[index].fp = frame_pointer;
+#endif
+#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+   current->ret_stack[index].retp = retp;
+#endif
+   return 0;
+}
+
+int function_graph_enter(unsigned long ret, unsigned long func,
+unsigned long frame_pointer, unsigned long *retp)
+{
+   
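
The long comment moved above (and removed again by the set_graph_notrace
patch later in this series) describes an index-offsetting trick that is
easy to miss: a filtered function's stack index is made negative by
subtracting a huge constant, readers treat any negative index as
"ignore", and adding the constant back on return recovers the real
index. A standalone demonstration, not kernel code:

#include <stdio.h>

#define NOTRACE_DEPTH 65536	/* stands in for FTRACE_NOTRACE_DEPTH */

int main(void)
{
	int curr = -1;		/* like curr_ret_stack, initialized to -1 */
	int index = ++curr;	/* push: index 0 */

	curr -= NOTRACE_DEPTH;	/* mark as filtered: now deeply negative */
	printf("while filtered: %d (ignored since < 0)\n", curr);

	curr += NOTRACE_DEPTH;	/* on function return: recovered */
	printf("after return: %d == pushed index %d\n", curr, index);
	return 0;
}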


[for-next][PATCH 04/30] fgraph: Have set_graph_notrace only affect function_graph tracer

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

In order to make the function graph infrastructure more generic, there
cannot be code specific to the function_graph tracer in the generic code. This
includes the set_graph_notrace logic, that stops all graph calls when a
function in the set_graph_notrace is hit.

By using the trace_recursion mask, we can use a bit in the current
task_struct to implement the notrace code, and move the logic out of
fgraph.c and into trace_functions_graph.c and keeps it affecting only the
tracer and not all call graph callbacks.

Acked-by: Namhyung Kim 
Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/fgraph.c| 21 -
 kernel/trace/trace.h |  7 +++
 kernel/trace/trace_functions_graph.c | 22 ++
 3 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 5ad9c0e88b80..e852b69c0e64 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -64,30 +64,9 @@ ftrace_push_return_trace(unsigned long ret, unsigned long 
func,
return -EBUSY;
}
 
-   /*
-* The curr_ret_stack is an index to ftrace return stack of
-* current task.  Its value should be in [0, FTRACE_RETFUNC_
-* DEPTH) when the function graph tracer is used.  To support
-* filtering out specific functions, it makes the index
-* negative by subtracting huge value (FTRACE_NOTRACE_DEPTH)
-* so when it sees a negative index the ftrace will ignore
-* the record.  And the index gets recovered when returning
-* from the filtered function by adding the FTRACE_NOTRACE_
-* DEPTH and then it'll continue to record functions normally.
-*
-* The curr_ret_stack is initialized to -1 and get increased
-* in this function.  So it can be less than -1 only if it was
-* filtered out via ftrace_graph_notrace_addr() which can be
-* set from set_graph_notrace file in tracefs by user.
-*/
-   if (current->curr_ret_stack < -1)
-   return -EBUSY;
-
calltime = trace_clock_local();
 
index = ++current->curr_ret_stack;
-   if (ftrace_graph_notrace_addr(func))
-   current->curr_ret_stack -= FTRACE_NOTRACE_DEPTH;
barrier();
current->ret_stack[index].ret = ret;
current->ret_stack[index].func = func;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 447bd96ee658..f67060a75f38 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -534,6 +534,13 @@ enum {
 
TRACE_GRAPH_DEPTH_START_BIT,
TRACE_GRAPH_DEPTH_END_BIT,
+
+   /*
+* To implement set_graph_notrace, if this bit is set, we ignore
+* function graph tracing of called functions, until the return
+* function is called to clear it.
+*/
+   TRACE_GRAPH_NOTRACE_BIT,
 };
 
 #define trace_recursion_set(bit)   do { (current)->trace_recursion |= 
(1<<(bit)); } while (0)
diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index b846d82c2f95..ecf543df943b 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -188,6 +188,18 @@ int trace_graph_entry(struct ftrace_graph_ent *trace)
int cpu;
int pc;
 
+   if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT))
+   return 0;
+
+   if (ftrace_graph_notrace_addr(trace->func)) {
+   trace_recursion_set(TRACE_GRAPH_NOTRACE_BIT);
+   /*
+* Need to return 1 to have the return called
+* that will clear the NOTRACE bit.
+*/
+   return 1;
+   }
+
if (!ftrace_trace_task(tr))
return 0;
 
@@ -290,6 +302,11 @@ void trace_graph_return(struct ftrace_graph_ret *trace)
 
ftrace_graph_addr_finish(trace);
 
+   if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
+   trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
+   return;
+   }
+
local_irq_save(flags);
cpu = raw_smp_processor_id();
data = per_cpu_ptr(tr->trace_buffer.data, cpu);
@@ -315,6 +332,11 @@ static void trace_graph_thresh_return(struct 
ftrace_graph_ret *trace)
 {
ftrace_graph_addr_finish(trace);
 
+   if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
+   trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
+   return;
+   }
+
if (tracing_thresh &&
(trace->rettime - trace->calltime < tracing_thresh))
return;
-- 
2.19.1
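
The per-task flag pattern this patch adopts, reduced to plain C so the
control flow is visible: a bit is set on entry when the function is in
the notrace list, tested to suppress nested events, and cleared by the
paired return handler. The names below are stand-ins for the kernel's
trace_recursion_set/clear/test macros shown in the hunk above.

static unsigned long recursion;	/* stands in for current->trace_recursion */

#define flag_set(bit)	do { recursion |= (1UL << (bit)); } while (0)
#define flag_clear(bit)	do { recursion &= ~(1UL << (bit)); } while (0)
#define flag_test(bit)	(recursion & (1UL << (bit)))

#define NOTRACE_BIT	0

static int on_entry(int func_is_notraced)
{
	if (flag_test(NOTRACE_BIT))
		return 0;		/* inside a notrace span: drop event */
	if (func_is_notraced) {
		flag_set(NOTRACE_BIT);
		return 1;		/* return 1 so the return hook still runs */
	}
	return 1;			/* trace normally */
}

static void on_return(void)
{
	if (flag_test(NOTRACE_BIT))
		flag_clear(NOTRACE_BIT);	/* leave the notrace span */
}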




[for-next][PATCH 11/30] fgraph: Add new fgraph_ops structure to enable function graph hooks

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Currently, registering the function graph tracer means passing in an entry
function and a return function separately. We need a way to associate those
functions with each other, so that the entry function can determine whether
to run the return hook. Having a structure that contains both functions will
facilitate converting the code to support that.

This is similar to the way function hooks are enabled (it passes in
ftrace_ops). Instead of passing in the functions to use, a single structure
is passed in to the registering function.

The unregister function is now passed the fgraph_ops handle. When we
allow more than one callback to the function graph hooks, this will let the
system know which one to remove.
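
A minimal sketch of how a caller would use the new API, based on the
structure and prototypes added in the diff below; the callback names and
bodies here are hypothetical:

	static int my_entry(struct ftrace_graph_ent *trace)
	{
		/* Nonzero means: also invoke the return callback */
		return 1;
	}

	static void my_return(struct ftrace_graph_ret *trace)
	{
		/* e.g. look at trace->rettime - trace->calltime */
	}

	static struct fgraph_ops my_gops = {
		.entryfunc	= my_entry,
		.retfunc	= my_return,
	};

	/* register_ftrace_graph(&my_gops); ... unregister_ftrace_graph(&my_gops); */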

Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h   | 21 +++--
 kernel/trace/fgraph.c|  9 -
 kernel/trace/ftrace.c| 10 +++---
 kernel/trace/trace_functions_graph.c | 21 -
 kernel/trace/trace_irqsoff.c | 18 +++---
 kernel/trace/trace_sched_wakeup.c| 16 +++-
 kernel/trace/trace_selftest.c|  8 ++--
 7 files changed, 58 insertions(+), 45 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 98625f10d982..21c80491ccde 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -749,6 +749,11 @@ typedef int (*trace_func_graph_ent_t)(struct 
ftrace_graph_ent *); /* entry */
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 
+struct fgraph_ops {
+   trace_func_graph_ent_t  entryfunc;
+   trace_func_graph_ret_t  retfunc;
+};
+
 /*
  * Stack of return addresses for functions
  * of a thread.
@@ -792,8 +797,9 @@ unsigned long ftrace_graph_ret_addr(struct task_struct 
*task, int *idx,
 
 #define FTRACE_RETFUNC_DEPTH 50
 #define FTRACE_RETSTACK_ALLOC_SIZE 32
-extern int register_ftrace_graph(trace_func_graph_ret_t retfunc,
-   trace_func_graph_ent_t entryfunc);
+
+extern int register_ftrace_graph(struct fgraph_ops *ops);
+extern void unregister_ftrace_graph(struct fgraph_ops *ops);
 
 extern bool ftrace_graph_is_dead(void);
 extern void ftrace_graph_stop(void);
@@ -802,8 +808,6 @@ extern void ftrace_graph_stop(void);
 extern trace_func_graph_ret_t ftrace_graph_return;
 extern trace_func_graph_ent_t ftrace_graph_entry;
 
-extern void unregister_ftrace_graph(void);
-
 extern void ftrace_graph_init_task(struct task_struct *t);
 extern void ftrace_graph_exit_task(struct task_struct *t);
 extern void ftrace_graph_init_idle_task(struct task_struct *t, int cpu);
@@ -825,12 +829,9 @@ static inline void ftrace_graph_init_task(struct 
task_struct *t) { }
 static inline void ftrace_graph_exit_task(struct task_struct *t) { }
 static inline void ftrace_graph_init_idle_task(struct task_struct *t, int cpu) 
{ }
 
-static inline int register_ftrace_graph(trace_func_graph_ret_t retfunc,
- trace_func_graph_ent_t entryfunc)
-{
-   return -1;
-}
-static inline void unregister_ftrace_graph(void) { }
+/* Define as macros as fgraph_ops may not be defined */
+#define register_ftrace_graph(ops) ({ -1; })
+#define unregister_ftrace_graph(ops) do { } while (0)
 
 static inline unsigned long
 ftrace_graph_ret_addr(struct task_struct *task, int *idx, unsigned long ret,
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 374f3e42e29e..cc35606e9a3e 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -490,8 +490,7 @@ static int start_graph_tracing(void)
return ret;
 }
 
-int register_ftrace_graph(trace_func_graph_ret_t retfunc,
-   trace_func_graph_ent_t entryfunc)
+int register_ftrace_graph(struct fgraph_ops *gops)
 {
int ret = 0;
 
@@ -512,7 +511,7 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
goto out;
}
 
-   ftrace_graph_return = retfunc;
+   ftrace_graph_return = gops->retfunc;
 
/*
 * Update the indirect function to the entryfunc, and the
@@ -520,7 +519,7 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
 * call the update fgraph entry function to determine if
 * the entryfunc should be called directly or not.
 */
-   __ftrace_graph_entry = entryfunc;
+   __ftrace_graph_entry = gops->entryfunc;
ftrace_graph_entry = ftrace_graph_entry_test;
update_function_graph_func();
 
@@ -530,7 +529,7 @@ int register_ftrace_graph(trace_func_graph_ret_t retfunc,
return ret;
 }
 
-void unregister_ftrace_graph(void)
+void unregister_ftrace_graph(struct fgraph_ops *gops)
 {
	mutex_lock(&ftrace_lock);
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c53533b833cf..d06fe588e650 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -849,15 +849,19 @@ static void profile_graph_return(struct ftrace_graph_ret 
*trace)

[for-next][PATCH 12/30] function_graph: Move ftrace_graph_ret_addr() to fgraph.c

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Move the function ftrace_graph_ret_addr() to fgraph.c, as the management
of the curr_ret_stack is going to change, and all accesses to ret_stack
need to be done in fgraph.c.
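
For context, a sketch of how stack-unwinding code consumes this helper,
following the kernel-doc of the moved function; the frame structure and the
unwind_next_frame()/record_frame() helpers are illustrative, not from any
particular arch:

	int graph_idx = 0;	/* state variable, starts at zero */
	unsigned long pc;

	while (unwind_next_frame(&frame)) {
		/* Undo any return_to_handler substitution */
		pc = ftrace_graph_ret_addr(task, &graph_idx,
					   frame.pc, frame.pc_slot);
		record_frame(pc);
	}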

Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/fgraph.c| 55 
 kernel/trace/trace_functions_graph.c | 55 
 2 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index cc35606e9a3e..90fcefcaff2a 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -232,6 +232,61 @@ unsigned long ftrace_return_to_handler(unsigned long 
frame_pointer)
return ret;
 }
 
+/**
+ * ftrace_graph_ret_addr - convert a potentially modified stack return address
+ *to its original value
+ *
+ * This function can be called by stack unwinding code to convert a found stack
+ * return address ('ret') to its original value, in case the function graph
+ * tracer has modified it to be 'return_to_handler'.  If the address hasn't
+ * been modified, the unchanged value of 'ret' is returned.
+ *
+ * 'idx' is a state variable which should be initialized by the caller to zero
+ * before the first call.
+ *
+ * 'retp' is a pointer to the return address on the stack.  It's ignored if
+ * the arch doesn't have HAVE_FUNCTION_GRAPH_RET_ADDR_PTR defined.
+ */
+#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
+   unsigned long ret, unsigned long *retp)
+{
+   int index = task->curr_ret_stack;
+   int i;
+
+   if (ret != (unsigned long)return_to_handler)
+   return ret;
+
+   if (index < 0)
+   return ret;
+
+   for (i = 0; i <= index; i++)
+   if (task->ret_stack[i].retp == retp)
+   return task->ret_stack[i].ret;
+
+   return ret;
+}
+#else /* !HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
+unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
+   unsigned long ret, unsigned long *retp)
+{
+   int task_idx;
+
+   if (ret != (unsigned long)return_to_handler)
+   return ret;
+
+   task_idx = task->curr_ret_stack;
+
+   if (!task->ret_stack || task_idx < *idx)
+   return ret;
+
+   task_idx -= *idx;
+   (*idx)++;
+
+   return task->ret_stack[task_idx].ret;
+}
+#endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
+
 static struct ftrace_ops graph_ops = {
.func   = ftrace_stub,
.flags  = FTRACE_OPS_FL_RECURSION_SAFE |
diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index 140b4b51ab34..c2af1560e856 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -94,61 +94,6 @@ static void
 print_graph_duration(struct trace_array *tr, unsigned long long duration,
 struct trace_seq *s, u32 flags);
 
-/**
- * ftrace_graph_ret_addr - convert a potentially modified stack return address
- *to its original value
- *
- * This function can be called by stack unwinding code to convert a found stack
- * return address ('ret') to its original value, in case the function graph
- * tracer has modified it to be 'return_to_handler'.  If the address hasn't
- * been modified, the unchanged value of 'ret' is returned.
- *
- * 'idx' is a state variable which should be initialized by the caller to zero
- * before the first call.
- *
- * 'retp' is a pointer to the return address on the stack.  It's ignored if
- * the arch doesn't have HAVE_FUNCTION_GRAPH_RET_ADDR_PTR defined.
- */
-#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
-unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
-   unsigned long ret, unsigned long *retp)
-{
-   int index = task->curr_ret_stack;
-   int i;
-
-   if (ret != (unsigned long)return_to_handler)
-   return ret;
-
-   if (index < 0)
-   return ret;
-
-   for (i = 0; i <= index; i++)
-   if (task->ret_stack[i].retp == retp)
-   return task->ret_stack[i].ret;
-
-   return ret;
-}
-#else /* !HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
-unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
-   unsigned long ret, unsigned long *retp)
-{
-   int task_idx;
-
-   if (ret != (unsigned long)return_to_handler)
-   return ret;
-
-   task_idx = task->curr_ret_stack;
-
-   if (!task->ret_stack || task_idx < *idx)
-   return ret;
-
-   task_idx -= *idx;
-   (*idx)++;
-
-   return task->ret_stack[task_idx].ret;
-}
-#endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
-
 int __trace_graph_entry(struct trace_array 

[for-next][PATCH 14/30] tracing: Have trace_stack nr_entries compare not be so subtle

2018-12-05 Thread Steven Rostedt
From: Dan Carpenter 

Dan Carpenter reviewed the trace_stack.c code and figured he had found an
off-by-one bug.

 "From reviewing the code, it seems possible for
  stack_trace_max.nr_entries to be set to .max_entries and in that case we
  would be reading one element beyond the end of the stack_dump_trace[]
  array.  If it's not set to .max_entries then the bug doesn't affect
  runtime."

Although it looks to be the case, it is not. Because we have:

 static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] =
 { [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX };

 struct stack_trace stack_trace_max = {
	.max_entries		= STACK_TRACE_ENTRIES - 1,
	.entries		= &stack_dump_trace[0],
 };

And:

stack_trace_max.nr_entries = x;
for (; x < i; x++)
stack_dump_trace[x] = ULONG_MAX;

Even if nr_entries equals max_entries, indexing with it into the
stack_dump_trace[] array will not overflow the array. And in that case, the
second part of the conditional, which tests stack_dump_trace[nr_entries]
against ULONG_MAX, will always be true.

Applying Dan's patch removes the subtle aspect of the check and makes the if
conditional slightly more efficient.
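
To make the sentinel layout concrete, an illustrative stand-alone sketch of
the arrangement described above (the GNU range initializer is the same one
used in trace_stack.c):

	#include <limits.h>

	#define STACK_TRACE_ENTRIES 500

	/* One extra slot, pre-filled with the ULONG_MAX sentinel */
	static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES + 1] =
		{ [0 ... STACK_TRACE_ENTRIES] = ULONG_MAX };

	/*
	 * With max_entries = STACK_TRACE_ENTRIES - 1, even
	 * nr_entries == max_entries keeps stack_dump_trace[nr_entries]
	 * in bounds, and every unused slot still reads as ULONG_MAX.
	 */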

Link: http://lkml.kernel.org/r/20180620110758.crunhd5bfep7zuiz@kili.mountain

Signed-off-by: Dan Carpenter 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_stack.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 2b0d1ee3241c..e2a153fc1afc 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -286,7 +286,7 @@ __next(struct seq_file *m, loff_t *pos)
 {
long n = *pos - 1;
 
-   if (n > stack_trace_max.nr_entries || stack_dump_trace[n] == ULONG_MAX)
+   if (n >= stack_trace_max.nr_entries || stack_dump_trace[n] == ULONG_MAX)
return NULL;
 
m->private = (void *)n;
-- 
2.19.1




[for-next][PATCH 17/30] tracing: Add tracefs file buffer_percentage

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Add a "buffer_percentage" file, that allows users to specify how much of the
buffer (percentage of pages) need to be filled before waking up a task
blocked on a per cpu trace_pipe_raw file.
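
The wakeup test added in the diff below, extracted as a stand-alone sketch;
it compares the dirty-page ratio against the requested percentage using
integer math only:

	#include <stdbool.h>
	#include <stddef.h>

	/* 'full' is the buffer_percentage value, 0-100 */
	static bool wake_up_readers(size_t dirty, size_t nr_pages, size_t full)
	{
		/* No wakeup until dirty/nr_pages reaches 'full' percent */
		if (full && nr_pages && (dirty * 100) <= full * nr_pages)
			return false;
		return true;
	}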

Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/ring_buffer.c | 39 ---
 kernel/trace/trace.c   | 54 +-
 kernel/trace/trace.h   |  1 +
 3 files changed, 77 insertions(+), 17 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 9edb628603ab..5434c16f2192 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -489,6 +489,7 @@ struct ring_buffer_per_cpu {
local_t commits;
local_t pages_touched;
local_t pages_read;
+	long			last_pages_touch;
size_t  shortest_full;
unsigned long   read;
unsigned long   read_bytes;
@@ -2632,7 +2633,9 @@ static void rb_commit(struct ring_buffer_per_cpu 
*cpu_buffer,
 static __always_inline void
 rb_wakeups(struct ring_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
 {
-   bool pagebusy;
+   size_t nr_pages;
+   size_t dirty;
+   size_t full;
 
if (buffer->irq_work.waiters_pending) {
buffer->irq_work.waiters_pending = false;
@@ -2646,24 +2649,27 @@ rb_wakeups(struct ring_buffer *buffer, struct 
ring_buffer_per_cpu *cpu_buffer)
	irq_work_queue(&cpu_buffer->irq_work.work);
}
 
-   pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
+	if (cpu_buffer->last_pages_touch == local_read(&cpu_buffer->pages_touched))
+   return;
 
-   if (!pagebusy && cpu_buffer->irq_work.full_waiters_pending) {
-   size_t nr_pages;
-   size_t dirty;
-   size_t full;
+   if (cpu_buffer->reader_page == cpu_buffer->commit_page)
+   return;
 
-   full = cpu_buffer->shortest_full;
-   nr_pages = cpu_buffer->nr_pages;
-   dirty = ring_buffer_nr_dirty_pages(buffer, cpu_buffer->cpu);
-   if (full && nr_pages && (dirty * 100) <= full * nr_pages)
-   return;
+   if (!cpu_buffer->irq_work.full_waiters_pending)
+   return;
 
-   cpu_buffer->irq_work.wakeup_full = true;
-   cpu_buffer->irq_work.full_waiters_pending = false;
-   /* irq_work_queue() supplies it's own memory barriers */
-	irq_work_queue(&cpu_buffer->irq_work.work);
-   }
+	cpu_buffer->last_pages_touch = local_read(&cpu_buffer->pages_touched);
+
+   full = cpu_buffer->shortest_full;
+   nr_pages = cpu_buffer->nr_pages;
+   dirty = ring_buffer_nr_dirty_pages(buffer, cpu_buffer->cpu);
+   if (full && nr_pages && (dirty * 100) <= full * nr_pages)
+   return;
+
+   cpu_buffer->irq_work.wakeup_full = true;
+   cpu_buffer->irq_work.full_waiters_pending = false;
+   /* irq_work_queue() supplies it's own memory barriers */
+	irq_work_queue(&cpu_buffer->irq_work.work);
 }
 
 /*
@@ -4394,6 +4400,7 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
	local_set(&cpu_buffer->commits, 0);
	local_set(&cpu_buffer->pages_touched, 0);
	local_set(&cpu_buffer->pages_read, 0);
+   cpu_buffer->last_pages_touch = 0;
cpu_buffer->shortest_full = 0;
cpu_buffer->read = 0;
cpu_buffer->read_bytes = 0;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 48d5eb22ff33..d382fd1aa4a6 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6948,7 +6948,7 @@ tracing_buffers_splice_read(struct file *file, loff_t 
*ppos,
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
 
-   ret = wait_on_pipe(iter, 1);
+   ret = wait_on_pipe(iter, iter->tr->buffer_percent);
if (ret)
goto out;
 
@@ -7662,6 +7662,53 @@ static const struct file_operations rb_simple_fops = {
.llseek = default_llseek,
 };
 
+static ssize_t
+buffer_percent_read(struct file *filp, char __user *ubuf,
+   size_t cnt, loff_t *ppos)
+{
+   struct trace_array *tr = filp->private_data;
+   char buf[64];
+   int r;
+
+   r = tr->buffer_percent;
+   r = sprintf(buf, "%d\n", r);
+
+   return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+}
+
+static ssize_t
+buffer_percent_write(struct file *filp, const char __user *ubuf,
+size_t cnt, loff_t *ppos)
+{
+   struct trace_array *tr = filp->private_data;
+   unsigned long val;
+   int ret;
+
+	ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
+   if (ret)
+   return ret;
+
+   if (val > 100)
+   return 

[for-next][PATCH 07/30] ftrace: Create new ftrace_internal.h header

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

In order to move the function graph infrastructure into its own file
(fgraph.c), it needs to access various functions and variables in ftrace.c
that are currently static. Create a new file called ftrace_internal.h that
holds the function prototypes and the extern declarations of the variables
needed by fgraph.c, and make them global in ftrace.c so that they can be
used outside that file.
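
An abridged sketch of the resulting pattern, using declarations taken from
the diff below; the header keeps these symbols out of the public ftrace.h
while letting fgraph.c reach them:

	/* kernel/trace/ftrace_internal.h (excerpt) */
	extern struct mutex ftrace_lock;
	extern struct ftrace_ops global_ops;
	extern int ftrace_graph_active;

	int __register_ftrace_function(struct ftrace_ops *ops);
	int __unregister_ftrace_function(struct ftrace_ops *ops);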

Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/ftrace.c  | 76 +++---
 kernel/trace/ftrace_internal.h | 75 +
 2 files changed, 89 insertions(+), 62 deletions(-)
 create mode 100644 kernel/trace/ftrace_internal.h

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 77734451cb05..52c89428b0db 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -40,6 +40,7 @@
 #include <asm/sections.h>
 #include <asm/setup.h>
 
+#include "ftrace_internal.h"
 #include "trace_output.h"
 #include "trace_stat.h"
 
@@ -77,7 +78,7 @@
 #define ASSIGN_OPS_HASH(opsname, val)
 #endif
 
-static struct ftrace_ops ftrace_list_end __read_mostly = {
+struct ftrace_ops ftrace_list_end __read_mostly = {
.func   = ftrace_stub,
.flags  = FTRACE_OPS_FL_RECURSION_SAFE | FTRACE_OPS_FL_STUB,
INIT_OPS_HASH(ftrace_list_end)
@@ -112,11 +113,11 @@ static void ftrace_update_trampoline(struct ftrace_ops 
*ops);
  */
 static int ftrace_disabled __read_mostly;
 
-static DEFINE_MUTEX(ftrace_lock);
+DEFINE_MUTEX(ftrace_lock);
 
-static struct ftrace_ops __rcu *ftrace_ops_list __read_mostly = &ftrace_list_end;
+struct ftrace_ops __rcu *ftrace_ops_list __read_mostly = &ftrace_list_end;
 ftrace_func_t ftrace_trace_function __read_mostly = ftrace_stub;
-static struct ftrace_ops global_ops;
+struct ftrace_ops global_ops;
 
 #if ARCH_SUPPORTS_FTRACE_OPS
 static void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
@@ -127,26 +128,6 @@ static void ftrace_ops_no_ops(unsigned long ip, unsigned 
long parent_ip);
 #define ftrace_ops_list_func ((ftrace_func_t)ftrace_ops_no_ops)
 #endif
 
-/*
- * Traverse the ftrace_global_list, invoking all entries.  The reason that we
- * can use rcu_dereference_raw_notrace() is that elements removed from this 
list
- * are simply leaked, so there is no need to interact with a grace-period
- * mechanism.  The rcu_dereference_raw_notrace() calls are needed to handle
- * concurrent insertions into the ftrace_global_list.
- *
- * Silly Alpha and silly pointer-speculation compiler optimizations!
- */
-#define do_for_each_ftrace_op(op, list)\
-   op = rcu_dereference_raw_notrace(list); \
-   do
-
-/*
- * Optimized for just a single item in the list (as that is the normal case).
- */
-#define while_for_each_ftrace_op(op)   \
-   while (likely(op = rcu_dereference_raw_notrace((op)->next)) &&  \
-	   unlikely((op) != &ftrace_list_end))
-
 static inline void ftrace_ops_init(struct ftrace_ops *ops)
 {
 #ifdef CONFIG_DYNAMIC_FTRACE
@@ -187,17 +168,11 @@ static void ftrace_sync_ipi(void *data)
 }
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-static void update_function_graph_func(void);
-
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 static bool fgraph_graph_time = true;
-
-#else
-static inline void update_function_graph_func(void) { }
 #endif
 
-
 static ftrace_func_t ftrace_ops_get_list_func(struct ftrace_ops *ops)
 {
/*
@@ -334,7 +309,7 @@ static int remove_ftrace_ops(struct ftrace_ops __rcu **list,
 
 static void ftrace_update_trampoline(struct ftrace_ops *ops);
 
-static int __register_ftrace_function(struct ftrace_ops *ops)
+int __register_ftrace_function(struct ftrace_ops *ops)
 {
if (ops->flags & FTRACE_OPS_FL_DELETED)
return -EINVAL;
@@ -375,7 +350,7 @@ static int __register_ftrace_function(struct ftrace_ops 
*ops)
return 0;
 }
 
-static int __unregister_ftrace_function(struct ftrace_ops *ops)
+int __unregister_ftrace_function(struct ftrace_ops *ops)
 {
int ret;
 
@@ -1022,9 +997,7 @@ static __init void ftrace_profile_tracefs(struct dentry 
*d_tracer)
 #endif /* CONFIG_FUNCTION_PROFILER */
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-static int ftrace_graph_active;
-#else
-# define ftrace_graph_active 0
+int ftrace_graph_active;
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE
@@ -1067,7 +1040,7 @@ static const struct ftrace_hash empty_hash = {
 };
 #define EMPTY_HASH	((struct ftrace_hash *)&empty_hash)
 
-static struct ftrace_ops global_ops = {
+struct ftrace_ops global_ops = {
.func   = ftrace_stub,
.local_hash.notrace_hash= EMPTY_HASH,
.local_hash.filter_hash = EMPTY_HASH,
@@ -1503,7 +1476,7 @@ static bool hash_contains_ip(unsigned long ip,
  * This needs to be called with preemption disabled as
  * the hashes are freed 

[for-next][PATCH 09/30] fgraph: Move function graph specific code into fgraph.c

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

To make the function graph infrastructure more manageable, the code needs to
be in its own file (fgraph.c). Move the code that is specific to managing
the function graph infrastructure out of ftrace.c and into fgraph.c.

Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/fgraph.c | 360 -
 kernel/trace/ftrace.c | 368 +-
 2 files changed, 366 insertions(+), 362 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index de887a983ac7..374f3e42e29e 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -7,11 +7,27 @@
  *
  * Highly modified by Steven Rostedt (VMware).
  */
+#include <linux/suspend.h>
 #include <linux/ftrace.h>
+#include <linux/slab.h>
 
-#include "trace.h"
+#include <trace/events/sched.h>
+
+#include "ftrace_internal.h"
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+#define ASSIGN_OPS_HASH(opsname, val) \
+   .func_hash  = val, \
+   .local_hash.regex_lock  = 
__MUTEX_INITIALIZER(opsname.local_hash.regex_lock),
+#else
+#define ASSIGN_OPS_HASH(opsname, val)
+#endif
 
 static bool kill_ftrace_graph;
+int ftrace_graph_active;
+
+/* Both enabled by default (can be cleared by function_graph tracer flags */
+static bool fgraph_sleep_time = true;
 
 /**
  * ftrace_graph_is_dead - returns true if ftrace_graph_stop() was called
@@ -161,6 +177,31 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, 
unsigned long *ret,
barrier();
 }
 
+/*
+ * Hibernation protection.
+ * The state of the current task is too much unstable during
+ * suspend/restore to disk. We want to protect against that.
+ */
+static int
+ftrace_suspend_notifier_call(struct notifier_block *bl, unsigned long state,
+   void *unused)
+{
+   switch (state) {
+   case PM_HIBERNATION_PREPARE:
+   pause_graph_tracing();
+   break;
+
+   case PM_POST_HIBERNATION:
+   unpause_graph_tracing();
+   break;
+   }
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block ftrace_suspend_notifier = {
+   .notifier_call = ftrace_suspend_notifier_call,
+};
+
 /*
  * Send the trace to the ring-buffer.
  * @return the original return address.
@@ -190,3 +231,320 @@ unsigned long ftrace_return_to_handler(unsigned long 
frame_pointer)
 
return ret;
 }
+
+static struct ftrace_ops graph_ops = {
+   .func   = ftrace_stub,
+   .flags  = FTRACE_OPS_FL_RECURSION_SAFE |
+  FTRACE_OPS_FL_INITIALIZED |
+  FTRACE_OPS_FL_PID |
+  FTRACE_OPS_FL_STUB,
+#ifdef FTRACE_GRAPH_TRAMP_ADDR
+   .trampoline = FTRACE_GRAPH_TRAMP_ADDR,
+   /* trampoline_size is only needed for dynamically allocated tramps */
+#endif
+	ASSIGN_OPS_HASH(graph_ops, &global_ops.local_hash)
+};
+
+void ftrace_graph_sleep_time_control(bool enable)
+{
+   fgraph_sleep_time = enable;
+}
+
+int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
+{
+   return 0;
+}
+
+/* The callbacks that hook a function */
+trace_func_graph_ret_t ftrace_graph_return =
+   (trace_func_graph_ret_t)ftrace_stub;
+trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
+static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;
+
+/* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
+static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
+{
+   int i;
+   int ret = 0;
+   int start = 0, end = FTRACE_RETSTACK_ALLOC_SIZE;
+   struct task_struct *g, *t;
+
+   for (i = 0; i < FTRACE_RETSTACK_ALLOC_SIZE; i++) {
+   ret_stack_list[i] =
+   kmalloc_array(FTRACE_RETFUNC_DEPTH,
+ sizeof(struct ftrace_ret_stack),
+ GFP_KERNEL);
+   if (!ret_stack_list[i]) {
+   start = 0;
+   end = i;
+   ret = -ENOMEM;
+   goto free;
+   }
+   }
+
+	read_lock(&tasklist_lock);
+   do_each_thread(g, t) {
+   if (start == end) {
+   ret = -EAGAIN;
+   goto unlock;
+   }
+
+   if (t->ret_stack == NULL) {
+			atomic_set(&t->tracing_graph_pause, 0);
+			atomic_set(&t->trace_overrun, 0);
+   t->curr_ret_stack = -1;
+   t->curr_ret_depth = -1;
+   /* Make sure the tasks see the -1 first: */
+   smp_wmb();
+   t->ret_stack = ret_stack_list[start++];
+   }
+   } while_each_thread(g, t);
+
+unlock:
+	read_unlock(&tasklist_lock);
+free:
+   for (i = start; i < end; i++)
+   

[for-next][PATCH 06/30] function_graph: Remove the use of FTRACE_NOTRACE_DEPTH

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

The curr_ret_stack is no longer set to a negative value when a function is
not to be traced by the function graph tracer. Remove the usage of
FTRACE_NOTRACE_DEPTH, as it is no longer needed.
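
For reference, a condensed sketch of the trick being removed, pieced
together from the lines deleted here and in patch 04: filtered functions
were marked by pushing the stack index far negative, then recovered on
return:

	/* on entry, after: index = ++current->curr_ret_stack; */
	if (ftrace_graph_notrace_addr(func))
		current->curr_ret_stack -= FTRACE_NOTRACE_DEPTH;

	/* on return */
	if (current->curr_ret_stack < -1)
		current->curr_ret_stack += FTRACE_NOTRACE_DEPTH;

Patch 04 replaced this index arithmetic with the TRACE_GRAPH_NOTRACE_BIT
recursion flag, which is why the offset handling can now be dropped.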

Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h   |  1 -
 kernel/trace/fgraph.c| 19 ---
 kernel/trace/trace_functions_graph.c | 11 ---
 3 files changed, 31 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 10bd46434908..98625f10d982 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -790,7 +790,6 @@ unsigned long ftrace_graph_ret_addr(struct task_struct 
*task, int *idx,
  */
 #define __notrace_funcgraph	notrace
 
-#define FTRACE_NOTRACE_DEPTH 65536
 #define FTRACE_RETFUNC_DEPTH 50
 #define FTRACE_RETSTACK_ALLOC_SIZE 32
 extern int register_ftrace_graph(trace_func_graph_ret_t retfunc,
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index e852b69c0e64..de887a983ac7 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -112,16 +112,6 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, 
unsigned long *ret,
 
index = current->curr_ret_stack;
 
-   /*
-* A negative index here means that it's just returned from a
-* notrace'd function.  Recover index to get an original
-* return address.  See ftrace_push_return_trace().
-*
-* TODO: Need to check whether the stack gets corrupted.
-*/
-   if (index < 0)
-   index += FTRACE_NOTRACE_DEPTH;
-
if (unlikely(index < 0 || index >= FTRACE_RETFUNC_DEPTH)) {
ftrace_graph_stop();
WARN_ON(1);
@@ -190,15 +180,6 @@ unsigned long ftrace_return_to_handler(unsigned long 
frame_pointer)
 */
barrier();
current->curr_ret_stack--;
-   /*
-* The curr_ret_stack can be less than -1 only if it was
-* filtered out and it's about to return from the function.
-* Recover the index and continue to trace normal functions.
-*/
-   if (current->curr_ret_stack < -1) {
-   current->curr_ret_stack += FTRACE_NOTRACE_DEPTH;
-   return ret;
-   }
 
if (unlikely(!ret)) {
ftrace_graph_stop();
diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index ecf543df943b..eaf9b1629956 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -115,9 +115,6 @@ unsigned long ftrace_graph_ret_addr(struct task_struct 
*task, int *idx,
if (ret != (unsigned long)return_to_handler)
return ret;
 
-   if (index < -1)
-   index += FTRACE_NOTRACE_DEPTH;
-
if (index < 0)
return ret;
 
@@ -675,10 +672,6 @@ print_graph_entry_leaf(struct trace_iterator *iter,
 
cpu_data = per_cpu_ptr(data->cpu_data, cpu);
 
-   /* If a graph tracer ignored set_graph_notrace */
-   if (call->depth < -1)
-   call->depth += FTRACE_NOTRACE_DEPTH;
-
/*
 * Comments display at + 1 to depth. Since
 * this is a leaf function, keep the comments
@@ -721,10 +714,6 @@ print_graph_entry_nested(struct trace_iterator *iter,
struct fgraph_cpu_data *cpu_data;
int cpu = iter->cpu;
 
-   /* If a graph tracer ignored set_graph_notrace */
-   if (call->depth < -1)
-   call->depth += FTRACE_NOTRACE_DEPTH;
-
cpu_data = per_cpu_ptr(data->cpu_data, cpu);
cpu_data->depth = call->depth;
 
-- 
2.19.1




[for-next][PATCH 24/30] tracing/kprobes: Use dyn_event framework for kprobe events

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Use the dyn_event framework for kprobe events. This shows
kprobe events in the "tracing/dynamic_events" file.

Users can also define new events via tracing/dynamic_events.
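
For illustration, a hypothetical session (the probe name and symbol are
made up; paths assume tracefs mounted under /sys/kernel/debug):

  # cd /sys/kernel/debug/tracing
  # echo 'p:myprobe vfs_read' >> dynamic_events
  # cat dynamic_events
  p:kprobes/myprobe vfs_read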

Link: 
http://lkml.kernel.org/r/154140855646.17322.6619219995865980392.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 Documentation/trace/kprobetrace.rst |   3 +
 kernel/trace/Kconfig|   1 +
 kernel/trace/trace_kprobe.c | 319 +++-
 kernel/trace/trace_probe.c  |  27 +++
 kernel/trace/trace_probe.h  |   2 +
 5 files changed, 207 insertions(+), 145 deletions(-)

diff --git a/Documentation/trace/kprobetrace.rst 
b/Documentation/trace/kprobetrace.rst
index 47e765c2f2c3..235ce2ab131a 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -20,6 +20,9 @@ current_tracer. Instead of that, add probe points via
 /sys/kernel/debug/tracing/kprobe_events, and enable it via
/sys/kernel/debug/tracing/events/kprobes/<EVENT>/enable.
 
+You can also use /sys/kernel/debug/tracing/dynamic_events instead of
+kprobe_events. That interface will provide unified access to other
+dynamic events too.
 
 Synopsis of kprobe_events
 -
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index bf2e8a5a91f1..c0f6b0105609 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -461,6 +461,7 @@ config KPROBE_EVENTS
bool "Enable kprobes-based dynamic events"
select TRACING
select PROBE_EVENTS
+   select DYNAMIC_EVENTS
default y
help
  This allows the user to add tracing events (similar to tracepoints)
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index d313bcc259dc..bdf8c2ad5152 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -12,6 +12,7 @@
 #include <linux/rculist.h>
 #include <linux/error-injection.h>
 
+#include "trace_dynevent.h"
 #include "trace_kprobe_selftest.h"
 #include "trace_probe.h"
 #include "trace_probe_tmpl.h"
@@ -19,17 +20,51 @@
 #define KPROBE_EVENT_SYSTEM "kprobes"
 #define KRETPROBE_MAXACTIVE_MAX 4096
 
+static int trace_kprobe_create(int argc, const char **argv);
+static int trace_kprobe_show(struct seq_file *m, struct dyn_event *ev);
+static int trace_kprobe_release(struct dyn_event *ev);
+static bool trace_kprobe_is_busy(struct dyn_event *ev);
+static bool trace_kprobe_match(const char *system, const char *event,
+  struct dyn_event *ev);
+
+static struct dyn_event_operations trace_kprobe_ops = {
+   .create = trace_kprobe_create,
+   .show = trace_kprobe_show,
+   .is_busy = trace_kprobe_is_busy,
+   .free = trace_kprobe_release,
+   .match = trace_kprobe_match,
+};
+
 /**
  * Kprobe event core functions
  */
 struct trace_kprobe {
-   struct list_head    list;
+   struct dyn_event    devent;
    struct kretprobe    rp; /* Use rp.kp for kprobe use */
    unsigned long __percpu *nhit;
    const char  *symbol;    /* symbol name */
    struct trace_probe  tp;
 };
 
+static bool is_trace_kprobe(struct dyn_event *ev)
+{
+   return ev->ops == &trace_kprobe_ops;
+}
+
+static struct trace_kprobe *to_trace_kprobe(struct dyn_event *ev)
+{
+   return container_of(ev, struct trace_kprobe, devent);
+}
+
+/**
+ * for_each_trace_kprobe - iterate over the trace_kprobe list
+ * @pos:   the struct trace_kprobe * for each entry
+ * @dpos:  the struct dyn_event * to use as a loop cursor
+ */
+#define for_each_trace_kprobe(pos, dpos)   \
+   for_each_dyn_event(dpos)\
+   if (is_trace_kprobe(dpos) && (pos = to_trace_kprobe(dpos)))
+
 #define SIZEOF_TRACE_KPROBE(n) \
(offsetof(struct trace_kprobe, tp.args) +   \
(sizeof(struct probe_arg) * (n)))
@@ -81,6 +116,22 @@ static nokprobe_inline bool 
trace_kprobe_module_exist(struct trace_kprobe *tk)
return ret;
 }
 
+static bool trace_kprobe_is_busy(struct dyn_event *ev)
+{
+   struct trace_kprobe *tk = to_trace_kprobe(ev);
+
+   return trace_probe_is_enabled(&tk->tp);
+}
+
+static bool trace_kprobe_match(const char *system, const char *event,
+  struct dyn_event *ev)
+{
+   struct trace_kprobe *tk = to_trace_kprobe(ev);
+
+   return strcmp(trace_event_name(&tk->tp.call), event) == 0 &&
+   (!system || strcmp(tk->tp.call.class->system, system) == 0);
+}
+
 static nokprobe_inline unsigned long trace_kprobe_nhit(struct trace_kprobe *tk)
 {
unsigned long nhit = 0;
@@ -128,9 +179,6 @@ bool trace_kprobe_error_injectable(struct trace_event_call 
*call)
 static int register_kprobe_event(struct trace_kprobe *tk);
 static int unregister_kprobe_event(struct trace_kprobe *tk);
 
-static DEFINE_MUTEX(probe_lock);
-static LIST_HEAD(probe_list);
-
 static int kprobe_dispatcher(struct 

[for-next][PATCH 10/30] tracing: Rearrange functions in trace_sched_wakeup.c

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Rearrange the functions in trace_sched_wakeup.c so that there are fewer
 #ifdef CONFIG_FUNCTION_TRACER and #ifdef CONFIG_FUNCTION_GRAPH_TRACER,
instead of having the #ifdefs spread all over.

No functional change is made.

Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_sched_wakeup.c | 272 ++
 1 file changed, 130 insertions(+), 142 deletions(-)

diff --git a/kernel/trace/trace_sched_wakeup.c 
b/kernel/trace/trace_sched_wakeup.c
index 7d04b9890755..2ce78100b4d3 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -35,26 +35,19 @@ static arch_spinlock_t wakeup_lock =
 
 static void wakeup_reset(struct trace_array *tr);
 static void __wakeup_reset(struct trace_array *tr);
+static int start_func_tracer(struct trace_array *tr, int graph);
+static void stop_func_tracer(struct trace_array *tr, int graph);
 
 static int save_flags;
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-static int wakeup_display_graph(struct trace_array *tr, int set);
 # define is_graph(tr) ((tr)->trace_flags & TRACE_ITER_DISPLAY_GRAPH)
 #else
-static inline int wakeup_display_graph(struct trace_array *tr, int set)
-{
-   return 0;
-}
 # define is_graph(tr) false
 #endif
 
-
 #ifdef CONFIG_FUNCTION_TRACER
 
-static int wakeup_graph_entry(struct ftrace_graph_ent *trace);
-static void wakeup_graph_return(struct ftrace_graph_ret *trace);
-
 static bool function_enabled;
 
 /*
@@ -104,122 +97,8 @@ func_prolog_preempt_disable(struct trace_array *tr,
return 0;
 }
 
-/*
- * wakeup uses its own tracer function to keep the overhead down:
- */
-static void
-wakeup_tracer_call(unsigned long ip, unsigned long parent_ip,
-  struct ftrace_ops *op, struct pt_regs *pt_regs)
-{
-   struct trace_array *tr = wakeup_trace;
-   struct trace_array_cpu *data;
-   unsigned long flags;
-   int pc;
-
-   if (!func_prolog_preempt_disable(tr, &data, &pc))
-   return;
-
-   local_irq_save(flags);
-   trace_function(tr, ip, parent_ip, flags, pc);
-   local_irq_restore(flags);
-
-   atomic_dec(&data->disabled);
-   preempt_enable_notrace();
-}
-
-static int register_wakeup_function(struct trace_array *tr, int graph, int set)
-{
-   int ret;
-
-   /* 'set' is set if TRACE_ITER_FUNCTION is about to be set */
-   if (function_enabled || (!set && !(tr->trace_flags & 
TRACE_ITER_FUNCTION)))
-   return 0;
-
-   if (graph)
-   ret = register_ftrace_graph(&wakeup_graph_return,
-   &wakeup_graph_entry);
-   else
-   ret = register_ftrace_function(tr->ops);
-
-   if (!ret)
-   function_enabled = true;
-
-   return ret;
-}
-
-static void unregister_wakeup_function(struct trace_array *tr, int graph)
-{
-   if (!function_enabled)
-   return;
-
-   if (graph)
-   unregister_ftrace_graph();
-   else
-   unregister_ftrace_function(tr->ops);
-
-   function_enabled = false;
-}
-
-static int wakeup_function_set(struct trace_array *tr, u32 mask, int set)
-{
-   if (!(mask & TRACE_ITER_FUNCTION))
-   return 0;
-
-   if (set)
-   register_wakeup_function(tr, is_graph(tr), 1);
-   else
-   unregister_wakeup_function(tr, is_graph(tr));
-   return 1;
-}
-#else
-static int register_wakeup_function(struct trace_array *tr, int graph, int set)
-{
-   return 0;
-}
-static void unregister_wakeup_function(struct trace_array *tr, int graph) { }
-static int wakeup_function_set(struct trace_array *tr, u32 mask, int set)
-{
-   return 0;
-}
-#endif /* CONFIG_FUNCTION_TRACER */
-
-static int wakeup_flag_changed(struct trace_array *tr, u32 mask, int set)
-{
-   struct tracer *tracer = tr->current_trace;
-
-   if (wakeup_function_set(tr, mask, set))
-   return 0;
-
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-   if (mask & TRACE_ITER_DISPLAY_GRAPH)
-   return wakeup_display_graph(tr, set);
-#endif
-
-   return trace_keep_overwrite(tracer, mask, set);
-}
 
-static int start_func_tracer(struct trace_array *tr, int graph)
-{
-   int ret;
-
-   ret = register_wakeup_function(tr, graph, 0);
-
-   if (!ret && tracing_is_enabled())
-   tracer_enabled = 1;
-   else
-   tracer_enabled = 0;
-
-   return ret;
-}
-
-static void stop_func_tracer(struct trace_array *tr, int graph)
-{
-   tracer_enabled = 0;
-
-   unregister_wakeup_function(tr, graph);
-}
-
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
 static int wakeup_display_graph(struct trace_array *tr, int set)
 {
if (!(is_graph(tr) ^ set))
@@ -318,20 +197,94 @@ static void wakeup_print_header(struct seq_file *s)
else
trace_default_header(s);
 }
+#else /* CONFIG_FUNCTION_GRAPH_TRACER */
+static int wakeup_graph_entry(struct ftrace_graph_ent *trace)
+{
+   return -1;
+}

[for-next][PATCH 22/30] tracing: Integrate similar probe argument parsers

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Integrate similar argument parsers for kprobes and uprobes events
into traceprobe_parse_probe_arg().
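
The caller-side shape that results, sketched from the kprobe hunk below
(the uprobe side is analogous; name splitting, validity checks, and
fetch-argument parsing all move behind one call):

	for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
		ret = traceprobe_parse_probe_arg(&tk->tp, i, argv[i], flags);
		if (ret)
			goto error;
	}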

Link: 
http://lkml.kernel.org/r/154140850016.17322.9836787731210512176.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_kprobe.c | 48 ++---
 kernel/trace/trace_probe.c  | 47 +---
 kernel/trace/trace_probe.h  |  7 ++
 kernel/trace/trace_uprobe.c | 44 ++
 4 files changed, 50 insertions(+), 96 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index fec67188c4d2..d313bcc259dc 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -548,7 +548,6 @@ static int create_trace_kprobe(int argc, char **argv)
bool is_return = false, is_delete = false;
char *symbol = NULL, *event = NULL, *group = NULL;
int maxactive = 0;
-   char *arg;
long offset = 0;
void *addr = NULL;
char buf[MAX_EVENT_NAME_LEN];
@@ -676,53 +675,10 @@ static int create_trace_kprobe(int argc, char **argv)
}
 
/* parse arguments */
-   ret = 0;
for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
-   struct probe_arg *parg = &tk->tp.args[i];
-
-   /* Increment count for freeing args in error case */
-   tk->tp.nr_args++;
-
-   /* Parse argument name */
-   arg = strchr(argv[i], '=');
-   if (arg) {
-   *arg++ = '\0';
-   parg->name = kstrdup(argv[i], GFP_KERNEL);
-   } else {
-   arg = argv[i];
-   /* If argument name is omitted, set "argN" */
-   snprintf(buf, MAX_EVENT_NAME_LEN, "arg%d", i + 1);
-   parg->name = kstrdup(buf, GFP_KERNEL);
-   }
-
-   if (!parg->name) {
-   pr_info("Failed to allocate argument[%d] name.\n", i);
-   ret = -ENOMEM;
-   goto error;
-   }
-
-   if (!is_good_name(parg->name)) {
-   pr_info("Invalid argument[%d] name: %s\n",
-   i, parg->name);
-   ret = -EINVAL;
-   goto error;
-   }
-
-   if (traceprobe_conflict_field_name(parg->name,
-   tk->tp.args, i)) {
-   pr_info("Argument[%d] name '%s' conflicts with "
-   "another field.\n", i, argv[i]);
-   ret = -EINVAL;
-   goto error;
-   }
-
-   /* Parse fetch argument */
-   ret = traceprobe_parse_probe_arg(arg, &tk->tp.size, parg,
-flags);
-   if (ret) {
-   pr_info("Parse error at argument[%d]. (%d)\n", i, ret);
+   ret = traceprobe_parse_probe_arg(&tk->tp, i, argv[i], flags);
+   if (ret)
goto error;
-   }
}
 
ret = register_trace_kprobe(tk);
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index bd30e9398d2a..449150c6a87f 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -348,7 +348,7 @@ static int __parse_bitfield_probe_arg(const char *bf,
 }
 
 /* String length checking wrapper */
-int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
+static int traceprobe_parse_probe_arg_body(char *arg, ssize_t *size,
struct probe_arg *parg, unsigned int flags)
 {
struct fetch_insn *code, *scode, *tmp = NULL;
@@ -491,8 +491,8 @@ int traceprobe_parse_probe_arg(char *arg, ssize_t *size,
 }
 
 /* Return 1 if name is reserved or already used by another argument */
-int traceprobe_conflict_field_name(const char *name,
-  struct probe_arg *args, int narg)
+static int traceprobe_conflict_field_name(const char *name,
+ struct probe_arg *args, int narg)
 {
int i;
 
@@ -507,6 +507,47 @@ int traceprobe_conflict_field_name(const char *name,
return 0;
 }
 
+int traceprobe_parse_probe_arg(struct trace_probe *tp, int i, char *arg,
+   unsigned int flags)
+{
+   struct probe_arg *parg = &tp->args[i];
+   char *body;
+   int ret;
+
+   /* Increment count for freeing args in error case */
+   tp->nr_args++;
+
+   body = strchr(arg, '=');
+   if (body) {
+   parg->name = kmemdup_nul(arg, body - arg, GFP_KERNEL);
+   body++;
+   } else {
+   /* If argument name is omitted, set "argN" */
+   parg->name = kasprintf(GFP_KERNEL, "arg%d", i + 1);
+  

[for-next][PATCH 19/30] tracing/uprobes: Add busy check when cleanup all uprobes

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Add a busy check loop in cleanup_all_probes() before
trying to remove all events in uprobe_events, the same way
that kprobe_events does.

Without this change, writing null to uprobe_events will
try to remove events, but if one of them is enabled it will
stop partway, leaving some events cleared and others not cleared.

With this change, writing null to uprobe_events first makes
sure no event is enabled before removing any. So it either
clears all events, or returns an error (-EBUSY) while
keeping all events.
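
In tracefs terms, the intended behavior looks like this (paths assume
tracefs under /sys/kernel/debug; the event itself is illustrative):

  # cd /sys/kernel/debug/tracing
  # echo 1 > events/uprobes/myevent/enable
  # echo > uprobe_events        # fails with EBUSY, removes nothing
  # echo 0 > events/uprobes/myevent/enable
  # echo > uprobe_events        # now clears every uprobe event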

Link: 
http://lkml.kernel.org/r/154140841557.17322.12653952888762532401.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_uprobe.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 31ea48eceda1..b708e4ff7ea7 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -587,12 +587,19 @@ static int cleanup_all_probes(void)
int ret = 0;
 
mutex_lock(&uprobe_lock);
+   /* Ensure no probe is in use. */
+   list_for_each_entry(tu, &uprobe_list, list)
+   if (trace_probe_is_enabled(&tu->tp)) {
+   ret = -EBUSY;
+   goto end;
+   }
while (!list_empty(_list)) {
tu = list_entry(uprobe_list.next, struct trace_uprobe, list);
ret = unregister_trace_uprobe(tu);
if (ret)
break;
}
+end:
mutex_unlock(&uprobe_lock);
return ret;
 }
-- 
2.19.1




[for-next][PATCH 15/30] scripts/recordmcount.{c,pl}: support -ffunction-sections .text.* section names

2018-12-05 Thread Steven Rostedt
From: Joe Lawrence 

When building with -ffunction-sections, the compiler will place each
function into its own ELF section, prefixed with ".text".  For example,
a simple test module with functions test_module_do_work() and
test_module_wq_func():

  % objdump --section-headers test_module.o | awk '/\.text/{print $2}'
  .text
  .text.test_module_do_work
  .text.test_module_wq_func
  .init.text
  .exit.text

Adjust the recordmcount scripts to look for ".text" as a section name
prefix.  This will ensure that those functions will be included in the
__mcount_loc relocations:

  % objdump --reloc --section __mcount_loc test_module.o
  OFFSET           TYPE          VALUE
  0000000000000000 R_X86_64_64   .text.test_module_do_work
  0000000000000008 R_X86_64_64   .text.test_module_wq_func
  0000000000000010 R_X86_64_64   .init.text

Link: 
http://lkml.kernel.org/r/1542745158-25392-2-git-send-email-joe.lawre...@redhat.com

Signed-off-by: Joe Lawrence 
Signed-off-by: Steven Rostedt (VMware) 
---
 scripts/recordmcount.c  |  2 +-
 scripts/recordmcount.pl | 13 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
index 895c40e8679f..a50a2aa963ad 100644
--- a/scripts/recordmcount.c
+++ b/scripts/recordmcount.c
@@ -397,7 +397,7 @@ static uint32_t (*w2)(uint16_t);
 static int
 is_mcounted_section_name(char const *const txtname)
 {
-   return strcmp(".text",   txtname) == 0 ||
+   return strncmp(".text",  txtname, 5) == 0 ||
strcmp(".init.text", txtname) == 0 ||
strcmp(".ref.text",  txtname) == 0 ||
strcmp(".sched.text",txtname) == 0 ||
diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
index f599031260d5..68841d01162c 100755
--- a/scripts/recordmcount.pl
+++ b/scripts/recordmcount.pl
@@ -142,6 +142,11 @@ my %text_sections = (
  ".text.unlikely" => 1,
 );
 
+# Acceptable section-prefixes to record.
+my %text_section_prefixes = (
+ ".text." => 1,
+);
+
 # Note: we are nice to C-programmers here, thus we skip the '||='-idiom.
 $objdump = 'objdump' if (!$objdump);
 $objcopy = 'objcopy' if (!$objcopy);
@@ -519,6 +524,14 @@ while (<IN>) {
 
# Only record text sections that we know are safe
$read_function = defined($text_sections{$1});
+   if (!$read_function) {
+   foreach my $prefix (keys %text_section_prefixes) {
+   if (substr($1, 0, length $prefix) eq $prefix) {
+   $read_function = 1;
+   last;
+   }
+   }
+   }
# print out any recorded offsets
update_funcs();
 
-- 
2.19.1




[for-next][PATCH 18/30] tracing: Change default buffer_percent to 50

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

After running several tests, it appears that having the reader wait till
half the buffer is full before starting to read (rather than having its own
reads constantly fill the ring buffer with events) works well. It keeps
trace-cmd (the main user of this interface) from dominating the traces it
records.
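
The default is visible through tracefs, e.g. (path assumes tracefs under
/sys/kernel/debug; note the hunk below creates the file with mode 0444):

  # cat /sys/kernel/debug/tracing/buffer_percent
  50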

Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d382fd1aa4a6..194c01838e3f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8017,7 +8017,7 @@ init_tracer_tracefs(struct trace_array *tr, struct dentry 
*d_tracer)
trace_create_file("timestamp_mode", 0444, d_tracer, tr,
  &trace_time_stamp_mode_fops);
 
-   tr->buffer_percent = 1;
+   tr->buffer_percent = 50;
 
trace_create_file("buffer_percent", 0444, d_tracer,
tr, &buffer_percent_fops);
-- 
2.19.1




[for-next][PATCH 29/30] tracing: Add generic event-name based remove event method

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Add a generic method for removing an event from the dynamic event
list. This is the same as for other systems under ftrace: you
just pass the event name prefixed with '!', e.g.

  # echo p:new_grp/new_event _do_fork > dynamic_events

This creates an event, and

  # echo '!p:new_grp/new_event _do_fork' > dynamic_events

Or,

  # echo '!p:new_grp/new_event' > dynamic_events

will remove the new_grp/new_event event.
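
For comparison, the pre-existing removal syntax keeps working; the
dispatch in create_dyn_event() below accepts both prefixes:

  # echo '-:new_grp/new_event' > dynamic_events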

Note that this doesn't check the event prefix (e.g. "p:")
strictly, because the "group/event" name must be unique.

Link: 
http://lkml.kernel.org/r/154140869774.17322.8887303560398645347.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_dynevent.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_dynevent.c b/kernel/trace/trace_dynevent.c
index f17a887abb66..dd1f43588d70 100644
--- a/kernel/trace/trace_dynevent.c
+++ b/kernel/trace/trace_dynevent.c
@@ -37,10 +37,17 @@ int dyn_event_release(int argc, char **argv, struct 
dyn_event_operations *type)
char *system = NULL, *event, *p;
int ret = -ENOENT;
 
-   if (argv[0][1] != ':')
-   return -EINVAL;
+   if (argv[0][0] == '-') {
+   if (argv[0][1] != ':')
+   return -EINVAL;
+   event = &argv[0][2];
+   } else {
+   event = strchr(argv[0], ':');
+   if (!event)
+   return -EINVAL;
+   event++;
+   }
 
-   event = &argv[0][2];
p = strchr(event, '/');
if (p) {
system = event;
@@ -69,7 +76,7 @@ static int create_dyn_event(int argc, char **argv)
struct dyn_event_operations *ops;
int ret;
 
-   if (argv[0][0] == '-')
+   if (argv[0][0] == '-' || argv[0][0] == '!')
return dyn_event_release(argc, argv, NULL);
 
mutex_lock(&dyn_event_ops_mutex);
-- 
2.19.1




[for-next][PATCH 16/30] ring-buffer: Add percentage of ring buffer full to wake up reader

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

Instead of just waiting for a page to be full before waking up a pending
reader, allow the reader to pass in a "percentage" of pages that must have
content before the reader is woken. This should keep the process of
reading events from causing wake-ups that constantly trigger further reads
of the buffer.
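
A sketch of the wake-up test this implies, using identifiers from the
hunks below (the exact arithmetic in the final code may differ):

	size_t nr_pages = cpu_buffer->nr_pages;
	size_t dirty = ring_buffer_nr_dirty_pages(buffer, cpu);

	/* Keep sleeping until the requested percentage of pages
	 * have content; full == 0 preserves the old behavior. */
	if (full && dirty * 100 < full * nr_pages)
		continue;	/* not enough data yet, wait again */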

Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ring_buffer.h |  4 ++-
 kernel/trace/ring_buffer.c  | 71 ++---
 kernel/trace/trace.c|  8 ++---
 3 files changed, 73 insertions(+), 10 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 0940fda59872..5b9ae62272bb 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -97,7 +97,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, 
struct lock_class_key *k
__ring_buffer_alloc((size), (flags), &__key);   \
 })
 
-int ring_buffer_wait(struct ring_buffer *buffer, int cpu, bool full);
+int ring_buffer_wait(struct ring_buffer *buffer, int cpu, int full);
 __poll_t ring_buffer_poll_wait(struct ring_buffer *buffer, int cpu,
  struct file *filp, poll_table *poll_table);
 
@@ -189,6 +189,8 @@ bool ring_buffer_time_stamp_abs(struct ring_buffer *buffer);
 
 size_t ring_buffer_page_len(void *page);
 
+size_t ring_buffer_nr_pages(struct ring_buffer *buffer, int cpu);
+size_t ring_buffer_nr_dirty_pages(struct ring_buffer *buffer, int cpu);
 
 void *ring_buffer_alloc_read_page(struct ring_buffer *buffer, int cpu);
 void ring_buffer_free_read_page(struct ring_buffer *buffer, int cpu, void 
*data);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 65bd4616220d..9edb628603ab 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -487,6 +487,9 @@ struct ring_buffer_per_cpu {
local_t dropped_events;
local_t committing;
local_t commits;
+   local_t pages_touched;
+   local_t pages_read;
+   size_t  shortest_full;
unsigned long   read;
unsigned long   read_bytes;
u64 write_stamp;
@@ -529,6 +532,41 @@ struct ring_buffer_iter {
u64 read_stamp;
 };
 
+/**
+ * ring_buffer_nr_pages - get the number of buffer pages in the ring buffer
+ * @buffer: The ring_buffer to get the number of pages from
+ * @cpu: The cpu of the ring_buffer to get the number of pages from
+ *
+ * Returns the number of pages used by a per_cpu buffer of the ring buffer.
+ */
+size_t ring_buffer_nr_pages(struct ring_buffer *buffer, int cpu)
+{
+   return buffer->buffers[cpu]->nr_pages;
+}
+
+/**
+ * ring_buffer_nr_dirty_pages - get the number of used pages in the ring buffer
+ * @buffer: The ring_buffer to get the number of pages from
+ * @cpu: The cpu of the ring_buffer to get the number of pages from
+ *
+ * Returns the number of pages that have content in the ring buffer.
+ */
+size_t ring_buffer_nr_dirty_pages(struct ring_buffer *buffer, int cpu)
+{
+   size_t read;
+   size_t cnt;
+
+   read = local_read(&buffer->buffers[cpu]->pages_read);
+   cnt = local_read(&buffer->buffers[cpu]->pages_touched);
+   /* The reader can read an empty page, but not more than that */
+   if (cnt < read) {
+   WARN_ON_ONCE(read > cnt + 1);
+   return 0;
+   }
+
+   return cnt - read;
+}
+
 /*
  * rb_wake_up_waiters - wake up tasks waiting for ring buffer input
  *
@@ -556,7 +594,7 @@ static void rb_wake_up_waiters(struct irq_work *work)
  * as data is added to any of the @buffer's cpu buffers. Otherwise
  * it will wait for data to be added to a specific cpu buffer.
  */
-int ring_buffer_wait(struct ring_buffer *buffer, int cpu, bool full)
+int ring_buffer_wait(struct ring_buffer *buffer, int cpu, int full)
 {
struct ring_buffer_per_cpu *uninitialized_var(cpu_buffer);
DEFINE_WAIT(wait);
@@ -571,7 +609,7 @@ int ring_buffer_wait(struct ring_buffer *buffer, int cpu, 
bool full)
if (cpu == RING_BUFFER_ALL_CPUS) {
work = &buffer->irq_work;
/* Full only makes sense on per cpu reads */
-   full = false;
+   full = 0;
} else {
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return -ENODEV;
@@ -623,15 +661,22 @@ int ring_buffer_wait(struct ring_buffer *buffer, int cpu, 
bool full)
!ring_buffer_empty_cpu(buffer, cpu)) {
unsigned long flags;
bool pagebusy;
+   size_t nr_pages;
+   size_t dirty;
 
if (!full)
break;
 
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
   

[for-next][PATCH 08/30] function_graph: Do not expose the graph_time option when profiler is not configured

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

When the function profiler is not configured, the "graph_time" option is
meaningless, as the function profiler is the only thing that makes use of
it. Do not expose it if the profiler is not configured.

Link: http://lkml.kernel.org/r/20181123061133.ga195...@google.com

Reported-by: Joel Fernandes 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace.h | 5 +
 kernel/trace/trace_functions_graph.c | 4 
 2 files changed, 9 insertions(+)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f67060a75f38..ab16eca76e59 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -862,7 +862,12 @@ static __always_inline bool ftrace_hash_empty(struct 
ftrace_hash *hash)
 #define TRACE_GRAPH_PRINT_FILL_MASK    (0x3 << TRACE_GRAPH_PRINT_FILL_SHIFT)
 
 extern void ftrace_graph_sleep_time_control(bool enable);
+
+#ifdef CONFIG_FUNCTION_PROFILER
 extern void ftrace_graph_graph_time_control(bool enable);
+#else
+static inline void ftrace_graph_graph_time_control(bool enable) { }
+#endif
 
 extern enum print_line_t
 print_graph_function_flags(struct trace_iterator *iter, u32 flags);
diff --git a/kernel/trace/trace_functions_graph.c 
b/kernel/trace/trace_functions_graph.c
index eaf9b1629956..855c13c61e77 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -60,8 +60,12 @@ static struct tracer_opt trace_opts[] = {
{ TRACER_OPT(funcgraph-tail, TRACE_GRAPH_PRINT_TAIL) },
/* Include sleep time (scheduled out) between entry and return */
{ TRACER_OPT(sleep-time, TRACE_GRAPH_SLEEP_TIME) },
+
+#ifdef CONFIG_FUNCTION_PROFILER
/* Include time within nested functions */
{ TRACER_OPT(graph-time, TRACE_GRAPH_GRAPH_TIME) },
+#endif
+
{ } /* Empty entry */
 };
 
-- 
2.19.1




[for-next][PATCH 27/30] tracing: Remove unneeded synth_event_mutex

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Remove the unneeded synth_event_mutex. This mutex protects the reference
count in synth_event; however, those operation points are already
protected by event_mutex.

1. In __create_synth_event() and create_or_delete_synth_event(),
 synth_event_mutex is clearly taken right after event_mutex.

2. event_hist_trigger_func() is trigger_hist_cmd.func() which is
 called by trigger_process_regex(), which is a part of
 event_trigger_regex_write() and this function takes event_mutex.

3. hist_unreg_all() is trigger_hist_cmd.unreg_all() which is called
 by event_trigger_regex_open() and it takes event_mutex.

4. onmatch_destroy() and onmatch_create() have long call trees,
 but both are ultimately invoked from event_trigger_regex_write()
 and event_trace_del_tracer(); the former takes event_mutex, and the
 latter is guaranteed to be called with event_mutex held.

Finally, I ensured there is no resource conflict. For safety,
I added lockdep_assert_held(&event_mutex) for each function.
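
The replacement pattern, for reference (lockdep_assert_held() compiles
to nothing when lockdep is not enabled, so the annotation is free in
production builds):

	/* Instead of taking a second mutex, assert the caller's: */
	lockdep_assert_held(&event_mutex);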

Link: 
http://lkml.kernel.org/r/154140864134.17322.4796059721306031894.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_hist.c | 30 +++---
 1 file changed, 7 insertions(+), 23 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 414aabd67d1f..21e4954375a1 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -444,8 +444,6 @@ static bool have_hist_err(void)
return false;
 }
 
-static DEFINE_MUTEX(synth_event_mutex);
-
 struct synth_trace_event {
struct trace_entry  ent;
u64 fields[];
@@ -1077,7 +1075,6 @@ static int __create_synth_event(int argc, const char 
*name, const char **argv)
return -EINVAL;
 
mutex_lock(&event_mutex);
-   mutex_lock(&synth_event_mutex);
 
event = find_synth_event(name);
if (event) {
@@ -1119,7 +1116,6 @@ static int __create_synth_event(int argc, const char 
*name, const char **argv)
else
free_synth_event(event);
  out:
-   mutex_unlock(&synth_event_mutex);
mutex_unlock(&event_mutex);
 
return ret;
@@ -1139,7 +1135,6 @@ static int create_or_delete_synth_event(int argc, char 
**argv)
/* trace_run_command() ensures argc != 0 */
if (name[0] == '!') {
mutex_lock(&event_mutex);
-   mutex_lock(&synth_event_mutex);
event = find_synth_event(name + 1);
if (event) {
if (event->ref)
@@ -1153,7 +1148,6 @@ static int create_or_delete_synth_event(int argc, char 
**argv)
}
} else
ret = -ENOENT;
-   mutex_unlock(&synth_event_mutex);
mutex_unlock(&event_mutex);
return ret;
}
@@ -3535,7 +3529,7 @@ static void onmatch_destroy(struct action_data *data)
 {
unsigned int i;
 
-   mutex_lock(&synth_event_mutex);
+   lockdep_assert_held(&event_mutex);
 
kfree(data->onmatch.match_event);
kfree(data->onmatch.match_event_system);
@@ -3548,8 +3542,6 @@ static void onmatch_destroy(struct action_data *data)
data->onmatch.synth_event->ref--;
 
kfree(data);
-
-   mutex_unlock(&synth_event_mutex);
 }
 
 static void destroy_field_var(struct field_var *field_var)
@@ -3700,15 +3692,14 @@ static int onmatch_create(struct hist_trigger_data 
*hist_data,
struct synth_event *event;
int ret = 0;
 
-   mutex_lock(&synth_event_mutex);
+   lockdep_assert_held(&event_mutex);
+
event = find_synth_event(data->onmatch.synth_event_name);
if (!event) {
hist_err("onmatch: Couldn't find synthetic event: ", 
data->onmatch.synth_event_name);
-   mutex_unlock(&synth_event_mutex);
return -EINVAL;
}
event->ref++;
-   mutex_unlock(&synth_event_mutex);
 
var_ref_idx = hist_data->n_var_refs;
 
@@ -3782,9 +3773,7 @@ static int onmatch_create(struct hist_trigger_data 
*hist_data,
  out:
return ret;
  err:
-   mutex_lock(&synth_event_mutex);
event->ref--;
-   mutex_unlock(&synth_event_mutex);
 
goto out;
 }
@@ -5492,6 +5481,8 @@ static void hist_unreg_all(struct trace_event_file *file)
struct synth_event *se;
const char *se_name;
 
+   lockdep_assert_held(&event_mutex);
+
if (hist_file_check_refs(file))
return;
 
@@ -5501,12 +5492,10 @@ static void hist_unreg_all(struct trace_event_file 
*file)
list_del_rcu(&data->list);
trace_event_trigger_enable_disable(file, 0);
 
-   mutex_lock(&synth_event_mutex);
se_name = trace_event_name(file->event_call);
se = find_synth_event(se_name);
if (se)
se->ref--;
-  

[for-next][PATCH 13/30] function_graph: Have profiler use new helper ftrace_graph_get_ret_stack()

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

The ret_stack processing is going to change, and that is going
to break anything that is accessing the ret_stack directly. One user is the
function graph profiler. By using the ftrace_graph_get_ret_stack() helper
function, the profiler can access the ret_stack entry without relying on the
implementation details of the stack itself.
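
An illustrative use of the helper (the loop itself is an assumption for
demonstration, not taken from the patch; idx 0 is the newest entry):

	struct ftrace_ret_stack *ret_stack;
	int i = 0;

	while ((ret_stack = ftrace_graph_get_ret_stack(current, i++)))
		pr_info("depth %d: %ps\n", i - 1, (void *)ret_stack->func);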

Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h |  3 +++
 kernel/trace/fgraph.c  | 11 +++
 kernel/trace/ftrace.c  | 21 +++--
 3 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 21c80491ccde..98e141c71ad0 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -785,6 +785,9 @@ extern int
 function_graph_enter(unsigned long ret, unsigned long func,
 unsigned long frame_pointer, unsigned long *retp);
 
+struct ftrace_ret_stack *
+ftrace_graph_get_ret_stack(struct task_struct *task, int idx);
+
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
unsigned long ret, unsigned long *retp);
 
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 90fcefcaff2a..a3704ec8b599 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -232,6 +232,17 @@ unsigned long ftrace_return_to_handler(unsigned long 
frame_pointer)
return ret;
 }
 
+struct ftrace_ret_stack *
+ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
+{
+   idx = task->curr_ret_stack - idx;
+
+   if (idx >= 0 && idx <= task->curr_ret_stack)
+   return &task->ret_stack[idx];
+
+   return NULL;
+}
+
 /**
  * ftrace_graph_ret_addr - convert a potentially modified stack return address
  *to its original value
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index d06fe588e650..8ef9fc226037 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -792,7 +792,7 @@ void ftrace_graph_graph_time_control(bool enable)
 
 static int profile_graph_entry(struct ftrace_graph_ent *trace)
 {
-   int index = current->curr_ret_stack;
+   struct ftrace_ret_stack *ret_stack;
 
function_profile_call(trace->func, 0, NULL, NULL);
 
@@ -800,14 +800,16 @@ static int profile_graph_entry(struct ftrace_graph_ent 
*trace)
if (!current->ret_stack)
return 0;
 
-   if (index >= 0 && index < FTRACE_RETFUNC_DEPTH)
-   current->ret_stack[index].subtime = 0;
+   ret_stack = ftrace_graph_get_ret_stack(current, 0);
+   if (ret_stack)
+   ret_stack->subtime = 0;
 
return 1;
 }
 
 static void profile_graph_return(struct ftrace_graph_ret *trace)
 {
+   struct ftrace_ret_stack *ret_stack;
struct ftrace_profile_stat *stat;
unsigned long long calltime;
struct ftrace_profile *rec;
@@ -825,16 +827,15 @@ static void profile_graph_return(struct ftrace_graph_ret 
*trace)
calltime = trace->rettime - trace->calltime;
 
if (!fgraph_graph_time) {
-   int index;
-
-   index = current->curr_ret_stack;
 
/* Append this call time to the parent time to subtract */
-   if (index)
-   current->ret_stack[index - 1].subtime += calltime;
+   ret_stack = ftrace_graph_get_ret_stack(current, 1);
+   if (ret_stack)
+   ret_stack->subtime += calltime;
 
-   if (current->ret_stack[index].subtime < calltime)
-   calltime -= current->ret_stack[index].subtime;
+   ret_stack = ftrace_graph_get_ret_stack(current, 0);
+   if (ret_stack && ret_stack->subtime < calltime)
+   calltime -= ret_stack->subtime;
else
calltime = 0;
}
-- 
2.19.1




[for-next][PATCH 26/30] tracing: Use dyn_event framework for synthetic events

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Use the dyn_event framework for synthetic events. This shows
synthetic events in the "tracing/dynamic_events" file, in addition
to the tracing/synthetic_events interface.

Users can also define new events via tracing/dynamic_events
with the "s:" prefix. The new syntax is:

  s:[synthetic/]EVENT_NAME TYPE ARG; [TYPE ARG;]...

To remove events via tracing/dynamic_events, you can use the
"-:" prefix, the same as for other events.

Link: 
http://lkml.kernel.org/r/154140861301.17322.15454611233735614508.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/Kconfig |   1 +
 kernel/trace/trace.c |   8 +
 kernel/trace/trace_events_hist.c | 265 +++
 3 files changed, 176 insertions(+), 98 deletions(-)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 2cab3c5dfe2c..fa8b1fe824f3 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -635,6 +635,7 @@ config HIST_TRIGGERS
depends on ARCH_HAVE_NMI_SAFE_CMPXCHG
select TRACING_MAP
select TRACING
+   select DYNAMIC_EVENTS
default n
help
  Hist triggers allow one or more arbitrary trace event fields
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7e0332f90ed4..911470ad9e94 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4620,6 +4620,9 @@ static const char readme_msg[] =
"\t  accepts: event-definitions (one definition per line)\n"
"\t   Format: p[:[/]]  []\n"
"\t   r[maxactive][:[/]]  []\n"
+#ifdef CONFIG_HIST_TRIGGERS
+   "\t   s:[synthetic/]  []\n"
+#endif
"\t   -:[/]\n"
 #ifdef CONFIG_KPROBE_EVENTS
"\tplace: [:][+]|\n"
@@ -4638,6 +4641,11 @@ static const char readme_msg[] =
"\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, string, symbol,\n"
"\t   b@/,\n"
"\t   \\[\\]\n"
+#ifdef CONFIG_HIST_TRIGGERS
+   "\tfield:  ;\n"
+   "\tstype: u8/u16/u32/u64, s8/s16/s32/s64, pid_t,\n"
+   "\t   [unsigned] char/int/long\n"
+#endif
 #endif
"  events/\t\t- Directory containing all trace event subsystems:\n"
"  enable\t\t- Write 0/1 to enable/disable tracing of all events\n"
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 0feb7f460123..414aabd67d1f 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -15,6 +15,7 @@
 
 #include "tracing_map.h"
 #include "trace.h"
+#include "trace_dynevent.h"
 
 #define SYNTH_SYSTEM   "synthetic"
 #define SYNTH_FIELDS_MAX   16
@@ -292,6 +293,21 @@ struct hist_trigger_data {
unsigned int    n_max_var_str;
 };
 
+static int synth_event_create(int argc, const char **argv);
+static int synth_event_show(struct seq_file *m, struct dyn_event *ev);
+static int synth_event_release(struct dyn_event *ev);
+static bool synth_event_is_busy(struct dyn_event *ev);
+static bool synth_event_match(const char *system, const char *event,
+ struct dyn_event *ev);
+
+static struct dyn_event_operations synth_event_ops = {
+   .create = synth_event_create,
+   .show = synth_event_show,
+   .is_busy = synth_event_is_busy,
+   .free = synth_event_release,
+   .match = synth_event_match,
+};
+
 struct synth_field {
char *type;
char *name;
@@ -301,7 +317,7 @@ struct synth_field {
 };
 
 struct synth_event {
-   struct list_head    list;
+   struct dyn_event    devent;
int ref;
char    *name;
struct synth_field  **fields;
@@ -312,6 +328,32 @@ struct synth_event {
struct tracepoint   *tp;
 };
 
+static bool is_synth_event(struct dyn_event *ev)
+{
+   return ev->ops == &synth_event_ops;
+}
+
+static struct synth_event *to_synth_event(struct dyn_event *ev)
+{
+   return container_of(ev, struct synth_event, devent);
+}
+
+static bool synth_event_is_busy(struct dyn_event *ev)
+{
+   struct synth_event *event = to_synth_event(ev);
+
+   return event->ref != 0;
+}
+
+static bool synth_event_match(const char *system, const char *event,
+ struct dyn_event *ev)
+{
+   struct synth_event *sev = to_synth_event(ev);
+
+   return strcmp(sev->name, event) == 0 &&
+   (!system || strcmp(system, SYNTH_SYSTEM) == 0);
+}
+
 struct action_data;
 
 typedef void (*action_fn_t) (struct hist_trigger_data *hist_data,
@@ -402,7 +444,6 @@ static bool have_hist_err(void)
return false;
 }
 
-static LIST_HEAD(synth_event_list);
 static DEFINE_MUTEX(synth_event_mutex);
 
 struct synth_trace_event {
@@ -738,14 +779,12 @@ static void free_synth_field(struct synth_field 

[for-next][PATCH 21/30] tracing: Simplify creation and deletion of synthetic events

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Since the event_mutex and synth_event_mutex ordering issue
is gone, we can skip the existing-event check when adding or
deleting events, and drop some redundant code in the error path.

This changes release_all_synth_events() to abort and return the
error code as soon as it hits any error; it succeeds only if it
encounters no error.
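
The resulting teardown loop stops at the first failure instead of
continuing; the pattern at a glance (names as in the patch, a sketch
rather than the full function):

/* Abort-on-first-error teardown used by release_all_synth_events(). */
list_for_each_entry_safe(event, e, &synth_event_list, list) {
	ret = unregister_synth_event(event);
	if (ret)
		break;			/* report the first error */
	list_del(&event->list);
	free_synth_event(event);
}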

Link: 
http://lkml.kernel.org/r/154140847194.17322.17960275728005067803.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_hist.c | 53 +++-
 1 file changed, 18 insertions(+), 35 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 1670c65389fe..0feb7f460123 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1008,18 +1008,6 @@ struct hist_var_data {
struct hist_trigger_data *hist_data;
 };
 
-static void add_or_delete_synth_event(struct synth_event *event, int delete)
-{
-   if (delete)
-   free_synth_event(event);
-   else {
-   if (!find_synth_event(event->name))
-   list_add(&event->list, &synth_event_list);
-   else
-   free_synth_event(event);
-   }
-}
-
 static int create_synth_event(int argc, char **argv)
 {
struct synth_field *field, *fields[SYNTH_FIELDS_MAX];
@@ -1052,15 +1040,16 @@ static int create_synth_event(int argc, char **argv)
if (event) {
if (delete_event) {
if (event->ref) {
-   event = NULL;
ret = -EBUSY;
goto out;
}
-   list_del(&event->list);
-   goto out;
-   }
-   event = NULL;
-   ret = -EEXIST;
+   ret = unregister_synth_event(event);
+   if (!ret) {
+   list_del(&event->list);
+   free_synth_event(event);
+   }
+   } else
+   ret = -EEXIST;
goto out;
} else if (delete_event) {
ret = -ENOENT;
@@ -1100,29 +1089,21 @@ static int create_synth_event(int argc, char **argv)
event = NULL;
goto err;
}
+   ret = register_synth_event(event);
+   if (!ret)
+   list_add(&event->list, &synth_event_list);
+   else
+   free_synth_event(event);
  out:
-   if (event) {
-   if (delete_event) {
-   ret = unregister_synth_event(event);
-   add_or_delete_synth_event(event, !ret);
-   } else {
-   ret = register_synth_event(event);
-   add_or_delete_synth_event(event, ret);
-   }
-   }
mutex_unlock(&synth_event_mutex);
mutex_unlock(&event_mutex);
 
return ret;
  err:
-   mutex_unlock(&synth_event_mutex);
-   mutex_unlock(&event_mutex);
-
for (i = 0; i < n_fields; i++)
free_synth_field(fields[i]);
-   free_synth_event(event);
 
-   return ret;
+   goto out;
 }
 
 static int release_all_synth_events(void)
@@ -1141,10 +1122,12 @@ static int release_all_synth_events(void)
}
 
list_for_each_entry_safe(event, e, &synth_event_list, list) {
-   list_del(&event->list);
-
ret = unregister_synth_event(event);
-   add_or_delete_synth_event(event, !ret);
+   if (!ret) {
+   list_del(&event->list);
+   free_synth_event(event);
+   } else
+   break;
}
mutex_unlock(&synth_event_mutex);
mutex_unlock(&event_mutex);
-- 
2.19.1




[for-next][PATCH 20/30] tracing: Lock event_mutex before synth_event_mutex

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

The synthetic event code uses synth_event_mutex to protect
synth_event_list, and the event_trigger_write() path acquires
locks in the following order:

event_trigger_write(event_mutex)
  ->trigger_process_regex(trigger_cmd_mutex)
    ->event_hist_trigger_func(synth_event_mutex)

On the other hand, the synthetic event creation and deletion paths
call trace_add_event_call() and trace_remove_event_call(),
which acquire event_mutex. In that case, if we keep
synth_event_mutex locked while registering/unregistering synthetic
events, the lock dependency is inverted.

To avoid this issue, the current synthetic event code uses a
two-phase process to create/delete events: it searches the existing
events under synth_event_mutex to check for event-name conflicts,
unlocks synth_event_mutex, then registers the new event with
event_mutex locked. Finally, it locks synth_event_mutex and tries to
add the new event to the list. But this introduces complexity and a
window for name conflicts.

To solve this more simply, introduce trace_add_event_call_nolock()
and trace_remove_event_call_nolock(), which do not acquire
event_mutex inside. The synthetic event code can then lock
event_mutex before synth_event_mutex, resolving the lock dependency
issue.
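
The callers then take the locks in one consistent order; a minimal
sketch using the names from the patch (error handling elided):

/* Lock order: event_mutex -> synth_event_mutex. The _nolock variant
 * is used underneath so event_mutex is taken exactly once, at the
 * outermost level.
 */
mutex_lock(&event_mutex);
mutex_lock(&synth_event_mutex);

ret = register_synth_event(event);	/* calls trace_add_event_call_nolock() */

mutex_unlock(&synth_event_mutex);
mutex_unlock(&event_mutex);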

Link: 
http://lkml.kernel.org/r/154140844377.17322.13781091165954002713.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/trace_events.h |  2 ++
 kernel/trace/trace_events.c  | 34 ++--
 kernel/trace/trace_events_hist.c | 24 ++
 3 files changed, 40 insertions(+), 20 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 4130a5497d40..3aa05593a53f 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -529,6 +529,8 @@ extern int trace_event_raw_init(struct trace_event_call 
*call);
 extern int trace_define_field(struct trace_event_call *call, const char *type,
  const char *name, int offset, int size,
  int is_signed, int filter_type);
+extern int trace_add_event_call_nolock(struct trace_event_call *call);
+extern int trace_remove_event_call_nolock(struct trace_event_call *call);
 extern int trace_add_event_call(struct trace_event_call *call);
 extern int trace_remove_event_call(struct trace_event_call *call);
 extern int trace_event_get_offsets(struct trace_event_call *call);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index f94be0c2827b..a3b157f689ee 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2305,11 +2305,11 @@ __trace_early_add_new_event(struct trace_event_call 
*call,
 struct ftrace_module_file_ops;
 static void __add_event_to_tracers(struct trace_event_call *call);
 
-/* Add an additional event_call dynamically */
-int trace_add_event_call(struct trace_event_call *call)
+int trace_add_event_call_nolock(struct trace_event_call *call)
 {
int ret;
-   mutex_lock(&event_mutex);
+   lockdep_assert_held(&event_mutex);
+
mutex_lock(&trace_types_lock);
 
ret = __register_event(call, NULL);
@@ -2317,6 +2317,16 @@ int trace_add_event_call(struct trace_event_call *call)
__add_event_to_tracers(call);
 
mutex_unlock(&trace_types_lock);
+   return ret;
+}
+
+/* Add an additional event_call dynamically */
+int trace_add_event_call(struct trace_event_call *call)
+{
+   int ret;
+
+   mutex_lock(&event_mutex);
+   ret = trace_add_event_call_nolock(call);
mutex_unlock(&event_mutex);
return ret;
 }
@@ -2366,17 +2376,29 @@ static int probe_remove_event_call(struct 
trace_event_call *call)
return 0;
 }
 
-/* Remove an event_call */
-int trace_remove_event_call(struct trace_event_call *call)
+/* no event_mutex version */
+int trace_remove_event_call_nolock(struct trace_event_call *call)
 {
int ret;
 
-   mutex_lock(&event_mutex);
+   lockdep_assert_held(&event_mutex);
+
mutex_lock(&trace_types_lock);
down_write(&trace_event_sem);
ret = probe_remove_event_call(call);
up_write(&trace_event_sem);
mutex_unlock(&trace_types_lock);
+
+   return ret;
+}
+
+/* Remove an event_call */
+int trace_remove_event_call(struct trace_event_call *call)
+{
+   int ret;
+
+   mutex_lock(&event_mutex);
+   ret = trace_remove_event_call_nolock(call);
mutex_unlock(&event_mutex);
 
return ret;
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index eb908ef2ecec..1670c65389fe 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -912,7 +912,7 @@ static int register_synth_event(struct synth_event *event)
call->data = event;
call->tp = event->tp;
 
-   ret = trace_add_event_call(call);
+   ret = trace_add_event_call_nolock(call);
if (ret) {
pr_warn("Failed to register 

[for-next][PATCH 30/30] selftests/ftrace: Add testcases for dynamic event

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Add common testcases for the dynamic_events interface:
 - Add/remove kprobe events via dynamic_events
 - Add/remove synthetic events via dynamic_events
 - Selective clearing of events (clearing events via other interfaces)
 - Generic clearing of events ("!LINE" syntax)

Link: 
http://lkml.kernel.org/r/154140872590.17322.10394440849261743052.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 .../test.d/dynevent/add_remove_kprobe.tc  | 30 +++
 .../test.d/dynevent/add_remove_synth.tc   | 27 ++
 .../test.d/dynevent/clear_select_events.tc| 50 +++
 .../test.d/dynevent/generic_clear_event.tc| 49 ++
 4 files changed, 156 insertions(+)
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/add_remove_kprobe.tc
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/add_remove_synth.tc
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/clear_select_events.tc
 create mode 100644 
tools/testing/selftests/ftrace/test.d/dynevent/generic_clear_event.tc

diff --git 
a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_kprobe.tc 
b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_kprobe.tc
new file mode 100644
index ..c6d8387dbbb8
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_kprobe.tc
@@ -0,0 +1,30 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Generic dynamic event - add/remove kprobe events
+
+[ -f dynamic_events ] || exit_unsupported
+
+grep -q "place: \[:\]" README || exit_unsupported
+grep -q "place (kretprobe): \[:\]" README || exit_unsupported
+
+echo 0 > events/enable
+echo > dynamic_events
+
+PLACE=_do_fork
+
+echo "p:myevent1 $PLACE" >> dynamic_events
+echo "r:myevent2 $PLACE" >> dynamic_events
+
+grep -q myevent1 dynamic_events
+grep -q myevent2 dynamic_events
+test -d events/kprobes/myevent1
+test -d events/kprobes/myevent2
+
+echo "-:myevent2" >> dynamic_events
+
+grep -q myevent1 dynamic_events
+! grep -q myevent2 dynamic_events
+
+echo > dynamic_events
+
+clear_trace
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_synth.tc 
b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_synth.tc
new file mode 100644
index ..62b77b5941d0
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_synth.tc
@@ -0,0 +1,27 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Generic dynamic event - add/remove synthetic events
+
+[ -f dynamic_events ] || exit_unsupported
+
+grep -q "s:\[synthetic/\]" README || exit_unsupported
+
+echo 0 > events/enable
+echo > dynamic_events
+
+echo "s:latency1 u64 lat; pid_t pid;" >> dynamic_events
+echo "s:latency2 u64 lat; pid_t pid;" >> dynamic_events
+
+grep -q latency1 dynamic_events
+grep -q latency2 dynamic_events
+test -d events/synthetic/latency1
+test -d events/synthetic/latency2
+
+echo "-:synthetic/latency2" >> dynamic_events
+
+grep -q latency1 dynamic_events
+! grep -q latency2 dynamic_events
+
+echo > dynamic_events
+
+clear_trace
diff --git 
a/tools/testing/selftests/ftrace/test.d/dynevent/clear_select_events.tc 
b/tools/testing/selftests/ftrace/test.d/dynevent/clear_select_events.tc
new file mode 100644
index ..e0842109cb57
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/clear_select_events.tc
@@ -0,0 +1,50 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Generic dynamic event - selective clear (compatibility)
+
+[ -f dynamic_events ] || exit_unsupported
+
+grep -q "place: \[:\]" README || exit_unsupported
+grep -q "place (kretprobe): \[:\]" README || exit_unsupported
+
+grep -q "s:\[synthetic/\]" README || exit_unsupported
+
+[ -f synthetic_events ] || exit_unsupported
+[ -f kprobe_events ] || exit_unsupported
+
+echo 0 > events/enable
+echo > dynamic_events
+
+PLACE=_do_fork
+
+setup_events() {
+echo "p:myevent1 $PLACE" >> dynamic_events
+echo "s:latency1 u64 lat; pid_t pid;" >> dynamic_events
+echo "r:myevent2 $PLACE" >> dynamic_events
+echo "s:latency2 u64 lat; pid_t pid;" >> dynamic_events
+
+grep -q myevent1 dynamic_events
+grep -q myevent2 dynamic_events
+grep -q latency1 dynamic_events
+grep -q latency2 dynamic_events
+}
+
+setup_events
+echo > synthetic_events
+
+grep -q myevent1 dynamic_events
+grep -q myevent2 dynamic_events
+! grep -q latency1 dynamic_events
+! grep -q latency2 dynamic_events
+
+echo > dynamic_events
+
+setup_events
+echo > kprobe_events
+
+! grep -q myevent1 dynamic_events
+! grep -q myevent2 dynamic_events
+grep -q latency1 dynamic_events
+grep -q latency2 dynamic_events
+
+echo > dynamic_events
diff --git 
a/tools/testing/selftests/ftrace/test.d/dynevent/generic_clear_event.tc 
b/tools/testing/selftests/ftrace/test.d/dynevent/generic_clear_event.tc
new file mode 100644
index ..901922e97878
--- /dev/null
+++ 

[for-next][PATCH 27/30] tracing: Remove unneeded synth_event_mutex

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Remove the unneeded synth_event_mutex. This mutex protects the
reference count in synth_event; however, those operation points are
already protected by event_mutex.

1. In __create_synth_event() and create_or_delete_synth_event(),
 synth_event_mutex is clearly obtained right after event_mutex.

2. event_hist_trigger_func() is trigger_hist_cmd.func(), which is
 called by trigger_process_regex(), which is part of
 event_trigger_regex_write(), and that function takes event_mutex.

3. hist_unreg_all() is trigger_hist_cmd.unreg_all(), which is called
 by event_trigger_regex_open(), and it takes event_mutex.

4. onmatch_destroy() and onmatch_create() have long call trees,
 but both are finally invoked from event_trigger_regex_write()
 and event_trace_del_tracer(); the former takes event_mutex, and the
 latter is ensured to be called with event_mutex locked.

Finally, I ensured there is no resource conflict. For safety,
I added lockdep_assert_held(&event_mutex) to each function.
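
The assertion documents and enforces the new locking contract; a
minimal sketch of the pattern (function body elided):

static void onmatch_destroy(struct action_data *data)
{
	/* No mutex_lock(&synth_event_mutex) here anymore: lockdep
	 * warns if the caller does not already hold event_mutex.
	 */
	lockdep_assert_held(&event_mutex);

	/* ... free the onmatch data; the synth_event ref count is
	 * safe under event_mutex ... */
}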

Link: 
http://lkml.kernel.org/r/154140864134.17322.4796059721306031894.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_events_hist.c | 30 +++---
 1 file changed, 7 insertions(+), 23 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 414aabd67d1f..21e4954375a1 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -444,8 +444,6 @@ static bool have_hist_err(void)
return false;
 }
 
-static DEFINE_MUTEX(synth_event_mutex);
-
 struct synth_trace_event {
struct trace_entry  ent;
u64 fields[];
@@ -1077,7 +1075,6 @@ static int __create_synth_event(int argc, const char 
*name, const char **argv)
return -EINVAL;
 
mutex_lock(&event_mutex);
-   mutex_lock(&synth_event_mutex);
 
event = find_synth_event(name);
if (event) {
@@ -1119,7 +1116,6 @@ static int __create_synth_event(int argc, const char 
*name, const char **argv)
else
free_synth_event(event);
  out:
-   mutex_unlock(&synth_event_mutex);
mutex_unlock(&event_mutex);
 
return ret;
@@ -1139,7 +1135,6 @@ static int create_or_delete_synth_event(int argc, char 
**argv)
/* trace_run_command() ensures argc != 0 */
if (name[0] == '!') {
mutex_lock(&event_mutex);
-   mutex_lock(&synth_event_mutex);
event = find_synth_event(name + 1);
if (event) {
if (event->ref)
@@ -1153,7 +1148,6 @@ static int create_or_delete_synth_event(int argc, char 
**argv)
}
} else
ret = -ENOENT;
-   mutex_unlock(&synth_event_mutex);
mutex_unlock(&event_mutex);
return ret;
}
@@ -3535,7 +3529,7 @@ static void onmatch_destroy(struct action_data *data)
 {
unsigned int i;
 
-   mutex_lock(&synth_event_mutex);
+   lockdep_assert_held(&event_mutex);
 
kfree(data->onmatch.match_event);
kfree(data->onmatch.match_event_system);
@@ -3548,8 +3542,6 @@ static void onmatch_destroy(struct action_data *data)
data->onmatch.synth_event->ref--;
 
kfree(data);
-
-   mutex_unlock(&synth_event_mutex);
 }
 
 static void destroy_field_var(struct field_var *field_var)
@@ -3700,15 +3692,14 @@ static int onmatch_create(struct hist_trigger_data 
*hist_data,
struct synth_event *event;
int ret = 0;
 
-   mutex_lock(&synth_event_mutex);
+   lockdep_assert_held(&event_mutex);
+
event = find_synth_event(data->onmatch.synth_event_name);
if (!event) {
hist_err("onmatch: Couldn't find synthetic event: ", 
data->onmatch.synth_event_name);
-   mutex_unlock(&synth_event_mutex);
return -EINVAL;
}
event->ref++;
-   mutex_unlock(&synth_event_mutex);
 
var_ref_idx = hist_data->n_var_refs;
 
@@ -3782,9 +3773,7 @@ static int onmatch_create(struct hist_trigger_data 
*hist_data,
  out:
return ret;
  err:
-   mutex_lock(&synth_event_mutex);
event->ref--;
-   mutex_unlock(&synth_event_mutex);
 
goto out;
 }
@@ -5492,6 +5481,8 @@ static void hist_unreg_all(struct trace_event_file *file)
struct synth_event *se;
const char *se_name;
 
+   lockdep_assert_held(&event_mutex);
+
if (hist_file_check_refs(file))
return;
 
@@ -5501,12 +5492,10 @@ static void hist_unreg_all(struct trace_event_file 
*file)
list_del_rcu(&test->list);
trace_event_trigger_enable_disable(file, 0);
 
-   mutex_lock(&synth_event_mutex);
se_name = trace_event_name(file->event_call);
se = find_synth_event(se_name);
if (se)
se->ref--;
-  

[for-next][PATCH 13/30] function_graph: Have profiler use new helper ftrace_graph_get_ret_stack()

2018-12-05 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

The ret_stack processing is going to change, and that will break
anything that accesses the ret_stack directly. One user is the
function graph profiler. By using the ftrace_graph_get_ret_stack() helper
function, the profiler can access the ret_stack entry without relying on the
implementation details of the stack itself.
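
As a usage illustration, a caller can now walk a task's return stack
through the helper instead of indexing tsk->ret_stack itself; a
minimal sketch, not part of the patch (index 0 is the most recent
entry, and the helper returns NULL past the oldest one):

struct ftrace_ret_stack *frame;
int i;

for (i = 0; (frame = ftrace_graph_get_ret_stack(current, i)); i++)
	pr_info("depth %d: %ps called from %pS\n",
		i, (void *)frame->func, (void *)frame->ret);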

Reviewed-by: Joel Fernandes (Google) 
Signed-off-by: Steven Rostedt (VMware) 
---
 include/linux/ftrace.h |  3 +++
 kernel/trace/fgraph.c  | 11 +++
 kernel/trace/ftrace.c  | 21 +++--
 3 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 21c80491ccde..98e141c71ad0 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -785,6 +785,9 @@ extern int
 function_graph_enter(unsigned long ret, unsigned long func,
 unsigned long frame_pointer, unsigned long *retp);
 
+struct ftrace_ret_stack *
+ftrace_graph_get_ret_stack(struct task_struct *task, int idx);
+
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
unsigned long ret, unsigned long *retp);
 
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 90fcefcaff2a..a3704ec8b599 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -232,6 +232,17 @@ unsigned long ftrace_return_to_handler(unsigned long 
frame_pointer)
return ret;
 }
 
+struct ftrace_ret_stack *
+ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
+{
+   idx = current->curr_ret_stack - idx;
+
+   if (idx >= 0 && idx <= task->curr_ret_stack)
+   return &task->ret_stack[idx];
+
+   return NULL;
+}
+
 /**
  * ftrace_graph_ret_addr - convert a potentially modified stack return address
  *to its original value
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index d06fe588e650..8ef9fc226037 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -792,7 +792,7 @@ void ftrace_graph_graph_time_control(bool enable)
 
 static int profile_graph_entry(struct ftrace_graph_ent *trace)
 {
-   int index = current->curr_ret_stack;
+   struct ftrace_ret_stack *ret_stack;
 
function_profile_call(trace->func, 0, NULL, NULL);
 
@@ -800,14 +800,16 @@ static int profile_graph_entry(struct ftrace_graph_ent 
*trace)
if (!current->ret_stack)
return 0;
 
-   if (index >= 0 && index < FTRACE_RETFUNC_DEPTH)
-   current->ret_stack[index].subtime = 0;
+   ret_stack = ftrace_graph_get_ret_stack(current, 0);
+   if (ret_stack)
+   ret_stack->subtime = 0;
 
return 1;
 }
 
 static void profile_graph_return(struct ftrace_graph_ret *trace)
 {
+   struct ftrace_ret_stack *ret_stack;
struct ftrace_profile_stat *stat;
unsigned long long calltime;
struct ftrace_profile *rec;
@@ -825,16 +827,15 @@ static void profile_graph_return(struct ftrace_graph_ret 
*trace)
calltime = trace->rettime - trace->calltime;
 
if (!fgraph_graph_time) {
-   int index;
-
-   index = current->curr_ret_stack;
 
/* Append this call time to the parent time to subtract */
-   if (index)
-   current->ret_stack[index - 1].subtime += calltime;
+   ret_stack = ftrace_graph_get_ret_stack(current, 1);
+   if (ret_stack)
+   ret_stack->subtime += calltime;
 
-   if (current->ret_stack[index].subtime < calltime)
-   calltime -= current->ret_stack[index].subtime;
+   ret_stack = ftrace_graph_get_ret_stack(current, 0);
+   if (ret_stack && ret_stack->subtime < calltime)
+   calltime -= ret_stack->subtime;
else
calltime = 0;
}
-- 
2.19.1




[for-next][PATCH 29/30] tracing: Add generic event-name based remove event method

2018-12-05 Thread Steven Rostedt
From: Masami Hiramatsu 

Add a generic method to remove an event from the dynamic event
list. This is the same as for other systems under ftrace: you
just need to pass the event name prefixed with '!', e.g.

  # echo p:new_grp/new_event _do_fork > dynamic_events

This creates an event, and

  # echo '!p:new_grp/new_event _do_fork' > dynamic_events

Or,

  # echo '!p:new_grp/new_event' > dynamic_events

will remove new_grp/new_event event.

Note that this doesn't check the event prefix (e.g. "p:")
strictly, because the "group/event" name must be unique.
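
The name-based lookup accepts either prefix form; a standalone C
sketch of the parsing step, mirroring dyn_event_release() (the helper
name is invented for this example):

#include <stdio.h>
#include <string.h>

/* Resolve the "group/event" name from "-:grp/ev" or "p:grp/ev". */
static const char *event_name(const char *arg)
{
	const char *event;

	if (arg[0] == '-') {		/* classic "-:" form */
		if (arg[1] != ':')
			return NULL;
		event = arg + 2;
	} else {			/* generic form: skip to ':' */
		event = strchr(arg, ':');
		if (!event)
			return NULL;
		event++;
	}
	return event;
}

int main(void)
{
	printf("%s\n", event_name("-:new_grp/new_event"));
	printf("%s\n", event_name("p:new_grp/new_event"));
	return 0;
}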

Link: 
http://lkml.kernel.org/r/154140869774.17322.8887303560398645347.stgit@devbox

Reviewed-by: Tom Zanussi 
Tested-by: Tom Zanussi 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
---
 kernel/trace/trace_dynevent.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_dynevent.c b/kernel/trace/trace_dynevent.c
index f17a887abb66..dd1f43588d70 100644
--- a/kernel/trace/trace_dynevent.c
+++ b/kernel/trace/trace_dynevent.c
@@ -37,10 +37,17 @@ int dyn_event_release(int argc, char **argv, struct 
dyn_event_operations *type)
char *system = NULL, *event, *p;
int ret = -ENOENT;
 
-   if (argv[0][1] != ':')
-   return -EINVAL;
+   if (argv[0][0] == '-') {
+   if (argv[0][1] != ':')
+   return -EINVAL;
+   event = &argv[0][2];
+   } else {
+   event = strchr(argv[0], ':');
+   if (!event)
+   return -EINVAL;
+   event++;
+   }
 
-   event = &argv[0][2];
p = strchr(event, '/');
if (p) {
system = event;
@@ -69,7 +76,7 @@ static int create_dyn_event(int argc, char **argv)
struct dyn_event_operations *ops;
int ret;
 
-   if (argv[0][0] == '-')
+   if (argv[0][0] == '-' || argv[0][0] == '!')
return dyn_event_release(argc, argv, NULL);
 
mutex_lock(&dyn_event_ops_mutex);
-- 
2.19.1



