[PATCH 05/10] m32r: Add missing RCU idle APIs on idle loop

2012-08-22 Thread Frederic Weisbecker
In the old days, the whole idle task was considered
an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read-side critical
sections have been added even to the idle task code of
some architectures, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
of the idle loop that doesn't use RCU read-side critical
sections, typically the part that puts the CPU in
low-power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls to the m32r idle loop.

Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: Hirokazu Takata 
Cc: 3.2.x.. 
Cc: Paul E. McKenney 
---
 arch/m32r/kernel/process.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/m32r/kernel/process.c b/arch/m32r/kernel/process.c
index 3a4a32b2..384e63f 100644
--- a/arch/m32r/kernel/process.c
+++ b/arch/m32r/kernel/process.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -82,6 +83,7 @@ void cpu_idle (void)
 {
/* endless idle loop with no priority at all */
while (1) {
+   rcu_idle_enter();
while (!need_resched()) {
void (*idle)(void) = pm_idle;
 
@@ -90,6 +92,7 @@ void cpu_idle (void)
 
idle();
}
+   rcu_idle_exit();
schedule_preempt_disabled();
}
 }
-- 
1.7.5.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 03/10] frv: Add missing RCU idle APIs on idle loop

2012-08-22 Thread Frederic Weisbecker
In the old days, the whole idle task was considered
an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read-side critical
sections have been added even to the idle task code of
some architectures, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
of the idle loop that doesn't use RCU read-side critical
sections, typically the part that puts the CPU in
low-power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls to the FR-V idle loop.

Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: David Howells 
Cc: 3.2.x.. 
Cc: Paul E. McKenney 
---
 arch/frv/kernel/process.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/frv/kernel/process.c b/arch/frv/kernel/process.c
index ff95f50..2eb7fa5 100644
--- a/arch/frv/kernel/process.c
+++ b/arch/frv/kernel/process.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -69,12 +70,14 @@ void cpu_idle(void)
 {
/* endless idle loop with no priority at all */
while (1) {
+   rcu_idle_enter();
while (!need_resched()) {
check_pgt_cache();
 
if (!frv_dma_inprogress && idle)
idle();
}
+   rcu_idle_exit();
 
schedule_preempt_disabled();
}
-- 
1.7.5.4



[PATCH 02/10] cris: Add missing RCU idle APIs on idle loop

2012-08-22 Thread Frederic Weisbecker
In the old days, the whole idle task was considered
an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read-side critical
sections have been added even to the idle task code of
some architectures, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
of the idle loop that doesn't use RCU read-side critical
sections, typically the part that puts the CPU in
low-power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls to the CRIS idle loop.

Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: Mikael Starvik 
Cc: Jesper Nilsson 
Cc: Cris 
Cc: 3.2.x.. 
Cc: Paul E. McKenney 
---
 arch/cris/kernel/process.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/cris/kernel/process.c b/arch/cris/kernel/process.c
index 66fd017..7f65be6 100644
--- a/arch/cris/kernel/process.c
+++ b/arch/cris/kernel/process.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 //#define DEBUG
 
@@ -74,6 +75,7 @@ void cpu_idle (void)
 {
/* endless idle loop with no priority at all */
while (1) {
+   rcu_idle_enter();
while (!need_resched()) {
void (*idle)(void);
/*
@@ -86,6 +88,7 @@ void cpu_idle (void)
idle = default_idle;
idle();
}
+   rcu_idle_exit();
schedule_preempt_disabled();
}
 }
-- 
1.7.5.4



Re: [PATCH 01/10] alpha: Add missing RCU idle APIs on idle loop

2012-08-22 Thread Frederic Weisbecker
On Wed, Aug 22, 2012 at 10:19:30AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 22, 2012 at 06:23:39PM +0200, Frederic Weisbecker wrote:
> > In the old days, the whole idle task was considered
> > an RCU quiescent state. But as RCU became more and
> > more successful over time, some RCU read-side critical
> > sections have been added even to the idle task code of
> > some architectures, for tracing for example.
> > 
> > So nowadays, rcu_idle_enter() and rcu_idle_exit() must
> > be called by the architecture to tell RCU about the part
> > of the idle loop that doesn't use RCU read-side critical
> > sections, typically the part that puts the CPU in
> > low-power mode.
> > 
> > This is necessary for RCU to find the quiescent states in
> > idle in order to complete grace periods.
> > 
> > Add this missing pair of calls to the Alpha idle loop.
> > 
> > Reported-by: Paul E. McKenney 
> > Signed-off-by: Frederic Weisbecker 
> > Cc: Richard Henderson 
> > Cc: Ivan Kokshaysky 
> > Cc: Matt Turner 
> > Cc: alpha 
> > Cc: Paul E. McKenney 
> > Cc: 3.2.x.. 
> > ---
> >  arch/alpha/kernel/process.c |6 +-
> >  1 files changed, 5 insertions(+), 1 deletions(-)
> > 
> > diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
> > index 153d3fc..2ebf7b5 100644
> > --- a/arch/alpha/kernel/process.c
> > +++ b/arch/alpha/kernel/process.c
> > @@ -28,6 +28,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > 
> >  #include 
> >  #include 
> > @@ -50,13 +51,16 @@ cpu_idle(void)
> >  {
> > set_thread_flag(TIF_POLLING_NRFLAG);
> > 
> > +   preempt_disable();
> 
> I don't understand the above preempt_disable() not having a matching
> preempt_enable() at exit, but the rest of the patches in this series
> look good to me.

The current code is preemptible, or at least it appears so because it calls
schedule() directly. And if I call rcu_idle_enter() in a preemptable section,
I'm in trouble because I'll schedule while in extended QS.

Thus I need to disable preemption here at least until I call rcu_idle_exit().

Now, this is an endless loop, so there is no need to re-enable
preemption after the loop. And schedule_preempt_disabled()
takes care of enabling preemption before schedule() and re-disabling
it afterward.


> 
>   Thanx, Paul
> 
> > while (1) {
> > /* FIXME -- EV6 and LCA45 know how to power down
> >the CPU.  */
> > 
> > +   rcu_idle_enter();
> > while (!need_resched())
> > cpu_relax();
> > -   schedule();
> > +   rcu_idle_exit();
> > +   schedule_preempt_disabled();
> > }
> >  }
> > 
> > -- 
> > 1.7.5.4
> > 
> 


Re: [PATCH 01/10] alpha: Add missing RCU idle APIs on idle loop

2012-08-23 Thread Frederic Weisbecker
On Wed, Aug 22, 2012 at 12:01:09PM -0700, Paul E. McKenney wrote:
> > The current code is preemptible, or at least it appears so because it calls
> > schedule() directly. And if I call rcu_idle_enter() in a preemptible section,
> > I'm in trouble because I'll schedule while in extended QS.
> > 
> > Thus I need to disable preemption here at least until I call rcu_idle_exit().
> > 
> > Now, this is an endless loop, so there is no need to re-enable
> > preemption after the loop. And schedule_preempt_disabled()
> > takes care of enabling preemption before schedule() and re-disabling
> > it afterward.
> > 
> > 
> > > 
> > >   Thanx, Paul
> > > 
> > > > while (1) {
> > > > /* FIXME -- EV6 and LCA45 know how to power down
> > > >the CPU.  */
> > > > 
> > > > +   rcu_idle_enter();
> > > > while (!need_resched())
> > > > cpu_relax();
> > > > -   schedule();
> > > > +   rcu_idle_exit();
> > > > +   schedule_preempt_disabled();
> > > > }
> 
> Understood, but what I don't understand is why you don't need a
> preempt_enable() right here.

Look, if we inline the contents of schedule_preempt_disabled(), the code
then looks like this:

void cpu_idle(void)
{
	set_thread_flag(TIF_POLLING_NRFLAG);

	preempt_disable();
	while (1) {
		/* FIXME -- EV6 and LCA45 know how to power down
		   the CPU.  */

		rcu_idle_enter();
		while (!need_resched())
			cpu_relax();
		rcu_idle_exit();

		sched_preempt_enable_no_resched();
		schedule();
		preempt_disable();
	}
}

So there is a preempt_enable() before we schedule, then we re-disable
preemption after schedule.

Now I realize cpu_idle() is supposed to be called with preemption already
disabled, so I shouldn't add an explicit preempt_disable(): that would make
things worse, since the preempt count would still be one when schedule() runs.
But that means there is an existing bug here in alpha: it should call
schedule_preempt_disabled() instead of schedule(). cpu_idle() is called with
preemption disabled on the boot CPU, and it should be called that way from the
secondary CPU entry path as well, but alpha doesn't seem to do that.

So I need to fix that first. I'll respin.



Re: linux-next: manual merge of the tip tree with the rr tree

2012-08-23 Thread Frederic Weisbecker
On Thu, Aug 23, 2012 at 12:43:48PM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the tip tree got a conflict in arch/Kconfig
> between commit bd029f48459a ("Make most arch asm/module.h files use
> asm-generic/module.h") from the rr tree and commit b952741c8079
> ("cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING") from the tip tree.
> 
> Just context changes.  I fixed it up (see below) and can carry the fix as
> necessary.
> -- 
> Cheers,
> Stephen Rothwell    s...@canb.auug.org.au

Looks good, thanks!

> 
> diff --cc arch/Kconfig
> index 3450115,ea5feb6..000
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@@ -281,23 -294,7 +294,26 @@@ config SECCOMP_FILTE
>   
> See Documentation/prctl/seccomp_filter.txt for details.
>   
>  +config HAVE_MOD_ARCH_SPECIFIC
>  +bool
>  +help
>  +  The arch uses struct mod_arch_specific to store data.  Many arches
>  +  just need a simple module loader without arch specific data - those
>  +  should not enable this.
>  +
>  +config MODULES_USE_ELF_RELA
>  +bool
>  +help
>  +  Modules only use ELF RELA relocations.  Modules with ELF REL
>  +  relocations will give an error.
>  +
>  +config MODULES_USE_ELF_REL
>  +bool
>  +help
>  +  Modules only use ELF REL relocations.  Modules with ELF RELA
>  +  relocations will give an error.
>  +
> + config HAVE_VIRT_CPU_ACCOUNTING
> + bool
> + 
>   source "kernel/gcov/Kconfig"




Re: [PATCH 01/10] alpha: Add missing RCU idle APIs on idle loop

2012-08-23 Thread Frederic Weisbecker
On Thu, Aug 23, 2012 at 09:32:18PM +1200, Michael Cree wrote:
> On 23/08/12 04:23, Frederic Weisbecker wrote:
> > In the old days, the whole idle task was considered
> > an RCU quiescent state. But as RCU became more and
> > more successful over time, some RCU read-side critical
> > sections have been added even to the idle task code of
> > some architectures, for tracing for example.
> 
> Fantastic!  It fixes RCU CPU stalls that we were seeing on the SMP
> kernel when built for generic Alpha.
> 
> A build of glibc and running its test suite reliably triggers RCU CPU
> stalls when running a kernel built for generic Alpha.  I have just built
> glibc and run its test suite twice with no RCU CPU stalls with this
> patch against a 3.5.2 kernel!  Nice.  Very nice.
> 
> I see the stable queue is CCed but I note the patch does not apply
> cleanly to the 3.2.y kernel.  It would be nice to have a backport of the
> patches for the 3.2 stable kernel.

Sure.

> 
> So feel free to add:
> 
> Tested-by:  Michael Cree 

Thanks, but I need to refactor the patch; I suspect a problem with
CONFIG_PREEMPT.


Re: [PATCH 00/10] rcu: Add missing RCU idle APIs on idle loop

2012-08-23 Thread Frederic Weisbecker
On Wed, Aug 22, 2012 at 07:18:04PM +0200, Geert Uytterhoeven wrote:
> On Wed, Aug 22, 2012 at 6:23 PM, Frederic Weisbecker  wrote:
> > So this fixes some potential RCU stalls in a bunch of architectures.
> > When rcu_idle_enter()/rcu_idle_exit() became a requirement, we forgot
> > to handle the architectures that don't support CONFIG_NO_HZ.
> >
> > I guess the set should be dispatched into arch maintainer trees.
> 
> I can take the m68k version, but are you sure you want it this way?
> Each of them must be in mainline before they can enter stable.

Yeah, I was thinking the right route is for these patches to be
carried by the arch maintainers, who then push them to Linus, and
from there they go to stable.

Is that ok for you?

Otherwise I can carry the patches myself, in a tree of my own, in
Paul's, or in mmotm, as long as I have your ack.

Thanks.

> 
> > I'm sorry I haven't build-tested everywhere. But the changes are
> > small and need to be at least boot-tested anyway.
> 
> Builds and boots fine on m68k under ARAnyM.
> Acked-by: Geert Uytterhoeven  (for m68k)
> 


[PATCH 00/11] rcu: Add missing RCU idle APIs on idle loop v2

2012-08-23 Thread Frederic Weisbecker
Hi,

Changes since v1:

- Fixed preempt handling in alpha idle loop
- added ack from Geert
- fixed stable email address, sorry :-/

This time I build-tested everywhere except h8300 (internal compiler error)
and mn10300, parisc and score (cross compilers not available in
ftp://ftp.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.6.3/)

For testing, you can pull from:

git://github.com/fweisbec/linux-dynticks.git
rcu/idle-fix-v2 

Thanks.

Frederic Weisbecker (11):
  alpha: Fix preemption handling in idle loop
  alpha: Add missing RCU idle APIs on idle loop
  cris: Add missing RCU idle APIs on idle loop
  frv: Add missing RCU idle APIs on idle loop
  h8300: Add missing RCU idle APIs on idle loop
  m32r: Add missing RCU idle APIs on idle loop
  m68k: Add missing RCU idle APIs on idle loop
  mn10300: Add missing RCU idle APIs on idle loop
  parisc: Add missing RCU idle APIs on idle loop
  score: Add missing RCU idle APIs on idle loop
  xtensa: Add missing RCU idle APIs on idle loop

 arch/alpha/kernel/process.c   |6 +-
 arch/alpha/kernel/smp.c   |1 +
 arch/cris/kernel/process.c|3 +++
 arch/frv/kernel/process.c |3 +++
 arch/h8300/kernel/process.c   |3 +++
 arch/m32r/kernel/process.c|3 +++
 arch/m68k/kernel/process.c|3 +++
 arch/mn10300/kernel/process.c |3 +++
 arch/parisc/kernel/process.c  |3 +++
 arch/score/kernel/process.c   |4 +++-
 arch/xtensa/kernel/process.c  |3 +++
 11 files changed, 33 insertions(+), 2 deletions(-)

-- 
1.7.5.4



[PATCH 03/11] cris: Add missing RCU idle APIs on idle loop

2012-08-23 Thread Frederic Weisbecker
In the old days, the whole idle task was considered
an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read-side critical
sections have been added even to the idle task code of
some architectures, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
of the idle loop that doesn't use RCU read-side critical
sections, typically the part that puts the CPU in
low-power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls to the CRIS idle loop.

Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: Mikael Starvik 
Cc: Jesper Nilsson 
Cc: Cris 
Cc: 3.2.x.. 
Cc: Paul E. McKenney 
---
 arch/cris/kernel/process.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/cris/kernel/process.c b/arch/cris/kernel/process.c
index 66fd017..7f65be6 100644
--- a/arch/cris/kernel/process.c
+++ b/arch/cris/kernel/process.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 //#define DEBUG
 
@@ -74,6 +75,7 @@ void cpu_idle (void)
 {
/* endless idle loop with no priority at all */
while (1) {
+   rcu_idle_enter();
while (!need_resched()) {
void (*idle)(void);
/*
@@ -86,6 +88,7 @@ void cpu_idle (void)
idle = default_idle;
idle();
}
+   rcu_idle_exit();
schedule_preempt_disabled();
}
 }
-- 
1.7.5.4



[PATCH 02/11] alpha: Add missing RCU idle APIs on idle loop

2012-08-23 Thread Frederic Weisbecker
In the old days, the whole idle task was considered
an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read-side critical
sections have been added even to the idle task code of
some architectures, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
of the idle loop that doesn't use RCU read-side critical
sections, typically the part that puts the CPU in
low-power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls to the Alpha idle loop.

Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: alpha 
Cc: Paul E. McKenney 
Cc: Michael Cree 
Cc: 3.2.x.. 
---
 arch/alpha/kernel/process.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index eac5e01..eb9558c 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -54,9 +55,11 @@ cpu_idle(void)
/* FIXME -- EV6 and LCA45 know how to power down
   the CPU.  */
 
+   rcu_idle_enter();
while (!need_resched())
cpu_relax();
 
+   rcu_idle_exit();
schedule_preempt_disabled();
}
 }
-- 
1.7.5.4



[PATCH 06/11] m32r: Add missing RCU idle APIs on idle loop

2012-08-23 Thread Frederic Weisbecker
In the old days, the whole idle task was considered
an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read-side critical
sections have been added even to the idle task code of
some architectures, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
of the idle loop that doesn't use RCU read-side critical
sections, typically the part that puts the CPU in
low-power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls to the m32r idle loop.

Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: Hirokazu Takata 
Cc: 3.2.x.. 
Cc: Paul E. McKenney 
---
 arch/m32r/kernel/process.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/m32r/kernel/process.c b/arch/m32r/kernel/process.c
index 3a4a32b2..384e63f 100644
--- a/arch/m32r/kernel/process.c
+++ b/arch/m32r/kernel/process.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -82,6 +83,7 @@ void cpu_idle (void)
 {
/* endless idle loop with no priority at all */
while (1) {
+   rcu_idle_enter();
while (!need_resched()) {
void (*idle)(void) = pm_idle;
 
@@ -90,6 +92,7 @@ void cpu_idle (void)
 
idle();
}
+   rcu_idle_exit();
schedule_preempt_disabled();
}
 }
-- 
1.7.5.4



[PATCH 08/11] mn10300: Add missing RCU idle APIs on idle loop

2012-08-23 Thread Frederic Weisbecker
In the old days, the whole idle task was considered
an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read-side critical
sections have been added even to the idle task code of
some architectures, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
of the idle loop that doesn't use RCU read-side critical
sections, typically the part that puts the CPU in
low-power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls to the mn10300 idle loop.

Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: David Howells 
Cc: Koichi Yasutake 
Cc: 3.2.x.. 
Cc: Paul E. McKenney 
---
 arch/mn10300/kernel/process.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/mn10300/kernel/process.c b/arch/mn10300/kernel/process.c
index 7dab0cd..e9cceba 100644
--- a/arch/mn10300/kernel/process.c
+++ b/arch/mn10300/kernel/process.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -107,6 +108,7 @@ void cpu_idle(void)
 {
/* endless idle loop with no priority at all */
for (;;) {
+   rcu_idle_enter();
while (!need_resched()) {
void (*idle)(void);
 
@@ -121,6 +123,7 @@ void cpu_idle(void)
}
idle();
}
+   rcu_idle_exit();
 
schedule_preempt_disabled();
}
-- 
1.7.5.4



[PATCH 11/11] xtensa: Add missing RCU idle APIs on idle loop

2012-08-23 Thread Frederic Weisbecker
In the old days, the whole idle task was considered
an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read-side critical
sections have been added even to the idle task code of
some architectures, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
of the idle loop that doesn't use RCU read-side critical
sections, typically the part that puts the CPU in
low-power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls to the xtensa idle loop.

Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: Chris Zankel 
Cc: 3.2.x.. 
Cc: Paul E. McKenney 
---
 arch/xtensa/kernel/process.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/xtensa/kernel/process.c b/arch/xtensa/kernel/process.c
index 2c8d6a3..bc44311 100644
--- a/arch/xtensa/kernel/process.c
+++ b/arch/xtensa/kernel/process.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -110,8 +111,10 @@ void cpu_idle(void)
 
/* endless idle loop with no priority at all */
while (1) {
+   rcu_idle_enter();
while (!need_resched())
platform_idle();
+   rcu_idle_exit();
schedule_preempt_disabled();
}
 }
-- 
1.7.5.4



[PATCH 10/11] score: Add missing RCU idle APIs on idle loop

2012-08-23 Thread Frederic Weisbecker
In the old days, the whole idle task was considered
an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read-side critical
sections have been added even to the idle task code of
some architectures, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
of the idle loop that doesn't use RCU read-side critical
sections, typically the part that puts the CPU in
low-power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls to the score idle loop.

Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: Chen Liqin 
Cc: Lennox Wu 
Cc: 3.2.x.. 
Cc: Paul E. McKenney 
---
 arch/score/kernel/process.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/score/kernel/process.c b/arch/score/kernel/process.c
index 2707023..637970c 100644
--- a/arch/score/kernel/process.c
+++ b/arch/score/kernel/process.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void (*pm_power_off)(void);
 EXPORT_SYMBOL(pm_power_off);
@@ -50,9 +51,10 @@ void __noreturn cpu_idle(void)
 {
/* endless idle loop with no priority at all */
while (1) {
+   rcu_idle_enter();
while (!need_resched())
barrier();
-
+   rcu_idle_exit();
schedule_preempt_disabled();
}
 }
-- 
1.7.5.4



[PATCH 09/11] parisc: Add missing RCU idle APIs on idle loop

2012-08-23 Thread Frederic Weisbecker
In the old days, the whole idle task was considered
an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read-side critical
sections have been added even to the idle task code of
some architectures, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
of the idle loop that doesn't use RCU read-side critical
sections, typically the part that puts the CPU in
low-power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls to the parisc idle loop.

Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: James E.J. Bottomley 
Cc: Helge Deller 
Cc: Parisc 
Cc: 3.2.x.. 
Cc: Paul E. McKenney 
---
 arch/parisc/kernel/process.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index d4b94b3..c54a4db 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -69,8 +70,10 @@ void cpu_idle(void)
 
/* endless idle loop with no priority at all */
while (1) {
+   rcu_idle_enter();
while (!need_resched())
barrier();
+   rcu_idle_exit();
schedule_preempt_disabled();
check_pgt_cache();
}
-- 
1.7.5.4



[PATCH 07/11] m68k: Add missing RCU idle APIs on idle loop

2012-08-23 Thread Frederic Weisbecker
In the old days, the whole idle task was considered
an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read-side critical
sections have been added even to the idle task code of
some architectures, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
of the idle loop that doesn't use RCU read-side critical
sections, typically the part that puts the CPU in
low-power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls to the m68k idle loop.

Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Acked-by: Geert Uytterhoeven 
Cc: m68k 
Cc: 3.2.x.. 
Cc: Paul E. McKenney 
---
 arch/m68k/kernel/process.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c
index c488e3c..ac2892e 100644
--- a/arch/m68k/kernel/process.c
+++ b/arch/m68k/kernel/process.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -75,8 +76,10 @@ void cpu_idle(void)
 {
/* endless idle loop with no priority at all */
while (1) {
+   rcu_idle_enter();
while (!need_resched())
idle();
+   rcu_idle_exit();
schedule_preempt_disabled();
}
 }
-- 
1.7.5.4



[PATCH 04/11] frv: Add missing RCU idle APIs on idle loop

2012-08-23 Thread Frederic Weisbecker
In the old days, the whole idle task was considered
an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read-side critical
sections have been added even to the idle task code of
some architectures, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
of the idle loop that doesn't use RCU read-side critical
sections, typically the part that puts the CPU in
low-power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls in the frv's idle loop.

Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: David Howells 
Cc: 3.2.x.. 
Cc: Paul E. McKenney 
---
 arch/frv/kernel/process.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/frv/kernel/process.c b/arch/frv/kernel/process.c
index ff95f50..2eb7fa5 100644
--- a/arch/frv/kernel/process.c
+++ b/arch/frv/kernel/process.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -69,12 +70,14 @@ void cpu_idle(void)
 {
/* endless idle loop with no priority at all */
while (1) {
+   rcu_idle_enter();
while (!need_resched()) {
check_pgt_cache();
 
if (!frv_dma_inprogress && idle)
idle();
}
+   rcu_idle_exit();
 
schedule_preempt_disabled();
}
-- 
1.7.5.4



[PATCH 05/11] h8300: Add missing RCU idle APIs on idle loop

2012-08-23 Thread Frederic Weisbecker
In the old times, the whole idle task was considered
as an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read-side critical
sections have been added, even in the idle task code of some
architectures, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
in the idle loop that doesn't make use of RCU read-side
critical sections, typically the part that puts the CPU
in low power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls in the h8300's idle loop.

Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Cc: Yoshinori Sato 
Cc: 3.2.x.. 
Cc: Paul E. McKenney 
---
 arch/h8300/kernel/process.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
index 0e9c315..f153ed1 100644
--- a/arch/h8300/kernel/process.c
+++ b/arch/h8300/kernel/process.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -78,8 +79,10 @@ void (*idle)(void) = default_idle;
 void cpu_idle(void)
 {
while (1) {
+   rcu_idle_enter();
while (!need_resched())
idle();
+   rcu_idle_exit();
schedule_preempt_disabled();
}
 }
-- 
1.7.5.4



[PATCH 01/11] alpha: Fix preemption handling in idle loop

2012-08-23 Thread Frederic Weisbecker
cpu_idle() is called on the boot CPU by the init code with
preemption disabled. But the cpu_idle() function in alpha
doesn't handle this when it calls schedule() directly.

Fix it by converting it into schedule_preempt_disabled().

Also disable preemption before calling cpu_idle() from
secondary CPU entry code to stay consistent with this
state.

Signed-off-by: Frederic Weisbecker 
Cc: Richard Henderson
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: alpha 
Cc: Paul E. McKenney 
Cc: Michael Cree 
---
 arch/alpha/kernel/process.c |3 ++-
 arch/alpha/kernel/smp.c |1 +
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 153d3fc..eac5e01 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -56,7 +56,8 @@ cpu_idle(void)
 
while (!need_resched())
cpu_relax();
-   schedule();
+
+   schedule_preempt_disabled();
}
 }
 
diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index 35ddc02..a41ad90 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -166,6 +166,7 @@ smp_callin(void)
DBGS(("smp_callin: commencing CPU %d current %p active_mm %p\n",
  cpuid, current, current->active_mm));
 
+   preempt_disable();
/* Do nothing.  */
cpu_idle();
 }
-- 
1.7.5.4
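The alpha fix relies on schedule_preempt_disabled() temporarily dropping the preemption count that the caller holds, then restoring it. The following is a simplified userspace model. It assumes a plain integer preempt_count and uses an assert() in place of the kernel's "scheduling while atomic" check. The helper bodies only approximate the kernel's, and model_smp_callin() is a hypothetical stand-in for the patched smp_callin()/cpu_idle() path.

```c
#include <assert.h>

/* Simplified stand-ins; the real kernel versions do much more. */
static int preempt_count;      /* > 0 means preemption is disabled */
static int schedule_calls;

static void preempt_disable(void)           { preempt_count++; }
static void preempt_enable_no_resched(void) { preempt_count--; }

static void schedule(void)
{
    /* Calling schedule() with preemption disabled is a bug
     * ("scheduling while atomic"); model it as an assertion. */
    assert(preempt_count == 0);
    schedule_calls++;
}

/* Drop the caller's preemption count around schedule(), then restore
 * it, so the idle loop keeps running with preemption disabled. */
static void schedule_preempt_disabled(void)
{
    preempt_enable_no_resched();
    schedule();
    preempt_disable();
}

/* Hypothetical model of the patched secondary-CPU path: disable
 * preemption before entering the idle loop, as the boot CPU does. */
static void model_smp_callin(int idle_passes)
{
    preempt_disable();
    for (int i = 0; i < idle_passes; i++)
        schedule_preempt_disabled();   /* was a bare schedule() */
}
```

In this model, a bare schedule() call inside a loop entered with preemption disabled would trip the assertion on the first pass, which is the boot-CPU situation the commit message describes.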



Re: [PATCH v2] fork: fix oops after fork failure

2012-08-23 Thread Frederic Weisbecker
On Thu, Aug 23, 2012 at 07:36:08PM +0400, Glauber Costa wrote:
> When we want to duplicate a new process, dup_task_struct() will undergo
> a series of allocations. If alloc_thread_info_node() fails, we call
> free_task_struct() and return.
> 
> This seems right, but it is not. free_task_struct() will not only free
> the task struct from the kmem_cache, but will also call
> arch_release_task_struct(). The problem is that this function is
> supposed to undo whatever arch-specific work done by
> arch_dup_task_struct(), that is not yet called at this point.  The
> particular problem I ran across was that in x86, we will arrive at
> fpu_free() without having ever allocated it.
> 
> Signed-off-by: Glauber Costa 
> Reported-by: Frederic Weisbecker 

Tested-by: Frederic Weisbecker 
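The bug described above is an unwinding-order problem: an early error path runs a release hook for arch state that the later arch_dup_task_struct() step never set up. A minimal userspace sketch of the safe ordering follows. struct model_task, model_dup_task(), model_free_task() and the fpu pointer are hypothetical stand-ins, and this is not the actual kernel fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a task struct with arch-specific state. */
struct model_task {
    int *fpu;          /* allocated only by the "arch dup" step */
};

/* Like fpu_free() in the report, this assumes the state exists. */
static void model_arch_release(struct model_task *t)
{
    assert(t->fpu != NULL);    /* freeing never-allocated state is the bug */
    free(t->fpu);
    t->fpu = NULL;
}

/* Each failure point undoes only the steps that already succeeded. */
static struct model_task *model_dup_task(bool fail_thread_info)
{
    struct model_task *t = calloc(1, sizeof(*t));

    if (!t)
        return NULL;
    if (fail_thread_info) {
        /* Arch dup has not run yet: free the struct only and do
         * NOT call model_arch_release(). */
        free(t);
        return NULL;
    }
    t->fpu = malloc(sizeof(*t->fpu));  /* the "arch dup" step */
    if (!t->fpu) {
        free(t);
        return NULL;
    }
    return t;
}

/* Full teardown for a task whose setup completed. */
static void model_free_task(struct model_task *t)
{
    model_arch_release(t);
    free(t);
}
```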


Re: mmotm 2012-08-13-16-55 uploaded

2012-08-23 Thread Frederic Weisbecker
On Tue, Aug 14, 2012 at 04:26:56PM +0400, Glauber Costa wrote:
> On 08/14/2012 02:53 PM, Michal Hocko wrote:
> > On Mon 13-08-12 16:56:50, Andrew Morton wrote:
> >> > The mm-of-the-moment snapshot 2012-08-13-16-55 has been uploaded to
> >> > 
> >> >http://www.ozlabs.org/~akpm/mmotm/
> > -mm git tree has been updated as well. You can find the tree at
> > https://github.com/mstsxfx/memcg-devel.git since-3.5
> > 
> > tagged as mmotm-2012-08-13-16-55
> > 
> 
> On top of this tree, people following the kmemcg development may also
> want to checkout
> 
>git://github.com/glommer/linux.git memcg-3.5/kmemcg-stack
> 
> A branch called memcg-3.5/kmemcg-slab is also available with the slab
> changes on top.

I tested it successfully: it stopped a forkbomb in a container.
One may need the following fix as well:
http://marc.info/?l=linux-kernel&m=134573636430031&w=2

Andrew, others, what is your opinion on this patchset?

Thanks.


Re: [PATCH 00/10] rcu: Add missing RCU idle APIs on idle loop

2012-08-23 Thread Frederic Weisbecker
On Thu, Aug 23, 2012 at 10:23:22PM +0200, Geert Uytterhoeven wrote:
> Hi Frederic,
> 
> On Thu, Aug 23, 2012 at 1:02 PM, Frederic Weisbecker wrote:
> > On Wed, Aug 22, 2012 at 07:18:04PM +0200, Geert Uytterhoeven wrote:
> >> On Wed, Aug 22, 2012 at 6:23 PM, Frederic Weisbecker wrote:
> >> > So this fixes some potential RCU stalls in a bunch of architectures.
> >> > When rcu_idle_enter()/rcu_idle_exit() became a requirement, we forgot
> >> > to handle the architectures that don't support CONFIG_NO_HZ.
> >> >
> >> > I guess the set should be dispatched into arch maintainer trees.
> >>
> >> I can take the m68k version, but are you sure you want it this way?
> >> Each of them must be in mainline before they can enter stable.
> >
> > Yeah, I was thinking the right route is for these patches to be
> > carried by arch maintainers who then push to Linus, and then they go
> > to stable.
> >
> > Is that ok for you?
> >
> > Otherwise I can carry the patches myself. In a tree of my own, or
> > Paul's or mmotm. As long as I have your ack.
> 
> I applied your patch to the m68k for-3.6/for-linus branch.
> I'll ask Linus to pull later in the rc cycle (right now I don't have
> anything else queued for 3.6).
> Still, I think it's better to just collect acks and send it to Linus
> in one shot, so it can go into stable in one shot too.

Sure, I can do that if you prefer.

Thanks.

> 
> Gr{oetje,eeting}s,
> 
> Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds


Re: [PATCH 3/4] perf: teach perf inject to merge sched_stat_* and sched_switch events (v2)

2012-08-25 Thread Frederic Weisbecker
On Tue, Aug 07, 2012 at 04:56:04PM +0400, Andrew Vagin wrote:
> +struct event_entry {
> + struct list_head node;
> + u32  pid;
> + union perf_event event[0];
> +};
> +
> +static LIST_HEAD(samples);
> +
> +static int perf_event__sched_stat(struct perf_tool *tool,
> +   union perf_event *event,
> +   struct perf_sample *sample,
> +   struct perf_evsel *evsel,
> +   struct machine *machine)
> +{
> + const char *evname = NULL;
> + uint32_t size;
> + struct event_entry *ent;
> + union perf_event *event_sw = NULL;
> + struct perf_sample sample_sw;
> + int sched_process_exit;
> +
> + size = event->header.size;
> +
> + evname = evsel->tp_format->name;
> +
> + sched_process_exit = !strcmp(evname, "sched_process_exit");
> +
> + if (!strcmp(evname, "sched_switch") ||  sched_process_exit) {
> + list_for_each_entry(ent, &samples, node)
> + if (sample->pid == ent->pid)

I suspect what you're really interested in is the sample tid.

> + break;
> +
> + if (&ent->node != &samples) {
> + list_del(&ent->node);
> + free(ent);
> + }
> +
> + if (sched_process_exit)
> + return 0;
> +
> + ent = malloc(size + sizeof(struct event_entry));
> + if (ent == NULL)
> + die("malloc");
> + ent->pid = sample->pid;

Ditto.

> + memcpy(&ent->event, event, size);
> + list_add(&ent->node, &samples);
> + return 0;
> +
> + } else if (!strncmp(evname, "sched_stat_", 11)) {
> + u32 pid;
> +
> + pid = raw_field_value(evsel->tp_format,
> + "pid", sample->raw_data);

There you parse the pid from the trace content. That's fine because
it's actually the tid that is saved in the trace event. But this one
is not pid-namespace safe (it saves current->pid directly), while
sample->tid is pid-namespace safe (it uses task_pid_nr_ns).

So I suggest you use sample->tid instead; that will also be
consistent with what you did above.
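A minimal sketch of keying the saved entries by thread id, as suggested. It assumes a simplified entry that stores only a timestamp instead of a copy of the whole event (the patch's struct event_entry stores the event). Since sample->pid identifies the process and sample->tid the thread, keying by tid would keep two threads of the same process from replacing each other's saved sched_switch. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified version of the patch's struct event_entry. */
struct model_entry {
    struct model_entry *next;
    uint32_t tid;             /* keyed by thread id, not process id */
    uint64_t switch_time;     /* stands in for the saved event copy */
};

static struct model_entry *model_samples;

static struct model_entry *model_find(uint32_t tid)
{
    for (struct model_entry *e = model_samples; e; e = e->next)
        if (e->tid == tid)
            return e;
    return NULL;
}

/* On sched_switch: drop any entry saved for this tid, then save anew. */
static int model_save_switch(uint32_t tid, uint64_t time)
{
    struct model_entry **pp = &model_samples;
    struct model_entry *e;

    while (*pp && (*pp)->tid != tid)
        pp = &(*pp)->next;
    if (*pp) {
        struct model_entry *old = *pp;

        *pp = old->next;
        free(old);
    }
    assert(model_find(tid) == NULL);   /* at most one entry per tid */

    e = malloc(sizeof(*e));
    if (!e)
        return -1;
    e->tid = tid;
    e->switch_time = time;
    e->next = model_samples;
    model_samples = e;
    return 0;
}
```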

Thanks.


Re: [PATCH 00/11] rcu: Add missing RCU idle APIs on idle loop v2

2012-08-25 Thread Frederic Weisbecker
On Fri, Aug 24, 2012 at 08:50:47PM -0700, Paul E. McKenney wrote:
> On Sat, Aug 25, 2012 at 02:19:14AM +0100, Ben Hutchings wrote:
> > On Fri, 2012-08-24 at 14:26 -0700, Paul E. McKenney wrote:
> > > On Thu, Aug 23, 2012 at 04:58:24PM +0200, Frederic Weisbecker wrote:
> > > > Hi,
> > > > 
> > > > Changes since v1:
> > > > 
> > > > - Fixed preempt handling in alpha idle loop
> > > > - added ack from Geert
> > > > - fixed stable email address, sorry :-/
> > > > 
> > > > This time I build-tested everywhere except h8300 (internal compiler
> > > > error), and mn10300, parisc, score (cross compilers not available in
> > > > ftp://ftp.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.6.3/)
> > > > 
> > > > For testing, you can pull from:
> > > > 
> > > > git://github.com/fweisbec/linux-dynticks.git
> > > > rcu/idle-fix-v2 
> > > > 
> > > > Thanks.
> > > 
> > > I have queued these on -rcu branch rcu/idle:
> > > 
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> > > 
> > > This problem has been in place since 3.3, so it is hard to argue that
> > > it is a regression for this merge window.  I have therefore queued it
> > > for 3.7.
> > 
> > I don't follow that; I would expect any serious bug fix (serious enough
> > for a stable update) to be acceptable for 3.6 at this point.
> 
> OK, if any of the arch maintainers wishes to submit the patch to 3.6,
> they are free to do so -- just let me know and I will drop the patch from
> my tree.
> 
> That said, all this does is cause spurious warnings to be printed, so
> not sure it really qualifies as serious.  But I am happy to leave that
> decision with the individual arch maintainers -- it is their arch,
> after all, so their decision.

Couldn't that cause hung tasks due to long-lasting synchronize_rcu()?
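Both positions hinge on one condition. In a heavily simplified model (assumed here, far simpler than Tree RCU's dynticks tracking), a grace period can end only once every CPU has either passed through a quiescent state or is known to RCU to be idle. A CPU sitting in an unannotated idle loop satisfies neither. Whether the real kernel then stalls synchronize_rcu() callers or only prints warnings is exactly what this thread is discussing; the sketch does not settle it. struct model_cpu and model_gp_can_complete() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state, heavily simplified. */
struct model_cpu {
    bool passed_qs;       /* e.g. context switch since the GP started */
    bool rcu_knows_idle;  /* inside rcu_idle_enter()/rcu_idle_exit() */
};

/* A grace period may end only when no CPU can still be inside an RCU
 * read-side critical section that began before it. */
static bool model_gp_can_complete(const struct model_cpu *cpus, size_t n)
{
    assert(n == 0 || cpus != NULL);
    for (size_t i = 0; i < n; i++)
        if (!cpus[i].passed_qs && !cpus[i].rcu_knows_idle)
            return false;   /* this CPU holds up the grace period */
    return true;
}
```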


Re: Where to put test code?

2012-09-19 Thread Frederic Weisbecker
2012/9/19 Daniel Santos :
> I'm putting the finishing touches on the generic red-black tree test
> code, but I'm uncertain about where to place it exactly.
>
> I haven't finished the test module just yet, but the idea is that the
> tests can be run in userspace as well as kernelspace to make it easier
> to test on multiple compilers.  It has some common source files (used
> in both places) and then specific code for both user- and
> kernel-space that I currently have as follows:
>
> tools/testing/selftests/grbtree/   - common.{c,h}
> tools/testing/selftests/grbtree/user   - user-space main.c, Makefile, etc.
> tools/testing/selftests/grbtree/module - kernel-space grbtest.c,
> Makefile, etc.
>
> Would this be correct or should the common & module code go some place
> else and then just have the user-space code under
> tools/testing/selftests/grbtest?

It depends on the nature of your tests. Are these pure validation
tests (some batch tests that perform actions and check the result is
correct) or stress tests (something that runs for a while)?

If these are only about validation tests, then both user and module
can be in that tools/testing/selftests directory.

What is the module doing?


Re: Where to put test code?

2012-09-20 Thread Frederic Weisbecker
2012/9/20 Daniel Santos :
> Thanks for the response!
>
> On 09/19/2012 05:18 PM, Frederic Weisbecker wrote:
>> 2012/9/19 Daniel Santos :
>>> I'm putting the finishing touches on the generic red-black tree test
>>> code, but I'm uncertain about where to place it exactly.
>>>
>>> I haven't finished the test module just yet, but the idea is that the
>>> tests can be run in userspace as well as kernelspace to make it easier
>>> to test on multiple compilers.  It has some common source files (used
>>> in both places) and then specific code for both user- and
>>> kernel-space that I currently have as follows:
>>>
>>> tools/testing/selftests/grbtree/   - common.{c,h}
>>> tools/testing/selftests/grbtree/user   - user-space main.c, Makefile, etc.
>>> tools/testing/selftests/grbtree/module - kernel-space grbtest.c,
>>> Makefile, etc.
>>>
>>> Would this be correct or should the common & module code go some place
>>> else and then just have the user-space code under
>>> tools/testing/selftests/grbtest?
>> It depends on the nature of your tests. Are these pure validation
>> tests (some batch tests that perform actions and check the result is
>> correct) or stress tests (something that runs for a while)?
> The program does both performance measurement tests and validation tests
> based upon what you pass at the command line.  The primary aim is to
> measure performance differences between the generic code and specific
> (hand-coded) implementations on various compilers.  The secondary aim is
> to provide validation that the results are correct in all
> circumstances.  I'm not sure in this case what would be considered a
> "stress" test.

Ok. The selftests in tools/testing/selftests run in batch, so if one
of them in the middle runs a stress test for a while, it delays the
other tests. The purpose of these unit tests is to quickly detect
regressions or anything that breaks expected results.

Your test sounds like a good candidate for that directory, I guess.

>
>> If these are only about validation tests, then both user and module
>> can be in that tools/testing/selftests directory.
>>
>> What is the module doing?
> The module is the exact same thing, except built in kernel-space, where
> the actual code will normally reside.  Parameters are passed when you
> load the module and it unloads when the test is complete.  Perhaps what
> I omitted is that the user-space program is generated partially by
> compiling sources and headers that are intended for kernel-space only,
> but linked with glibc using some cute hacks.  This is done mostly to
> ease the process of testing the code with multiple compilers.

Ok, looks good as well.

Thanks!


[PATCH 0/2] perf tools: Basic bash completion support

2012-08-07 Thread Frederic Weisbecker
Hey,

Basic bash completion support. Only perf subcommands and the most basic -e
event descriptors (no grouping) are supported.

I just have a small issue with tracepoints because of the ":" in the middle.
Completion works as long as we haven't yet reached the colon. Otherwise
we need to add a double quote at the beginning of the expression. I'm quite
a newbie in bash completion though, so I might find a subtlety later to solve
this.

Frederic Weisbecker (2):
  perf tools: Initial bash completion support
  perf tools: Support for events bash completion

 tools/perf/Makefile|1 +
 tools/perf/bash_completion |   24 ++
 tools/perf/builtin-list.c  |   14 ---
 tools/perf/perf.c  |   69 ++-
 tools/perf/util/parse-events.c |   70 +---
 tools/perf/util/parse-events.h |7 ++--
 6 files changed, 120 insertions(+), 65 deletions(-)
 create mode 100644 tools/perf/bash_completion

-- 
1.7.5.4



[PATCH 2/2] perf tools: Support for events bash completion

2012-08-07 Thread Frederic Weisbecker
Add basic bash completion for the -e option in record, top
and stat subcommands. Only hardware, software and tracepoint
events are supported.

Breakpoints, raw events and event grouping completion
need more thinking.
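The completion script needs a machine-readable event list (perf list --raw-dump), while perf list normally prints a human-readable table. One way to express that switch, sketched here under assumptions, is a name_only flag that selects between a space-separated "sys:event " word that compgen -W can consume and a padded table row. The helper name, its buffer-based interface and the exact table format are hypothetical, not the patch's code (whose tracepoint hunk is truncated in this archive).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: format one tracepoint either as a bare
 * completion word or as a human-readable "perf list" style row.
 * Returns the snprintf() result (characters that would be written). */
static int model_format_tracepoint(char *buf, size_t len, const char *sys,
                                   const char *evt, bool name_only)
{
    char path[256];

    assert(sys != NULL && evt != NULL);
    if (name_only)
        return snprintf(buf, len, "%s:%s ", sys, evt);

    snprintf(path, sizeof(path), "%s:%s", sys, evt);
    return snprintf(buf, len, "  %-50s [%s]\n", path, "Tracepoint event");
}
```

Keeping the raw form as one flat, space-separated line lets the shell side stay a single compgen -W call.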

Signed-off-by: Frederic Weisbecker 
Cc: David Ahern 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
---
 tools/perf/bash_completion |6 +++-
 tools/perf/builtin-list.c  |   14 ---
 tools/perf/util/parse-events.c |   70 +---
 tools/perf/util/parse-events.h |7 ++--
 4 files changed, 61 insertions(+), 36 deletions(-)

diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion
index 3547703..25f4d99 100644
--- a/tools/perf/bash_completion
+++ b/tools/perf/bash_completion
@@ -6,12 +6,16 @@ _perf()
local cur
 
COMPREPLY=()
-   _get_comp_words_by_ref cur
+   _get_comp_words_by_ref cur prev
 
# List perf subcommands
if [ $COMP_CWORD -eq 1 ]; then
cmds=$(perf --list-cmds)
COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) )
+   # List possible events for -e option
+   elif [[ $prev == "-e" && "${COMP_WORDS[1]}" == @(record|stat|top) ]]; then
+   cmds=$(perf list --raw-dump)
+   COMPREPLY=( $( compgen -W '$cmds' -- $cur ) )
# Fall down to list regular files
else
_filedir
diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
index 6313b6e..bdcff81 100644
--- a/tools/perf/builtin-list.c
+++ b/tools/perf/builtin-list.c
@@ -19,15 +19,15 @@ int cmd_list(int argc, const char **argv, const char *prefix __used)
setup_pager();
 
if (argc == 1)
-   print_events(NULL);
+   print_events(NULL, false);
else {
int i;
 
for (i = 1; i < argc; ++i) {
-   if (i > 1)
+   if (i > 2)
putchar('\n');
if (strncmp(argv[i], "tracepoint", 10) == 0)
-   print_tracepoint_events(NULL, NULL);
+   print_tracepoint_events(NULL, NULL, false);
else if (strcmp(argv[i], "hw") == 0 ||
 strcmp(argv[i], "hardware") == 0)
print_events_type(PERF_TYPE_HARDWARE);
@@ -36,13 +36,15 @@ int cmd_list(int argc, const char **argv, const char *prefix __used)
print_events_type(PERF_TYPE_SOFTWARE);
else if (strcmp(argv[i], "cache") == 0 ||
 strcmp(argv[i], "hwcache") == 0)
-   print_hwcache_events(NULL);
+   print_hwcache_events(NULL, false);
+   else if (strcmp(argv[i], "--raw-dump") == 0)
+   print_events(NULL, true);
else {
char *sep = strchr(argv[i], ':'), *s;
int sep_idx;
 
if (sep == NULL) {
-   print_events(argv[i]);
+   print_events(argv[i], false);
continue;
}
sep_idx = sep - argv[i];
@@ -51,7 +53,7 @@ int cmd_list(int argc, const char **argv, const char *prefix __used)
return -1;
 
s[sep_idx] = '\0';
-   print_tracepoint_events(s, s + sep_idx + 1);
+   print_tracepoint_events(s, s + sep_idx + 1, false);
free(s);
}
}
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 74a5af4..30dba72 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -799,7 +799,8 @@ static const char * const event_type_descriptors[] = {
  * Print the events from /tracing/events
  */
 
-void print_tracepoint_events(const char *subsys_glob, const char *event_glob)
+void print_tracepoint_events(const char *subsys_glob, const char *event_glob,
+bool name_only)
 {
DIR *sys_dir, *evt_dir;
struct dirent *sys_next, *evt_next, sys_dirent, evt_dirent;
@@ -829,6 +830,11 @@ void print_tracepoint_events(const char *subsys_glob, const char *event_glob)
!strglobmatch(evt_dirent.d_name, event_glob))
continue;
 
+   if (name_only) {
+   printf("%s:%s ", sys_

[PATCH 1/2] perf tools: Initial bash completion support

2012-08-07 Thread Frederic Weisbecker
This implements bash completion for perf subcommands such
as record, report, script, probe, etc...

Signed-off-by: Frederic Weisbecker 
Cc: David Ahern 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
---
 tools/perf/Makefile|1 +
 tools/perf/bash_completion |   20 +
 tools/perf/perf.c  |   69 +---
 3 files changed, 60 insertions(+), 30 deletions(-)
 create mode 100644 tools/perf/bash_completion

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 35655c3..4000d72 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -951,6 +951,7 @@ install: all
$(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
$(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python'
$(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin'
+   $(INSTALL) -m 755 bash_completion /etc/bash_completion.d/perf
 
 install-python_ext:
$(PYTHON_WORD) util/setup.py --quiet install --root='/$(DESTDIR_SQ)'
diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion
new file mode 100644
index 000..3547703
--- /dev/null
+++ b/tools/perf/bash_completion
@@ -0,0 +1,20 @@
+# perf completion
+
+have perf &&
+_perf()
+{
+   local cur
+
+   COMPREPLY=()
+   _get_comp_words_by_ref cur
+
+   # List perf subcommands
+   if [ $COMP_CWORD -eq 1 ]; then
+   cmds=$(perf --list-cmds)
+   COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) )
+   # Fall down to list regular files
+   else
+   _filedir
+   fi
+} &&
+complete -F _perf perf
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 2b2e225..db37ee3 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -24,6 +24,37 @@ const char perf_more_info_string[] =
 int use_browser = -1;
 static int use_pager = -1;
 
+struct cmd_struct {
+   const char *cmd;
+   int (*fn)(int, const char **, const char *);
+   int option;
+};
+
+static struct cmd_struct commands[] = {
+   { "buildid-cache", cmd_buildid_cache, 0 },
+   { "buildid-list", cmd_buildid_list, 0 },
+   { "diff",   cmd_diff,   0 },
+   { "evlist", cmd_evlist, 0 },
+   { "help",   cmd_help,   0 },
+   { "list",   cmd_list,   0 },
+   { "record", cmd_record, 0 },
+   { "report", cmd_report, 0 },
+   { "bench",  cmd_bench,  0 },
+   { "stat",   cmd_stat,   0 },
+   { "timechart",  cmd_timechart,  0 },
+   { "top",cmd_top,0 },
+   { "annotate",   cmd_annotate,   0 },
+   { "version",cmd_version,0 },
+   { "script", cmd_script, 0 },
+   { "sched",  cmd_sched,  0 },
+   { "probe",  cmd_probe,  0 },
+   { "kmem",   cmd_kmem,   0 },
+   { "lock",   cmd_lock,   0 },
+   { "kvm",cmd_kvm,0 },
+   { "test",   cmd_test,   0 },
+   { "inject", cmd_inject, 0 },
+};
+
 struct pager_config {
const char *cmd;
int val;
@@ -160,6 +191,14 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
fprintf(stderr, "dir: %s\n", debugfs_mountpoint);
if (envchanged)
*envchanged = 1;
+   } else if (!strcmp(cmd, "--list-cmds")) {
+   unsigned int i;
+
+   for (i = 0; i < ARRAY_SIZE(commands); i++) {
+   struct cmd_struct *p = commands+i;
+   printf("%s ", p->cmd);
+   }
+   exit(0);
} else {
fprintf(stderr, "Unknown option: %s\n", cmd);
usage(perf_usage_string);
@@ -245,12 +284,6 @@ const char perf_version_string[] = PERF_VERSION;
  */
 #define NEED_WORK_TREE (1<<2)
 
-struct cmd_struct {
-   const char *cmd;
-   int (*fn)(int, const char **, const char *);
-   int option;
-};
-
 static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 {
int status;
@@ -296,30 +329,6 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 static void handle_internal_command(int argc, const char **argv)
 {
const char *cmd = argv[0];
-   static struct cmd_struct commands[] = {
-   { "build

Re: [PATCH 0/2] perf tools: Basic bash completion support

2012-08-07 Thread Frederic Weisbecker
On Tue, Aug 07, 2012 at 03:19:44PM +0200, Frederic Weisbecker wrote:
> Hey,
> 
> Basic bash completion support. Only perf subcommands and the most basic -e
> event descriptors (no grouping) are supported.
> 
> I just have a small issue with tracepoints because of the ":" in the middle.
> Completion works as long as we haven't yet reached the colon. Otherwise
> we need to add a double quote at the beginning of the expression. I'm quite
> a newbie in bash completion though, so I might find a subtlety later to solve
> this.

Tip: for testing, you need to run "make install" and reload the bash completion
scripts:

# make install
$ . /etc/bash_completion



Re: [PATCH 0/2] perf tools: Basic bash completion support

2012-08-07 Thread Frederic Weisbecker
On Tue, Aug 07, 2012 at 08:18:12AM -0600, David Ahern wrote:
> On 8/7/12 7:22 AM, Frederic Weisbecker wrote:
> >On Tue, Aug 07, 2012 at 03:19:44PM +0200, Frederic Weisbecker wrote:
> >>Hey,
> >>
> >>Basic bash completion support. Only perf subcommands and the most basic -e
> >>event descriptors (no grouping) are supported.
> >>
> >>I just have a small issue with tracepoints because of the ":" in the middle.
> >>Completion works as long as we haven't yet reached the colon. Otherwise
> >>we need to add a double quote at the beginning of the expression. I'm quite
> >>a newbie in bash completion though, so I might find a subtlety later to solve
> >>this.
> >
> >Tips: for testing, you need to "make install" and update the bash completion
> >scripts:
> >
> > # make install
> > $ . /etc/bash_completion
> >
> 
> And you need to make sure the PATH hits the updated binary and not
> the default, otherwise you end up with:
> 
> /tmp/pbuild/perf recUnknown option: --list-cmds
> 
>  Usage: perf [--version] [--help] COMMAND [ARGS]
> Unknown option: --list-cmds
> 
> It's calling /usr/bin/perf with --list-cmds, versus the perf command
> I am running (/tmp/pbuild/perf). Any way to teach the completion to
> use the perf binary that the user is running?

Ah, good point.

Does the below work for you? I'll respin with that change.

diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion
index 25f4d99..cba72a9 100644
--- a/tools/perf/bash_completion
+++ b/tools/perf/bash_completion
@@ -3,18 +3,20 @@
 have perf &&
 _perf()
 {
-   local cur
+   local cur cmd
 
COMPREPLY=()
_get_comp_words_by_ref cur prev
 
+   cmd=${COMP_WORDS[0]}
+
# List perf subcommands
if [ $COMP_CWORD -eq 1 ]; then
-   cmds=$(perf --list-cmds)
+   cmds=$($cmd --list-cmds)
COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) )
# List possible events for -e option
elif [[ $prev == "-e" && "${COMP_WORDS[1]}" == @(record|stat|top) ]]; then
-   cmds=$(perf list --raw-dump)
+   cmds=$($cmd list --raw-dump)
COMPREPLY=( $( compgen -W '$cmds' -- $cur ) )
# Fall down to list regular files
else



Re: [PATCH 2/2] perf tools: Support for events bash completion

2012-08-07 Thread Frederic Weisbecker
On Tue, Aug 07, 2012 at 08:48:04AM -0600, David Ahern wrote:
> On 8/7/12 7:19 AM, Frederic Weisbecker wrote:
> >Add basic bash completion for the -e option in record, top
> >and stat subcommands. Only hardware, software and tracepoint
> >events are supported.
> >
> >Breakpoints, raw events and events grouping completion
> >need more thinking.
> >
> >Signed-off-by: Frederic Weisbecker 
> >Cc: David Ahern 
> >Cc: Ingo Molnar 
> >Cc: Jiri Olsa 
> >Cc: Namhyung Kim 
> >Cc: Peter Zijlstra 
> >Cc: Stephane Eranian 
> >---
> >  tools/perf/bash_completion |6 +++-
> >  tools/perf/builtin-list.c  |   14 ---
> >  tools/perf/util/parse-events.c |   70 +---
> >  tools/perf/util/parse-events.h |7 ++--
> >  4 files changed, 61 insertions(+), 36 deletions(-)
> >
> >diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion
> >index 3547703..25f4d99 100644
> >--- a/tools/perf/bash_completion
> >+++ b/tools/perf/bash_completion
> >@@ -6,12 +6,16 @@ _perf()
> > local cur
> >
> > COMPREPLY=()
> >-_get_comp_words_by_ref cur
> >+_get_comp_words_by_ref cur prev
> >
> > # List perf subcommands
> > if [ $COMP_CWORD -eq 1 ]; then
> > cmds=$(perf --list-cmds)
> > COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) )
> >+# List possible events for -e option
> >+elif [[ $prev == "-e" && "${COMP_WORDS[1]}" == @(record|stat|top) ]]; then
> >+cmds=$(perf list --raw-dump)
> >+COMPREPLY=( $( compgen -W '$cmds' -- $cur ) )
> > # Fall down to list regular files
> > else
> > _filedir
> 
> Any reason to show a file list except for -i and -o options? e.g.,

Yes, for example with perf record, when you pass a command to launch and
profile.

In any case, I think it's a better idea to keep this as the default. Not
breaking the pre-existing default completion guarantees that the new
completion is going to be more useful than a burden.

> 
> diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion
> index 25f4d99..be97349 100644
> --- a/tools/perf/bash_completion
> +++ b/tools/perf/bash_completion
> @@ -17,7 +17,7 @@ _perf()
> cmds=$(perf list --raw-dump)
> COMPREPLY=( $( compgen -W '$cmds' -- $cur ) )
> # Fall down to list regular files
> -   else
> +   elif [[ $prev == "-o" || $prev == "-i" ]]; then
> _filedir
> fi
>  } &&
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] perf tools: Initial bash completion support

2012-08-07 Thread Frederic Weisbecker
On Tue, Aug 07, 2012 at 08:11:46AM -0600, David Ahern wrote:
> On 8/7/12 7:19 AM, Frederic Weisbecker wrote:
> >This implements bash completion for perf subcommands such
> >as record, report, script, probe, etc...
> 
> Love it!
> 
> >
> >Signed-off-by: Frederic Weisbecker 
> >Cc: David Ahern 
> >Cc: Ingo Molnar 
> >Cc: Jiri Olsa 
> >Cc: Namhyung Kim 
> >Cc: Peter Zijlstra 
> >Cc: Stephane Eranian 
> >---
> >  tools/perf/Makefile|1 +
> >  tools/perf/bash_completion |   20 +
> >  tools/perf/perf.c  |   69 +---
> >  3 files changed, 60 insertions(+), 30 deletions(-)
> >  create mode 100644 tools/perf/bash_completion
> >
> >diff --git a/tools/perf/Makefile b/tools/perf/Makefile
> >index 35655c3..4000d72 100644
> >--- a/tools/perf/Makefile
> >+++ b/tools/perf/Makefile
> >@@ -951,6 +951,7 @@ install: all
> > $(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> > $(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python'
> > $(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin'
> >+$(INSTALL) -m 755 bash_completion /etc/bash_completion.d/perf
> 
> $(DESTDIR_SQ) is need in front of the destination.

Right. Fixing this.

Thanks.

> 
> David


Re: [PATCH 2/2] perf tools: Support for events bash completion

2012-08-07 Thread Frederic Weisbecker
On Tue, Aug 07, 2012 at 05:05:04PM +0100, Alan Cox wrote:
> > >   COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) )
> > > + # List possible events for -e option
> > > + elif [[ $prev == "-e" && "${COMP_WORDS[1]}" == @(record|stat|top) ]]; then
> > > + cmds=$(perf list --raw-dump)
> > > + COMPREPLY=( $( compgen -W '$cmds' -- $cur ) )
> 
> 
> Surely $cur should be quoted here...

Right, fixing that too.

thanks.
 
> Alan


[PATCH 0/2] perf tools: Basic bash completion support v2

2012-08-07 Thread Frederic Weisbecker
Changes since v1:

- Run "perf --list-cmds" and "perf list --raw-dump" with the perf binary
the user is actually completing (COMP_WORDS[0]) instead of the default one.
(suggested by David Ahern)

- Install in DESTDIR_SQ (suggested by David Ahern)

- Quote $cur on the compgen command line (suggested by Alan Cox)

Frederic Weisbecker (2):
  perf tools: Initial bash completion support
  perf tools: Support for events bash completion

 tools/perf/Makefile|1 +
 tools/perf/bash_completion |   26 +++
 tools/perf/builtin-list.c  |   14 ---
 tools/perf/perf.c  |   69 ++-
 tools/perf/util/parse-events.c |   70 +---
 tools/perf/util/parse-events.h |7 ++--
 6 files changed, 122 insertions(+), 65 deletions(-)
 create mode 100644 tools/perf/bash_completion

-- 
1.7.5.4



[PATCH 1/2] perf tools: Initial bash completion support

2012-08-07 Thread Frederic Weisbecker
This implements bash completion for perf subcommands such
as record, report, script, probe, etc...

Signed-off-by: Frederic Weisbecker 
Cc: David Ahern 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
---
 tools/perf/Makefile|1 +
 tools/perf/bash_completion |   22 ++
 tools/perf/perf.c  |   69 +---
 3 files changed, 62 insertions(+), 30 deletions(-)
 create mode 100644 tools/perf/bash_completion

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 35655c3..ddfb7e5 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -951,6 +951,7 @@ install: all
$(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
$(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python'
$(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin'
+   $(INSTALL) -m 755 bash_completion $(DESTDIR_SQ)/etc/bash_completion.d/perf
 
 install-python_ext:
$(PYTHON_WORD) util/setup.py --quiet install --root='/$(DESTDIR_SQ)'
diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion
new file mode 100644
index 000..9a31fa5
--- /dev/null
+++ b/tools/perf/bash_completion
@@ -0,0 +1,22 @@
+# perf completion
+
+have perf &&
+_perf()
+{
+   local cur cmd
+
+   COMPREPLY=()
+   _get_comp_words_by_ref cur
+
+   cmd=${COMP_WORDS[0]}
+
+   # List perf subcommands
+   if [ $COMP_CWORD -eq 1 ]; then
+   cmds=$($cmd --list-cmds)
+   COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) )
+   # Fall down to list regular files
+   else
+   _filedir
+   fi
+} &&
+complete -F _perf perf
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 2b2e225..db37ee3 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -24,6 +24,37 @@ const char perf_more_info_string[] =
 int use_browser = -1;
 static int use_pager = -1;
 
+struct cmd_struct {
+   const char *cmd;
+   int (*fn)(int, const char **, const char *);
+   int option;
+};
+
+static struct cmd_struct commands[] = {
+   { "buildid-cache", cmd_buildid_cache, 0 },
+   { "buildid-list", cmd_buildid_list, 0 },
+   { "diff",   cmd_diff,   0 },
+   { "evlist", cmd_evlist, 0 },
+   { "help",   cmd_help,   0 },
+   { "list",   cmd_list,   0 },
+   { "record", cmd_record, 0 },
+   { "report", cmd_report, 0 },
+   { "bench",  cmd_bench,  0 },
+   { "stat",   cmd_stat,   0 },
+   { "timechart",  cmd_timechart,  0 },
+   { "top",cmd_top,0 },
+   { "annotate",   cmd_annotate,   0 },
+   { "version",cmd_version,0 },
+   { "script", cmd_script, 0 },
+   { "sched",  cmd_sched,  0 },
+   { "probe",  cmd_probe,  0 },
+   { "kmem",   cmd_kmem,   0 },
+   { "lock",   cmd_lock,   0 },
+   { "kvm",cmd_kvm,0 },
+   { "test",   cmd_test,   0 },
+   { "inject", cmd_inject, 0 },
+};
+
 struct pager_config {
const char *cmd;
int val;
@@ -160,6 +191,14 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
fprintf(stderr, "dir: %s\n", debugfs_mountpoint);
if (envchanged)
*envchanged = 1;
+   } else if (!strcmp(cmd, "--list-cmds")) {
+   unsigned int i;
+
+   for (i = 0; i < ARRAY_SIZE(commands); i++) {
+   struct cmd_struct *p = commands+i;
+   printf("%s ", p->cmd);
+   }
+   exit(0);
} else {
fprintf(stderr, "Unknown option: %s\n", cmd);
usage(perf_usage_string);
@@ -245,12 +284,6 @@ const char perf_version_string[] = PERF_VERSION;
  */
 #define NEED_WORK_TREE (1<<2)
 
-struct cmd_struct {
-   const char *cmd;
-   int (*fn)(int, const char **, const char *);
-   int option;
-};
-
 static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 {
int status;
@@ -296,30 +329,6 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 static void handle_internal_command(int argc, const char **argv)
 {
const char *cmd = argv[0];
-   static struct cm

[PATCH 2/2] perf tools: Support for events bash completion

2012-08-07 Thread Frederic Weisbecker
Add basic bash completion for the -e option in record, top
and stat subcommands. Only hardware, software and tracepoint
events are supported.

Breakpoints, raw events and events grouping completion
need more thinking.

Signed-off-by: Frederic Weisbecker 
Cc: David Ahern 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
---
 tools/perf/bash_completion |6 +++-
 tools/perf/builtin-list.c  |   14 ---
 tools/perf/util/parse-events.c |   70 +---
 tools/perf/util/parse-events.h |7 ++--
 4 files changed, 61 insertions(+), 36 deletions(-)

diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion
index 9a31fa5..1958fa5 100644
--- a/tools/perf/bash_completion
+++ b/tools/perf/bash_completion
@@ -6,7 +6,7 @@ _perf()
local cur cmd
 
COMPREPLY=()
-   _get_comp_words_by_ref cur
+   _get_comp_words_by_ref cur prev
 
cmd=${COMP_WORDS[0]}
 
@@ -14,6 +14,10 @@ _perf()
if [ $COMP_CWORD -eq 1 ]; then
cmds=$($cmd --list-cmds)
COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) )
+   # List possible events for -e option
+   elif [[ $prev == "-e" && "${COMP_WORDS[1]}" == @(record|stat|top) ]]; then
+   cmds=$($cmd list --raw-dump)
+   COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) )
# Fall down to list regular files
else
_filedir
diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
index 6313b6e..bdcff81 100644
--- a/tools/perf/builtin-list.c
+++ b/tools/perf/builtin-list.c
@@ -19,15 +19,15 @@ int cmd_list(int argc, const char **argv, const char *prefix __used)
setup_pager();
 
if (argc == 1)
-   print_events(NULL);
+   print_events(NULL, false);
else {
int i;
 
for (i = 1; i < argc; ++i) {
-   if (i > 1)
+   if (i > 2)
putchar('\n');
if (strncmp(argv[i], "tracepoint", 10) == 0)
-   print_tracepoint_events(NULL, NULL);
+   print_tracepoint_events(NULL, NULL, false);
else if (strcmp(argv[i], "hw") == 0 ||
 strcmp(argv[i], "hardware") == 0)
print_events_type(PERF_TYPE_HARDWARE);
@@ -36,13 +36,15 @@ int cmd_list(int argc, const char **argv, const char *prefix __used)
print_events_type(PERF_TYPE_SOFTWARE);
else if (strcmp(argv[i], "cache") == 0 ||
 strcmp(argv[i], "hwcache") == 0)
-   print_hwcache_events(NULL);
+   print_hwcache_events(NULL, false);
+   else if (strcmp(argv[i], "--raw-dump") == 0)
+   print_events(NULL, true);
else {
char *sep = strchr(argv[i], ':'), *s;
int sep_idx;
 
if (sep == NULL) {
-   print_events(argv[i]);
+   print_events(argv[i], false);
continue;
}
sep_idx = sep - argv[i];
@@ -51,7 +53,7 @@ int cmd_list(int argc, const char **argv, const char *prefix __used)
return -1;
 
s[sep_idx] = '\0';
-   print_tracepoint_events(s, s + sep_idx + 1);
+   print_tracepoint_events(s, s + sep_idx + 1, false);
free(s);
}
}
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 74a5af4..30dba72 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -799,7 +799,8 @@ static const char * const event_type_descriptors[] = {
  * Print the events from /tracing/events
  */
 
-void print_tracepoint_events(const char *subsys_glob, const char *event_glob)
+void print_tracepoint_events(const char *subsys_glob, const char *event_glob,
+bool name_only)
 {
DIR *sys_dir, *evt_dir;
struct dirent *sys_next, *evt_next, sys_dirent, evt_dirent;
@@ -829,6 +830,11 @@ void print_tracepoint_events(const char *subsys_glob, const char *event_glob)
!strglobmatch(evt_dirent.d_name, event_glob))
continue;
 
+   if (name_only) {
+   p

Re: [PATCH 1/2] perf tools: Initial bash completion support

2012-08-07 Thread Frederic Weisbecker
On Wed, Aug 08, 2012 at 10:10:02AM +0900, Namhyung Kim wrote:
> On Tue, 07 Aug 2012 16:10:54 -0600, David Ahern wrote:
> > On 8/7/12 11:00 AM, Frederic Weisbecker wrote:
> >> diff --git a/tools/perf/Makefile b/tools/perf/Makefile
> >> index 35655c3..ddfb7e5 100644
> >> --- a/tools/perf/Makefile
> >> +++ b/tools/perf/Makefile
> >> @@ -951,6 +951,7 @@ install: all
> >>$(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> >>$(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python'
> >>$(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin'
> >> +  $(INSTALL) -m 755 bash_completion $(DESTDIR_SQ)/etc/bash_completion.d/perf
> >
> > still getting an error here:
> >
> > $ make DESTDIR=/tmp/junk-perf O=/tmp/pbuild -C tools/perf/ install
> > ...
> > install -m 755 bash_completion /tmp/junk-perf/etc/bash_completion.d/perf
> > install: cannot create regular file `/tmp/junk-perf/etc/bash_completion.d/perf': No such file or directory
> > make: *** [install] Error 1
> > make: Leaving directory `/opt/sw/ahern/perf.git/tools/perf'
> 
> Does patch below fix it?

Thanks Namhyung.

Can I have your Signed-off-by to add this patch to my series?

Thanks.

> 
> 
> diff --git a/tools/perf/Makefile b/tools/perf/Makefile
> index cfe4fc0b67f1..d0b27ba9663e 100644
> --- a/tools/perf/Makefile
> +++ b/tools/perf/Makefile
> @@ -696,6 +696,7 @@ perfexecdir_SQ = $(subst ','\'',$(perfexecdir))
>  template_dir_SQ = $(subst ','\'',$(template_dir))
>  htmldir_SQ = $(subst ','\'',$(htmldir))
>  prefix_SQ = $(subst ','\'',$(prefix))
> +sysconfdir_SQ = $(subst ','\'',$(sysconfdir))
>  
>  SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
>  
> @@ -947,7 +948,8 @@ install: all
>   $(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
>   $(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python'
>   $(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin'
> - $(INSTALL) -m 755 bash_completion $(DESTDIR_SQ)/etc/bash_completion.d/perf
> + $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d'
> + $(INSTALL) bash_completion '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d/perf'
>  
>  install-python_ext:
>   $(PYTHON_WORD) util/setup.py --quiet install --root='/$(DESTDIR_SQ)'


[PATCH 1/3] perf tools: Initial bash completion support

2012-08-09 Thread Frederic Weisbecker
This implements bash completion for perf subcommands such
as record, report, script, probe, etc...

Signed-off-by: Frederic Weisbecker 
Cc: David Ahern 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
---
 tools/perf/Makefile|1 +
 tools/perf/bash_completion |   22 ++
 tools/perf/perf.c  |   69 +---
 3 files changed, 62 insertions(+), 30 deletions(-)
 create mode 100644 tools/perf/bash_completion

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 2d4bf6e..84b4227 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -951,6 +951,7 @@ install: all
$(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
$(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python'
$(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin'
+   $(INSTALL) -m 755 bash_completion $(DESTDIR_SQ)/etc/bash_completion.d/perf
 
 install-python_ext:
$(PYTHON_WORD) util/setup.py --quiet install --root='/$(DESTDIR_SQ)'
diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion
new file mode 100644
index 000..9a31fa5
--- /dev/null
+++ b/tools/perf/bash_completion
@@ -0,0 +1,22 @@
+# perf completion
+
+have perf &&
+_perf()
+{
+   local cur cmd
+
+   COMPREPLY=()
+   _get_comp_words_by_ref cur
+
+   cmd=${COMP_WORDS[0]}
+
+   # List perf subcommands
+   if [ $COMP_CWORD -eq 1 ]; then
+   cmds=$($cmd --list-cmds)
+   COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) )
+   # Fall down to list regular files
+   else
+   _filedir
+   fi
+} &&
+complete -F _perf perf
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 2b2e225..db37ee3 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -24,6 +24,37 @@ const char perf_more_info_string[] =
 int use_browser = -1;
 static int use_pager = -1;
 
+struct cmd_struct {
+   const char *cmd;
+   int (*fn)(int, const char **, const char *);
+   int option;
+};
+
+static struct cmd_struct commands[] = {
+   { "buildid-cache", cmd_buildid_cache, 0 },
+   { "buildid-list", cmd_buildid_list, 0 },
+   { "diff",   cmd_diff,   0 },
+   { "evlist", cmd_evlist, 0 },
+   { "help",   cmd_help,   0 },
+   { "list",   cmd_list,   0 },
+   { "record", cmd_record, 0 },
+   { "report", cmd_report, 0 },
+   { "bench",  cmd_bench,  0 },
+   { "stat",   cmd_stat,   0 },
+   { "timechart",  cmd_timechart,  0 },
+   { "top",cmd_top,0 },
+   { "annotate",   cmd_annotate,   0 },
+   { "version",cmd_version,0 },
+   { "script", cmd_script, 0 },
+   { "sched",  cmd_sched,  0 },
+   { "probe",  cmd_probe,  0 },
+   { "kmem",   cmd_kmem,   0 },
+   { "lock",   cmd_lock,   0 },
+   { "kvm",cmd_kvm,0 },
+   { "test",   cmd_test,   0 },
+   { "inject", cmd_inject, 0 },
+};
+
 struct pager_config {
const char *cmd;
int val;
@@ -160,6 +191,14 @@ static int handle_options(const char ***argv, int *argc, int *envchanged)
fprintf(stderr, "dir: %s\n", debugfs_mountpoint);
if (envchanged)
*envchanged = 1;
+   } else if (!strcmp(cmd, "--list-cmds")) {
+   unsigned int i;
+
+   for (i = 0; i < ARRAY_SIZE(commands); i++) {
+   struct cmd_struct *p = commands+i;
+   printf("%s ", p->cmd);
+   }
+   exit(0);
} else {
fprintf(stderr, "Unknown option: %s\n", cmd);
usage(perf_usage_string);
@@ -245,12 +284,6 @@ const char perf_version_string[] = PERF_VERSION;
  */
 #define NEED_WORK_TREE (1<<2)
 
-struct cmd_struct {
-   const char *cmd;
-   int (*fn)(int, const char **, const char *);
-   int option;
-};
-
 static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 {
int status;
@@ -296,30 +329,6 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 static void handle_internal_command(int argc, const char **argv)
 {
const char *cmd = argv[0];
-   static struct cm

[PATCH 0/3] perf tools: Basic bash completion support v3

2012-08-09 Thread Frederic Weisbecker
Changes since v2:

- Fix /etc config installation from Namhyung.

Frederic Weisbecker (2):
  perf tools: Initial bash completion support
  perf tools: Support for events bash completion

Namhyung Kim (1):
  perf tools: Fix /etc config related installation

 tools/perf/Makefile|3 ++
 tools/perf/bash_completion |   26 +++
 tools/perf/builtin-list.c  |   14 ---
 tools/perf/perf.c  |   69 ++-
 tools/perf/util/parse-events.c |   70 +---
 tools/perf/util/parse-events.h |7 ++--
 6 files changed, 124 insertions(+), 65 deletions(-)
 create mode 100644 tools/perf/bash_completion

-- 
1.7.5.4



[PATCH 3/3] perf tools: Fix /etc config related installation

2012-08-09 Thread Frederic Weisbecker
From: Namhyung Kim 

Fix the missing /etc/bash_completion.d directory creation; otherwise the
installation fails miserably on systems that don't have bash completion
installed yet, or when installing into a specific target directory:

   $ make DESTDIR=/tmp/junk-perf O=/tmp/pbuild -C tools/perf/ install
   ...
   install -m 755 bash_completion /tmp/junk-perf/etc/bash_completion.d/perf
   install: cannot create regular file `/tmp/junk-perf/etc/bash_completion.d/perf': No such file or directory
   make: *** [install] Error 1
   make: Leaving directory `/opt/sw/ahern/perf.git/tools/perf'

Also use the sysconfdir variable instead of the hardcoded /etc, so that an
overridden config directory is honored.

Reported-by: David Ahern 
Cc: David Ahern 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Signed-off-by: Namhyung Kim 
Signed-off-by: Frederic Weisbecker 
---
 tools/perf/Makefile |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 84b4227..a9458b9 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -700,6 +700,7 @@ perfexecdir_SQ = $(subst ','\'',$(perfexecdir))
 template_dir_SQ = $(subst ','\'',$(template_dir))
 htmldir_SQ = $(subst ','\'',$(htmldir))
 prefix_SQ = $(subst ','\'',$(prefix))
+sysconfdir_SQ = $(subst ','\'',$(sysconfdir))
 
 SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
 
@@ -951,7 +952,8 @@ install: all
$(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
$(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python'
$(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin'
-   $(INSTALL) -m 755 bash_completion $(DESTDIR_SQ)/etc/bash_completion.d/perf
+   $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d'
+   $(INSTALL) bash_completion '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d/perf'
 
 install-python_ext:
$(PYTHON_WORD) util/setup.py --quiet install --root='/$(DESTDIR_SQ)'
-- 
1.7.5.4



[PATCH 2/3] perf tools: Support for events bash completion

2012-08-09 Thread Frederic Weisbecker
Add basic bash completion for the -e option in record, top
and stat subcommands. Only hardware, software and tracepoint
events are supported.

Breakpoints, raw events and events grouping completion
need more thinking.

Signed-off-by: Frederic Weisbecker 
Cc: David Ahern 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
---
 tools/perf/bash_completion |6 +++-
 tools/perf/builtin-list.c  |   14 ---
 tools/perf/util/parse-events.c |   70 +---
 tools/perf/util/parse-events.h |7 ++--
 4 files changed, 61 insertions(+), 36 deletions(-)

diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion
index 9a31fa5..1958fa5 100644
--- a/tools/perf/bash_completion
+++ b/tools/perf/bash_completion
@@ -6,7 +6,7 @@ _perf()
local cur cmd
 
COMPREPLY=()
-   _get_comp_words_by_ref cur
+   _get_comp_words_by_ref cur prev
 
cmd=${COMP_WORDS[0]}
 
@@ -14,6 +14,10 @@ _perf()
if [ $COMP_CWORD -eq 1 ]; then
cmds=$($cmd --list-cmds)
COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) )
+   # List possible events for -e option
+   elif [[ $prev == "-e" && "${COMP_WORDS[1]}" == @(record|stat|top) ]]; then
+   cmds=$($cmd list --raw-dump)
+   COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) )
# Fall down to list regular files
else
_filedir
diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
index 6313b6e..bdcff81 100644
--- a/tools/perf/builtin-list.c
+++ b/tools/perf/builtin-list.c
@@ -19,15 +19,15 @@ int cmd_list(int argc, const char **argv, const char *prefix __used)
setup_pager();
 
if (argc == 1)
-   print_events(NULL);
+   print_events(NULL, false);
else {
int i;
 
for (i = 1; i < argc; ++i) {
-   if (i > 1)
+   if (i > 2)
putchar('\n');
if (strncmp(argv[i], "tracepoint", 10) == 0)
-   print_tracepoint_events(NULL, NULL);
+   print_tracepoint_events(NULL, NULL, false);
else if (strcmp(argv[i], "hw") == 0 ||
 strcmp(argv[i], "hardware") == 0)
print_events_type(PERF_TYPE_HARDWARE);
@@ -36,13 +36,15 @@ int cmd_list(int argc, const char **argv, const char *prefix __used)
print_events_type(PERF_TYPE_SOFTWARE);
else if (strcmp(argv[i], "cache") == 0 ||
 strcmp(argv[i], "hwcache") == 0)
-   print_hwcache_events(NULL);
+   print_hwcache_events(NULL, false);
+   else if (strcmp(argv[i], "--raw-dump") == 0)
+   print_events(NULL, true);
else {
char *sep = strchr(argv[i], ':'), *s;
int sep_idx;
 
if (sep == NULL) {
-   print_events(argv[i]);
+   print_events(argv[i], false);
continue;
}
sep_idx = sep - argv[i];
@@ -51,7 +53,7 @@ int cmd_list(int argc, const char **argv, const char *prefix __used)
return -1;
 
s[sep_idx] = '\0';
-   print_tracepoint_events(s, s + sep_idx + 1);
+   print_tracepoint_events(s, s + sep_idx + 1, false);
free(s);
}
}
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 8bdfa3e..3ec4bfc 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -799,7 +799,8 @@ static const char * const event_type_descriptors[] = {
  * Print the events from /tracing/events
  */
 
-void print_tracepoint_events(const char *subsys_glob, const char *event_glob)
+void print_tracepoint_events(const char *subsys_glob, const char *event_glob,
+bool name_only)
 {
DIR *sys_dir, *evt_dir;
struct dirent *sys_next, *evt_next, sys_dirent, evt_dirent;
@@ -829,6 +830,11 @@ void print_tracepoint_events(const char *subsys_glob, const char *event_glob)
!strglobmatch(evt_dirent.d_name, event_glob))
continue;
 
+   if (name_only) {
+   p

Re: [PATCH 1/3] perf tools: Initial bash completion support

2012-08-09 Thread Frederic Weisbecker
On Thu, Aug 09, 2012 at 01:35:15PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Aug 09, 2012 at 04:31:51PM +0200, Frederic Weisbecker escreveu:
> > This implements bash completion for perf subcommands such
> > as record, report, script, probe, etc...
> 
> Humm, I get this when doing my usual workflow:
> 
> [acme@sandy linux]$ make -j8 -C tools/perf/ O=/home/acme/git/build/perf install
> make: Entering directory `/home/git/linux/tools/perf'
> PERF_VERSION = 3.6.rc1.152.g5758f7
> 
> install -d -m 755 '/home/acme/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> install -d -m 755 '/home/acme/libexec/perf-core/scripts/python/bin'
> install scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '/home/acme/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> install scripts/python/*.py -t '/home/acme/libexec/perf-core/scripts/python'
> install scripts/python/bin/* -t '/home/acme/libexec/perf-core/scripts/python/bin'
> install -m 755 bash_completion /etc/bash_completion.d/perf
> install: cannot create regular file `/etc/bash_completion.d/perf': Permission denied
> make: *** [install] Error 1
> make: Leaving directory `/home/git/linux/tools/perf'
> [acme@sandy linux]$ make -j8 -C tools/perf/ O=/home/acme/git/build/perf install
> 
>   Shouldn't it install on ~/etc/bash_completion.d/perf ?

Are you sure you have the third patch?

> 
>   Is there a way to have per user bash completion files like that?

It seems that some manual tweaking is needed :(

http://www.simplicidade.org/notes/archives/2008/02/bash_completion.html


> 
> - Arnaldo


Re: [PATCH 1/3] perf tools: Initial bash completion support

2012-08-09 Thread Frederic Weisbecker
On Thu, Aug 09, 2012 at 02:11:22PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Aug 09, 2012 at 07:00:10PM +0200, Frederic Weisbecker escreveu:
> > On Thu, Aug 09, 2012 at 01:35:15PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Thu, Aug 09, 2012 at 04:31:51PM +0200, Frederic Weisbecker escreveu:
> > > > This implements bash completion for perf subcommands such
> > > > as record, report, script, probe, etc...
> > > 
> > > Humm, I get this when doing my usual workflow:
> > > 
> > > [acme@sandy linux]$ make -j8 -C tools/perf/ O=/home/acme/git/build/perf install
> > > make: Entering directory `/home/git/linux/tools/perf'
> > > PERF_VERSION = 3.6.rc1.152.g5758f7
> > > 
> > > install -d -m 755 '/home/acme/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> > > install -d -m 755 '/home/acme/libexec/perf-core/scripts/python/bin'
> > > install scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '/home/acme/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> > > install scripts/python/*.py -t '/home/acme/libexec/perf-core/scripts/python'
> > > install scripts/python/bin/* -t '/home/acme/libexec/perf-core/scripts/python/bin'
> > > install -m 755 bash_completion /etc/bash_completion.d/perf
> > > install: cannot create regular file `/etc/bash_completion.d/perf': Permission denied
> > > make: *** [install] Error 1
> > > make: Leaving directory `/home/git/linux/tools/perf'
> > > [acme@sandy linux]$ make -j8 -C tools/perf/ O=/home/acme/git/build/perf install
> > > 
> > >   Shouldn't it install on ~/etc/bash_completion.d/perf ?
> > 
> > Are you sure you have the third patch?
> 
> So should I fold the third into the first?

That's up to you. I kept the third patch separate to give the credit
to Namhyung.

>  
> > > 
> > >   Is there a way to have per user bash completion files like that?
> > 
> > It seems that some manual tweaking is needed :(
> > 
> > http://www.simplicidade.org/notes/archives/2008/02/bash_completion.html
> 
> Will read.
> 
> - Arnaldo


Re: [PATCH 1/3] perf tools: Initial bash completion support

2012-08-09 Thread Frederic Weisbecker
On Thu, Aug 09, 2012 at 02:14:19PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Aug 09, 2012 at 10:40:19AM -0600, David Ahern escreveu:
> > On 8/9/12 10:35 AM, Arnaldo Carvalho de Melo wrote:
> > >Em Thu, Aug 09, 2012 at 04:31:51PM +0200, Frederic Weisbecker escreveu:
> > >>This implements bash completion for perf subcommands such
> > >>as record, report, script, probe, etc...
> > >
> > >Humm, I get this when doing my usual workflow:
> > >
> > >[acme@sandy linux]$ make -j8 -C tools/perf/ O=/home/acme/git/build/perf install
> > >make: Entering directory `/home/git/linux/tools/perf'
> > >PERF_VERSION = 3.6.rc1.152.g5758f7
> > >
> > >install -d -m 755 '/home/acme/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> > >install -d -m 755 '/home/acme/libexec/perf-core/scripts/python/bin'
> > >install scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '/home/acme/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> > >install scripts/python/*.py -t '/home/acme/libexec/perf-core/scripts/python'
> > >install scripts/python/bin/* -t '/home/acme/libexec/perf-core/scripts/python/bin'
> > >install -m 755 bash_completion /etc/bash_completion.d/perf
> > >install: cannot create regular file `/etc/bash_completion.d/perf': Permission denied
> > >make: *** [install] Error 1
> > >make: Leaving directory `/home/git/linux/tools/perf'
> > >[acme@sandy linux]$ make -j8 -C tools/perf/ O=/home/acme/git/build/perf install
> > >
> > >   Shouldn't it install on ~/etc/bash_completion.d/perf ?
> > >
> > >   Is there a way to have per user bash completion files like that?
> > 
> > 3rd patch should fix this.
> 
> Huh? The problem is not /etc/bash_completion.d/ not existing, it exists,
> its just that I'm not using sudo nor installing as root, this new bash
> completion file is the only one that is being installed on the root
> filesystem, all others are in ~acme/

No, the third patch handles sysconfdir, which should take care of that:

$ make -C tools/perf O=/home/fweisbec/build install
make: Entering directory `/home/fweisbec/linux-2.6-tip/tools/perf'
make[1]: Entering directory `/home/fweisbec/linux-2.6-tip/tools/lib/traceevent'
make[1]: Leaving directory `/home/fweisbec/linux-2.6-tip/tools/lib/traceevent'
LINK /home/fweisbec/build/perf
GEN perf-archive
install -d -m 755 '/home/fweisbec/bin'
install /home/fweisbec/build/perf '/home/fweisbec/bin'
install -d -m 755 '/home/fweisbec/libexec/perf-core/scripts/perl/Perf-Trace-Util/lib/Perf/Trace'
install -d -m 755 '/home/fweisbec/libexec/perf-core/scripts/perl/bin'
install /home/fweisbec/build/perf-archive -t '/home/fweisbec/libexec/perf-core'
install scripts/perl/Perf-Trace-Util/lib/Perf/Trace/* -t '/home/fweisbec/libexec/perf-core/scripts/perl/Perf-Trace-Util/lib/Perf/Trace'
install scripts/perl/*.pl -t '/home/fweisbec/libexec/perf-core/scripts/perl'
install scripts/perl/bin/* -t '/home/fweisbec/libexec/perf-core/scripts/perl/bin'
install -d -m 755 '/home/fweisbec/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
install -d -m 755 '/home/fweisbec/libexec/perf-core/scripts/python/bin'
install scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '/home/fweisbec/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
install scripts/python/*.py -t '/home/fweisbec/libexec/perf-core/scripts/python'
install scripts/python/bin/* -t '/home/fweisbec/libexec/perf-core/scripts/python/bin'
install -d -m 755 '/home/fweisbec/etc/bash_completion.d'
install bash_completion '/home/fweisbec/etc/bash_completion.d/perf'
make: Leaving directory `/home/fweisbec/linux-2.6-tip/tools/perf'
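
The log shows a $HOME prefix producing ~/etc/bash_completion.d, which follows the common convention of deriving sysconfdir from the install prefix. The sketch below models that rule in C. The function name completion_dir() and the special case for a /usr prefix are assumptions for illustration, not necessarily the exact logic the perf Makefile uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model of deriving the bash completion directory from the
 * install prefix: a /usr prefix maps to the system-wide /etc, while any
 * other prefix (such as $HOME) gets its own etc/ subdirectory, so an
 * unprivileged install never writes to the root filesystem.
 * Returns the length written, or -1 if 'buf' is too small.
 */
static int completion_dir(const char *prefix, char *buf, size_t len)
{
	int n;

	assert(prefix != NULL && buf != NULL);
	if (strcmp(prefix, "/usr") == 0)
		n = snprintf(buf, len, "/etc/bash_completion.d");
	else
		n = snprintf(buf, len, "%s/etc/bash_completion.d", prefix);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

With the prefix /home/fweisbec this yields /home/fweisbec/etc/bash_completion.d, which matches the last install lines of the log above.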
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] perf tools: Initial bash completion support

2012-08-10 Thread Frederic Weisbecker
On Thu, Aug 09, 2012 at 04:08:19PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Aug 09, 2012 at 07:27:06PM +0100, Alan Cox escreveu:
> > > > 3rd patch should fix this.
> > > 
> > > Huh? The problem is not /etc/bash_completion.d/ not existing, it exists,
> > > its just that I'm not using sudo nor installing as root, this new bash
> > > completion file is the only one that is being installed on the root
> > > filesystem, all others are in ~acme/
> > 
> > And even with permissions it might not have the right security labels on
> > a well secured box.
> > 
> > It's a neat little script (or once its been properly security audited
> > will be) but IMHO it belongs in the distro bash script packages.
> 
> Yeah, I think we can keep it in the kernel sources and then send new
> versions to the bash-completion-de...@lists.alioth.debian.org guys.
> 
> To test I just did:
> 
>   ln -s ~/etc/bash_completion.d/perf ~/.bash_completion
> 
> Frédéric, I merged your patches as-is and pushed them to my perf/core
> branch, thanks!

Thanks!


[PATCH 0/4] cputime: Virtual cputime accounting small cleanups and consolidation v2

2012-08-14 Thread Frederic Weisbecker
Hi,

No fundamental changes in this release, just a rebase to resolve conflicts
against the latest tip:/sched/core commits.

Thanks.

Frederic Weisbecker (4):
  cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
  sched: Move cputime code to its own file
  cputime: Consolidate vtime handling on context switch
  s390: Remove leftover account_tick_vtime() header

 arch/Kconfig   |3 +
 arch/ia64/Kconfig  |   12 +-
 arch/ia64/include/asm/switch_to.h  |8 -
 arch/ia64/kernel/time.c|4 +-
 arch/powerpc/include/asm/time.h|6 -
 arch/powerpc/kernel/process.c  |3 -
 arch/powerpc/kernel/time.c |6 +
 arch/powerpc/platforms/Kconfig.cputype |   16 +-
 arch/s390/Kconfig  |5 +-
 arch/s390/include/asm/switch_to.h  |4 -
 arch/s390/kernel/vtime.c   |4 +-
 include/linux/kernel_stat.h|6 +
 init/Kconfig   |   13 +
 kernel/sched/Makefile  |2 +-
 kernel/sched/core.c|  558 +---
 kernel/sched/cputime.c |  503 
 kernel/sched/sched.h   |   63 
 17 files changed, 606 insertions(+), 610 deletions(-)
 create mode 100644 kernel/sched/cputime.c

-- 
1.7.5.4



[PATCH 1/4] cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING

2012-08-14 Thread Frederic Weisbecker
s390, ia64 and powerpc each define their own version
of CONFIG_VIRT_CPU_ACCOUNTING. Generalize the config
option and its description in a single place to avoid
duplication.

Signed-off-by: Frederic Weisbecker 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
---
 arch/Kconfig   |3 +++
 arch/ia64/Kconfig  |   12 +---
 arch/powerpc/platforms/Kconfig.cputype |   16 +---
 arch/s390/Kconfig  |5 ++---
 init/Kconfig   |   13 +
 5 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 72f2fa1..f78de57 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -281,4 +281,7 @@ config SECCOMP_FILTER
 
  See Documentation/prctl/seccomp_filter.txt for details.
 
+config HAVE_VIRT_CPU_ACCOUNTING
+   bool
+
 source "kernel/gcov/Kconfig"
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 310cf57..3c720ef 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -25,6 +25,7 @@ config IA64
select HAVE_GENERIC_HARDIRQS
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP
+   select HAVE_VIRT_CPU_ACCOUNTING
select ARCH_DISCARD_MEMBLOCK
select GENERIC_IRQ_PROBE
select GENERIC_PENDING_IRQ if SMP
@@ -340,17 +341,6 @@ config FORCE_MAX_ZONEORDER
default "17" if HUGETLB_PAGE
default "11"
 
-config VIRT_CPU_ACCOUNTING
-   bool "Deterministic task and CPU time accounting"
-   default n
-   help
- Select this option to enable more accurate task and CPU time
- accounting.  This is done by reading a CPU counter on each
- kernel entry and exit and on transitions within the kernel
- between system, softirq and hardirq state, so there is a
- small performance impact.
- If in doubt, say N here.
-
 config SMP
bool "Symmetric multi-processing support"
select USE_GENERIC_SMP_HELPERS
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 30fd01d..72afd28 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -1,6 +1,7 @@
 config PPC64
bool "64-bit kernel"
default n
+   select HAVE_VIRT_CPU_ACCOUNTING
help
  This option selects whether a 32-bit or a 64-bit kernel
  will be built.
@@ -337,21 +338,6 @@ config PPC_MM_SLICES
default y if (!PPC_FSL_BOOK3E && PPC64 && HUGETLB_PAGE) || (PPC_STD_MMU_64 && PPC_64K_PAGES)
default n
 
-config VIRT_CPU_ACCOUNTING
-   bool "Deterministic task and CPU time accounting"
-   depends on PPC64
-   default y
-   help
- Select this option to enable more accurate task and CPU time
- accounting.  This is done by reading a CPU counter on each
- kernel entry and exit and on transitions within the kernel
- between system, softirq and hardirq state, so there is a
- small performance impact.  This also enables accounting of
- stolen time on logically-partitioned systems running on
- IBM POWER5-based machines.
-
- If in doubt, say Y here.
-
 config PPC_HAVE_PMU_SUPPORT
bool
 
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 76de6b6..49ebfb6 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -49,9 +49,6 @@ config GENERIC_LOCKBREAK
 config PGSTE
def_bool y if KVM
 
-config VIRT_CPU_ACCOUNTING
-   def_bool y
-
 config ARCH_SUPPORTS_DEBUG_PAGEALLOC
def_bool y
 
@@ -89,6 +86,8 @@ config S390
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP
select HAVE_CMPXCHG_LOCAL
+   select HAVE_VIRT_CPU_ACCOUNTING
+   select VIRT_CPU_ACCOUNTING
select ARCH_DISCARD_MEMBLOCK
select BUILDTIME_EXTABLE_SORT
select ARCH_INLINE_SPIN_TRYLOCK
diff --git a/init/Kconfig b/init/Kconfig
index af6c7f8..894b073 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -267,6 +267,19 @@ config POSIX_MQUEUE_SYSCTL
depends on SYSCTL
default y
 
+config VIRT_CPU_ACCOUNTING
+   bool "Deterministic task and CPU time accounting"
+   depends on HAVE_VIRT_CPU_ACCOUNTING
+   default y if PPC64
+   help
+ Select this option to enable more accurate task and CPU time
+ accounting.  This is done by reading a CPU counter on each
+ kernel entry and exit and on transitions within the kernel
+ between system, softirq and hardirq state, so there is a
+ small performance impact.  This also enables accounting of
+ stolen time on logically-partitioned systems running on
+ IBM POWER5-based machines.
+
 config BSD_PROCESS_ACCT
   

[PATCH 3/4] cputime: Consolidate vtime handling on context switch

2012-08-14 Thread Frederic Weisbecker
The archs that implement virtual cputime accounting all
flush the cputime of a task when it gets descheduled
and sometimes initialize some state so that the next
task's cputime can be accounted.

These archs all put their own hooks in their context
switch callbacks and handle the config-disabled case themselves.

Consolidate this by creating a new account_switch_vtime()
callback, called from generic code right after a context
switch, which these archs must implement to flush the
previous task's cputime and initialize the next task's
cputime-related state.

Signed-off-by: Frederic Weisbecker 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
---
 arch/ia64/include/asm/switch_to.h |8 
 arch/ia64/kernel/time.c   |4 ++--
 arch/powerpc/include/asm/time.h   |6 --
 arch/powerpc/kernel/process.c |3 ---
 arch/powerpc/kernel/time.c|6 ++
 arch/s390/include/asm/switch_to.h |2 --
 arch/s390/kernel/vtime.c  |4 ++--
 include/linux/kernel_stat.h   |6 ++
 kernel/sched/core.c   |1 +
 9 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/arch/ia64/include/asm/switch_to.h b/arch/ia64/include/asm/switch_to.h
index cb2412f..d38c7ea 100644
--- a/arch/ia64/include/asm/switch_to.h
+++ b/arch/ia64/include/asm/switch_to.h
@@ -30,13 +30,6 @@ extern struct task_struct *ia64_switch_to (void *next_task);
 extern void ia64_save_extra (struct task_struct *task);
 extern void ia64_load_extra (struct task_struct *task);
 
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING
-extern void ia64_account_on_switch (struct task_struct *prev, struct task_struct *next);
-# define IA64_ACCOUNT_ON_SWITCH(p,n) ia64_account_on_switch(p,n)
-#else
-# define IA64_ACCOUNT_ON_SWITCH(p,n)
-#endif
-
 #ifdef CONFIG_PERFMON
   DECLARE_PER_CPU(unsigned long, pfm_syst_info);
 # define PERFMON_IS_SYSWIDE() (__get_cpu_var(pfm_syst_info) & 0x1)
@@ -49,7 +42,6 @@ extern void ia64_account_on_switch (struct task_struct *prev, struct task_struct
 || PERFMON_IS_SYSWIDE())
 
 #define __switch_to(prev,next,last) do {   \
-   IA64_ACCOUNT_ON_SWITCH(prev, next); \
if (IA64_HAS_EXTRA_STATE(prev)) \
ia64_save_extra(prev);  \
if (IA64_HAS_EXTRA_STATE(next)) \
diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index ecc904b..6247197 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -88,10 +88,10 @@ extern cputime_t cycle_to_cputime(u64 cyc);
  * accumulated times to the current process, and to prepare accounting on
  * the next process.
  */
-void ia64_account_on_switch(struct task_struct *prev, struct task_struct *next)
+void account_switch_vtime(struct task_struct *prev)
 {
struct thread_info *pi = task_thread_info(prev);
-   struct thread_info *ni = task_thread_info(next);
+   struct thread_info *ni = task_thread_info(current);
cputime_t delta_stime, delta_utime;
__u64 now;
 
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 3b4b4a8..c1f2676 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -197,12 +197,6 @@ struct cpu_usage {
 
 DECLARE_PER_CPU(struct cpu_usage, cpu_usage_array);
 
-#if defined(CONFIG_VIRT_CPU_ACCOUNTING)
-#define account_process_vtime(tsk) account_process_tick(tsk, 0)
-#else
-#define account_process_vtime(tsk) do { } while (0)
-#endif
-
 extern void secondary_cpu_time_init(void);
 
 DECLARE_PER_CPU(u64, decrementers_next_tb);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 710f400..d73fa99 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -514,9 +514,6 @@ struct task_struct *__switch_to(struct task_struct *prev,
 
local_irq_save(flags);
 
-   account_system_vtime(current);
-   account_process_vtime(current);
-
/*
 * We can't take a PMU exception inside _switch() since there is a
 * window where the kernel stack SLB and the kernel stack are out
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index be171ee..49da7f0 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -366,6 +366,12 @@ void account_process_tick(struct task_struct *tsk, int user_tick)
account_user_time(tsk, utime, utimescaled);
 }
 
+void account_switch_vtime(struct task_struct *prev)
+{
+   account_system_vtime(prev);
+   account_process_tick(prev, 0);
+}
+
 #else /* ! CONFIG_VIRT_CPU_ACCOUNTING */
 #define calc_cputime_factors()
 #endif
d

[PATCH 2/4] sched: Move cputime code to its own file

2012-08-14 Thread Frederic Weisbecker
Extract the cputime code from the giant sched/core.c and
put it in its own file. This makes it easier to deal with
this particular area and de-bloats core.c a bit more.

Signed-off-by: Frederic Weisbecker 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
---
 kernel/sched/Makefile  |2 +-
 kernel/sched/core.c|  557 +---
 kernel/sched/cputime.c |  503 +++
 kernel/sched/sched.h   |   63 ++
 4 files changed, 569 insertions(+), 556 deletions(-)
 create mode 100644 kernel/sched/cputime.c

diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 173ea52..f06d249 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -11,7 +11,7 @@ ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
 endif
 
-obj-y += core.o clock.o idle_task.o fair.o rt.o stop_task.o
+obj-y += core.o clock.o cputime.o idle_task.o fair.o rt.o stop_task.o
 obj-$(CONFIG_SMP) += cpupri.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4376c9f..ae3bcaa 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -740,126 +740,6 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
dequeue_task(rq, p, flags);
 }
 
-#ifdef CONFIG_IRQ_TIME_ACCOUNTING
-
-/*
- * There are no locks covering percpu hardirq/softirq time.
- * They are only modified in account_system_vtime, on corresponding CPU
- * with interrupts disabled. So, writes are safe.
- * They are read and saved off onto struct rq in update_rq_clock().
- * This may result in other CPU reading this CPU's irq time and can
- * race with irq/account_system_vtime on this CPU. We would either get old
- * or new value with a side effect of accounting a slice of irq time to wrong
- * task when irq is in progress while we read rq->clock. That is a worthy
- * compromise in place of having locks on each irq in account_system_time.
- */
-static DEFINE_PER_CPU(u64, cpu_hardirq_time);
-static DEFINE_PER_CPU(u64, cpu_softirq_time);
-
-static DEFINE_PER_CPU(u64, irq_start_time);
-static int sched_clock_irqtime;
-
-void enable_sched_clock_irqtime(void)
-{
-   sched_clock_irqtime = 1;
-}
-
-void disable_sched_clock_irqtime(void)
-{
-   sched_clock_irqtime = 0;
-}
-
-#ifndef CONFIG_64BIT
-static DEFINE_PER_CPU(seqcount_t, irq_time_seq);
-
-static inline void irq_time_write_begin(void)
-{
-   __this_cpu_inc(irq_time_seq.sequence);
-   smp_wmb();
-}
-
-static inline void irq_time_write_end(void)
-{
-   smp_wmb();
-   __this_cpu_inc(irq_time_seq.sequence);
-}
-
-static inline u64 irq_time_read(int cpu)
-{
-   u64 irq_time;
-   unsigned seq;
-
-   do {
-   seq = read_seqcount_begin(&per_cpu(irq_time_seq, cpu));
-   irq_time = per_cpu(cpu_softirq_time, cpu) +
-  per_cpu(cpu_hardirq_time, cpu);
-   } while (read_seqcount_retry(&per_cpu(irq_time_seq, cpu), seq));
-
-   return irq_time;
-}
-#else /* CONFIG_64BIT */
-static inline void irq_time_write_begin(void)
-{
-}
-
-static inline void irq_time_write_end(void)
-{
-}
-
-static inline u64 irq_time_read(int cpu)
-{
-   return per_cpu(cpu_softirq_time, cpu) + per_cpu(cpu_hardirq_time, cpu);
-}
-#endif /* CONFIG_64BIT */
-
-/*
- * Called before incrementing preempt_count on {soft,}irq_enter
- * and before decrementing preempt_count on {soft,}irq_exit.
- */
-void account_system_vtime(struct task_struct *curr)
-{
-   unsigned long flags;
-   s64 delta;
-   int cpu;
-
-   if (!sched_clock_irqtime)
-   return;
-
-   local_irq_save(flags);
-
-   cpu = smp_processor_id();
-   delta = sched_clock_cpu(cpu) - __this_cpu_read(irq_start_time);
-   __this_cpu_add(irq_start_time, delta);
-
-   irq_time_write_begin();
-   /*
-* We do not account for softirq time from ksoftirqd here.
-* We want to continue accounting softirq time to ksoftirqd thread
-* in that case, so as not to confuse scheduler with a special task
-* that do not consume any time, but still wants to run.
-*/
-   if (hardirq_count())
-   __this_cpu_add(cpu_hardirq_time, delta);
-   else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
-   __this_cpu_add(cpu_softirq_time, delta);
-
-   irq_time_write_end();
-   local_irq_restore(flags);
-}
-EXPORT_SYMBOL_GPL(account_system_vtime);
-
-#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
-
-#ifdef CONFIG_PARAVIRT
-static inline u64 steal_ticks(u64 steal)
-{
-   if (unlikely(steal > NSEC_PER_SEC))
-   return div_u64(steal, TICK_NSEC);
-
-   return __iter_div_u64_rem(steal, TICK_NSEC, &st

[PATCH 4/4] s390: Remove leftover account_tick_vtime() header

2012-08-14 Thread Frederic Weisbecker
The function doesn't seem to exist anymore.

Signed-off-by: Frederic Weisbecker 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
---
 arch/s390/include/asm/switch_to.h |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/switch_to.h b/arch/s390/include/asm/switch_to.h
index e7f9b3d..314cc94 100644
--- a/arch/s390/include/asm/switch_to.h
+++ b/arch/s390/include/asm/switch_to.h
@@ -89,8 +89,6 @@ static inline void restore_access_regs(unsigned int *acrs)
prev = __switch_to(prev,next);  \
 } while (0)
 
-extern void account_tick_vtime(struct task_struct *);
-
 #define finish_arch_switch(prev) do {   \
set_fs(current->thread.mm_segment);  \
 } while (0)
-- 
1.7.5.4



Re: [PATCH 0/4] cputime: Virtual cputime accounting small cleanups and consolidation v2

2012-08-14 Thread Frederic Weisbecker
On Tue, Aug 14, 2012 at 04:16:46PM +0200, Frederic Weisbecker wrote:
> Hi,
> 
> No fundamental change in this release but a rebase to solve conflicts
> against latest tip:/sched/core commits.
> 
> Thanks.

This can be pulled from:

git://github.com/fweisbec/linux-dynticks.git
virt-cputime-v2

This patchset, besides being a desired consolidation and
cleanup IMO, is necessary for the adaptive nohz feature
(see: http://comments.gmane.org/gmane.linux.kernel/1337690)

Thanks.

> 
> Frederic Weisbecker (4):
>   cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
>   sched: Move cputime code to its own file
>   cputime: Consolidate vtime handling on context switch
>   s390: Remove leftover account_tick_vtime() header
> 
>  arch/Kconfig   |3 +
>  arch/ia64/Kconfig  |   12 +-
>  arch/ia64/include/asm/switch_to.h  |8 -
>  arch/ia64/kernel/time.c|4 +-
>  arch/powerpc/include/asm/time.h|6 -
>  arch/powerpc/kernel/process.c  |3 -
>  arch/powerpc/kernel/time.c |6 +
>  arch/powerpc/platforms/Kconfig.cputype |   16 +-
>  arch/s390/Kconfig  |5 +-
>  arch/s390/include/asm/switch_to.h  |4 -
>  arch/s390/kernel/vtime.c   |4 +-
>  include/linux/kernel_stat.h|6 +
>  init/Kconfig   |   13 +
>  kernel/sched/Makefile  |2 +-
>  kernel/sched/core.c|  558 +---
>  kernel/sched/cputime.c |  503 
>  kernel/sched/sched.h   |   63 
>  17 files changed, 606 insertions(+), 610 deletions(-)
>  create mode 100644 kernel/sched/cputime.c
> 
> -- 
> 1.7.5.4
> 


Re: [PATCH v3] Hardware breakpoints: Invoke __perf_event_disable() if interrupts are already disabled

2012-08-15 Thread Frederic Weisbecker
On Wed, Aug 15, 2012 at 11:07:01PM +0530, Naveen N. Rao wrote:
> Hi Frederick,
> Did you get a chance to take a look at this?
> 
> Regards,
> Naveen

Yeah, I'm ok with the patch. Peter, are you ok with it?

Thanks.


Re: [PATCH 1/4] cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING

2012-08-15 Thread Frederic Weisbecker
On Wed, Aug 15, 2012 at 05:03:47PM +0200, Martin Schwidefsky wrote:
> On Tue, 14 Aug 2012 16:16:47 +0200
> Frederic Weisbecker  wrote:
> 
> > S390, ia64 and powerpc all define their own version
> > of CONFIG_VIRT_CPU_ACCOUNTING. Generalize the config
> > and its description to a single place to avoid
> > duplication.
> 
> For S390 CONFIG_VIRT_CPU_ACCOUNTING is not configurable, it is always
> enabled. With this patch we'd get a config option in the menu, no?

Indeed, it now appears in the menu, but in the case of s390 it's impossible
to turn it off because of:

config S390
select VIRT_CPU_ACCOUNTING

This creates a strict dependency that the user can't override. The option
cannot be toggled.


Re: [PATCH 3/4] cputime: Consolidate vtime handling on context switch

2012-08-15 Thread Frederic Weisbecker
On Wed, Aug 15, 2012 at 05:22:19PM +0200, Martin Schwidefsky wrote:
> On Tue, 14 Aug 2012 16:16:49 +0200
> Frederic Weisbecker  wrote:
> 
> > The archs that implement virtual cputime accounting all
> > flush the cputime of a task when it gets descheduled
> > and sometimes set up some ground initialization for the
> > next task to account its cputime.
> > 
> > These archs all put their own hooks in their context
> > switch callbacks and handle the off-case themselves.
> > 
> > Consolidate this by creating a new account_switch_vtime()
> > callback called in generic code right after a context switch
> > and that these archs must implement to flush the prev task
> > cputime and initialize the next task cputime related state.
> 
> That change requires that the accounting for the previous process
> can be done before finish_arch_switch() completed. With the old
> code the architecture could to the accounting call in the middle
> of finish_arch_switch, that is not possible anymore. Dunno if this
> is relevant or not. For s390 the new code should work fine.

I'm not sure how this could cause a problem. Interrupts are disabled
from switch_to() until finish_lock_switch(), so nothing
should be able to mess with the accounting of the prev task.

I don't really understand what you mean, actually.

Thanks.


Re: [PATCH 3/4] cputime: Consolidate vtime handling on context switch

2012-08-16 Thread Frederic Weisbecker
On Thu, Aug 16, 2012 at 09:50:32AM +0200, Martin Schwidefsky wrote:
> On Wed, 15 Aug 2012 21:28:17 +0200
> Frederic Weisbecker  wrote:
> 
> > On Wed, Aug 15, 2012 at 05:22:19PM +0200, Martin Schwidefsky wrote:
> > > On Tue, 14 Aug 2012 16:16:49 +0200
> > > Frederic Weisbecker  wrote:
> > > 
> > > > The archs that implement virtual cputime accounting all
> > > > flush the cputime of a task when it gets descheduled
> > > > and sometimes set up some ground initialization for the
> > > > next task to account its cputime.
> > > > 
> > > > These archs all put their own hooks in their context
> > > > switch callbacks and handle the off-case themselves.
> > > > 
> > > > Consolidate this by creating a new account_switch_vtime()
> > > > callback called in generic code right after a context switch
> > > > and that these archs must implement to flush the prev task
> > > > cputime and initialize the next task cputime related state.
> > > 
> > > That change requires that the accounting for the previous process
> > > can be done before finish_arch_switch() completed. With the old
> > > code the architecture could to the accounting call in the middle
> > > of finish_arch_switch, that is not possible anymore. Dunno if this
> > > is relevant or not. For s390 the new code should work fine.
> > 
> > I'm not sure how this could potentially cause a problem. Interrupts are 
> > disabled
> > between while we switch_to() until finish_lock_switch(). So nothing
> > should be able to mess up with the accounting of the prev task.
> > 
> > I don't really understand what you mean actually.
> 
> It is more a theoretical consideration. If the finish_arch_switch code
> updates fields that are required to do the cputime accounting then the
> order could be important. But then you could move that necessary code
> from finish_arch_switch to account_switch_vtime.
> As said that change is fine for s390, so I'm good with it.

Ah ok. Well, like you said, this is fine for s390. And it also looks fine
to me on ia64 and powerpc, as it doesn't look like we depend on anything
done in finish_arch_switch() there. They were flushing the previous task's
cputime from switch_to() anyway.

Thanks.

PS: can I add your ack?

> 
> -- 
> blue skies,
>Martin.
> 
> "Reality continues to ruin my life." - Calvin.
> 


Re: [PATCH 1/4] cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING

2012-08-16 Thread Frederic Weisbecker
On Thu, Aug 16, 2012 at 07:38:17PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2012-08-16 at 09:53 +0200, Martin Schwidefsky wrote:
> > Hmm, ok. But then the description should be reworded not to be specific to
> > the power architecture (the part of the message about "This also enables
> > accounting of stolen time on logically-partitioned systems running on IBM
> > POWER5-based machines."). 
> 
> Which is not very helpful to somebody running on a POWER6 or 7 (which
> also support that option just fine :-)
> 
> So yes, the description should definitely be improved.

All right. How about something like the below?

diff --git a/init/Kconfig b/init/Kconfig
index 894b073..5f5f8c2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -276,9 +276,9 @@ config VIRT_CPU_ACCOUNTING
  accounting.  This is done by reading a CPU counter on each
  kernel entry and exit and on transitions within the kernel
  between system, softirq and hardirq state, so there is a
- small performance impact.  This also enables accounting of
- stolen time on logically-partitioned systems running on
- IBM POWER5-based machines.
+ small performance impact.  In the case of IBM POWER > 5, this
+ also enables accounting of stolen time on logically-partitioned
+ systems.
 
 config BSD_PROCESS_ACCT
bool "BSD Process Accounting"


Re: [PATCH 1/4] cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING

2012-08-16 Thread Frederic Weisbecker
On Thu, Aug 16, 2012 at 04:00:44PM +0200, Martin Schwidefsky wrote:
> On Thu, 16 Aug 2012 14:55:59 +0200
> Frederic Weisbecker  wrote:
> 
> > On Thu, Aug 16, 2012 at 07:38:17PM +1000, Benjamin Herrenschmidt wrote:
> > > On Thu, 2012-08-16 at 09:53 +0200, Martin Schwidefsky wrote:
> > > > Hmm, ok. But then the description should be reworded not to be specific 
> > > > to
> > > > the power architecture (the part of the message about "This also enables
> > > > accounting of stolen time on logically-partitioned systems running on 
> > > > IBM
> > > > POWER5-based machines."). 
> > > 
> > > Which is not very helpful to somebody running on a POWER6 or 7 (which
> > > also support that option just fine :-)
> > > 
> > > So yes, the description should definitely be improved.
> > 
> > All right. How about something like the below?
> > 
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 894b073..5f5f8c2 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -276,9 +276,9 @@ config VIRT_CPU_ACCOUNTING
> >   accounting.  This is done by reading a CPU counter on each
> >   kernel entry and exit and on transitions within the kernel
> >   between system, softirq and hardirq state, so there is a
> > - small performance impact.  This also enables accounting of
> > - stolen time on logically-partitioned systems running on
> > - IBM POWER5-based machines.
> > + small performance impact.  In the case of IBM POWER > 5, this
> > + also enables accounting of stolen time on logically-partitioned
> > + systems.
> > 
> >  config BSD_PROCESS_ACCT
> > bool "BSD Process Accounting"
> > 
> 
> VIRT_CPU_ACCOUNTING will enable steal time for s390 as well.

Ah right. Fixed below:

diff --git a/init/Kconfig b/init/Kconfig
index 894b073..c40d0fb 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -276,9 +276,9 @@ config VIRT_CPU_ACCOUNTING
  accounting.  This is done by reading a CPU counter on each
  kernel entry and exit and on transitions within the kernel
  between system, softirq and hardirq state, so there is a
- small performance impact.  This also enables accounting of
- stolen time on logically-partitioned systems running on
- IBM POWER5-based machines.
+ small performance impact.  In the case of s390 or IBM POWER > 5,
+ this also enables accounting of stolen time on logically-partitioned
+ systems.
 
 config BSD_PROCESS_ACCT
bool "BSD Process Accounting"


Re: [PATCH 03/17] perf, x86: Add copy_from_user_nmi_nochk for best effort copy

2012-07-25 Thread Frederic Weisbecker
On Sun, Jul 22, 2012 at 02:14:26PM +0200, Jiri Olsa wrote:
> Adding copy_from_user_nmi_nochk that provides the best effort
> copy regardless the requesting size crossing the task boundary.
> 
> This is going to be useful for stack dump we need in post
> DWARF CFI based unwind, where we have predefined size of
> the user stack to dump, and we need to store the most of
> the requested dump size, regardless this size is crossing
> the task boundary.

What does that imply when we cross this limit? Are we still in the
task stack?


Re: [PATCH 01/17] perf: Unified API to record selective sets of arch registers

2012-07-25 Thread Frederic Weisbecker
On Sun, Jul 22, 2012 at 02:14:24PM +0200, Jiri Olsa wrote:
> This brings a new API to help the selective dump of registers on
> event sampling, and its implementation for x86 arch.
> 
> Added HAVE_PERF_REGS config option to determine if the architecture
> provides perf registers ABI.
> 
> The information about desired registers will be passed in u64 mask.
> It's up to the architecture to map the registers into the mask bits.
> 
> For the x86 arch implementation, both 32 and 64 bit registers
> bits are defined within single enum to ensure 64 bit system can
> provide register dump for compat task if needed in the future.
> 
> Signed-off-by: Jiri Olsa 
> Original-patch-by: Frederic Weisbecker 

Acked-by: Frederic Weisbecker 
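The quoted description leaves the mapping of registers to mask bits up to each architecture. A minimal sketch of how such a u64 mask can be turned into a compact dump, walking set bits from lowest to highest. It assumes a hypothetical four-register numbering (demo_reg), not the real x86 layout from asm/perf_regs.h, and the helper name demo_dump_regs is illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register numbering; the real x86 enum lives in asm/perf_regs.h. */
enum demo_reg { DEMO_REG_IP, DEMO_REG_SP, DEMO_REG_BP, DEMO_REG_AX, DEMO_REG_MAX };

/*
 * Copy the registers selected by 'mask' from 'regs' (indexed by enum demo_reg)
 * into 'out', lowest bit first. Bits at or above DEMO_REG_MAX are ignored.
 * Returns the number of values written.
 */
static size_t demo_dump_regs(uint64_t mask, const uint64_t regs[DEMO_REG_MAX],
                             uint64_t *out)
{
    size_t n = 0;

    assert(regs != NULL && out != NULL);
    for (int bit = 0; bit < DEMO_REG_MAX; bit++) {
        if (mask & (UINT64_C(1) << bit))
            out[n++] = regs[bit];
    }
    return n;
}
```

Because the dump order follows bit order, a consumer that knows the mask can recover which value belongs to which register without any per-value tag.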


Re: [PATCH 02/17] perf: Add ability to attach user level registers dump to sample

2012-07-25 Thread Frederic Weisbecker
On Sun, Jul 22, 2012 at 02:14:25PM +0200, Jiri Olsa wrote:
> Introducing PERF_SAMPLE_REGS_USER sample type bit to trigger
> the dump of user level registers on sample. Registers we want
> to dump are specified by sample_regs_user bitmask.
> 
> Only user level registers are dumped at the moment. Meaning the
> register values of the user space context as it was before the
> user entered the kernel for whatever reason (syscall, irq,
> exception, or a PMI happening in userspace).
> 
> The layout of the sample_regs_user bitmap is described in
> asm/perf_regs.h for archs that support register dump.
> 
> This is going to be useful to bring Dwarf CFI based stack
> unwinding on top of samples.
> 
> Signed-off-by: Jiri Olsa 
> Original-patch-by: Frederic Weisbecker 

Acked-by: Frederic Weisbecker 
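As a sketch of how a tool might request user registers (and, from patch 06/17, a user stack dump) on each sample, here is a perf_event_attr filled in with the field names as they were eventually merged into mainline <linux/perf_event.h> (Linux 3.7). The v7 series under review here may differ in detail; the helper name demo_setup_unwind_attr, the event and the sample period are illustrative assumptions.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Fill 'attr' to sample user-space CPU cycles and attach the registers in
 * 'regs_mask' (arch layout from asm/perf_regs.h) plus 'stack_bytes' of user
 * stack to every sample.
 */
static void demo_setup_unwind_attr(struct perf_event_attr *attr,
                                   __u64 regs_mask, __u32 stack_bytes)
{
    assert(attr != NULL);
    /* As far as I know, mainline rejects stack sizes that are not 8-byte
     * aligned or that do not fit in 16 bits. */
    assert(stack_bytes % sizeof(__u64) == 0 && stack_bytes < 65535);

    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_CPU_CYCLES;
    attr->sample_period = 4000;
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
                        PERF_SAMPLE_REGS_USER | PERF_SAMPLE_STACK_USER;
    attr->sample_regs_user = regs_mask;   /* which user registers to dump */
    attr->sample_stack_user = stack_bytes; /* bytes copied from the user SP */
    attr->exclude_kernel = 1;
}
```

The filled attr would then be handed to the perf_event_open system call through syscall(SYS_perf_event_open, ...), since glibc provides no wrapper for it.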


Re: [PATCH 06/17] perf: Add ability to attach user stack dump to sample

2012-07-25 Thread Frederic Weisbecker
On Sun, Jul 22, 2012 at 02:14:29PM +0200, Jiri Olsa wrote:
> Introducing PERF_SAMPLE_STACK_USER sample type bit to trigger
> the dump of the user level stack on sample. The size of the
> dump is specified by sample_stack_user value.
> 
> Being able to dump parts of the user stack, starting from the
> stack pointer, will be useful to make a post mortem dwarf CFI
> based stack unwinding.
> 
> Signed-off-by: Jiri Olsa 
> Signed-off-by: Frederic Weisbecker 

If you keep the author's SOB, then you need to preserve the
authorship as well (git am --author= / git commit --amend --author=).
Alternatively, if you changed the patch significantly enough, you can
simply credit the original with something like "Original-patch-by" and
become the author yourself. This is left to personal judgment; I won't
mind either way.

But there is no middle ground :)
You also need to keep the SOB chain in order. The above SOB chain
suggests I'm carrying a patch from you.

Just saying so that you make the maintainers' job easier ;)
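Returning to the PERF_SAMPLE_STACK_USER description quoted above: a minimal parser sketch for the stack part of a sample. It assumes the record layout as later documented in mainline <linux/perf_event.h>, that is { u64 size; char data[size]; u64 dyn_size; } with dyn_size omitted when size is 0; the v7 patch may encode it differently. The names demo_user_stack and demo_parse_user_stack are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_user_stack {
    uint64_t size;              /* bytes reserved for the dump */
    const unsigned char *data;  /* points into the sample buffer */
    uint64_t dyn_size;          /* bytes actually copied (<= size) */
};

/* Parse the stack part of a sample. Returns bytes consumed, or 0 if the
 * buffer is truncated or inconsistent. Values are in host byte order. */
static size_t demo_parse_user_stack(const unsigned char *buf, size_t len,
                                    struct demo_user_stack *out)
{
    size_t off = 0;

    assert(buf != NULL && out != NULL);
    if (len < sizeof(uint64_t))
        return 0;
    memcpy(&out->size, buf, sizeof(uint64_t));
    off += sizeof(uint64_t);
    out->data = NULL;
    out->dyn_size = 0;
    if (out->size == 0)
        return off;             /* no data and no dyn_size field */
    if (out->size > len - off || len - off - out->size < sizeof(uint64_t))
        return 0;
    out->data = buf + off;
    off += (size_t)out->size;
    memcpy(&out->dyn_size, buf + off, sizeof(uint64_t));
    off += sizeof(uint64_t);
    if (out->dyn_size > out->size)
        return 0;
    return off;
}
```

An unwinder would only trust the first dyn_size bytes of data; the rest of the reserved area carries no meaningful stack content.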


Re: [PATCH 14/17] perf, tool: Support for dwarf cfi unwinding on post processing

2012-07-25 Thread Frederic Weisbecker
On Sun, Jul 22, 2012 at 02:14:37PM +0200, Jiri Olsa wrote:
> This brings the support for dwarf cfi unwinding on perf post
> processing. Call frame informations are retrieved and then passed
> to libunwind that requests memory and register content from the
> applications.
> 
> Adding unwind object to handle the user stack backtrace based
> on the user register values and user stack dump.
> 
> The unwind object access the libunwind via remote interface
> and provides to it all the necessary data to unwind the stack.
> 
> The unwind interface provides following function:
>   unwind__get_entries
> 
> And callback (specified in above function) to retrieve
> the backtrace entries:
>   typedef int (*unwind_entry_cb_t)(struct unwind_entry *entry,
>                                    void *arg);
> 
> Signed-off-by: Jiri Olsa 
> Signed-off-by: Frederic Weisbecker 
> ---
>  tools/perf/Makefile|2 +
>  tools/perf/arch/x86/Makefile   |3 +
>  tools/perf/arch/x86/util/unwind.c  |  111 
>  tools/perf/builtin-report.c|   24 +-
>  tools/perf/builtin-script.c|   16 +-
>  tools/perf/builtin-top.c   |5 +-
>  tools/perf/util/include/linux/compiler.h   |1 +
>  tools/perf/util/map.h  |7 +-
>  .../perf/util/scripting-engines/trace-event-perl.c |3 +-
>  .../util/scripting-engines/trace-event-python.c|3 +-
>  tools/perf/util/session.c  |   61 ++-
>  tools/perf/util/session.h  |3 +-
>  tools/perf/util/trace-event-scripting.c|3 +-
>  tools/perf/util/trace-event.h  |5 +-
>  tools/perf/util/unwind.c   |  567 
>  tools/perf/util/unwind.h   |   34 ++
>  16 files changed, 811 insertions(+), 37 deletions(-)
>  create mode 100644 tools/perf/arch/x86/util/unwind.c
>  create mode 100644 tools/perf/util/unwind.c
>  create mode 100644 tools/perf/util/unwind.h
> 
> diff --git a/tools/perf/Makefile b/tools/perf/Makefile
> index d0c3291..c18c790 100644
> --- a/tools/perf/Makefile
> +++ b/tools/perf/Makefile
> @@ -328,6 +328,7 @@ LIB_H += util/cgroup.h
>  LIB_H += $(TRACE_EVENT_DIR)event-parse.h
>  LIB_H += util/target.h
>  LIB_H += util/perf_regs.h
> +LIB_H += util/unwind.h
>  
>  LIB_OBJS += $(OUTPUT)util/abspath.o
>  LIB_OBJS += $(OUTPUT)util/alias.o
> @@ -513,6 +514,7 @@ else
>   EXTLIBS += $(LIBUNWIND_LIBS)
>   BASIC_CFLAGS := $(LIBUNWIND_CFLAGS) $(BASIC_CFLAGS)
>   BASIC_LDFLAGS := $(LIBUNWIND_LDFLAGS) $(BASIC_LDFLAGS)
> + LIB_OBJS += $(OUTPUT)util/unwind.o
>  endif
>  
>  ifdef NO_NEWT
> diff --git a/tools/perf/arch/x86/Makefile b/tools/perf/arch/x86/Makefile
> index 744e629..815841c 100644
> --- a/tools/perf/arch/x86/Makefile
> +++ b/tools/perf/arch/x86/Makefile
> @@ -2,4 +2,7 @@ ifndef NO_DWARF
>  PERF_HAVE_DWARF_REGS := 1
>  LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/dwarf-regs.o
>  endif
> +ifndef NO_LIBUNWIND
> +LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/unwind.o
> +endif
>  LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/header.o
> diff --git a/tools/perf/arch/x86/util/unwind.c b/tools/perf/arch/x86/util/unwind.c
> new file mode 100644
> index 000..78d956e
> --- /dev/null
> +++ b/tools/perf/arch/x86/util/unwind.c
> @@ -0,0 +1,111 @@
> +
> +#include 
> +#include 
> +#include "perf_regs.h"
> +#include "../../util/unwind.h"
> +
> +#ifdef ARCH_X86_64
> +int unwind__arch_reg_id(int regnum)

Please try to avoid __ in function names. We used that convention
before but we gave up because that's actually more painful than
anything.
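The callback interface quoted above (unwind_entry_cb_t) lets the unwinder hand frames to the caller one by one. A minimal sketch of that pattern, assuming a reduced struct unwind_entry that only carries the instruction pointer (the real one holds more, such as map and symbol information) and a hypothetical driver that walks an array instead of calling into libunwind. The names also avoid the double underscore, per the review comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for perf's struct unwind_entry: only the address. */
struct unwind_entry {
    uint64_t ip;
};

typedef int (*unwind_entry_cb_t)(struct unwind_entry *entry, void *arg);

/* Hypothetical driver: pass each frame to 'cb' in order; stop early and
 * propagate the value if the callback returns non-zero. */
static int demo_unwind_get_entries(const uint64_t *ips, size_t n,
                                   unwind_entry_cb_t cb, void *arg)
{
    for (size_t i = 0; i < n; i++) {
        struct unwind_entry e = { .ip = ips[i] };
        int ret = cb(&e, arg);

        if (ret)
            return ret;
    }
    return 0;
}

/* Example consumer: collect at most 'max' addresses (max <= 8). */
struct demo_chain {
    uint64_t ips[8];
    size_t nr;
    size_t max;
};

static int demo_collect(struct unwind_entry *entry, void *arg)
{
    struct demo_chain *chain = arg;

    assert(chain->max <= 8);
    if (chain->nr >= chain->max)
        return 1;               /* buffer full: stop unwinding */
    chain->ips[chain->nr++] = entry->ip;
    return 0;
}
```

Letting the callback return non-zero gives the consumer a cheap way to cap the callchain depth without the unwinder knowing about the limit.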


Re: [PATCH 16/17] perf, tool: Add dso data caching

2012-07-25 Thread Frederic Weisbecker
On Sun, Jul 22, 2012 at 02:14:39PM +0200, Jiri Olsa wrote:
> Adding dso data caching so we don't need to open/read/close,
> each time we want dso data.
> 
> The DSO data caching affects following functions:
>   dso__data_read_offset
>   dso__data_read_addr
> 
> Each DSO read tries to find the data (based on offset) inside
> the cache. If it's not present it fills the cache from file,
> and returns the data. If it is present, data are returned
> with no file read.
> 
> Each data read is cached by reading cache page sized/aligned
> amount of DSO data. The cache page size is hardcoded to 4096.
> The cache is using RB tree with file offset as a sort key.
> 
> Signed-off-by: Jiri Olsa 

Nice idea.

> ---
>  tools/perf/util/symbol.c |  154 --

There seems to be an increasing need to move DSO-related code
to a util/dso.c file. Just a suggestion.
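The caching scheme in the quoted description (page-sized, page-aligned chunks keyed by file offset) can be sketched as follows. This is a simplified illustration, not the patch: it swaps the RB tree for a small fixed array with linear lookup and no eviction, and reads from a hypothetical in-memory demo_file instead of the DSO on disk. Only the 4096-byte page size comes from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CACHE_PAGE  4096u  /* page size hardcoded in the quoted patch */
#define DEMO_CACHE_SLOTS 8

struct demo_file {              /* hypothetical stand-in for the DSO file */
    const unsigned char *bytes;
    size_t len;
    unsigned reads;             /* how many times the "file" was read */
};

struct demo_page {
    uint64_t offset;            /* page-aligned file offset (the lookup key) */
    size_t size;                /* valid bytes, shorter at EOF */
    unsigned char data[DEMO_CACHE_PAGE];
};

struct demo_cache {
    struct demo_page pages[DEMO_CACHE_SLOTS];
    size_t nr;
};

/* Return the cached page covering 'off', filling it from the file on a miss. */
static struct demo_page *demo_cache_page(struct demo_cache *c,
                                         struct demo_file *f, uint64_t off)
{
    uint64_t start = off & ~(uint64_t)(DEMO_CACHE_PAGE - 1);

    for (size_t i = 0; i < c->nr; i++)
        if (c->pages[i].offset == start)
            return &c->pages[i];            /* hit: no file read */

    if (c->nr == DEMO_CACHE_SLOTS || start >= f->len)
        return NULL;                        /* full (no eviction) or past EOF */

    struct demo_page *p = &c->pages[c->nr++];
    uint64_t left = f->len - start;

    p->offset = start;
    p->size = left < DEMO_CACHE_PAGE ? (size_t)left : DEMO_CACHE_PAGE;
    memcpy(p->data, f->bytes + start, p->size);
    f->reads++;
    return p;
}

/* Read up to 'len' bytes at 'off', possibly spanning several cache pages. */
static size_t demo_read_offset(struct demo_cache *c, struct demo_file *f,
                               uint64_t off, unsigned char *out, size_t len)
{
    size_t done = 0;

    assert(out != NULL || len == 0);
    while (done < len) {
        struct demo_page *p = demo_cache_page(c, f, off + done);
        if (!p)
            break;
        size_t in_page = (size_t)(off + done - p->offset);
        if (in_page >= p->size)
            break;                          /* hit EOF inside the last page */
        size_t n = p->size - in_page;
        if (n > len - done)
            n = len - done;
        memcpy(out + done, p->data + in_page, n);
        done += n;
    }
    return done;
}
```

A second read that falls inside already cached pages is served without touching the file, which is the whole point of the patch.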


Re: [PATCHv7 00/17] perf: Add backtrace post dwarf unwind

2012-07-25 Thread Frederic Weisbecker
On Sun, Jul 22, 2012 at 02:14:23PM +0200, Jiri Olsa wrote:
> hi,
> 
> patches available also as tarball in here:
> http://people.redhat.com/~jolsa/perf_post_unwind_v7.tar.bz2
> 
> v7 changes:
>- omitted v6 patches 9 and 15
>  They need more work and will be sent separately. I dont want to hold off whole
>  patchset because of them. We could miss some related backtraces (syscall, vdso)
>  in this version.
>- v6 patch 11, 14, 20 already in

I'm personally OK with the kernel bits, and the tool bits look like a nice
base to work on.

If nobody strongly objects, it would be nice to merge this in -tip,
either in perf/core or in some staging tree, so that we can continue
incrementally.

Nice work overall, thanks!


Re: [PATCH 14/17] perf, tool: Support for dwarf cfi unwinding on post processing

2012-07-25 Thread Frederic Weisbecker
On Wed, Jul 25, 2012 at 02:16:55PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jul 25, 2012 at 07:05:33PM +0200, Frederic Weisbecker escreveu:
> > > +#ifdef ARCH_X86_64
> > > +int unwind__arch_reg_id(int regnum)
> > 
> > Please try to avoid __ in function names. We used that convention
> > before but we gave up because that's actually more painful than
> > anything.
> 
> Well, I continue using it to separate the struct operated by the
> function from the function name.

As you prefer. I personally don't like it much because when I grep
for some function I have in mind, I get stuck figuring out the right
underscore layout :)


Re: [PATCH 03/17] perf, x86: Add copy_from_user_nmi_nochk for best effort copy

2012-07-25 Thread Frederic Weisbecker
On Wed, Jul 25, 2012 at 07:16:43PM +0200, Jiri Olsa wrote:
> On Wed, Jul 25, 2012 at 06:11:53PM +0200, Frederic Weisbecker wrote:
> > On Sun, Jul 22, 2012 at 02:14:26PM +0200, Jiri Olsa wrote:
> > > Adding copy_from_user_nmi_nochk that provides the best effort
> > > copy regardless the requesting size crossing the task boundary.
> > > 
> > > This is going to be useful for stack dump we need in post
> > > DWARF CFI based unwind, where we have predefined size of
> > > the user stack to dump, and we need to store the most of
> > > the requested dump size, regardless this size is crossing
> > > the task boundary.
> > 
> > What does that imply when we cross this limit? Are we still in the
> > task stack?
> 
> We store all we could from 'stack pointer' to 'stack pointer' + dump size.
> 
> I discussed this with Oleg and we could probably find vma for the 'stack pointer'
> and check for its size and narrow the dump - maybe more complex, but probably faster
> in comparison with dumping pages we're not interested in.

Ah, that's because the user stack can be larger than TASK_SIZE, right?

Ok then.
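The narrowing Jiri describes (dump from the stack pointer up to the requested size, but not past the end of the VMA that holds it) boils down to a clamp. A minimal sketch of that arithmetic, assuming a hypothetical helper that is already given the limit (the VMA end, or TASK_SIZE); how the VMA would actually be looked up from NMI context is outside this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many bytes of user stack to dump starting at 'sp'.
 * The request is clamped so that [sp, sp + result) neither wraps around nor
 * runs past 'limit' (for example the end of the VMA containing 'sp').
 */
static uint64_t demo_stack_dump_size(uint64_t sp, uint64_t requested,
                                     uint64_t limit)
{
    assert(limit != 0);
    if (sp >= limit)
        return 0;               /* sp is already outside the allowed range */
    if (requested > limit - sp)
        return limit - sp;      /* keep only what fits below the limit */
    return requested;
}
```

Comparing against limit - sp rather than computing sp + requested avoids overflow when the stack pointer sits near the top of the address space.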


Re: [PATCH 03/17] perf, x86: Add copy_from_user_nmi_nochk for best effort copy

2012-07-25 Thread Frederic Weisbecker
On Wed, Jul 25, 2012 at 07:30:31PM +0200, Jiri Olsa wrote:
> On Wed, Jul 25, 2012 at 07:16:43PM +0200, Jiri Olsa wrote:
> > On Wed, Jul 25, 2012 at 06:11:53PM +0200, Frederic Weisbecker wrote:
> > > On Sun, Jul 22, 2012 at 02:14:26PM +0200, Jiri Olsa wrote:
> > > > Adding copy_from_user_nmi_nochk that provides the best effort
> > > > copy regardless the requesting size crossing the task boundary.
> > > > 
> > > > This is going to be useful for stack dump we need in post
> > > > DWARF CFI based unwind, where we have predefined size of
> > > > the user stack to dump, and we need to store the most of
> > > > the requested dump size, regardless this size is crossing
> > > > the task boundary.
> > > 
> > > What does that imply when we cross this limit? Are we still in the
> > > task stack?
> > 
> > We store all we could from 'stack pointer' to 'stack pointer' + dump size.
> > 
> > I discussed this with Oleg and we could probably find vma for the 'stack pointer'
> > and check for its size and narrow the dump - maybe more complex, but probably faster
> > in comparison with dumping pages we're not interested in.
> > 
> > thanks,
> > jirka
> 
> I can send this update later together with vdso
> and 'syscall regs storage' features ;)

Sure! As long as we are fine with the kernel ABI, the rest can be done
incrementally.


[PATCH 1/4] cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING

2012-08-17 Thread Frederic Weisbecker
S390, ia64 and powerpc all define their own version
of CONFIG_VIRT_CPU_ACCOUNTING. Generalize the config
and its description to a single place to avoid
duplication.

Signed-off-by: Frederic Weisbecker 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
---
 arch/Kconfig   |3 +++
 arch/ia64/Kconfig  |   12 +---
 arch/powerpc/platforms/Kconfig.cputype |   16 +---
 arch/s390/Kconfig  |5 ++---
 init/Kconfig   |   13 +
 5 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 72f2fa1..f78de57 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -281,4 +281,7 @@ config SECCOMP_FILTER
 
  See Documentation/prctl/seccomp_filter.txt for details.
 
+config HAVE_VIRT_CPU_ACCOUNTING
+   bool
+
 source "kernel/gcov/Kconfig"
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 310cf57..3c720ef 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -25,6 +25,7 @@ config IA64
select HAVE_GENERIC_HARDIRQS
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP
+   select HAVE_VIRT_CPU_ACCOUNTING
select ARCH_DISCARD_MEMBLOCK
select GENERIC_IRQ_PROBE
select GENERIC_PENDING_IRQ if SMP
@@ -340,17 +341,6 @@ config FORCE_MAX_ZONEORDER
default "17" if HUGETLB_PAGE
default "11"
 
-config VIRT_CPU_ACCOUNTING
-   bool "Deterministic task and CPU time accounting"
-   default n
-   help
- Select this option to enable more accurate task and CPU time
- accounting.  This is done by reading a CPU counter on each
- kernel entry and exit and on transitions within the kernel
- between system, softirq and hardirq state, so there is a
- small performance impact.
- If in doubt, say N here.
-
 config SMP
bool "Symmetric multi-processing support"
select USE_GENERIC_SMP_HELPERS
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 30fd01d..72afd28 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -1,6 +1,7 @@
 config PPC64
bool "64-bit kernel"
default n
+   select HAVE_VIRT_CPU_ACCOUNTING
help
  This option selects whether a 32-bit or a 64-bit kernel
  will be built.
@@ -337,21 +338,6 @@ config PPC_MM_SLICES
default y if (!PPC_FSL_BOOK3E && PPC64 && HUGETLB_PAGE) || (PPC_STD_MMU_64 && PPC_64K_PAGES)
default n
 
-config VIRT_CPU_ACCOUNTING
-   bool "Deterministic task and CPU time accounting"
-   depends on PPC64
-   default y
-   help
- Select this option to enable more accurate task and CPU time
- accounting.  This is done by reading a CPU counter on each
- kernel entry and exit and on transitions within the kernel
- between system, softirq and hardirq state, so there is a
- small performance impact.  This also enables accounting of
- stolen time on logically-partitioned systems running on
- IBM POWER5-based machines.
-
- If in doubt, say Y here.
-
 config PPC_HAVE_PMU_SUPPORT
bool
 
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 76de6b6..49ebfb6 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -49,9 +49,6 @@ config GENERIC_LOCKBREAK
 config PGSTE
def_bool y if KVM
 
-config VIRT_CPU_ACCOUNTING
-   def_bool y
-
 config ARCH_SUPPORTS_DEBUG_PAGEALLOC
def_bool y
 
@@ -89,6 +86,8 @@ config S390
select HAVE_MEMBLOCK
select HAVE_MEMBLOCK_NODE_MAP
select HAVE_CMPXCHG_LOCAL
+   select HAVE_VIRT_CPU_ACCOUNTING
+   select VIRT_CPU_ACCOUNTING
select ARCH_DISCARD_MEMBLOCK
select BUILDTIME_EXTABLE_SORT
select ARCH_INLINE_SPIN_TRYLOCK
diff --git a/init/Kconfig b/init/Kconfig
index af6c7f8..c40d0fb 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -267,6 +267,19 @@ config POSIX_MQUEUE_SYSCTL
depends on SYSCTL
default y
 
+config VIRT_CPU_ACCOUNTING
+   bool "Deterministic task and CPU time accounting"
+   depends on HAVE_VIRT_CPU_ACCOUNTING
+   default y if PPC64
+   help
+ Select this option to enable more accurate task and CPU time
+ accounting.  This is done by reading a CPU counter on each
+ kernel entry and exit and on transitions within the kernel
+ between system, softirq and hardirq state, so there is a
+ small performance impact.  In the case of s390 or IBM POWER > 5,
+ this also enables accounting of stolen time on logically-partitioned
+ systems.
+
 config BSD_PROCESS_ACCT

[PATCH 0/4] cputime: Virtual cputime accounting small cleanups and consolidation v3

2012-08-17 Thread Frederic Weisbecker
Hi,

In this v3:

- Rebase against latest tip:sched/core
- Added acks from Martin
- Refined help text for the consolidated CONFIG_VIRT_CPU_ACCOUNTING option
in the 1st patch.

You can pull from:

git://github.com/fweisbec/linux-dynticks.git
virt-cputime-v3

Thanks.

Frederic Weisbecker (4):
  cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
  sched: Move cputime code to its own file
  cputime: Consolidate vtime handling on context switch
  s390: Remove leftover account_tick_vtime() header

 arch/Kconfig   |3 +
 arch/ia64/Kconfig  |   12 +-
 arch/ia64/include/asm/switch_to.h  |8 -
 arch/ia64/kernel/time.c|4 +-
 arch/powerpc/include/asm/time.h|6 -
 arch/powerpc/kernel/process.c  |3 -
 arch/powerpc/kernel/time.c |6 +
 arch/powerpc/platforms/Kconfig.cputype |   16 +-
 arch/s390/Kconfig  |5 +-
 arch/s390/include/asm/switch_to.h  |4 -
 arch/s390/kernel/vtime.c   |4 +-
 include/linux/kernel_stat.h|6 +
 init/Kconfig   |   13 +
 kernel/sched/Makefile  |2 +-
 kernel/sched/core.c|  558 +---
 kernel/sched/cputime.c |  503 
 kernel/sched/sched.h   |   63 
 17 files changed, 606 insertions(+), 610 deletions(-)
 create mode 100644 kernel/sched/cputime.c

-- 
1.7.5.4



[PATCH 4/4] s390: Remove leftover account_tick_vtime() header

2012-08-17 Thread Frederic Weisbecker
The function doesn't seem to exist anymore.

Signed-off-by: Frederic Weisbecker 
Acked-by: Martin Schwidefsky 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Heiko Carstens 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
---
 arch/s390/include/asm/switch_to.h |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/switch_to.h b/arch/s390/include/asm/switch_to.h
index e7f9b3d..314cc94 100644
--- a/arch/s390/include/asm/switch_to.h
+++ b/arch/s390/include/asm/switch_to.h
@@ -89,8 +89,6 @@ static inline void restore_access_regs(unsigned int *acrs)
prev = __switch_to(prev,next);  \
 } while (0)
 
-extern void account_tick_vtime(struct task_struct *);
-
 #define finish_arch_switch(prev) do {   \
set_fs(current->thread.mm_segment);  \
 } while (0)
-- 
1.7.5.4



[PATCH 2/4] sched: Move cputime code to its own file

2012-08-17 Thread Frederic Weisbecker
Extract the cputime code from the giant sched/core.c and
put it in its own file. This makes it easier to deal with
this particular area and de-bloats core.c a bit more.

Signed-off-by: Frederic Weisbecker 
Acked-by: Martin Schwidefsky 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Heiko Carstens 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
---
 kernel/sched/Makefile  |2 +-
 kernel/sched/core.c|  557 +---
 kernel/sched/cputime.c |  503 +++
 kernel/sched/sched.h   |   63 ++
 4 files changed, 569 insertions(+), 556 deletions(-)
 create mode 100644 kernel/sched/cputime.c

diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 173ea52..f06d249 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -11,7 +11,7 @@ ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
 endif
 
-obj-y += core.o clock.o idle_task.o fair.o rt.o stop_task.o
+obj-y += core.o clock.o cputime.o idle_task.o fair.o rt.o stop_task.o
 obj-$(CONFIG_SMP) += cpupri.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4376c9f..ae3bcaa 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -740,126 +740,6 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
dequeue_task(rq, p, flags);
 }
 
-#ifdef CONFIG_IRQ_TIME_ACCOUNTING
-
-/*
- * There are no locks covering percpu hardirq/softirq time.
- * They are only modified in account_system_vtime, on corresponding CPU
- * with interrupts disabled. So, writes are safe.
- * They are read and saved off onto struct rq in update_rq_clock().
- * This may result in other CPU reading this CPU's irq time and can
- * race with irq/account_system_vtime on this CPU. We would either get old
- * or new value with a side effect of accounting a slice of irq time to wrong
- * task when irq is in progress while we read rq->clock. That is a worthy
- * compromise in place of having locks on each irq in account_system_time.
- */
-static DEFINE_PER_CPU(u64, cpu_hardirq_time);
-static DEFINE_PER_CPU(u64, cpu_softirq_time);
-
-static DEFINE_PER_CPU(u64, irq_start_time);
-static int sched_clock_irqtime;
-
-void enable_sched_clock_irqtime(void)
-{
-   sched_clock_irqtime = 1;
-}
-
-void disable_sched_clock_irqtime(void)
-{
-   sched_clock_irqtime = 0;
-}
-
-#ifndef CONFIG_64BIT
-static DEFINE_PER_CPU(seqcount_t, irq_time_seq);
-
-static inline void irq_time_write_begin(void)
-{
-   __this_cpu_inc(irq_time_seq.sequence);
-   smp_wmb();
-}
-
-static inline void irq_time_write_end(void)
-{
-   smp_wmb();
-   __this_cpu_inc(irq_time_seq.sequence);
-}
-
-static inline u64 irq_time_read(int cpu)
-{
-   u64 irq_time;
-   unsigned seq;
-
-   do {
-   seq = read_seqcount_begin(&per_cpu(irq_time_seq, cpu));
-   irq_time = per_cpu(cpu_softirq_time, cpu) +
-  per_cpu(cpu_hardirq_time, cpu);
-   } while (read_seqcount_retry(&per_cpu(irq_time_seq, cpu), seq));
-
-   return irq_time;
-}
-#else /* CONFIG_64BIT */
-static inline void irq_time_write_begin(void)
-{
-}
-
-static inline void irq_time_write_end(void)
-{
-}
-
-static inline u64 irq_time_read(int cpu)
-{
-   return per_cpu(cpu_softirq_time, cpu) + per_cpu(cpu_hardirq_time, cpu);
-}
-#endif /* CONFIG_64BIT */
-
-/*
- * Called before incrementing preempt_count on {soft,}irq_enter
- * and before decrementing preempt_count on {soft,}irq_exit.
- */
-void account_system_vtime(struct task_struct *curr)
-{
-   unsigned long flags;
-   s64 delta;
-   int cpu;
-
-   if (!sched_clock_irqtime)
-   return;
-
-   local_irq_save(flags);
-
-   cpu = smp_processor_id();
-   delta = sched_clock_cpu(cpu) - __this_cpu_read(irq_start_time);
-   __this_cpu_add(irq_start_time, delta);
-
-   irq_time_write_begin();
-   /*
-* We do not account for softirq time from ksoftirqd here.
-* We want to continue accounting softirq time to ksoftirqd thread
-* in that case, so as not to confuse scheduler with a special task
-* that do not consume any time, but still wants to run.
-*/
-   if (hardirq_count())
-   __this_cpu_add(cpu_hardirq_time, delta);
-   else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
-   __this_cpu_add(cpu_softirq_time, delta);
-
-   irq_time_write_end();
-   local_irq_restore(flags);
-}
-EXPORT_SYMBOL_GPL(account_system_vtime);
-
-#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
-
-#ifdef CONFIG_PARAVIRT
-static inline u64 steal_ticks(u64 steal)
-{
-   if (unlikely(steal > NSEC_PER_SEC))
-   return div_u64(steal, TICK_NSEC);
-
-   return __iter_div_u64_rem(steal, TICK_NSEC, &st
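The !CONFIG_64BIT branch of the removed irq time code protects the hardirq/softirq pair with a sequence count, so that a remote reader never sums a half-updated value. A userspace sketch of the same pattern using C11 atomics (a swapped-in mechanism, not the kernel's seqcount_t API): one writer bumps the sequence to odd, updates a counter, then bumps it back to even; readers retry while the sequence is odd or changed. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct demo_irq_time {
    atomic_uint seq;             /* odd while an update is in progress */
    _Atomic uint64_t hardirq_ns;
    _Atomic uint64_t softirq_ns;
};

/* Writer: assumes a single writer, as the kernel ensures per CPU by running
 * with interrupts disabled. 'hardirq' selects which counter gets 'delta'. */
static void demo_irq_time_add(struct demo_irq_time *t, int hardirq,
                              uint64_t delta)
{
    unsigned s = atomic_load_explicit(&t->seq, memory_order_relaxed);

    assert((s & 1) == 0);        /* updates must not nest */
    atomic_store_explicit(&t->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_fetch_add_explicit(hardirq ? &t->hardirq_ns : &t->softirq_ns,
                              delta, memory_order_relaxed);
    atomic_store_explicit(&t->seq, s + 2, memory_order_release);
}

/* Reader: retry until an even, unchanged sequence brackets both loads. */
static uint64_t demo_irq_time_read(struct demo_irq_time *t)
{
    unsigned s1, s2;
    uint64_t sum;

    do {
        s1 = atomic_load_explicit(&t->seq, memory_order_acquire);
        sum = atomic_load_explicit(&t->softirq_ns, memory_order_relaxed) +
              atomic_load_explicit(&t->hardirq_ns, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&t->seq, memory_order_relaxed);
    } while ((s1 & 1) || s1 != s2);
    return sum;
}
```

On 64-bit kernels the original code drops the sequence count entirely, relying on plain 64-bit loads being atomic and accepting that the two counters may be read at slightly different moments.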

[PATCH 3/4] cputime: Consolidate vtime handling on context switch

2012-08-17 Thread Frederic Weisbecker
The archs that implement virtual cputime accounting all
flush the cputime of a task when it gets descheduled and
sometimes set up some initial state so that the next task's
cputime can be accounted.

These archs all put their own hooks in their context
switch callbacks and handle the CONFIG_VIRT_CPU_ACCOUNTING=n
case themselves.

Consolidate this by creating a new account_switch_vtime()
callback, called from generic code right after a context
switch, which these archs must implement to flush the
cputime of the previous task and initialize the cputime
related state of the next one.

Signed-off-by: Frederic Weisbecker 
Acked-by: Martin Schwidefsky 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Heiko Carstens 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
---
 arch/ia64/include/asm/switch_to.h |8 
 arch/ia64/kernel/time.c   |4 ++--
 arch/powerpc/include/asm/time.h   |6 --
 arch/powerpc/kernel/process.c |3 ---
 arch/powerpc/kernel/time.c|6 ++
 arch/s390/include/asm/switch_to.h |2 --
 arch/s390/kernel/vtime.c  |4 ++--
 include/linux/kernel_stat.h   |6 ++
 kernel/sched/core.c   |1 +
 9 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/arch/ia64/include/asm/switch_to.h b/arch/ia64/include/asm/switch_to.h
index cb2412f..d38c7ea 100644
--- a/arch/ia64/include/asm/switch_to.h
+++ b/arch/ia64/include/asm/switch_to.h
@@ -30,13 +30,6 @@ extern struct task_struct *ia64_switch_to (void *next_task);
 extern void ia64_save_extra (struct task_struct *task);
 extern void ia64_load_extra (struct task_struct *task);
 
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING
-extern void ia64_account_on_switch (struct task_struct *prev, struct task_struct *next);
-# define IA64_ACCOUNT_ON_SWITCH(p,n) ia64_account_on_switch(p,n)
-#else
-# define IA64_ACCOUNT_ON_SWITCH(p,n)
-#endif
-
 #ifdef CONFIG_PERFMON
   DECLARE_PER_CPU(unsigned long, pfm_syst_info);
 # define PERFMON_IS_SYSWIDE() (__get_cpu_var(pfm_syst_info) & 0x1)
@@ -49,7 +42,6 @@ extern void ia64_account_on_switch (struct task_struct *prev, struct task_struct
 || PERFMON_IS_SYSWIDE())
 
 #define __switch_to(prev,next,last) do {                     \
-   IA64_ACCOUNT_ON_SWITCH(prev, next);                      \
    if (IA64_HAS_EXTRA_STATE(prev))                          \
        ia64_save_extra(prev);                               \
    if (IA64_HAS_EXTRA_STATE(next))                          \
diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index ecc904b..6247197 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -88,10 +88,10 @@ extern cputime_t cycle_to_cputime(u64 cyc);
  * accumulated times to the current process, and to prepare accounting on
  * the next process.
  */
-void ia64_account_on_switch(struct task_struct *prev, struct task_struct *next)
+void account_switch_vtime(struct task_struct *prev)
 {
struct thread_info *pi = task_thread_info(prev);
-   struct thread_info *ni = task_thread_info(next);
+   struct thread_info *ni = task_thread_info(current);
cputime_t delta_stime, delta_utime;
__u64 now;
 
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 3b4b4a8..c1f2676 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -197,12 +197,6 @@ struct cpu_usage {
 
 DECLARE_PER_CPU(struct cpu_usage, cpu_usage_array);
 
-#if defined(CONFIG_VIRT_CPU_ACCOUNTING)
-#define account_process_vtime(tsk) account_process_tick(tsk, 0)
-#else
-#define account_process_vtime(tsk) do { } while (0)
-#endif
-
 extern void secondary_cpu_time_init(void);
 
 DECLARE_PER_CPU(u64, decrementers_next_tb);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 710f400..d73fa99 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -514,9 +514,6 @@ struct task_struct *__switch_to(struct task_struct *prev,
 
local_irq_save(flags);
 
-   account_system_vtime(current);
-   account_process_vtime(current);
-
/*
 * We can't take a PMU exception inside _switch() since there is a
 * window where the kernel stack SLB and the kernel stack are out
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index be171ee..49da7f0 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -366,6 +366,12 @@ void account_process_tick(struct task_struct *tsk, int user_tick)
account_user_time(tsk, utime, utimescaled);
 }
 
+void account_switch_vtime(struct task_struct *prev)
+{
+   account_system_vtime(prev);
+   account_process_tick(prev, 0);
+}
+
 #else /* ! CONFIG_VIRT_CPU_ACCOUNTING */
 #define calc_cputime_factors()
 #e
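The consolidation can be modeled in a few lines: generic code performs the switch and then calls a single hook with the previous task, while the new task is already "current", as in the reworked ia64 account_switch_vtime(). This is a userspace sketch under stated assumptions: demo_task, demo_current and demo_last_switch_ns are hypothetical stand-ins for task_struct, current and the arch's per-CPU timestamp, and time is passed in explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced stand-in for task_struct: a name and accumulated time. */
struct demo_task {
    const char *name;
    uint64_t vtime_ns;
};

static uint64_t demo_last_switch_ns;   /* when the running task started */
static struct demo_task *demo_current; /* analogue of 'current' */

/* Arch hook analogue: called after the switch, with the new task already
 * current. Flush prev's elapsed time and restart the clock for current. */
static void demo_account_switch_vtime(struct demo_task *prev, uint64_t now_ns)
{
    assert(now_ns >= demo_last_switch_ns);
    prev->vtime_ns += now_ns - demo_last_switch_ns;
    demo_last_switch_ns = now_ns;
}

/* Generic-code analogue: switch first, then call the one accounting hook. */
static void demo_context_switch(struct demo_task *next, uint64_t now_ns)
{
    struct demo_task *prev = demo_current;

    demo_current = next;               /* the low-level switch happens here */
    demo_account_switch_vtime(prev, now_ns);
}
```

Having one generic call site means each arch only supplies the hook body, instead of wiring its own macro into its switch_to code and defining an empty fallback for the unconfigured case.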

Re: powerpc/perf: hw breakpoints return ENOSPC

2012-08-17 Thread Frederic Weisbecker
On Thu, Aug 16, 2012 at 02:23:54PM +1000, Michael Neuling wrote:
> Hi,
> 
> I've been trying to get hardware breakpoints with perf to work on POWER7
> but I'm getting the following:
> 
>   % perf record -e mem:0x1000 true
> 
> Error: sys_perf_event_open() syscall returned with 28 (No space left on device).  /bin/dmesg may provide additional information.
> 
> Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?
> 
>   true: Terminated
> 
> (FWIW adding -a and it works fine)
> 
> Debugging it seems that __reserve_bp_slot() is returning ENOSPC because
> it thinks there are no free breakpoint slots on this CPU.
> 
> I have 2 CPUs, so perf userspace is doing two perf_event_open syscalls
> to add a counter to each CPU [1].  The first syscall succeeds but the
> second is failing.
> 
> On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1,
> despite there being no breakpoint on this CPU.  This is because the call
> to task_bp_pinned() checks all CPUs, rather than just the current CPU.
> POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we
> return ENOSPC.
> 
> The following patch fixes this by checking the associated CPU for each
> breakpoint in task_bp_pinned.  I'm not familiar with this code, so it's
> provided as a reference to the above issue.
> 
> Mikey
> 
> 1. not sure why it doesn't just do one syscall and specify all CPUs, but
> that's another issue.  Using two syscalls should work.

This patch seems to make sense. I'll try it and run some tests.
Can I have your Signed-off-by?

Thanks.

> 
> diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
> index bb38c4d..e092daa 100644
> --- a/kernel/events/hw_breakpoint.c
> +++ b/kernel/events/hw_breakpoint.c
> @@ -111,14 +111,16 @@ static unsigned int max_task_bp_pinned(int cpu, enum bp_type_idx type)
>   * Count the number of breakpoints of the same type and same task.
>   * The given event must be not on the list.
>   */
> -static int task_bp_pinned(struct perf_event *bp, enum bp_type_idx type)
> +static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx type)
>  {
>   struct task_struct *tsk = bp->hw.bp_target;
>   struct perf_event *iter;
>   int count = 0;
>  
>   list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
> - if (iter->hw.bp_target == tsk && find_slot_idx(iter) == type)
> + if (iter->hw.bp_target == tsk &&
> + find_slot_idx(iter) == type &&
> + cpu == iter->cpu)
>   count += hw_breakpoint_weight(iter);
>   }
>  
> @@ -141,7 +143,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
>   if (!tsk)
>   slots->pinned += max_task_bp_pinned(cpu, type);
>   else
> - slots->pinned += task_bp_pinned(bp, type);
> + slots->pinned += task_bp_pinned(cpu, bp, type);
>   slots->flexible = per_cpu(nr_bp_flexible[type], cpu);
>  
>   return;
> @@ -154,7 +156,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct perf_event *bp,
>   if (!tsk)
>   nr += max_task_bp_pinned(cpu, type);
>   else
> - nr += task_bp_pinned(bp, type);
> + nr += task_bp_pinned(cpu, bp, type);
>  
>   if (nr > slots->pinned)
>   slots->pinned = nr;
> @@ -188,7 +190,7 @@ static void toggle_bp_task_slot(struct perf_event *bp, int cpu, bool enable,
>   int old_idx = 0;
>   int idx = 0;
>  
> - old_count = task_bp_pinned(bp, type);
> + old_count = task_bp_pinned(cpu, bp, type);
>   old_idx = old_count - 1;
>   idx = old_idx + weight;
>  
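A minimal userspace model of the counting problem described above may help. It is a sketch under stated assumptions: `struct sketch_bp`, `pinned_any_cpu()`, `pinned_on_cpu()` and `fits()` are hypothetical names, HBP_NUM is taken as 1 as on POWER7, every event is bound to a single CPU (as in the two-syscall case), and flexible and CPU-wide breakpoints are ignored. Only the filtering condition mirrors the patch.

```c
#include <stddef.h>

/*
 * Hypothetical userspace model of task_bp_pinned(). Only the fields the
 * patch compares are kept, and every event is bound to one CPU.
 */
struct sketch_bp {
	int target_pid;	/* task the breakpoint is attached to */
	int type;	/* slot type, as returned by find_slot_idx() */
	int cpu;	/* CPU the event was opened on */
	int weight;	/* slots consumed, as hw_breakpoint_weight() */
};

/* Before the patch: matching breakpoints on any CPU are counted. */
static int pinned_any_cpu(const struct sketch_bp *bps, size_t n,
			  int pid, int type)
{
	int count = 0;

	for (size_t i = 0; i < n; i++)
		if (bps[i].target_pid == pid && bps[i].type == type)
			count += bps[i].weight;
	return count;
}

/* With the patch: only breakpoints opened on @cpu are counted. */
static int pinned_on_cpu(const struct sketch_bp *bps, size_t n,
			 int cpu, int pid, int type)
{
	int count = 0;

	for (size_t i = 0; i < n; i++)
		if (bps[i].target_pid == pid && bps[i].type == type &&
		    bps[i].cpu == cpu)
			count += bps[i].weight;
	return count;
}

/* Simplified reservation check: the real code returns -ENOSPC otherwise. */
static int fits(int pinned, int weight, int nr_slots)
{
	return pinned + weight <= nr_slots;
}
```

With the CPU 0 event already on the list and one slot per CPU, the unfiltered count reports the slot as busy when the CPU 1 event is opened, while the per-CPU count reports it free.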
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Status of adaptive tickless patchset as of August 2012

2012-08-18 Thread Frederic Weisbecker
Hi,

I started working on the adaptive nohz patchset at the end of 2010. Since then, I
have iterated on one big branch:

- Nohz tasks (https://lwn.net/Articles/420490/)
- Nohz cpusets (https://lwn.net/Articles/455044/)
- Nohz cpusets v2 (https://lwn.net/Articles/487599/)
- Nohz cpusets v3 (https://lwn.net/Articles/495422/)

It quickly grew to more than 40 patches. And still, full support
(i.e., handling everything that the tick maintains, but without the tick) wasn't
finished yet.

And the closer I got to full support, the more patches I had
to maintain, rebase, improve, etc.

This had some growing side effects:

- I got in-depth reviews of the overall core design in the first iterations.
Thanks to that I made good progress. But as the patchset grew, I got fewer reviews
of the overall design and more about details. I can totally understand that: a huge
pile of patches certainly doesn't encourage reviews.

- Lacking reviews of the overall design, I felt more and more uncomfortable about
whatever I was improving or whichever feature I was adding on top of the existing ones.
And for some parts I was indeed digging in the wrong direction.

- I was spending too much time on out-of-tree maintenance while my goal is to get this
upstream.

All in all, this big branch scaled neither in terms of reviews nor in terms of development.

So, after Ingo proposed that I set up a tree in -tip, I decided to take all of the things the
tick handles, isolate each of them into a separate topic and handle them individually, or at
least iteratively, trying to push them upstream or into a staging tree in -tip piecewise. This
works as long as the concerned maintainers carry them and I can get their insights on a regular
basis, and as long as we can iterate on some central branch, because even if we can cut things
out into individual topics, there are significant interdependencies.

I think this has been successful so far:

- The detection of illegal RCU read side critical sections in RCU extended quiescent
state is now upstream. It has even helped find a lot of bugs upstream.

- Treating userspace as an RCU extended quiescent state is currently pending in Paul's tree,
in the rcu/idle branch. It's also in linux-next. It is likely to go upstream, or into
a staging branch in -tip, in the next merge window.

- Some preparatory work to split the nohz and idle logic in the nohz API went upstream
in the last merge window.

- I proposed something to handle nohz cputime accounting: https://lwn.net/Articles/501766/
The fundamental reviews it got pointed me toward reusing virtual based cputime accounting instead.

- I consolidated/cleaned up virtual based cputime accounting (the last version is
https://lkml.org/lkml/2012/8/17/326 and is waiting for inclusion in -tip or similar).

- On top of that vtime consolidation and the pending RCU patches, I proposed
a generic virtual cputime accounting for archs that don't have CONFIG_VIRT_CPU_ACCOUNTING.
See http://comments.gmane.org/gmane.linux.kernel/1337690
A tickless CPU can then account cputime with that.

So the process seems to be heading in a better direction now. Summer holidays have naturally
slowed things down a bit, and the rhythm will probably stay that way until the end of
ksummit/LinuxCon/LPC. But I have the feeling we are moving forward.

There is no fixed schedule, but once I get the above topics sorted out, I'll probably work on
timekeeping handling in adaptive tickless CPUs. And then the rest...

I'll still keep maintaining the big branch in my tree, but it is now going to be more of a big
draft or laboratory, with regular rebases on what is merged upstream or in maintainers' trees.
It helps me keep a practical view of the big picture.

Thanks.


Re: [PATCH 0/4] cputime: Virtual cputime accounting small cleanups and consolidation v3

2012-08-20 Thread Frederic Weisbecker
On Mon, Aug 20, 2012 at 10:40:12AM +0200, Ingo Molnar wrote:
> 
> * Frederic Weisbecker  wrote:
> 
> > Hi,
> > 
> > In this v3:
> > 
> > - Rebase against latest tip:sched/core
> > - Added acks from Martin
> > - Refined help text for the consolidated CONFIG_VIRT_CPU_ACCOUNTING option
> > in the 1st patch.
> > 
> > You can pull from:
> > 
> > git://github.com/fweisbec/linux-dynticks.git
> > virt-cputime-v3
> > 
> > Thanks.
> > 
> > Frederic Weisbecker (4):
> >   cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
> >   sched: Move cputime code to its own file
> >   cputime: Consolidate vtime handling on context switch
> >   s390: Remove leftover account_tick_vtime() header
> > 
> >  arch/Kconfig   |3 +
> >  arch/ia64/Kconfig  |   12 +-
> >  arch/ia64/include/asm/switch_to.h  |8 -
> >  arch/ia64/kernel/time.c|4 +-
> >  arch/powerpc/include/asm/time.h|6 -
> >  arch/powerpc/kernel/process.c  |3 -
> >  arch/powerpc/kernel/time.c |6 +
> >  arch/powerpc/platforms/Kconfig.cputype |   16 +-
> >  arch/s390/Kconfig  |5 +-
> >  arch/s390/include/asm/switch_to.h  |4 -
> >  arch/s390/kernel/vtime.c   |4 +-
> >  include/linux/kernel_stat.h|6 +
> >  init/Kconfig   |   13 +
> >  kernel/sched/Makefile  |2 +-
> >  kernel/sched/core.c|  558 +---
> >  kernel/sched/cputime.c |  503 
> >  kernel/sched/sched.h   |   63 
> >  17 files changed, 606 insertions(+), 610 deletions(-)
> >  create mode 100644 kernel/sched/cputime.c
> 
> Hm, I'm getting build failures on x86:
> 
> kernel/sched/cputime.c:60:2: error: implicit declaration of function ‘irq_time_write_begin’ [-Werror=implicit-function-declaration]
> kernel/sched/cputime.c:72:2: error: implicit declaration of function ‘irq_time_write_end’ [-Werror=implicit-function-declaration]
> kernel/sched/cputime.c:274:2: error: implicit declaration of function ‘static_key_false’ [-Werror=implicit-function-declaration]
> 
> Config attached.
> 
> Thanks,
> 
>   Ingo

Oops, sorry: a misplaced #endif and a missing header inclusion.
I just fixed that in the branch:

git://github.com/fweisbec/linux-dynticks.git
 virt-cputime-v4

The diff against the previous set is:

$ git diff virt-cputime-v3..virt-cputime-v4

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index ea9b4b6..372692b 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "sched.h"
 
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 62f9850..804c2e5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1169,6 +1169,7 @@ enum rq_nohz_flag_bits {
 };
 
 #define nohz_flags(cpu)(&cpu_rq(cpu)->nohz_flags)
+#endif
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 
@@ -1219,4 +1220,3 @@ static inline u64 irq_time_read(int cpu)
 #endif /* CONFIG_64BIT */
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
 
-#endif
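To illustrate why a misplaced #endif turns into implicit-declaration errors, here is a reduced sketch. `SKETCH_NOHZ` and `SKETCH_IRQ_TIME` are hypothetical stand-ins (the #ifdef that opens the nohz_flags block is outside the hunk), the helpers are trivial stand-ins for the real ones, and the failing configuration is assumed to leave the outer option unset. The static_key_false() error is the separate missing-include part of the fix and is not modelled.

```c
/* #define SKETCH_NOHZ */	/* assumed unset in the failing config */
#define SKETCH_IRQ_TIME 1	/* stands in for CONFIG_IRQ_TIME_ACCOUNTING */

#ifdef SKETCH_NOHZ
#define sketch_nohz_flags(cpu) (cpu)
#endif	/* v4 placement: the nohz block is closed right here */

#ifdef SKETCH_IRQ_TIME
static int irq_time_seq;	/* stand-in for the write-side sequence */

static inline void irq_time_write_begin(void) { irq_time_seq++; }
static inline void irq_time_write_end(void) { irq_time_seq++; }
#endif /* SKETCH_IRQ_TIME */

/*
 * With the v3 placement (#endif at the very end of the header), the
 * helpers above would be nested inside #ifdef SKETCH_NOHZ and vanish
 * when it is unset, so these calls would be implicit declarations.
 */
static int account_irq_sketch(void)
{
	irq_time_write_begin();
	irq_time_write_end();
	return irq_time_seq;
}
```

Moving the #endif up, as the v4 diff does, makes the IRQ time helpers depend only on their own option.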



Re: [PATCH 2/6] rcu: Allow rcu_user_enter()/exit() to nest

2012-07-09 Thread Frederic Weisbecker
On Fri, Jul 06, 2012 at 09:27:09AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 06, 2012 at 02:00:14PM +0200, Frederic Weisbecker wrote:
> > Allow calls to rcu_user_enter() even if we are already
> > in userspace (as seen by RCU) and allow calls to rcu_user_exit()
> > even if we are already in the kernel.
> > 
> > This makes the APIs more flexible to be called from architectures.
> > Exception entries for example won't need to know if they come from
> > userspace before calling rcu_user_exit().
> 
> You lost me on this one.  As long as the nesting level stays below
> a few tens, rcu_user_enter() and rcu_user_exit() already can nest.
> 
> Or are you saying that you need to deal with duplicate rcu_user_enter()
> calls that must be matched by a single rcu_user_exit() call?

Yep, we can have that kind of thing:

in_user = 1
    syscall
        rcu_user_exit() // in_user = 0
        exception
            rcu_user_exit()
        end of exception
    end of syscall
    rcu_user_enter()

This is because exceptions don't have separate entry points depending on whether we
trapped/faulted in userspace or in kernelspace, so it's hard to know whether we were
in userspace before the exception triggered. To avoid complicating architecture code,
I'm using this kind of "in_user" state.
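The "in_user" state can be sketched as idempotent enter/exit hooks. This is a hypothetical userspace model, not the rcutree.c code: `struct sketch_rcu_cpu` and the `sketch_rcu_user_*()` functions are made-up names, and a single struct stands in for one CPU's state.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state; one struct models one CPU. */
struct sketch_rcu_cpu {
	bool in_user;		/* RCU currently sees this CPU in userspace */
	int eqs_transitions;	/* real enter/exit transitions performed */
};

static void sketch_rcu_user_exit(struct sketch_rcu_cpu *c)
{
	if (!c->in_user)
		return;		/* already in the kernel: nested call, no-op */
	c->in_user = false;
	c->eqs_transitions++;	/* leave the extended quiescent state */
}

static void sketch_rcu_user_enter(struct sketch_rcu_cpu *c)
{
	if (c->in_user)
		return;		/* already in userspace as seen by RCU */
	c->in_user = true;
	c->eqs_transitions++;	/* enter the extended quiescent state */
}
```

Replaying the trace above from in_user = 1: the syscall entry performs the only exit transition, the exception entry finds the CPU already in the kernel and does nothing, and the final rcu_user_enter() re-enters the extended quiescent state. An exception that trapped directly in userspace would instead find in_user set and perform the transition itself.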


Re: [PATCH 1/6] rcu: Settle config for userspace extended quiescent state

2012-07-09 Thread Frederic Weisbecker
On Fri, Jul 06, 2012 at 09:31:29AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 06, 2012 at 02:00:13PM +0200, Frederic Weisbecker wrote:
> > Create a new config option under the RCU menu that put
> > CPUs under RCU extended quiescent state (as in dynticks
> > idle mode) when they run in userspace. This require
> > some contribution from architectures to hook into kernel
> > and userspace boundaries.
> > 
> > Signed-off-by: Frederic Weisbecker 
> > Cc: Alessio Igor Bogani 
> > Cc: Andrew Morton 
> > Cc: Avi Kivity 
> > Cc: Chris Metcalf 
> > Cc: Christoph Lameter 
> > Cc: Geoff Levand 
> > Cc: Gilad Ben Yossef 
> > Cc: Hakan Akkan 
> > Cc: H. Peter Anvin 
> > Cc: Ingo Molnar 
> > Cc: Josh Triplett 
> > Cc: Kevin Hilman 
> > Cc: Max Krasnyansky 
> > Cc: Peter Zijlstra 
> > Cc: Stephen Hemminger 
> > Cc: Steven Rostedt 
> > Cc: Sven-Thorsten Dietrich 
> > Cc: Thomas Gleixner 
> > ---
> >  arch/Kconfig |   13 +
> >  init/Kconfig |   10 ++
> >  kernel/rcutree.c |4 
> >  3 files changed, 27 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/Kconfig b/arch/Kconfig
> > index 8c3d957..c2e0ce4 100644
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -274,4 +274,17 @@ config SECCOMP_FILTER
> > 
> >   See Documentation/prctl/seccomp_filter.txt for details.
> > 
> > +config HAVE_RCU_USER_QS
> > +   bool
> > +   help
> > + Provide kernel entry/exit hooks necessary for userspace
> > + RCU extended quiescent state. Syscalls and exceptions
> > + low level handlers must be wrapped with a call to rcu_user_exit()
> > + on entry and rcu_user_enter() before resuming userspace. Irqs
> > + entry don't need to call rcu_user_exit() because their high level
> > + handlers are protected inside rcu_irq_enter/rcu_irq_exit() but
> > + preemption or signal handling on irq exit still need to be protected
> > + with a call to rcu_user_exit(). rcu_user_enter() must then be
> > + called back on irq exit when the preempted task is back on the CPU.
> > +
> >  source "kernel/gcov/Kconfig"
> > diff --git a/init/Kconfig b/init/Kconfig
> > index d07dcf9..3a4af8f 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -441,6 +441,16 @@ config PREEMPT_RCU
> >   This option enables preemptible-RCU code that is common between
> >   the TREE_PREEMPT_RCU and TINY_PREEMPT_RCU implementations.
> > 
> > +config RCU_USER_QS
> > +   bool "Consider userspace as in RCU extended quiescent state"
> > +   depends on HAVE_RCU_USER_QS && SMP
> 
> OK, I'll bite...  Why the "SMP"?  RCU could make good use of knowing
> about user-mode executing even in UP kernels.

Because Tiny RCU doesn't implement rcu_user_enter()/rcu_user_exit() yet, and it
doesn't need them for now.

To better express the constraint I should probably have used:

depends on TREE_RCU || TREE_PREEMPT_RCU


Re: [PATCH 6/6] x86: Exit RCU extended QS on notify resume

2012-07-09 Thread Frederic Weisbecker
On Fri, Jul 06, 2012 at 09:33:38AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 06, 2012 at 02:00:18PM +0200, Frederic Weisbecker wrote:
> > do_notify_resume() may be called on irq exit but it won't
> > be protected between rcu_irq_enter() and rcu_irq_exit()
> > and we don't call rcu_user_exit() on irq entry (unlike
> > syscalls/exceptions entry).
> > 
> > Since it can use RCU read side critical section, we must call
> > rcu_user_exit() before doing anything there.
> > 
> > This complete support for RCU userspace extended quiescent state
> > in x86.
> > 
> > Signed-off-by: Frederic Weisbecker 
> > Cc: Alessio Igor Bogani 
> > Cc: Andrew Morton 
> > Cc: Avi Kivity 
> > Cc: Chris Metcalf 
> > Cc: Christoph Lameter 
> > Cc: Geoff Levand 
> > Cc: Gilad Ben Yossef 
> > Cc: Hakan Akkan 
> > Cc: H. Peter Anvin 
> > Cc: Ingo Molnar 
> > Cc: Josh Triplett 
> > Cc: Kevin Hilman 
> > Cc: Max Krasnyansky 
> > Cc: Peter Zijlstra 
> > Cc: Stephen Hemminger 
> > Cc: Steven Rostedt 
> > Cc: Sven-Thorsten Dietrich 
> > Cc: Thomas Gleixner 
> > ---
> >  arch/x86/Kconfig |1 +
> >  arch/x86/kernel/signal.c |2 ++
> >  2 files changed, 3 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index c70684f..38dfcc2 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -95,6 +95,7 @@ config X86
> > select KTIME_SCALAR if X86_32
> > select GENERIC_STRNCPY_FROM_USER
> > select GENERIC_STRNLEN_USER
> > +   select HAVE_RCU_USER_QS if X86_64
> 
> And I will bite yet again.  Why only 64-bit kernels?
> 
>   Thanx, Paul

Because I don't want to spend time implementing it the same way on 32-bit
in case people disagree with the whole design :)
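The constraint on the irq return path described in the changelog can be sketched as a small state model. It assumes RCU is watching whenever the CPU is outside the userspace extended quiescent state or inside rcu_irq_enter()/rcu_irq_exit(). All names are hypothetical: the flag updates stand in for the rcu_*() calls named in the comments, and the work_pending branch stands in for do_notify_resume().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for the irq-from-userspace path. */
struct sketch_cpu {
	bool in_user_eqs;	/* extended QS entered via rcu_user_enter() */
	int irq_nesting;	/* > 0 between rcu_irq_enter()/rcu_irq_exit() */
	int rcu_readers;	/* read-side sections executed so far */
};

static bool sketch_rcu_watching(const struct sketch_cpu *c)
{
	return !c->in_user_eqs || c->irq_nesting > 0;
}

static void sketch_rcu_read_section(struct sketch_cpu *c)
{
	assert(sketch_rcu_watching(c));	/* illegal under extended QS */
	c->rcu_readers++;
}

/* Interrupt taken while the CPU runs userspace. */
static void sketch_irq_from_user(struct sketch_cpu *c, bool work_pending)
{
	c->irq_nesting++;		/* rcu_irq_enter() */
	sketch_rcu_read_section(c);	/* high level handler may use RCU */
	c->irq_nesting--;		/* rcu_irq_exit() */

	if (work_pending) {
		/* do_notify_resume()-like work: leave the extended QS first */
		c->in_user_eqs = false;		/* rcu_user_exit() */
		sketch_rcu_read_section(c);	/* signal delivery, etc. */
		c->in_user_eqs = true;		/* rcu_user_enter() */
	}
}
```

Dropping the rcu_user_exit() stand-in from the work_pending branch would trip the assertion, which is the situation the patch guards against.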


Re: [PATCH 2/6] rcu: Allow rcu_user_enter()/exit() to nest

2012-07-09 Thread Frederic Weisbecker
On Sun, Jul 08, 2012 at 06:54:18PM +0300, Avi Kivity wrote:
> On 07/06/2012 03:00 PM, Frederic Weisbecker wrote:
> > Allow calls to rcu_user_enter() even if we are already
> > in userspace (as seen by RCU) and allow calls to rcu_user_exit()
> > even if we are already in the kernel.
> > 
> > This makes the APIs more flexible to be called from architectures.
> > Exception entries for example won't need to know if they come from
> > userspace before calling rcu_user_exit().
> 
> I guess I should switch kvm to rcu_user_enter() and co, so we can
> disable the tick while running in a guest.  But where are those
> functions?  What are the rules for calling them?

I guess we need to have a closer look at the guest case first. Before we can shut the
tick down, we probably need to take care of the time and load accounting specifics it
usually handles. RCU is only part of the problem.


Re: [PATCH 0/4] cputime: Virtual cputime accounting small cleanups and consolidation

2012-07-09 Thread Frederic Weisbecker
On Tue, Jun 19, 2012 at 03:43:07PM +0200, Frederic Weisbecker wrote:
> Not sure to which tree this should go. The scheduler one may be.
> Anyway if you're fine with it, it is pullable at:
> 
> git://github.com/fweisbec/linux-dynticks.git
>   virt-cputime

Ping. Are you guys fine with the patchset?

> 
> This is only built tested on the relevant archs.
> 
> I wish we could do more vtime cputime accounting consolidation
> but archs do the things pretty differently although I bet the
> behaviour could be more unified.
> 
> 
> Frederic Weisbecker (4):
>   cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
>   sched: Move cputime code to its own file
>   cputime: Consolidate vtime handling on context switch
>   s390: Remove leftover account_tick_vtime() header
> 
>  arch/Kconfig   |3 +
>  arch/ia64/Kconfig  |   12 +-
>  arch/ia64/include/asm/switch_to.h  |8 -
>  arch/ia64/kernel/time.c|4 +-
>  arch/powerpc/include/asm/time.h|6 -
>  arch/powerpc/kernel/process.c  |3 -
>  arch/powerpc/kernel/time.c |6 +
>  arch/powerpc/platforms/Kconfig.cputype |   16 +-
>  arch/s390/Kconfig  |5 +-
>  arch/s390/include/asm/switch_to.h  |4 -
>  arch/s390/kernel/vtime.c   |4 +-
>  include/linux/kernel_stat.h|6 +
>  init/Kconfig   |   13 +
>  kernel/sched/Makefile  |2 +-
>  kernel/sched/core.c|  552 +---
>  kernel/sched/cputime.c |  497 
>  kernel/sched/sched.h   |   63 
>  17 files changed, 600 insertions(+), 604 deletions(-)
>  create mode 100644 kernel/sched/cputime.c
> 
> -- 
> 1.7.5.4
> 


Re: [PATCH 6/6] x86: Exit RCU extended QS on notify resume

2012-07-10 Thread Frederic Weisbecker
On Sun, Jul 08, 2012 at 02:17:07PM -0700, Paul E. McKenney wrote:
> On Fri, Jul 06, 2012 at 01:43:29PM -0700, Josh Triplett wrote:
> > On Fri, Jul 06, 2012 at 09:33:38AM -0700, Paul E. McKenney wrote:
> > > On Fri, Jul 06, 2012 at 02:00:18PM +0200, Frederic Weisbecker wrote:
> > > > --- a/arch/x86/Kconfig
> > > > +++ b/arch/x86/Kconfig
> > > > @@ -95,6 +95,7 @@ config X86
> > > > select KTIME_SCALAR if X86_32
> > > > select GENERIC_STRNCPY_FROM_USER
> > > > select GENERIC_STRNLEN_USER
> > > > +   select HAVE_RCU_USER_QS if X86_64
> > > 
> > > And I will bite yet again.  Why only 64-bit kernels?
> > 
> > Because HAVE_RCU_USER_QS requires an architecture-specific component,
> > and this patch series only added the necessary bits to entry_64.S.
> 
> OK, please allow me to rephrase the question.  Why only entry_64.S?  ;-)

So like I said, I prefer to wait for reviews and general opinion before
pushing further.


Re: [PATCH] trace: add ability to set a target task for events (v2)

2012-07-11 Thread Frederic Weisbecker
On Wed, Jul 11, 2012 at 06:14:58PM +0400, Andrew Vagin wrote:
> A few events are interesting not only for a current task.
> For example, sched_stat_* are interesting for a task, which
> wake up. For this reason, it will be good, if such events will
> be delivered to a target task too.
> 
> Now a target task can be set by using __perf_task().
> 
> The original idea and a draft patch belongs to Peter Zijlstra.
> 
> I need this events for profiling sleep times.  sched_switch is used for
> getting callchains and sched_stat_* is used for getting time periods.
> This events are combined in user space, then it can be analized by
> perf tools.

We've talked about that numerous times. But I still don't really
understand why you're not using sched_switch events and computing
the difference between schedule-in and schedule-out.

I think you said that's because you got too many events with
sched_switch. Are you losing events? Otherwise I don't see why it's
a problem.

Also, sched_stat_sleep produces an event whose period equals the
time slept. Internally, perf splits this into as many events as that period,
because the requested period for trace events is 1 by default. We should
probably allow sending events with a higher period than the one requested.
This sometimes produces a huge pile of events, and often even results in
tons of lost events. We definitely need to fix that.

In the meantime you'll certainly get saner results by just recording
sched_switch events.


Re: [PATCH] trace: add ability to set a target task for events (v2)

2012-07-11 Thread Frederic Weisbecker
On Wed, Jul 11, 2012 at 04:33:41PM +0200, Peter Zijlstra wrote:
> On Wed, 2012-07-11 at 16:31 +0200, Frederic Weisbecker wrote:
> > On Wed, Jul 11, 2012 at 06:14:58PM +0400, Andrew Vagin wrote:
> > > A few events are interesting not only for a current task.
> > > For example, sched_stat_* are interesting for a task, which
> > > wake up. For this reason, it will be good, if such events will
> > > be delivered to a target task too.
> > > 
> > > Now a target task can be set by using __perf_task().
> > > 
> > > The original idea and a draft patch belongs to Peter Zijlstra.
> > > 
> > > I need this events for profiling sleep times.  sched_switch is used for
> > > getting callchains and sched_stat_* is used for getting time periods.
> > > This events are combined in user space, then it can be analized by
> > > perf tools.
> > 
> > We've talked about that numerous times. But I still don't really
> > understand why you're not using sched switch events and compute
> > the difference between schedule in and schedule out.
> > 
> > I think you said that's because you got too much events with sched
> > switch. Are you loosing events? Otherwise I don't see why it's
> > a problem.
> > 
> > Also the sched_stat_sleep event produce an event which period equals the
> > time slept. Internally, perf split this into as many events as that period
> > because the requested period for trace events is 1 by default. We probably
> > should allow to send events with a higher number than the one requested. This
> > this produce sometimes a huge pile of events, and that even often result in
> > tons of lost events. We definetly need to fix that.
> > 
> > In the meantime you'll certainly get saner results by just recording
> > sched switch events.
> 
> Not really, there's an arbitrary large delay between wakeup and getting
> scheduled back in, which is unrelated to the cause that you went to
> sleep.
> 
> The wants the time between going to sleep and getting woken up,
> sched_switch simply doesn't give you that.

In this case he can just record sched_wakeup as well. With sched_switch
+ sched_wakeup, he's unlikely to lose events.

With sched_stat_sleep he will lose events, unless we fix this period
demux thing.
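Peter's objection can be made concrete with a little timestamp arithmetic. In this sketch (hypothetical struct and function names, nanosecond timestamps assumed), sched_switch alone yields the off-CPU time, which includes the runqueue delay after the wakeup; adding sched_wakeup separates the actual sleep from that delay.

```c
#include <stdint.h>

/* Hypothetical timestamps (ns) of the events seen for one sleep of a task. */
struct sketch_sleep_trace {
	uint64_t switch_out;	/* sched_switch with the task as prev */
	uint64_t wakeup;	/* sched_wakeup targeting the task */
	uint64_t switch_in;	/* sched_switch with the task as next */
};

/* Time actually spent asleep: from going to sleep until being woken. */
static uint64_t sleep_time(const struct sketch_sleep_trace *t)
{
	return t->wakeup - t->switch_out;
}

/* What sched_switch alone gives: sleep plus runqueue wait. */
static uint64_t off_cpu_time(const struct sketch_sleep_trace *t)
{
	return t->switch_in - t->switch_out;
}

/* Delay after the wakeup, unrelated to why the task went to sleep. */
static uint64_t runqueue_delay(const struct sketch_sleep_trace *t)
{
	return t->switch_in - t->wakeup;
}
```

For a task that switches out at 1000, is woken at 5000 and runs again at 7500, the sleep is 4000 ns while sched_switch alone reports 6500 ns.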


Re: [PATCH] trace: add ability to set a target task for events (v2)

2012-07-11 Thread Frederic Weisbecker
On Wed, Jul 11, 2012 at 04:38:19PM +0200, Peter Zijlstra wrote:
> On Wed, 2012-07-11 at 16:36 +0200, Frederic Weisbecker wrote:
> > 
> > In this case he can just record sched wakeup as well. With sched_switch
> > + sched_wakeup, he'll unlikely lose events.
> > 
> > With sched_stat_sleep he will lose events, unless we fix this period
> > demux thing. 
> 
> But without this patch, the sched_wakeup will belong to another task, so
> if you trace task A, and B wakes you, you'll never see the wakeup.

Ah, so the goal is to minimize the number of events by only tracing task A?
OK then. Still, we need to fix these events that use __perf_count(), because
system-wide tracing of sched_switch/sched_wakeup still generates fewer events
than sched_stat_sleep.

I believe:

perf record -e sched:sched_stat_sleep sleep 1

produces 1 billion events because we sleep for 1 billion nanoseconds, or
something like that.


Re: [PATCH] trace: add ability to set a target task for events (v2)

2012-07-11 Thread Frederic Weisbecker
On Wed, Jul 11, 2012 at 04:55:08PM +0200, Peter Zijlstra wrote:
> On Wed, 2012-07-11 at 16:48 +0200, Frederic Weisbecker wrote:
> > On Wed, Jul 11, 2012 at 04:38:19PM +0200, Peter Zijlstra wrote:
> > > On Wed, 2012-07-11 at 16:36 +0200, Frederic Weisbecker wrote:
> > > > 
> > > > In this case he can just record sched wakeup as well. With sched_switch
> > > > + sched_wakeup, he'll unlikely lose events.
> > > > 
> > > > With sched_stat_sleep he will lose events, unless we fix this period
> > > > demux thing. 
> > > 
> > > But without this patch, the sched_wakeup will belong to another task, so
> > > if you trace task A, and B wakes you, you'll never see the wakeup.
> > 
> > Ah so the goal is to minimize the amount of events by only tracing task A?
> 
> Right, or just not having sufficient privs to trace the world. And a
> wakeup of A is very much also part of A, not only the task doing the
> wakeup.
> 
> Hence the proposed mechanism.

Yeah that's fair.

> 
> > Ok then. Still we need to fix these events that use __perf_count() because
> > wide tracing of sched_switch/wake_up still generate less events than
> > sched stat sleep.
> > 
> > I believe:
> > 
> > perf record -e sched:sched_stat_sleep sleep 1
> > 
> > produces 1 billion events because we sleep 1 billion nanosecs. Or
> > something like that.
> 
> Right.. back when I did that the plan was to make PERF_SAMPLE_PERIOD fix
> that, of course that never seemed to have happened.
> 
> With PERF_SAMPLE_PERIOD you can simply write the 1b into the period of 1
> event and be done with it.

I believe the perf tools handle variable event periods pretty well
through PERF_SAMPLE_PERIOD. We just need to tweak the maths in
perf_swevent_overflow(), I think...
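The cost of the period demux and the single-sample alternative can be compared with a simplified model. This is not the perf_swevent_overflow() code: `struct sketch_counter` and both functions are hypothetical, and the numbers assume a sample period of 1 with sleep time counted in nanoseconds, as in the sleep 1 estimate above.

```c
#include <stdint.h>

/* Hypothetical software counter with a fixed sample period. */
struct sketch_counter {
	uint64_t sample_period;	/* 1 by default for trace events */
	uint64_t left;		/* count carried toward the next sample */
};

/* Fixed period: one sample per crossed period of the added count. */
static uint64_t samples_fixed_period(struct sketch_counter *c, uint64_t count)
{
	uint64_t total = c->left + count;
	uint64_t samples = total / c->sample_period;

	c->left = total % c->sample_period;
	return samples;
}

/* Variable period: a single sample whose period field is the whole count. */
static uint64_t samples_variable_period(uint64_t count, uint64_t *period_out)
{
	*period_out = count;
	return count ? 1 : 0;
}
```

A one-second sleep then costs one billion samples with the fixed period of 1, versus one sample carrying a period of one billion.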


Re: [PATCH] trace: add ability to set a target task for events (v2)

2012-07-11 Thread Frederic Weisbecker
On Wed, Jul 11, 2012 at 05:12:04PM +0200, Peter Zijlstra wrote:
> On Wed, 2012-07-11 at 16:55 +0200, Peter Zijlstra wrote:
> > Right.. back when I did that the plan was to make PERF_SAMPLE_PERIOD fix
> > that, of course that never seemed to have happened.
> > 
> > With PERF_SAMPLE_PERIOD you can simply write the 1b into the period of 1
> > event and be done with it. 
> 
> It did! Andrew fixed it..

Ah! Then maybe we need to force PERF_SAMPLE_PERIOD on tracepoints from
the perf tools. I need to check that.


[RFC PATCH 00/11] rcu: Userspace RCU extended quiescent state v2

2012-07-11 Thread Frederic Weisbecker
Hi,

There are significant changes this time. I went back to using
a TIF flag to hook into the syscall slow path, and put the hooks in the
high-level exception handlers instead of the low-level ones.

This makes the code more portable between x86-32 and x86-64,
makes the hooks clearer and easier to review, and lowers the
overhead in the off case. This could be even better if we use
jump labels later.

Thanks.

git://github.com/fweisbec/linux-dynticks.git
rcu/user-2

Frederic Weisbecker (11):
  rcu: Settle config for userspace extended quiescent state
  rcu: Allow rcu_user_enter()/exit() to nest
  rcu: Ignore userspace extended quiescent state by default
  rcu: Switch task's syscall hooks on context switch
  x86: Syscall hooks for userspace RCU extended QS
  x86: Exception hooks for userspace RCU extended QS
  rcu: Exit RCU extended QS on kernel preemption after irq/exception
  rcu: Exit RCU extended QS on user preemption
  x86: Use the new schedule_user API on userspace preemption
  x86: Exit RCU extended QS on notify resume
  rcu: Userspace RCU extended QS selftest

 arch/Kconfig   |   10 ++
 arch/um/drivers/mconsole_kern.c|2 +-
 arch/x86/Kconfig   |1 +
 arch/x86/include/asm/rcu.h |   20 +++
 arch/x86/include/asm/thread_info.h |   10 --
 arch/x86/kernel/entry_64.S |8 ++--
 arch/x86/kernel/ptrace.c   |5 +++
 arch/x86/kernel/signal.c   |4 ++
 arch/x86/kernel/traps.c|   30 
 arch/x86/mm/fault.c|   13 ++-
 include/linux/rcupdate.h   |   10 ++
 include/linux/sched.h  |   20 ++-
 init/Kconfig   |   18 ++
 kernel/rcutree.c   |   64 +++-
 kernel/rcutree.h   |4 ++
 kernel/sched/core.c|   10 +-
 16 files changed, 192 insertions(+), 37 deletions(-)
 create mode 100644 arch/x86/include/asm/rcu.h

-- 
1.7.5.4



[PATCH 04/11] rcu: Switch task's syscall hooks on context switch

2012-07-11 Thread Frederic Weisbecker
Clear a task's syscall hook when it is scheduled out, so that if
the task migrates, it doesn't run the syscall slow path on a CPU
that might not need it.

Also set the syscall hook on the next task if needed.
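The context-switch handling described above can be sketched as follows. The names are hypothetical: `syscall_hook` stands in for the per-task TIF flag and `user_qs_enabled` for whatever per-CPU state decides whether the hook is needed. The actual condition used by rcu_user_hooks_switch() is not visible in this part of the patch.

```c
#include <stdbool.h>

/* Hypothetical per-task flag forcing the syscall slow path. */
struct sketch_task {
	bool syscall_hook;
};

/* Hypothetical per-CPU state: does this CPU track userspace as RCU EQS? */
struct sketch_cpu_state {
	bool user_qs_enabled;
};

static void sketch_hooks_switch(const struct sketch_cpu_state *cpu,
				struct sketch_task *prev,
				struct sketch_task *next)
{
	/* prev may migrate to a CPU that doesn't need the slow path */
	prev->syscall_hook = false;
	/* next takes the slow path only if this CPU needs the hooks */
	next->syscall_hook = cpu->user_qs_enabled;
}
```

The flag then follows the CPU's requirement rather than the task's history, which is the point of clearing it on schedule-out.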

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Avi Kivity 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Josh Triplett 
Cc: Kevin Hilman 
Cc: Max Krasnyansky 
Cc: Peter Zijlstra 
Cc: Stephen Hemminger 
Cc: Steven Rostedt 
Cc: Sven-Thorsten Dietrich 
Cc: Thomas Gleixner 
---
 arch/um/drivers/mconsole_kern.c |2 +-
 include/linux/rcupdate.h|2 ++
 include/linux/sched.h   |   20 +++-
 kernel/rcutree.c|   15 +++
 kernel/sched/core.c |2 +-
 5 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
index 88e466b..e61922d 100644
--- a/arch/um/drivers/mconsole_kern.c
+++ b/arch/um/drivers/mconsole_kern.c
@@ -705,7 +705,7 @@ static void stack_proc(void *arg)
struct task_struct *from = current, *to = arg;
 
to->thread.saved_task = from;
-   rcu_switch_from(from);
+   rcu_switch(from, to);
switch_to(from, to, from);
 }
 
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index a72f25e..1e57888 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -198,6 +198,8 @@ extern void rcu_user_enter(void);
 extern void rcu_user_exit(void);
 extern void rcu_user_enter_irq(void);
 extern void rcu_user_exit_irq(void);
+extern void rcu_user_hooks_switch(struct task_struct *prev,
+ struct task_struct *next);
 #else
 static inline void rcu_user_enter(void) { }
 static inline void rcu_user_exit(void) { }
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4059c0f..e17fcd0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1871,23 +1871,25 @@ static inline void rcu_copy_process(struct task_struct *p)
INIT_LIST_HEAD(&p->rcu_node_entry);
 }
 
-static inline void rcu_switch_from(struct task_struct *prev)
-{
-   if (prev->rcu_read_lock_nesting != 0)
-   rcu_preempt_note_context_switch();
-}
-
 #else
 
 static inline void rcu_copy_process(struct task_struct *p)
 {
 }
 
-static inline void rcu_switch_from(struct task_struct *prev)
-{
-}
+#endif
 
+static inline void rcu_switch(struct task_struct *prev,
+ struct task_struct *next)
+{
+#ifdef CONFIG_PREEMPT_RCU
+   if (prev->rcu_read_lock_nesting != 0)
+   rcu_preempt_note_context_switch();
+#endif
+#ifdef CONFIG_RCU_USER_QS
+   rcu_user_hooks_switch(prev, next);
 #endif
+}
 
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p,
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 78b0c30..2d79308 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -720,6 +720,21 @@ int rcu_is_cpu_idle(void)
 }
 EXPORT_SYMBOL(rcu_is_cpu_idle);
 
+#ifdef CONFIG_RCU_USER_QS
+void rcu_user_hooks_switch(struct task_struct *prev,
+  struct task_struct *next)
+{
+   struct rcu_dynticks *rdtp;
+
+   /* Interrupts are disabled in context switch */
+   rdtp = &__get_cpu_var(rcu_dynticks);
+   if (!rdtp->ignore_user_qs) {
+   clear_tsk_thread_flag(prev, TIF_NOHZ);
+   set_tsk_thread_flag(next, TIF_NOHZ);
+   }
+}
+#endif
+
 #if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU)
 
 /*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d5594a4..fa61d8a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2081,7 +2081,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 #endif
 
/* Here we just switch the register state and the stack. */
-   rcu_switch_from(prev);
+   rcu_switch(prev, next);
switch_to(prev, next, prev);
 
barrier();
-- 
1.7.5.4



[PATCH 05/11] x86: Syscall hooks for userspace RCU extended QS

2012-07-11 Thread Frederic Weisbecker
Add syscall slow path hooks to notify syscall entry
and exit on CPUs that want to support userspace RCU
extended quiescent state.
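Adding `_TIF_NOHZ` to the syscall work masks is what routes a task through the hooks. A userspace sketch of that gating, assuming a single global `in_user` flag models RCU's view of the CPU; the `model_*` names and the second work bit are hypothetical, and only bit position 19 comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_TIF_NOHZ       (1u << 19)  /* bit number from the patch */
#define MODEL_TIF_OTHER_WORK (1u << 0)   /* hypothetical stand-in for trace/audit bits */

#define MODEL_WORK_SYSCALL_ENTRY (MODEL_TIF_OTHER_WORK | MODEL_TIF_NOHZ)
#define MODEL_WORK_SYSCALL_EXIT  (MODEL_TIF_OTHER_WORK | MODEL_TIF_NOHZ)

static bool in_user = true;   /* RCU's view: this CPU is running userspace */

static void model_rcu_user_exit(void)  { in_user = false; }
static void model_rcu_user_enter(void) { in_user = true; }

/* Entry: the slow path (syscall_trace_enter() in the patch) only
 * runs when a work bit is set; returns true if it ran. */
static bool model_syscall_entry(unsigned int ti_flags)
{
	if (!(ti_flags & MODEL_WORK_SYSCALL_ENTRY))
		return false;            /* fast path: RCU is not told */
	model_rcu_user_exit();           /* first thing in the slow path */
	return true;
}

/* Exit: rcu_user_enter() is the last thing in the slow path. */
static bool model_syscall_exit(unsigned int ti_flags)
{
	if (!(ti_flags & MODEL_WORK_SYSCALL_EXIT))
		return false;
	model_rcu_user_enter();
	return true;
}
```

The price is that every syscall from a flagged task takes the slow path, which is why the series leaves the feature off by default.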

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Avi Kivity 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Josh Triplett 
Cc: Kevin Hilman 
Cc: Max Krasnyansky 
Cc: Peter Zijlstra 
Cc: Stephen Hemminger 
Cc: Steven Rostedt 
Cc: Sven-Thorsten Dietrich 
Cc: Thomas Gleixner 
---
 arch/x86/include/asm/thread_info.h |   10 +++---
 arch/x86/kernel/ptrace.c   |5 +
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 89f794f..c535d84 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -89,6 +89,7 @@ struct thread_info {
 #define TIF_NOTSC  16  /* TSC is not accessible in userland */
 #define TIF_IA32   17  /* IA32 compatibility process */
 #define TIF_FORK   18  /* ret_from_fork */
+#define TIF_NOHZ   19  /* in adaptive nohz mode */
 #define TIF_MEMDIE 20  /* is terminating due to OOM killer */
 #define TIF_DEBUG  21  /* uses debug registers */
 #define TIF_IO_BITMAP  22  /* uses I/O bitmap */
@@ -114,6 +115,7 @@ struct thread_info {
 #define _TIF_NOTSC (1 << TIF_NOTSC)
 #define _TIF_IA32  (1 << TIF_IA32)
 #define _TIF_FORK  (1 << TIF_FORK)
+#define _TIF_NOHZ  (1 << TIF_NOHZ)
 #define _TIF_DEBUG (1 << TIF_DEBUG)
 #define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
 #define _TIF_FORCED_TF (1 << TIF_FORCED_TF)
@@ -126,12 +128,13 @@ struct thread_info {
 /* work to do in syscall_trace_enter() */
 #define _TIF_WORK_SYSCALL_ENTRY\
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT |   \
-_TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT)
+_TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT | \
+_TIF_NOHZ)
 
 /* work to do in syscall_trace_leave() */
 #define _TIF_WORK_SYSCALL_EXIT \
(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SINGLESTEP |\
-_TIF_SYSCALL_TRACEPOINT)
+_TIF_SYSCALL_TRACEPOINT | _TIF_NOHZ)
 
 /* work to do on interrupt/exception return */
 #define _TIF_WORK_MASK \
@@ -141,7 +144,8 @@ struct thread_info {
 
 /* work to do on any return to user space */
 #define _TIF_ALLWORK_MASK  \
-   ((0x & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT)
+   ((0x & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT |   \
+   _TIF_NOHZ)
 
 /* Only used for 64 bit */
 #define _TIF_DO_NOTIFY_MASK\
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index c4c6a5c..9f94f8e 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1463,6 +1464,8 @@ long syscall_trace_enter(struct pt_regs *regs)
 {
long ret = 0;
 
+   rcu_user_exit();
+
/*
 * If we stepped into a sysenter/syscall insn, it trapped in
 * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP.
@@ -1526,4 +1529,6 @@ void syscall_trace_leave(struct pt_regs *regs)
!test_thread_flag(TIF_SYSCALL_EMU);
if (step || test_thread_flag(TIF_SYSCALL_TRACE))
tracehook_report_syscall_exit(regs, step);
+
+   rcu_user_enter();
 }
-- 
1.7.5.4



[PATCH 06/11] x86: Exception hooks for userspace RCU extended QS

2012-07-11 Thread Frederic Weisbecker
Add the necessary hooks to x86 exceptions for userspace
RCU extended quiescent state support.

This includes traps, page faults, debug exceptions, etc.
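The pairing in `exception_enter()`/`exception_exit()` can be modelled in userspace as follows, assuming a `bool from_user` argument stands in for `user_mode(regs)` and a global `in_user` for RCU's per-CPU state; all `model_*` names are hypothetical. The asymmetry is the point: entry always leaves the quiescent state (safe because the calls may nest), while exit re-enters it only when the exception interrupted userspace.

```c
#include <assert.h>
#include <stdbool.h>

static bool in_user;   /* RCU's view of the CPU */

/* Both calls are idempotent, so redundant calls are harmless. */
static void model_rcu_user_exit(void)  { in_user = false; }
static void model_rcu_user_enter(void) { in_user = true; }

static void model_exception_enter(bool from_user)
{
	(void)from_user;        /* unconditional: no need to check the origin */
	model_rcu_user_exit();
}

static void model_exception_exit(bool from_user)
{
	/* A fault taken in kernel mode (e.g. during a user copy) must
	 * not leave RCU believing the CPU is in userspace. */
	if (from_user)
		model_rcu_user_enter();
}
```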

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Avi Kivity 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Josh Triplett 
Cc: Kevin Hilman 
Cc: Max Krasnyansky 
Cc: Peter Zijlstra 
Cc: Stephen Hemminger 
Cc: Steven Rostedt 
Cc: Sven-Thorsten Dietrich 
Cc: Thomas Gleixner 
---
 arch/x86/include/asm/rcu.h |   20 
 arch/x86/kernel/traps.c|   30 ++
 arch/x86/mm/fault.c|   13 +++--
 3 files changed, 53 insertions(+), 10 deletions(-)
 create mode 100644 arch/x86/include/asm/rcu.h

diff --git a/arch/x86/include/asm/rcu.h b/arch/x86/include/asm/rcu.h
new file mode 100644
index 000..439815b
--- /dev/null
+++ b/arch/x86/include/asm/rcu.h
@@ -0,0 +1,20 @@
+#ifndef _ASM_X86_RCU_H
+#define _ASM_X86_RCU_H
+
+#include 
+#include 
+
+static inline void exception_enter(struct pt_regs *regs)
+{
+   rcu_user_exit();
+}
+
+static inline void exception_exit(struct pt_regs *regs)
+{
+#ifdef CONFIG_RCU_USER_QS
+   if (user_mode(regs))
+   rcu_user_enter();
+#endif
+}
+
+#endif
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 05b31d9..9b8195b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -178,11 +179,15 @@ vm86_trap:
 #define DO_ERROR(trapnr, signr, str, name) \
 dotraplinkage void do_##name(struct pt_regs *regs, long error_code)\
 {  \
-   if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr)  \
-   == NOTIFY_STOP) \
+   exception_enter(regs);  \
+   if (notify_die(DIE_TRAP, str, regs, error_code, \
+   trapnr, signr) == NOTIFY_STOP) {\
+   exception_exit(regs);   \
return; \
+   }   \
conditional_sti(regs);  \
do_trap(trapnr, signr, str, regs, error_code, NULL);\
+   exception_exit(regs);   \
 }
 
 #define DO_ERROR_INFO(trapnr, signr, str, name, sicode, siaddr)	\
@@ -193,11 +198,15 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code)   \
info.si_errno = 0;  \
info.si_code = sicode;  \
info.si_addr = (void __user *)siaddr;   \
-   if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr)  \
-   == NOTIFY_STOP) \
+   exception_enter(regs);  \
+   if (notify_die(DIE_TRAP, str, regs, error_code, \
+   trapnr, signr) == NOTIFY_STOP) {\
+   exception_exit(regs);   \
return; \
+   }   \
conditional_sti(regs);  \
do_trap(trapnr, signr, str, regs, error_code, &info);   \
+   exception_exit(regs);   \
 }
 
 DO_ERROR_INFO(X86_TRAP_DE, SIGFPE, "divide error", divide_error, FPE_INTDIV,
@@ -311,6 +320,7 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
ftrace_int3_handler(regs))
return;
 #endif
+   exception_enter(regs);
 #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
SIGTRAP) == NOTIFY_STOP)
@@ -330,6 +340,7 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
do_trap(X86_TRAP_BP, SIGTRAP, "int3", regs, error_code, NULL);
preempt_conditional_cli(regs);
debug_stack_usage_dec();
+   exception_exit(regs);
 }
 
 #ifdef CONFIG_X86_64
@@ -390,6 +401,8 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
unsigned long dr6;
int si_code;
 
+   exception_enter(regs);
+
get_debugreg(dr6, 6);
 
/* Filter out all the reserved bits which are preset to 1 */
@@ -405,7 +418,7 @@ dotraplinkage void __kprobes do_debug(stru

[PATCH 09/11] x86: Use the new schedule_user API on userspace preemption

2012-07-11 Thread Frederic Weisbecker
This way we can exit the RCU extended quiescent state before
we schedule a new task from irq/exception exit.

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Avi Kivity 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Josh Triplett 
Cc: Kevin Hilman 
Cc: Max Krasnyansky 
Cc: Peter Zijlstra 
Cc: Stephen Hemminger 
Cc: Steven Rostedt 
Cc: Sven-Thorsten Dietrich 
Cc: Thomas Gleixner 
---
 arch/x86/kernel/entry_64.S |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 7d65133..e97d42d 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -565,7 +565,7 @@ sysret_careful:
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
pushq_cfi %rdi
-   call schedule
+   call schedule_user
popq_cfi %rdi
jmp sysret_check
 
@@ -678,7 +678,7 @@ int_careful:
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
pushq_cfi %rdi
-   call schedule
+   call schedule_user
popq_cfi %rdi
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
@@ -974,7 +974,7 @@ retint_careful:
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
pushq_cfi %rdi
-   call  schedule
+   call  schedule_user
popq_cfi %rdi
GET_THREAD_INFO(%rcx)
DISABLE_INTERRUPTS(CLBR_NONE)
@@ -1467,7 +1467,7 @@ paranoid_userspace:
 paranoid_schedule:
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_ANY)
-   call schedule
+   call schedule_user
DISABLE_INTERRUPTS(CLBR_ANY)
TRACE_IRQS_OFF
jmp paranoid_userspace
-- 
1.7.5.4



[PATCH 10/11] x86: Exit RCU extended QS on notify resume

2012-07-11 Thread Frederic Weisbecker
do_notify_resume() may be called on irq or exception
exit. But at that time the exception has already called
rcu_user_enter() and the irq has already called rcu_irq_exit().

Since it can use RCU read-side critical sections, we must call
rcu_user_exit() before doing anything there. Then we must call
rcu_user_enter() again at the end of this function because we know
we are going to userspace from there.

This completes support for userspace RCU extended quiescent state
on x86-64.

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Avi Kivity 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Josh Triplett 
Cc: Kevin Hilman 
Cc: Max Krasnyansky 
Cc: Peter Zijlstra 
Cc: Stephen Hemminger 
Cc: Steven Rostedt 
Cc: Sven-Thorsten Dietrich 
Cc: Thomas Gleixner 
---
 arch/x86/Kconfig |1 +
 arch/x86/kernel/signal.c |4 
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c70684f..38dfcc2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -95,6 +95,7 @@ config X86
select KTIME_SCALAR if X86_32
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
+   select HAVE_RCU_USER_QS if X86_64
 
 config INSTRUCTION_DECODER
def_bool (KPROBES || PERF_EVENTS || UPROBES)
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 21af737..5cc2579 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -776,6 +776,8 @@ static void do_signal(struct pt_regs *regs)
 void
 do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
 {
+   rcu_user_exit();
+
 #ifdef CONFIG_X86_MCE
/* notify userspace of pending MCEs */
if (thread_info_flags & _TIF_MCE_NOTIFY)
@@ -801,6 +803,8 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
 #ifdef CONFIG_X86_32
clear_thread_flag(TIF_IRET);
 #endif /* CONFIG_X86_32 */
+
+   rcu_user_enter();
 }
 
 void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
-- 
1.7.5.4



[PATCH 11/11] rcu: Userspace RCU extended QS selftest

2012-07-11 Thread Frederic Weisbecker
Provide a config option that enables the userspace
RCU extended quiescent state on every CPU by default.

This is for testing purposes.
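The default selection in the per-CPU initializer reduces to the preprocessor test below. In this sketch `CONFIG_RCU_USER_QS` and `CONFIG_RCU_USER_QS_FORCE` are ordinary macros defined by hand to mimic the Kconfig choices, and `struct model_dynticks` (including its `dynticks_nesting = 1` value) is a hypothetical stand-in for `struct rcu_dynticks`.

```c
#include <assert.h>
#include <stdbool.h>

/* Mimic Kconfig: RCU_USER_QS=y, RCU_USER_QS_FORCE=n. */
#define CONFIG_RCU_USER_QS 1
/* #define CONFIG_RCU_USER_QS_FORCE 1 */

struct model_dynticks {
	long long dynticks_nesting;
	bool ignore_user_qs;   /* true: userspace is NOT treated as an extended QS */
};

static const struct model_dynticks model_default = {
	.dynticks_nesting = 1,
#if defined(CONFIG_RCU_USER_QS) && !defined(CONFIG_RCU_USER_QS_FORCE)
	.ignore_user_qs = true,    /* built in, but dormant until enabled */
#endif
	/* With FORCE set, the field is zero-initialized: hooks active everywhere. */
};
```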

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Avi Kivity 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Josh Triplett 
Cc: Kevin Hilman 
Cc: Max Krasnyansky 
Cc: Peter Zijlstra 
Cc: Stephen Hemminger 
Cc: Steven Rostedt 
Cc: Sven-Thorsten Dietrich 
Cc: Thomas Gleixner 
---
 init/Kconfig |8 
 kernel/rcutree.c |2 +-
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 3a4af8f..7d1db2e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -451,6 +451,14 @@ config RCU_USER_QS
  excluded from the global RCU state machine and thus doesn't
  to keep the timer tick on for RCU.
 
+config RCU_USER_QS_FORCE
+   bool "Force userspace extended QS by default"
+   depends on RCU_USER_QS
+   help
+ Set the hooks in user/kernel boundaries by default in order to
+ test this feature that treats userspace as an extended quiescent
+ state until we have a real user like a full adaptive nohz option.
+
 config RCU_FANOUT
int "Tree-based hierarchical RCU fanout value"
range 2 64 if 64BIT
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 2d79308..9427aba 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -209,7 +209,7 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
 DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
.dynticks_nesting = DYNTICK_TASK_EXIT_IDLE,
.dynticks = ATOMIC_INIT(1),
-#ifdef CONFIG_RCU_USER_QS
+#if defined(CONFIG_RCU_USER_QS) && !defined(CONFIG_RCU_USER_QS_FORCE)
.ignore_user_qs = true,
 #endif
 };
-- 
1.7.5.4



[PATCH 07/11] rcu: Exit RCU extended QS on kernel preemption after irq/exception

2012-07-11 Thread Frederic Weisbecker
When an exception or an irq exits, and we are going to resume into
interrupted kernel code, the low level architecture code calls
preempt_schedule_irq() if there is a need to reschedule.

If the interrupt/exception occurred between a call to rcu_user_enter()
(from syscall exit, exception exit, do_notify_resume exit, ...) and
a real resume to userspace (iret,...), preempt_schedule_irq() can be
called while RCU thinks we are in userspace. But preempt_schedule_irq()
is going to run kernel code, possibly including RCU read-side critical
sections. We must exit the userspace extended quiescent state before
we call it.

To solve this, just call rcu_user_exit() at the beginning of
preempt_schedule_irq().

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Avi Kivity 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Josh Triplett 
Cc: Kevin Hilman 
Cc: Max Krasnyansky 
Cc: Peter Zijlstra 
Cc: Stephen Hemminger 
Cc: Steven Rostedt 
Cc: Sven-Thorsten Dietrich 
Cc: Thomas Gleixner 
---
 kernel/sched/core.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fa61d8a..1e0fa5b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3433,6 +3433,7 @@ asmlinkage void __sched preempt_schedule_irq(void)
/* Catch callers which need to be fixed */
BUG_ON(ti->preempt_count || !irqs_disabled());
 
+   rcu_user_exit();
do {
add_preempt_count(PREEMPT_ACTIVE);
local_irq_enable();
-- 
1.7.5.4



[PATCH 01/11] rcu: Settle config for userspace extended quiescent state

2012-07-11 Thread Frederic Weisbecker
Create a new config option under the RCU menu that puts
CPUs under RCU extended quiescent state (as in dynticks
idle mode) when they run in userspace. This requires
some contribution from architectures to hook into kernel
and userspace boundaries.
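The header side of this patch uses a common kernel idiom: real declarations when the option is enabled, empty `static inline` stubs otherwise, so architecture hooks can call the API without `#ifdef`s of their own. A self-contained sketch of the idiom, assuming a hand-defined `MODEL_RCU_USER_QS` macro in place of the Kconfig symbol and a hypothetical counter in place of the real state change:

```c
#include <assert.h>

#define MODEL_RCU_USER_QS 1   /* comment out to get the no-op stubs */

#ifdef MODEL_RCU_USER_QS
static int user_qs_transitions;            /* hypothetical bookkeeping */
static void model_rcu_user_enter(void) { user_qs_transitions++; }
static void model_rcu_user_exit(void)  { user_qs_transitions++; }
#else
/* Compile to nothing: callers need no #ifdef. */
static inline void model_rcu_user_enter(void) { }
static inline void model_rcu_user_exit(void) { }
#endif

/* An arch hook calls the API unconditionally. */
static void model_syscall_roundtrip(void)
{
	model_rcu_user_exit();    /* kernel entry */
	model_rcu_user_enter();   /* return to userspace */
}
```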

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Avi Kivity 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Josh Triplett 
Cc: Kevin Hilman 
Cc: Max Krasnyansky 
Cc: Peter Zijlstra 
Cc: Stephen Hemminger 
Cc: Steven Rostedt 
Cc: Sven-Thorsten Dietrich 
Cc: Thomas Gleixner 
---
 arch/Kconfig |   10 ++
 include/linux/rcupdate.h |8 
 init/Kconfig |   10 ++
 kernel/rcutree.c |5 -
 4 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 8c3d957..1c7c9be 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -274,4 +274,14 @@ config SECCOMP_FILTER
 
  See Documentation/prctl/seccomp_filter.txt for details.
 
+config HAVE_RCU_USER_QS
+   bool
+   help
+ Provide kernel entry/exit hooks necessary for userspace
+ RCU extended quiescent state. Syscalls need to be wrapped inside
+ rcu_user_exit()-rcu_user_enter() through the slow path using
+ TIF_NOHZ flag. Exception handlers must be wrapped as well. Irqs
+ are already protected inside rcu_irq_enter/rcu_irq_exit() but
+ preemption or signal handling on irq exit still needs to be protected.
+
 source "kernel/gcov/Kconfig"
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 148f381..a72f25e 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -192,10 +192,18 @@ extern void rcu_idle_enter(void);
 extern void rcu_idle_exit(void);
 extern void rcu_irq_enter(void);
 extern void rcu_irq_exit(void);
+
+#ifdef CONFIG_RCU_USER_QS
 extern void rcu_user_enter(void);
 extern void rcu_user_exit(void);
 extern void rcu_user_enter_irq(void);
 extern void rcu_user_exit_irq(void);
+#else
+static inline void rcu_user_enter(void) { }
+static inline void rcu_user_exit(void) { }
+#endif /* CONFIG_RCU_USER_QS */
+
+
 extern void exit_rcu(void);
 
 /**
diff --git a/init/Kconfig b/init/Kconfig
index d07dcf9..3a4af8f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -441,6 +441,16 @@ config PREEMPT_RCU
  This option enables preemptible-RCU code that is common between
  the TREE_PREEMPT_RCU and TINY_PREEMPT_RCU implementations.
 
+config RCU_USER_QS
+   bool "Consider userspace as in RCU extended quiescent state"
+   depends on HAVE_RCU_USER_QS && SMP
+   help
+ This option sets hooks on kernel / userspace boundaries and
+ puts RCU in extended quiescent state when the CPU runs in
+ userspace. It means that when a CPU runs in userspace, it is
+ excluded from the global RCU state machine and thus doesn't
+ to keep the timer tick on for RCU.
+
 config RCU_FANOUT
int "Tree-based hierarchical RCU fanout value"
range 2 64 if 64BIT
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 5541a07..efa5983 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -423,6 +423,7 @@ void rcu_idle_enter(void)
 }
 EXPORT_SYMBOL_GPL(rcu_idle_enter);
 
+#ifdef CONFIG_RCU_USER_QS
 /**
  * rcu_user_enter - inform RCU that we are resuming userspace.
  *
@@ -437,7 +438,6 @@ void rcu_user_enter(void)
 }
 EXPORT_SYMBOL_GPL(rcu_user_enter);
 
-
 /**
  * rcu_user_enter_irq - inform RCU that we are going to resume userspace
  * after the current irq returns.
@@ -458,6 +458,7 @@ void rcu_user_enter_irq(void)
rdtp->dynticks_nesting = 1;
local_irq_restore(flags);
 }
+#endif
 
 /**
  * rcu_irq_exit - inform RCU that current CPU is exiting irq towards idle
@@ -561,6 +562,7 @@ void rcu_idle_exit(void)
 }
 EXPORT_SYMBOL_GPL(rcu_idle_exit);
 
+#ifdef CONFIG_RCU_USER_QS
 /**
  * rcu_user_exit - inform RCU that we are exiting userspace.
  *
@@ -594,6 +596,7 @@ void rcu_user_exit_irq(void)
rdtp->dynticks_nesting += DYNTICK_TASK_EXIT_IDLE;
local_irq_restore(flags);
 }
+#endif
 
 /**
  * rcu_irq_enter - inform RCU that current CPU is entering irq away from idle
-- 
1.7.5.4



[PATCH 02/11] rcu: Allow rcu_user_enter()/exit() to nest

2012-07-11 Thread Frederic Weisbecker
Allow calls to rcu_user_enter() even if we are already
in userspace (as seen by RCU) and allow calls to rcu_user_exit()
even if we are already in the kernel.

This makes the APIs more flexible for architectures to call.
Exception entries, for example, won't need to know if they come from
userspace before calling rcu_user_exit().
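A userspace sketch of the `in_user` guard, assuming counters stand in for the real `rcu_eqs_enter(1)`/`rcu_eqs_exit(1)` calls; `struct model_dynticks` and the `model_*` functions are hypothetical. It shows the property this change relies on: redundant calls collapse into no-ops, so the dynticks nesting count is only touched on a genuine transition. (The kernel version does this with local irqs disabled.)

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the in_user guard added to struct rcu_dynticks. */
struct model_dynticks {
	bool in_user;     /* is the CPU in userland from RCU's point of view? */
	int eqs_enters;   /* real extended-QS entries performed */
	int eqs_exits;    /* real extended-QS exits performed */
};

static void model_rcu_user_enter(struct model_dynticks *d)
{
	if (!d->in_user) {        /* already "in user": nothing to do */
		d->in_user = true;
		d->eqs_enters++;
	}
}

static void model_rcu_user_exit(struct model_dynticks *d)
{
	if (d->in_user) {         /* already in the kernel: nothing to do */
		d->in_user = false;
		d->eqs_exits++;
	}
	/* Transitions stay balanced however the calls are nested. */
	assert(d->eqs_exits <= d->eqs_enters);
}
```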

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Avi Kivity 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Josh Triplett 
Cc: Kevin Hilman 
Cc: Max Krasnyansky 
Cc: Peter Zijlstra 
Cc: Stephen Hemminger 
Cc: Steven Rostedt 
Cc: Sven-Thorsten Dietrich 
Cc: Thomas Gleixner 
---
 kernel/rcutree.c |   41 +
 kernel/rcutree.h |3 +++
 2 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index efa5983..d5df618 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -389,11 +389,9 @@ static void rcu_eqs_enter_common(struct rcu_dynticks *rdtp, long long oldval,
  */
 static void rcu_eqs_enter(bool user)
 {
-   unsigned long flags;
long long oldval;
struct rcu_dynticks *rdtp;
 
-   local_irq_save(flags);
rdtp = &__get_cpu_var(rcu_dynticks);
oldval = rdtp->dynticks_nesting;
WARN_ON_ONCE((oldval & DYNTICK_TASK_NEST_MASK) == 0);
@@ -402,7 +400,6 @@ static void rcu_eqs_enter(bool user)
else
rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE;
rcu_eqs_enter_common(rdtp, oldval, user);
-   local_irq_restore(flags);
 }
 
 /**
@@ -419,7 +416,11 @@ static void rcu_eqs_enter(bool user)
  */
 void rcu_idle_enter(void)
 {
+   unsigned long flags;
+
+   local_irq_save(flags);
rcu_eqs_enter(0);
+   local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_idle_enter);
 
@@ -434,7 +435,18 @@ EXPORT_SYMBOL_GPL(rcu_idle_enter);
  */
 void rcu_user_enter(void)
 {
-   rcu_eqs_enter(1);
+   unsigned long flags;
+   struct rcu_dynticks *rdtp;
+
+   WARN_ON_ONCE(!current->mm);
+
+   local_irq_save(flags);
+   rdtp = &__get_cpu_var(rcu_dynticks);
+   if (!rdtp->in_user) {
+   rdtp->in_user = true;
+   rcu_eqs_enter(1);
+   }
+   local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_user_enter);
 
@@ -529,11 +541,9 @@ static void rcu_eqs_exit_common(struct rcu_dynticks *rdtp, long long oldval,
  */
 static void rcu_eqs_exit(bool user)
 {
-   unsigned long flags;
struct rcu_dynticks *rdtp;
long long oldval;
 
-   local_irq_save(flags);
rdtp = &__get_cpu_var(rcu_dynticks);
oldval = rdtp->dynticks_nesting;
WARN_ON_ONCE(oldval < 0);
@@ -542,7 +552,6 @@ static void rcu_eqs_exit(bool user)
else
rdtp->dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
rcu_eqs_exit_common(rdtp, oldval, user);
-   local_irq_restore(flags);
 }
 
 /**
@@ -558,7 +567,11 @@ static void rcu_eqs_exit(bool user)
  */
 void rcu_idle_exit(void)
 {
+   unsigned long flags;
+
+   local_irq_save(flags);
rcu_eqs_exit(0);
+   local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_idle_exit);
 
@@ -571,7 +584,16 @@ EXPORT_SYMBOL_GPL(rcu_idle_exit);
  */
 void rcu_user_exit(void)
 {
-   rcu_eqs_exit(1);
+   unsigned long flags;
+   struct rcu_dynticks *rdtp;
+
+   local_irq_save(flags);
+   rdtp = &__get_cpu_var(rcu_dynticks);
+   if (rdtp->in_user) {
+   rdtp->in_user = false;
+   rcu_eqs_exit(1);
+   }
+   local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_user_exit);
 
@@ -2660,6 +2682,9 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
rdp->dynticks = &per_cpu(rcu_dynticks, cpu);
WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_EXIT_IDLE);
WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1);
+#ifdef CONFIG_RCU_USER_QS
+   WARN_ON_ONCE(rdp->dynticks->in_user);
+#endif
rdp->cpu = cpu;
rdp->rsp = rsp;
raw_spin_unlock_irqrestore(&rnp->lock, flags);
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index cad96cb..4d82cb5 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -102,6 +102,9 @@ struct rcu_dynticks {
/* idle-period nonlazy_posted snapshot. */
int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */
 #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
+#ifdef CONFIG_RCU_USER_QS
+   bool in_user;   /* Is the CPU in userland from RCU POV? */
+#endif
 };
 
 /* RCU's kthread states for tracing. */
-- 
1.7.5.4



[PATCH 03/11] rcu: Ignore userspace extended quiescent state by default

2012-07-11 Thread Frederic Weisbecker
By default we don't want to enter into RCU extended quiescent
state while in userspace because doing this produces some overhead
(e.g. use of the syscall slow path). Turn it off by default, ready to
be enabled when some feature like adaptive tickless needs it.
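Only `rcu_user_enter()` needs the new check. A userspace sketch, assuming a hypothetical `struct model_dynticks` with just the two relevant fields (the `model_*` names are illustrative, not kernel API), shows why `rcu_user_exit()` can stay unchanged: while the feature is ignored, `in_user` never becomes true, so exit is a no-op too.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of rcu_user_enter()/rcu_user_exit() after this patch. */
struct model_dynticks {
	bool ignore_user_qs;   /* true by default */
	bool in_user;
};

/* Returns true if the extended QS was actually entered. */
static bool model_rcu_user_enter(struct model_dynticks *d)
{
	if (d->ignore_user_qs || d->in_user)
		return false;
	d->in_user = true;
	return true;
}

/* Unchanged by the patch: it only looks at in_user. */
static bool model_rcu_user_exit(struct model_dynticks *d)
{
	if (!d->in_user)
		return false;
	d->in_user = false;
	return true;
}
```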

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Avi Kivity 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Josh Triplett 
Cc: Kevin Hilman 
Cc: Max Krasnyansky 
Cc: Peter Zijlstra 
Cc: Stephen Hemminger 
Cc: Steven Rostedt 
Cc: Sven-Thorsten Dietrich 
Cc: Thomas Gleixner 
---
 kernel/rcutree.c |5 -
 kernel/rcutree.h |1 +
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index d5df618..78b0c30 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -209,6 +209,9 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
 DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
.dynticks_nesting = DYNTICK_TASK_EXIT_IDLE,
.dynticks = ATOMIC_INIT(1),
+#ifdef CONFIG_RCU_USER_QS
+   .ignore_user_qs = true,
+#endif
 };
 
 static int blimit = 10;	/* Maximum callbacks per rcu_do_batch. */
@@ -442,7 +445,7 @@ void rcu_user_enter(void)
 
local_irq_save(flags);
rdtp = &__get_cpu_var(rcu_dynticks);
-   if (!rdtp->in_user) {
+   if (!rdtp->ignore_user_qs && !rdtp->in_user) {
rdtp->in_user = true;
rcu_eqs_enter(1);
}
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 4d82cb5..55bcef1 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -103,6 +103,7 @@ struct rcu_dynticks {
int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */
 #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
 #ifdef CONFIG_RCU_USER_QS
+   bool ignore_user_qs;/* Treat userspace as extended QS or not */
bool in_user;   /* Is the CPU in userland from RCU POV? */
 #endif
 };
-- 
1.7.5.4



[PATCH 08/11] rcu: Exit RCU extended QS on user preemption

2012-07-11 Thread Frederic Weisbecker
When exceptions or irqs are about to resume userspace, if
the task needs to be rescheduled, the arch low level code
calls schedule() directly.

At that time we may be in extended quiescent state from RCU's
point of view: the exception is no longer protected inside
rcu_user_exit() - rcu_user_enter() and the irq has already called
rcu_irq_exit().

Create a new API schedule_user() that calls schedule() inside
rcu_user_exit()-rcu_user_enter() in order to protect it. Archs
will need to rely on it now to implement user preemption safely.
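The wrapper's effect can be checked with a small userspace model, assuming a global `in_user` flag stands in for RCU's per-CPU state and `model_schedule()` merely records whether RCU was watching when it ran; all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: true means RCU currently treats the CPU as in
 * userspace, i.e. in an extended quiescent state. */
static bool in_user = true;
static bool schedule_ran_with_rcu_watching;

static void model_rcu_user_exit(void)  { in_user = false; }
static void model_rcu_user_enter(void) { in_user = true; }

static void model_schedule(void)
{
	/* schedule() may contain RCU read-side critical sections,
	 * which are only legal while RCU is watching this CPU. */
	schedule_ran_with_rcu_watching = !in_user;
}

static void model_schedule_user(void)
{
	model_rcu_user_exit();
	model_schedule();
	model_rcu_user_enter();   /* the caller returns to userspace next */
}
```

On x86-64 the series then replaces `call schedule` with `call schedule_user` on the user-return paths in entry_64.S.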

Signed-off-by: Frederic Weisbecker 
Cc: Alessio Igor Bogani 
Cc: Andrew Morton 
Cc: Avi Kivity 
Cc: Chris Metcalf 
Cc: Christoph Lameter 
Cc: Geoff Levand 
Cc: Gilad Ben Yossef 
Cc: Hakan Akkan 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Josh Triplett 
Cc: Kevin Hilman 
Cc: Max Krasnyansky 
Cc: Peter Zijlstra 
Cc: Stephen Hemminger 
Cc: Steven Rostedt 
Cc: Sven-Thorsten Dietrich 
Cc: Thomas Gleixner 
---
 kernel/sched/core.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1e0fa5b..a37619a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3332,6 +3332,13 @@ asmlinkage void __sched schedule(void)
 }
 EXPORT_SYMBOL(schedule);
 
+asmlinkage void __sched schedule_user(void)
+{
+   rcu_user_exit();
+   schedule();
+   rcu_user_enter();
+}
+
 /**
  * schedule_preempt_disabled - called with preemption disabled
  *
-- 
1.7.5.4


