[PATCH 05/10] m32r: Add missing RCU idle APIs on idle loop
In the old times, the whole idle task was considered
as an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read side critical
sections have been added even in the code of some
architectures' idle tasks, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
in the idle loop that doesn't make use of RCU read side
critical sections, typically the part that puts the CPU
in low power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls in the m32r's idle loop.

Reported-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Cc: Hirokazu Takata
Cc: 3.2.x..
Cc: Paul E. McKenney
---
 arch/m32r/kernel/process.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/m32r/kernel/process.c b/arch/m32r/kernel/process.c
index 3a4a32b2..384e63f 100644
--- a/arch/m32r/kernel/process.c
+++ b/arch/m32r/kernel/process.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -82,6 +83,7 @@ void cpu_idle (void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched()) {
 			void (*idle)(void) = pm_idle;
 
@@ -90,6 +92,7 @@ void cpu_idle (void)
 
 			idle();
 		}
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 	}
 }
-- 
1.7.5.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
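As a rough illustration of why the idle loop has to report itself to RCU, here is a small userspace C sketch of a grace period across two CPUs. It is a toy model, not kernel code: the toy_* names, the two-CPU setup and the boolean flags are assumptions made for this example, and the kernel's real bookkeeping is more involved.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 2

/* Toy per-CPU state; hypothetical, not the kernel's data structures. */
struct toy_cpu {
	bool in_idle_eqs;	/* between toy_rcu_idle_enter() and toy_rcu_idle_exit() */
	bool passed_qs;		/* context switched since the grace period started */
};

static struct toy_cpu cpus[TOY_NR_CPUS];

static void toy_rcu_idle_enter(int cpu) { cpus[cpu].in_idle_eqs = true; }
static void toy_rcu_idle_exit(int cpu)  { cpus[cpu].in_idle_eqs = false; }
static void toy_context_switch(int cpu) { cpus[cpu].passed_qs = true; }

static void toy_start_grace_period(void)
{
	for (int i = 0; i < TOY_NR_CPUS; i++)
		cpus[i].passed_qs = false;
}

/*
 * The grace period may end only when every CPU has either reported a
 * quiescent state or is known to be in the idle extended quiescent state.
 */
static bool toy_grace_period_done(void)
{
	for (int i = 0; i < TOY_NR_CPUS; i++)
		if (!cpus[i].passed_qs && !cpus[i].in_idle_eqs)
			return false;
	return true;
}

/*
 * CPU 0 is busy and context switches; CPU 1 sits in its idle loop and
 * never schedules. Returns whether the grace period can complete.
 */
bool toy_idle_scenario(bool idle_calls_rcu_idle_enter)
{
	bool done;

	cpus[0].in_idle_eqs = false;
	cpus[1].in_idle_eqs = false;
	toy_start_grace_period();

	if (idle_calls_rcu_idle_enter)
		toy_rcu_idle_enter(1);	/* CPU 1 tells RCU it is idle */

	toy_context_switch(0);		/* CPU 0 reports a quiescent state */
	done = toy_grace_period_done();

	if (idle_calls_rcu_idle_enter)
		toy_rcu_idle_exit(1);	/* CPU 1 leaves idle to reschedule */
	return done;
}
```

In this model, toy_idle_scenario(false) leaves the grace period waiting on the idle CPU, which is the stall the series addresses, while toy_idle_scenario(true) lets it complete.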
[PATCH 03/10] frv: Add missing RCU idle APIs on idle loop
In the old times, the whole idle task was considered
as an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read side critical
sections have been added even in the code of some
architectures' idle tasks, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
in the idle loop that doesn't make use of RCU read side
critical sections, typically the part that puts the CPU
in low power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls in the Frv's idle loop.

Reported-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Cc: David Howells
Cc: 3.2.x..
Cc: Paul E. McKenney
---
 arch/frv/kernel/process.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/frv/kernel/process.c b/arch/frv/kernel/process.c
index ff95f50..2eb7fa5 100644
--- a/arch/frv/kernel/process.c
+++ b/arch/frv/kernel/process.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -69,12 +70,14 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched()) {
 			check_pgt_cache();
 
 			if (!frv_dma_inprogress && idle)
 				idle();
 		}
+		rcu_idle_exit();
 
 		schedule_preempt_disabled();
 	}
-- 
1.7.5.4
[PATCH 02/10] cris: Add missing RCU idle APIs on idle loop
In the old times, the whole idle task was considered
as an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read side critical
sections have been added even in the code of some
architectures' idle tasks, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
in the idle loop that doesn't make use of RCU read side
critical sections, typically the part that puts the CPU
in low power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls in the Cris's idle loop.

Reported-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Cc: Mikael Starvik
Cc: Jesper Nilsson
Cc: Cris
Cc: 3.2.x..
Cc: Paul E. McKenney
---
 arch/cris/kernel/process.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/cris/kernel/process.c b/arch/cris/kernel/process.c
index 66fd017..7f65be6 100644
--- a/arch/cris/kernel/process.c
+++ b/arch/cris/kernel/process.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 
 //#define DEBUG
 
@@ -74,6 +75,7 @@ void cpu_idle (void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched()) {
 			void (*idle)(void);
 			/*
@@ -86,6 +88,7 @@ void cpu_idle (void)
 				idle = default_idle;
 			idle();
 		}
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 	}
 }
-- 
1.7.5.4
Re: [PATCH 01/10] alpha: Add missing RCU idle APIs on idle loop
On Wed, Aug 22, 2012 at 10:19:30AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 22, 2012 at 06:23:39PM +0200, Frederic Weisbecker wrote:
> > In the old times, the whole idle task was considered
> > as an RCU quiescent state. But as RCU became more and
> > more successful overtime, some RCU read side critical
> > section have been added even in the code of some
> > architectures idle tasks, for tracing for example.
> > 
> > So nowadays, rcu_idle_enter() and rcu_idle_exit() must
> > be called by the architecture to tell RCU about the part
> > in the idle loop that doesn't make use of rcu read side
> > critical sections, typically the part that puts the CPU
> > in low power mode.
> > 
> > This is necessary for RCU to find the quiescent states in
> > idle in order to complete grace periods.
> > 
> > Add this missing pair of calls in the Alpha's idle loop.
> > 
> > Reported-by: Paul E. McKenney
> > Signed-off-by: Frederic Weisbecker
> > Cc: Richard Henderson
> > Cc: Ivan Kokshaysky
> > Cc: Matt Turner
> > Cc: alpha
> > Cc: Paul E. McKenney
> > Cc: 3.2.x..
> > ---
> >  arch/alpha/kernel/process.c |    6 +-
> >  1 files changed, 5 insertions(+), 1 deletions(-)
> > 
> > diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
> > index 153d3fc..2ebf7b5 100644
> > --- a/arch/alpha/kernel/process.c
> > +++ b/arch/alpha/kernel/process.c
> > @@ -28,6 +28,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >  
> >  #include
> >  #include
> > @@ -50,13 +51,16 @@ cpu_idle(void)
> >  {
> >  	set_thread_flag(TIF_POLLING_NRFLAG);
> >  
> > +	preempt_disable();
> 
> I don't understand the above preempt_disable() not having a matching
> preempt_enable() at exit, but the rest of the patches in this series
> look good to me.

The current code is preemptable, at least it appears so because it calls
schedule() directly. And if I call rcu_idle_enter() in a preemptable section,
I'm in trouble because I'll schedule while in extended QS.

Thus I need to disable preemption here at least until I call rcu_idle_exit().

Now this is an endless loop so there is no need to re-enable
preemption after the loop. And schedule_preempt_disabled()
takes care of enabling preemption before schedule() and redisabling
it afterward.

> 
> 							Thanx, Paul
> 
> >  	while (1) {
> >  		/* FIXME -- EV6 and LCA45 know how to power down
> >  		   the CPU.  */
> >  
> > +		rcu_idle_enter();
> >  		while (!need_resched())
> >  			cpu_relax();
> > -		schedule();
> > +		rcu_idle_exit();
> > +		schedule_preempt_disabled();
> >  	}
> >  }
> >  
> > -- 
> > 1.7.5.4
> > 
> 
Re: [PATCH 01/10] alpha: Add missing RCU idle APIs on idle loop
On Wed, Aug 22, 2012 at 12:01:09PM -0700, Paul E. McKenney wrote:
> > The current code is preemptable, at least it appears so because it calls
> > schedule() directly. And if I call rcu_idle_enter() in a preemptable
> > section,
> > I'm in trouble because I'll schedule while in extended QS.
> > 
> > Thus I need to disable preemption here at least until I call
> > rcu_idle_exit().
> > 
> > Now this is an endless loop so there is no need to re-enable
> > preemption after the loop. And schedule_preempt_disabled()
> > takes care of enabling preemption before schedule() and redisabling
> > it afterward.
> > 
> > > 
> > > 							Thanx, Paul
> > > 
> > > >  	while (1) {
> > > >  		/* FIXME -- EV6 and LCA45 know how to power down
> > > >  		   the CPU.  */
> > > >  
> > > > +		rcu_idle_enter();
> > > >  		while (!need_resched())
> > > >  			cpu_relax();
> > > > -		schedule();
> > > > +		rcu_idle_exit();
> > > > +		schedule_preempt_disabled();
> > > >  	}
> 
> Understood, but what I don't understand is why you don't need a
> preempt_enable() right here.

Look, let's inline the content of schedule_preempt_disabled(), the code then looks like:

void cpu_idle(void)
{
	set_thread_flag(TIF_POLLING_NRFLAG);

	preempt_disable();

	while (1) {
		/* FIXME -- EV6 and LCA45 know how to power down the CPU. */

		rcu_idle_enter();
		while (!need_resched())
			cpu_relax();
		rcu_idle_exit();

		sched_preempt_enable_no_resched();
		schedule();
		preempt_disable();
	}
}

So there is a preempt_enable() before we schedule, then we re-disable preemption
after schedule.

Now I realize cpu_idle() is supposed to be called with preemption disabled already
so I shouldn't add an explicit preempt_disable() or it's going to be worse.
But that means there is an existing bug here in alpha, it should call
schedule_preempt_disabled() instead of schedule().

cpu_idle() is called with preemption disabled on the boot CPU. And it should as well
from the secondary CPUs entry but alpha doesn't seem to do that. So I need to fix
that first.

I'll respin.
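The preempt count reasoning in this exchange can be checked with a small userspace C sketch. It is only a model under stated assumptions: the toy_* functions and the plain integer counter are hypothetical stand-ins, and the toy schedule() merely records when it is entered with a non-zero count instead of doing any scheduling.

```c
#include <assert.h>

/* Toy stand-in for the preempt counter; hypothetical, not kernel code. */
static int toy_preempt_count;
/* Number of times toy_schedule() ran with a non-zero count. */
static int toy_bad_schedules;

static void toy_preempt_disable(void)           { toy_preempt_count++; }
static void toy_preempt_enable_no_resched(void) { toy_preempt_count--; }

/* Model: scheduling is only legal when the count is exactly zero. */
static void toy_schedule(void)
{
	if (toy_preempt_count != 0)
		toy_bad_schedules++;
}

/* Same three steps as the inlined code quoted above. */
static void toy_schedule_preempt_disabled(void)
{
	toy_preempt_enable_no_resched();
	toy_schedule();
	toy_preempt_disable();
}

/*
 * Run the idle loop body a few times, starting from the preempt count
 * the caller of cpu_idle() left behind. Returns the final count.
 */
int toy_idle_loop(int iterations, int initial_count)
{
	toy_preempt_count = initial_count;
	toy_bad_schedules = 0;

	for (int i = 0; i < iterations; i++) {
		/* rcu_idle_enter(); wait for need_resched(); rcu_idle_exit(); */
		toy_schedule_preempt_disabled();
	}
	return toy_preempt_count;
}

int toy_violations(void)
{
	return toy_bad_schedules;
}
```

With an initial count of 1 (the caller disabled preemption once) the loop stays balanced and never schedules with preemption disabled. An initial count of 2 models the extra explicit preempt_disable() from v1, and an initial count of 0 models a secondary CPU entering cpu_idle() without disabling preemption; in this model both cases call the toy schedule() with a wrong count on every iteration.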
Re: linux-next: manual merge of the tip tree with the rr tree
On Thu, Aug 23, 2012 at 12:43:48PM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the tip tree got a conflict in arch/Kconfig
> between commit bd029f48459a ("Make most arch asm/module.h files use
> asm-generic/module.h") from the rr tree and commit b952741c8079
> ("cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING") from the tip tree.
> 
> Just context changes. I fixed it up (see below) and can carry the fix as
> necessary.
> -- 
> Cheers,
> Stephen Rothwell                    s...@canb.auug.org.au

Looks good, thanks!

> 
> diff --cc arch/Kconfig
> index 3450115,ea5feb6..000
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@@ -281,23 -294,7 +294,26 @@@ config SECCOMP_FILTE
>   
>   	  See Documentation/prctl/seccomp_filter.txt for details.
>   
>  +config HAVE_MOD_ARCH_SPECIFIC
>  +	bool
>  +	help
>  +	  The arch uses struct mod_arch_specific to store data. Many arches
>  +	  just need a simple module loader without arch specific data - those
>  +	  should not enable this.
>  +
>  +config MODULES_USE_ELF_RELA
>  +	bool
>  +	help
>  +	  Modules only use ELF RELA relocations. Modules with ELF REL
>  +	  relocations will give an error.
>  +
>  +config MODULES_USE_ELF_REL
>  +	bool
>  +	help
>  +	  Modules only use ELF REL relocations. Modules with ELF RELA
>  +	  relocations will give an error.
>  +
> + config HAVE_VIRT_CPU_ACCOUNTING
> + 	bool
> + 
>   source "kernel/gcov/Kconfig"
Re: [PATCH 01/10] alpha: Add missing RCU idle APIs on idle loop
On Thu, Aug 23, 2012 at 09:32:18PM +1200, Michael Cree wrote:
> On 23/08/12 04:23, Frederic Weisbecker wrote:
> > In the old times, the whole idle task was considered
> > as an RCU quiescent state. But as RCU became more and
> > more successful overtime, some RCU read side critical
> > section have been added even in the code of some
> > architectures idle tasks, for tracing for example.
> 
> Fantastic! It fixes RCU CPU stalls that we were seeing on the SMP
> kernel when built for generic Alpha.
> 
> A build of glibc and running its test suite reliably triggers RCU CPU
> stalls when running a kernel built for generic Alpha. I have just built
> glibc and ran its test suite twice with no RCU CPU stalls with this
> patch against a 3.5.2 kernel!

Nice. Very nice.

> 
> I see the stable queue is CCed but I note the patch does not apply
> cleanly to the 3.2.y kernel. It would be nice to have a backport of the
> patches for the 3.2 stable kernel.

Sure.

> 
> So feel free to add:
> 
> Tested-by: Michael Cree

Thanks, but I need to refactor the patch, I suspect a problem
with CONFIG_PREEMPT.
Re: [PATCH 00/10] rcu: Add missing RCU idle APIs on idle loop
On Wed, Aug 22, 2012 at 07:18:04PM +0200, Geert Uytterhoeven wrote:
> On Wed, Aug 22, 2012 at 6:23 PM, Frederic Weisbecker
> wrote:
> > So this fixes some potential RCU stalls in a bunch of architectures.
> > When rcu_idle_enter()/rcu_idle_exit() became a requirement, we forgot
> > to handle the architectures that don't support CONFIG_NO_HZ.
> >
> > I guess the set should be dispatched into arch maintainer trees.
> 
> I can take the m68k version, but are you sure you want it this way?
> Each of them must be in mainline before they can enter stable.

Yeah, I was thinking the right route is for these patches to be carried
by arch maintainers who then push to Linus and then this goes to stable.

Is that ok for you?

Otherwise I can carry the patches myself. In a tree of my own, or Paul's or mmotm.
As long as I have your ack.

Thanks.

> 
> > I'm sorry I haven't built tested everywhere. But the changes are
> > small and need to be at least boot tested anyway.
> 
> Builds and boots fine on m68k under ARAnyM.
> Acked-by: Geert Uytterhoeven (for m68k)
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 --
> ge...@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like
> that.
>                                 -- Linus Torvalds
[PATCH 00/11] rcu: Add missing RCU idle APIs on idle loop v2
Hi,

Changes since v1:

- Fixed preempt handling in alpha idle loop
- added ack from Geert
- fixed stable email address, sorry :-/

This time I build tested everywhere but:
h8300 (compiler internal error), and mn10300, parisc, score (cross compilers
not available in
ftp://ftp.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.6.3/)

For testing, you can pull from:

git://github.com/fweisbec/linux-dynticks.git
	rcu/idle-fix-v2

Thanks.

Frederic Weisbecker (11):
  alpha: Fix preemption handling in idle loop
  alpha: Add missing RCU idle APIs on idle loop
  cris: Add missing RCU idle APIs on idle loop
  frv: Add missing RCU idle APIs on idle loop
  h8300: Add missing RCU idle APIs on idle loop
  m32r: Add missing RCU idle APIs on idle loop
  m68k: Add missing RCU idle APIs on idle loop
  mn10300: Add missing RCU idle APIs on idle loop
  parisc: Add missing RCU idle APIs on idle loop
  score: Add missing RCU idle APIs on idle loop
  xtensa: Add missing RCU idle APIs on idle loop

 arch/alpha/kernel/process.c   |    6 +-
 arch/alpha/kernel/smp.c       |    1 +
 arch/cris/kernel/process.c    |    3 +++
 arch/frv/kernel/process.c     |    3 +++
 arch/h8300/kernel/process.c   |    3 +++
 arch/m32r/kernel/process.c    |    3 +++
 arch/m68k/kernel/process.c    |    3 +++
 arch/mn10300/kernel/process.c |    3 +++
 arch/parisc/kernel/process.c  |    3 +++
 arch/score/kernel/process.c   |    4 +++-
 arch/xtensa/kernel/process.c  |    3 +++
 11 files changed, 33 insertions(+), 2 deletions(-)

-- 
1.7.5.4
[PATCH 03/11] cris: Add missing RCU idle APIs on idle loop
In the old times, the whole idle task was considered
as an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read side critical
sections have been added even in the code of some
architectures' idle tasks, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
in the idle loop that doesn't make use of RCU read side
critical sections, typically the part that puts the CPU
in low power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls in the Cris's idle loop.

Reported-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Cc: Mikael Starvik
Cc: Jesper Nilsson
Cc: Cris
Cc: 3.2.x..
Cc: Paul E. McKenney
---
 arch/cris/kernel/process.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/cris/kernel/process.c b/arch/cris/kernel/process.c
index 66fd017..7f65be6 100644
--- a/arch/cris/kernel/process.c
+++ b/arch/cris/kernel/process.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 
 //#define DEBUG
 
@@ -74,6 +75,7 @@ void cpu_idle (void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched()) {
 			void (*idle)(void);
 			/*
@@ -86,6 +88,7 @@ void cpu_idle (void)
 				idle = default_idle;
 			idle();
 		}
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 	}
 }
-- 
1.7.5.4
[PATCH 02/11] alpha: Add missing RCU idle APIs on idle loop
In the old times, the whole idle task was considered
as an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read side critical
sections have been added even in the code of some
architectures' idle tasks, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
in the idle loop that doesn't make use of RCU read side
critical sections, typically the part that puts the CPU
in low power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls in the Alpha's idle loop.

Reported-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: alpha
Cc: Paul E. McKenney
Cc: Michael Cree
Cc: 3.2.x..
---
 arch/alpha/kernel/process.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index eac5e01..eb9558c 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -54,9 +55,11 @@ cpu_idle(void)
 		/* FIXME -- EV6 and LCA45 know how to power down
 		   the CPU.  */
 
+		rcu_idle_enter();
 		while (!need_resched())
 			cpu_relax();
 
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 	}
 }
-- 
1.7.5.4
[PATCH 06/11] m32r: Add missing RCU idle APIs on idle loop
In the old times, the whole idle task was considered
as an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read side critical
sections have been added even in the code of some
architectures' idle tasks, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
in the idle loop that doesn't make use of RCU read side
critical sections, typically the part that puts the CPU
in low power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls in the m32r's idle loop.

Reported-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Cc: Hirokazu Takata
Cc: 3.2.x..
Cc: Paul E. McKenney
---
 arch/m32r/kernel/process.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/m32r/kernel/process.c b/arch/m32r/kernel/process.c
index 3a4a32b2..384e63f 100644
--- a/arch/m32r/kernel/process.c
+++ b/arch/m32r/kernel/process.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -82,6 +83,7 @@ void cpu_idle (void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched()) {
 			void (*idle)(void) = pm_idle;
 
@@ -90,6 +92,7 @@ void cpu_idle (void)
 
 			idle();
 		}
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 	}
 }
-- 
1.7.5.4
[PATCH 08/11] mn10300: Add missing RCU idle APIs on idle loop
In the old times, the whole idle task was considered
as an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read side critical
sections have been added even in the code of some
architectures' idle tasks, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
in the idle loop that doesn't make use of RCU read side
critical sections, typically the part that puts the CPU
in low power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls in the mn10300's idle loop.

Reported-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Cc: David Howells
Cc: Koichi Yasutake
Cc: 3.2.x..
Cc: Paul E. McKenney
---
 arch/mn10300/kernel/process.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/mn10300/kernel/process.c b/arch/mn10300/kernel/process.c
index 7dab0cd..e9cceba 100644
--- a/arch/mn10300/kernel/process.c
+++ b/arch/mn10300/kernel/process.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -107,6 +108,7 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	for (;;) {
+		rcu_idle_enter();
 		while (!need_resched()) {
 			void (*idle)(void);
 
@@ -121,6 +123,7 @@ void cpu_idle(void)
 			}
 			idle();
 		}
+		rcu_idle_exit();
 
 		schedule_preempt_disabled();
 	}
-- 
1.7.5.4
[PATCH 11/11] xtensa: Add missing RCU idle APIs on idle loop
In the old times, the whole idle task was considered
as an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read side critical
sections have been added even in the code of some
architectures' idle tasks, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
in the idle loop that doesn't make use of RCU read side
critical sections, typically the part that puts the CPU
in low power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls in the xtensa's idle loop.

Reported-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Cc: Chris Zankel
Cc: 3.2.x..
Cc: Paul E. McKenney
---
 arch/xtensa/kernel/process.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/xtensa/kernel/process.c b/arch/xtensa/kernel/process.c
index 2c8d6a3..bc44311 100644
--- a/arch/xtensa/kernel/process.c
+++ b/arch/xtensa/kernel/process.c
@@ -31,6 +31,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -110,8 +111,10 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched())
 			platform_idle();
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 	}
 }
-- 
1.7.5.4
[PATCH 10/11] score: Add missing RCU idle APIs on idle loop
In the old times, the whole idle task was considered
as an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read side critical
sections have been added even in the code of some
architectures' idle tasks, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
in the idle loop that doesn't make use of RCU read side
critical sections, typically the part that puts the CPU
in low power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls in the score's idle loop.

Reported-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Cc: Chen Liqin
Cc: Lennox Wu
Cc: 3.2.x..
Cc: Paul E. McKenney
---
 arch/score/kernel/process.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/score/kernel/process.c b/arch/score/kernel/process.c
index 2707023..637970c 100644
--- a/arch/score/kernel/process.c
+++ b/arch/score/kernel/process.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 
 void (*pm_power_off)(void);
 EXPORT_SYMBOL(pm_power_off);
@@ -50,9 +51,10 @@ void __noreturn cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched())
 			barrier();
-
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 	}
 }
-- 
1.7.5.4
[PATCH 09/11] parisc: Add missing RCU idle APIs on idle loop
In the old times, the whole idle task was considered
as an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read side critical
sections have been added even in the code of some
architectures' idle tasks, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
in the idle loop that doesn't make use of RCU read side
critical sections, typically the part that puts the CPU
in low power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls in the parisc's idle loop.

Reported-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Cc: James E.J. Bottomley
Cc: Helge Deller
Cc: Parisc
Cc: 3.2.x..
Cc: Paul E. McKenney
---
 arch/parisc/kernel/process.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index d4b94b3..c54a4db 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -48,6 +48,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -69,8 +70,10 @@ void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched())
 			barrier();
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 		check_pgt_cache();
 	}
-- 
1.7.5.4
[PATCH 07/11] m68k: Add missing RCU idle APIs on idle loop
In the old times, the whole idle task was considered
as an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read side critical
sections have been added even in the code of some
architectures' idle tasks, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
in the idle loop that doesn't make use of RCU read side
critical sections, typically the part that puts the CPU
in low power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls in the m68k's idle loop.

Reported-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Acked-by: Geert Uytterhoeven
Cc: m68k
Cc: 3.2.x..
Cc: Paul E. McKenney
---
 arch/m68k/kernel/process.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/m68k/kernel/process.c b/arch/m68k/kernel/process.c
index c488e3c..ac2892e 100644
--- a/arch/m68k/kernel/process.c
+++ b/arch/m68k/kernel/process.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -75,8 +76,10 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched())
 			idle();
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 	}
 }
-- 
1.7.5.4
[PATCH 04/11] frv: Add missing RCU idle APIs on idle loop
In the old times, the whole idle task was considered
as an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read side critical
sections have been added even in the code of some
architectures' idle tasks, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
in the idle loop that doesn't make use of RCU read side
critical sections, typically the part that puts the CPU
in low power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls in the Frv's idle loop.

Reported-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Cc: David Howells
Cc: 3.2.x..
Cc: Paul E. McKenney
---
 arch/frv/kernel/process.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/frv/kernel/process.c b/arch/frv/kernel/process.c
index ff95f50..2eb7fa5 100644
--- a/arch/frv/kernel/process.c
+++ b/arch/frv/kernel/process.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -69,12 +70,14 @@ void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched()) {
 			check_pgt_cache();
 
 			if (!frv_dma_inprogress && idle)
 				idle();
 		}
+		rcu_idle_exit();
 
 		schedule_preempt_disabled();
 	}
-- 
1.7.5.4
[PATCH 05/11] h8300: Add missing RCU idle APIs on idle loop
In the old times, the whole idle task was considered
as an RCU quiescent state. But as RCU became more and
more successful over time, some RCU read side critical
sections have been added even in the code of some
architectures' idle tasks, for tracing for example.

So nowadays, rcu_idle_enter() and rcu_idle_exit() must
be called by the architecture to tell RCU about the part
in the idle loop that doesn't make use of RCU read side
critical sections, typically the part that puts the CPU
in low power mode.

This is necessary for RCU to find the quiescent states in
idle in order to complete grace periods.

Add this missing pair of calls in the h8300's idle loop.

Reported-by: Paul E. McKenney
Signed-off-by: Frederic Weisbecker
Cc: Yoshinori Sato
Cc: 3.2.x..
Cc: Paul E. McKenney
---
 arch/h8300/kernel/process.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
index 0e9c315..f153ed1 100644
--- a/arch/h8300/kernel/process.c
+++ b/arch/h8300/kernel/process.c
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -78,8 +79,10 @@ void (*idle)(void) = default_idle;
 void cpu_idle(void)
 {
 	while (1) {
+		rcu_idle_enter();
 		while (!need_resched())
 			idle();
+		rcu_idle_exit();
 		schedule_preempt_disabled();
 	}
 }
-- 
1.7.5.4
[PATCH 01/11] alpha: Fix preemption handling in idle loop
cpu_idle() is called on the boot CPU by the init code with preemption
disabled. But the cpu_idle() function in alpha doesn't handle this when it
calls schedule() directly.

Fix it by converting it into schedule_preempt_disabled().

Also disable preemption before calling cpu_idle() from secondary CPU entry
code to stay consistent with this state.

Signed-off-by: Frederic Weisbecker
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: Matt Turner
Cc: alpha
Cc: Paul E. McKenney
Cc: Michael Cree
---
 arch/alpha/kernel/process.c |    3 ++-
 arch/alpha/kernel/smp.c     |    1 +
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 153d3fc..eac5e01 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -56,7 +56,8 @@ cpu_idle(void)
 		while (!need_resched())
 			cpu_relax();
-		schedule();
+
+		schedule_preempt_disabled();
 	}
 }
diff --git a/arch/alpha/kernel/smp.c b/arch/alpha/kernel/smp.c
index 35ddc02..a41ad90 100644
--- a/arch/alpha/kernel/smp.c
+++ b/arch/alpha/kernel/smp.c
@@ -166,6 +166,7 @@ smp_callin(void)
 	DBGS(("smp_callin: commencing CPU %d current %p active_mm %p\n",
 	      cpuid, current, current->active_mm));
+	preempt_disable();
 	/* Do nothing. */
 	cpu_idle();
 }
--
1.7.5.4
Re: [PATCH v2] fork: fix oops after fork failure
On Thu, Aug 23, 2012 at 07:36:08PM +0400, Glauber Costa wrote:
> When we want to duplicate a new process, dup_task_struct() will undergo
> a series of allocations. If alloc_thread_info_node() fails, we call
> free_task_struct() and return.
>
> This seems right, but it is not. free_task_struct() will not only free
> the task struct from the kmem_cache, but will also call
> arch_release_task_struct(). The problem is that this function is
> supposed to undo whatever arch-specific work done by
> arch_dup_task_struct(), that is not yet called at this point. The
> particular problem I ran accross was that in x86, we will arrive at
> fpu_free() without having ever allocated it.
>
> Signed-off-by: Glauber Costa
> Reported-by: Frederic Weisbecker

Tested-by: Frederic Weisbecker
Re: mmotm 2012-08-13-16-55 uploaded
On Tue, Aug 14, 2012 at 04:26:56PM +0400, Glauber Costa wrote:
> On 08/14/2012 02:53 PM, Michal Hocko wrote:
> > On Mon 13-08-12 16:56:50, Andrew Morton wrote:
> >> > The mm-of-the-moment snapshot 2012-08-13-16-55 has been uploaded to
> >> >
> >> >    http://www.ozlabs.org/~akpm/mmotm/
> > -mm git tree has been updated as well. You can find the tree at
> > https://github.com/mstsxfx/memcg-devel.git since-3.5
> >
> > tagged as mmotm-2012-08-13-16-55
> >
>
> On top of this tree, people following the kmemcg development may also
> want to checkout
>
>    git://github.com/glommer/linux.git memcg-3.5/kmemcg-stack
>
> A branch called memcg-3.5/kmemcg-slab is also available with the slab
> changes ontop.

I tested it successfully to stop a forkbomb in a container. One may need
the following fix as well:
http://marc.info/?l=linux-kernel&m=134573636430031&w=2

Andrew, others, what is your opinion on this patchset?

Thanks.
Re: [PATCH 00/10] rcu: Add missing RCU idle APIs on idle loop
On Thu, Aug 23, 2012 at 10:23:22PM +0200, Geert Uytterhoeven wrote:
> Hi Frederic,
>
> On Thu, Aug 23, 2012 at 1:02 PM, Frederic Weisbecker wrote:
> > On Wed, Aug 22, 2012 at 07:18:04PM +0200, Geert Uytterhoeven wrote:
> >> On Wed, Aug 22, 2012 at 6:23 PM, Frederic Weisbecker wrote:
> >> > So this fixes some potential RCU stalls in a bunch of architectures.
> >> > When rcu_idle_enter()/rcu_idle_exit() became a requirement, we forgot
> >> > to handle the architectures that don't support CONFIG_NO_HZ.
> >> >
> >> > I guess the set should be dispatched into arch maintainer trees.
> >>
> >> I can take the m68k version, but are you sure you want it this way?
> >> Each of them must be in mainline before they can enter stable.
> >
> > Yeah, I was thinking the right route is for these patches to be
> > carried by arch maintainer who then push to Linus and then this goes
> > to stable.
> >
> > Is that ok for you?
> >
> > Otherwise I can carry the patches myself. In a tree of my own, or
> > Paul's or mmotm. As long as I have your ack.
>
> I applied your patch to the m68k for-3.6/for-linus branch.
> I'll ask Linus to pull later in the rc cycle (right now I don't have
> anything else queued for 3.6).
> Still, I think it's better to just collect acks and send it to Linus
> in one shot, so it can go into stable in one shot too.

Sure I can do that if you prefer.

Thanks.

>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 --
> ge...@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like
> that.
> -- Linus Torvalds
Re: [PATCH 3/4] perf: teach perf inject to merge sched_stat_* and sched_switch events (v2)
On Tue, Aug 07, 2012 at 04:56:04PM +0400, Andrew Vagin wrote:
> +struct event_entry {
> +	struct list_head node;
> +	u32 pid;
> +	union perf_event event[0];
> +};
> +
> +static LIST_HEAD(samples);
> +
> +static int perf_event__sched_stat(struct perf_tool *tool,
> +				  union perf_event *event,
> +				  struct perf_sample *sample,
> +				  struct perf_evsel *evsel,
> +				  struct machine *machine)
> +{
> +	const char *evname = NULL;
> +	uint32_t size;
> +	struct event_entry *ent;
> +	union perf_event *event_sw = NULL;
> +	struct perf_sample sample_sw;
> +	int sched_process_exit;
> +
> +	size = event->header.size;
> +
> +	evname = evsel->tp_format->name;
> +
> +	sched_process_exit = !strcmp(evname, "sched_process_exit");
> +
> +	if (!strcmp(evname, "sched_switch") || sched_process_exit) {
> +		list_for_each_entry(ent, &samples, node)
> +			if (sample->pid == ent->pid)

I suspect what you're rather interested in is the sample tid.

> +				break;
> +
> +		if (&ent->node != &samples) {
> +			list_del(&ent->node);
> +			free(ent);
> +		}
> +
> +		if (sched_process_exit)
> +			return 0;
> +
> +		ent = malloc(size + sizeof(struct event_entry));
> +		if (ent == NULL)
> +			die("malloc");
> +		ent->pid = sample->pid;

Ditto.

> +		memcpy(&ent->event, event, size);
> +		list_add(&ent->node, &samples);
> +		return 0;
> +
> +	} else if (!strncmp(evname, "sched_stat_", 11)) {
> +		u32 pid;
> +
> +		pid = raw_field_value(evsel->tp_format,
> +				      "pid", sample->raw_data);

There you parse the pid from the trace content. That's fine because it's
actually the tid that is saved on the trace event. But this one is not
pid-namespace safe (it saves current->pid directly) while sample->tid is
pid-namespace safe (it uses task_pid_nr_ns).

So I suggest you use sample->tid instead, plus that's going to be
consistent with what you did above.

Thanks.
Re: [PATCH 00/11] rcu: Add missing RCU idle APIs on idle loop v2
On Fri, Aug 24, 2012 at 08:50:47PM -0700, Paul E. McKenney wrote:
> On Sat, Aug 25, 2012 at 02:19:14AM +0100, Ben Hutchings wrote:
> > On Fri, 2012-08-24 at 14:26 -0700, Paul E. McKenney wrote:
> > > On Thu, Aug 23, 2012 at 04:58:24PM +0200, Frederic Weisbecker wrote:
> > > > Hi,
> > > >
> > > > Changes since v1:
> > > >
> > > > - Fixed preempt handling in alpha idle loop
> > > > - added ack from Geert
> > > > - fixed stable email address, sorry :-/
> > > >
> > > > This time I built tested everywhere but: h8300 (compiler internal error),
> > > > and mn10300, parisc, score (cross compilers not available in
> > > > ftp://ftp.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.6.3/)
> > > >
> > > > For testing, you can pull from:
> > > >
> > > > git://github.com/fweisbec/linux-dynticks.git
> > > > rcu/idle-fix-v2
> > > >
> > > > Thanks.
> > >
> > > I have queued these on -rcu branch rcu/idle:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> > >
> > > This problem has been in place since 3.3, so it is hard to argue that
> > > it is a regression for this merge window. I have therefore queued it
> > > for 3.7.
> >
> > I don't follow that; I would expect any serious bug fix (serious enough
> > for a stable update) to be acceptable for 3.6 at this point.
>
> OK, if any of the arch maintainers wishes to submit the patch to 3.6,
> they are free to do so -- just let me know and I will drop the patch from
> my tree.
>
> That said, all this does is cause spurious warnings to be printed, so
> not sure it really qualifies as serious. But I am happy to leave that
> decision with the individual arch maintainers -- it is their arch,
> after all, so their decision.

Couldn't that cause hung tasks due to long-lasting synchronize_rcu()?
Re: Where to put test code?
2012/9/19 Daniel Santos:
> I'm putting the finishing touches on the generic red-black tree test
> code, but I'm uncertain about where to place it exactly.
>
> I haven't finished the test module just yet, but the idea is that the
> tests can be run in userspace as well as kernelspace to make it easier
> to test on multiple compilers. It has some common sources files (used
> by in both places) and then specific code for both user- and
> kernel-space that I currently have as follows:
>
> tools/testing/selftests/grbtree/ - common.{c,h}
> tools/testing/selftests/grbtree/user - user-space main.c, Makefile, etc.
> tools/testing/selftests/grbtree/module - kernel-space grbtest.c,
> Makefile, etc.
>
> Would this be correct or should the common & module code go some place
> else and then just have the user-space code under
> tools/testing/selftests/grbtest?

It depends on the nature of your tests. Are these pure validation tests
(some batch tests that perform actions and check the result is correct)
or stress tests (something that runs for a while)?

If these are only about validation tests, then both user and module can
be in that tools/testing/selftests directory.

What is the module doing?
Re: Where to put test code?
2012/9/20 Daniel Santos:
> Thanks for the response!
>
> On 09/19/2012 05:18 PM, Frederic Weisbecker wrote:
>> 2012/9/19 Daniel Santos:
>>> I'm putting the finishing touches on the generic red-black tree test
>>> code, but I'm uncertain about where to place it exactly.
>>>
>>> I haven't finished the test module just yet, but the idea is that the
>>> tests can be run in userspace as well as kernelspace to make it easier
>>> to test on multiple compilers. It has some common sources files (used
>>> by in both places) and then specific code for both user- and
>>> kernel-space that I currently have as follows:
>>>
>>> tools/testing/selftests/grbtree/ - common.{c,h}
>>> tools/testing/selftests/grbtree/user - user-space main.c, Makefile, etc.
>>> tools/testing/selftests/grbtree/module - kernel-space grbtest.c,
>>> Makefile, etc.
>>>
>>> Would this be correct or should the common & module code go some place
>>> else and then just have the user-space code under
>>> tools/testing/selftests/grbtest?
>> It depends on the nature of your tests. Are these pure validation
>> tests (some batch tests that perform actions and check the result is
>> correct) or stress tests (something that runs for a while)?
> The program does both performance measurement tests and validation tests
> based upon what you pass at the command line. The primary aim is to
> measure performance differences between the generic code and specific
> (hand-coded) implementations on various compilers. The secondary aim is
> to provide validation that the results are correct in all
> circumstances. I'm not sure in this case what would be considered a
> "stress" test.

Ok. The selftests in tools/testing/selftests run in batch, so if there is
one in the middle that does stress tests for a while, it delays the other
tests. The purpose of these unit tests is to quickly detect regressions
or anything that breaks expected results.

Your test sounds like a good candidate for that directory I guess.

>
>> If these are only about validation tests, then both user and module
>> can be in that tools/testing/selftests directory.
>>
>> What is the module doing?
> The module is the exact same thing, except built in kernel-space, where
> the actual code will normally reside. Parameters are passed when you
> load the module and it unloads when the test is complete. Perhaps what
> I omitted is that the user-space program is generated partially by
> compiling sources and headers that are intended for kernel-space only,
> but linked with glibc using some cute hacks. This is done mostly to
> ease the process of testing the code with multiple compilers.

Ok, looks good as well.

Thanks!
[PATCH 0/2] perf tools: Basic bash completion support
Hey,

Basic bash completion support. It only supports perf subcommands and most
basic -e event descriptors (no grouping).

I just have a small issue with tracepoints because of their ":" in the
middle. It auto completes as long as we haven't yet reached the colon.
Otherwise we need to add a double quote in the beginning of the
expression. I'm quite a newbie in bash completion though, so I might find
a subtlety later to solve this.

Frederic Weisbecker (2):
  perf tools: Initial bash completion support
  perf tools: Support for events bash completion

 tools/perf/Makefile            |    1 +
 tools/perf/bash_completion     |   24 ++
 tools/perf/builtin-list.c      |   14 ---
 tools/perf/perf.c              |   69 ++-
 tools/perf/util/parse-events.c |   70 +---
 tools/perf/util/parse-events.h |    7 ++--
 6 files changed, 120 insertions(+), 65 deletions(-)
 create mode 100644 tools/perf/bash_completion

--
1.7.5.4
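The tracepoint issue mentioned in this cover letter comes from ':' being one of the default word-break characters in COMP_WORDBREAKS, so bash only replaces the text typed after the last colon. One common workaround is to trim the already-typed "subsys:" prefix from each candidate before handing it back. Below is a minimal sketch of that idea, not something taken from this series: the helper name ltrim_colon_prefix and the sample event names are made up for illustration (the bash-completion package ships __ltrim_colon_completions for the same job).

```shell
# Hypothetical helper (not part of the posted patches): strip the
# "subsys:" prefix the user already typed from every candidate, so that
# bash, which breaks words at ':', inserts only the remaining part.
ltrim_colon_prefix() {
    local cur=$1 prefix cand
    shift
    # Everything up to and including the last ':' of the current word;
    # empty when the current word contains no colon.
    prefix=${cur%"${cur##*:}"}
    for cand in "$@"; do
        printf '%s\n' "${cand#"$prefix"}"
    done
}

# Example: the user has typed "sched:sched_s" after -e.
ltrim_colon_prefix 'sched:sched_s' 'sched:sched_switch' 'sched:sched_stat_wait'
```

In a completion function the trimmed list would be what ends up in COMPREPLY. Removing ':' from COMP_WORDBREAKS is an alternative, but that variable is global to the shell, so it changes how every other completion splits words too.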
[PATCH 2/2] perf tools: Support for events bash completion
Add basic bash completion for the -e option in record, top and stat subcommands. Only hardware, software and tracepoint events are supported. Breakpoints, raw events and events grouping completion need more thinking. Signed-off-by: Frederic Weisbecker Cc: David Ahern Cc: Ingo Molnar Cc: Jiri Olsa Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Stephane Eranian --- tools/perf/bash_completion |6 +++- tools/perf/builtin-list.c | 14 --- tools/perf/util/parse-events.c | 70 +--- tools/perf/util/parse-events.h |7 ++-- 4 files changed, 61 insertions(+), 36 deletions(-) diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion index 3547703..25f4d99 100644 --- a/tools/perf/bash_completion +++ b/tools/perf/bash_completion @@ -6,12 +6,16 @@ _perf() local cur COMPREPLY=() - _get_comp_words_by_ref cur + _get_comp_words_by_ref cur prev # List perf subcommands if [ $COMP_CWORD -eq 1 ]; then cmds=$(perf --list-cmds) COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) ) + # List possible events for -e option + elif [[ $prev == "-e" && "${COMP_WORDS[1]}" == @(record|stat|top) ]]; then + cmds=$(perf list --raw-dump) + COMPREPLY=( $( compgen -W '$cmds' -- $cur ) ) # Fall down to list regular files else _filedir diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c index 6313b6e..bdcff81 100644 --- a/tools/perf/builtin-list.c +++ b/tools/perf/builtin-list.c @@ -19,15 +19,15 @@ int cmd_list(int argc, const char **argv, const char *prefix __used) setup_pager(); if (argc == 1) - print_events(NULL); + print_events(NULL, false); else { int i; for (i = 1; i < argc; ++i) { - if (i > 1) + if (i > 2) putchar('\n'); if (strncmp(argv[i], "tracepoint", 10) == 0) - print_tracepoint_events(NULL, NULL); + print_tracepoint_events(NULL, NULL, false); else if (strcmp(argv[i], "hw") == 0 || strcmp(argv[i], "hardware") == 0) print_events_type(PERF_TYPE_HARDWARE); @@ -36,13 +36,15 @@ int cmd_list(int argc, const char **argv, const char *prefix __used) print_events_type(PERF_TYPE_SOFTWARE); else 
if (strcmp(argv[i], "cache") == 0 || strcmp(argv[i], "hwcache") == 0) - print_hwcache_events(NULL); + print_hwcache_events(NULL, false); + else if (strcmp(argv[i], "--raw-dump") == 0) + print_events(NULL, true); else { char *sep = strchr(argv[i], ':'), *s; int sep_idx; if (sep == NULL) { - print_events(argv[i]); + print_events(argv[i], false); continue; } sep_idx = sep - argv[i]; @@ -51,7 +53,7 @@ int cmd_list(int argc, const char **argv, const char *prefix __used) return -1; s[sep_idx] = '\0'; - print_tracepoint_events(s, s + sep_idx + 1); + print_tracepoint_events(s, s + sep_idx + 1, false); free(s); } } diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 74a5af4..30dba72 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -799,7 +799,8 @@ static const char * const event_type_descriptors[] = { * Print the events from /tracing/events */ -void print_tracepoint_events(const char *subsys_glob, const char *event_glob) +void print_tracepoint_events(const char *subsys_glob, const char *event_glob, +bool name_only) { DIR *sys_dir, *evt_dir; struct dirent *sys_next, *evt_next, sys_dirent, evt_dirent; @@ -829,6 +830,11 @@ void print_tracepoint_events(const char *subsys_glob, const char *event_glob) !strglobmatch(evt_dirent.d_name, event_glob)) continue; + if (name_only) { + printf("%s:%s ", sys_
[PATCH 1/2] perf tools: Initial bash completion support
This implements bash completion for perf subcommands such as record, report, script, probe, etc... Signed-off-by: Frederic Weisbecker Cc: David Ahern Cc: Ingo Molnar Cc: Jiri Olsa Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Stephane Eranian --- tools/perf/Makefile|1 + tools/perf/bash_completion | 20 + tools/perf/perf.c | 69 +--- 3 files changed, 60 insertions(+), 30 deletions(-) create mode 100644 tools/perf/bash_completion diff --git a/tools/perf/Makefile b/tools/perf/Makefile index 35655c3..4000d72 100644 --- a/tools/perf/Makefile +++ b/tools/perf/Makefile @@ -951,6 +951,7 @@ install: all $(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace' $(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python' $(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin' + $(INSTALL) -m 755 bash_completion /etc/bash_completion.d/perf install-python_ext: $(PYTHON_WORD) util/setup.py --quiet install --root='/$(DESTDIR_SQ)' diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion new file mode 100644 index 000..3547703 --- /dev/null +++ b/tools/perf/bash_completion @@ -0,0 +1,20 @@ +# perf completion + +have perf && +_perf() +{ + local cur + + COMPREPLY=() + _get_comp_words_by_ref cur + + # List perf subcommands + if [ $COMP_CWORD -eq 1 ]; then + cmds=$(perf --list-cmds) + COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) ) + # Fall down to list regular files + else + _filedir + fi +} && +complete -F _perf perf diff --git a/tools/perf/perf.c b/tools/perf/perf.c index 2b2e225..db37ee3 100644 --- a/tools/perf/perf.c +++ b/tools/perf/perf.c @@ -24,6 +24,37 @@ const char perf_more_info_string[] = int use_browser = -1; static int use_pager = -1; +struct cmd_struct { + const char *cmd; + int (*fn)(int, const char **, const char *); + int option; +}; + +static struct cmd_struct commands[] = { + { "buildid-cache", cmd_buildid_cache, 
0 }, + { "buildid-list", cmd_buildid_list, 0 }, + { "diff", cmd_diff, 0 }, + { "evlist", cmd_evlist, 0 }, + { "help", cmd_help, 0 }, + { "list", cmd_list, 0 }, + { "record", cmd_record, 0 }, + { "report", cmd_report, 0 }, + { "bench", cmd_bench, 0 }, + { "stat", cmd_stat, 0 }, + { "timechart", cmd_timechart, 0 }, + { "top",cmd_top,0 }, + { "annotate", cmd_annotate, 0 }, + { "version",cmd_version,0 }, + { "script", cmd_script, 0 }, + { "sched", cmd_sched, 0 }, + { "probe", cmd_probe, 0 }, + { "kmem", cmd_kmem, 0 }, + { "lock", cmd_lock, 0 }, + { "kvm",cmd_kvm,0 }, + { "test", cmd_test, 0 }, + { "inject", cmd_inject, 0 }, +}; + struct pager_config { const char *cmd; int val; @@ -160,6 +191,14 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) fprintf(stderr, "dir: %s\n", debugfs_mountpoint); if (envchanged) *envchanged = 1; + } else if (!strcmp(cmd, "--list-cmds")) { + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(commands); i++) { + struct cmd_struct *p = commands+i; + printf("%s ", p->cmd); + } + exit(0); } else { fprintf(stderr, "Unknown option: %s\n", cmd); usage(perf_usage_string); @@ -245,12 +284,6 @@ const char perf_version_string[] = PERF_VERSION; */ #define NEED_WORK_TREE (1<<2) -struct cmd_struct { - const char *cmd; - int (*fn)(int, const char **, const char *); - int option; -}; - static int run_builtin(struct cmd_struct *p, int argc, const char **argv) { int status; @@ -296,30 +329,6 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv) static void handle_internal_command(int argc, const char **argv) { const char *cmd = argv[0]; - static struct cmd_struct commands[] = { - { "build
Re: [PATCH 0/2] perf tools: Basic bash completion support
On Tue, Aug 07, 2012 at 03:19:44PM +0200, Frederic Weisbecker wrote:
> Hey,
>
> Basic bash completion support. Only support perf subcommands and most -e basic
> event descriptor (no grouping).
>
> I just have a small issue with tracepoints because of their ":" in the middle.
> It auto completes as long as we haven't yet reached the semicolon. Otherwise
> we need to add a double quote in the beginning of the expression. I'm quite
> a newbie in bash completion though, so I might find a subtelty later to solve
> this.

Tips: for testing, you need to "make install" and update the bash completion
scripts:

	# make install
	$ . /etc/bash_completion
Re: [PATCH 0/2] perf tools: Basic bash completion support
On Tue, Aug 07, 2012 at 08:18:12AM -0600, David Ahern wrote:
> On 8/7/12 7:22 AM, Frederic Weisbecker wrote:
> >On Tue, Aug 07, 2012 at 03:19:44PM +0200, Frederic Weisbecker wrote:
> >>Hey,
> >>
> >>Basic bash completion support. Only support perf subcommands and most -e basic
> >>event descriptor (no grouping).
> >>
> >>I just have a small issue with tracepoints because of their ":" in the middle.
> >>It auto completes as long as we haven't yet reached the semicolon. Otherwise
> >>we need to add a double quote in the beginning of the expression. I'm quite
> >>a newbie in bash completion though, so I might find a subtelty later to solve
> >>this.
> >
> >Tips: for testing, you need to "make install" and update the bash completion
> >scripts:
> >
> >	# make install
> >	$ . /etc/bash_completion
> >
>
> ANd you need to make sure the PATH hits the updated binary and not
> the default other wise you end up with:
>
> /tmp/pbuild/perf recUnknown option: --list-cmds
>
> Usage: perf [--version] [--help] COMMAND [ARGS]
> Unknown option: --list-cmds
>
> It's calling /usr/bin/perf with --list-cmds, versus the perf command
> I am running (/tmp/pbuild/perf). Any way to teach the completion to
> use the perf binary that the user is running?

Ah good point. Does the below work for you? I'll respin with that change.

diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion
index 25f4d99..cba72a9 100644
--- a/tools/perf/bash_completion
+++ b/tools/perf/bash_completion
@@ -3,18 +3,20 @@
 have perf &&
 _perf()
 {
-	local cur
+	local cur cmd
 
 	COMPREPLY=()
 	_get_comp_words_by_ref cur prev
 
+	cmd=${COMP_WORDS[0]}
+
 	# List perf subcommands
 	if [ $COMP_CWORD -eq 1 ]; then
-		cmds=$(perf --list-cmds)
+		cmds=$($cmd --list-cmds)
 		COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) )
 	# List possible events for -e option
 	elif [[ $prev == "-e" && "${COMP_WORDS[1]}" == @(record|stat|top) ]]; then
-		cmds=$(perf list --raw-dump)
+		cmds=$($cmd list --raw-dump)
 		COMPREPLY=( $( compgen -W '$cmds' -- $cur ) )
 	# Fall down to list regular files
 	else
Re: [PATCH 2/2] perf tools: Support for events bash completion
On Tue, Aug 07, 2012 at 08:48:04AM -0600, David Ahern wrote:
> On 8/7/12 7:19 AM, Frederic Weisbecker wrote:
> >Add basic bash completion for the -e option in record, top
> >and stat subcommands. Only hardware, software and tracepoint
> >events are supported.
> >
> >Breakpoints, raw events and events grouping completion
> >need more thinking.
> >
> >Signed-off-by: Frederic Weisbecker
> >Cc: David Ahern
> >Cc: Ingo Molnar
> >Cc: Jiri Olsa
> >Cc: Namhyung Kim
> >Cc: Peter Zijlstra
> >Cc: Stephane Eranian
> >---
> > tools/perf/bash_completion     |    6 +++-
> > tools/perf/builtin-list.c      |   14 ---
> > tools/perf/util/parse-events.c |   70 +---
> > tools/perf/util/parse-events.h |    7 ++--
> > 4 files changed, 61 insertions(+), 36 deletions(-)
> >
> >diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion
> >index 3547703..25f4d99 100644
> >--- a/tools/perf/bash_completion
> >+++ b/tools/perf/bash_completion
> >@@ -6,12 +6,16 @@ _perf()
> > 	local cur
> >
> > 	COMPREPLY=()
> >-	_get_comp_words_by_ref cur
> >+	_get_comp_words_by_ref cur prev
> >
> > 	# List perf subcommands
> > 	if [ $COMP_CWORD -eq 1 ]; then
> > 		cmds=$(perf --list-cmds)
> > 		COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) )
> >+	# List possible events for -e option
> >+	elif [[ $prev == "-e" && "${COMP_WORDS[1]}" == @(record|stat|top) ]]; then
> >+		cmds=$(perf list --raw-dump)
> >+		COMPREPLY=( $( compgen -W '$cmds' -- $cur ) )
> > 	# Fall down to list regular files
> > 	else
> > 		_filedir
>
> Any reason to show a file list except for -i and -o options? e.g.,

Yeah, for example with perf record when you pass a command to launch and
profile.

In any case I think it's a better idea to keep this as a default. Not
breaking the pre-existing default completion is the guarantee that the
new completion is going to be more useful than a burden.

>
> diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion
> index 25f4d99..be97349 100644
> --- a/tools/perf/bash_completion
> +++ b/tools/perf/bash_completion
> @@ -17,7 +17,7 @@ _perf()
>  		cmds=$(perf list --raw-dump)
>  		COMPREPLY=( $( compgen -W '$cmds' -- $cur ) )
>  	# Fall down to list regular files
> -	else
> +	elif [[ $prev == "-o" || $prev == "-i" ]]; then
>  		_filedir
>  	fi
>  } &&
>
Re: [PATCH 1/2] perf tools: Initial bash completion support
On Tue, Aug 07, 2012 at 08:11:46AM -0600, David Ahern wrote:
> On 8/7/12 7:19 AM, Frederic Weisbecker wrote:
> >This implements bash completion for perf subcommands such
> >as record, report, script, probe, etc...
>
> Love it!
>
> >
> >Signed-off-by: Frederic Weisbecker
> >Cc: David Ahern
> >Cc: Ingo Molnar
> >Cc: Jiri Olsa
> >Cc: Namhyung Kim
> >Cc: Peter Zijlstra
> >Cc: Stephane Eranian
> >---
> > tools/perf/Makefile        |    1 +
> > tools/perf/bash_completion |   20 +
> > tools/perf/perf.c          |   69 +---
> > 3 files changed, 60 insertions(+), 30 deletions(-)
> > create mode 100644 tools/perf/bash_completion
> >
> >diff --git a/tools/perf/Makefile b/tools/perf/Makefile
> >index 35655c3..4000d72 100644
> >--- a/tools/perf/Makefile
> >+++ b/tools/perf/Makefile
> >@@ -951,6 +951,7 @@ install: all
> > 	$(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> > 	$(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python'
> > 	$(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin'
> >+	$(INSTALL) -m 755 bash_completion /etc/bash_completion.d/perf
>
> $(DESTDIR_SQ) is need in front of the destination.

Right. Fixing this.

Thanks.

>
> David
Re: [PATCH 2/2] perf tools: Support for events bash completion
On Tue, Aug 07, 2012 at 05:05:04PM +0100, Alan Cox wrote:
> > > 		COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) )
> > > +	# List possible events for -e option
> > > +	elif [[ $prev == "-e" && "${COMP_WORDS[1]}" == @(record|stat|top) ]]; then
> > > +		cmds=$(perf list --raw-dump)
> > > +		COMPREPLY=( $( compgen -W '$cmds' -- $cur ) )
> >
> Surely $cur should be quoted here...

Right, fixing that too.

Thanks.

> Alan
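For readers wondering why the missing quotes matter: an unquoted expansion is subject to word splitting (and globbing), so compgen can receive a different number of arguments than intended. The following small illustration uses a made-up helper, count_args, and a made-up value for cur; neither appears in the patch.

```shell
# Hypothetical helper: print how many arguments were received.
count_args() { echo "$#"; }

cur='sched switch'      # a word being completed that contains a space

count_args -- $cur      # unquoted: split into two words, prints 3
count_args -- "$cur"    # quoted: passed through as one word, prints 2
```

With the quoted form, whatever the user typed reaches compgen as a single word to match against.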
[PATCH 0/2] perf tools: Basic bash completion support v2
Changes since v1:

- Run "--list-cmds" and "list --raw-dump" with the perf binary the user
  is invoking instead of the default one found in PATH (suggested by
  David Ahern)

- Install in DESTDIR_SQ (suggested by David Ahern)

- Protect $cur under quotes on compgen cmdline (suggested by Alan Cox)

Frederic Weisbecker (2):
  perf tools: Initial bash completion support
  perf tools: Support for events bash completion

 tools/perf/Makefile            |    1 +
 tools/perf/bash_completion     |   26 +++
 tools/perf/builtin-list.c      |   14 ---
 tools/perf/perf.c              |   69 ++-
 tools/perf/util/parse-events.c |   70 +---
 tools/perf/util/parse-events.h |    7 ++--
 6 files changed, 122 insertions(+), 65 deletions(-)
 create mode 100644 tools/perf/bash_completion

--
1.7.5.4
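As a rough illustration of the first and third changes listed in this cover letter, the sketch below simulates the state bash hands to a completion function and resolves the command from COMP_WORDS[0] rather than a hard-coded "perf". It is a sketch under assumptions: fake_perf is a stand-in shell function rather than the real binary, its command list is invented, and the word list is passed to compgen in double quotes for simplicity rather than in the single-quoted form the patch uses.

```shell
# Stand-in for the perf binary the user is running (hypothetical; the
# real script would see whatever path was typed on the command line).
fake_perf() {
    [ "$1" = "--list-cmds" ] && printf '%s ' record report stat top
}

# Simulate what bash provides for: fake_perf re<TAB>
COMP_WORDS=(fake_perf re)
COMP_CWORD=1

cmd=${COMP_WORDS[0]}            # the command as typed, not a fixed "perf"
cur=${COMP_WORDS[COMP_CWORD]}   # the word being completed
cmds=$($cmd --list-cmds)        # ask that same command for its subcommands

# Keep only the subcommands starting with the current word; "$cur" is quoted.
COMPREPLY=( $(compgen -W "$cmds" -- "$cur") )

printf '%s\n' "${COMPREPLY[@]}"
```

Here the candidates left in COMPREPLY are the two subcommands beginning with "re". Taking the command from COMP_WORDS[0] is what lets a locally built binary such as /tmp/pbuild/perf answer for itself.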
[PATCH 1/2] perf tools: Initial bash completion support
This implements bash completion for perf subcommands such as record, report, script, probe, etc... Signed-off-by: Frederic Weisbecker Cc: David Ahern Cc: Ingo Molnar Cc: Jiri Olsa Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Stephane Eranian --- tools/perf/Makefile|1 + tools/perf/bash_completion | 22 ++ tools/perf/perf.c | 69 +--- 3 files changed, 62 insertions(+), 30 deletions(-) create mode 100644 tools/perf/bash_completion diff --git a/tools/perf/Makefile b/tools/perf/Makefile index 35655c3..ddfb7e5 100644 --- a/tools/perf/Makefile +++ b/tools/perf/Makefile @@ -951,6 +951,7 @@ install: all $(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace' $(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python' $(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin' + $(INSTALL) -m 755 bash_completion $(DESTDIR_SQ)/etc/bash_completion.d/perf install-python_ext: $(PYTHON_WORD) util/setup.py --quiet install --root='/$(DESTDIR_SQ)' diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion new file mode 100644 index 000..9a31fa5 --- /dev/null +++ b/tools/perf/bash_completion @@ -0,0 +1,22 @@ +# perf completion + +have perf && +_perf() +{ + local cur cmd + + COMPREPLY=() + _get_comp_words_by_ref cur + + cmd=${COMP_WORDS[0]} + + # List perf subcommands + if [ $COMP_CWORD -eq 1 ]; then + cmds=$($cmd --list-cmds) + COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) ) + # Fall down to list regular files + else + _filedir + fi +} && +complete -F _perf perf diff --git a/tools/perf/perf.c b/tools/perf/perf.c index 2b2e225..db37ee3 100644 --- a/tools/perf/perf.c +++ b/tools/perf/perf.c @@ -24,6 +24,37 @@ const char perf_more_info_string[] = int use_browser = -1; static int use_pager = -1; +struct cmd_struct { + const char *cmd; + int (*fn)(int, const char **, const char *); + int option; +}; + +static struct cmd_struct commands[] = 
{ + { "buildid-cache", cmd_buildid_cache, 0 }, + { "buildid-list", cmd_buildid_list, 0 }, + { "diff", cmd_diff, 0 }, + { "evlist", cmd_evlist, 0 }, + { "help", cmd_help, 0 }, + { "list", cmd_list, 0 }, + { "record", cmd_record, 0 }, + { "report", cmd_report, 0 }, + { "bench", cmd_bench, 0 }, + { "stat", cmd_stat, 0 }, + { "timechart", cmd_timechart, 0 }, + { "top",cmd_top,0 }, + { "annotate", cmd_annotate, 0 }, + { "version",cmd_version,0 }, + { "script", cmd_script, 0 }, + { "sched", cmd_sched, 0 }, + { "probe", cmd_probe, 0 }, + { "kmem", cmd_kmem, 0 }, + { "lock", cmd_lock, 0 }, + { "kvm",cmd_kvm,0 }, + { "test", cmd_test, 0 }, + { "inject", cmd_inject, 0 }, +}; + struct pager_config { const char *cmd; int val; @@ -160,6 +191,14 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) fprintf(stderr, "dir: %s\n", debugfs_mountpoint); if (envchanged) *envchanged = 1; + } else if (!strcmp(cmd, "--list-cmds")) { + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(commands); i++) { + struct cmd_struct *p = commands+i; + printf("%s ", p->cmd); + } + exit(0); } else { fprintf(stderr, "Unknown option: %s\n", cmd); usage(perf_usage_string); @@ -245,12 +284,6 @@ const char perf_version_string[] = PERF_VERSION; */ #define NEED_WORK_TREE (1<<2) -struct cmd_struct { - const char *cmd; - int (*fn)(int, const char **, const char *); - int option; -}; - static int run_builtin(struct cmd_struct *p, int argc, const char **argv) { int status; @@ -296,30 +329,6 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv) static void handle_internal_command(int argc, const char **argv) { const char *cmd = argv[0]; - static struct cm
[PATCH 2/2] perf tools: Support for events bash completion
Add basic bash completion for the -e option in record, top and stat subcommands. Only hardware, software and tracepoint events are supported. Breakpoints, raw events and events grouping completion need more thinking. Signed-off-by: Frederic Weisbecker Cc: David Ahern Cc: Ingo Molnar Cc: Jiri Olsa Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Stephane Eranian --- tools/perf/bash_completion |6 +++- tools/perf/builtin-list.c | 14 --- tools/perf/util/parse-events.c | 70 +--- tools/perf/util/parse-events.h |7 ++-- 4 files changed, 61 insertions(+), 36 deletions(-) diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion index 9a31fa5..1958fa5 100644 --- a/tools/perf/bash_completion +++ b/tools/perf/bash_completion @@ -6,7 +6,7 @@ _perf() local cur cmd COMPREPLY=() - _get_comp_words_by_ref cur + _get_comp_words_by_ref cur prev cmd=${COMP_WORDS[0]} @@ -14,6 +14,10 @@ _perf() if [ $COMP_CWORD -eq 1 ]; then cmds=$($cmd --list-cmds) COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) ) + # List possible events for -e option + elif [[ $prev == "-e" && "${COMP_WORDS[1]}" == @(record|stat|top) ]]; then + cmds=$($cmd list --raw-dump) + COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) ) # Fall down to list regular files else _filedir diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c index 6313b6e..bdcff81 100644 --- a/tools/perf/builtin-list.c +++ b/tools/perf/builtin-list.c @@ -19,15 +19,15 @@ int cmd_list(int argc, const char **argv, const char *prefix __used) setup_pager(); if (argc == 1) - print_events(NULL); + print_events(NULL, false); else { int i; for (i = 1; i < argc; ++i) { - if (i > 1) + if (i > 2) putchar('\n'); if (strncmp(argv[i], "tracepoint", 10) == 0) - print_tracepoint_events(NULL, NULL); + print_tracepoint_events(NULL, NULL, false); else if (strcmp(argv[i], "hw") == 0 || strcmp(argv[i], "hardware") == 0) print_events_type(PERF_TYPE_HARDWARE); @@ -36,13 +36,15 @@ int cmd_list(int argc, const char **argv, const char *prefix __used) 
print_events_type(PERF_TYPE_SOFTWARE); else if (strcmp(argv[i], "cache") == 0 || strcmp(argv[i], "hwcache") == 0) - print_hwcache_events(NULL); + print_hwcache_events(NULL, false); + else if (strcmp(argv[i], "--raw-dump") == 0) + print_events(NULL, true); else { char *sep = strchr(argv[i], ':'), *s; int sep_idx; if (sep == NULL) { - print_events(argv[i]); + print_events(argv[i], false); continue; } sep_idx = sep - argv[i]; @@ -51,7 +53,7 @@ int cmd_list(int argc, const char **argv, const char *prefix __used) return -1; s[sep_idx] = '\0'; - print_tracepoint_events(s, s + sep_idx + 1); + print_tracepoint_events(s, s + sep_idx + 1, false); free(s); } } diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 74a5af4..30dba72 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -799,7 +799,8 @@ static const char * const event_type_descriptors[] = { * Print the events from /tracing/events */ -void print_tracepoint_events(const char *subsys_glob, const char *event_glob) +void print_tracepoint_events(const char *subsys_glob, const char *event_glob, +bool name_only) { DIR *sys_dir, *evt_dir; struct dirent *sys_next, *evt_next, sys_dirent, evt_dirent; @@ -829,6 +830,11 @@ void print_tracepoint_events(const char *subsys_glob, const char *event_glob) !strglobmatch(evt_dirent.d_name, event_glob)) continue; + if (name_only) { + p
Re: [PATCH 1/2] perf tools: Initial bash completion support
On Wed, Aug 08, 2012 at 10:10:02AM +0900, Namhyung Kim wrote:
> On Tue, 07 Aug 2012 16:10:54 -0600, David Ahern wrote:
> > On 8/7/12 11:00 AM, Frederic Weisbecker wrote:
> >> diff --git a/tools/perf/Makefile b/tools/perf/Makefile
> >> index 35655c3..ddfb7e5 100644
> >> --- a/tools/perf/Makefile
> >> +++ b/tools/perf/Makefile
> >> @@ -951,6 +951,7 @@ install: all
> >>  	$(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> >>  	$(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python'
> >>  	$(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin'
> >> +	$(INSTALL) -m 755 bash_completion $(DESTDIR_SQ)/etc/bash_completion.d/perf
> >
> > still getting an error here:
> >
> > $ make DESTDIR=/tmp/junk-perf O=/tmp/pbuild -C tools/perf/ install
> > ...
> > install -m 755 bash_completion /tmp/junk-perf/etc/bash_completion.d/perf
> > install: cannot create regular file `/tmp/junk-perf/etc/bash_completion.d/perf': No such file or directory
> > make: *** [install] Error 1
> > make: Leaving directory `/opt/sw/ahern/perf.git/tools/perf'
>
> Does patch below fix it?

Thanks Namhyung.

Can I have your signed-off-by to add this patch on my series?

Thanks.

>
>
> diff --git a/tools/perf/Makefile b/tools/perf/Makefile
> index cfe4fc0b67f1..d0b27ba9663e 100644
> --- a/tools/perf/Makefile
> +++ b/tools/perf/Makefile
> @@ -696,6 +696,7 @@ perfexecdir_SQ = $(subst ','\'',$(perfexecdir))
>  template_dir_SQ = $(subst ','\'',$(template_dir))
>  htmldir_SQ = $(subst ','\'',$(htmldir))
>  prefix_SQ = $(subst ','\'',$(prefix))
> +sysconfdir_SQ = $(subst ','\'',$(sysconfdir))
>
>  SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
>
> @@ -947,7 +948,8 @@ install: all
>  	$(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
>  	$(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python'
>  	$(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin'
> -	$(INSTALL) -m 755 bash_completion $(DESTDIR_SQ)/etc/bash_completion.d/perf
> +	$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d'
> +	$(INSTALL) bash_completion '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d/perf'
>
>  install-python_ext:
>  	$(PYTHON_WORD) util/setup.py --quiet install --root='/$(DESTDIR_SQ)'
[PATCH 1/3] perf tools: Initial bash completion support
This implements bash completion for perf subcommands such as record, report, script, probe, etc... Signed-off-by: Frederic Weisbecker Cc: David Ahern Cc: Ingo Molnar Cc: Jiri Olsa Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Stephane Eranian --- tools/perf/Makefile|1 + tools/perf/bash_completion | 22 ++ tools/perf/perf.c | 69 +--- 3 files changed, 62 insertions(+), 30 deletions(-) create mode 100644 tools/perf/bash_completion diff --git a/tools/perf/Makefile b/tools/perf/Makefile index 2d4bf6e..84b4227 100644 --- a/tools/perf/Makefile +++ b/tools/perf/Makefile @@ -951,6 +951,7 @@ install: all $(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace' $(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python' $(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin' + $(INSTALL) -m 755 bash_completion $(DESTDIR_SQ)/etc/bash_completion.d/perf install-python_ext: $(PYTHON_WORD) util/setup.py --quiet install --root='/$(DESTDIR_SQ)' diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion new file mode 100644 index 000..9a31fa5 --- /dev/null +++ b/tools/perf/bash_completion @@ -0,0 +1,22 @@ +# perf completion + +have perf && +_perf() +{ + local cur cmd + + COMPREPLY=() + _get_comp_words_by_ref cur + + cmd=${COMP_WORDS[0]} + + # List perf subcommands + if [ $COMP_CWORD -eq 1 ]; then + cmds=$($cmd --list-cmds) + COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) ) + # Fall down to list regular files + else + _filedir + fi +} && +complete -F _perf perf diff --git a/tools/perf/perf.c b/tools/perf/perf.c index 2b2e225..db37ee3 100644 --- a/tools/perf/perf.c +++ b/tools/perf/perf.c @@ -24,6 +24,37 @@ const char perf_more_info_string[] = int use_browser = -1; static int use_pager = -1; +struct cmd_struct { + const char *cmd; + int (*fn)(int, const char **, const char *); + int option; +}; + +static struct cmd_struct commands[] = 
{ + { "buildid-cache", cmd_buildid_cache, 0 }, + { "buildid-list", cmd_buildid_list, 0 }, + { "diff", cmd_diff, 0 }, + { "evlist", cmd_evlist, 0 }, + { "help", cmd_help, 0 }, + { "list", cmd_list, 0 }, + { "record", cmd_record, 0 }, + { "report", cmd_report, 0 }, + { "bench", cmd_bench, 0 }, + { "stat", cmd_stat, 0 }, + { "timechart", cmd_timechart, 0 }, + { "top",cmd_top,0 }, + { "annotate", cmd_annotate, 0 }, + { "version",cmd_version,0 }, + { "script", cmd_script, 0 }, + { "sched", cmd_sched, 0 }, + { "probe", cmd_probe, 0 }, + { "kmem", cmd_kmem, 0 }, + { "lock", cmd_lock, 0 }, + { "kvm",cmd_kvm,0 }, + { "test", cmd_test, 0 }, + { "inject", cmd_inject, 0 }, +}; + struct pager_config { const char *cmd; int val; @@ -160,6 +191,14 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) fprintf(stderr, "dir: %s\n", debugfs_mountpoint); if (envchanged) *envchanged = 1; + } else if (!strcmp(cmd, "--list-cmds")) { + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(commands); i++) { + struct cmd_struct *p = commands+i; + printf("%s ", p->cmd); + } + exit(0); } else { fprintf(stderr, "Unknown option: %s\n", cmd); usage(perf_usage_string); @@ -245,12 +284,6 @@ const char perf_version_string[] = PERF_VERSION; */ #define NEED_WORK_TREE (1<<2) -struct cmd_struct { - const char *cmd; - int (*fn)(int, const char **, const char *); - int option; -}; - static int run_builtin(struct cmd_struct *p, int argc, const char **argv) { int status; @@ -296,30 +329,6 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv) static void handle_internal_command(int argc, const char **argv) { const char *cmd = argv[0]; - static struct cm
[PATCH 0/3] perf tools: Basic bash completion support v3
Changes since v2:

- Fix /etc config installation from Namhyung.

Frederic Weisbecker (2):
  perf tools: Initial bash completion support
  perf tools: Support for events bash completion

Namhyung Kim (1):
  perf tools: Fix /etc config related installation

 tools/perf/Makefile            |  3 ++
 tools/perf/bash_completion     | 26 +++
 tools/perf/builtin-list.c      | 14 ---
 tools/perf/perf.c              | 69 ++-
 tools/perf/util/parse-events.c | 70 +---
 tools/perf/util/parse-events.h |  7 ++--
 6 files changed, 124 insertions(+), 65 deletions(-)
 create mode 100644 tools/perf/bash_completion

-- 
1.7.5.4
[PATCH 3/3] perf tools: Fix /etc config related installation
From: Namhyung Kim

Create the /etc/bash_completion.d directory when it is missing. Otherwise
the installation fails on systems that don't have bash completion installed
yet, or when installing to a specific target:

  $ make DESTDIR=/tmp/junk-perf O=/tmp/pbuild -C tools/perf/ install
  ...
  install -m 755 bash_completion /tmp/junk-perf/etc/bash_completion.d/perf
  install: cannot create regular file `/tmp/junk-perf/etc/bash_completion.d/perf': No such file or directory
  make: *** [install] Error 1
  make: Leaving directory `/opt/sw/ahern/perf.git/tools/perf'

Also use the sysconfdir variable instead of the hardcoded /etc to handle an
overridden conf directory.

Reported-by: David Ahern
Cc: David Ahern
Cc: Ingo Molnar
Cc: Jiri Olsa
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Stephane Eranian
Signed-off-by: Namhyung Kim
Signed-off-by: Frederic Weisbecker
---
 tools/perf/Makefile |  4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 84b4227..a9458b9 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -700,6 +700,7 @@ perfexecdir_SQ = $(subst ','\'',$(perfexecdir))
 template_dir_SQ = $(subst ','\'',$(template_dir))
 htmldir_SQ = $(subst ','\'',$(htmldir))
 prefix_SQ = $(subst ','\'',$(prefix))
+sysconfdir_SQ = $(subst ','\'',$(sysconfdir))

 SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))

@@ -951,7 +952,8 @@ install: all
 	$(INSTALL) scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
 	$(INSTALL) scripts/python/*.py -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python'
 	$(INSTALL) scripts/python/bin/* -t '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/python/bin'
-	$(INSTALL) -m 755 bash_completion $(DESTDIR_SQ)/etc/bash_completion.d/perf
+	$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d'
+	$(INSTALL) bash_completion '$(DESTDIR_SQ)$(sysconfdir_SQ)/bash_completion.d/perf'

 install-python_ext:
 	$(PYTHON_WORD) util/setup.py --quiet install --root='/$(DESTDIR_SQ)'
-- 
1.7.5.4
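The two-step install from this patch can be tried outside the Makefile. This is only a sketch under stated assumptions: `destdir`, `sysconfdir` and `src` are shell variables invented here to stand in for $(DESTDIR), $(sysconfdir) and the bash_completion file, and the staging root is a throwaway temporary directory.

```shell
#!/usr/bin/env bash
# Hypothetical staging root and config dir, standing in for
# $(DESTDIR) and $(sysconfdir) from the Makefile.
destdir=$(mktemp -d)
sysconfdir=/etc

# Placeholder for the bash_completion file shipped in tools/perf.
src=$(mktemp)
echo '# perf completion' > "$src"

# Create the target directory first. Without this step, install fails
# when bash_completion.d does not exist yet under the staging root.
install -d -m 755 "$destdir$sysconfdir/bash_completion.d"
install "$src" "$destdir$sysconfdir/bash_completion.d/perf"

ls -l "$destdir$sysconfdir/bash_completion.d/perf"
```

Deriving the path from sysconfdir rather than a literal /etc is what lets a non-root build that overrides the configuration directory install without touching the root filesystem.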
[PATCH 2/3] perf tools: Support for events bash completion
Add basic bash completion for the -e option in record, top and stat subcommands. Only hardware, software and tracepoint events are supported. Breakpoints, raw events and events grouping completion need more thinking. Signed-off-by: Frederic Weisbecker Cc: David Ahern Cc: Ingo Molnar Cc: Jiri Olsa Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Stephane Eranian --- tools/perf/bash_completion |6 +++- tools/perf/builtin-list.c | 14 --- tools/perf/util/parse-events.c | 70 +--- tools/perf/util/parse-events.h |7 ++-- 4 files changed, 61 insertions(+), 36 deletions(-) diff --git a/tools/perf/bash_completion b/tools/perf/bash_completion index 9a31fa5..1958fa5 100644 --- a/tools/perf/bash_completion +++ b/tools/perf/bash_completion @@ -6,7 +6,7 @@ _perf() local cur cmd COMPREPLY=() - _get_comp_words_by_ref cur + _get_comp_words_by_ref cur prev cmd=${COMP_WORDS[0]} @@ -14,6 +14,10 @@ _perf() if [ $COMP_CWORD -eq 1 ]; then cmds=$($cmd --list-cmds) COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) ) + # List possible events for -e option + elif [[ $prev == "-e" && "${COMP_WORDS[1]}" == @(record|stat|top) ]]; then + cmds=$($cmd list --raw-dump) + COMPREPLY=( $( compgen -W '$cmds' -- "$cur" ) ) # Fall down to list regular files else _filedir diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c index 6313b6e..bdcff81 100644 --- a/tools/perf/builtin-list.c +++ b/tools/perf/builtin-list.c @@ -19,15 +19,15 @@ int cmd_list(int argc, const char **argv, const char *prefix __used) setup_pager(); if (argc == 1) - print_events(NULL); + print_events(NULL, false); else { int i; for (i = 1; i < argc; ++i) { - if (i > 1) + if (i > 2) putchar('\n'); if (strncmp(argv[i], "tracepoint", 10) == 0) - print_tracepoint_events(NULL, NULL); + print_tracepoint_events(NULL, NULL, false); else if (strcmp(argv[i], "hw") == 0 || strcmp(argv[i], "hardware") == 0) print_events_type(PERF_TYPE_HARDWARE); @@ -36,13 +36,15 @@ int cmd_list(int argc, const char **argv, const char *prefix __used) 
print_events_type(PERF_TYPE_SOFTWARE); else if (strcmp(argv[i], "cache") == 0 || strcmp(argv[i], "hwcache") == 0) - print_hwcache_events(NULL); + print_hwcache_events(NULL, false); + else if (strcmp(argv[i], "--raw-dump") == 0) + print_events(NULL, true); else { char *sep = strchr(argv[i], ':'), *s; int sep_idx; if (sep == NULL) { - print_events(argv[i]); + print_events(argv[i], false); continue; } sep_idx = sep - argv[i]; @@ -51,7 +53,7 @@ int cmd_list(int argc, const char **argv, const char *prefix __used) return -1; s[sep_idx] = '\0'; - print_tracepoint_events(s, s + sep_idx + 1); + print_tracepoint_events(s, s + sep_idx + 1, false); free(s); } } diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 8bdfa3e..3ec4bfc 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -799,7 +799,8 @@ static const char * const event_type_descriptors[] = { * Print the events from /tracing/events */ -void print_tracepoint_events(const char *subsys_glob, const char *event_glob) +void print_tracepoint_events(const char *subsys_glob, const char *event_glob, +bool name_only) { DIR *sys_dir, *evt_dir; struct dirent *sys_next, *evt_next, sys_dirent, evt_dirent; @@ -829,6 +830,11 @@ void print_tracepoint_events(const char *subsys_glob, const char *event_glob) !strglobmatch(evt_dirent.d_name, event_glob)) continue; + if (name_only) { + p
Re: [PATCH 1/3] perf tools: Initial bash completion support
On Thu, Aug 09, 2012 at 01:35:15PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Aug 09, 2012 at 04:31:51PM +0200, Frederic Weisbecker wrote:
> > This implements bash completion for perf subcommands such
> > as record, report, script, probe, etc...
>
> Humm, I get this when doing my usual workflow:
>
> [acme@sandy linux]$ make -j8 -C tools/perf/ O=/home/acme/git/build/perf install
> make: Entering directory `/home/git/linux/tools/perf'
> PERF_VERSION = 3.6.rc1.152.g5758f7
>
> install -d -m 755 '/home/acme/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> install -d -m 755 '/home/acme/libexec/perf-core/scripts/python/bin'
> install scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '/home/acme/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> install scripts/python/*.py -t '/home/acme/libexec/perf-core/scripts/python'
> install scripts/python/bin/* -t '/home/acme/libexec/perf-core/scripts/python/bin'
> install -m 755 bash_completion /etc/bash_completion.d/perf
> install: cannot create regular file `/etc/bash_completion.d/perf': Permission denied
> make: *** [install] Error 1
> make: Leaving directory `/home/git/linux/tools/perf'
> [acme@sandy linux]$ make -j8 -C tools/perf/ O=/home/acme/git/build/perf install
>
> Shouldn't it install on ~/etc/bash_completion.d/perf ?

Are you sure you have the third patch?

>
> Is there a way to have per user bash completion files like that?

It seems that some manual tweaking is needed :(

http://www.simplicidade.org/notes/archives/2008/02/bash_completion.html

>
> - Arnaldo
Re: [PATCH 1/3] perf tools: Initial bash completion support
On Thu, Aug 09, 2012 at 02:11:22PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Aug 09, 2012 at 07:00:10PM +0200, Frederic Weisbecker wrote:
> > On Thu, Aug 09, 2012 at 01:35:15PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Thu, Aug 09, 2012 at 04:31:51PM +0200, Frederic Weisbecker wrote:
> > > > This implements bash completion for perf subcommands such
> > > > as record, report, script, probe, etc...
> > >
> > > Humm, I get this when doing my usual workflow:
> > >
> > > [acme@sandy linux]$ make -j8 -C tools/perf/ O=/home/acme/git/build/perf install
> > > make: Entering directory `/home/git/linux/tools/perf'
> > > PERF_VERSION = 3.6.rc1.152.g5758f7
> > >
> > > install -d -m 755 '/home/acme/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> > > install -d -m 755 '/home/acme/libexec/perf-core/scripts/python/bin'
> > > install scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '/home/acme/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> > > install scripts/python/*.py -t '/home/acme/libexec/perf-core/scripts/python'
> > > install scripts/python/bin/* -t '/home/acme/libexec/perf-core/scripts/python/bin'
> > > install -m 755 bash_completion /etc/bash_completion.d/perf
> > > install: cannot create regular file `/etc/bash_completion.d/perf': Permission denied
> > > make: *** [install] Error 1
> > > make: Leaving directory `/home/git/linux/tools/perf'
> > > [acme@sandy linux]$ make -j8 -C tools/perf/ O=/home/acme/git/build/perf install
> > >
> > > Shouldn't it install on ~/etc/bash_completion.d/perf ?
> >
> > Are you sure you have the third patch?
>
> So should I fold the third into the first?

That's up to you. I kept the third patch separate to leave the credit to Namhyung.

> > >
> > > Is there a way to have per user bash completion files like that?
> >
> > It seems that some manual tweaking is needed :(
> >
> > http://www.simplicidade.org/notes/archives/2008/02/bash_completion.html
>
> Will read.
>
> - Arnaldo
Re: [PATCH 1/3] perf tools: Initial bash completion support
On Thu, Aug 09, 2012 at 02:14:19PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Aug 09, 2012 at 10:40:19AM -0600, David Ahern wrote:
> > On 8/9/12 10:35 AM, Arnaldo Carvalho de Melo wrote:
> > >On Thu, Aug 09, 2012 at 04:31:51PM +0200, Frederic Weisbecker wrote:
> > >>This implements bash completion for perf subcommands such
> > >>as record, report, script, probe, etc...
> > >
> > >Humm, I get this when doing my usual workflow:
> > >
> > >[acme@sandy linux]$ make -j8 -C tools/perf/ O=/home/acme/git/build/perf install
> > >make: Entering directory `/home/git/linux/tools/perf'
> > >PERF_VERSION = 3.6.rc1.152.g5758f7
> > >
> > >install -d -m 755 '/home/acme/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> > >install -d -m 755 '/home/acme/libexec/perf-core/scripts/python/bin'
> > >install scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '/home/acme/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
> > >install scripts/python/*.py -t '/home/acme/libexec/perf-core/scripts/python'
> > >install scripts/python/bin/* -t '/home/acme/libexec/perf-core/scripts/python/bin'
> > >install -m 755 bash_completion /etc/bash_completion.d/perf
> > >install: cannot create regular file `/etc/bash_completion.d/perf': Permission denied
> > >make: *** [install] Error 1
> > >make: Leaving directory `/home/git/linux/tools/perf'
> > >[acme@sandy linux]$ make -j8 -C tools/perf/ O=/home/acme/git/build/perf install
> > >
> > > Shouldn't it install on ~/etc/bash_completion.d/perf ?
> > >
> > > Is there a way to have per user bash completion files like that?
> >
> > 3rd patch should fix this.
>
> Huh? The problem is not /etc/bash_completion.d/ not existing, it exists,
> it's just that I'm not using sudo nor installing as root, this new bash
> completion file is the only one that is being installed on the root
> filesystem, all others are in ~acme/

No, the third patch handles sysconfdir, which should take care of that:

$ make -C tools/perf O=/home/fweisbec/build install
make: Entering directory `/home/fweisbec/linux-2.6-tip/tools/perf'
make[1]: Entering directory `/home/fweisbec/linux-2.6-tip/tools/lib/traceevent'
make[1]: Leaving directory `/home/fweisbec/linux-2.6-tip/tools/lib/traceevent'
    LINK /home/fweisbec/build/perf
    GEN perf-archive
install -d -m 755 '/home/fweisbec/bin'
install /home/fweisbec/build/perf '/home/fweisbec/bin'
install -d -m 755 '/home/fweisbec/libexec/perf-core/scripts/perl/Perf-Trace-Util/lib/Perf/Trace'
install -d -m 755 '/home/fweisbec/libexec/perf-core/scripts/perl/bin'
install /home/fweisbec/build/perf-archive -t '/home/fweisbec/libexec/perf-core'
install scripts/perl/Perf-Trace-Util/lib/Perf/Trace/* -t '/home/fweisbec/libexec/perf-core/scripts/perl/Perf-Trace-Util/lib/Perf/Trace'
install scripts/perl/*.pl -t '/home/fweisbec/libexec/perf-core/scripts/perl'
install scripts/perl/bin/* -t '/home/fweisbec/libexec/perf-core/scripts/perl/bin'
install -d -m 755 '/home/fweisbec/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
install -d -m 755 '/home/fweisbec/libexec/perf-core/scripts/python/bin'
install scripts/python/Perf-Trace-Util/lib/Perf/Trace/* -t '/home/fweisbec/libexec/perf-core/scripts/python/Perf-Trace-Util/lib/Perf/Trace'
install scripts/python/*.py -t '/home/fweisbec/libexec/perf-core/scripts/python'
install scripts/python/bin/* -t '/home/fweisbec/libexec/perf-core/scripts/python/bin'
install -d -m 755 '/home/fweisbec/etc/bash_completion.d'
install bash_completion '/home/fweisbec/etc/bash_completion.d/perf'
make: Leaving directory `/home/fweisbec/linux-2.6-tip/tools/perf'
Re: [PATCH 1/3] perf tools: Initial bash completion support
On Thu, Aug 09, 2012 at 04:08:19PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Aug 09, 2012 at 07:27:06PM +0100, Alan Cox wrote:
> > > > 3rd patch should fix this.
> > >
> > > Huh? The problem is not /etc/bash_completion.d/ not existing, it exists,
> > > its just that I'm not using sudo nor installing as root, this new bash
> > > completion file is the only one that is being installed on the root
> > > filesystem, all others are in ~acme/
> >
> > And even with permissions it might not have the right security labels on
> > a well secured box.
> >
> > It's a neat little script (or once its been properly security audited
> > will be) but IMHO it belongs in the distro bash script packages.
>
> Yeah, I think we can keep it in the kernel sources and then send new
> versions to the bash-completion-de...@lists.alioth.debian.org guys.
>
> To test I just did:
>
> ln -s ~/etc/bash_completion.d/perf ~/.bash_completion
>
> Frédéric, I merged your patches as-is and pushed them to my perf/core
> branch, thanks!

Thanks!
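Arnaldo's per-user test can be rehearsed in a scratch directory. The sketch below makes several assumptions: `home` is a throwaway stand-in for $HOME, the installed file is replaced by a one-line placeholder that registers a fixed word list, and the final `source` only imitates what a bash-completion setup would do with ~/.bash_completion at shell startup.

```shell
#!/usr/bin/env bash
# Throwaway stand-in for $HOME, so nothing in the real home is touched.
home=$(mktemp -d)

# Placeholder for the file that "make install" puts under ~/etc
# (the real one defines _perf; this one registers a fixed word list).
mkdir -p "$home/etc/bash_completion.d"
echo 'complete -W "record report stat top" perf' \
    > "$home/etc/bash_completion.d/perf"

# The manual step from the thread: expose it as the per-user file.
ln -s "$home/etc/bash_completion.d/perf" "$home/.bash_completion"

# Imitate the shell startup reading the per-user file.
source "$home/.bash_completion"
complete -p perf   # shows the completion now registered for perf
```

A symlink keeps the per-user file in step with later installs, at the cost of dedicating ~/.bash_completion to this one script.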
[PATCH 0/4] cputime: Virtual cputime accounting small cleanups and consolidation v2
Hi,

No fundamental change in this release but a rebase to solve conflicts against latest tip:/sched/core commits.

Thanks.

Frederic Weisbecker (4):
  cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
  sched: Move cputime code to its own file
  cputime: Consolidate vtime handling on context switch
  s390: Remove leftover account_tick_vtime() header

 arch/Kconfig                           |   3 +
 arch/ia64/Kconfig                      |  12 +-
 arch/ia64/include/asm/switch_to.h      |   8 -
 arch/ia64/kernel/time.c                |   4 +-
 arch/powerpc/include/asm/time.h        |   6 -
 arch/powerpc/kernel/process.c          |   3 -
 arch/powerpc/kernel/time.c             |   6 +
 arch/powerpc/platforms/Kconfig.cputype |  16 +-
 arch/s390/Kconfig                      |   5 +-
 arch/s390/include/asm/switch_to.h      |   4 -
 arch/s390/kernel/vtime.c               |   4 +-
 include/linux/kernel_stat.h            |   6 +
 init/Kconfig                           |  13 +
 kernel/sched/Makefile                  |   2 +-
 kernel/sched/core.c                    | 558 +---
 kernel/sched/cputime.c                 | 503
 kernel/sched/sched.h                   |  63
 17 files changed, 606 insertions(+), 610 deletions(-)
 create mode 100644 kernel/sched/cputime.c

-- 
1.7.5.4
[PATCH 1/4] cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
S390, ia64 and powerpc all define their own version of CONFIG_VIRT_CPU_ACCOUNTING. Generalize the config and its description to a single place to avoid duplication. Signed-off-by: Frederic Weisbecker Cc: Tony Luck Cc: Fenghua Yu Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Peter Zijlstra --- arch/Kconfig |3 +++ arch/ia64/Kconfig | 12 +--- arch/powerpc/platforms/Kconfig.cputype | 16 +--- arch/s390/Kconfig |5 ++--- init/Kconfig | 13 + 5 files changed, 20 insertions(+), 29 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 72f2fa1..f78de57 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -281,4 +281,7 @@ config SECCOMP_FILTER See Documentation/prctl/seccomp_filter.txt for details. +config HAVE_VIRT_CPU_ACCOUNTING + bool + source "kernel/gcov/Kconfig" diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index 310cf57..3c720ef 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -25,6 +25,7 @@ config IA64 select HAVE_GENERIC_HARDIRQS select HAVE_MEMBLOCK select HAVE_MEMBLOCK_NODE_MAP + select HAVE_VIRT_CPU_ACCOUNTING select ARCH_DISCARD_MEMBLOCK select GENERIC_IRQ_PROBE select GENERIC_PENDING_IRQ if SMP @@ -340,17 +341,6 @@ config FORCE_MAX_ZONEORDER default "17" if HUGETLB_PAGE default "11" -config VIRT_CPU_ACCOUNTING - bool "Deterministic task and CPU time accounting" - default n - help - Select this option to enable more accurate task and CPU time - accounting. This is done by reading a CPU counter on each - kernel entry and exit and on transitions within the kernel - between system, softirq and hardirq state, so there is a - small performance impact. - If in doubt, say N here. 
- config SMP bool "Symmetric multi-processing support" select USE_GENERIC_SMP_HELPERS diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 30fd01d..72afd28 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -1,6 +1,7 @@ config PPC64 bool "64-bit kernel" default n + select HAVE_VIRT_CPU_ACCOUNTING help This option selects whether a 32-bit or a 64-bit kernel will be built. @@ -337,21 +338,6 @@ config PPC_MM_SLICES default y if (!PPC_FSL_BOOK3E && PPC64 && HUGETLB_PAGE) || (PPC_STD_MMU_64 && PPC_64K_PAGES) default n -config VIRT_CPU_ACCOUNTING - bool "Deterministic task and CPU time accounting" - depends on PPC64 - default y - help - Select this option to enable more accurate task and CPU time - accounting. This is done by reading a CPU counter on each - kernel entry and exit and on transitions within the kernel - between system, softirq and hardirq state, so there is a - small performance impact. This also enables accounting of - stolen time on logically-partitioned systems running on - IBM POWER5-based machines. - - If in doubt, say Y here. 
- config PPC_HAVE_PMU_SUPPORT bool diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 76de6b6..49ebfb6 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -49,9 +49,6 @@ config GENERIC_LOCKBREAK config PGSTE def_bool y if KVM -config VIRT_CPU_ACCOUNTING - def_bool y - config ARCH_SUPPORTS_DEBUG_PAGEALLOC def_bool y @@ -89,6 +86,8 @@ config S390 select HAVE_MEMBLOCK select HAVE_MEMBLOCK_NODE_MAP select HAVE_CMPXCHG_LOCAL + select HAVE_VIRT_CPU_ACCOUNTING + select VIRT_CPU_ACCOUNTING select ARCH_DISCARD_MEMBLOCK select BUILDTIME_EXTABLE_SORT select ARCH_INLINE_SPIN_TRYLOCK diff --git a/init/Kconfig b/init/Kconfig index af6c7f8..894b073 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -267,6 +267,19 @@ config POSIX_MQUEUE_SYSCTL depends on SYSCTL default y +config VIRT_CPU_ACCOUNTING + bool "Deterministic task and CPU time accounting" + depends on HAVE_VIRT_CPU_ACCOUNTING + default y if PPC64 + help + Select this option to enable more accurate task and CPU time + accounting. This is done by reading a CPU counter on each + kernel entry and exit and on transitions within the kernel + between system, softirq and hardirq state, so there is a + small performance impact. This also enables accounting of + stolen time on logically-partitioned systems running on + IBM POWER5-based machines. + config BSD_PROCESS_ACCT
[PATCH 3/4] cputime: Consolidate vtime handling on context switch
The archs that implement virtual cputime accounting all flush the cputime of a task when it gets descheduled and sometimes set up some ground initialization for the next task to account its cputime. These archs all put their own hooks in their context switch callbacks and handle the off-case themselves. Consolidate this by creating a new account_switch_vtime() callback called in generic code right after a context switch and that these archs must implement to flush the prev task cputime and initialize the next task cputime related state. Signed-off-by: Frederic Weisbecker Cc: Tony Luck Cc: Fenghua Yu Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Peter Zijlstra --- arch/ia64/include/asm/switch_to.h |8 arch/ia64/kernel/time.c |4 ++-- arch/powerpc/include/asm/time.h |6 -- arch/powerpc/kernel/process.c |3 --- arch/powerpc/kernel/time.c|6 ++ arch/s390/include/asm/switch_to.h |2 -- arch/s390/kernel/vtime.c |4 ++-- include/linux/kernel_stat.h |6 ++ kernel/sched/core.c |1 + 9 files changed, 17 insertions(+), 23 deletions(-) diff --git a/arch/ia64/include/asm/switch_to.h b/arch/ia64/include/asm/switch_to.h index cb2412f..d38c7ea 100644 --- a/arch/ia64/include/asm/switch_to.h +++ b/arch/ia64/include/asm/switch_to.h @@ -30,13 +30,6 @@ extern struct task_struct *ia64_switch_to (void *next_task); extern void ia64_save_extra (struct task_struct *task); extern void ia64_load_extra (struct task_struct *task); -#ifdef CONFIG_VIRT_CPU_ACCOUNTING -extern void ia64_account_on_switch (struct task_struct *prev, struct task_struct *next); -# define IA64_ACCOUNT_ON_SWITCH(p,n) ia64_account_on_switch(p,n) -#else -# define IA64_ACCOUNT_ON_SWITCH(p,n) -#endif - #ifdef CONFIG_PERFMON DECLARE_PER_CPU(unsigned long, pfm_syst_info); # define PERFMON_IS_SYSWIDE() (__get_cpu_var(pfm_syst_info) & 0x1) @@ -49,7 +42,6 @@ extern void ia64_account_on_switch (struct task_struct *prev, struct task_struct || 
PERFMON_IS_SYSWIDE()) #define __switch_to(prev,next,last) do { \ - IA64_ACCOUNT_ON_SWITCH(prev, next); \ if (IA64_HAS_EXTRA_STATE(prev)) \ ia64_save_extra(prev); \ if (IA64_HAS_EXTRA_STATE(next)) \ diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c index ecc904b..6247197 100644 --- a/arch/ia64/kernel/time.c +++ b/arch/ia64/kernel/time.c @@ -88,10 +88,10 @@ extern cputime_t cycle_to_cputime(u64 cyc); * accumulated times to the current process, and to prepare accounting on * the next process. */ -void ia64_account_on_switch(struct task_struct *prev, struct task_struct *next) +void account_switch_vtime(struct task_struct *prev) { struct thread_info *pi = task_thread_info(prev); - struct thread_info *ni = task_thread_info(next); + struct thread_info *ni = task_thread_info(current); cputime_t delta_stime, delta_utime; __u64 now; diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h index 3b4b4a8..c1f2676 100644 --- a/arch/powerpc/include/asm/time.h +++ b/arch/powerpc/include/asm/time.h @@ -197,12 +197,6 @@ struct cpu_usage { DECLARE_PER_CPU(struct cpu_usage, cpu_usage_array); -#if defined(CONFIG_VIRT_CPU_ACCOUNTING) -#define account_process_vtime(tsk) account_process_tick(tsk, 0) -#else -#define account_process_vtime(tsk) do { } while (0) -#endif - extern void secondary_cpu_time_init(void); DECLARE_PER_CPU(u64, decrementers_next_tb); diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 710f400..d73fa99 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -514,9 +514,6 @@ struct task_struct *__switch_to(struct task_struct *prev, local_irq_save(flags); - account_system_vtime(current); - account_process_vtime(current); - /* * We can't take a PMU exception inside _switch() since there is a * window where the kernel stack SLB and the kernel stack are out diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c index be171ee..49da7f0 100644 --- 
a/arch/powerpc/kernel/time.c +++ b/arch/powerpc/kernel/time.c @@ -366,6 +366,12 @@ void account_process_tick(struct task_struct *tsk, int user_tick) account_user_time(tsk, utime, utimescaled); } +void account_switch_vtime(struct task_struct *prev) +{ + account_system_vtime(prev); + account_process_tick(prev, 0); +} + #else /* ! CONFIG_VIRT_CPU_ACCOUNTING */ #define calc_cputime_factors() #endif d
[PATCH 2/4] sched: Move cputime code to its own file
Extract cputime code from the giant sched/core.c and put it in its own file. This makes it easier to deal with this particular area and de-bloats core.c a bit more.

Signed-off-by: Frederic Weisbecker
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Peter Zijlstra
---
 kernel/sched/Makefile  |    2 +-
 kernel/sched/core.c    |  557 +---
 kernel/sched/cputime.c |  503 +++
 kernel/sched/sched.h   |   63 ++
 4 files changed, 569 insertions(+), 556 deletions(-)
 create mode 100644 kernel/sched/cputime.c

diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 173ea52..f06d249 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -11,7 +11,7 @@ ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
 endif
-obj-y += core.o clock.o idle_task.o fair.o rt.o stop_task.o
+obj-y += core.o clock.o cputime.o idle_task.o fair.o rt.o stop_task.o
 obj-$(CONFIG_SMP) += cpupri.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4376c9f..ae3bcaa 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -740,126 +740,6 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
 	dequeue_task(rq, p, flags);
 }
-#ifdef CONFIG_IRQ_TIME_ACCOUNTING
-
-/*
- * There are no locks covering percpu hardirq/softirq time.
- * They are only modified in account_system_vtime, on corresponding CPU
- * with interrupts disabled. So, writes are safe.
- * They are read and saved off onto struct rq in update_rq_clock().
- * This may result in other CPU reading this CPU's irq time and can
- * race with irq/account_system_vtime on this CPU. We would either get old
- * or new value with a side effect of accounting a slice of irq time to wrong
- * task when irq is in progress while we read rq->clock.
That is a worthy - * compromise in place of having locks on each irq in account_system_time. - */ -static DEFINE_PER_CPU(u64, cpu_hardirq_time); -static DEFINE_PER_CPU(u64, cpu_softirq_time); - -static DEFINE_PER_CPU(u64, irq_start_time); -static int sched_clock_irqtime; - -void enable_sched_clock_irqtime(void) -{ - sched_clock_irqtime = 1; -} - -void disable_sched_clock_irqtime(void) -{ - sched_clock_irqtime = 0; -} - -#ifndef CONFIG_64BIT -static DEFINE_PER_CPU(seqcount_t, irq_time_seq); - -static inline void irq_time_write_begin(void) -{ - __this_cpu_inc(irq_time_seq.sequence); - smp_wmb(); -} - -static inline void irq_time_write_end(void) -{ - smp_wmb(); - __this_cpu_inc(irq_time_seq.sequence); -} - -static inline u64 irq_time_read(int cpu) -{ - u64 irq_time; - unsigned seq; - - do { - seq = read_seqcount_begin(&per_cpu(irq_time_seq, cpu)); - irq_time = per_cpu(cpu_softirq_time, cpu) + - per_cpu(cpu_hardirq_time, cpu); - } while (read_seqcount_retry(&per_cpu(irq_time_seq, cpu), seq)); - - return irq_time; -} -#else /* CONFIG_64BIT */ -static inline void irq_time_write_begin(void) -{ -} - -static inline void irq_time_write_end(void) -{ -} - -static inline u64 irq_time_read(int cpu) -{ - return per_cpu(cpu_softirq_time, cpu) + per_cpu(cpu_hardirq_time, cpu); -} -#endif /* CONFIG_64BIT */ - -/* - * Called before incrementing preempt_count on {soft,}irq_enter - * and before decrementing preempt_count on {soft,}irq_exit. - */ -void account_system_vtime(struct task_struct *curr) -{ - unsigned long flags; - s64 delta; - int cpu; - - if (!sched_clock_irqtime) - return; - - local_irq_save(flags); - - cpu = smp_processor_id(); - delta = sched_clock_cpu(cpu) - __this_cpu_read(irq_start_time); - __this_cpu_add(irq_start_time, delta); - - irq_time_write_begin(); - /* -* We do not account for softirq time from ksoftirqd here. 
-* We want to continue accounting softirq time to ksoftirqd thread -* in that case, so as not to confuse scheduler with a special task -* that do not consume any time, but still wants to run. -*/ - if (hardirq_count()) - __this_cpu_add(cpu_hardirq_time, delta); - else if (in_serving_softirq() && curr != this_cpu_ksoftirqd()) - __this_cpu_add(cpu_softirq_time, delta); - - irq_time_write_end(); - local_irq_restore(flags); -} -EXPORT_SYMBOL_GPL(account_system_vtime); - -#endif /* CONFIG_IRQ_TIME_ACCOUNTING */ - -#ifdef CONFIG_PARAVIRT -static inline u64 steal_ticks(u64 steal) -{ - if (unlikely(steal > NSEC_PER_SEC)) - return div_u64(steal, TICK_NSEC); - - return __iter_div_u64_rem(steal, TICK_NSEC, &st
[PATCH 4/4] s390: Remove leftover account_tick_vtime() header
The function doesn't seem to exist anymore.

Signed-off-by: Frederic Weisbecker
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Peter Zijlstra
---
 arch/s390/include/asm/switch_to.h |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/switch_to.h b/arch/s390/include/asm/switch_to.h
index e7f9b3d..314cc94 100644
--- a/arch/s390/include/asm/switch_to.h
+++ b/arch/s390/include/asm/switch_to.h
@@ -89,8 +89,6 @@ static inline void restore_access_regs(unsigned int *acrs)
 	prev = __switch_to(prev,next); \
 } while (0)

-extern void account_tick_vtime(struct task_struct *);
-
 #define finish_arch_switch(prev) do { \
 	set_fs(current->thread.mm_segment); \
 } while (0)
--
1.7.5.4
Re: [PATCH 0/4] cputime: Virtual cputime accounting small cleanups and consolidation v2
On Tue, Aug 14, 2012 at 04:16:46PM +0200, Frederic Weisbecker wrote:
> Hi,
>
> No fundamental change in this release but a rebase to solve conflicts
> against latest tip:/sched/core commits.
>
> Thanks.

This can be pulled from:

git://github.com/fweisbec/linux-dynticks.git virt-cputime-v2

This patchset, besides being a desired consolidation and cleanup IMO, is necessary for the adaptive nohz feature (see: http://comments.gmane.org/gmane.linux.kernel/1337690)

Thanks.

>
> Frederic Weisbecker (4):
>   cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
>   sched: Move cputime code to its own file
>   cputime: Consolidate vtime handling on context switch
>   s390: Remove leftover account_tick_vtime() header
>
>  arch/Kconfig                           |    3 +
>  arch/ia64/Kconfig                      |   12 +-
>  arch/ia64/include/asm/switch_to.h      |    8 -
>  arch/ia64/kernel/time.c                |    4 +-
>  arch/powerpc/include/asm/time.h        |    6 -
>  arch/powerpc/kernel/process.c          |    3 -
>  arch/powerpc/kernel/time.c             |    6 +
>  arch/powerpc/platforms/Kconfig.cputype |   16 +-
>  arch/s390/Kconfig                      |    5 +-
>  arch/s390/include/asm/switch_to.h      |    4 -
>  arch/s390/kernel/vtime.c               |    4 +-
>  include/linux/kernel_stat.h            |    6 +
>  init/Kconfig                           |   13 +
>  kernel/sched/Makefile                  |    2 +-
>  kernel/sched/core.c                    |  558 +---
>  kernel/sched/cputime.c                 |  503
>  kernel/sched/sched.h                   |   63
>  17 files changed, 606 insertions(+), 610 deletions(-)
>  create mode 100644 kernel/sched/cputime.c
>
> --
> 1.7.5.4
>
Re: [PATCH v3] Hardware breakpoints: Invoke __perf_event_disable() if interrupts are already disabled
On Wed, Aug 15, 2012 at 11:07:01PM +0530, Naveen N. Rao wrote:
> Hi Frederick,
> Did you get a chance to take a look at this?
>
> Regards,
> Naveen

Yeah, I'm ok with the patch. Peter, are you ok with it?

Thanks.
Re: [PATCH 1/4] cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
On Wed, Aug 15, 2012 at 05:03:47PM +0200, Martin Schwidefsky wrote:
> On Tue, 14 Aug 2012 16:16:47 +0200
> Frederic Weisbecker wrote:
>
> > S390, ia64 and powerpc all define their own version
> > of CONFIG_VIRT_CPU_ACCOUNTING. Generalize the config
> > and its description to a single place to avoid
> > duplication.
>
> For S390 CONFIG_VIRT_CPU_ACCOUNTING is not configurable, it is always
> enabled. With this patch we'd get a config option in the menu, no?

Indeed, it now appears in the menu, but in the case of s390 it's impossible to turn it off due to:

	config S390
		select VIRT_CPU_ACCOUNTING

This creates a strict dependency that the user can't override. The option cannot be toggled.
Re: [PATCH 3/4] cputime: Consolidate vtime handling on context switch
On Wed, Aug 15, 2012 at 05:22:19PM +0200, Martin Schwidefsky wrote:
> On Tue, 14 Aug 2012 16:16:49 +0200
> Frederic Weisbecker wrote:
>
> > The archs that implement virtual cputime accounting all
> > flush the cputime of a task when it gets descheduled
> > and sometimes set up some ground initialization for the
> > next task to account its cputime.
> >
> > These archs all put their own hooks in their context
> > switch callbacks and handle the off-case themselves.
> >
> > Consolidate this by creating a new account_switch_vtime()
> > callback called in generic code right after a context switch
> > and that these archs must implement to flush the prev task
> > cputime and initialize the next task cputime related state.
>
> That change requires that the accounting for the previous process
> can be done before finish_arch_switch() completed. With the old
> code the architecture could to the accounting call in the middle
> of finish_arch_switch, that is not possible anymore. Dunno if this
> is relevant or not. For s390 the new code should work fine.

I'm not sure how this could cause a problem. Interrupts are disabled from switch_to() until finish_lock_switch(), so nothing should be able to interfere with the accounting of the prev task.

I don't really understand what you mean, actually.

Thanks.
Re: [PATCH 3/4] cputime: Consolidate vtime handling on context switch
On Thu, Aug 16, 2012 at 09:50:32AM +0200, Martin Schwidefsky wrote:
> On Wed, 15 Aug 2012 21:28:17 +0200
> Frederic Weisbecker wrote:
>
> > On Wed, Aug 15, 2012 at 05:22:19PM +0200, Martin Schwidefsky wrote:
> > > On Tue, 14 Aug 2012 16:16:49 +0200
> > > Frederic Weisbecker wrote:
> > >
> > > > The archs that implement virtual cputime accounting all
> > > > flush the cputime of a task when it gets descheduled
> > > > and sometimes set up some ground initialization for the
> > > > next task to account its cputime.
> > > >
> > > > These archs all put their own hooks in their context
> > > > switch callbacks and handle the off-case themselves.
> > > >
> > > > Consolidate this by creating a new account_switch_vtime()
> > > > callback called in generic code right after a context switch
> > > > and that these archs must implement to flush the prev task
> > > > cputime and initialize the next task cputime related state.
> > >
> > > That change requires that the accounting for the previous process
> > > can be done before finish_arch_switch() completed. With the old
> > > code the architecture could to the accounting call in the middle
> > > of finish_arch_switch, that is not possible anymore. Dunno if this
> > > is relevant or not. For s390 the new code should work fine.
> >
> > I'm not sure how this could potentially cause a problem. Interrupts are disabled
> > between while we switch_to() until finish_lock_switch(). So nothing
> > should be able to mess up with the accounting of the prev task.
> >
> > I don't really understand what you mean actually.
>
> It is more a theoretical consideration. If the finish_arch_switch code
> updates fields that are required to do the cputime accounting then the
> order could be important. But then you could move that necessary code
> from finish_arch_switch to account_switch_vtime.
> As said that change is fine for s390, so I'm good with it.

Ah ok. Well, like you said, this is fine for s390. It also looks fine to me on ia64 and powerpc, as it doesn't look like we depend on anything done in finish_arch_switch() there. They were flushing the previous task's cputime from switch_to() anyway.

Thanks.

PS: can I add your ack?

>
> --
> blue skies,
>    Martin.
>
> "Reality continues to ruin my life." - Calvin.
>
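The ordering discussed in this thread can be illustrated with a small userspace sketch. It is not kernel code and only assumes the shape described in the patch: a single generic call site that runs after the switch and flushes the previous task's time. `struct task`, `fake_clock`, `current_task` and the `*_sketch` functions are hypothetical names made up for the illustration.

```c
/* Hypothetical stand-in for struct task_struct; not the kernel type. */
struct task {
	const char *name;
	unsigned long long start;   /* time at which the task was scheduled in */
	unsigned long long cputime; /* accumulated time, in made-up units */
};

/* Toy clock and "current" pointer, both assumptions of this sketch. */
static unsigned long long fake_clock;
static struct task *current_task;

/*
 * Plays the role of the arch-provided account_switch_vtime(prev):
 * flush the time consumed by prev, then start the clock of the task
 * that is now current.
 */
static void account_switch_vtime_sketch(struct task *prev)
{
	prev->cputime += fake_clock - prev->start;
	current_task->start = fake_clock;
}

/*
 * Plays the role of the generic scheduler code: one call site after the
 * switch instead of one private hook per architecture.
 */
static void context_switch_sketch(struct task *prev, struct task *next)
{
	current_task = next;               /* the "switch_to()" of this toy model */
	account_switch_vtime_sketch(prev); /* generic hook, called after the switch */
}
```

In this toy model nothing can run between the switch and the hook, which mirrors the argument that interrupts stay disabled over that window. Anything the hook needs must already be set up when it runs, which is the theoretical ordering concern raised above.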
Re: [PATCH 1/4] cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
On Thu, Aug 16, 2012 at 07:38:17PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2012-08-16 at 09:53 +0200, Martin Schwidefsky wrote:
> > Hmm, ok. But then the description should be reworded not to be specific to
> > the power architecture (the part of the message about "This also enables
> > accounting of stolen time on logically-partitioned systems running on IBM
> > POWER5-based machines.").
>
> Which is not very helpful to somebody running on a POWER6 or 7 (which
> also support that option just fine :-)
>
> So yes, the description should definitely be improved.

All right. How about something like the below?

diff --git a/init/Kconfig b/init/Kconfig
index 894b073..5f5f8c2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -276,9 +276,9 @@ config VIRT_CPU_ACCOUNTING
 	  accounting. This is done by reading a CPU counter on each
 	  kernel entry and exit and on transitions within the kernel
 	  between system, softirq and hardirq state, so there is a
-	  small performance impact. This also enables accounting of
-	  stolen time on logically-partitioned systems running on
-	  IBM POWER5-based machines.
+	  small performance impact. In the case of IBM POWER > 5, this
+	  also enables accounting of stolen time on logically-partitioned
+	  systems.

 config BSD_PROCESS_ACCT
 	bool "BSD Process Accounting"
Re: [PATCH 1/4] cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
On Thu, Aug 16, 2012 at 04:00:44PM +0200, Martin Schwidefsky wrote:
> On Thu, 16 Aug 2012 14:55:59 +0200
> Frederic Weisbecker wrote:
>
> > On Thu, Aug 16, 2012 at 07:38:17PM +1000, Benjamin Herrenschmidt wrote:
> > > On Thu, 2012-08-16 at 09:53 +0200, Martin Schwidefsky wrote:
> > > > Hmm, ok. But then the description should be reworded not to be specific to
> > > > the power architecture (the part of the message about "This also enables
> > > > accounting of stolen time on logically-partitioned systems running on IBM
> > > > POWER5-based machines.").
> > >
> > > Which is not very helpful to somebody running on a POWER6 or 7 (which
> > > also support that option just fine :-)
> > >
> > > So yes, the description should definitely be improved.
> >
> > All right. How about something like the below?
> >
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 894b073..5f5f8c2 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -276,9 +276,9 @@ config VIRT_CPU_ACCOUNTING
> >  	  accounting. This is done by reading a CPU counter on each
> >  	  kernel entry and exit and on transitions within the kernel
> >  	  between system, softirq and hardirq state, so there is a
> > -	  small performance impact. This also enables accounting of
> > -	  stolen time on logically-partitioned systems running on
> > -	  IBM POWER5-based machines.
> > +	  small performance impact. In the case of IBM POWER > 5, this
> > +	  also enables accounting of stolen time on logically-partitioned
> > +	  systems.
> >
> >  config BSD_PROCESS_ACCT
> >  	bool "BSD Process Accounting"
> >
>
> VIRT_CPU_ACCOUNTING will enable steal time for s390 as well.

Ah right. Fixed below:

diff --git a/init/Kconfig b/init/Kconfig
index 894b073..c40d0fb 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -276,9 +276,9 @@ config VIRT_CPU_ACCOUNTING
 	  accounting. This is done by reading a CPU counter on each
 	  kernel entry and exit and on transitions within the kernel
 	  between system, softirq and hardirq state, so there is a
-	  small performance impact. This also enables accounting of
-	  stolen time on logically-partitioned systems running on
-	  IBM POWER5-based machines.
+	  small performance impact. In the case of s390 or IBM POWER > 5,
+	  this also enables accounting of stolen time on logically-partitioned
+	  systems.

 config BSD_PROCESS_ACCT
 	bool "BSD Process Accounting"
Re: [PATCH 03/17] perf, x86: Add copy_from_user_nmi_nochk for best effort copy
On Sun, Jul 22, 2012 at 02:14:26PM +0200, Jiri Olsa wrote:
> Adding copy_from_user_nmi_nochk that provides the best effort
> copy regardless the requesting size crossing the task boundary.
>
> This is going to be useful for stack dump we need in post
> DWARF CFI based unwind, where we have predefined size of
> the user stack to dump, and we need to store the most of
> the requested dump size, regardless this size is crossing
> the task boundary.

What does that imply when we cross this limit? Are we still in the task stack?
Re: [PATCH 01/17] perf: Unified API to record selective sets of arch registers
On Sun, Jul 22, 2012 at 02:14:24PM +0200, Jiri Olsa wrote:
> This brings a new API to help the selective dump of registers on
> event sampling, and its implementation for x86 arch.
>
> Added HAVE_PERF_REGS config option to determine if the architecture
> provides perf registers ABI.
>
> The information about desired registers will be passed in u64 mask.
> It's up to the architecture to map the registers into the mask bits.
>
> For the x86 arch implementation, both 32 and 64 bit registers
> bits are defined within single enum to ensure 64 bit system can
> provide register dump for compat task if needed in the future.
>
> Signed-off-by: Jiri Olsa
> Original-patch-by: Frederic Weisbecker

Acked-by: Frederic Weisbecker
Re: [PATCH 02/17] perf: Add ability to attach user level registers dump to sample
On Sun, Jul 22, 2012 at 02:14:25PM +0200, Jiri Olsa wrote:
> Introducing PERF_SAMPLE_REGS_USER sample type bit to trigger
> the dump of user level registers on sample. Registers we want
> to dump are specified by sample_regs_user bitmask.
>
> Only user level registers are dumped at the moment. Meaning the
> register values of the user space context as it was before the
> user entered the kernel for whatever reason (syscall, irq,
> exception, or a PMI happening in userspace).
>
> The layout of the sample_regs_user bitmap is described in
> asm/perf_regs.h for archs that support register dump.
>
> This is going to be useful to bring Dwarf CFI based stack
> unwinding on top of samples.
>
> Signed-off-by: Jiri Olsa
> Original-patch-by: Frederic Weisbecker

Acked-by: Frederic Weisbecker
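The bitmask selection described in the quoted changelog can be sketched in standalone C. This is only an illustration under stated assumptions: the `sketch_reg` indices and `dump_selected_regs()` are hypothetical, the real bit layout is arch-defined (asm/perf_regs.h per the changelog), and the ascending-bit output order is a choice made for this sketch rather than something the mail states.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register indices, invented for the example. */
enum sketch_reg {
	SKETCH_REG_AX,
	SKETCH_REG_BX,
	SKETCH_REG_SP,
	SKETCH_REG_IP,
	SKETCH_REG_MAX
};

/*
 * Copy the registers selected by mask from regs[] into out[], walking the
 * mask from bit 0 upward. Returns the number of values written.
 */
static size_t dump_selected_regs(uint64_t mask, const uint64_t *regs,
				 size_t nr_regs, uint64_t *out)
{
	size_t n = 0;

	for (size_t bit = 0; bit < nr_regs && bit < 64; bit++) {
		if (mask & (UINT64_C(1) << bit))
			out[n++] = regs[bit];
	}
	return n;
}
```

A caller interested only in the stack pointer and instruction pointer would pass `(1ULL << SKETCH_REG_SP) | (1ULL << SKETCH_REG_IP)` and get two values back, which is the kind of minimal set a post-processing unwinder needs to get started.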
Re: [PATCH 06/17] perf: Add ability to attach user stack dump to sample
On Sun, Jul 22, 2012 at 02:14:29PM +0200, Jiri Olsa wrote:
> Introducing PERF_SAMPLE_STACK_USER sample type bit to trigger
> the dump of the user level stack on sample. The size of the
> dump is specified by sample_stack_user value.
>
> Being able to dump parts of the user stack, starting from the
> stack pointer, will be useful to make a post mortem dwarf CFI
> based stack unwinding.
>
> Signed-off-by: Jiri Olsa
> Signed-off-by: Frederic Weisbecker

If you keep the SOB of the author then you need to preserve their authorship (git am --author= / git commit --amend --author=). Unless you changed the patch significantly enough that you simply credit with something like "Original-patch-by" and you become the author. This is a judgment call, I won't mind in any case. But there is no middle ground :)

You also need to keep the SOB chain in order. The above SOB chain suggests I'm carrying a patch from you.

Just saying that so that you make the maintainers' job easier ;)
Re: [PATCH 14/17] perf, tool: Support for dwarf cfi unwinding on post processing
On Sun, Jul 22, 2012 at 02:14:37PM +0200, Jiri Olsa wrote:
> This brings the support for dwarf cfi unwinding on perf post
> processing. Call frame informations are retrieved and then passed
> to libunwind that requests memory and register content from the
> applications.
>
> Adding unwind object to handle the user stack backtrace based
> on the user register values and user stack dump.
>
> The unwind object access the libunwind via remote interface
> and provides to it all the necessary data to unwind the stack.
>
> The unwind interface provides following function:
>   unwind__get_entries
>
> And callback (specified in above function) to retrieve
> the backtrace entries:
>   typedef int (*unwind_entry_cb_t)(struct unwind_entry *entry,
>                                    void *arg);
>
> Signed-off-by: Jiri Olsa
> Signed-off-by: Frederic Weisbecker
> ---
>  tools/perf/Makefile                                |    2 +
>  tools/perf/arch/x86/Makefile                       |    3 +
>  tools/perf/arch/x86/util/unwind.c                  |  111
>  tools/perf/builtin-report.c                        |   24 +-
>  tools/perf/builtin-script.c                        |   16 +-
>  tools/perf/builtin-top.c                           |    5 +-
>  tools/perf/util/include/linux/compiler.h           |    1 +
>  tools/perf/util/map.h                              |    7 +-
>  .../perf/util/scripting-engines/trace-event-perl.c |    3 +-
>  .../util/scripting-engines/trace-event-python.c    |    3 +-
>  tools/perf/util/session.c                          |   61 ++-
>  tools/perf/util/session.h                          |    3 +-
>  tools/perf/util/trace-event-scripting.c            |    3 +-
>  tools/perf/util/trace-event.h                      |    5 +-
>  tools/perf/util/unwind.c                           |  567
>  tools/perf/util/unwind.h                           |   34 ++
>  16 files changed, 811 insertions(+), 37 deletions(-)
>  create mode 100644 tools/perf/arch/x86/util/unwind.c
>  create mode 100644 tools/perf/util/unwind.c
>  create mode 100644 tools/perf/util/unwind.h
>
> diff --git a/tools/perf/Makefile b/tools/perf/Makefile
> index d0c3291..c18c790 100644
> --- a/tools/perf/Makefile
> +++ b/tools/perf/Makefile
> @@ -328,6 +328,7 @@ LIB_H += util/cgroup.h
>  LIB_H += $(TRACE_EVENT_DIR)event-parse.h
>  LIB_H += util/target.h
>  LIB_H += util/perf_regs.h
> +LIB_H += util/unwind.h
>
>  LIB_OBJS += $(OUTPUT)util/abspath.o
>  LIB_OBJS += $(OUTPUT)util/alias.o
> @@ -513,6 +514,7 @@ else
>  	EXTLIBS += $(LIBUNWIND_LIBS)
>  	BASIC_CFLAGS := $(LIBUNWIND_CFLAGS) $(BASIC_CFLAGS)
>  	BASIC_LDFLAGS := $(LIBUNWIND_LDFLAGS) $(BASIC_LDFLAGS)
> +	LIB_OBJS += $(OUTPUT)util/unwind.o
>  endif
>
>  ifdef NO_NEWT
> diff --git a/tools/perf/arch/x86/Makefile b/tools/perf/arch/x86/Makefile
> index 744e629..815841c 100644
> --- a/tools/perf/arch/x86/Makefile
> +++ b/tools/perf/arch/x86/Makefile
> @@ -2,4 +2,7 @@ ifndef NO_DWARF
>  PERF_HAVE_DWARF_REGS := 1
>  LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/dwarf-regs.o
>  endif
> +ifndef NO_LIBUNWIND
> +LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/unwind.o
> +endif
>  LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/header.o
> diff --git a/tools/perf/arch/x86/util/unwind.c b/tools/perf/arch/x86/util/unwind.c
> new file mode 100644
> index 000..78d956e
> --- /dev/null
> +++ b/tools/perf/arch/x86/util/unwind.c
> @@ -0,0 +1,111 @@
> +
> +#include
> +#include
> +#include "perf_regs.h"
> +#include "../../util/unwind.h"
> +
> +#ifdef ARCH_X86_64
> +int unwind__arch_reg_id(int regnum)

Please try to avoid __ in function names. We used that convention before but we gave up because that's actually more painful than anything.
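The callback interface quoted in the changelog can be pictured with a short standalone sketch. It does not use libunwind and is not the patch's code: `struct sketch_entry`, its single `ip` field, `sketch_get_entries()` and `count_frames()` are hypothetical, since the mail shows only the callback typedef and not the layout of `struct unwind_entry`. Stopping the walk on a nonzero callback return is likewise an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout, invented for the example. */
struct sketch_entry {
	uint64_t ip;
};

/* Same shape as the callback typedef quoted above. */
typedef int (*sketch_entry_cb_t)(struct sketch_entry *entry, void *arg);

/*
 * Hand each instruction pointer to cb. A nonzero return from cb stops the
 * walk and is passed back to the caller.
 */
static int sketch_get_entries(sketch_entry_cb_t cb, void *arg,
			      const uint64_t *ips, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		struct sketch_entry e = { .ip = ips[i] };
		int ret = cb(&e, arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count the frames seen so far. */
static int count_frames(struct sketch_entry *entry, void *arg)
{
	(void)entry;
	(*(size_t *)arg)++;
	return 0;
}
```

The `void *arg` parameter lets different consumers keep their own state (a counter here, a callchain buffer in a real tool) without the walker knowing about it.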
Re: [PATCH 16/17] perf, tool: Add dso data caching
On Sun, Jul 22, 2012 at 02:14:39PM +0200, Jiri Olsa wrote:
> Adding dso data caching so we don't need to open/read/close,
> each time we want dso data.
>
> The DSO data caching affects following functions:
>     dso__data_read_offset
>     dso__data_read_addr
>
> Each DSO read tries to find the data (based on offset) inside
> the cache. If it's not present it fills the cache from file,
> and returns the data. If it is present, data are returned
> with no file read.
>
> Each data read is cached by reading cache page sized/aligned
> amount of DSO data. The cache page size is hardcoded to 4096.
> The cache is using RB tree with file offset as a sort key.
>
> Signed-off-by: Jiri Olsa

Nice idea.

> ---
>  tools/perf/util/symbol.c |  154 --

There seems to be an increasing need to move dso related things
to some util/dso.c file. Just suggesting.
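The caching scheme described in the quoted changelog can be sketched roughly as follows. This is only an illustration under assumptions: the names (`dso_cache`, `cache_page`, `cache_read`) are hypothetical, an in-memory buffer stands in for the DSO file, and a small linear table replaces the RB tree mentioned in the changelog to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_PAGE_SIZE 4096            /* hardcoded page size, as in the changelog */
#define CACHE_PAGE_MASK (CACHE_PAGE_SIZE - 1)
#define CACHE_MAX_PAGES 16              /* hypothetical limit for this sketch */

struct cache_page {
	size_t offset;                  /* page-aligned file offset (the lookup key) */
	size_t size;                    /* valid bytes in data[] */
	unsigned char data[CACHE_PAGE_SIZE];
};

struct dso_cache {
	const unsigned char *file;      /* stand-in for the DSO file contents */
	size_t file_size;
	struct cache_page pages[CACHE_MAX_PAGES];
	int nr_pages;
	int nr_file_reads;              /* counts simulated file reads */
};

/* Simulated "open/read/close": copy one aligned page from the backing buffer. */
static struct cache_page *cache_fill(struct dso_cache *c, size_t page_off)
{
	struct cache_page *p;

	if (c->nr_pages == CACHE_MAX_PAGES || page_off >= c->file_size)
		return NULL;
	p = &c->pages[c->nr_pages++];
	p->offset = page_off;
	p->size = c->file_size - page_off;
	if (p->size > CACHE_PAGE_SIZE)
		p->size = CACHE_PAGE_SIZE;
	memcpy(p->data, c->file + page_off, p->size);
	c->nr_file_reads++;
	return p;
}

/* Linear lookup by aligned offset; an RB tree would replace this loop. */
static struct cache_page *cache_find(struct dso_cache *c, size_t page_off)
{
	for (int i = 0; i < c->nr_pages; i++)
		if (c->pages[i].offset == page_off)
			return &c->pages[i];
	return NULL;
}

/* Read up to len bytes at offset, served from a single cached page. */
size_t cache_read(struct dso_cache *c, size_t offset,
		  unsigned char *buf, size_t len)
{
	size_t page_off = offset & ~(size_t)CACHE_PAGE_MASK;
	size_t in_page = offset - page_off;
	struct cache_page *p = cache_find(c, page_off);

	assert(buf != NULL);
	if (!p)
		p = cache_fill(c, page_off);
	if (!p || in_page >= p->size)
		return 0;
	if (len > p->size - in_page)
		len = p->size - in_page;
	memcpy(buf, p->data + in_page, len);
	return len;
}
```

Two reads that fall into the same 4096-byte page trigger only one simulated file read; a read near the end of the file returns fewer bytes than requested.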
Re: [PATCHv7 00/17] perf: Add backtrace post dwarf unwind
On Sun, Jul 22, 2012 at 02:14:23PM +0200, Jiri Olsa wrote:
> hi,
>
> patches available also as tarball in here:
> http://people.redhat.com/~jolsa/perf_post_unwind_v7.tar.bz2
>
> v7 changes:
>    - omitted v6 patches 9 and 15
>      They need more work and will be sent separately. I dont want to hold off whole
>      patchset because of them. We could miss some related backtraces (syscall, vdso)
>      in this version.
>    - v6 patch 11, 14, 20 already in

I'm personally ok with the kernel bits. And the tool bits look like a nice base
to work on. If nobody strongly objects, it would be nice to merge this in -tip,
either in perf/core or in some staging tree, so that we can continue incrementally.

Nice work overall, thanks!
Re: [PATCH 14/17] perf, tool: Support for dwarf cfi unwinding on post processing
On Wed, Jul 25, 2012 at 02:16:55PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Jul 25, 2012 at 07:05:33PM +0200, Frederic Weisbecker wrote:
> > > +#ifdef ARCH_X86_64
> > > +int unwind__arch_reg_id(int regnum)
> >
> > Please try to avoid __ in function names. We used that convention
> > before but we gave up because that's actually more painful than
> > anything.
>
> Well, I continue using it to separate the struct operated by the
> function from the function name.

As you prefer. I personally don't like it much because when I grep for some
function I have in mind, I get stuck finding the right underscore layout :)
Re: [PATCH 03/17] perf, x86: Add copy_from_user_nmi_nochk for best effort copy
On Wed, Jul 25, 2012 at 07:16:43PM +0200, Jiri Olsa wrote:
> On Wed, Jul 25, 2012 at 06:11:53PM +0200, Frederic Weisbecker wrote:
> > On Sun, Jul 22, 2012 at 02:14:26PM +0200, Jiri Olsa wrote:
> > > Adding copy_from_user_nmi_nochk that provides the best effort
> > > copy regardless the requesting size crossing the task boundary.
> > >
> > > This is going to be useful for stack dump we need in post
> > > DWARF CFI based unwind, where we have predefined size of
> > > the user stack to dump, and we need to store the most of
> > > the requested dump size, regardless this size is crossing
> > > the task boundary.
> >
> > What does that imply when we cross this limit? Are we still in the
> > task stack?
>
> We store all we could from 'stack pointer' to 'stack pointer' + dump size.
>
> I discussed this with Oleg and we could probably find vma for the 'stack pointer'
> and check for its size and narrow the dump - maybe more complex, but probably faster
> in comparison with dumping pages we're not interested in.

Ah, that's because the user stack can be larger than TASK_SIZE, right? Ok then.
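The narrowing idea mentioned in the quoted reply, i.e. limiting the dump to what the stack mapping actually contains, might look like the sketch below. It is an illustration under assumptions, not code from the patchset: `stack_dump_size` and its parameters are hypothetical names, the VMA bounds are passed in as plain numbers rather than looked up from the stack pointer, and a downward-growing stack (as on x86) is assumed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return how many bytes can be dumped starting at sp
 * without running past the end of the mapping [vma_start, vma_end).
 */
uint64_t stack_dump_size(uint64_t sp, uint64_t requested,
			 uint64_t vma_start, uint64_t vma_end)
{
	assert(vma_start <= vma_end);

	/* Stack pointer outside the mapping: nothing sensible to dump. */
	if (sp < vma_start || sp >= vma_end)
		return 0;

	/* Assumed downward-growing stack: live data sits between sp and vma_end. */
	uint64_t avail = vma_end - sp;

	return requested < avail ? requested : avail;
}
```

With a mapping of [0x4000, 0x8000) and a 0x1000-byte request, a stack pointer at 0x7f00 would be narrowed to 0x100 bytes instead of dumping past the end of the stack.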
Re: [PATCH 03/17] perf, x86: Add copy_from_user_nmi_nochk for best effort copy
On Wed, Jul 25, 2012 at 07:30:31PM +0200, Jiri Olsa wrote:
> On Wed, Jul 25, 2012 at 07:16:43PM +0200, Jiri Olsa wrote:
> > On Wed, Jul 25, 2012 at 06:11:53PM +0200, Frederic Weisbecker wrote:
> > > On Sun, Jul 22, 2012 at 02:14:26PM +0200, Jiri Olsa wrote:
> > > > Adding copy_from_user_nmi_nochk that provides the best effort
> > > > copy regardless the requesting size crossing the task boundary.
> > > >
> > > > This is going to be useful for stack dump we need in post
> > > > DWARF CFI based unwind, where we have predefined size of
> > > > the user stack to dump, and we need to store the most of
> > > > the requested dump size, regardless this size is crossing
> > > > the task boundary.
> > >
> > > What does that imply when we cross this limit? Are we still in the
> > > task stack?
> >
> > We store all we could from 'stack pointer' to 'stack pointer' + dump size.
> >
> > I discussed this with Oleg and we could probably find vma for the 'stack pointer'
> > and check for its size and narrow the dump - maybe more complex, but probably faster
> > in comparison with dumping pages we're not interested in.
> >
> > thanks,
> > jirka
>
> I can send this update later together with vdso
> and 'syscall regs storage' features ;)

Sure! As long as we are fine with the kernel ABI, the rest can be done
incrementally.
[PATCH 1/4] cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
S390, ia64 and powerpc all define their own version of CONFIG_VIRT_CPU_ACCOUNTING. Generalize the config and its description to a single place to avoid duplication. Signed-off-by: Frederic Weisbecker Cc: Tony Luck Cc: Fenghua Yu Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Martin Schwidefsky Cc: Heiko Carstens Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Peter Zijlstra --- arch/Kconfig |3 +++ arch/ia64/Kconfig | 12 +--- arch/powerpc/platforms/Kconfig.cputype | 16 +--- arch/s390/Kconfig |5 ++--- init/Kconfig | 13 + 5 files changed, 20 insertions(+), 29 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 72f2fa1..f78de57 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -281,4 +281,7 @@ config SECCOMP_FILTER See Documentation/prctl/seccomp_filter.txt for details. +config HAVE_VIRT_CPU_ACCOUNTING + bool + source "kernel/gcov/Kconfig" diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index 310cf57..3c720ef 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -25,6 +25,7 @@ config IA64 select HAVE_GENERIC_HARDIRQS select HAVE_MEMBLOCK select HAVE_MEMBLOCK_NODE_MAP + select HAVE_VIRT_CPU_ACCOUNTING select ARCH_DISCARD_MEMBLOCK select GENERIC_IRQ_PROBE select GENERIC_PENDING_IRQ if SMP @@ -340,17 +341,6 @@ config FORCE_MAX_ZONEORDER default "17" if HUGETLB_PAGE default "11" -config VIRT_CPU_ACCOUNTING - bool "Deterministic task and CPU time accounting" - default n - help - Select this option to enable more accurate task and CPU time - accounting. This is done by reading a CPU counter on each - kernel entry and exit and on transitions within the kernel - between system, softirq and hardirq state, so there is a - small performance impact. - If in doubt, say N here. 
- config SMP bool "Symmetric multi-processing support" select USE_GENERIC_SMP_HELPERS diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 30fd01d..72afd28 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -1,6 +1,7 @@ config PPC64 bool "64-bit kernel" default n + select HAVE_VIRT_CPU_ACCOUNTING help This option selects whether a 32-bit or a 64-bit kernel will be built. @@ -337,21 +338,6 @@ config PPC_MM_SLICES default y if (!PPC_FSL_BOOK3E && PPC64 && HUGETLB_PAGE) || (PPC_STD_MMU_64 && PPC_64K_PAGES) default n -config VIRT_CPU_ACCOUNTING - bool "Deterministic task and CPU time accounting" - depends on PPC64 - default y - help - Select this option to enable more accurate task and CPU time - accounting. This is done by reading a CPU counter on each - kernel entry and exit and on transitions within the kernel - between system, softirq and hardirq state, so there is a - small performance impact. This also enables accounting of - stolen time on logically-partitioned systems running on - IBM POWER5-based machines. - - If in doubt, say Y here. 
- config PPC_HAVE_PMU_SUPPORT bool diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 76de6b6..49ebfb6 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -49,9 +49,6 @@ config GENERIC_LOCKBREAK config PGSTE def_bool y if KVM -config VIRT_CPU_ACCOUNTING - def_bool y - config ARCH_SUPPORTS_DEBUG_PAGEALLOC def_bool y @@ -89,6 +86,8 @@ config S390 select HAVE_MEMBLOCK select HAVE_MEMBLOCK_NODE_MAP select HAVE_CMPXCHG_LOCAL + select HAVE_VIRT_CPU_ACCOUNTING + select VIRT_CPU_ACCOUNTING select ARCH_DISCARD_MEMBLOCK select BUILDTIME_EXTABLE_SORT select ARCH_INLINE_SPIN_TRYLOCK diff --git a/init/Kconfig b/init/Kconfig index af6c7f8..c40d0fb 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -267,6 +267,19 @@ config POSIX_MQUEUE_SYSCTL depends on SYSCTL default y +config VIRT_CPU_ACCOUNTING + bool "Deterministic task and CPU time accounting" + depends on HAVE_VIRT_CPU_ACCOUNTING + default y if PPC64 + help + Select this option to enable more accurate task and CPU time + accounting. This is done by reading a CPU counter on each + kernel entry and exit and on transitions within the kernel + between system, softirq and hardirq state, so there is a + small performance impact. In the case of s390 or IBM POWER > 5, + this also enables accounting of stolen time on logically-partitioned + systems. + config BSD_PROCESS_ACCT
[PATCH 0/4] cputime: Virtual cputime accounting small cleanups and consolidation v3
Hi,

In this v3:

- Rebase against latest tip:sched/core
- Added acks from Martin
- Refined help text for the consolidated CONFIG_VIRT_CPU_ACCOUNTING option
  in the 1st patch.

You can pull from:

git://github.com/fweisbec/linux-dynticks.git
	virt-cputime-v3

Thanks.

Frederic Weisbecker (4):
  cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
  sched: Move cputime code to its own file
  cputime: Consolidate vtime handling on context switch
  s390: Remove leftover account_tick_vtime() header

 arch/Kconfig                           |    3 +
 arch/ia64/Kconfig                      |   12 +-
 arch/ia64/include/asm/switch_to.h      |    8 -
 arch/ia64/kernel/time.c                |    4 +-
 arch/powerpc/include/asm/time.h        |    6 -
 arch/powerpc/kernel/process.c          |    3 -
 arch/powerpc/kernel/time.c             |    6 +
 arch/powerpc/platforms/Kconfig.cputype |   16 +-
 arch/s390/Kconfig                      |    5 +-
 arch/s390/include/asm/switch_to.h      |    4 -
 arch/s390/kernel/vtime.c               |    4 +-
 include/linux/kernel_stat.h            |    6 +
 init/Kconfig                           |   13 +
 kernel/sched/Makefile                  |    2 +-
 kernel/sched/core.c                    |  558 +---
 kernel/sched/cputime.c                 |  503
 kernel/sched/sched.h                   |   63
 17 files changed, 606 insertions(+), 610 deletions(-)
 create mode 100644 kernel/sched/cputime.c

-- 
1.7.5.4
[PATCH 4/4] s390: Remove leftover account_tick_vtime() header
The function doesn't seem to exist anymore.

Signed-off-by: Frederic Weisbecker
Acked-by: Martin Schwidefsky
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Heiko Carstens
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Peter Zijlstra
---
 arch/s390/include/asm/switch_to.h |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/s390/include/asm/switch_to.h b/arch/s390/include/asm/switch_to.h
index e7f9b3d..314cc94 100644
--- a/arch/s390/include/asm/switch_to.h
+++ b/arch/s390/include/asm/switch_to.h
@@ -89,8 +89,6 @@ static inline void restore_access_regs(unsigned int *acrs)
 	prev = __switch_to(prev,next);					\
 } while (0)
 
-extern void account_tick_vtime(struct task_struct *);
-
 #define finish_arch_switch(prev) do {					\
 	set_fs(current->thread.mm_segment);				\
 } while (0)
-- 
1.7.5.4
[PATCH 2/4] sched: Move cputime code to its own file
Extract cputime code from the giant sched/core.c and put it in its own file. This make it easier to deal with this particular area and de-bloat a bit more core.c Signed-off-by: Frederic Weisbecker Acked-by: Martin Schwidefsky Cc: Tony Luck Cc: Fenghua Yu Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Heiko Carstens Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Peter Zijlstra --- kernel/sched/Makefile |2 +- kernel/sched/core.c| 557 +--- kernel/sched/cputime.c | 503 +++ kernel/sched/sched.h | 63 ++ 4 files changed, 569 insertions(+), 556 deletions(-) create mode 100644 kernel/sched/cputime.c diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile index 173ea52..f06d249 100644 --- a/kernel/sched/Makefile +++ b/kernel/sched/Makefile @@ -11,7 +11,7 @@ ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y) CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer endif -obj-y += core.o clock.o idle_task.o fair.o rt.o stop_task.o +obj-y += core.o clock.o cputime.o idle_task.o fair.o rt.o stop_task.o obj-$(CONFIG_SMP) += cpupri.o obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o obj-$(CONFIG_SCHEDSTATS) += stats.o diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 4376c9f..ae3bcaa 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -740,126 +740,6 @@ void deactivate_task(struct rq *rq, struct task_struct *p, int flags) dequeue_task(rq, p, flags); } -#ifdef CONFIG_IRQ_TIME_ACCOUNTING - -/* - * There are no locks covering percpu hardirq/softirq time. - * They are only modified in account_system_vtime, on corresponding CPU - * with interrupts disabled. So, writes are safe. - * They are read and saved off onto struct rq in update_rq_clock(). - * This may result in other CPU reading this CPU's irq time and can - * race with irq/account_system_vtime on this CPU. We would either get old - * or new value with a side effect of accounting a slice of irq time to wrong - * task when irq is in progress while we read rq->clock. 
That is a worthy - * compromise in place of having locks on each irq in account_system_time. - */ -static DEFINE_PER_CPU(u64, cpu_hardirq_time); -static DEFINE_PER_CPU(u64, cpu_softirq_time); - -static DEFINE_PER_CPU(u64, irq_start_time); -static int sched_clock_irqtime; - -void enable_sched_clock_irqtime(void) -{ - sched_clock_irqtime = 1; -} - -void disable_sched_clock_irqtime(void) -{ - sched_clock_irqtime = 0; -} - -#ifndef CONFIG_64BIT -static DEFINE_PER_CPU(seqcount_t, irq_time_seq); - -static inline void irq_time_write_begin(void) -{ - __this_cpu_inc(irq_time_seq.sequence); - smp_wmb(); -} - -static inline void irq_time_write_end(void) -{ - smp_wmb(); - __this_cpu_inc(irq_time_seq.sequence); -} - -static inline u64 irq_time_read(int cpu) -{ - u64 irq_time; - unsigned seq; - - do { - seq = read_seqcount_begin(&per_cpu(irq_time_seq, cpu)); - irq_time = per_cpu(cpu_softirq_time, cpu) + - per_cpu(cpu_hardirq_time, cpu); - } while (read_seqcount_retry(&per_cpu(irq_time_seq, cpu), seq)); - - return irq_time; -} -#else /* CONFIG_64BIT */ -static inline void irq_time_write_begin(void) -{ -} - -static inline void irq_time_write_end(void) -{ -} - -static inline u64 irq_time_read(int cpu) -{ - return per_cpu(cpu_softirq_time, cpu) + per_cpu(cpu_hardirq_time, cpu); -} -#endif /* CONFIG_64BIT */ - -/* - * Called before incrementing preempt_count on {soft,}irq_enter - * and before decrementing preempt_count on {soft,}irq_exit. - */ -void account_system_vtime(struct task_struct *curr) -{ - unsigned long flags; - s64 delta; - int cpu; - - if (!sched_clock_irqtime) - return; - - local_irq_save(flags); - - cpu = smp_processor_id(); - delta = sched_clock_cpu(cpu) - __this_cpu_read(irq_start_time); - __this_cpu_add(irq_start_time, delta); - - irq_time_write_begin(); - /* -* We do not account for softirq time from ksoftirqd here. 
-* We want to continue accounting softirq time to ksoftirqd thread -* in that case, so as not to confuse scheduler with a special task -* that do not consume any time, but still wants to run. -*/ - if (hardirq_count()) - __this_cpu_add(cpu_hardirq_time, delta); - else if (in_serving_softirq() && curr != this_cpu_ksoftirqd()) - __this_cpu_add(cpu_softirq_time, delta); - - irq_time_write_end(); - local_irq_restore(flags); -} -EXPORT_SYMBOL_GPL(account_system_vtime); - -#endif /* CONFIG_IRQ_TIME_ACCOUNTING */ - -#ifdef CONFIG_PARAVIRT -static inline u64 steal_ticks(u64 steal) -{ - if (unlikely(steal > NSEC_PER_SEC)) - return div_u64(steal, TICK_NSEC); - - return __iter_div_u64_rem(steal, TICK_NSEC, &st
[PATCH 3/4] cputime: Consolidate vtime handling on context switch
The archs that implement virtual cputime accounting all flush the cputime of a task when it gets descheduled and sometimes set up some ground initialization for the next task to account its cputime. These archs all put their own hooks in their context switch callbacks and handle the off-case themselves. Consolidate this by creating a new account_switch_vtime() callback called in generic code right after a context switch and that these archs must implement to flush the prev task cputime and initialize the next task cputime related state. Signed-off-by: Frederic Weisbecker Acked-by: Martin Schwidefsky Cc: Tony Luck Cc: Fenghua Yu Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Heiko Carstens Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Peter Zijlstra --- arch/ia64/include/asm/switch_to.h |8 arch/ia64/kernel/time.c |4 ++-- arch/powerpc/include/asm/time.h |6 -- arch/powerpc/kernel/process.c |3 --- arch/powerpc/kernel/time.c|6 ++ arch/s390/include/asm/switch_to.h |2 -- arch/s390/kernel/vtime.c |4 ++-- include/linux/kernel_stat.h |6 ++ kernel/sched/core.c |1 + 9 files changed, 17 insertions(+), 23 deletions(-) diff --git a/arch/ia64/include/asm/switch_to.h b/arch/ia64/include/asm/switch_to.h index cb2412f..d38c7ea 100644 --- a/arch/ia64/include/asm/switch_to.h +++ b/arch/ia64/include/asm/switch_to.h @@ -30,13 +30,6 @@ extern struct task_struct *ia64_switch_to (void *next_task); extern void ia64_save_extra (struct task_struct *task); extern void ia64_load_extra (struct task_struct *task); -#ifdef CONFIG_VIRT_CPU_ACCOUNTING -extern void ia64_account_on_switch (struct task_struct *prev, struct task_struct *next); -# define IA64_ACCOUNT_ON_SWITCH(p,n) ia64_account_on_switch(p,n) -#else -# define IA64_ACCOUNT_ON_SWITCH(p,n) -#endif - #ifdef CONFIG_PERFMON DECLARE_PER_CPU(unsigned long, pfm_syst_info); # define PERFMON_IS_SYSWIDE() (__get_cpu_var(pfm_syst_info) & 0x1) @@ -49,7 +42,6 @@ extern void ia64_account_on_switch (struct task_struct *prev, struct task_struct || 
PERFMON_IS_SYSWIDE()) #define __switch_to(prev,next,last) do { \ - IA64_ACCOUNT_ON_SWITCH(prev, next); \ if (IA64_HAS_EXTRA_STATE(prev)) \ ia64_save_extra(prev); \ if (IA64_HAS_EXTRA_STATE(next)) \ diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c index ecc904b..6247197 100644 --- a/arch/ia64/kernel/time.c +++ b/arch/ia64/kernel/time.c @@ -88,10 +88,10 @@ extern cputime_t cycle_to_cputime(u64 cyc); * accumulated times to the current process, and to prepare accounting on * the next process. */ -void ia64_account_on_switch(struct task_struct *prev, struct task_struct *next) +void account_switch_vtime(struct task_struct *prev) { struct thread_info *pi = task_thread_info(prev); - struct thread_info *ni = task_thread_info(next); + struct thread_info *ni = task_thread_info(current); cputime_t delta_stime, delta_utime; __u64 now; diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h index 3b4b4a8..c1f2676 100644 --- a/arch/powerpc/include/asm/time.h +++ b/arch/powerpc/include/asm/time.h @@ -197,12 +197,6 @@ struct cpu_usage { DECLARE_PER_CPU(struct cpu_usage, cpu_usage_array); -#if defined(CONFIG_VIRT_CPU_ACCOUNTING) -#define account_process_vtime(tsk) account_process_tick(tsk, 0) -#else -#define account_process_vtime(tsk) do { } while (0) -#endif - extern void secondary_cpu_time_init(void); DECLARE_PER_CPU(u64, decrementers_next_tb); diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 710f400..d73fa99 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -514,9 +514,6 @@ struct task_struct *__switch_to(struct task_struct *prev, local_irq_save(flags); - account_system_vtime(current); - account_process_vtime(current); - /* * We can't take a PMU exception inside _switch() since there is a * window where the kernel stack SLB and the kernel stack are out diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c index be171ee..49da7f0 100644 --- 
a/arch/powerpc/kernel/time.c +++ b/arch/powerpc/kernel/time.c @@ -366,6 +366,12 @@ void account_process_tick(struct task_struct *tsk, int user_tick) account_user_time(tsk, utime, utimescaled); } +void account_switch_vtime(struct task_struct *prev) +{ + account_system_vtime(prev); + account_process_tick(prev, 0); +} + #else /* ! CONFIG_VIRT_CPU_ACCOUNTING */ #define calc_cputime_factors() #e
Re: powerpc/perf: hw breakpoints return ENOSPC
On Thu, Aug 16, 2012 at 02:23:54PM +1000, Michael Neuling wrote: > Hi, > > I've been trying to get hardware breakpoints with perf to work on POWER7 > but I'm getting the following: > > % perf record -e mem:0x1000 true > > Error: sys_perf_event_open() syscall returned with 28 (No space left on > device). /bin/dmesg may provide additional information. > > Fatal: No CONFIG_PERF_EVENTS=y kernel support configured? > > true: Terminated > > (FWIW adding -a and it works fine) > > Debugging it seems that __reserve_bp_slot() is returning ENOSPC because > it thinks there are no free breakpoint slots on this CPU. > > I have a 2 CPUs, so perf userspace is doing two perf_event_open syscalls > to add a counter to each CPU [1]. The first syscall succeeds but the > second is failing. > > On this second syscall, fetch_bp_busy_slots() sets slots.pinned to be 1, > despite there being no breakpoint on this CPU. This is because the call > the task_bp_pinned, checks all CPUs, rather than just the current CPU. > POWER7 only has one hardware breakpoint per CPU (ie. HBP_NUM=1), so we > return ENOSPC. > > The following patch fixes this by checking the associated CPU for each > breakpoint in task_bp_pinned. I'm not familiar with this code, so it's > provided as a reference to the above issue. > > Mikey > > 1. not sure why it doesn't just do one syscall and specify all CPUs, but > that's another issue. Using two syscalls should work. This patch seems to make sense. I'll try it and run some tests. Can I have your Signed-off-by ? Thanks. > > diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c > index bb38c4d..e092daa 100644 > --- a/kernel/events/hw_breakpoint.c > +++ b/kernel/events/hw_breakpoint.c > @@ -111,14 +111,16 @@ static unsigned int max_task_bp_pinned(int cpu, enum > bp_type_idx type) > * Count the number of breakpoints of the same type and same task. > * The given event must be not on the list. 
> */ > -static int task_bp_pinned(struct perf_event *bp, enum bp_type_idx type) > +static int task_bp_pinned(int cpu, struct perf_event *bp, enum bp_type_idx > type) > { > struct task_struct *tsk = bp->hw.bp_target; > struct perf_event *iter; > int count = 0; > > list_for_each_entry(iter, &bp_task_head, hw.bp_list) { > - if (iter->hw.bp_target == tsk && find_slot_idx(iter) == type) > + if (iter->hw.bp_target == tsk && > + find_slot_idx(iter) == type && > + cpu == iter->cpu) > count += hw_breakpoint_weight(iter); > } > > @@ -141,7 +143,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct > perf_event *bp, > if (!tsk) > slots->pinned += max_task_bp_pinned(cpu, type); > else > - slots->pinned += task_bp_pinned(bp, type); > + slots->pinned += task_bp_pinned(cpu, bp, type); > slots->flexible = per_cpu(nr_bp_flexible[type], cpu); > > return; > @@ -154,7 +156,7 @@ fetch_bp_busy_slots(struct bp_busy_slots *slots, struct > perf_event *bp, > if (!tsk) > nr += max_task_bp_pinned(cpu, type); > else > - nr += task_bp_pinned(bp, type); > + nr += task_bp_pinned(cpu, bp, type); > > if (nr > slots->pinned) > slots->pinned = nr; > @@ -188,7 +190,7 @@ static void toggle_bp_task_slot(struct perf_event *bp, > int cpu, bool enable, > int old_idx = 0; > int idx = 0; > > - old_count = task_bp_pinned(bp, type); > + old_count = task_bp_pinned(cpu, bp, type); > old_idx = old_count - 1; > idx = old_idx + weight; > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
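The accounting problem described in the report can be reproduced with a small userspace model. Everything in it is hypothetical scaffolding (`struct bp`, `count_pinned_all_cpus`, `count_pinned_on_cpu`, `slot_available`, `SLOTS_PER_CPU`) and it assumes every breakpoint has weight 1; it only mirrors the shape of the quoted change to `task_bp_pinned()`, not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS_PER_CPU 1   /* models HBP_NUM=1 from the report */

struct bp {
	int task;  /* owning task id */
	int cpu;   /* CPU the breakpoint event is bound to */
};

/* Behaviour before the change: count every breakpoint of the task, whatever its CPU. */
int count_pinned_all_cpus(const struct bp *bps, size_t n, int task)
{
	int count = 0;

	for (size_t i = 0; i < n; i++)
		if (bps[i].task == task)
			count++;
	return count;
}

/* Behaviour after the change: only count breakpoints bound to the given CPU. */
int count_pinned_on_cpu(const struct bp *bps, size_t n, int task, int cpu)
{
	int count = 0;

	for (size_t i = 0; i < n; i++)
		if (bps[i].task == task && bps[i].cpu == cpu)
			count++;
	return count;
}

/* A new weight-1 breakpoint fits if the slots already in use leave room for it. */
int slot_available(int pinned)
{
	assert(pinned >= 0);
	return pinned + 1 <= SLOTS_PER_CPU;
}
```

With one breakpoint already installed for the task on CPU 0, the all-CPU count reports no room for a second breakpoint on CPU 1, which corresponds to the ENOSPC in the report, while the per-CPU count still finds a free slot there.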
Status of adaptive tickless patchset as of August 2012
Hi,

I started working on the adaptive nohz patchset by the end of 2010. Since then,
I iterated through one big branch:

- Nohz tasks (https://lwn.net/Articles/420490/)
- Nohz cpusets (https://lwn.net/Articles/455044/)
- Nohz cpusets v2 (https://lwn.net/Articles/487599/)
- Nohz cpusets v3 (https://lwn.net/Articles/495422/)

It quickly grew to more than 40 patches. And still the full support (ie: handling
everything that the tick maintains, but without the tick) wasn't finished.
And the more I progressed toward this full support, the more patches I had
to maintain, rebase, improve, etc...

Some side effects kept growing:

- I had deep reviews about the core overall design in the first iterations. Thanks
  to that I made nice progress. But as the patchset grew, I got fewer reviews about
  the overall design and more about details. And I can totally understand that. A huge
  pile of patches certainly doesn't encourage reviews.

- Lacking reviews on the overall design, I was feeling more and more uncomfortable
  about whatever I was improving or whichever feature I was adding on top of the
  existing ones. And I was indeed digging in the wrong direction for some parts.

- I was spending too much time on out-of-tree maintenance while my goal is to get
  this upstream.

All in all, this big branch scaled neither in terms of reviews nor development.

So I decided, after Ingo proposed that I set up a tree in -tip, to split up all of
the things the tick is handling, isolate each of these into a separate topic and
handle them individually or at least iteratively, trying to push things upstream
or into a staging tree in -tip piecewise. As long as this is carried by the concerned
maintainers and I can get their insights on a regular basis. And also as long as we
can iterate toward some central branch because, even if we can cut things into
individual topics, there are significant interdependencies.
I think this has been successful so far:

- The detection of illegal RCU read side critical sections under RCU extended
  quiescent state is now upstream. This even helped find a lot of bugs upstream.

- State of user as RCU extended quiescent state is currently pending in Paul's tree
  in the rcu/idle branch. It's also in linux-next. This will likely go upstream or
  into a staging branch in -tip for the next merge window.

- Some preparatory work to split nohz and idle logic in the nohz API. It went
  upstream in the last merge window.

- Proposed something to handle nohz cputime accounting:
  https://lwn.net/Articles/501766/
  Got fundamental reviews that pointed me to rather reuse virtual based cputime
  accounting.

- Consolidated/cleaned up virtual based cputime accounting (last version is
  https://lkml.org/lkml/2012/8/17/326 and waits for inclusion in -tip or so.)

- On top of that vtime consolidation and the RCU pending patches, proposed a generic
  virtual cputime accounting for archs that don't have CONFIG_VIRT_CPU_ACCOUNTING.
  See http://comments.gmane.org/gmane.linux.kernel/1337690
  A tickless CPU can then account cputime with that.

So the process seems to be heading in a better direction now. Summer holidays have
naturally made it a bit smoother and the rhythm will probably stay that way until
the end of ksummit/linuxcon/LPC. But I have the feeling we are moving forward.

No schedule plans, but once I get the above topics sorted out, I'll probably work
on timekeeping handling in adaptive tickless CPUs. And then the rest...

I'll still keep maintaining the big branch in my tree. But this is now going to be
rather a big draft or laboratory, with regular rebases on what is merged upstream
or in maintainers' trees. It helps me to keep a practical view of the big picture.

Thanks.
Re: [PATCH 0/4] cputime: Virtual cputime accounting small cleanups and consolidation v3
On Mon, Aug 20, 2012 at 10:40:12AM +0200, Ingo Molnar wrote:
>
> * Frederic Weisbecker wrote:
>
> > Hi,
> >
> > In this v3:
> >
> > - Rebase against latest tip:sched/core
> > - Added acks from Martin
> > - Refined help text for the consolidated CONFIG_VIRT_CPU_ACCOUNTING option
> >   in the 1st patch.
> >
> > You can pull from:
> >
> > git://github.com/fweisbec/linux-dynticks.git
> > 	virt-cputime-v3
> >
> > Thanks.
> >
> > Frederic Weisbecker (4):
> >   cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
> >   sched: Move cputime code to its own file
> >   cputime: Consolidate vtime handling on context switch
> >   s390: Remove leftover account_tick_vtime() header
> >
> >  arch/Kconfig                           |    3 +
> >  arch/ia64/Kconfig                      |   12 +-
> >  arch/ia64/include/asm/switch_to.h      |    8 -
> >  arch/ia64/kernel/time.c                |    4 +-
> >  arch/powerpc/include/asm/time.h        |    6 -
> >  arch/powerpc/kernel/process.c          |    3 -
> >  arch/powerpc/kernel/time.c             |    6 +
> >  arch/powerpc/platforms/Kconfig.cputype |   16 +-
> >  arch/s390/Kconfig                      |    5 +-
> >  arch/s390/include/asm/switch_to.h      |    4 -
> >  arch/s390/kernel/vtime.c               |    4 +-
> >  include/linux/kernel_stat.h            |    6 +
> >  init/Kconfig                           |   13 +
> >  kernel/sched/Makefile                  |    2 +-
> >  kernel/sched/core.c                    |  558 +---
> >  kernel/sched/cputime.c                 |  503
> >  kernel/sched/sched.h                   |   63
> >  17 files changed, 606 insertions(+), 610 deletions(-)
> >  create mode 100644 kernel/sched/cputime.c
>
> Hm, I'm getting build failures on x86:
>
>  kernel/sched/cputime.c:60:2: error: implicit declaration of function ‘irq_time_write_begin’ [-Werror=implicit-function-declaration]
>  kernel/sched/cputime.c:72:2: error: implicit declaration of function ‘irq_time_write_end’ [-Werror=implicit-function-declaration]
>  kernel/sched/cputime.c:274:2: error: implicit declaration of function ‘static_key_false’ [-Werror=implicit-function-declaration]
>
> Config attached.
>
> Thanks,
>
> 	Ingo

Oops, sorry. Some misplaced #endif and a missing header inclusion.
So I just fixed that in the branch:

git://github.com/fweisbec/linux-dynticks.git
	virt-cputime-v4

The diff against the previous set is:

$ git diff virt-cputime-v3..virt-cputime-v4
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index ea9b4b6..372692b 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -2,6 +2,7 @@
 #include
 #include
 #include
+#include
 
 #include "sched.h"
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 62f9850..804c2e5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1169,6 +1169,7 @@ enum rq_nohz_flag_bits {
 };
 
 #define nohz_flags(cpu)	(&cpu_rq(cpu)->nohz_flags)
+#endif
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 
@@ -1219,4 +1220,3 @@ static inline u64 irq_time_read(int cpu)
 #endif /* CONFIG_64BIT */
 
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
-#endif
Re: [PATCH 2/6] rcu: Allow rcu_user_enter()/exit() to nest
On Fri, Jul 06, 2012 at 09:27:09AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 06, 2012 at 02:00:14PM +0200, Frederic Weisbecker wrote:
> > Allow calls to rcu_user_enter() even if we are already
> > in userspace (as seen by RCU) and allow calls to rcu_user_exit()
> > even if we are already in the kernel.
> >
> > This makes the APIs more flexible to be called from architectures.
> > Exception entries for example won't need to know if they come from
> > userspace before calling rcu_user_exit().
>
> You lost me on this one. As long as the nesting level stays below
> a few tens, rcu_user_enter() and rcu_user_exit() already can nest.
>
> Or are you saying that you need to deal with duplicate rcu_user_enter()
> calls that must be matched by a single rcu_user_exit() call?

Yep, we can have that kind of thing:

    in_user = 1

    syscall
        rcu_user_exit()    // in_user = 0
        exception
            rcu_user_exit()
        end of exception
    end of syscall
    rcu_user_enter()

This is because when we enter an exception, we don't have a different
entry point depending on whether we trapped/faulted in userspace or in
kernelspace. So it's hard to know if we were in userspace before the
exception triggered. To avoid complication in architecture code, I'm
using this kind of "in_user" state.
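For readers following the thread, the "in_user" idea can be sketched as a small user-space toy model in C. This is only an illustration under stated assumptions, not the code from the patch: struct toy_rcu_state and every toy_* function are hypothetical names made up here, and the counters exist only to show that the redundant call does nothing.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not kernel code: every name prefixed with toy_ is
 * hypothetical and exists only for this sketch.
 */
struct toy_rcu_state {
	bool in_user;	/* does RCU consider this CPU to be in userspace? */
	int enters;	/* real transitions into the extended quiescent state */
	int exits;	/* real transitions out of it */
};

/* Safe to call even if RCU already sees us in userspace. */
static void toy_rcu_user_enter(struct toy_rcu_state *s)
{
	if (s->in_user)
		return;
	s->in_user = true;
	s->enters++;
}

/* Safe to call even if RCU already sees us in the kernel. */
static void toy_rcu_user_exit(struct toy_rcu_state *s)
{
	if (!s->in_user)
		return;
	s->in_user = false;
	s->exits++;
}

/* Replay the syscall + nested exception sequence from the mail. */
static struct toy_rcu_state toy_replay(void)
{
	struct toy_rcu_state s = { .in_user = true };

	toy_rcu_user_exit(&s);	/* syscall entry: in_user = 0 */
	toy_rcu_user_exit(&s);	/* exception inside the syscall: no-op */
	assert(!s.in_user);	/* RCU still sees us in the kernel */
	toy_rcu_user_enter(&s);	/* end of syscall, back to userspace */
	return s;
}
```

In this model the exception entry does not need to know where it came from: the second toy_rcu_user_exit() is simply ignored, so one toy_rcu_user_enter() at the end is enough to restore the state.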
Re: [PATCH 1/6] rcu: Settle config for userspace extended quiescent state
On Fri, Jul 06, 2012 at 09:31:29AM -0700, Paul E. McKenney wrote: > On Fri, Jul 06, 2012 at 02:00:13PM +0200, Frederic Weisbecker wrote: > > Create a new config option under the RCU menu that put > > CPUs under RCU extended quiescent state (as in dynticks > > idle mode) when they run in userspace. This require > > some contribution from architectures to hook into kernel > > and userspace boundaries. > > > > Signed-off-by: Frederic Weisbecker > > Cc: Alessio Igor Bogani > > Cc: Andrew Morton > > Cc: Avi Kivity > > Cc: Chris Metcalf > > Cc: Christoph Lameter > > Cc: Geoff Levand > > Cc: Gilad Ben Yossef > > Cc: Hakan Akkan > > Cc: H. Peter Anvin > > Cc: Ingo Molnar > > Cc: Josh Triplett > > Cc: Kevin Hilman > > Cc: Max Krasnyansky > > Cc: Peter Zijlstra > > Cc: Stephen Hemminger > > Cc: Steven Rostedt > > Cc: Sven-Thorsten Dietrich > > Cc: Thomas Gleixner > > --- > > arch/Kconfig | 13 + > > init/Kconfig | 10 ++ > > kernel/rcutree.c |4 > > 3 files changed, 27 insertions(+), 0 deletions(-) > > > > diff --git a/arch/Kconfig b/arch/Kconfig > > index 8c3d957..c2e0ce4 100644 > > --- a/arch/Kconfig > > +++ b/arch/Kconfig > > @@ -274,4 +274,17 @@ config SECCOMP_FILTER > > > > See Documentation/prctl/seccomp_filter.txt for details. > > > > +config HAVE_RCU_USER_QS > > + bool > > + help > > + Provide kernel entry/exit hooks necessary for userspace > > + RCU extended quiescent state. Syscalls and exceptions > > + low level handlers must be wrapped with a call to rcu_user_exit() > > + on entry and rcu_user_enter() before resuming userspace. Irqs > > + entry don't need to call rcu_user_exit() because their high level > > + handlers are protected inside rcu_irq_enter/rcu_irq_exit() but > > + preemption or signal handling on irq exit still need to be protected > > + with a call to rcu_user_exit(). rcu_user_enter() must then be > > + called back on irq exit when the preempted task is back on the CPU. 
> > +
> >  source "kernel/gcov/Kconfig"
> > diff --git a/init/Kconfig b/init/Kconfig
> > index d07dcf9..3a4af8f 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -441,6 +441,16 @@ config PREEMPT_RCU
> >  	  This option enables preemptible-RCU code that is common between
> >  	  the TREE_PREEMPT_RCU and TINY_PREEMPT_RCU implementations.
> >
> > +config RCU_USER_QS
> > +	bool "Consider userspace as in RCU extended quiescent state"
> > +	depends on HAVE_RCU_USER_QS && SMP
>
> OK, I'll bite... Why the "SMP"? RCU could make good use of knowing
> about user-mode executing even in UP kernels.

Because Tiny RCU doesn't implement rcu_user_enter()/rcu_user_exit() yet,
and it doesn't need them for now. To better express the constraint I
should probably have used:

	depends on TREE_RCU || TREE_PREEMPT_RCU
Re: [PATCH 6/6] x86: Exit RCU extended QS on notify resume
On Fri, Jul 06, 2012 at 09:33:38AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 06, 2012 at 02:00:18PM +0200, Frederic Weisbecker wrote:
> > do_notify_resume() may be called on irq exit but it won't
> > be protected between rcu_irq_enter() and rcu_irq_exit()
> > and we don't call rcu_user_exit() on irq entry (unlike
> > syscalls/exceptions entry).
> >
> > Since it can use RCU read side critical section, we must call
> > rcu_user_exit() before doing anything there.
> >
> > This complete support for RCU userspace extended quiescent state
> > in x86.
> >
> > Signed-off-by: Frederic Weisbecker
> > Cc: Alessio Igor Bogani
> > Cc: Andrew Morton
> > Cc: Avi Kivity
> > Cc: Chris Metcalf
> > Cc: Christoph Lameter
> > Cc: Geoff Levand
> > Cc: Gilad Ben Yossef
> > Cc: Hakan Akkan
> > Cc: H. Peter Anvin
> > Cc: Ingo Molnar
> > Cc: Josh Triplett
> > Cc: Kevin Hilman
> > Cc: Max Krasnyansky
> > Cc: Peter Zijlstra
> > Cc: Stephen Hemminger
> > Cc: Steven Rostedt
> > Cc: Sven-Thorsten Dietrich
> > Cc: Thomas Gleixner
> > ---
> >  arch/x86/Kconfig         |    1 +
> >  arch/x86/kernel/signal.c |    2 ++
> >  2 files changed, 3 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index c70684f..38dfcc2 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -95,6 +95,7 @@ config X86
> >  	select KTIME_SCALAR if X86_32
> >  	select GENERIC_STRNCPY_FROM_USER
> >  	select GENERIC_STRNLEN_USER
> > +	select HAVE_RCU_USER_QS if X86_64
>
> And I will bite yet again. Why only 64-bit kernels?
>
> 							Thanx, Paul

Because I don't want to spend time on implementing it the same way
on 32 in case people disagree with the whole design :)
Re: [PATCH 2/6] rcu: Allow rcu_user_enter()/exit() to nest
On Sun, Jul 08, 2012 at 06:54:18PM +0300, Avi Kivity wrote:
> On 07/06/2012 03:00 PM, Frederic Weisbecker wrote:
> > Allow calls to rcu_user_enter() even if we are already
> > in userspace (as seen by RCU) and allow calls to rcu_user_exit()
> > even if we are already in the kernel.
> >
> > This makes the APIs more flexible to be called from architectures.
> > Exception entries for example won't need to know if they come from
> > userspace before calling rcu_user_exit().
>
> I guess I should switch kvm to rcu_user_enter() and co, so we can
> disable the tick while running in a guest. But where are those
> functions? What are the rules for calling them?

I guess we need to have a closer look at the guest case first. We
probably need to take some care about specifics in time and load
accounting usually handled by the tick before we can shut it down.
RCU is only part of the problem.
Re: [PATCH 0/4] cputime: Virtual cputime accounting small cleanups and consolidation
On Tue, Jun 19, 2012 at 03:43:07PM +0200, Frederic Weisbecker wrote:
> Not sure to which tree this should go. The scheduler one may be.
> Anyway if you're fine with it, it is pullable at:
>
> git://github.com/fweisbec/linux-dynticks.git
> 	virt-cputime

Ping. Are you guys fine with the patchset?

>
> This is only built tested on the relevant archs.
>
> I wish we could do more vtime cputime accounting consolidation
> but archs do the things pretty differently although I bet the
> behaviour could be more unified.
>
>
> Frederic Weisbecker (4):
>   cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
>   sched: Move cputime code to its own file
>   cputime: Consolidate vtime handling on context switch
>   s390: Remove leftover account_tick_vtime() header
>
>  arch/Kconfig                           |    3 +
>  arch/ia64/Kconfig                      |   12 +-
>  arch/ia64/include/asm/switch_to.h      |    8 -
>  arch/ia64/kernel/time.c                |    4 +-
>  arch/powerpc/include/asm/time.h        |    6 -
>  arch/powerpc/kernel/process.c          |    3 -
>  arch/powerpc/kernel/time.c             |    6 +
>  arch/powerpc/platforms/Kconfig.cputype |   16 +-
>  arch/s390/Kconfig                      |    5 +-
>  arch/s390/include/asm/switch_to.h      |    4 -
>  arch/s390/kernel/vtime.c               |    4 +-
>  include/linux/kernel_stat.h            |    6 +
>  init/Kconfig                           |   13 +
>  kernel/sched/Makefile                  |    2 +-
>  kernel/sched/core.c                    |  552 +---
>  kernel/sched/cputime.c                 |  497
>  kernel/sched/sched.h                   |   63
>  17 files changed, 600 insertions(+), 604 deletions(-)
>  create mode 100644 kernel/sched/cputime.c
>
> --
> 1.7.5.4
>
Re: [PATCH 6/6] x86: Exit RCU extended QS on notify resume
On Sun, Jul 08, 2012 at 02:17:07PM -0700, Paul E. McKenney wrote:
> On Fri, Jul 06, 2012 at 01:43:29PM -0700, Josh Triplett wrote:
> > On Fri, Jul 06, 2012 at 09:33:38AM -0700, Paul E. McKenney wrote:
> > > On Fri, Jul 06, 2012 at 02:00:18PM +0200, Frederic Weisbecker wrote:
> > > > --- a/arch/x86/Kconfig
> > > > +++ b/arch/x86/Kconfig
> > > > @@ -95,6 +95,7 @@ config X86
> > > >  	select KTIME_SCALAR if X86_32
> > > >  	select GENERIC_STRNCPY_FROM_USER
> > > >  	select GENERIC_STRNLEN_USER
> > > > +	select HAVE_RCU_USER_QS if X86_64
> > >
> > > And I will bite yet again. Why only 64-bit kernels?
> >
> > Because HAVE_RCU_USER_QS requires an architecture-specific component,
> > and this patch series only added the necessary bits to entry_64.S.
>
> OK, please allow me to rephrase the question. Why only entry_64.S? ;-)

So like I said, I prefer to wait for reviews and general opinion before
pushing further.
Re: [PATCH] trace: add ability to set a target task for events (v2)
On Wed, Jul 11, 2012 at 06:14:58PM +0400, Andrew Vagin wrote:
> A few events are interesting not only for a current task.
> For example, sched_stat_* are interesting for a task, which
> wake up. For this reason, it will be good, if such events will
> be delivered to a target task too.
>
> Now a target task can be set by using __perf_task().
>
> The original idea and a draft patch belongs to Peter Zijlstra.
>
> I need this events for profiling sleep times. sched_switch is used for
> getting callchains and sched_stat_* is used for getting time periods.
> This events are combined in user space, then it can be analized by
> perf tools.

We've talked about that numerous times. But I still don't really
understand why you're not using sched switch events and computing
the difference between schedule in and schedule out.

I think you said that's because you got too many events with sched
switch. Are you losing events? Otherwise I don't see why it's
a problem.

Also the sched_stat_sleep event produces an event whose period equals
the time slept. Internally, perf splits this into as many events as that
period because the requested period for trace events is 1 by default.
We should probably allow sending events with a higher period than the
one requested. This sometimes produces a huge pile of events, and that
often results in tons of lost events. We definitely need to fix that.

In the meantime you'll certainly get saner results by just recording
sched switch events.
Re: [PATCH] trace: add ability to set a target task for events (v2)
On Wed, Jul 11, 2012 at 04:33:41PM +0200, Peter Zijlstra wrote: > On Wed, 2012-07-11 at 16:31 +0200, Frederic Weisbecker wrote: > > On Wed, Jul 11, 2012 at 06:14:58PM +0400, Andrew Vagin wrote: > > > A few events are interesting not only for a current task. > > > For example, sched_stat_* are interesting for a task, which > > > wake up. For this reason, it will be good, if such events will > > > be delivered to a target task too. > > > > > > Now a target task can be set by using __perf_task(). > > > > > > The original idea and a draft patch belongs to Peter Zijlstra. > > > > > > I need this events for profiling sleep times. sched_switch is used for > > > getting callchains and sched_stat_* is used for getting time periods. > > > This events are combined in user space, then it can be analized by > > > perf tools. > > > > We've talked about that numerous times. But I still don't really > > understand why you're not using sched switch events and compute > > the difference between schedule in and schedule out. > > > > I think you said that's because you got too much events with sched > > switch. Are you loosing events? Otherwise I don't see why it's > > a problem. > > > > Also the sched_stat_sleep event produce an event which period equals the > > time slept. Internally, perf split this into as many events as that period > > because the requested period for trace events is 1 by default. We probably > > should allow to send events with a higher number than the one requested. > > This > > this produce sometimes a huge pile of events, and that even often result in > > tons of lost events. We definetly need to fix that. > > > > In the meantime you'll certainly get saner results by just recording > > sched switch events. > > Not really, there's an arbitrary large delay between wakeup and getting > scheduled back in, which is unrelated to the cause that you went to > sleep. 
>
> The wants the time between going to sleep and getting woken up,
> sched_switch simply doesn't give you that.

In this case he can just record sched wakeup as well. With sched_switch
+ sched_wakeup, he'll unlikely lose events.

With sched_stat_sleep he will lose events, unless we fix this period
demux thing.
Re: [PATCH] trace: add ability to set a target task for events (v2)
On Wed, Jul 11, 2012 at 04:38:19PM +0200, Peter Zijlstra wrote:
> On Wed, 2012-07-11 at 16:36 +0200, Frederic Weisbecker wrote:
> >
> > In this case he can just record sched wakeup as well. With sched_switch
> > + sched_wakeup, he'll unlikely lose events.
> >
> > With sched_stat_sleep he will lose events, unless we fix this period
> > demux thing.
>
> But without this patch, the sched_wakeup will belong to another task, so
> if you trace task A, and B wakes you, you'll never see the wakeup.

Ah so the goal is to minimize the amount of events by only tracing task A?

Ok then. Still we need to fix these events that use __perf_count() because
wide tracing of sched_switch/wake_up still generates fewer events than
sched stat sleep.

I believe:

	perf record -e sched:sched_stat_sleep sleep 1

produces 1 billion events because we sleep 1 billion nanosecs. Or
something like that.
Re: [PATCH] trace: add ability to set a target task for events (v2)
On Wed, Jul 11, 2012 at 04:55:08PM +0200, Peter Zijlstra wrote:
> On Wed, 2012-07-11 at 16:48 +0200, Frederic Weisbecker wrote:
> > On Wed, Jul 11, 2012 at 04:38:19PM +0200, Peter Zijlstra wrote:
> > > On Wed, 2012-07-11 at 16:36 +0200, Frederic Weisbecker wrote:
> > > >
> > > > In this case he can just record sched wakeup as well. With sched_switch
> > > > + sched_wakeup, he'll unlikely lose events.
> > > >
> > > > With sched_stat_sleep he will lose events, unless we fix this period
> > > > demux thing.
> > >
> > > But without this patch, the sched_wakeup will belong to another task, so
> > > if you trace task A, and B wakes you, you'll never see the wakeup.
> >
> > Ah so the goal is to minimize the amount of events by only tracing task A?
>
> Right, or just not having sufficient privs to trace the world. And a
> wakeup of A is very much also part of A, not only the task doing the
> wakeup.
>
> Hence the proposed mechanism.

Yeah that's fair.

>
> > Ok then. Still we need to fix these events that use __perf_count() because
> > wide tracing of sched_switch/wake_up still generate less events than
> > sched stat sleep.
> >
> > I believe:
> >
> > perf record -e sched:sched_stat_sleep sleep 1
> >
> > produces 1 billion events because we sleep 1 billion nanosecs. Or
> > something like that.
>
> Right.. back when I did that the plan was to make PERF_SAMPLE_PERIOD fix
> that, of course that never seemed to have happened.
>
> With PERF_SAMPLE_PERIOD you can simply write the 1b into the period of 1
> event and be done with it.

I believe the perf tools handle variable periods of an event pretty well
on top of PERF_SAMPLE_PERIOD. We just need to tweak the maths in
perf_swevent_overflow() I think...
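The arithmetic being discussed can be written down as a tiny C sketch. This is only a toy model of the two behaviours described in the thread, not what perf_swevent_overflow() actually does: the toy_* function names are hypothetical, and the "one sample per period unit" rule is taken from the "1 billion events" estimate above, which was itself given as approximate.

```c
#include <assert.h>
#include <stdint.h>

/* Toy arithmetic only: the toy_* names are hypothetical. */

/*
 * Number of samples if an event carrying a weight of `count` is split
 * into one sample per `sample_period` units (the "demux" behaviour).
 */
static uint64_t toy_samples_when_split(uint64_t count, uint64_t sample_period)
{
	assert(sample_period > 0);
	return count / sample_period;
}

/*
 * Number of samples if the weight is instead written into the period
 * field of a single sample, as suggested with PERF_SAMPLE_PERIOD.
 */
static uint64_t toy_samples_with_period_field(uint64_t count)
{
	return count > 0 ? 1 : 0;
}
```

For a one second sleep counted in nanoseconds, toy_samples_when_split(1000000000, 1) gives one billion samples, while toy_samples_with_period_field(1000000000) gives a single sample that carries the whole period.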
Re: [PATCH] trace: add ability to set a target task for events (v2)
On Wed, Jul 11, 2012 at 05:12:04PM +0200, Peter Zijlstra wrote:
> On Wed, 2012-07-11 at 16:55 +0200, Peter Zijlstra wrote:
> > Right.. back when I did that the plan was to make PERF_SAMPLE_PERIOD fix
> > that, of course that never seemed to have happened.
> >
> > With PERF_SAMPLE_PERIOD you can simply write the 1b into the period of 1
> > event and be done with it.
>
> It did! Andrew fixed it..

Ah! Then maybe we need to force PERF_SAMPLE_PERIOD on tracepoints from
perf tools. I need to check that.
[RFC PATCH 00/11] rcu: Userspace RCU extended quiescent state v2
Hi,

There are significant changes this time. I reverted back to using
a TIF flag to hook on syscalls slow path and put the hooks on high
level exception handlers instead of low level ones.

It makes the code more portable between x86-32 and x86-64, it makes
the hooks clearer and easier to review and the overhead is lowered
in the off-case. This can be even better if we use jump labels later.

Thanks.

git://github.com/fweisbec/linux-dynticks.git
	rcu/user-2

Frederic Weisbecker (11):
  rcu: Settle config for userspace extended quiescent state
  rcu: Allow rcu_user_enter()/exit() to nest
  rcu: Ignore userspace extended quiescent state by default
  rcu: Switch task's syscall hooks on context switch
  x86: Syscall hooks for userspace RCU extended QS
  x86: Exception hooks for userspace RCU extended QS
  rcu: Exit RCU extended QS on kernel preemption after irq/exception
  rcu: Exit RCU extended QS on user preemption
  x86: Use the new schedule_user API on userspace preemption
  x86: Exit RCU extended QS on notify resume
  rcu: Userspace RCU extended QS selftest

 arch/Kconfig                       |   10 ++
 arch/um/drivers/mconsole_kern.c    |    2 +-
 arch/x86/Kconfig                   |    1 +
 arch/x86/include/asm/rcu.h         |   20 +++
 arch/x86/include/asm/thread_info.h |   10 --
 arch/x86/kernel/entry_64.S         |    8 ++--
 arch/x86/kernel/ptrace.c           |    5 +++
 arch/x86/kernel/signal.c           |    4 ++
 arch/x86/kernel/traps.c            |   30
 arch/x86/mm/fault.c                |   13 ++-
 include/linux/rcupdate.h           |   10 ++
 include/linux/sched.h              |   20 ++-
 init/Kconfig                       |   18 ++
 kernel/rcutree.c                   |   64 +++-
 kernel/rcutree.h                   |    4 ++
 kernel/sched/core.c                |   10 +-
 16 files changed, 192 insertions(+), 37 deletions(-)
 create mode 100644 arch/x86/include/asm/rcu.h

--
1.7.5.4
[PATCH 04/11] rcu: Switch task's syscall hooks on context switch
Clear the syscalls hook of a task when it's scheduled out so that if
the task migrates, it doesn't run the syscall slow path on a CPU
that might not need it.

Also set the syscalls hook on the next task if needed.

Signed-off-by: Frederic Weisbecker
Cc: Alessio Igor Bogani
Cc: Andrew Morton
Cc: Avi Kivity
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Geoff Levand
Cc: Gilad Ben Yossef
Cc: Hakan Akkan
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Josh Triplett
Cc: Kevin Hilman
Cc: Max Krasnyansky
Cc: Peter Zijlstra
Cc: Stephen Hemminger
Cc: Steven Rostedt
Cc: Sven-Thorsten Dietrich
Cc: Thomas Gleixner
---
 arch/um/drivers/mconsole_kern.c |    2 +-
 include/linux/rcupdate.h        |    2 ++
 include/linux/sched.h           |   20 +++-
 kernel/rcutree.c                |   15 +++
 kernel/sched/core.c             |    2 +-
 5 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/arch/um/drivers/mconsole_kern.c b/arch/um/drivers/mconsole_kern.c
index 88e466b..e61922d 100644
--- a/arch/um/drivers/mconsole_kern.c
+++ b/arch/um/drivers/mconsole_kern.c
@@ -705,7 +705,7 @@ static void stack_proc(void *arg)
 	struct task_struct *from = current, *to = arg;

 	to->thread.saved_task = from;
-	rcu_switch_from(from);
+	rcu_switch(from, to);
 	switch_to(from, to, from);
 }

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index a72f25e..1e57888 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -198,6 +198,8 @@ extern void rcu_user_enter(void);
 extern void rcu_user_exit(void);
 extern void rcu_user_enter_irq(void);
 extern void rcu_user_exit_irq(void);
+extern void rcu_user_hooks_switch(struct task_struct *prev,
+				  struct task_struct *next);
 #else
 static inline void rcu_user_enter(void) { }
 static inline void rcu_user_exit(void) { }
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4059c0f..e17fcd0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1871,23 +1871,25 @@ static inline void rcu_copy_process(struct task_struct *p)
 	INIT_LIST_HEAD(&p->rcu_node_entry);
 }

-static inline void rcu_switch_from(struct task_struct *prev)
-{
-	if (prev->rcu_read_lock_nesting != 0)
-		rcu_preempt_note_context_switch();
-}
-
 #else

 static inline void rcu_copy_process(struct task_struct *p)
 {
 }

-static inline void rcu_switch_from(struct task_struct *prev)
-{
-}
+#endif

+static inline void rcu_switch(struct task_struct *prev,
+			      struct task_struct *next)
+{
+#ifdef CONFIG_PREEMPT_RCU
+	if (prev->rcu_read_lock_nesting != 0)
+		rcu_preempt_note_context_switch();
+#endif
+#ifdef CONFIG_RCU_USER_QS
+	rcu_user_hooks_switch(prev, next);
 #endif
+}

 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p,
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 78b0c30..2d79308 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -720,6 +720,21 @@ int rcu_is_cpu_idle(void)
 }
 EXPORT_SYMBOL(rcu_is_cpu_idle);

+#ifdef CONFIG_RCU_USER_QS
+void rcu_user_hooks_switch(struct task_struct *prev,
+			   struct task_struct *next)
+{
+	struct rcu_dynticks *rdtp;
+
+	/* Interrupts are disabled in context switch */
+	rdtp = &__get_cpu_var(rcu_dynticks);
+	if (!rdtp->ignore_user_qs) {
+		clear_tsk_thread_flag(prev, TIF_NOHZ);
+		set_tsk_thread_flag(next, TIF_NOHZ);
+	}
+}
+#endif
+
 #if defined(CONFIG_PROVE_RCU) && defined(CONFIG_HOTPLUG_CPU)

 /*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d5594a4..fa61d8a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2081,7 +2081,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 #endif

 	/* Here we just switch the register state and the stack. */
-	rcu_switch_from(prev);
+	rcu_switch(prev, next);
 	switch_to(prev, next, prev);

 	barrier();
--
1.7.5.4
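To make the flag handoff in this patch easier to picture, here is a small user-space model in C. It is a sketch under stated assumptions, not the kernel code: struct toy_task, struct toy_cpu and the toy_* functions are hypothetical stand-ins for task_struct, the per-CPU rcu_dynticks data and rcu_user_hooks_switch(), and a plain bool stands in for the TIF_NOHZ thread flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, not kernel types: all toy_* names are hypothetical. */
struct toy_task {
	bool tif_nohz;		/* models the TIF_NOHZ thread flag */
};

struct toy_cpu {
	bool ignore_user_qs;	/* models rdtp->ignore_user_qs */
};

/*
 * On a CPU that wants the userspace extended quiescent state, the
 * outgoing task drops the syscall hook and the incoming task gets it.
 * On any other CPU the flags are left alone.
 */
static void toy_user_hooks_switch(const struct toy_cpu *cpu,
				  struct toy_task *prev,
				  struct toy_task *next)
{
	assert(cpu != NULL && prev != NULL && next != NULL);
	if (!cpu->ignore_user_qs) {
		prev->tif_nohz = false;
		next->tif_nohz = true;
	}
}

/* True if, after one switch, the hook moved from prev to next. */
static bool toy_hook_moved(bool ignore_user_qs)
{
	struct toy_cpu cpu = { .ignore_user_qs = ignore_user_qs };
	struct toy_task prev = { .tif_nohz = true };
	struct toy_task next = { .tif_nohz = false };

	toy_user_hooks_switch(&cpu, &prev, &next);
	return !prev.tif_nohz && next.tif_nohz;
}
```

In the model, toy_hook_moved(false) reports that the hook followed the CPU, while toy_hook_moved(true) reports that nothing changed, which matches the ignore_user_qs test in the hunk above.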
[PATCH 05/11] x86: Syscall hooks for userspace RCU extended QS
Add syscall slow path hooks to notify syscall entry and exit on
CPUs that want to support userspace RCU extended quiescent state.

Signed-off-by: Frederic Weisbecker
Cc: Alessio Igor Bogani
Cc: Andrew Morton
Cc: Avi Kivity
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Geoff Levand
Cc: Gilad Ben Yossef
Cc: Hakan Akkan
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Josh Triplett
Cc: Kevin Hilman
Cc: Max Krasnyansky
Cc: Peter Zijlstra
Cc: Stephen Hemminger
Cc: Steven Rostedt
Cc: Sven-Thorsten Dietrich
Cc: Thomas Gleixner
---
 arch/x86/include/asm/thread_info.h |   10 +++---
 arch/x86/kernel/ptrace.c           |    5 +
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 89f794f..c535d84 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -89,6 +89,7 @@ struct thread_info {
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
 #define TIF_FORK		18	/* ret_from_fork */
+#define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_DEBUG		21	/* uses debug registers */
 #define TIF_IO_BITMAP		22	/* uses I/O bitmap */
@@ -114,6 +115,7 @@ struct thread_info {
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
 #define _TIF_FORK		(1 << TIF_FORK)
+#define _TIF_NOHZ		(1 << TIF_NOHZ)
 #define _TIF_DEBUG		(1 << TIF_DEBUG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
 #define _TIF_FORCED_TF		(1 << TIF_FORCED_TF)
@@ -126,12 +128,13 @@ struct thread_info {
 /* work to do in syscall_trace_enter() */
 #define _TIF_WORK_SYSCALL_ENTRY	\
 	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT |	\
-	 _TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT)
+	 _TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT |	\
+	 _TIF_NOHZ)

 /* work to do in syscall_trace_leave() */
 #define _TIF_WORK_SYSCALL_EXIT	\
 	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | _TIF_SINGLESTEP |	\
-	 _TIF_SYSCALL_TRACEPOINT)
+	 _TIF_SYSCALL_TRACEPOINT | _TIF_NOHZ)

 /* work to do on interrupt/exception return */
 #define _TIF_WORK_MASK							\
@@ -141,7 +144,8 @@ struct thread_info {

 /* work to do on any return to user space */
 #define _TIF_ALLWORK_MASK						\
-	((0x & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT)
+	((0x & ~_TIF_SECCOMP) | _TIF_SYSCALL_TRACEPOINT |	\
+	_TIF_NOHZ)

 /* Only used for 64 bit */
 #define _TIF_DO_NOTIFY_MASK						\
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index c4c6a5c..9f94f8e 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -1463,6 +1464,8 @@ long syscall_trace_enter(struct pt_regs *regs)
 {
 	long ret = 0;

+	rcu_user_exit();
+
 	/*
 	 * If we stepped into a sysenter/syscall insn, it trapped in
 	 * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP.
@@ -1526,4 +1529,6 @@ void syscall_trace_leave(struct pt_regs *regs)
 			!test_thread_flag(TIF_SYSCALL_EMU);
 	if (step || test_thread_flag(TIF_SYSCALL_TRACE))
 		tracehook_report_syscall_exit(regs, step);
+
+	rcu_user_enter();
 }
--
1.7.5.4
[PATCH 06/11] x86: Exception hooks for userspace RCU extended QS
Add necessary hooks to x86 exception for userspace
RCU extended quiescent state support.

This includes traps, page fault, debug exceptions, etc...

Signed-off-by: Frederic Weisbecker
Cc: Alessio Igor Bogani
Cc: Andrew Morton
Cc: Avi Kivity
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Geoff Levand
Cc: Gilad Ben Yossef
Cc: Hakan Akkan
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Josh Triplett
Cc: Kevin Hilman
Cc: Max Krasnyansky
Cc: Peter Zijlstra
Cc: Stephen Hemminger
Cc: Steven Rostedt
Cc: Sven-Thorsten Dietrich
Cc: Thomas Gleixner
---
 arch/x86/include/asm/rcu.h |   20
 arch/x86/kernel/traps.c    |   30 ++
 arch/x86/mm/fault.c        |   13 +++--
 3 files changed, 53 insertions(+), 10 deletions(-)
 create mode 100644 arch/x86/include/asm/rcu.h

diff --git a/arch/x86/include/asm/rcu.h b/arch/x86/include/asm/rcu.h
new file mode 100644
index 000..439815b
--- /dev/null
+++ b/arch/x86/include/asm/rcu.h
@@ -0,0 +1,20 @@
+#ifndef _ASM_X86_RCU_H
+#define _ASM_X86_RCU_H
+
+#include
+#include
+
+static inline void exception_enter(struct pt_regs *regs)
+{
+	rcu_user_exit();
+}
+
+static inline void exception_exit(struct pt_regs *regs)
+{
+#ifdef CONFIG_RCU_USER_QS
+	if (user_mode(regs))
+		rcu_user_enter();
+#endif
+}
+
+#endif
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 05b31d9..9b8195b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -52,6 +52,7 @@
 #include
 #include
 #include
+#include

 #include

@@ -178,11 +179,15 @@ vm86_trap:
 #define DO_ERROR(trapnr, signr, str, name)				\
 dotraplinkage void do_##name(struct pt_regs *regs, long error_code)	\
 {									\
-	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr)	\
-							== NOTIFY_STOP)	\
+	exception_enter(regs);						\
+	if (notify_die(DIE_TRAP, str, regs, error_code,			\
+			trapnr, signr) == NOTIFY_STOP) {		\
+		exception_exit(regs);					\
 		return;							\
+	}								\
 	conditional_sti(regs);						\
 	do_trap(trapnr, signr, str, regs, error_code, NULL);		\
+	exception_exit(regs);						\
 }

 #define DO_ERROR_INFO(trapnr, signr, str, name, sicode, siaddr)	\
@@ -193,11 +198,15 @@ dotraplinkage void do_##name(struct pt_regs *regs, long error_code)	\
 	info.si_errno = 0;						\
 	info.si_code = sicode;						\
 	info.si_addr = (void __user *)siaddr;				\
-	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr)	\
-							== NOTIFY_STOP)	\
+	exception_enter(regs);						\
+	if (notify_die(DIE_TRAP, str, regs, error_code,			\
+			trapnr, signr) == NOTIFY_STOP) {		\
+		exception_exit(regs);					\
 		return;							\
+	}								\
 	conditional_sti(regs);						\
 	do_trap(trapnr, signr, str, regs, error_code, &info);		\
+	exception_exit(regs);						\
 }

 DO_ERROR_INFO(X86_TRAP_DE, SIGFPE, "divide error", divide_error, FPE_INTDIV,
@@ -311,6 +320,7 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
 	    ftrace_int3_handler(regs))
 		return;
 #endif
+	exception_enter(regs);
 #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
 	if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,
 				SIGTRAP) == NOTIFY_STOP)
@@ -330,6 +340,7 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
 	do_trap(X86_TRAP_BP, SIGTRAP, "int3", regs, error_code, NULL);
 	preempt_conditional_cli(regs);
 	debug_stack_usage_dec();
+	exception_exit(regs);
 }

 #ifdef CONFIG_X86_64
@@ -390,6 +401,8 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
 	unsigned long dr6;
 	int si_code;

+	exception_enter(regs);
+
 	get_debugreg(dr6, 6);

 	/* Filter out all the reserved bits which are preset to 1 */
@@ -405,7 +418,7 @@ dotraplinkage void __kprobes do_debug(stru
[PATCH 09/11] x86: Use the new schedule_user API on userspace preemption
This way we can exit the RCU extended quiescent state before
we schedule a new task from irq/exception exit.

Signed-off-by: Frederic Weisbecker
Cc: Alessio Igor Bogani
Cc: Andrew Morton
Cc: Avi Kivity
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Geoff Levand
Cc: Gilad Ben Yossef
Cc: Hakan Akkan
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Josh Triplett
Cc: Kevin Hilman
Cc: Max Krasnyansky
Cc: Peter Zijlstra
Cc: Stephen Hemminger
Cc: Steven Rostedt
Cc: Sven-Thorsten Dietrich
Cc: Thomas Gleixner
---
 arch/x86/kernel/entry_64.S |    8
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 7d65133..e97d42d 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -565,7 +565,7 @@ sysret_careful:
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_NONE)
 	pushq_cfi %rdi
-	call schedule
+	call schedule_user
 	popq_cfi %rdi
 	jmp sysret_check

@@ -678,7 +678,7 @@ int_careful:
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_NONE)
 	pushq_cfi %rdi
-	call schedule
+	call schedule_user
 	popq_cfi %rdi
 	DISABLE_INTERRUPTS(CLBR_NONE)
 	TRACE_IRQS_OFF
@@ -974,7 +974,7 @@ retint_careful:
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_NONE)
 	pushq_cfi %rdi
-	call schedule
+	call schedule_user
 	popq_cfi %rdi
 	GET_THREAD_INFO(%rcx)
 	DISABLE_INTERRUPTS(CLBR_NONE)
@@ -1467,7 +1467,7 @@ paranoid_userspace:
 paranoid_schedule:
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_ANY)
-	call schedule
+	call schedule_user
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF
 	jmp paranoid_userspace
--
1.7.5.4
[PATCH 10/11] x86: Exit RCU extended QS on notify resume
do_notify_resume() may be called on irq or exception exit. But by that time the exception has already called rcu_user_enter() and the irq has already called rcu_irq_exit().

Since do_notify_resume() can use RCU read-side critical sections, we must call rcu_user_exit() before doing anything there. Then we must call rcu_user_enter() again at the end of this function because we know we are going to userspace from there.

This completes support for the userspace RCU extended quiescent state on x86-64.

Signed-off-by: Frederic Weisbecker
Cc: Alessio Igor Bogani
Cc: Andrew Morton
Cc: Avi Kivity
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Geoff Levand
Cc: Gilad Ben Yossef
Cc: Hakan Akkan
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Josh Triplett
Cc: Kevin Hilman
Cc: Max Krasnyansky
Cc: Peter Zijlstra
Cc: Stephen Hemminger
Cc: Steven Rostedt
Cc: Sven-Thorsten Dietrich
Cc: Thomas Gleixner
---
 arch/x86/Kconfig         |    1 +
 arch/x86/kernel/signal.c |    4
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c70684f..38dfcc2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -95,6 +95,7 @@ config X86
 	select KTIME_SCALAR if X86_32
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
+	select HAVE_RCU_USER_QS if X86_64

 config INSTRUCTION_DECODER
 	def_bool (KPROBES || PERF_EVENTS || UPROBES)
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 21af737..5cc2579 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -776,6 +776,8 @@ static void do_signal(struct pt_regs *regs)
 void
 do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
 {
+	rcu_user_exit();
+
 #ifdef CONFIG_X86_MCE
 	/* notify userspace of pending MCEs */
 	if (thread_info_flags & _TIF_MCE_NOTIFY)
@@ -801,6 +803,8 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
 #ifdef CONFIG_X86_32
 	clear_thread_flag(TIF_IRET);
 #endif /* CONFIG_X86_32 */
+
+	rcu_user_enter();
 }

 void signal_fault(struct pt_regs *regs, void __user *frame, char *where)
--
1.7.5.4
[PATCH 11/11] rcu: Userspace RCU extended QS selftest
Provide a config option that enables the userspace RCU extended quiescent state on every CPU by default. This is for testing purposes.

Signed-off-by: Frederic Weisbecker
Cc: Alessio Igor Bogani
Cc: Andrew Morton
Cc: Avi Kivity
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Geoff Levand
Cc: Gilad Ben Yossef
Cc: Hakan Akkan
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Josh Triplett
Cc: Kevin Hilman
Cc: Max Krasnyansky
Cc: Peter Zijlstra
Cc: Stephen Hemminger
Cc: Steven Rostedt
Cc: Sven-Thorsten Dietrich
Cc: Thomas Gleixner
---
 init/Kconfig     |    8
 kernel/rcutree.c |    2 +-
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 3a4af8f..7d1db2e 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -451,6 +451,14 @@ config RCU_USER_QS
 	  excluded from the global RCU state machine and thus doesn't
 	  to keep the timer tick on for RCU.

+config RCU_USER_QS_FORCE
+	bool "Force userspace extended QS by default"
+	depends on RCU_USER_QS
+	help
+	  Set the hooks in user/kernel boundaries by default in order to
+	  test this feature that treats userspace as an extended quiescent
+	  state until we have a real user like a full adaptive nohz option.
+
 config RCU_FANOUT
 	int "Tree-based hierarchical RCU fanout value"
 	range 2 64 if 64BIT
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 2d79308..9427aba 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -209,7 +209,7 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
 DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
 	.dynticks_nesting = DYNTICK_TASK_EXIT_IDLE,
 	.dynticks = ATOMIC_INIT(1),
-#ifdef CONFIG_RCU_USER_QS
+#if defined(CONFIG_RCU_USER_QS) && !defined(CONFIG_RCU_USER_QS_FORCE)
 	.ignore_user_qs = true,
 #endif
 };
--
1.7.5.4
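The effect of the `#if defined(...) && !defined(...)` guard in this patch can be tried outside the kernel. What follows is only a minimal user-space sketch, not kernel code: `struct model_dynticks`, `model_boot_state` and `model_user_qs_active_at_boot()` are hypothetical names, and the two `CONFIG_*` macros are defined by hand as a stand-in for what Kconfig would generate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hand-written stand-ins for Kconfig output (assumption for this sketch). */
#define CONFIG_RCU_USER_QS 1
#define CONFIG_RCU_USER_QS_FORCE 1

/* Hypothetical, reduced model of struct rcu_dynticks. */
struct model_dynticks {
	int dynticks;		/* always initialized, as in the patch */
	bool ignore_user_qs;	/* true: userspace is not treated as an extended QS */
};

static struct model_dynticks model_boot_state = {
	.dynticks = 1,
	/* Same guard as the patch: the FORCE option drops the initializer. */
#if defined(CONFIG_RCU_USER_QS) && !defined(CONFIG_RCU_USER_QS_FORCE)
	.ignore_user_qs = true,
#endif
};

/* Reports whether the model treats userspace as an extended QS at boot. */
static bool model_user_qs_active_at_boot(void)
{
	assert(model_boot_state.dynticks == 1);
	return !model_boot_state.ignore_user_qs;
}
```

Removing the `CONFIG_RCU_USER_QS_FORCE` define in the sketch makes the initializer visible again, so the model goes back to ignoring userspace as an extended quiescent state.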
[PATCH 07/11] rcu: Exit RCU extended QS on kernel preemption after irq/exception
When an exception or an irq exits, and we are going to resume into interrupted kernel code, the low level architecture code calls preempt_schedule_irq() if there is a need to reschedule.

If the interrupt/exception occurred between a call to rcu_user_enter() (from syscall exit, exception exit, do_notify_resume exit, ...) and a real resume to userspace (iret, ...), preempt_schedule_irq() can be called while RCU thinks we are in userspace. But preempt_schedule_irq() is going to run kernel code, possibly including RCU read-side critical sections. We must exit the userspace extended quiescent state before we call it.

To solve this, just call rcu_user_exit() at the beginning of preempt_schedule_irq().

Signed-off-by: Frederic Weisbecker
Cc: Alessio Igor Bogani
Cc: Andrew Morton
Cc: Avi Kivity
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Geoff Levand
Cc: Gilad Ben Yossef
Cc: Hakan Akkan
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Josh Triplett
Cc: Kevin Hilman
Cc: Max Krasnyansky
Cc: Peter Zijlstra
Cc: Stephen Hemminger
Cc: Steven Rostedt
Cc: Sven-Thorsten Dietrich
Cc: Thomas Gleixner
---
 kernel/sched/core.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fa61d8a..1e0fa5b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3433,6 +3433,7 @@ asmlinkage void __sched preempt_schedule_irq(void)
 	/* Catch callers which need to be fixed */
 	BUG_ON(ti->preempt_count || !irqs_disabled());

+	rcu_user_exit();
 	do {
 		add_preempt_count(PREEMPT_ACTIVE);
 		local_irq_enable();
--
1.7.5.4
[PATCH 01/11] rcu: Settle config for userspace extended quiescent state
Create a new config option under the RCU menu that puts CPUs in RCU extended quiescent state (as in dynticks idle mode) when they run in userspace.

This requires some contribution from architectures to hook into kernel and userspace boundaries.

Signed-off-by: Frederic Weisbecker
Cc: Alessio Igor Bogani
Cc: Andrew Morton
Cc: Avi Kivity
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Geoff Levand
Cc: Gilad Ben Yossef
Cc: Hakan Akkan
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Josh Triplett
Cc: Kevin Hilman
Cc: Max Krasnyansky
Cc: Peter Zijlstra
Cc: Stephen Hemminger
Cc: Steven Rostedt
Cc: Sven-Thorsten Dietrich
Cc: Thomas Gleixner
---
 arch/Kconfig             |   10 ++
 include/linux/rcupdate.h |    8
 init/Kconfig             |   10 ++
 kernel/rcutree.c         |    5 -
 4 files changed, 32 insertions(+), 1 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 8c3d957..1c7c9be 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -274,4 +274,14 @@ config SECCOMP_FILTER

 	  See Documentation/prctl/seccomp_filter.txt for details.

+config HAVE_RCU_USER_QS
+	bool
+	help
+	  Provide kernel entry/exit hooks necessary for userspace
+	  RCU extended quiescent state. Syscalls need to be wrapped inside
+	  rcu_user_exit()-rcu_user_enter() through the slow path using
+	  TIF_NOHZ flag. Exceptions handlers must be wrapped as well. Irqs
+	  are already protected inside rcu_irq_enter/rcu_irq_exit() but
+	  preemption or signal handling on irq exit still need to be protected.
+
 source "kernel/gcov/Kconfig"
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 148f381..a72f25e 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -192,10 +192,18 @@ extern void rcu_idle_enter(void);
 extern void rcu_idle_exit(void);
 extern void rcu_irq_enter(void);
 extern void rcu_irq_exit(void);
+
+#ifdef CONFIG_RCU_USER_QS
 extern void rcu_user_enter(void);
 extern void rcu_user_exit(void);
 extern void rcu_user_enter_irq(void);
 extern void rcu_user_exit_irq(void);
+#else
+static inline void rcu_user_enter(void) { }
+static inline void rcu_user_exit(void) { }
+#endif /* CONFIG_RCU_USER_QS */
+
+
 extern void exit_rcu(void);

 /**
diff --git a/init/Kconfig b/init/Kconfig
index d07dcf9..3a4af8f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -441,6 +441,16 @@ config PREEMPT_RCU
 	  This option enables preemptible-RCU code that is common between
 	  the TREE_PREEMPT_RCU and TINY_PREEMPT_RCU implementations.

+config RCU_USER_QS
+	bool "Consider userspace as in RCU extended quiescent state"
+	depends on HAVE_RCU_USER_QS && SMP
+	help
+	  This option sets hooks on kernel / userspace boundaries and
+	  puts RCU in extended quiescent state when the CPU runs in
+	  userspace. It means that when a CPU runs in userspace, it is
+	  excluded from the global RCU state machine and thus doesn't
+	  to keep the timer tick on for RCU.
+
 config RCU_FANOUT
 	int "Tree-based hierarchical RCU fanout value"
 	range 2 64 if 64BIT
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 5541a07..efa5983 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -423,6 +423,7 @@ void rcu_idle_enter(void)
 }
 EXPORT_SYMBOL_GPL(rcu_idle_enter);

+#ifdef CONFIG_RCU_USER_QS
 /**
  * rcu_user_enter - inform RCU that we are resuming userspace.
  *
@@ -437,7 +438,6 @@ void rcu_user_enter(void)
 }
 EXPORT_SYMBOL_GPL(rcu_user_enter);

-
 /**
  * rcu_user_enter_irq - inform RCU that we are going to resume userspace
  * after the current irq returns.
@@ -458,6 +458,7 @@ void rcu_user_enter_irq(void)
 	rdtp->dynticks_nesting = 1;
 	local_irq_restore(flags);
 }
+#endif

 /**
  * rcu_irq_exit - inform RCU that current CPU is exiting irq towards idle
@@ -561,6 +562,7 @@ void rcu_idle_exit(void)
 }
 EXPORT_SYMBOL_GPL(rcu_idle_exit);

+#ifdef CONFIG_RCU_USER_QS
 /**
  * rcu_user_exit - inform RCU that we are exiting userspace.
  *
@@ -594,6 +596,7 @@ void rcu_user_exit_irq(void)
 	rdtp->dynticks_nesting += DYNTICK_TASK_EXIT_IDLE;
 	local_irq_restore(flags);
 }
+#endif

 /**
  * rcu_irq_enter - inform RCU that current CPU is entering irq away from idle
--
1.7.5.4
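The header change in this patch relies on a common pattern: when the option is off, the hooks become empty inline stubs so that callers never need their own `#ifdef`. A rough user-space sketch of that pattern follows. All `model_*` names are hypothetical, the counter exists only so the behaviour can be observed, and `model_syscall_slowpath()` is an invented caller, not an actual arch entry point.

```c
#include <assert.h>

/* Leave undefined to model a kernel built without the option (assumption). */
/* #define CONFIG_RCU_USER_QS 1 */

static int model_transitions;	/* counts real state changes, for the demo only */

#ifdef CONFIG_RCU_USER_QS
static void model_rcu_user_enter(void) { model_transitions++; }
static void model_rcu_user_exit(void) { model_transitions++; }
#else
/* Empty inline stubs: callers compile unchanged and the calls cost nothing. */
static inline void model_rcu_user_enter(void) { }
static inline void model_rcu_user_exit(void) { }
#endif

/* Hypothetical caller standing in for an arch syscall slow path. */
static int model_syscall_slowpath(int nr)
{
	int ret;

	assert(nr >= 0);
	model_rcu_user_exit();	/* kernel entry: leave the extended QS */
	ret = nr + 1;		/* placeholder for the real syscall work */
	model_rcu_user_enter();	/* about to return to userspace */
	return ret;
}
```

With the define commented out, the caller still builds and `model_transitions` stays at zero; uncommenting it switches in the counting versions without touching the caller.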
[PATCH 02/11] rcu: Allow rcu_user_enter()/exit() to nest
Allow calls to rcu_user_enter() even if we are already in userspace (as seen by RCU) and allow calls to rcu_user_exit() even if we are already in the kernel.

This makes the APIs easier to call from architecture code. Exception entries, for example, won't need to know whether they come from userspace before calling rcu_user_exit().

Signed-off-by: Frederic Weisbecker
Cc: Alessio Igor Bogani
Cc: Andrew Morton
Cc: Avi Kivity
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Geoff Levand
Cc: Gilad Ben Yossef
Cc: Hakan Akkan
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Josh Triplett
Cc: Kevin Hilman
Cc: Max Krasnyansky
Cc: Peter Zijlstra
Cc: Stephen Hemminger
Cc: Steven Rostedt
Cc: Sven-Thorsten Dietrich
Cc: Thomas Gleixner
---
 kernel/rcutree.c |   41 +
 kernel/rcutree.h |    3 +++
 2 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index efa5983..d5df618 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -389,11 +389,9 @@ static void rcu_eqs_enter_common(struct rcu_dynticks *rdtp, long long oldval,
  */
 static void rcu_eqs_enter(bool user)
 {
-	unsigned long flags;
 	long long oldval;
 	struct rcu_dynticks *rdtp;

-	local_irq_save(flags);
 	rdtp = &__get_cpu_var(rcu_dynticks);
 	oldval = rdtp->dynticks_nesting;
 	WARN_ON_ONCE((oldval & DYNTICK_TASK_NEST_MASK) == 0);
@@ -402,7 +400,6 @@ static void rcu_eqs_enter(bool user)
 	else
 		rdtp->dynticks_nesting -= DYNTICK_TASK_NEST_VALUE;
 	rcu_eqs_enter_common(rdtp, oldval, user);
-	local_irq_restore(flags);
 }

 /**
@@ -419,7 +416,11 @@ static void rcu_eqs_enter(bool user)
  */
 void rcu_idle_enter(void)
 {
+	unsigned long flags;
+
+	local_irq_save(flags);
 	rcu_eqs_enter(0);
+	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_idle_enter);

@@ -434,7 +435,18 @@ EXPORT_SYMBOL_GPL(rcu_idle_enter);
  */
 void rcu_user_enter(void)
 {
-	rcu_eqs_enter(1);
+	unsigned long flags;
+	struct rcu_dynticks *rdtp;
+
+	WARN_ON_ONCE(!current->mm);
+
+	local_irq_save(flags);
+	rdtp = &__get_cpu_var(rcu_dynticks);
+	if (!rdtp->in_user) {
+		rdtp->in_user = true;
+		rcu_eqs_enter(1);
+	}
+	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_user_enter);

@@ -529,11 +541,9 @@ static void rcu_eqs_exit_common(struct rcu_dynticks *rdtp, long long oldval,
  */
 static void rcu_eqs_exit(bool user)
 {
-	unsigned long flags;
 	struct rcu_dynticks *rdtp;
 	long long oldval;

-	local_irq_save(flags);
 	rdtp = &__get_cpu_var(rcu_dynticks);
 	oldval = rdtp->dynticks_nesting;
 	WARN_ON_ONCE(oldval < 0);
@@ -542,7 +552,6 @@ static void rcu_eqs_exit(bool user)
 	else
 		rdtp->dynticks_nesting = DYNTICK_TASK_EXIT_IDLE;
 	rcu_eqs_exit_common(rdtp, oldval, user);
-	local_irq_restore(flags);
 }

 /**
@@ -558,7 +567,11 @@ static void rcu_eqs_exit(bool user)
  */
 void rcu_idle_exit(void)
 {
+	unsigned long flags;
+
+	local_irq_save(flags);
 	rcu_eqs_exit(0);
+	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_idle_exit);

@@ -571,7 +584,16 @@ EXPORT_SYMBOL_GPL(rcu_idle_exit);
  */
 void rcu_user_exit(void)
 {
-	rcu_eqs_exit(1);
+	unsigned long flags;
+	struct rcu_dynticks *rdtp;
+
+	local_irq_save(flags);
+	rdtp = &__get_cpu_var(rcu_dynticks);
+	if (rdtp->in_user) {
+		rdtp->in_user = false;
+		rcu_eqs_exit(1);
+	}
+	local_irq_restore(flags);
 }
 EXPORT_SYMBOL_GPL(rcu_user_exit);

@@ -2660,6 +2682,9 @@ rcu_boot_init_percpu_data(int cpu, struct rcu_state *rsp)
 	rdp->dynticks = &per_cpu(rcu_dynticks, cpu);
 	WARN_ON_ONCE(rdp->dynticks->dynticks_nesting != DYNTICK_TASK_EXIT_IDLE);
 	WARN_ON_ONCE(atomic_read(&rdp->dynticks->dynticks) != 1);
+#ifdef CONFIG_RCU_USER_QS
+	WARN_ON_ONCE(rdp->dynticks->in_user);
+#endif
 	rdp->cpu = cpu;
 	rdp->rsp = rsp;
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index cad96cb..4d82cb5 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -102,6 +102,9 @@ struct rcu_dynticks {
 				    /* idle-period nonlazy_posted snapshot. */
 	int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */
 #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
+#ifdef CONFIG_RCU_USER_QS
+	bool in_user;		    /* Is the CPU in userland from RCU POV? */
+#endif
 };

 /* RCU's kthread states for tracing. */
--
1.7.5.4
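The nesting tolerance added by this patch comes down to a per-CPU flag that turns repeated calls into no-ops. The sketch below models just that idea in user space. It is an illustration under stated assumptions: `struct model_dynticks` and the `model_*` functions are hypothetical, the two counters replace the real rcu_eqs_enter()/rcu_eqs_exit() work, and the irq disabling done by the kernel version is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of the per-CPU RCU dynticks state. */
struct model_dynticks {
	bool in_user;	/* is the CPU in userspace from RCU's point of view? */
	int eqs_enters;	/* how many times the extended QS was really entered */
	int eqs_exits;	/* how many times it was really exited */
};

/* Safe to call when already "in userspace": the second call does nothing. */
static void model_rcu_user_enter(struct model_dynticks *d)
{
	assert(d != NULL);
	if (!d->in_user) {
		d->in_user = true;
		d->eqs_enters++;
	}
}

/* Safe to call when already "in the kernel": the second call does nothing. */
static void model_rcu_user_exit(struct model_dynticks *d)
{
	assert(d != NULL);
	if (d->in_user) {
		d->in_user = false;
		d->eqs_exits++;
	}
}
```

In this model an exception entry can call `model_rcu_user_exit()` without first checking where it came from: if the CPU was already in the kernel from RCU's point of view, neither counter moves.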
[PATCH 03/11] rcu: Ignore userspace extended quiescent state by default
By default we don't want to enter the RCU extended quiescent state while in userspace because doing this produces some overhead (e.g. use of the syscall slowpath).

Set it off by default, ready to be turned on when some feature like adaptive tickless needs it.

Signed-off-by: Frederic Weisbecker
Cc: Alessio Igor Bogani
Cc: Andrew Morton
Cc: Avi Kivity
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Geoff Levand
Cc: Gilad Ben Yossef
Cc: Hakan Akkan
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Josh Triplett
Cc: Kevin Hilman
Cc: Max Krasnyansky
Cc: Peter Zijlstra
Cc: Stephen Hemminger
Cc: Steven Rostedt
Cc: Sven-Thorsten Dietrich
Cc: Thomas Gleixner
---
 kernel/rcutree.c |    5 -
 kernel/rcutree.h |    1 +
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index d5df618..78b0c30 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -209,6 +209,9 @@ EXPORT_SYMBOL_GPL(rcu_note_context_switch);
 DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
 	.dynticks_nesting = DYNTICK_TASK_EXIT_IDLE,
 	.dynticks = ATOMIC_INIT(1),
+#ifdef CONFIG_RCU_USER_QS
+	.ignore_user_qs = true,
+#endif
 };

 static int blimit = 10;	/* Maximum callbacks per rcu_do_batch. */
@@ -442,7 +445,7 @@ void rcu_user_enter(void)

 	local_irq_save(flags);
 	rdtp = &__get_cpu_var(rcu_dynticks);
-	if (!rdtp->in_user) {
+	if (!rdtp->ignore_user_qs && !rdtp->in_user) {
 		rdtp->in_user = true;
 		rcu_eqs_enter(1);
 	}
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 4d82cb5..55bcef1 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -103,6 +103,7 @@ struct rcu_dynticks {
 	int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */
 #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
 #ifdef CONFIG_RCU_USER_QS
+	bool ignore_user_qs;	    /* Treat userspace as extended QS or not */
 	bool in_user;		    /* Is the CPU in userland from RCU POV? */
 #endif
 };
--
1.7.5.4
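How the new flag gates entry into the extended quiescent state can be shown with a small user-space model. This is a sketch only: the struct is a hypothetical two-field reduction of struct rcu_dynticks, `model_rcu_user_enter()` is an invented name, and its boolean return value is added purely so the outcome can be checked (the kernel function returns void).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-field reduction of struct rcu_dynticks. */
struct model_dynticks {
	bool ignore_user_qs;	/* true: do not treat userspace as an extended QS */
	bool in_user;		/* is the CPU in userspace from RCU's point of view? */
};

/*
 * Same condition as the patched rcu_user_enter(): enter only if the
 * feature is not ignored and we are not already "in userspace".
 * Returns true when the extended QS was actually entered.
 */
static bool model_rcu_user_enter(struct model_dynticks *d)
{
	assert(d != NULL);
	if (!d->ignore_user_qs && !d->in_user) {
		d->in_user = true;
		return true;
	}
	return false;
}
```

With `ignore_user_qs` set, as the patch does by default, the call is a no-op and `in_user` never becomes true; in the patch the exit side needs no check of its own because rcu_user_exit() only acts when `in_user` is set.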
[PATCH 08/11] rcu: Exit RCU extended QS on user preemption
When exceptions or irqs are about to resume userspace, if the task needs to be rescheduled, the arch low level code calls schedule() directly.

At that time we may be in extended quiescent state from RCU's point of view: the exception is no longer protected inside rcu_user_exit() - rcu_user_enter() and the irq has already called rcu_irq_exit().

Create a new API schedule_user() that calls schedule() inside rcu_user_exit() - rcu_user_enter() in order to protect it. Archs will need to rely on it now to implement user preemption safely.

Signed-off-by: Frederic Weisbecker
Cc: Alessio Igor Bogani
Cc: Andrew Morton
Cc: Avi Kivity
Cc: Chris Metcalf
Cc: Christoph Lameter
Cc: Geoff Levand
Cc: Gilad Ben Yossef
Cc: Hakan Akkan
Cc: H. Peter Anvin
Cc: Ingo Molnar
Cc: Josh Triplett
Cc: Kevin Hilman
Cc: Max Krasnyansky
Cc: Peter Zijlstra
Cc: Stephen Hemminger
Cc: Steven Rostedt
Cc: Sven-Thorsten Dietrich
Cc: Thomas Gleixner
---
 kernel/sched/core.c |    7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1e0fa5b..a37619a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3332,6 +3332,13 @@ asmlinkage void __sched schedule(void)
 }
 EXPORT_SYMBOL(schedule);

+asmlinkage void __sched schedule_user(void)
+{
+	rcu_user_exit();
+	schedule();
+	rcu_user_enter();
+}
+
 /**
  * schedule_preempt_disabled - called with preemption disabled
  *
--
1.7.5.4
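The reason for the wrapper can be made concrete with a toy user-space model in which the scheduler stand-in refuses to run while RCU believes the CPU is in userspace. Everything named `model_*` here is hypothetical, and the assert inside `model_schedule()` is an assumption of this sketch, standing in for the real problem of running RCU read-side critical sections inside an extended quiescent state.

```c
#include <assert.h>
#include <stdbool.h>

/* RCU's view in this model: true means "CPU is in the userspace extended QS". */
static bool model_in_user = true;
static int model_schedule_calls;

static void model_rcu_user_exit(void) { model_in_user = false; }
static void model_rcu_user_enter(void) { model_in_user = true; }

/* Stand-in for schedule(): may only run while RCU is watching the CPU. */
static void model_schedule(void)
{
	assert(!model_in_user);
	model_schedule_calls++;
}

/* Same shape as schedule_user() in the patch: bracket the call. */
static void model_schedule_user(void)
{
	model_rcu_user_exit();
	model_schedule();
	model_rcu_user_enter();
}
```

Calling `model_schedule()` directly from the initial state would trip the assert, which mirrors the situation the arch code is in on the user preemption path; going through `model_schedule_user()` runs it safely and leaves the model back in the userspace state afterwards.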