[PATCH] x86/platform/uv: Include clocksource.h for clocksource_touch_watchdog()

2015-12-11 Thread Ingo Molnar

* Andy Lutomirski <l...@kernel.org> wrote:

> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
> index f80d70009ff8..6d7d0e52ed5a 100644
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -19,7 +19,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #ifdef CONFIG_X86_32
>  #include 
>  #include 

So this change triggered a build failure on 64-bit allmodconfig - fixed via the 
patch below. Your change unearthed a latent bug, a missing header inclusion.

Thanks,

Ingo

>
>From d51953b0873358d13b189996e6976dfa12a9b59d Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mi...@kernel.org>
Date: Fri, 11 Dec 2015 09:01:30 +0100
Subject: [PATCH] x86/platform/uv: Include clocksource.h for 
clocksource_touch_watchdog()

This build failure triggers on 64-bit allmodconfig:

  arch/x86/platform/uv/uv_nmi.c:493:2: error: implicit declaration of function 
‘clocksource_touch_watchdog’ [-Werror=implicit-function-declaration]

which is caused by recent changes exposing a missing clocksource.h include
in uv_nmi.c:

  cc1e24fdb064 x86/vdso: Remove pvclock fixmap machinery

this file got clocksource.h indirectly via fixmap.h - that stealth route
of header inclusion is now gone.

Cc: Borislav Petkov <b...@alien8.de>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
 arch/x86/platform/uv/uv_nmi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/platform/uv/uv_nmi.c b/arch/x86/platform/uv/uv_nmi.c
index 327f21c3bde1..8dd80050d705 100644
--- a/arch/x86/platform/uv/uv_nmi.c
+++ b/arch/x86/platform/uv/uv_nmi.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-12-10 Thread Ingo Molnar

* Paolo Bonzini  wrote:

> 
> 
> On 10/12/2015 00:12, Andy Lutomirski wrote:
> > From: Andy Lutomirski 
> > 
> > The pvclock vdso code was too abstracted to understand easily and
> > excessively paranoid.  Simplify it for a huge speedup.
> > 
> > This opens the door for additional simplifications, as the vdso no
> > longer accesses the pvti for any vcpu other than vcpu 0.
> > 
> > Before, vclock_gettime using kvm-clock took about 45ns on my machine.
> > With this change, it takes 29ns, which is almost as fast as the pure TSC
> > implementation.
> > 
> > Signed-off-by: Andy Lutomirski 
> > ---
> >  arch/x86/entry/vdso/vclock_gettime.c | 81 
> > 
> >  1 file changed, 46 insertions(+), 35 deletions(-)
> > 
> > diff --git a/arch/x86/entry/vdso/vclock_gettime.c 
> > b/arch/x86/entry/vdso/vclock_gettime.c
> > index ca94fa649251..c325ba1bdddf 100644
> > --- a/arch/x86/entry/vdso/vclock_gettime.c
> > +++ b/arch/x86/entry/vdso/vclock_gettime.c
> > @@ -78,47 +78,58 @@ static notrace const struct pvclock_vsyscall_time_info 
> > *get_pvti(int cpu)
> >  
> >  static notrace cycle_t vread_pvclock(int *mode)
> >  {
> > -   const struct pvclock_vsyscall_time_info *pvti;
> > +   const struct pvclock_vcpu_time_info *pvti = _pvti(0)->pvti;
> > cycle_t ret;
> > -   u64 last;
> > -   u32 version;
> > -   u8 flags;
> > -   unsigned cpu, cpu1;
> > -
> > +   u64 tsc, pvti_tsc;
> > +   u64 last, delta, pvti_system_time;
> > +   u32 version, pvti_tsc_to_system_mul, pvti_tsc_shift;
> >  
> > /*
> > -* Note: hypervisor must guarantee that:
> > -* 1. cpu ID number maps 1:1 to per-CPU pvclock time info.
> > -* 2. that per-CPU pvclock time info is updated if the
> > -*underlying CPU changes.
> > -* 3. that version is increased whenever underlying CPU
> > -*changes.
> > +* Note: The kernel and hypervisor must guarantee that cpu ID
> > +* number maps 1:1 to per-CPU pvclock time info.
> > +*
> > +* Because the hypervisor is entirely unaware of guest userspace
> > +* preemption, it cannot guarantee that per-CPU pvclock time
> > +* info is updated if the underlying CPU changes or that that
> > +* version is increased whenever underlying CPU changes.
> >  *
> > +* On KVM, we are guaranteed that pvti updates for any vCPU are
> > +* atomic as seen by *all* vCPUs.  This is an even stronger
> > +* guarantee than we get with a normal seqlock.
> > +*
> > +* On Xen, we don't appear to have that guarantee, but Xen still
> > +* supplies a valid seqlock using the version field.
> > +
> > +* We only do pvclock vdso timing at all if
> > +* PVCLOCK_TSC_STABLE_BIT is set, and we interpret that bit to
> > +* mean that all vCPUs have matching pvti and that the TSC is
> > +* synced, so we can just look at vCPU 0's pvti.
> >  */
> > -   do {
> > -   cpu = __getcpu() & VGETCPU_CPU_MASK;
> > -   /* TODO: We can put vcpu id into higher bits of pvti.version.
> > -* This will save a couple of cycles by getting rid of
> > -* __getcpu() calls (Gleb).
> > -*/
> > -
> > -   pvti = get_pvti(cpu);
> > -
> > -   version = __pvclock_read_cycles(>pvti, , );
> > -
> > -   /*
> > -* Test we're still on the cpu as well as the version.
> > -* We could have been migrated just after the first
> > -* vgetcpu but before fetching the version, so we
> > -* wouldn't notice a version change.
> > -*/
> > -   cpu1 = __getcpu() & VGETCPU_CPU_MASK;
> > -   } while (unlikely(cpu != cpu1 ||
> > - (pvti->pvti.version & 1) ||
> > - pvti->pvti.version != version));
> > -
> > -   if (unlikely(!(flags & PVCLOCK_TSC_STABLE_BIT)))
> > +
> > +   if (unlikely(!(pvti->flags & PVCLOCK_TSC_STABLE_BIT))) {
> > *mode = VCLOCK_NONE;
> > +   return 0;
> > +   }
> > +
> > +   do {
> > +   version = pvti->version;
> > +
> > +   /* This is also a read barrier, so we'll read version first. */
> > +   tsc = rdtsc_ordered();
> > +
> > +   pvti_tsc_to_system_mul = pvti->tsc_to_system_mul;
> > +   pvti_tsc_shift = pvti->tsc_shift;
> > +   pvti_system_time = pvti->system_time;
> > +   pvti_tsc = pvti->tsc_timestamp;
> > +
> > +   /* Make sure that the version double-check is last. */
> > +   smp_rmb();
> > +   } while (unlikely((version & 1) || version != pvti->version));
> > +
> > +   delta = tsc - pvti_tsc;
> > +   ret = pvti_system_time +
> > +   pvclock_scale_delta(delta, pvti_tsc_to_system_mul,
> > +   pvti_tsc_shift);
> >  
> > /* refer to tsc.c read_tsc() comment for rationale */
> > last = gtod->cycle_last;
> > 
> 
> Reviewed-by: Paolo Bonzini 

Thanks. I've 

Re: [GIT PULL 0/5] perf/core improvements and fixes

2015-10-22 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> The following changes since commit 43e41adc9e8c36545888d78fed2ef8d102a938dc:
> 
>   perf record: Add ability to sample call branches (2015-10-20 10:30:55 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git 
> tags/perf-core-for-mingo
> 
> for you to fetch changes up to e3d006ce8180a0c025ce66bdc89bbc125f85be57:
> 
>   perf annotate: Add debug message for out of bounds sample (2015-10-21 
> 18:12:37 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> User visible:
> 
> - Print branch filter state with verbose mode (Andi Kleen)
> 
> - Fix core dump caused by per-socket/core system-wide stat (Kan Liang)
> 
> - Update libtraceevent KVM plugin (Paolo Bonzini)
> 
> Developer stuff:
> 
> - Add fixdep to 'tools/build' .gitignore (Yunlong Song)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Andi Kleen (1):
>   perf evsel: Print branch filter state with -vv
> 
> Arnaldo Carvalho de Melo (1):
>   perf annotate: Add debug message for out of bounds sample
> 
> Kan Liang (1):
>   perf cpu_map: Fix core dump caused by per-socket/core system-wide stat
> 
> Paolo Bonzini (1):
>   tools lib traceevent: update KVM plugin
> 
> Yunlong Song (1):
>   perf build: Add fixdep to .gitignore
> 
>  tools/build/.gitignore|  1 +
>  tools/lib/traceevent/plugin_kvm.c | 25 +
>  tools/perf/util/annotate.c|  5 -
>  tools/perf/util/cpumap.c  |  2 +-
>  tools/perf/util/evsel.c   |  1 +
>  5 files changed, 24 insertions(+), 10 deletions(-)
>  create mode 100644 tools/build/.gitignore

Pulled, thanks a lot Arnaldo!

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-10-01 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> > These could still be open coded in an inlined fashion, like the scheduler 
> > usage.
> 
> We could have a raw_rdmsr for those.
> 
> OTOH, I'm still not 100% convinced that this warn-but-don't-die behavior is 
> worth the effort.  This isn't a frequent source of bugs to my knowledge, and 
> we 
> don't try to recover from incorrect cr writes, out-of-bounds MMIO, etc, so do 
> we 
> really gain much by rigging a recovery mechanism for rdmsr and wrmsr failures 
> for code that doesn't use the _safe variants?

It's just the general principle really: don't crash the kernel on bootup. 
There's 
few things more user hostile than that.

Also, this would maintain the status quo: since we now (accidentally) don't 
crash 
the kernel on distro kernels (but silently and unsafely ignore the faulting 
instruction), we should not regress that behavior (by adding the chance to 
crash 
again), but improve upon it.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-30 Thread Ingo Molnar

* Peter Zijlstra <pet...@infradead.org> wrote:

> On Mon, Sep 21, 2015 at 09:36:15AM -0700, Linus Torvalds wrote:
> > On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mi...@kernel.org> wrote:
> > >
> > > Linus, what's your preference?
> > 
> > So quite frankly, is there any reason we don't just implement
> > native_read_msr() as just
> > 
> >unsigned long long native_read_msr(unsigned int msr)
> >{
> >   int err;
> >   unsigned long long val;
> > 
> >   val = native_read_msr_safe(msr, );
> >   WARN_ON_ONCE(err);
> >   return val;
> >}
> > 
> > Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> > done with it. I don't see the downside.
> > 
> > How many msr reads are so critical that the function call
> > overhead would matter? Get rid of the inline version of the _safe()
> > thing too, and put that thing there too.
> 
> There are a few in the perf code, and esp. on cores without a stack engine 
> the 
> call overhead is noticeable. Also note that the perf MSRs are generally 
> optimized MSRs and less slow (we cannot say fast, they're still MSRs) than 
> regular MSRs.

These could still be open coded in an inlined fashion, like the scheduler usage.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled

2015-09-29 Thread Ingo Molnar

* Denys Vlasenko <dvlas...@redhat.com> wrote:

> On 09/28/2015 09:58 AM, Ingo Molnar wrote:
> > 
> > * Denys Vlasenko <dvlas...@redhat.com> wrote:
> > 
> >> On 09/26/2015 09:50 PM, H. Peter Anvin wrote:
> >>> NAK.  We really should map the GDT read-only on all 64 bit systems,
> >>> since we can't hide the address from SLDT.  Same with the IDT.
> >>
> >> Sorry, I don't understand your point.
> > 
> > So the problem is that right now the SGDT instruction (which is 
> > unprivileged) 
> > leaks the real address of the kernel image:
> > 
> >  fomalhaut:~> ./sgdt 
> >  SGDT: 88303fd89000 / 007f
> > 
> > that '88303fd89000' is a kernel address.
> 
> Thank you.
> I do know that SGDT and friends are unprivileged on x86
> and thus they allow userspace (and guest kernels in paravirt)
> learn things they don't need to know.
> 
> I don't see how making GDT page-aligned and page-sized
> changes anything in this regard. SGDT will still work,
> and still leak GDT address.

Well, as I try to explain it in the other part of my mail, doing so enables us 
to 
remap the GDT to a less security sensitive virtual address that does not leak 
the 
kernel's randomized address:

> > Your observation in the changelog and your patch:
> > 
> >>>> It is page-sized because of paravirt. [...]
> > 
> > ... conflicts with the intention to mark (remap) the primary GDT address 
> > read-only 
> > on native kernels as well.
> > 
> > So what we should do instead is to use the page alignment properly and 
> > remap the 
> > GDT to a read-only location, and load that one.
> 
> If we'd have a small GDT (i.e. what my patch does), we still can remap the 
> entire page which contains small GDT, and simply don't care that some other 
> data 
> is also visible through that RO page.

That's generally considered fragile: suppose an attacker has a limited 
information 
leak that can read absolute addresses with system privilege but he doesn't know 
the kernel's randomized base offset. With a 'partial page' mapping there could 
be 
function pointers near the GDT, part of the page the GDT happens to be on, that 
leak this information.

(Same goes for crypto keys or other critical information (like canary 
information, 
salts, etc.) accidentally ending up nearby.)

Arguably it's a bit tenuous, but when playing remapping games it's generally 
considered good to be page aligned and page sized, with zero padding.

> > This would have a couple of advantages:
> > 
> >  - This would give kernel address randomization more teeth on x86.
> > 
> >  - An additional advantage would be that rootkits overwriting the GDT would 
> > have 
> >a bit more work to do.
> > 
> >  - A third advantage would be that for NUMA systems we could 'mirror' the 
> > GDT into
> >node-local memory and load those. This makes GDT load cache-misses a bit 
> > less
> >expensive.
> 
> GDT is per-cpu. Isn't per-cpu memory already NUMA-local?

Indeed it is:

fomalhaut:~> for ((cpu=1; cpu<9; cpu++)); do taskset $cpu ./sgdt ; done
SGDT: 88103fa09000 / 007f
SGDT: 88103fa29000 / 007f
SGDT: 88103fa29000 / 007f
SGDT: 88103fa49000 / 007f
SGDT: 88103fa49000 / 007f
SGDT: 88103fa49000 / 007f
SGDT: 88103fa29000 / 007f
SGDT: 88103fa69000 / 007f

I confused it with the IDT, which is still global.

This also means that the GDT in itself does not leak kernel addresses at the 
moment, except it leaks the layout of the percpu area.

So my suggestion would be to:

 - make the GDT unconditionally page aligned and sized, then remap it to a
   read-only address unconditionally as well, like we do it for the IDT.

 - make the IDT per CPU as well, for performance reasons.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled

2015-09-28 Thread Ingo Molnar

* Denys Vlasenko  wrote:

> On 09/26/2015 09:50 PM, H. Peter Anvin wrote:
> > NAK.  We really should map the GDT read-only on all 64 bit systems,
> > since we can't hide the address from SLDT.  Same with the IDT.
> 
> Sorry, I don't understand your point.

So the problem is that right now the SGDT instruction (which is unprivileged) 
leaks the real address of the kernel image:

 fomalhaut:~> ./sgdt 
 SGDT: 88303fd89000 / 007f

that '88303fd89000' is a kernel address.

 fomalhaut:~> cat sgdt.c 
 #include 
 #include 

 int main(void)
 {
 struct gdt_desc {
 unsigned short  limit;
 unsigned long   addr;
 } __attribute__((packed)) gdt_desc = { -1, -1 };

 asm volatile("sgdt %0": "=m" (gdt_desc));

 printf("SGDT: %016lx / %04x\n", gdt_desc.addr, gdt_desc.limit);

 return 0;
 }

Your observation in the changelog and your patch:

> >> It is page-sized because of paravirt. [...]

... conflicts with the intention to mark (remap) the primary GDT address 
read-only 
on native kernels as well.

So what we should do instead is to use the page alignment properly and remap 
the 
GDT to a read-only location, and load that one.

This would have a couple of advantages:

 - This would give kernel address randomization more teeth on x86.

 - An additional advantage would be that rootkits overwriting the GDT would 
have 
   a bit more work to do.

 - A third advantage would be that for NUMA systems we could 'mirror' the GDT 
into
   node-local memory and load those. This makes GDT load cache-misses a bit less
   expensive.

The IDT is already remapped:

 fomalhaut:~> ./sidt 
 Sidt: ff57b000 / 0fff
 fomalhaut:~> cat sidt.c
 #include 
 #include 

 int main(void)
 {
 struct idt_desc {
 unsigned short  limit;
 unsigned long   addr;
 } __attribute__((packed)) idt_desc = { -1, -1 };

 asm volatile("sidt %0": "=m" (idt_desc));

 printf("Sidt: %016lx / %04x\n", idt_desc.addr, idt_desc.limit);

 return 0;
 }

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-22 Thread Ingo Molnar

* Linus Torvalds <torva...@linux-foundation.org> wrote:

> On Mon, Sep 21, 2015 at 1:46 AM, Ingo Molnar <mi...@kernel.org> wrote:
> >
> > Linus, what's your preference?
> 
> So quite frankly, is there any reason we don't just implement
> native_read_msr() as just
> 
>unsigned long long native_read_msr(unsigned int msr)
>{
>   int err;
>   unsigned long long val;
> 
>   val = native_read_msr_safe(msr, );
>   WARN_ON_ONCE(err);
>   return val;
>}
> 
> Note: no inline, no nothing. Just put it in arch/x86/lib/msr.c, and be
> done with it. I don't see the downside.

Absolutely!

> How many msr reads are so critical that the function call overhead 
> would 
> matter? Get rid of the inline version of the _safe() thing too, and put that 
> thing there too.

Only a very low number of them is performance critical (because even 
hw-accelerated MSR accesses are generally slow so we try to avoid MSR accesses 
in 
fast paths as much as possible, via shadowing, etc.) - and in the few cases 
where 
we have to access an MSR in a fast path we can do those separately.

I'm only worried about the 'default' APIs, i.e. rdmsr() that is used throughout 
arch/x86/ over a hundred times, not about performance critical code paths that 
get 
enough testing and enough attention in general.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-21 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> On Sep 20, 2015 5:15 PM, "Linus Torvalds"  
> wrote:
> >
> > On Sun, Sep 20, 2015 at 5:02 PM, Andy Lutomirski  wrote:
> > > This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> > > access to a WARN_ON_ONCE and a return of zero (in the RDMSR case).
> > > We still write a pr_info entry unconditionally for debugging.
> >
> > No, this is wrong.
> >
> > If you really want to do something like this, then just make all MSR reads 
> > safe. So the only difference between "safe" and "unsafe" is that the unsafe 
> > version just doesn't check the return value, and silently just returns zero 
> > for reads (or writes nothing).
> >
> > To quote Obi-Wan: "Use the exception table, Luke".
> >
> > Because decoding instructions is just too ugly. We'll do it for CPU errata 
> > where we might have to do it for user space code too (ie the AMD prefetch 
> > mess), but for code that _we_ control? Hell no.
> >
> > So NAK on this.
> 
> My personal preference is to just not do this at all.  A couple people 
> disagree.  
> If we make the unsafe variants not oops, then I think we want to have the 
> nice 
> loud warning, since these issues are bugs if they happen.
> 
> We could certainly use the exception table for this, but it'll result in 
> bigger 
> core, since each MSR access will need an exception table entry and an 
> associated 
> fixup to call some helper that warns and sets the result to zero.
> 
> I'd be happy to implement that, but only if it'll be applied. Otherwise I'd 
> rather just drop this patch and keep the rest of the series.

Linus, what's your preference?

Due to the bug mentioned earlier in this thread all MSR reads are currently 
'safe' 
on all the major Linux distros (which all have CONFIG_PARAVIRT=y), i.e. by 
'fixing' them we'd reintroduce random crashes into various fragile pieces of 
code...

To add insult to injury, the current 'silently safe by accident' MSR code isn't 
so 
safe: because it leaves the result of the read uninitialized...

To fix this all I'd really like to have:

 - safe MSR reads by default (i.e. never boot crash the kernel on some rare 
   condition - which to most users is either a silent boot hang or an instant 
   restart). Historicaly we had a stream of 'silly boot crashes' due to MSR 
reads 
   that generate a #GPF. They make Linux less usable around the edges, 
especially 
   in the x86 non-server (desktop) space where most hardware vendors are either 
   openly Linux hostile, or, at best, Linux oblivious.

 - proper result-zeroing behavior on exceptions

 - and we should also generate _some_ sort of warning when MSR exceptions happen
   in an 'unintended' fashion.

Maybe the warning could be put under a (default-enabled) config option for the 
size conscious.

Or we could extend exception table entry encoding to include a 'warning bit', 
to 
not bloat the kernel. If the exception handler code encounters such an 
exception 
it would generate a one-time warning for that entry, but otherwise not crash 
the 
kernel and continue execution with an all-zeroes result for the MSR read.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] x86/msr: Carry on after a non-"safe" MSR access fails without !panic_on_oops

2015-09-18 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> This demotes an OOPS and likely panic due to a failed non-"safe" MSR
> access to a WARN_ON_ONCE and a return of poisoned values (in the
> RDMSR case).  We still write a pr_info entry unconditionally for
> debugging.
> 
> To be clear, this type of failure should *not* happen.  This patch
> exists to minimize the chance of nasty undebuggable failures due on
> systems that used to work due to a now-fixed CONFIG_PARAVIRT=y bug.

> + if (opcode == 0x320f) {
> + /* RDMSR */
> + pr_info("bad kernel RDMSR from non-existent MSR 0x%x",
> + (unsigned int)regs->cx);
> + if (!panic_on_oops) {
> + WARN_ON_ONCE(true);
> +
> + /* Patch it up with deterministic poison. */
> + regs->ax = 0x5aadc0de;
> + regs->dx = 0x8badf00d;
> + regs->ip += 2;
> + return true;

IMHO this should really not poison the result, but use zero as the result.

The poison might randomly indicate 'present' feature in various registers that 
might be accessed in a buggy way. Don't send the code further down into 
la-la-land 
by giving it a 'success'.

And yes, zero can mean success too, but we have to pick a side here ...

The warning will be enough to fix these ups, people (and in particular distro 
testing people) will be watching out for them.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> Setting CONFIG_PARAVIRT=y has an unintended side effect: it silently
> turns all rdmsr and wrmsr operations into the safe variants without
> any checks that the operations actually succeed.
> 
> This is IMO awful: it papers over bugs.  In particular, KVM gueests
> might be unwittingly depending on this behavior because
> CONFIG_KVM_GUEST currently depends on CONFIG_PARAVIRT.  I'm not
> aware of any such problems, but applying this series would be a good
> way to shake them out.
> 
> Fix it so that the MSR operations work the same on CONFIG_PARAVIRT=n
> and CONFIG_PARAVIRT=y as long as Xen isn't being used.  The Xen
> maintainers are welcome to make a similar change on top of this.
> 
> Since there's plenty of time before the next merge window, I think
> we should apply and fix anything that breaks.

No, I think we should at most generate a warning instead, and not crash the 
kernel 
via rdmsr()!

Most big distro kernels on bare metal have CONFIG_PARAVIRT=y (I checked Ubuntu 
and 
Fedora), so we are potentially exposing a lot of users to problems.

Crashing the bootup on an unknown MSR is bad. Many MSR reads and writes are 
non-critical and returning the 'safe' result is much better than crashing or 
hanging the bootup.

( We should double check that rdmsr()/wrmsr() results are never left 
  uninitialized, but are set to zero or so, for cases where the return code is 
not 
  checked. )

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] x86/paravirt: Fix baremetal paravirt MSR ops

2015-09-17 Thread Ingo Molnar

* Andy Lutomirski <l...@amacapital.net> wrote:

> On Thu, Sep 17, 2015 at 12:19 AM, Ingo Molnar <mi...@kernel.org> wrote:
> >
> > * Andy Lutomirski <l...@kernel.org> wrote:
> >
> >> Setting CONFIG_PARAVIRT=y has an unintended side effect: it silently
> >> turns all rdmsr and wrmsr operations into the safe variants without
> >> any checks that the operations actually succeed.
> >>
> >> This is IMO awful: it papers over bugs.  In particular, KVM gueests
> >> might be unwittingly depending on this behavior because
> >> CONFIG_KVM_GUEST currently depends on CONFIG_PARAVIRT.  I'm not
> >> aware of any such problems, but applying this series would be a good
> >> way to shake them out.
> >>
> >> Fix it so that the MSR operations work the same on CONFIG_PARAVIRT=n
> >> and CONFIG_PARAVIRT=y as long as Xen isn't being used.  The Xen
> >> maintainers are welcome to make a similar change on top of this.
> >>
> >> Since there's plenty of time before the next merge window, I think
> >> we should apply and fix anything that breaks.
> >
> > No, I think we should at most generate a warning instead, and not crash the 
> > kernel
> > via rdmsr()!
> >
> > Most big distro kernels on bare metal have CONFIG_PARAVIRT=y (I checked 
> > Ubuntu and
> > Fedora), so we are potentially exposing a lot of users to problems.
> >
> > Crashing the bootup on an unknown MSR is bad. Many MSR reads and writes are
> > non-critical and returning the 'safe' result is much better than crashing or
> > hanging the bootup.
> >
> 
> Should we do that for CONFIG_PARAVIRT=n, too?

Absolutely. PARAVIRT=n should not behave differently from PARAVIRT=y on bare 
metal.

> It would be straightforward to rig this up (temporarily?) on top of these 
> patches.  To keep bloat down, we might want to implement it in 
> do_general_protection rather than sticking it in native_read_msr.

Fair enough.

> wrmsr is a different beast, since we can fail due to writing the wrong value 
> to 
> an otherwise valid MSR.  Given that MSR screwups can very easily be security 
> holes, I'm not sure that warning and blindly continuing on an unchecked 
> failed 
> wrmsr is a good idea.

So the fact is that right now we are silently ignoring failures there, and have 
been doing that for some time. The right first step is to live with that and 
start 
generating low-key, once-per-bootup warnings at most, and see how frequent (and 
how serious) they are.

We could add a (default disabled) CONFIG_PANIC_ON_UNKNOWN_MSR=y option if 
that's 
really a serious concern.

> In any event, I think it's nuts that CONFIG_PARAVIRT changes this
> behavior.  We should pick something sane and stick with it.

Absolutely - and as it happens, the 'does not crash the kernel' PARAVIRT=y 
accidental behavior is actually quite close to what we wanted for a long time, 
so 
let's make it official - and add a warning to make sure we are aware of 
problems. 

But don't turn 'potential problems' into showstopper bugs such as a hard to 
debug 
early boot crash, which to most Linux users means a black screen on bootup!

> > ( We should double check that rdmsr()/wrmsr() results are never left
> >   uninitialized, but are set to zero or so, for cases where the return code 
> > is not
> >   checked. )
> 
> It sure looks like native_read_msr_safe doesn't clear the output if the rdmsr 
> fails.

Yeah, that should be fixed, to make such soft failures deterministic.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[tip:x86/asm] x86/asm/tsc: Add rdtscll() merge helper

2015-08-21 Thread tip-bot for Ingo Molnar
Commit-ID:  99770737ca7e3ebc14e66460a69b7032de9421e1
Gitweb: http://git.kernel.org/tip/99770737ca7e3ebc14e66460a69b7032de9421e1
Author: Ingo Molnar mi...@kernel.org
AuthorDate: Fri, 21 Aug 2015 08:33:53 +0200
Committer:  Ingo Molnar mi...@kernel.org
CommitDate: Fri, 21 Aug 2015 08:35:42 +0200

x86/asm/tsc: Add rdtscll() merge helper

Some in-flight code makes use of the old rdtscll() (now removed), provide a 
wrapper
for a kernel cycle to smooth the transition to rdtsc().

( We use the safest variant, rdtsc_ordered(), which has barriers - this adds 
another
  incentive to remove the wrapper in the future. )

Cc: Andy Lutomirski l...@kernel.org
Cc: Borislav Petkov b...@suse.de
Cc: Andy Lutomirski l...@amacapital.net
Cc: Borislav Petkov b...@alien8.de
Cc: Brian Gerst brge...@gmail.com
Cc: Denys Vlasenko dvlas...@redhat.com
Cc: H. Peter Anvin h...@zytor.com
Cc: Huang Rui ray.hu...@amd.com
Cc: John Stultz john.stu...@linaro.org
Cc: Len Brown l...@kernel.org
Cc: Linus Torvalds torva...@linux-foundation.org
Cc: Peter Zijlstra pet...@infradead.org
Cc: Ralf Baechle r...@linux-mips.org
Cc: Thomas Gleixner t...@linutronix.de
Cc: kvm ML kvm@vger.kernel.org
Link: 
http://lkml.kernel.org/r/dddbf98a2af53312e9aa73a5a2b1622fe5d6f52b.1434501121.git.l...@kernel.org
Signed-off-by: Ingo Molnar mi...@kernel.org
---
 arch/x86/include/asm/msr.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 131eec2..54e9f08 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -152,6 +152,9 @@ static __always_inline unsigned long long 
rdtsc_ordered(void)
return rdtsc();
 }
 
+/* Deprecated, keep it for a cycle for easier merging: */
+#define rdtscll(now)   do { (now) = rdtsc_ordered(); } while (0)
+
 static inline unsigned long long native_read_pmc(int counter)
 {
DECLARE_ARGS(val, low, high);
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm:Return -ENOMEM directly for the function, kvm_create_lapic

2015-05-21 Thread Ingo Molnar

* Nicholas Krause xerofo...@gmail.com wrote:

 In order to make code paths easier to read in the function,
 kvm_create_lapic we return -ENOMEM when unable to allocate
 memory for a kvm_lapic structure pointer directly. This
 makes the code easier to read and cleaner then jumping
 to a goto label at the end of the function's body for
 returning just the error code, -ENOMEM.
 
 Signed-off-by: Nicholas Krause xerofo...@gmail.com
 ---
  arch/x86/kvm/lapic.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)
 
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index 629af0f..88d0cce 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -1687,7 +1687,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
  
   apic = kzalloc(sizeof(*apic), GFP_KERNEL);
   if (!apic)
 - goto nomem;
 + return -ENOMEM;
  
   vcpu-arch.apic = apic;
  
 @@ -1718,7 +1718,6 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
   return 0;
  nomem_free_apic:
   kfree(apic);
 -nomem:
   return -ENOMEM;
  }

NAK!

You just half destroyed the nice error handling cascade of labels.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 1/2] perf/kvm: Port perf kvm to powerpc

2015-05-07 Thread Ingo Molnar

* Hemant Kumar hem...@linux.vnet.ibm.com wrote:

  # perf kvm stat report -p 60515
 Analyze events for pid(s) 60515, all VCPUs:
 
VM-EXITSamples  Samples% Time%Min Time Max
 Time Avg time
 
 H_DATA_STORAGE   500635.30% 0.13%  1.94us 49.46us 
 12.37us ( +-   0.52% )
 HV_DECREMENTER   445731.43% 0.02%  0.72us 16.14us  
 1.91us ( +-   0.96% )
SYSCALL   269018.97% 0.10%  2.84us528.24us 
 18.29us ( +-   3.75% )
 RETURN_TO_HOST   178912.61%99.76%  1.58us 672791.91us  
 27470.23us ( +-   3.00% )
   EXTERNAL240 1.69% 0.00%0.69us 10.67us  
 1.33us ( +-   5.34% )

Where is the last line misaligned? Copy  paste error or does perf kvm 
produce it in such a way?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-25 Thread Ingo Molnar

* Christian Borntraeger borntrae...@de.ibm.com wrote:

  By all means!
  
  You'll first need to cherry-pick these commits:
 
   927609d622a3 kernel: tighten rules for ACCESS ONCE
   c5b19946eb76 kernel: Fix sparse warning for ACCESS_ONCE
   dd36929720f4 kernel: make READ_ONCE() valid on const arguments
 
 If you go before 3.19, you will also need
 
230fa253df63 kernel: Provide READ_ONCE and ASSIGN_ONCE
43239cbe79fc kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)

The affected spinlock code went over several iterations 
post v3.18, which I think makes the spinlock change too 
risky and complex to backport so far back. So it's not 
necessay to backport these READ_ONCE() changes.

  That's the minimum set you will need for backporting, 
  due to overlapping changes to the ACCESS_ONCE() 
  definition.
  
  and then apply this commit:
  
   d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock
 
 the alternative might be to replace READ_ONCE with 
 ACCESS_ONCE when doing the backport.

Doing changes to patches when doing a backport is a big 
no-no IMHO. Either there is a clean sequence of upstream 
commit IDs to cherry-pick, or it should not be backported 
in most cases.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-25 Thread Ingo Molnar

* Greg KH gre...@linuxfoundation.org wrote:

  It's:
  
d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock
  
  Yes, This is the original patch. Please note I have taken out the
  READ_ONCE changes from the original patch to avoid build warnings
  mentioned below.
  (Those READ_ONCE changes were cosmetic and was not present in the
  previous versions)
  
  
  You'll also need this fix from Linus to avoid (harmless)
  build warnings:
  
dd36929720f4 kernel: make READ_ONCE() valid on const arguments
  
  So this may not be absolutely necessary with the current patch.
 
 I'd prefer to be as close as possible to the upstream 
 patch.  So if applying both of these patches will work, 
 I'd much rather do that. Changing patches when 
 backporting them to stable for no good reason than to 
 clean things up, just confuses everyone involved.
 
 Let's keep our messy history :)

By all means!

You'll first need to cherry-pick these commits:

 927609d622a3 kernel: tighten rules for ACCESS ONCE
 c5b19946eb76 kernel: Fix sparse warning for ACCESS_ONCE
 dd36929720f4 kernel: make READ_ONCE() valid on const arguments

That's the minimum set you will need for backporting, due 
to overlapping changes to the ACCESS_ONCE() definition.

and then apply this commit:

 d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock

I've double checked that these commits will cherry-pick 
fine on top of v3.19, in that order, and that an x86-64 
defconfig+kvmconfig+PARAVIRT_SPINLOCK=y kernel builds fine 
without warnings.

I've not boot tested the changes, so if anything breaks 
it's all your fault - while if it works just fine then
I'll be glad to take credit for that.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Ingo Molnar

* Greg KH gre...@linuxfoundation.org wrote:

 On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote:
  Paravirt spinlock clears slowpath flag after doing unlock.
  As explained by Linus currently it does:
  prev = *lock;
  add_smp(lock-tickets.head, TICKET_LOCK_INC);
  
  /* add_smp() is a full mb() */
  
  if (unlikely(lock-tickets.tail  TICKET_SLOWPATH_FLAG))
  __ticket_unlock_slowpath(lock, prev);
  
  which is *exactly* the kind of things you cannot do with spinlocks,
  because after you've done the add_smp() and released the spinlock
  for the fast-path, you can't access the spinlock any more.  Exactly
  because a fast-path lock might come in, and release the whole data
  structure.
  
  Linus suggested that we should not do any writes to lock after unlock(),
  and we can move slowpath clearing to fastpath lock.
  
  So this patch implements the fix with:
  1. Moving slowpath flag to head (Oleg):
  Unlocked locks don't care about the slowpath flag; therefore we can keep
  it set after the last unlock, and clear it again on the first (try)lock.
  -- this removes the write after unlock. note that keeping slowpath flag 
  would
  result in unnecessary kicks.
  By moving the slowpath flag from the tail to the head ticket we also avoid
  the need to access both the head and tail tickets on unlock.
  
  2. use xadd to avoid read/write after unlock that checks the need for
  unlock_kick (Linus):
  We further avoid the need for a read-after-release by using xadd;
  the prev head value will include the slowpath flag and indicate if we
  need to do PV kicking of suspended spinners -- on modern chips xadd
  isn't (much) more expensive than an add + load.
  
  Result:
   setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
   benchmark overcommit %improve
   kernbench  1x   -0.13
   kernbench  2x0.02
   dbench 1x   -1.77
   dbench 2x   -0.63
  
  [Jeremy: hinted missing TICKET_LOCK_INC for kick]
  [Oleg: Moving slowpath flag to head, ticket_equals idea]
  [PeterZ: Detailed changelog]
  
  Reported-by: Sasha Levin sasha.le...@oracle.com
  Suggested-by: Linus Torvalds torva...@linux-foundation.org
  Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
  Reviewed-by: Oleg Nesterov o...@redhat.com
  Acked-by: David Vrabel david.vra...@citrix.com
  ---
   arch/x86/include/asm/spinlock.h | 94 
  -
   arch/x86/kernel/kvm.c   |  7 ++-
   arch/x86/xen/spinlock.c |  7 ++-
   3 files changed, 58 insertions(+), 50 deletions(-)
  
  Changes for stable:
- Don't replace the ACCESS_ONCE to READ_ONCE which would cause horraneous
  Compiler warnings (Linus, David Vbriel, PeterZ, Ingo)
 
 What is the git commit id of this in Linus's tree?  What 
 stable tree(s) do you want this applied to?

It's:

 d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock

You'll also need this fix from Linus to avoid (harmless) 
build warnings:

 dd36929720f4 kernel: make READ_ONCE() valid on const arguments

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: typo of the comment

2014-11-09 Thread Ingo Molnar

* john_gong john_g...@yeah.net wrote:

 hi Paolo,
 i find a typo of the comment.
 
 
 From 09d5df31f0930e8e3eb10ad60a3debc53d6ce992 Mon Sep 17 00:00:00 2001
 From: john_gong john_g...@yeah.net
 Date: Fri, 7 Nov 2014 07:32:17 +0800
 Subject: [PATCH] modify a typo of the comment
 
 Signed-off-by: john_gong john_g...@yeah.net
 ---
  arch/x86/kvm/x86.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index ef432f8..42bb2c8 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -214,7 +214,7 @@ void kvm_define_shared_msr(unsigned slot, u32 msr)
  if (slot = shared_msrs_global.nr)
  shared_msrs_global.nr = slot + 1;
  shared_msrs_global.msrs[slot] = msr;
 -/* we need ensured the shared_msr_global have been updated */
 +/* we need ensured the shared_msrs_global have been updated */
  smp_wmb();

That's not the only typo in that sentence though.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] x86: structs for cpuid info in x86

2014-09-17 Thread Ingo Molnar

* Nadav Amit nadav.a...@gmail.com wrote:

 
 
 On 9/16/14 4:22 PM, Ingo Molnar wrote:
  
  * Nadav Amit na...@cs.technion.ac.il wrote:
  
  The code that deals with x86 cpuid fields is hard to follow since it 
  performs
  many bit operations and does not refer to cpuid field explicitly.  To
  eliminate the need of openning a spec whenever dealing with cpuid fields, 
  this
  patch-set introduces structs that reflect the various cpuid functions.
 
  Thanks for reviewing the patch-set.
 
  Nadav Amit (3):
x86: Adding structs to reflect cpuid fields
x86: Use new cpuid structs in cpuid functions
KVM: x86: Using cpuid structs in KVM
 
   arch/x86/include/asm/cpuid_def.h | 163 
  +++
   arch/x86/kernel/cpu/common.c |  56 --
   arch/x86/kvm/cpuid.c |  36 +
   3 files changed, 219 insertions(+), 36 deletions(-)
   create mode 100644 arch/x86/include/asm/cpuid_def.h
  
  I personally like bitfields in theory (they provide type clarity 
  and abstract robustness, compared to open-coded bitmask numeric 
  literals that are often used in cpuid using code, obfuscating 
  cpuid usage), with the big caveat that for many years I didn't 
  like bitfields in practice: older versions of GCC did a really 
  poor job of optimizing them.
  
  So such a series would only be acceptable if it's demonstrated 
  that both 'latest' and 'reasonably old' GCC versions do a good 
  job in that department, compared to the old open-coded bitmask 
  ops ...
  
  Comparing the 'size vmlinux' output of before/after kernels would 
  probably be a good start in seeing the impact of such a change.
  
  If those results are positive then this technique could be 
  propagated to all cpuid using code in arch/x86/, of which
  there's plenty.
 
 Thanks for the quick response. I was not aware GCC behaves this 
 way. I made some small experiments with GCC-4.8 and GCC-4.4 and 
 in brief my conclusions are:

 1. The assembled code of bitmask and bitfields is indeed different.
 2. GCC-4.8 and GCC-4.4 behave pretty much the same, yet GCC-4.8 appears
 to make better instructions reordering.
 3. Loading/storing a single bitfield seems to be pretty much optimized
 (marginal advantage from code size point-of-view for bitmask, same
 number of instructions).
 4. Loading/storing multiple bitfields seems to be somewhat
 under-optimized - multiple accesses to the original value result in ~30%
 more instructions and code-size.

That's better than what I remembered.

 So you are correct - bitfields are less optimized. Nonetheless, 
 since cpuid data is mostly used during startup, and otherwise a 
 single bitfield is usually accessed in each function - I wonder 
 whether it worth keeping the optimized but obfuscate code. 
 Obviously, I can guess your answer to this question...

So with the condition that you are actively watching out for 
performance critical code paths, I think the type clarity (i.e. 
bitfields) is a win.

If hpa, tglx or Linus objects I'll yield to that objection 
though.

Opinions, objections?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] x86: structs for cpuid info in x86

2014-09-16 Thread Ingo Molnar

* Nadav Amit na...@cs.technion.ac.il wrote:

 The code that deals with x86 cpuid fields is hard to follow since it performs
 many bit operations and does not refer to cpuid field explicitly.  To
 eliminate the need of openning a spec whenever dealing with cpuid fields, this
 patch-set introduces structs that reflect the various cpuid functions.
 
 Thanks for reviewing the patch-set.
 
 Nadav Amit (3):
   x86: Adding structs to reflect cpuid fields
   x86: Use new cpuid structs in cpuid functions
   KVM: x86: Using cpuid structs in KVM
 
  arch/x86/include/asm/cpuid_def.h | 163 
 +++
  arch/x86/kernel/cpu/common.c |  56 --
  arch/x86/kvm/cpuid.c |  36 +
  3 files changed, 219 insertions(+), 36 deletions(-)
  create mode 100644 arch/x86/include/asm/cpuid_def.h

I personally like bitfields in theory (they provide type clarity 
and abstract robustness, compared to open-coded bitmask numeric 
literals that are often used in cpuid using code, obfuscating 
cpuid usage), with the big caveat that for many years I didn't 
like bitfields in practice: older versions of GCC did a really 
poor job of optimizing them.

So such a series would only be acceptable if it's demonstrated 
that both 'latest' and 'reasonably old' GCC versions do a good 
job in that department, compared to the old open-coded bitmask 
ops ...

Comparing the 'size vmlinux' output of before/after kernels would 
probably be a good start in seeing the impact of such a change.

If those results are positive then this technique could be 
propagated to all cpuid using code in arch/x86/, of which
there's plenty.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-18 Thread Ingo Molnar
* Don Zickus dzic...@redhat.com wrote:

 From: chai wen chaiw.f...@cn.fujitsu.com
 
 For now, soft lockup detector warns once for each case of process softlockup.
 But the thread 'watchdog/n' may not always get the cpu at the time slot 
 between
 the task switch of two processes hogging that cpu to reset soft_watchdog_warn.
 
 An example would be two processes hogging the cpu.  Process A causes the
 softlockup warning and is killed manually by a user.  Process B immediately
 becomes the new process hogging the cpu preventing the softlockup code from
 resetting the soft_watchdog_warn variable.
 
 This case is a false negative of warn only once for a process, as there may
 be a different process that is going to hog the cpu.  Resolve this by
 saving/checking the pid of the hogging process and use that to reset
 soft_watchdog_warn too.
 
 Signed-off-by: chai wen chaiw.f...@cn.fujitsu.com
 [modified the comment and changelog to be more specific]
 Signed-off-by: Don Zickus dzic...@redhat.com
 ---
  kernel/watchdog.c |   20 ++--
  1 files changed, 18 insertions(+), 2 deletions(-)
 
 diff --git a/kernel/watchdog.c b/kernel/watchdog.c
 index 4c2e11c..6d0a891 100644
 --- a/kernel/watchdog.c
 +++ b/kernel/watchdog.c
 @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
  static DEFINE_PER_CPU(bool, soft_watchdog_warn);
  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
  static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
 +static DEFINE_PER_CPU(pid_t, softlockup_warn_pid_saved);
  #ifdef CONFIG_HARDLOCKUP_DETECTOR
  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 @@ -317,6 +318,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
 hrtimer *hrtimer)
*/
   duration = is_softlockup(touch_ts);
   if (unlikely(duration)) {
 + pid_t pid = task_pid_nr(current);
 +
   /*
* If a virtual machine is stopped by the host it can look to
* the watchdog like a soft lockup, check to see if the host
 @@ -326,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
 hrtimer *hrtimer)
   return HRTIMER_RESTART;
  
   /* only warn once */
 - if (__this_cpu_read(soft_watchdog_warn) == true)
 + if (__this_cpu_read(soft_watchdog_warn) == true) {
 +
 + /*
 +  * Handle the case where multiple processes are
 +  * causing softlockups but the duration is small
 +  * enough, the softlockup detector can not reset
 +  * itself in time.  Use pids to detect this.
 +  */
 + if (__this_cpu_read(softlockup_warn_pid_saved) != pid) {

So I agree with the motivation of this improvement, but is this 
implementation namespace-safe?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] watchdog: fix print-once on enable

2014-08-18 Thread Ingo Molnar

* Don Zickus dzic...@redhat.com wrote:

 --- a/kernel/watchdog.c
 +++ b/kernel/watchdog.c
 @@ -522,6 +522,9 @@ static void watchdog_nmi_disable(unsigned int cpu)
   /* should be in cleanup, but blocks oprofile */
   perf_event_release_kernel(event);
   }
 + if (cpu == 0)
 + /* watchdog_nmi_enable() expects this to be zero initially. */
 + cpu0_err = 0;
   return;
  }

While at it I also removed the stray 'return;'.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] watchdog: control hard lockup detection default

2014-08-18 Thread Ingo Molnar

* Don Zickus dzic...@redhat.com wrote:

 The running kernel still has the ability to enable/disable at any
 time with /proc/sys/kernel/nmi_watchdog us usual. However even
 when the default has been overridden /proc/sys/kernel/nmi_watchdog
 will initially show '1'. To truly turn it on one must disable/enable
 it, i.e.
   echo 0  /proc/sys/kernel/nmi_watchdog
   echo 1  /proc/sys/kernel/nmi_watchdog

This looks like a bug, why is this so?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] watchdog: fix print-once on enable

2014-08-18 Thread Ingo Molnar

* Don Zickus dzic...@redhat.com wrote:

 From: Ulrich Obergfell uober...@redhat.com
 
 This patch avoids printing the message 'enabled on all CPUs, ...'
 multiple times. For example, the issue can occur in the following
 scenario:
 
 1) watchdog_nmi_enable() fails to enable PMU counters and sets
cpu0_err.
 
 2) 'echo [0|1]  /proc/sys/kernel/nmi_watchdog' is executed to
disable and re-enable the watchdog mechanism 'on the fly'.
 
 3) If watchdog_nmi_enable() succeeds to enable PMU counters, each
CPU will print the message because step1 left behind a non-zero
cpu0_err.
 
if (!IS_ERR(event)) {
if (cpu == 0 || cpu0_err)
pr_info(enabled on all CPUs, ...)
 
 The patch avoids this by clearing cpu0_err in watchdog_nmi_disable().
 
 Signed-off-by: Ulrich Obergfell uober...@redhat.com
 Signed-off-by: Andrew Jones drjo...@redhat.com
 Signed-off-by: Don Zickus dzic...@redhat.com
 ---
  kernel/watchdog.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)
 
 diff --git a/kernel/watchdog.c b/kernel/watchdog.c
 index 6d0a891..0838685 100644
 --- a/kernel/watchdog.c
 +++ b/kernel/watchdog.c
 @@ -522,6 +522,9 @@ static void watchdog_nmi_disable(unsigned int cpu)
   /* should be in cleanup, but blocks oprofile */
   perf_event_release_kernel(event);
   }
 + if (cpu == 0)
 + /* watchdog_nmi_enable() expects this to be zero initially. */
 + cpu0_err = 0;

Looks good except a small detail: two-line blocks need curly 
braces too, even if it's just a single C statement. I've fixed 
this up in the commit.

thanks,

Ingo

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] watchdog: control hard lockup detection default

2014-08-18 Thread Ingo Molnar

* Don Zickus dzic...@redhat.com wrote:

 From: Ulrich Obergfell uober...@redhat.com
 
 In some cases we don't want hard lockup detection enabled by default.
 An example is when running as a guest. Introduce
 
   watchdog_enable_hardlockup_detector(bool)

So, the name watchdog_enable_hardlockup_detector_enable(false) 
is both too long and also really confusing (because first it 
suggests enablement, then disables it), so I renamed it to 
hardlockup_detector_set(), which allows two natural variants:

hardlockup_detector_set(false);
...
hardlockup_detector_set(true);

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-18 Thread Ingo Molnar

* Don Zickus dzic...@redhat.com wrote:

 On Mon, Aug 18, 2014 at 11:03:19AM +0200, Ingo Molnar wrote:
  * Don Zickus dzic...@redhat.com wrote:
  
   From: chai wen chaiw.f...@cn.fujitsu.com
   
   For now, soft lockup detector warns once for each case of process 
   softlockup.
   But the thread 'watchdog/n' may not always get the cpu at the time slot 
   between
   the task switch of two processes hogging that cpu to reset 
   soft_watchdog_warn.
   
   An example would be two processes hogging the cpu.  Process A causes the
   softlockup warning and is killed manually by a user.  Process B 
   immediately
   becomes the new process hogging the cpu preventing the softlockup code 
   from
   resetting the soft_watchdog_warn variable.
   
   This case is a false negative of warn only once for a process, as there 
   may
   be a different process that is going to hog the cpu.  Resolve this by
   saving/checking the pid of the hogging process and use that to reset
   soft_watchdog_warn too.
   
   Signed-off-by: chai wen chaiw.f...@cn.fujitsu.com
   [modified the comment and changelog to be more specific]
   Signed-off-by: Don Zickus dzic...@redhat.com
   ---
kernel/watchdog.c |   20 ++--
1 files changed, 18 insertions(+), 2 deletions(-)
   
   diff --git a/kernel/watchdog.c b/kernel/watchdog.c
   index 4c2e11c..6d0a891 100644
   --- a/kernel/watchdog.c
   +++ b/kernel/watchdog.c
   @@ -42,6 +42,7 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
static DEFINE_PER_CPU(bool, soft_watchdog_warn);
static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
static DEFINE_PER_CPU(unsigned long, soft_lockup_hrtimer_cnt);
   +static DEFINE_PER_CPU(pid_t, softlockup_warn_pid_saved);
#ifdef CONFIG_HARDLOCKUP_DETECTOR
static DEFINE_PER_CPU(bool, hard_watchdog_warn);
static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
   @@ -317,6 +318,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
   hrtimer *hrtimer)
  */
 duration = is_softlockup(touch_ts);
 if (unlikely(duration)) {
   + pid_t pid = task_pid_nr(current);
   +
 /*
  * If a virtual machine is stopped by the host it can look to
  * the watchdog like a soft lockup, check to see if the host
   @@ -326,8 +329,20 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
   hrtimer *hrtimer)
 return HRTIMER_RESTART;

 /* only warn once */
   - if (__this_cpu_read(soft_watchdog_warn) == true)
   + if (__this_cpu_read(soft_watchdog_warn) == true) {
   +
   + /*
   +  * Handle the case where multiple processes are
   +  * causing softlockups but the duration is small
   +  * enough, the softlockup detector can not reset
   +  * itself in time.  Use pids to detect this.
   +  */
   + if (__this_cpu_read(softlockup_warn_pid_saved) != pid) {
  
  So I agree with the motivation of this improvement, but is this 
  implementation namespace-safe?
 
 What namespace are you worried about colliding with?  I thought
 softlockup_ would provide the safety??  Maybe I am missing something
 obvious. :-(

I meant PID namespaces - a PID in itself isn't guaranteed to be 
unique across the system.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] watchdog: control hard lockup detection default

2014-08-18 Thread Ingo Molnar

* Don Zickus dzic...@redhat.com wrote:

 On Mon, Aug 18, 2014 at 11:16:44AM +0200, Ingo Molnar wrote:
  
  * Don Zickus dzic...@redhat.com wrote:
  
   The running kernel still has the ability to enable/disable at any
   time with /proc/sys/kernel/nmi_watchdog us usual. However even
   when the default has been overridden /proc/sys/kernel/nmi_watchdog
   will initially show '1'. To truly turn it on one must disable/enable
   it, i.e.
 echo 0  /proc/sys/kernel/nmi_watchdog
 echo 1  /proc/sys/kernel/nmi_watchdog
  
  This looks like a bug, why is this so?
 
 It is, but it always has been there in the case of the PMU 
 not being able to provide a resource for the hardlockup.  
 This change just exposes it more.

There seems to be two issues:

1)

When it's impossible to enable the hardlockup detector, it 
should default to -1 or so, and attempts to set it should 
return a -EINVAL or so.

Bootup messages should also indicate when it's not possible to 
enable it but a user requests it.

2)

The softlockup and hardlockup detection control variables 
should be in separate flags, inside and outside the kernel - 
they (should) not relate to each other.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] watchdog: control hard lockup detection default

2014-08-18 Thread Ingo Molnar
* Don Zickus dzic...@redhat.com wrote:

  2)
  
  The softlockup and hardlockup detection control variables 
  should be in separate flags, inside and outside the kernel 
  - they (should) not relate to each other.
 
 They did because years ago I thought we wanted to keep them 
 as one entity instead of two.  I would have to re-work the 
 code to do this (and add more knobs).
 
 I presume you would want those changes done before taking 
 this patchset?

Yeah, fixing/cleaning up things would be nice before spreading 
the pain via new features/control mechanisms.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu

2014-08-18 Thread Ingo Molnar

* Don Zickus dzic...@redhat.com wrote:

So I agree with the motivation of this improvement, but 
is this implementation namespace-safe?
   
   What namespace are you worried about colliding with?  I 
   thought softlockup_ would provide the safety??  Maybe I 
   am missing something obvious. :-(
  
  I meant PID namespaces - a PID in itself isn't guaranteed 
  to be unique across the system.
 
 Ah, I don't think we thought about that.  Is there a better 
 way to do this?  Is there a domain id or something that can 
 be OR'd with the pid?

What is always unique is the task pointer itself. We use pids 
when we interface with user-space - but we don't really do that 
here, right?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-19 Thread Ingo Molnar

* Waiman Long waiman.l...@hp.com wrote:

 On 04/18/2014 03:46 AM, Ingo Molnar wrote:
 * Waiman Longwaiman.l...@hp.com  wrote:
 
 On 04/17/2014 11:58 AM, Peter Zijlstra wrote:
 On Thu, Apr 17, 2014 at 11:03:57AM -0400, Waiman Long wrote:
 +static __always_inline void
 +clear_pending_set_locked(struct qspinlock *lock, u32 val)
 +{
 + struct __qspinlock *l = (void *)lock;
 +
 + ACCESS_ONCE(l-locked_pending) = 1;
 +}
 @@ -157,8 +251,13 @@ static inline int trylock_pending(struct qspinlock 
 *lock, u32 *pval)
* we're pending, wait for the owner to go away.
*
* *,1,1 -   *,1,0
 +  *
 +  * this wait loop must be a load-acquire such that we match the
 +  * store-release that clears the locked bit and create lock
 +  * sequentiality; this because not all try_clear_pending_set_locked()
 +  * implementations imply full barriers.
 You renamed the function referred in the above comment.
 
 Sorry, will fix the comments.
 I suggest not renaming the function instead.
 try_clear_pending_set_locked() tells the intent in a clearer fashion.
 
 Thanks,
 
  Ingo
 
 I usually use the word try if there is a possibility of failure. 
 However, the function will always succeed, albeit by waiting a bit 
 in some cases. That is why I remove try from the name.

Fair enough!

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 03/19] qspinlock: Add pending bit

2014-04-18 Thread Ingo Molnar

* Waiman Long waiman.l...@hp.com wrote:

 Because the qspinlock needs to touch a second cacheline; add a pending
 bit and allow a single in-word spinner before we punt to the second
 cacheline.
 
 Signed-off-by: Peter Zijlstra pet...@infradead.org
 Signed-off-by: Waiman Long waiman.l...@hp.com

This patch should have a From: Peter in it as well, right?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS

2014-04-18 Thread Ingo Molnar

* Waiman Long waiman.l...@hp.com wrote:

 On 04/17/2014 11:58 AM, Peter Zijlstra wrote:
 On Thu, Apr 17, 2014 at 11:03:57AM -0400, Waiman Long wrote:
 +static __always_inline void
 +clear_pending_set_locked(struct qspinlock *lock, u32 val)
 +{
 +   struct __qspinlock *l = (void *)lock;
 +
 +   ACCESS_ONCE(l-locked_pending) = 1;
 +}
 @@ -157,8 +251,13 @@ static inline int trylock_pending(struct qspinlock 
 *lock, u32 *pval)
  * we're pending, wait for the owner to go away.
  *
  * *,1,1 -  *,1,0
 +*
 +* this wait loop must be a load-acquire such that we match the
 +* store-release that clears the locked bit and create lock
 +* sequentiality; this because not all try_clear_pending_set_locked()
 +* implementations imply full barriers.
 You renamed the function referred in the above comment.
 
 
 Sorry, will fix the comments.

I suggest not renaming the function instead. 
try_clear_pending_set_locked() tells the intent in a clearer fashion.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation

2014-04-04 Thread Ingo Molnar

* Waiman Long waiman.l...@hp.com wrote:

 On 04/04/2014 09:00 AM, Peter Zijlstra wrote:
 
  So I'm just not ever going to pick up this patch; I spend a week 
  trying to reverse engineer this; I posted a 7 patch series 
  creating the equivalent, but in a gradual and readable fashion:
 
http://lkml.kernel.org/r/20140310154236.038181...@infradead.org
 
  You keep on ignoring that; I'll keep on ignoring your patches.
 
  I might at some point rewrite some of your pv stuff on top to get 
  this moving again, but I'm not really motivated to work with you 
  atm.
 
 Peter, I am sorry that I have focused recently on making the 
 qspinlock patch works with virtualization and it is easier for me to 
 based off on my patch initially. Now the PV part is almost done, I 
 will apply them on top of your patch. Hopefully, I will get a new 
 patch out sometime next week.

Note that it's not a patch that PeterZ posted, but a series of 7 
finegrained patches, each properly documented and commented. Please 
preserve that work, build on top of it, and don't just ignore it!

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 01/10] qspinlock: A generic 4-byte queue spinlock implementation

2014-04-04 Thread Ingo Molnar

* Waiman Long waiman.l...@hp.com wrote:

 On 04/04/2014 12:57 PM, Konrad Rzeszutek Wilk wrote:
 On Fri, Apr 04, 2014 at 03:00:12PM +0200, Peter Zijlstra wrote:
 So I'm just not ever going to pick up this patch; I spend a week trying
 to reverse engineer this; I posted a 7 patch series creating the
 equivalent, but in a gradual and readable fashion:
 
http://lkml.kernel.org/r/20140310154236.038181...@infradead.org
 
 You keep on ignoring that; I'll keep on ignoring your patches.
 
 I might at some point rewrite some of your pv stuff on top to get this
 moving again, but I'm not really motivated to work with you atm.
 Uh? Did you CC also xen-de...@lists.xenproject.org on your patches Peter?
 I hadn't had a chance to see or comment on them :-(
 
 
 Peter's patch is a rewrite of my patches 1-4, there is no PV or 
 unfair lock support in there.

It is a fine grained split-up, which does one thing at a time, so it 
all becomes reviewable and mergable (and the claimed effects become 
testable!). Please use that as a base.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] tree-wide: clean up no longer required #include linux/init.h

2014-02-04 Thread Ingo Molnar

* Paul Gortmaker paul.gortma...@windriver.com wrote:

 On Feb 4, 2014 3:52 PM, Paul Gortmaker paul.gortma...@windriver.com
 wrote:
 
  We've had this in linux-next for 2+ weeks (thanks Stephen!) as a
  linux-stable like queue of patches, and as can be seen here:
 
https://git.kernel.org/pub/scm/linux/kernel/git/paulg/init.git
 
 Argh, above link is meant for cloning, not viewing.
 
 This should be better...
 
 https://git.kernel.org/cgit/linux/kernel/git/paulg/init.git/

So, if you meant Linus to pull it, you probably want to cite a real 
Git URI along the lines of:

   git://git.kernel.org/pub/scm/linux/kernel/git/paulg/init.git

( Otherwise your pull request might be ignored or worse, you might get
  an honest reply, due to the https transport being considered evil
  that no free man outside of corporate firewalls should ever consider
  and all that. )

Nice cleanups btw.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] tree-wide: clean up no longer required #include linux/init.h

2014-02-04 Thread Ingo Molnar

* Stephen Rothwell s...@canb.auug.org.au wrote:

 Hi Ingo,
 
 On Wed, 5 Feb 2014 07:06:33 +0100 Ingo Molnar mi...@kernel.org wrote:
  
  So, if you meant Linus to pull it, you probably want to cite a real 
  Git URI along the lines of:
  
 git://git.kernel.org/pub/scm/linux/kernel/git/paulg/init.git
 
 Paul provided the proper git url further down in the mail along with the
 usual pull request message (I guess he should have put that bit at the
 top).

Yeah, indeed, and it even comes with a signed tag, which is an extra 
nice touch:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux.git 
tags/init-cleanup

(I guess the https was mentioned first to lower expectations.)

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND V13 14/14] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-13 Thread Ingo Molnar

* H. Peter Anvin h...@zytor.com wrote:

 Raghavendra...
 
 Even with this latest patch this branch is broken:
 
 :(.discard+0x6108): multiple definition of `__pcpu_unique_lock_waiting'
 arch/x86/xen/built-in.o:(.discard+0x23): first defined here
   CC  drivers/firmware/google/gsmi.o
 arch/x86/kernel/built-in.o:(.discard+0x6108): multiple definition of
 `__pcpu_unique_lock_waiting'
 arch/x86/xen/built-in.o:(.discard+0x23): first defined here
   CC  sound/core/seq/oss/seq_oss_init.o
 
 This is trivially reproducible by doing a build with make allyesconfig.
 
 Please fix and *verify* it is fixed before resubmitting.
 
 I will be away so Ingo will have to handle the resubmission.

Would be nice to have a delta fix patch against tip:x86/spinlocks, which 
I'll then backmerge into that series via rebasing it.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-05 Thread Ingo Molnar

* Gleb Natapov g...@redhat.com wrote:

 On Fri, Aug 02, 2013 at 11:25:39AM +0200, Ingo Molnar wrote:
   Ingo,
   
   Do you have any concerns reg this series? please let me know if this 
   looks good now to you.
  
  I'm inclined to NAK it for excessive quotation - who knows how many 
  people left the discussion in disgust? Was it done to drive away as 
  many reviewers as possible?
  
  Anyway, see my other reply, the measurement results seem hard to 
  interpret and inconclusive at the moment.

 That result was only for patch 18 of the series, not pvspinlock in 
 general.

Okay - I've re-read the performance numbers and they are impressive, so no 
objections from me.

The x86 impact seems to be a straightforward API change, with most of the 
changes on the virtualization side. So:

Acked-by: Ingo Molnar mi...@kernel.org

I guess you'd want to carry this in the KVM tree or so - maybe in a 
separate branch because it changes Xen as well?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-05 Thread Ingo Molnar

* Gleb Natapov g...@redhat.com wrote:

 On Mon, Aug 05, 2013 at 11:46:03AM +0200, Ingo Molnar wrote:
  Acked-by: Ingo Molnar mi...@kernel.org
  
  I guess you'd want to carry this in the KVM tree or so - maybe in a 
  separate branch because it changes Xen as well?
 
 It changes KVM host and guest side, XEN and common x86 spinlock code. I 
 think it would be best to merge common x86 spinlock bits and guest side 
 KVM/XEN bits through tip tree and host KVM part will go through KVM 
 tree. If this is OK with you, Ingo, and XEN folks Raghavendra can send 
 two separate patch series one for the tip and one for KVM host side.

Sure, that's fine - if the initial series works fine in isolation as well 
(i.e. won't break anything).

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-02 Thread Ingo Molnar

* Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:

 On 08/01/2013 02:34 PM, Raghavendra K T wrote:
 On 08/01/2013 01:15 PM, Gleb Natapov wrote:
 Shall I consider this as an ack for kvm part?
 
 For everything except 18/18. For that I still want to see numbers. But
 18/18 is pretty independent from the reset of the series so it should
 not stop the reset from going in.
 
 Yes. agreed.
 I am going to evaluate patch 18 separately and come with results for
 that. Now we can consider only 1-17 patches.
 
 
 Gleb,
 
 32 core machine with HT off 32 vcpu guests.
 base = 3.11-rc + patch 1 -17 pvspinlock_v11
 patched = base + patch 18
 
 +---+---+---++---+
   dbench  (Throughput in MB/sec higher is better)
 +---+---+---++---+
   base  stdev   patchedstdev   %improvement
 +---+---+---++---+
 1x 14584.3800   146.9074   14705.1000   163.1060 0.82773
 2x  1713.730032.87501717.320045.5979 0.20948
 3x   967.821242.0257 971.885518.8532 0.41994
 4x   685.276425.7150 694.5881 8.3907 1.35882
 +---+---+---++---+

Please list stddev in percentage as well ...

a blind stab gave me these figures:

   base  stdev   patchedstdev   %improvement
 3x   967.82124.3% 971.8855  1.8% 0.4

That makes the improvement an order of magnitude smaller than the noise of 
the measurement ... i.e. totally inconclusive.

Also please cut the excessive decimal points: with 2-4% noise what point 
is there in 5 decimal point results??

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-08-02 Thread Ingo Molnar

* Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:

 On 07/31/2013 11:54 AM, Gleb Natapov wrote:
 On Tue, Jul 30, 2013 at 10:13:12PM +0530, Raghavendra K T wrote:
 On 07/25/2013 03:08 PM, Raghavendra K T wrote:
 On 07/25/2013 02:45 PM, Gleb Natapov wrote:
 On Thu, Jul 25, 2013 at 02:47:37PM +0530, Raghavendra K T wrote:
 On 07/24/2013 06:06 PM, Raghavendra K T wrote:
 On 07/24/2013 05:36 PM, Gleb Natapov wrote:
 On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:
 On 07/24/2013 04:09 PM, Gleb Natapov wrote:
 On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
 On 07/23/2013 08:37 PM, Gleb Natapov wrote:
 On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
 +static void kvm_lock_spinning(struct arch_spinlock *lock,
 __ticket_t want)
 [...]

[ a few hundred lines of unnecessary quotation snipped. ]

 Ingo,
 
 Do you have any concerns reg this series? please let me know if this 
 looks good now to you.

I'm inclined to NAK it for excessive quotation - who knows how many people 
left the discussion in disgust? Was it done to drive away as many 
reviewers as possible?

Anyway, see my other reply, the measurement results seem hard to interpret 
and inconclusive at the moment.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -v2] x86: Add a Kconfig shortcut for kvm guest kernel

2013-04-26 Thread Ingo Molnar

* Borislav Petkov b...@alien8.de wrote:

 From: Borislav Petkov b...@suse.de
 Date: Tue, 16 Apr 2013 18:24:34 +0200
 Subject: [PATCH -v2] x86: Add a Kconfig shortcut for kvm guest kernel
 
 This is pretty useful for the case where people want to boot the
 resulting kernel in qemu/kvm. Instead of going and searching for each
 required option through the Kconfig maze, this single option should
 simply enable everything required/good to have to boot the resulting
 kernel in the guest.

Please mention:

 ' This patch is based on a similar utility patch of the external
   lkvm tree. '

 
 Cc: Fengguang Wu fengguang...@intel.com
 Originally-by: Pekka Enberg penb...@kernel.org
 Originally-by: Sasha Levin levinsasha...@gmail.com
 Signed-off-by: Borislav Petkov b...@suse.de
 ---
 
 
 Here's v2 which should be addressing all review comments so far.
 
 
  arch/x86/Kconfig | 38 ++
  1 file changed, 38 insertions(+)
 
 diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
 index 5651374d179f..76a95ffa959a 100644
 --- a/arch/x86/Kconfig
 +++ b/arch/x86/Kconfig
 @@ -680,6 +680,44 @@ config KVM_GUEST
 underlying device model, the host provides the guest with
 timing infrastructure such as time of day, and system time
  
 +config KVM_GUEST_COMMODITY_OPTIONS

Call this KVM_GUEST_COMMON_OPTIONS?

 + bool Enable commodity options for a standalone KVM guest
 + depends on KVM_GUEST
 + select NET
 + select NETDEVICES
 + select BLOCK
 + select BLK_DEV
 + select NETWORK_FILESYSTEMS
 + select INET
 + select EXPERIMENTAL
 + select TTY
 + select SERIAL_8250
 + select SERIAL_8250_CONSOLE
 + select IP_PNP
 + select IP_PNP_DHCP
 + select BINFMT_ELF
 + select PCI_MSI
 + select HAVE_ARCH_KGDB
 + select DEBUG_KERNEL
 + select KGDB
 + select KGDB_SERIAL_CONSOLE
 + select VIRTUALIZATION
 + select VIRTIO
 + select VIRTIO_RING
 + select VIRTIO_PCI
 + select VIRTIO_BLK
 + select VIRTIO_CONSOLE
 + select VIRTIO_NET
 + select 9P_FS
 + select NET_9P
 + select NET_9P_VIRTIO
 + ---help---
 +   Select guest kernel functionality which facilitates booting the
 +   kernel as a guest in qemu/kvm. This entails basic stuff like

s/qemu/qemu or lkvm

 +   serial support, kgdb, virtio and other so that you can be able to
 +   have commodity functionality like serial output from the guest,
 +   networking, etc.

And seemless host file system integration into guest context. (that is 
what the 9P options are about)

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 2/7] KVM: VMX: Register a new IPI for posted interrupt

2013-04-10 Thread Ingo Molnar

* Gleb Natapov g...@redhat.com wrote:

 On Mon, Apr 08, 2013 at 10:23:17PM +0800, Yang Zhang wrote:
  From: Yang Zhang yang.z.zh...@intel.com
  
  Posted Interrupt feature requires a special IPI to deliver posted interrupt
  to guest. And it should has a high priority so the interrupt will not be
  blocked by others.
  Normally, the posted interrupt will be consumed by vcpu if target vcpu is
  running and transparent to OS. But in some cases, the interrupt will arrive
  when target vcpu is scheduled out. And host will see it. So we need to
  register a dump handler to handle it.
 
 Ingo can I add your ACK to this one? In the past you agreed to the
 approach.

Yeah, it's fine to me - assuming it's tested and does not break anything.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 2/5] KVM: VMX: Register a new IPI for posted interrupt

2013-03-08 Thread Ingo Molnar

* Yang Zhang yang.z.zh...@intel.com wrote:

 diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
 index 6e03b0d..2329a54 100644
 --- a/arch/x86/kernel/irqinit.c
 +++ b/arch/x86/kernel/irqinit.c
 @@ -205,6 +205,10 @@ static void __init apic_intr_init(void)
  
   /* IPI for X86 platform specific use */
   alloc_intr_gate(X86_PLATFORM_IPI_VECTOR, x86_platform_ipi);
 +#ifdef CONFIG_HAVE_KVM
 + /* IPI for KVM to deliver posted interrupt */
 + alloc_intr_gate(POSTED_INTR_VECTOR, kvm_posted_intr_ipi);
 +#endif

Please avoid wasting an IDT entry by reusing x86_platform_ipi.

A KVM guest is in essence one type of 'x86 platform', and this callback is used 
by 
hardware platforms, so collision is not an issue AFAICS.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 2/5] KVM: VMX: Register a new IPI for posted interrupt

2013-03-08 Thread Ingo Molnar

* Gleb Natapov g...@redhat.com wrote:

 On Fri, Mar 08, 2013 at 02:26:25PM +0100, Ingo Molnar wrote:
  
  * Yang Zhang yang.z.zh...@intel.com wrote:
  
   diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
   index 6e03b0d..2329a54 100644
   --- a/arch/x86/kernel/irqinit.c
   +++ b/arch/x86/kernel/irqinit.c
   @@ -205,6 +205,10 @@ static void __init apic_intr_init(void)

 /* IPI for X86 platform specific use */
 alloc_intr_gate(X86_PLATFORM_IPI_VECTOR, x86_platform_ipi);
   +#ifdef CONFIG_HAVE_KVM
   + /* IPI for KVM to deliver posted interrupt */
   + alloc_intr_gate(POSTED_INTR_VECTOR, kvm_posted_intr_ipi);
   +#endif
  
  Please avoid wasting an IDT entry by reusing x86_platform_ipi.
  
  A KVM guest is in essence one type of 'x86 platform', and this callback is 
  used 
  by hardware platforms, so collision is not an issue AFAICS.
 
 This is IPI send by a host though.

But received on the guest side, right?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 2/5] KVM: VMX: Register a new IPI for posted interrupt

2013-03-08 Thread Ingo Molnar

* Gleb Natapov g...@redhat.com wrote:

 On Fri, Mar 08, 2013 at 03:05:45PM +0100, Ingo Molnar wrote:
  
  * Gleb Natapov g...@redhat.com wrote:
  
   On Fri, Mar 08, 2013 at 02:26:25PM +0100, Ingo Molnar wrote:

* Yang Zhang yang.z.zh...@intel.com wrote:

 diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
 index 6e03b0d..2329a54 100644
 --- a/arch/x86/kernel/irqinit.c
 +++ b/arch/x86/kernel/irqinit.c
 @@ -205,6 +205,10 @@ static void __init apic_intr_init(void)
  
   /* IPI for X86 platform specific use */
   alloc_intr_gate(X86_PLATFORM_IPI_VECTOR, x86_platform_ipi);
 +#ifdef CONFIG_HAVE_KVM
 + /* IPI for KVM to deliver posted interrupt */
 + alloc_intr_gate(POSTED_INTR_VECTOR, kvm_posted_intr_ipi);
 +#endif

Please avoid wasting an IDT entry by reusing x86_platform_ipi.

A KVM guest is in essence one type of 'x86 platform', and this callback 
is used 
by hardware platforms, so collision is not an issue AFAICS.
   
   This is IPI send by a host though.
  
  But received on the guest side, right?
 
 Not directly. If CPU that receives it happens to run in a guest mode it makes 
 VMX 
 to re-evaluate pending interrupt and inject one if possible without vmexit.  
 If 
 CPU is not in a guest mode the handler for the IPI is called in a host mode 
 and 
 does nothing. Guest code is unaware of the existence of that IPI.

Ok, I guess a separate IPI is fine (and better) in this case then.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 RESEND RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2013-01-25 Thread Ingo Molnar

* Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:

 * Ingo Molnar mi...@kernel.org [2013-01-24 11:32:13]:
 
  
  * Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:
  
   From: Peter Zijlstra pet...@infradead.org
   
   In case of undercomitted scenarios, especially in large guests
   yield_to overhead is significantly high. when run queue length of
   source and target is one, take an opportunity to bail out and return
   -ESRCH. This return condition can be further exploited to quickly come
   out of PLE handler.
   
   (History: Raghavendra initially worked on break out of kvm ple handler 
   upon
seeing source runqueue length = 1, but it had to export rq length).
Peter came up with the elegant idea of return -ESRCH in scheduler core.
   
   Signed-off-by: Peter Zijlstra pet...@infradead.org
   Raghavendra, Checking the rq length of target vcpu condition 
   added.(thanks Avi)
   Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
   Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
   Acked-by: Andrew Jones drjo...@redhat.com
   Tested-by: Chegu Vinod chegu_vi...@hp.com
   ---
   
kernel/sched/core.c |   25 +++--
1 file changed, 19 insertions(+), 6 deletions(-)
   
   diff --git a/kernel/sched/core.c b/kernel/sched/core.c
   index 2d8927f..fc219a5 100644
   --- a/kernel/sched/core.c
   +++ b/kernel/sched/core.c
   @@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
 * It's the caller's job to ensure that the target task struct
 * can't go away on us before we can do any checks.
 *
   - * Returns true if we indeed boosted the target task.
   + * Returns:
   + *   true (0) if we indeed boosted the target task.
   + *   false (0) if we failed to boost the target.
   + *   -ESRCH if there's no task to yield to.
 */
bool __sched yield_to(struct task_struct *p, bool preempt)
{
   @@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool 
   preempt)

again:
 p_rq = task_rq(p);
   + /*
   +  * If we're the only runnable task on the rq and target rq also
   +  * has only one task, there's absolutely no point in yielding.
   +  */
   + if (rq-nr_running == 1  p_rq-nr_running == 1) {
   + yielded = -ESRCH;
   + goto out_irq;
   + }
  
  Looks good to me in principle.
  
  Would be nice to get more consistent benchmark numbers. Once 
  those are unambiguously showing that this is a win:
  
Acked-by: Ingo Molnar mi...@kernel.org
 
 
 I ran the test with kernbench and sysbench again on 32 core mx3850
 machine with 32 vcpu guests. Results shows definite improvements.
 
 ebizzy and dbench show similar improvement for 1x overcommit
 (note that stdev for 1x in dbench is lesser improvemet is now seen at
 only 20%)
 
 [ all the experiments are taken out of 8 run averages ].
 
 The patches benefit large guest undercommit scenarios, so I believe
 with large guest performance improvemnt is even significant. [ Chegu
 Vinod results show performance near to no ple cases ]. Unfortunately I
 do not have a machine to test larger guest (32).
 
 Ingo, Please let me know if this is okay to you.
 
 base kernel = 3.8.0-rc4
 
 +---+---+---++---+
 kernbench  (time in sec lower is better)
 +---+---+---++---+
 basestdevpatchedstdev  %improve
 +---+---+---++---+
 1x   46.6028 1.8672   42.4494 1.1390 8.91234
 2x   99.9074 9.1859   90.4050 2.6131 9.51121
 +---+---+---++---+
 +---+---+---++---+
sysbench (time in sec lower is better) 
 +---+---+---++---+
 basestdevpatchedstdev  %improve
 +---+---+---++---+
 1x   18.7402 0.3764   17.7431 0.3589 5.32065
 2x   13.2238 0.1935   13.0096 0.3152 1.61981
 +---+---+---++---+
 
 +---+---+---++---+
 ebizzy  (records/sec higher is better)
 +---+---+---++---+
 basestdevpatchedstdev  %improve
 +---+---+---++---+
 1x  2421.900019.18015883.1000   112.7243   142.91259
 +---+---+---++---+
 
 +---+---+---++---+
 dbench (throughput MB/sec  higher is better)
 +---+---+---++---+
 basestdevpatchedstdev  %improve
 +---+---+---++---+
 1x  11675.9900   857.4154

Re: [PATCH V3 RESEND RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2013-01-25 Thread Ingo Molnar

* Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:

 On 01/25/2013 04:17 PM, Ingo Molnar wrote:
 
 * Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:
 
 * Ingo Molnar mi...@kernel.org [2013-01-24 11:32:13]:
 
 
 * Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:
 
 From: Peter Zijlstra pet...@infradead.org
 
 In case of undercomitted scenarios, especially in large guests
 yield_to overhead is significantly high. when run queue length of
 source and target is one, take an opportunity to bail out and return
 -ESRCH. This return condition can be further exploited to quickly come
 out of PLE handler.
 
 (History: Raghavendra initially worked on break out of kvm ple handler 
 upon
   seeing source runqueue length = 1, but it had to export rq length).
   Peter came up with the elegant idea of return -ESRCH in scheduler core.
 
 Signed-off-by: Peter Zijlstra pet...@infradead.org
 Raghavendra, Checking the rq length of target vcpu condition 
 added.(thanks Avi)
 Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 Acked-by: Andrew Jones drjo...@redhat.com
 Tested-by: Chegu Vinod chegu_vi...@hp.com
 ---
 
   kernel/sched/core.c |   25 +++--
   1 file changed, 19 insertions(+), 6 deletions(-)
 
 diff --git a/kernel/sched/core.c b/kernel/sched/core.c
 index 2d8927f..fc219a5 100644
 --- a/kernel/sched/core.c
 +++ b/kernel/sched/core.c
 @@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
* It's the caller's job to ensure that the target task struct
* can't go away on us before we can do any checks.
*
 - * Returns true if we indeed boosted the target task.
 + * Returns:
 + *   true (0) if we indeed boosted the target task.
 + *   false (0) if we failed to boost the target.
 + *   -ESRCH if there's no task to yield to.
*/
   bool __sched yield_to(struct task_struct *p, bool preempt)
   {
 @@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool 
 preempt)
 
   again:
   p_rq = task_rq(p);
 + /*
 +  * If we're the only runnable task on the rq and target rq also
 +  * has only one task, there's absolutely no point in yielding.
 +  */
 + if (rq-nr_running == 1  p_rq-nr_running == 1) {
 + yielded = -ESRCH;
 + goto out_irq;
 + }
 
 Looks good to me in principle.
 
 Would be nice to get more consistent benchmark numbers. Once
 those are unambiguously showing that this is a win:
 
Acked-by: Ingo Molnar mi...@kernel.org
 
 
 I ran the test with kernbench and sysbench again on 32 core mx3850
 machine with 32 vcpu guests. Results shows definite improvements.
 
 ebizzy and dbench show similar improvement for 1x overcommit
 (note that stdev for 1x in dbench is lesser improvemet is now seen at
 only 20%)
 
 [ all the experiments are taken out of 8 run averages ].
 
 The patches benefit large guest undercommit scenarios, so I believe
 with large guest performance improvemnt is even significant. [ Chegu
 Vinod results show performance near to no ple cases ]. Unfortunately I
 do not have a machine to test larger guest (32).
 
 Ingo, Please let me know if this is okay to you.
 
 base kernel = 3.8.0-rc4
 
 +---+---+---++---+
  kernbench  (time in sec lower is better)
 +---+---+---++---+
  basestdevpatchedstdev  %improve
 +---+---+---++---+
 1x   46.6028 1.8672 42.4494 1.1390 8.91234
 2x   99.9074 9.1859 90.4050 2.6131 9.51121
 +---+---+---++---+
 +---+---+---++---+
 sysbench (time in sec lower is better)
 +---+---+---++---+
  basestdevpatchedstdev  %improve
 +---+---+---++---+
 1x   18.7402 0.3764 17.7431 0.3589 5.32065
 2x   13.2238 0.1935 13.0096 0.3152 1.61981
 +---+---+---++---+
 
 +---+---+---++---+
  ebizzy  (records/sec higher is better)
 +---+---+---++---+
  basestdevpatchedstdev  %improve
 +---+---+---++---+
 1x  2421.900019.1801  5883.1000   112.7243   142.91259
 +---+---+---++---+
 
 +---+---+---++---+
  dbench (throughput MB/sec  higher is better)
 +---+---+---++---+
  basestdevpatchedstdev  %improve
 +---+---+---++---+
 1x  11675.9900

Re: [PATCH V3 RESEND RFC 1/2] sched: Bail out of yield_to when source and target runqueue has one task

2013-01-24 Thread Ingo Molnar

* Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:

 From: Peter Zijlstra pet...@infradead.org
 
 In case of undercomitted scenarios, especially in large guests
 yield_to overhead is significantly high. when run queue length of
 source and target is one, take an opportunity to bail out and return
 -ESRCH. This return condition can be further exploited to quickly come
 out of PLE handler.
 
 (History: Raghavendra initially worked on break out of kvm ple handler upon
  seeing source runqueue length = 1, but it had to export rq length).
  Peter came up with the elegant idea of return -ESRCH in scheduler core.
 
 Signed-off-by: Peter Zijlstra pet...@infradead.org
 Raghavendra, Checking the rq length of target vcpu condition added.(thanks 
 Avi)
 Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 Acked-by: Andrew Jones drjo...@redhat.com
 Tested-by: Chegu Vinod chegu_vi...@hp.com
 ---
 
  kernel/sched/core.c |   25 +++--
  1 file changed, 19 insertions(+), 6 deletions(-)
 
 diff --git a/kernel/sched/core.c b/kernel/sched/core.c
 index 2d8927f..fc219a5 100644
 --- a/kernel/sched/core.c
 +++ b/kernel/sched/core.c
 @@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
   * It's the caller's job to ensure that the target task struct
   * can't go away on us before we can do any checks.
   *
 - * Returns true if we indeed boosted the target task.
 + * Returns:
 + *   true (0) if we indeed boosted the target task.
 + *   false (0) if we failed to boost the target.
 + *   -ESRCH if there's no task to yield to.
   */
  bool __sched yield_to(struct task_struct *p, bool preempt)
  {
 @@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool 
 preempt)
  
  again:
   p_rq = task_rq(p);
 + /*
 +  * If we're the only runnable task on the rq and target rq also
 +  * has only one task, there's absolutely no point in yielding.
 +  */
 + if (rq-nr_running == 1  p_rq-nr_running == 1) {
 + yielded = -ESRCH;
 + goto out_irq;
 + }

Looks good to me in principle.

Would be nice to get more consistent benchmark numbers. Once 
those are unambiguously showing that this is a win:

  Acked-by: Ingo Molnar mi...@kernel.org

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86, add hypervisor name to dump_stack() [v2]

2012-10-26 Thread Ingo Molnar

* Prarit Bhargava pra...@redhat.com wrote:

 Debugging crash, panics, stack trace WARN_ONs, etc., from both virtual and
 bare-metal boots can get difficult very quickly.  While there are ways to
 decipher the output and determine if the output is from a virtual guest,
 the in-kernel hypervisors now have a single registration point
 and set x86_hyper.  We can use this to output additional debug
 information during a panic/oops/stack trace.
 
 Signed-off-by: Prarit Bhargava pra...@redhat.com
 Cc: Avi Kivity a...@redhat.com
 Cc: Gleb Natapov g...@redhat.com
 Cc: Alex Williamson alex.william...@redhat.com
 Cc: Marcelo Tostatti mtosa...@redhat.com
 Cc: Ingo Molnar mi...@redhat.com
 Cc: kvm@vger.kernel.org
 Cc: x...@kernel.org
 
 [v2]: Modifications suggested by Ingo and added changes for similar output
   from process.c
 ---
  arch/x86/kernel/dumpstack.c |   11 ++-
  arch/x86/kernel/process.c   |   12 +++-
  2 files changed, 21 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
 index ae42418b..5dd680f 100644
 --- a/arch/x86/kernel/dumpstack.c
 +++ b/arch/x86/kernel/dumpstack.c
 @@ -16,6 +16,7 @@
  #include linux/nmi.h
  #include linux/sysfs.h
  
 +#include asm/hypervisor.h
  #include asm/stacktrace.h
  
  
 @@ -186,9 +187,17 @@ void dump_stack(void)
  {
   unsigned long bp;
   unsigned long stack;
 + const char *machine_name = x86;
 + const char *kernel_type = native;
 +
 + if (x86_hyper) {
 + machine_name = x86_hyper-name;
 + kernel_type = guest;
 + }
  
   bp = stack_frame(current, NULL);
 - printk(Pid: %d, comm: %.20s %s %s %.*s\n,
 + printk([%s %s kernel] Pid: %d, comm: %.20s %s %s %.*s\n,
 + machine_name, kernel_type,

I'd put the kernel info at the end of the line.

It's all very exciting I know, because we are working on this 
printout right now and all that - but to users and developers 
the PID/comm output plus the backtrace is far more important.

   current-pid, current-comm, print_tainted(),
   init_utsname()-release,
   (int)strcspn(init_utsname()-version,  ),
 diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
 index b644e1c..14bd064 100644
 --- a/arch/x86/kernel/process.c
 +++ b/arch/x86/kernel/process.c
 @@ -28,6 +28,7 @@
  #include asm/fpu-internal.h
  #include asm/debugreg.h
  #include asm/nmi.h
 +#include asm/hypervisor.h
  
  /*
   * per-CPU TSS segments. Threads are completely 'soft' on Linux,
 @@ -124,6 +125,13 @@ void exit_thread(void)
  void show_regs_common(void)
  {
   const char *vendor, *product, *board;
 + const char *machine_name = x86;
 + const char *kernel_type = native;
 +
 + if (x86_hyper) {
 + machine_name = x86_hyper-name;
 + kernel_type = guest;
 + }
  
   vendor = dmi_get_system_info(DMI_SYS_VENDOR);
   if (!vendor)
 @@ -135,7 +143,9 @@ void show_regs_common(void)
   /* Board Name is optional */
   board = dmi_get_system_info(DMI_BOARD_NAME);
  
 - printk(KERN_DEFAULT Pid: %d, comm: %.20s %s %s %.*s %s %s%s%s\n,
 + printk(KERN_DEFAULT
 +[%s %s kernel] Pid: %d, comm: %.20s %s %s %.*s %s %s%s%s\n,
 +machine_name, kernel_type,
  current-pid, current-comm, print_tainted(),
  init_utsname()-release,
  (int)strcspn(init_utsname()-version,  ),

Ha, duplicate code doing almost the same thing!

I suspect you know what my next suggestion would be? :-)

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] lockdep: be nice about compiling from userspace

2012-10-25 Thread Ingo Molnar

* Sasha Levin sasha.le...@oracle.com wrote:

 We can rather easily make lockdep work from userspace, although 3 issues
 remain which I'm not sure about:
 
  - Kernel naming - we can just wrap init_utsname() to return kvmtool related
 utsname, is that what we want though?
 
  - static_obj() - I don't have a better idea than calling mprobe(), which 
 sounds
 wrong as well.
 
  - debug_show_all_locks() - we don't actually call it from userspace yet, but 
 I think
 we might want to, so I'm not sure how to make it pretty using existing kernel 
 code.
 
 Signed-off-by: Sasha Levin sasha.le...@oracle.com
 ---
  kernel/lockdep.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)
 
 diff --git a/kernel/lockdep.c b/kernel/lockdep.c
 index 7981e5b..fdd3670 100644
 --- a/kernel/lockdep.c
 +++ b/kernel/lockdep.c
 @@ -567,10 +567,12 @@ static void lockdep_print_held_locks(struct task_struct 
 *curr)
  
  static void print_kernel_ident(void)
  {
 +#ifdef __KERNEL__
   printk(%s %.*s %s\n, init_utsname()-release,
   (int)strcspn(init_utsname()-version,  ),
   init_utsname()-version,
   print_tainted());
 +#endif

I guess wrapping init_utsname() is not worth it. Although 
kvmtool could provide the host system's utsname - kernel 
identity is useful for debugging info.

You could generate a Git hash version string like tools/perf/ 
does (see PERF_VERSION and tools/perf/util/PERF-VERSION-GEN), 
and put that into the -version field.

-release could be the kvmtool version, and print_tainted() 
could return an empty string.

That way you could provide init_utsname() and could remove this 
#ifdef.

  }
  
  static int very_verbose(struct lock_class *class)
 @@ -586,6 +588,7 @@ static int very_verbose(struct lock_class *class)
   */
  static int static_obj(void *obj)
  {
 +#ifdef __KERNEL__
   unsigned long start = (unsigned long) _stext,
 end   = (unsigned long) _end,
 addr  = (unsigned long) obj;
 @@ -609,6 +612,8 @@ static int static_obj(void *obj)
* module static or percpu var?
*/
   return is_module_address(addr) || is_module_percpu_address(addr);
 +#endif
 + return 1;

Could you put an:

#ifndef static_obj

around it? Then kvmtool could define its own trivial version of 
static_obj():

  #define static_obj(x) 1U

or so.

 @@ -4108,7 +4113,7 @@ void debug_check_no_locks_held(struct task_struct *task)
   if (unlikely(task-lockdep_depth  0))
   print_held_locks_bug(task);
  }
 -
 +#ifdef __KERNEL__
  void debug_show_all_locks(void)
  {
   struct task_struct *g, *p;

I guess a show-all-locks functionality would be useful to 
kvmtool as well?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC 2/3] lockdep: be nice about compiling from userspace

2012-10-25 Thread Ingo Molnar

* Sasha Levin sasha.le...@oracle.com wrote:

 On 10/25/2012 04:05 AM, Ingo Molnar wrote:
  
  * Sasha Levin sasha.le...@oracle.com wrote:
  
  We can rather easily make lockdep work from userspace, although 3 issues
  remain which I'm not sure about:
 
   - Kernel naming - we can just wrap init_utsname() to return kvmtool 
  related
  utsname, is that what we want though?
 
   - static_obj() - I don't have a better idea than calling mprobe(), which 
  sounds
  wrong as well.
 
   - debug_show_all_locks() - we don't actually call it from userspace yet, 
  but I think
  we might want to, so I'm not sure how to make it pretty using existing 
  kernel code.
 
  Signed-off-by: Sasha Levin sasha.le...@oracle.com
  ---
   kernel/lockdep.c | 9 +++--
   1 file changed, 7 insertions(+), 2 deletions(-)
 
  diff --git a/kernel/lockdep.c b/kernel/lockdep.c
  index 7981e5b..fdd3670 100644
  --- a/kernel/lockdep.c
  +++ b/kernel/lockdep.c
  @@ -567,10 +567,12 @@ static void lockdep_print_held_locks(struct 
  task_struct *curr)
   
   static void print_kernel_ident(void)
   {
  +#ifdef __KERNEL__
 printk(%s %.*s %s\n, init_utsname()-release,
 (int)strcspn(init_utsname()-version,  ),
 init_utsname()-version,
 print_tainted());
  +#endif
  
  I guess wrapping init_utsname() is not worth it. Although 
  kvmtool could provide the host system's utsname - kernel 
  identity is useful for debugging info.
  
  You could generate a Git hash version string like tools/perf/ 
  does (see PERF_VERSION and tools/perf/util/PERF-VERSION-GEN), 
  and put that into the -version field.
  
  -release could be the kvmtool version, and print_tainted() 
  could return an empty string.
  
  That way you could provide init_utsname() and could remove this 
  #ifdef.
 
 Yeah, we already generate the version string for
 'lkvm version' anyways, so I guess I'll just add init_utsname().
 
 
   }
   
   static int very_verbose(struct lock_class *class)
  @@ -586,6 +588,7 @@ static int very_verbose(struct lock_class *class)
*/
   static int static_obj(void *obj)
   {
  +#ifdef __KERNEL__
 unsigned long start = (unsigned long) _stext,
   end   = (unsigned long) _end,
   addr  = (unsigned long) obj;
  @@ -609,6 +612,8 @@ static int static_obj(void *obj)
  * module static or percpu var?
  */
 return is_module_address(addr) || is_module_percpu_address(addr);
  +#endif
  +  return 1;
  
  Could you put an:
  
  #ifndef static_obj
  
  around it? Then kvmtool could define its own trivial version of 
  static_obj():
  
#define static_obj(x) 1U
  
  or so.
  
  @@ -4108,7 +4113,7 @@ void debug_check_no_locks_held(struct task_struct 
  *task)
 if (unlikely(task-lockdep_depth  0))
 print_held_locks_bug(task);
   }
  -
  +#ifdef __KERNEL__
   void debug_show_all_locks(void)
   {
 struct task_struct *g, *p;
  
  I guess a show-all-locks functionality would be useful to 
  kvmtool as well?
 
 Regarding the above two,
 
 Yes, we can wrap both static_obj() and debug_show_all_locks() 
 with #ifndefs and let kvmtool provide it's own version of 
 those two.

Only static_obj() - I see no immediate reason why you shouldn't 
be able to utilize debug_show_all_locks(). 'vm debug -a' already 
lists all backtraces on all vcpus - so 'vm debug lockdep -a' 
could list all current locks and indicate which one is held and 
by whom.

 The question is here more of a would lockdep maintainers be 
 ok with adding that considering there's no in-kernel 
 justification for those?

Yeah, such a simple patch would be acceptable to me, being nice 
isn't against the law.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86, add hypervisor name to dump_stack()

2012-10-24 Thread Ingo Molnar

* Prarit Bhargava pra...@redhat.com wrote:

 Debugging crash, panics, stack trace WARN_ONs, etc., from both virtual and
 bare-metal boots can get difficult very quickly.  While there are ways to
 decipher the output and determine if the output is from a virtual guest,
 the in-kernel hypervisors now have a single registration point and set
 x86_hyper.  We can use this to output a single extra line on virtual
 machines that indicates the hypervisor type.
 
 Signed-off-by: Prarit Bhargava pra...@redhat.com
 Cc: Avi Kivity a...@redhat.com
 Cc: Gleb Natapov g...@redhat.com
 Cc: Alex Williamson alex.william...@redhat.com
 Cc: Marcelo Tostatti mtosa...@redhat.com
 Cc: Ingo Molnar mi...@redhat.com
 Cc: kvm@vger.kernel.org
 Cc: x...@kernel.org
 ---
  arch/x86/kernel/dumpstack.c |3 +++
  1 file changed, 3 insertions(+)
 
 diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
 index ae42418b..75a635e 100644
 --- a/arch/x86/kernel/dumpstack.c
 +++ b/arch/x86/kernel/dumpstack.c
 @@ -17,6 +17,7 @@
  #include linux/sysfs.h
  
  #include asm/stacktrace.h
 +#include asm/hypervisor.h
  
  
  int panic_on_unrecovered_nmi;
 @@ -193,6 +194,8 @@ void dump_stack(void)
   init_utsname()-release,
   (int)strcspn(init_utsname()-version,  ),
   init_utsname()-version);
 + if (x86_hyper  x86_hyper-name)
 + printk(Hypervisor: %s\n,  x86_hyper-name);
   show_trace(NULL, NULL, stack, bp);

Looks useful, but please don't waste a full new line on it but 
embedd it in the already existing status line that prints 
details like release and version.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] kvm tools: Export DISPLAY ENV as our default host ip address

2012-09-05 Thread Ingo Molnar

* Pekka Enberg penb...@kernel.org wrote:

 On 08/24/2012 02:29 PM, Asias He wrote:
  It is useful to run a X program in guest and display it on host.
 
  1) Make host's x server listen to localhost:6000
 host_shell$ socat -d -d TCP-LISTEN:6000,fork,bind=localhost \
 UNIX-CONNECT:/tmp/.X11-unix/X0
 
  2) Start the guest and run X program
 host_shell$ lkvm run -k /boot/bzImage
guest_shell$ xlogo
 
 On Tue, Sep 4, 2012 at 4:07 PM, Avi Kivity a...@redhat.com wrote:
  Note, this is insecure, don't do this with untrusted guests.
 
 Asias, can we add a command line argument that enables this? 
 It'd be safer to keep it disabled by default.

It might also be prudent to name the option in a way that 
signals that the user of it understands the security 
implications:

--X11-trusted-guest 1

or so.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/7] kvm tools: enable LTO

2012-08-30 Thread Ingo Molnar

* Pekka Enberg penb...@kernel.org wrote:

 On Thu, Aug 30, 2012 at 10:36 AM, Sasha Levin levinsasha...@gmail.com wrote:
  Build with -flto set, which should enable link-time-optimizations.
 
  I'm not sure if it provides a significant performance increase, but
  it's probably just worth it for catching issues which it may cause.
 
  Signed-off-by: Sasha Levin levinsasha...@gmail.com
 
 Ingo, any objections to this?

No objections if you can live with a 2x-4x increase in build 
time - at worst it might cause funnies with the BIOS linker 
script and such.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] apic: add apic_set_eoi_write for PV use

2012-07-16 Thread Ingo Molnar

* Michael S. Tsirkin m...@redhat.com wrote:

 KVM PV EOI optimization overrides eoi_write apic op with its own
 version. Add an API for this to avoid meddling with core x86 apic driver
 data structures directly.
 
 For KVM use, we don't need any guarantees about when the switch to the
 new op will take place, so it could in theory use this API after SMP init,
 but it currently doesn't, and restricting callers to early init makes it
 clear that it's safe as it won't race with actual APIC driver use.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  arch/x86/include/asm/apic.h |  3 +++
  arch/x86/kernel/apic/apic.c | 17 +
  2 files changed, 20 insertions(+)
 
 diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
 index aa5b2ee..ff8dff6 100644
 --- a/arch/x86/include/asm/apic.h
 +++ b/arch/x86/include/asm/apic.h
 @@ -469,6 +469,8 @@ static inline u32 safe_apic_wait_icr_idle(void)
   return apic-safe_wait_icr_idle();
  }
  
 +extern void __init apic_set_eoi_write(void (*eoi_write)(u32 reg, u32 v));
 +
  #else /* CONFIG_X86_LOCAL_APIC */
  
  static inline u32 apic_read(u32 reg) { return 0; }
 @@ -478,6 +480,7 @@ static inline u64 apic_icr_read(void) { return 0; }
  static inline void apic_icr_write(u32 low, u32 high) { }
  static inline void apic_wait_icr_idle(void) { }
  static inline u32 safe_apic_wait_icr_idle(void) { return 0; }
 +static inline void apic_set_eoi_write(void (*eoi_write)(u32 reg, u32 v)) {}
  
  #endif /* CONFIG_X86_LOCAL_APIC */
  
 diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
 index 39a222e..c7520b6 100644
 --- a/arch/x86/kernel/apic/apic.c
 +++ b/arch/x86/kernel/apic/apic.c
 @@ -2124,6 +2124,23 @@ void default_init_apic_ldr(void)
  }
  
  /*
 + * Override the generic EOI implementation with an optimized version.
 + * Only called during early boot when only one CPU is active and with
 + * interrupts disabled, so we know this does not race with actual APIC driver
 + * use.
 + */
 +void __init apic_set_eoi_write(void (*eoi_write)(u32 reg, u32 v))
 +{
 + struct apic **drv;
 +
 + for (drv = __apicdrivers; drv  __apicdrivers_end; drv++) {
 + /* Should happen once for each apic */
 + WARN_ON((*drv)-eoi_write == eoi_write);
 + (*drv)-eoi_write = eoi_write;
 + }
 +}
 +

ok, it's better this way.

Acked-by: Ingo Molnar mi...@kernel.org

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V2 2/2] kvm PLE handler: Choose better candidate for directed yield

2012-07-10 Thread Ingo Molnar

* Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:

 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -1595,6 +1595,9 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
   continue;
   if (waitqueue_active(vcpu-wq))
   continue;
 + if (!kvm_arch_vcpu_check_and_update_eligible(vcpu)) {
 + continue;
 + }

Saw this fly by for a second time and now it annoyed me enough 
to mention it: those curly braces are superfluous and should be 
dropped.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] apic: fix kvm build on UP without IOAPIC

2012-07-09 Thread Ingo Molnar

* Michael S. Tsirkin m...@redhat.com wrote:

 On Fri, Jul 06, 2012 at 04:12:23PM +0200, Ingo Molnar wrote:
  
  * Marcelo Tosatti mtosa...@redhat.com wrote:
  
   On Fri, Jul 06, 2012 at 01:13:14PM +0200, Ingo Molnar wrote:

* H. Peter Anvin h...@zytor.com wrote:

 On 07/01/2012 08:05 AM, Michael S. Tsirkin wrote:
 On UP i386, when APIC is disabled
 # CONFIG_X86_UP_APIC is not set
 # CONFIG_PCI_IOAPIC is not set
 
 code looking at apicdrivers never has any effect but it
 still gets compiled in. In particular, this causes
 build failures with kvm, but it generally bloats the kernel
 unnecessarily.
 
 Fix by defining both __apicdrivers and __apicdrivers_end
 to be NULL when CONFIG_X86_LOCAL_APIC is unset: I verified
 that as the result any loop scanning __apicdrivers gets optimized 
 out by
 the compiler.
 
 Warning: a .config with apic disabled doesn't seem to boot
 for me (even without this patch). Still verifying why,
 meanwhile this patch is compile-tested only.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 Note: if this patch makes sense, can x86 maintainers
 please ACK applying it through the kvm tree, since that is
 where we see the issue that it addresses?
 Avi, Marcelo, maybe you can carry this in kvm/linux-next as a 
 temporary
 measure so that linux-next builds?
 
 
 I'm not happy about that as a workflow, but since you guys have an
 immediate problem I guess we can do that.

I'm rather unhappy about this workflow - we've got quite a few 
apic bits in the x86 tree this cycle as well and need extra 
external interaction, not.

Which KVM tree commit caused this, could someone please give a 
lkml link or quote it here? It's not referenced in the fix patch 
either.

Thanks,

Ingo
   
   This tree (kvm.git next):
   
   http://git.kernel.org/?p=virt/kvm/kvm.git;a=shortlog;h=refs/heads/next
   
   Introduced by this commit:
   
   http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=ab9cf4996bb989983e73da894b8dd0239aa2c3c2
  
  This bit:
  
   + if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
   + struct apic **drv;
   +
   + for (drv = __apicdrivers; drv  __apicdrivers_end; drv++) {
   + /* Should happen once for each apic */
   + WARN_ON((*drv)-eoi_write == kvm_guest_apic_eoi_write);
   + (*drv)-eoi_write = kvm_guest_apic_eoi_write;
   + }
   + }
   +
  
  is rather disgusting I have to say.
  
  WTH is the KVM code meddling with core x86 apic driver data 
  structures directly? At minimum factor this out and create a 
  proper apic.c function which is EXPORT_SYMBOL_GPL() exported or 
  so...
  
  Thanks,
  
  Ingo
 
 OK, so apic_set_eoi_write()?

Yes, with a changelog comment analyzing the design decisions and 
locking here - what happens if actual APIC driver use races with 
this update on SMP, why is it all safe, etc?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] apic: fix kvm build on UP without IOAPIC

2012-07-06 Thread Ingo Molnar

* H. Peter Anvin h...@zytor.com wrote:

 On 07/01/2012 08:05 AM, Michael S. Tsirkin wrote:
 On UP i386, when APIC is disabled
 # CONFIG_X86_UP_APIC is not set
 # CONFIG_PCI_IOAPIC is not set
 
 code looking at apicdrivers never has any effect but it
 still gets compiled in. In particular, this causes
 build failures with kvm, but it generally bloats the kernel
 unnecessarily.
 
 Fix by defining both __apicdrivers and __apicdrivers_end
 to be NULL when CONFIG_X86_LOCAL_APIC is unset: I verified
 that as the result any loop scanning __apicdrivers gets optimized out by
 the compiler.
 
 Warning: a .config with apic disabled doesn't seem to boot
 for me (even without this patch). Still verifying why,
 meanwhile this patch is compile-tested only.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 Note: if this patch makes sense, can x86 maintainers
 please ACK applying it through the kvm tree, since that is
 where we see the issue that it addresses?
 Avi, Marcelo, maybe you can carry this in kvm/linux-next as a temporary
 measure so that linux-next builds?
 
 
 I'm not happy about that as a workflow, but since you guys have an
 immediate problem I guess we can do that.

I'm rather unhappy about this workflow - we've got quite a few 
apic bits in the x86 tree this cycle as well and need extra 
external interaction, not.

Which KVM tree commit caused this, could someone please give a 
lkml link or quote it here? It's not referenced in the fix patch 
either.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] apic: fix kvm build on UP without IOAPIC

2012-07-06 Thread Ingo Molnar

* Marcelo Tosatti mtosa...@redhat.com wrote:

 On Fri, Jul 06, 2012 at 01:13:14PM +0200, Ingo Molnar wrote:
  
  * H. Peter Anvin h...@zytor.com wrote:
  
   On 07/01/2012 08:05 AM, Michael S. Tsirkin wrote:
   On UP i386, when APIC is disabled
   # CONFIG_X86_UP_APIC is not set
   # CONFIG_PCI_IOAPIC is not set
   
   code looking at apicdrivers never has any effect but it
   still gets compiled in. In particular, this causes
   build failures with kvm, but it generally bloats the kernel
   unnecessarily.
   
   Fix by defining both __apicdrivers and __apicdrivers_end
   to be NULL when CONFIG_X86_LOCAL_APIC is unset: I verified
   that as the result any loop scanning __apicdrivers gets optimized out by
   the compiler.
   
   Warning: a .config with apic disabled doesn't seem to boot
   for me (even without this patch). Still verifying why,
   meanwhile this patch is compile-tested only.
   
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
   ---
   
   Note: if this patch makes sense, can x86 maintainers
   please ACK applying it through the kvm tree, since that is
   where we see the issue that it addresses?
   Avi, Marcelo, maybe you can carry this in kvm/linux-next as a temporary
   measure so that linux-next builds?
   
   
   I'm not happy about that as a workflow, but since you guys have an
   immediate problem I guess we can do that.
  
  I'm rather unhappy about this workflow - we've got quite a few 
  apic bits in the x86 tree this cycle as well and need extra 
  external interaction, not.
  
  Which KVM tree commit caused this, could someone please give a 
  lkml link or quote it here? It's not referenced in the fix patch 
  either.
  
  Thanks,
  
  Ingo
 
 This tree (kvm.git next):
 
 http://git.kernel.org/?p=virt/kvm/kvm.git;a=shortlog;h=refs/heads/next
 
 Introduced by this commit:
 
 http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=ab9cf4996bb989983e73da894b8dd0239aa2c3c2

This bit:

 + if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
 + struct apic **drv;
 +
 + for (drv = __apicdrivers; drv  __apicdrivers_end; drv++) {
 + /* Should happen once for each apic */
 + WARN_ON((*drv)-eoi_write == kvm_guest_apic_eoi_write);
 + (*drv)-eoi_write = kvm_guest_apic_eoi_write;
 + }
 + }
 +

is rather disgusting I have to say.

WTH is the KVM code meddling with core x86 apic driver data 
structures directly? At minimum factor this out and create a 
proper apic.c function which is EXPORT_SYMBOL_GPL() exported or 
so...

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: optimize ISR lookups

2012-05-23 Thread Ingo Molnar

* Avi Kivity a...@redhat.com wrote:

 On 05/22/2012 02:01 AM, Thomas Gleixner wrote:
   
   Others are not my fault :)
   
   Seriously, if Avi/Marcelo want to rewrite the ISR emulation
 
  Interesting POV, really.
 
  Did you ever notice that the kernel is a collaborative effort and not
  controlled by Avi/Marcelo?
 
  Did you ever notice that arch/x86/kvm is part of arch/x86? 
 
 This is silly.  Most of the time the kernel is advanced by 
 incremental patches.  Sometimes it is advanced by minor or 
 major refactoring.  It is never advanced by personal attacks 
 on contributors.

Thomas wasn't so much doing a personal attack, it was pointing 
out stupidity and then it was mocking the repeated stupidity. He 
very politely explained his point of view (with which I agree), 
and then you guys pressed the issue and there's just so many 
hours in the merge window, so you asked to be flamed ...

Avi, if you cannot be brought to properly reject incomplete 
patches going in the wrong direction then others maintainers 
interested in the code will do it.

If you start to consistently require from KVM contributors 
incremental updates in the right direction, not piling crap on 
crap, then such incidents won't happen. This isn't the first 
such incident but there's hope that it might be the last one.

The rule in arch/x86/ (and many other subsystems) is very 
simple: we don't speed up crappy code. If you want to speed it 
up then make it clean first, *then* is it suited for speedups. 
Crappy code is fragile and bound to introduce bugs, and crappy 
code leads to continued increased maintenance overhead, so 
crappy code is basically under a perpetual code freeze until 
it's uncrapped.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

2012-05-18 Thread Ingo Molnar

* Takuya Yoshikawa takuya.yoshik...@gmail.com wrote:

 Replaced Ingo's address with kernel.org one,
 
 On Thu, 03 May 2012 17:47:30 +0200
 Peter Zijlstra pet...@infradead.org wrote:
 
  On Thu, 2012-05-03 at 22:00 +0900, Takuya Yoshikawa wrote:
   But as I could not see why spin_needbreak() was differently
   implemented
   depending on CONFIG_PREEMPT, I wanted to understand the meaning. 
  
  Its been that way since before voluntary preemption was introduced, so
  its possible Ingo simply missed that spot and nobody noticed until now.
  
  Ingo, do you have any recollections from back when?
 
 ping

I'm not sure we had a usable spin_is_contended() back then, nor 
was the !PREEMPT case in my mind really.

( The patch looks ugly though, in 99% of the lines it just does
  something that cond_resched_lock() itself could do. )

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V8 0/17] Paravirtualized ticket spinlocks

2012-05-07 Thread Ingo Molnar

* Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:

 This series replaces the existing paravirtualized spinlock mechanism
 with a paravirtualized ticketlock mechanism. The series provides
 implementation for both Xen and KVM.(targeted for 3.5 window)
 
 Note: This needs debugfs changes patch that should be in Xen / linux-next
https://lkml.org/lkml/2012/3/30/687
 
 Changes in V8:
  - Reabsed patches to 3.4-rc4
  - Combined the KVM changes with ticketlock + Xen changes (Ingo)
  - Removed CAP_PV_UNHALT since it is redundant (Avi). But note that we
 need newer qemu which uses KVM_GET_SUPPORTED_CPUID ioctl.
  - Rewrite GET_MP_STATE condition (Avi)
  - Make pv_unhalt = bool (Avi)
  - Move out reset pv_unhalt code to vcpu_run from vcpu_block (Gleb)
  - Documentation changes (Rob Landley)
  - Have a printk to recognize that paravirt spinlock is enabled (Nikunj)
  - Move out kick hypercall out of CONFIG_PARAVIRT_SPINLOCK now
so that it can be used for other optimizations such as 
flush_tlb_ipi_others etc. (Nikunj)
 
 Ticket locks have an inherent problem in a virtualized case, because
 the vCPUs are scheduled rather than running concurrently (ignoring
 gang scheduled vCPUs).  This can result in catastrophic performance
 collapses when the vCPU scheduler doesn't schedule the correct next
 vCPU, and ends up scheduling a vCPU which burns its entire timeslice
 spinning.  (Note that this is not the same problem as lock-holder
 preemption, which this series also addresses; that's also a problem,
 but not catastrophic).
 
 (See Thomas Friebel's talk Prevent Guests from Spinning Around
 http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)
 
 Currently we deal with this by having PV spinlocks, which adds a layer
 of indirection in front of all the spinlock functions, and defining a
 completely new implementation for Xen (and for other pvops users, but
 there are none at present).
 
 PV ticketlocks keeps the existing ticketlock implemenentation
 (fastpath) as-is, but adds a couple of pvops for the slow paths:
 
 - If a CPU has been waiting for a spinlock for SPIN_THRESHOLD
   iterations, then call out to the __ticket_lock_spinning() pvop,
   which allows a backend to block the vCPU rather than spinning.  This
   pvop can set the lock into slowpath state.
 
 - When releasing a lock, if it is in slowpath state, the call
   __ticket_unlock_kick() to kick the next vCPU in line awake.  If the
   lock is no longer in contention, it also clears the slowpath flag.
 
 The slowpath state is stored in the LSB of the within the lock tail
 ticket.  This has the effect of reducing the max number of CPUs by
 half (so, a small ticket can deal with 128 CPUs, and large ticket
 32768).
 
 For KVM, one hypercall is introduced in hypervisor,that allows a vcpu to kick
 another vcpu out of halt state.
 The blocking of vcpu is done using halt() in (lock_spinning) slowpath.
 
 Overall, it results in a large reduction in code, it makes the native
 and virtualized cases closer, and it removes a layer of indirection
 around all the spinlock functions.
 
 The fast path (taking an uncontended lock which isn't in slowpath
 state) is optimal, identical to the non-paravirtualized case.
 
 The inner part of ticket lock code becomes:
   inc = xadd(lock-tickets, inc);
   inc.tail = ~TICKET_SLOWPATH_FLAG;
 
   if (likely(inc.head == inc.tail))
   goto out;
   for (;;) {
   unsigned count = SPIN_THRESHOLD;
   do {
   if (ACCESS_ONCE(lock-tickets.head) == inc.tail)
   goto out;
   cpu_relax();
   } while (--count);
   __ticket_lock_spinning(lock, inc.tail);
   }
 out:  barrier();
 which results in:
   push   %rbp
   mov%rsp,%rbp
 
   mov$0x200,%eax
   lock xadd %ax,(%rdi)
   movzbl %ah,%edx
   cmp%al,%dl
   jne1f   # Slowpath if lock in contention
 
   pop%rbp
   retq   
 
   ### SLOWPATH START
 1:and$-2,%edx
   movzbl %dl,%esi
 
 2:mov$0x800,%eax
   jmp4f
 
 3:pause  
   sub$0x1,%eax
   je 5f
 
 4:movzbl (%rdi),%ecx
   cmp%cl,%dl
   jne3b
 
   pop%rbp
   retq   
 
 5:callq  *__ticket_lock_spinning
   jmp2b
   ### SLOWPATH END
 
 with CONFIG_PARAVIRT_SPINLOCKS=n, the code has changed slightly, where
 the fastpath case is straight through (taking the lock without
 contention), and the spin loop is out of line:
 
   push   %rbp
   mov%rsp,%rbp
 
   mov$0x100,%eax
   lock xadd %ax,(%rdi)
   movzbl %ah,%edx
   cmp%al,%dl
   jne1f
 
   pop%rbp
   retq   
 
   ### SLOWPATH START
 1:pause  
   movzbl (%rdi),%eax
   cmp%dl,%al
   jne1b
 
   pop%rbp
   retq   
   ### SLOWPATH END
 
 The unlock code is complicated by the need to 

Re: [PATCH RFC 0/5] apic: eoi optimization support

2012-05-07 Thread Ingo Molnar

* Michael S. Tsirkin m...@redhat.com wrote:

 I'm looking at reducing the interrupt overhead for virtualized guests:
 some workloads spend a large part of their time processing interrupts.
 This patchset supplies infrastructure to reduce the IRQ ack overhead on
 x86: the idea is to add an eoi_write callback that we can then optimize
 without touching other apic functionality.
 
 The main user will be kvm: on kvm, an EOI write from the guest causes an
 expensive exit to host; we can avoid this using shared memory as the
 last patch in the series demonstrates.
 
 But I also wrote a micro-optimized version for the regular x2apic: this
 shaves off a branch and about 9 instructions from EOI when x2apic is
 used, and a comment in ack_APIC_irq implies that someone counted
 instructions there, at some point.
 
 Also included in the patchset are a couple of trivial macro fixes.
 
 The patches work fine on my boxes and I did look at the
 objdump output to verify that the generated code
 for the micro-optimization patch looks right
 and actually is shorter.
 
 Some benchmark results below (not sure what kind of
 testing is the most appropriate) show a tiny
 but measureable improvement. The tests were run on
 an AMD box with 24 cpus.
 
 - A clean kernel build after reboot shows
 a tiny but measureable improvement in system time
 which means lower CPU overhead (though not measureable
 in total time - that is dominated by user time and fluctuates
 too much):
 
 linux# reboot -f
 ...
 linux# make clean
 linux# time make -j 64 LOCALVERSION= 21  /dev/null
 
 Before:
 
 real2m52.244s
 user35m53.833s
 sys 6m7.194s
 
 After:
 
 real2m52.827s
 user35m48.916s
 sys 6m2.305s
 
 - perf micro-benchmarks seem to consistently show
   a tiny improvement in total time as well but it's below
   the confidence level of 3 std deviations:
 
 # ./tools/perf/perf   stat --sync --repeat 100 --null perf bench sched 
 messaging
 ...
0.414666797 seconds time elapsed ( +-  1.29% )
 
 Performance counter stats for 'perf bench sched messaging' (100 runs):
 
0.395370891 seconds time elapsed
 ( +-  1.04% )
 
 
 # ./tools/perf/perf   stat --sync --repeat 100 --null perf bench sched pipe 
 -l 1
0.307019664 seconds time elapsed
 ( +-  0.10% )
 
0.304738024 seconds time elapsed
 ( +-  0.08% )
 
 The patches are against 3.4-rc3 - let me know if
 I need to rebase.
 
 I think patches 1-2 are definitely a good idea,
 and patches 3-4 might be a good idea.
 Please review, and consider patches 1-4 for linux 3.5.
 
 Thanks,
 MST
 
 Michael S. Tsirkin (5):
   apic: fix typo EIO_ACK - EOI_ACK and document
   apic: use symbolic APIC_EOI_ACK
   x86: add apic-eoi_write callback
   x86: eoi micro-optimization
   kvm_para: guest side for eoi avoidance
 
  arch/x86/include/asm/apic.h|   22 --
  arch/x86/include/asm/apicdef.h |2 +-
  arch/x86/include/asm/bitops.h  |6 ++-
  arch/x86/include/asm/kvm_para.h|2 +
  arch/x86/kernel/apic/apic_flat_64.c|2 +
  arch/x86/kernel/apic/apic_noop.c   |1 +
  arch/x86/kernel/apic/apic_numachip.c   |1 +
  arch/x86/kernel/apic/bigsmp_32.c   |1 +
  arch/x86/kernel/apic/es7000_32.c   |2 +
  arch/x86/kernel/apic/numaq_32.c|1 +
  arch/x86/kernel/apic/probe_32.c|1 +
  arch/x86/kernel/apic/summit_32.c   |1 +
  arch/x86/kernel/apic/x2apic_cluster.c  |1 +
  arch/x86/kernel/apic/x2apic_phys.c |1 +
  arch/x86/kernel/apic/x2apic_uv_x.c |1 +
  arch/x86/kernel/kvm.c  |   51 
 ++--
  arch/x86/platform/visws/visws_quirks.c |2 +-
  17 files changed, 88 insertions(+), 10 deletions(-)

No objections from the x86 side.

In terms of advantages, could you please create perf stat runs 
that counts the number of MMIOs or so? That should show a pretty 
obvious improvement - and that is enough as proof, no need to 
try to reproduce the performance win in a noisy benchmark.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/5] apic: eoi optimization support

2012-05-07 Thread Ingo Molnar

* Michael S. Tsirkin m...@redhat.com wrote:

 On Mon, May 07, 2012 at 12:35:12PM +0200, Ingo Molnar wrote:
   Michael S. Tsirkin (5):
 apic: fix typo EIO_ACK - EOI_ACK and document
 apic: use symbolic APIC_EOI_ACK
 x86: add apic-eoi_write callback
 x86: eoi micro-optimization
 kvm_para: guest side for eoi avoidance
   
arch/x86/include/asm/apic.h|   22 --
arch/x86/include/asm/apicdef.h |2 +-
arch/x86/include/asm/bitops.h  |6 ++-
arch/x86/include/asm/kvm_para.h|2 +
arch/x86/kernel/apic/apic_flat_64.c|2 +
arch/x86/kernel/apic/apic_noop.c   |1 +
arch/x86/kernel/apic/apic_numachip.c   |1 +
arch/x86/kernel/apic/bigsmp_32.c   |1 +
arch/x86/kernel/apic/es7000_32.c   |2 +
arch/x86/kernel/apic/numaq_32.c|1 +
arch/x86/kernel/apic/probe_32.c|1 +
arch/x86/kernel/apic/summit_32.c   |1 +
arch/x86/kernel/apic/x2apic_cluster.c  |1 +
arch/x86/kernel/apic/x2apic_phys.c |1 +
arch/x86/kernel/apic/x2apic_uv_x.c |1 +
arch/x86/kernel/kvm.c  |   51 
   ++--
arch/x86/platform/visws/visws_quirks.c |2 +-
17 files changed, 88 insertions(+), 10 deletions(-)
  
  No objections from the x86 side.
 
 Is kvm.git a good tree to merge this through?

Fine to me, but I haven't checked how widely it conflicts with 
existing bits: by the looks of it most of the linecount is on 
the core x86 side, while the kvm change is well concentrated.

  In terms of advantages, could you please create perf stat 
  runs that counts the number of MMIOs or so? That should show 
  a pretty obvious improvement - and that is enough as proof, 
  no need to try to reproduce the performance win in a noisy 
  benchmark.
 
 You mean with kvm PV, right? On real hardware the 
 micro-optimization removes branches and maybe cache-misses but 
 I don't see why would it reduce MMIOs.

Yeah, on KVM. On real hw I doubt it's measurable.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 0/5] apic: eoi optimization support

2012-05-07 Thread Ingo Molnar

* Avi Kivity a...@redhat.com wrote:

 On 05/07/2012 02:40 PM, Ingo Molnar wrote:

No objections from the x86 side.
   
   Is kvm.git a good tree to merge this through?
 
  Fine to me, but I haven't checked how widely it conflicts with 
  existing bits: by the looks of it most of the linecount is on 
  the core x86 side, while the kvm change is well concentrated.
 
 I don't see a problem with merging though tip.git - we're close to the
 next merge window, and the guest side rarely causes conflicts.  But
 please don't apply the last patch yet, I want to review it more closely
 (esp. with the host side).

That last patch was marked don't apply yet, so I definitely 
planned on another iteration that incorporates all the feedback 
that has been given.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V8 0/17] Paravirtualized ticket spinlocks

2012-05-07 Thread Ingo Molnar

* Avi Kivity a...@redhat.com wrote:

  PS: Nikunj had experimented that pv-flush tlb + 
  paravirt-spinlock is a win on PLE where only one of them 
  alone could not prove the benefit.
 
 I'd like to see those numbers, then.
 
 Ingo, please hold on the kvm-specific patches, meanwhile.

I'll hold off on the whole thing - frankly, we don't want this 
kind of Xen-only complexity. If KVM can make use of PLE then Xen 
ought to be able to do it as well.

If both Xen and KVM makes good use of it then that's a different 
matter.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC V6 0/11] Paravirtualized ticketlocks

2012-03-31 Thread Ingo Molnar

* Andi Kleen a...@firstfloor.org wrote:

  Care to back that up with numbers and proper trace evidence 
  instead of handwaving?
 
 E.g. my plumbers presentations on lock and mm scalability from 
 last year has some graphs that show this very clearly, plus 
 some additional data on the mutexes. This compares to the 
 glibc futex locks, which perform much better than the kernel 
 mutex locks on larger systems under higher contention

If you mean these draft slides:

  http://www.halobates.de/plumbers-fork-locks_v2.pdf

it has very little verifiable information in it. It just 
cryptically says lock hold time microbenchmark, which might or 
might not be a valid measurement.

You could have been honest and straightforward in your first 
mail:

 I ran workload X on machine Y, and got results Z.

Instead you are *hindering* the discussion:

 Given your tone I will not supply an URL. [...]

If you meant the above URL then it's not the proper numbers 
Thomas asked for, just some vague slides. If you meant something 
else then put up or shut up.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] KVM updates for the 3.4 merge window

2012-03-26 Thread Ingo Molnar

* Avi Kivity a...@redhat.com wrote:

 Say a fix comes in which needs to be mainlined during -rc.  So 
 I put it in some other branch, to be sent off to Linus in a 
 few days after maturing a little.  Meanwhile developers see an 
 incomplete tree, since that patch is not in it.
 
 Once Linus pulls, I can merge it back (or even before, if I'm 
 reasonably certain it's not going to change), but it leaves a 
 history of unneeded merges.  Or we can do throwaway merges 
 like tip.git.

We don't do throwaway merges in the -tip development branches 
themselves, i.e. in tip:sched/core, tip:perf/core, 
tip:timers/core, etc.

When a fix goes into tip:sched/urgent then until Linus merges it 
it's not in tip:sched/core. 99% of the fixes don't *have to* go 
into sched/core straight away.

In the odd case where there's some dependency, we can manually 
merge it into tip:sched/core ahead of Linus pulling into an -rc. 
Those rare merges are not a problem, and I explain the reason in 
the merge commit itself.

If you look at:

 gll v3.2..v3.3 | grep -E '/urgent.*/core'

you'll see that I only had to do it once in the previous cycle:

 d6c1c49de577 Merge branch 'perf/urgent' into perf/core

and the changelog explains the background:

Merge reason: Add these cherry-picked commits so that future changes
  on perf/core don't conflict.

it was a rare, oddball situation where we cherry-picked 
perf/core changes into perf/urgent. Extra merges are perfectly 
fine in that case.

The 'throwaway' tip:master branch you are probably referring to 
is basically just a testing branch, a convenient merged tree of 
the one dozen maintainer trees that are hosted in -tip. Since we 
don't want to force Linus's hand of him being able to reject 
individual trees we don't merge them properly - hence the 
integrated tree is a throwaway tree in theory.

In practice I tend to throw it away only once per cycle, around 
-rc1, once all pending trees went to Linus. tip:master is not 
used for any Git based contribution work - it's for testing, 
it's for people who want to work with patches - the commits 
themselves always go into persistent non-rebasing, append-only 
Git trees.

If we mess up bisectability we do a delta fix. When choosing 
between somewhat better bisectability and a proper history that 
others can rely on then proper history wins hands down.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 0/3] KVM: perf: kvm events analysis tool

2012-03-06 Thread Ingo Molnar

* Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:

 The output example is following:
 
 #./perf kvm-events report --event mmio --vcpu 3

So we already have 'perf kvm':

 usage: perf kvm [options] {top|record|report|diff|buildid-list}

which is a sub-namespace for all things KVM instrumentation 
goodies. [ Arguably there should be a 'perf kvm trace' as well, 
but I digress. ]

So, your new tool has a similar workflow to:

  perf kvm record
  perf kvm report

but differs from it in terms of events used and in terms of 
reported output.

To me it appears that your tool is basically pretty similar to 
'perf stat', adapted to KVM, right?

So, could your new tool's workflow be simplified like this:

  perf kvm stat ..

?

To automatically stat all vcpus in the system, the well-known 
-a/--all-cpus system-wide method could be used:

  perf kvm stat -a ...

with stat output following immediately after it has finished.

It should also be possible to use those new events in a 
recording fashion - a new, rather logical command sub-space 
could be used for that:

 perf kvm stat record ...
 perf kvm stat report ...

[ This could be expanded to regular 'perf stat' as well: 'perf 
  stat record' and 'perf stat report' would be useful - but I 
  suspect that's outside the scope of your patches. ]

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 0/3] KVM: perf: kvm events analysis tool

2012-03-06 Thread Ingo Molnar

* Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:

 Thanks for your review, Ingo!
 
 On 03/06/2012 05:07 PM, Ingo Molnar wrote:
 
 
  So, your new tool has a similar workflow to:
  
perf kvm record
perf kvm report
  
  but differs from it in terms of events used and in terms of 
  reported output.
  
  To me it appears that your tool is basically pretty similar to 
  'perf stat', adapted to KVM, right?
  
  So, could your new tool's workflow be simplified like this:
  
perf kvm stat ..
  
  ?
  
  To automatically stat all vcpus in the system, the well-known 
  -a/--all-cpus system-wide method could be used:
  
perf kvm stat -a ...
  
  with stat output following immediately after it has finished.
  
 
 
 Actually, the stat information has already been included in the report.
 
 
  It should also be possible to use those new events in a 
  recording fashion - a new, rather logical command sub-space 
  could be used for that:
  
   perf kvm stat record ...
   perf kvm stat report ...
  
  [ This could be expanded to regular 'perf stat' as well: 'perf 
stat record' and 'perf stat report' would be useful - but I 
suspect that's outside the scope of your patches. ]
  
 
 
 I totally agree with you except i prefer 'perf kvm events' to 
 'perf kvm stat' :) : it records some specified kvm events and 
 smartly analyze it. I think it matches its doing better.

Well, my problem is that *all* of the 'perf kvm' functionality 
deals with events :-) So the 'events' does not say much - you 
just add a variation to the theme without really distinguishing 
it.

Since 'perf stat' is not yet used and the output of your tool 
looks quite perf stat alike, why not use that name and 
standardize all these workflows?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[VT-d reboot problems] Re: [PATCH] x86 / reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot

2012-01-31 Thread Ingo Molnar

(added KVM folks to the Cc:)

* Bastien ROUCARIES roucaries.bast...@gmail.com wrote:

 Ping^2
 
 Bastien
 On Mon, Jan 23, 2012 at 11:28 AM, Bastien ROUCARIES
 roucaries.bast...@gmail.com wrote:
  On Mon, Jan 16, 2012 at 8:21 PM, H. Peter Anvin h...@zytor.com wrote:
  On 01/16/2012 03:27 AM, Bastien ROUCARIES wrote:
 
  Does it work if you disable VT-d in the firmware? If so, then adding it
  to the reboot method blacklist is the wrong fix - we need to figure out
  why VT-d interferes with Dell's reboot code.
 
  Yes it work
 
 
  This is particularly so since we are very close to having a full Dell
  model catalogue in the kernel...
 
  Ping ? Do you need some dump ? testing ?

So disabling VT-d in the BIOS fixes the reboot problem and 
Matthew Garrett suggests we should figure out why and how VT-d 
on this Dell box interferes with the reboot method.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Use assert() helper to check a variable value

2011-12-19 Thread Ingo Molnar

* Cyrill Gorcunov gorcu...@gmail.com wrote:

 On Mon, Dec 19, 2011 at 09:13:28AM +0200, Pekka Enberg wrote:
  
  -  BUILD_BUG_ON(i  E820_X_MAX);
  +  assert(i = E820_X_MAX);
  
  We should use BUG_ON() like tools/perf does.
  
 
 We dont have it yet. So I'll introduce this helper later,
 but note that we will have to cover _all_ assert() calls then,
 so it's better to make in a separate patch. [...]

There's only about a dozen of them, should be easy.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Use assert() helper to check a variable value

2011-12-19 Thread Ingo Molnar

* Cyrill Gorcunov gorcu...@gmail.com wrote:

 +#
 +#define BUILD_BUG_ON(condition)  ((void)sizeof(char[1 - 
 2*!!(condition)]))
 +#define BUG_ON(condition)assert(!(condition))

Just a sidenote, the patch is fine but the above will result in 
weird double negated assertion messages.

it's better to just do our own __BUG() function and use it. That 
way it can also use the kernel standard 'BUG: ...' message 
format.

To make such BUG()s easier to debug via GDB, do this from the 
__BUG() handler:

raise(SIGABRT);

GDB will catch that signal.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Use assert() helper to check a variable value

2011-12-19 Thread Ingo Molnar

* Cyrill Gorcunov gorcu...@gmail.com wrote:

 On Mon, Dec 19, 2011 at 11:40:09AM +0100, Ingo Molnar wrote:
  
  GDB will catch that signal.
  
 
 Yeah, good point! Pekka, drop this patch please, I'll make new one at evening.

Patch is good as-is IMO - assert() was used before so it's not a 
regression per se.

Acked-by: Ingo Molnar mi...@elte.hu

The tool-specific BUG() implementation can be added as a delta 
on top of that. It's in fact better to keep those two steps 
separate.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Fix 'make kvmconfig'

2011-12-16 Thread Ingo Molnar

* Sasha Levin levinsasha...@gmail.com wrote:

 Set features which depend on the config we select to make 'make kvmconfig'
 non-interactive.
 
 Signed-off-by: Sasha Levin levinsasha...@gmail.com

Acked-by: Ingo Molnar mi...@elte.hu

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: mmap guest kernel instead of reading it into memory

2011-12-12 Thread Ingo Molnar

* Sasha Levin levinsasha...@gmail.com wrote:

 + kvm-bz_start = mmap(NULL, kvm-bz_len, PROT_READ | PROT_WRITE,
 + MAP_PRIVATE, kvm-bz_fd, setup_end);
  
 + /* NOP everything before the kernel start */
 + memset(kvm-bz_start, 0x90, setup_size - setup_end);

You should really, really think about the case where mmap() 
fails.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Fix serial port probing

2011-12-09 Thread Ingo Molnar

* Pekka Enberg penb...@kernel.org wrote:

                         dev-ier        = ioport__read8(data)  0x3f;
  +                       kvm__irq_line(kvm, dev-irq, dev-ier?1:0);
                         break;
                 case UART_LCR:
                         dev-lcr        = ioport__read8(data);
 
 Applied, thanks!
 
 Ingo, does this fix the occasional slowdowns you are seeing?

Indeed it does!

I still see things like 'top' refreshes progressing - i.e. 
taking something like 100-200 msecs to finish, but it's a *lot* 
faster now.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/28] kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link appropriately

2011-12-07 Thread Ingo Molnar

* Matt Evans m...@ozlabs.org wrote:

  [...]  I haven't looked closely at Matt's 
  patches, but it should be possible to use [un]signed long long 
  for the u64/s64 types, I would think.
  
  In tools/kvm/ we are using our own u64/s64 definitions, not 
  glibc's, so i think it should be fine - as long as we don't pick 
  up int-l64.h accidentally via the 
  arch/powerpc/include/asm/types.h exception for user-space.
 
 That's what's happening here; we're __powerpc64__ and 
 !__KERNEL__, tools/kvm/include/linux/types.h includes 
 asm/types.h so gets the int-l64.h definition of __u64, as 
 above.  :/
 
 builtin-run.c:389: error: format `%llx' expects type `long 
 long unsigned int', but argument 2 has type `u64'

So either define __KERNEL__ or patch a __NEW_USERSPACE__ define 
into power/asm/types.h and use it - if the PowerPC folks agree 
with that approach.

Sane userspace should not be prevented from using the same sane 
types the kernel is already using :-)

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Add 'kvm nmi' command

2011-12-07 Thread Ingo Molnar

* Sasha Levin levinsasha...@gmail.com wrote:

 This patch adds a 'kvm nmi' command which can be used to trigger NMIs in
 guests.
 
 Mostly useful for debugging purposes.

Could we make this something like 'kvm debug nmi'?

We already have some debug functionality, right? Those should be 
consolidated under 'kvm debug'.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/28] kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link appropriately

2011-12-07 Thread Ingo Molnar

* Matt Evans m...@ozlabs.org wrote:

 On 08/12/11 04:14, Pekka Enberg wrote:
  On Wed, 7 Dec 2011, Ingo Molnar wrote:
  
 
  * Matt Evans m...@ozlabs.org wrote:
 
  [...]  I haven't looked closely at Matt's
  patches, but it should be possible to use [un]signed long long
  for the u64/s64 types, I would think.
 
  In tools/kvm/ we are using our own u64/s64 definitions, not
  glibc's, so i think it should be fine - as long as we don't pick
  up int-l64.h accidentally via the
  arch/powerpc/include/asm/types.h exception for user-space.
 
  That's what's happening here; we're __powerpc64__ and
  !__KERNEL__, tools/kvm/include/linux/types.h includes
  asm/types.h so gets the int-l64.h definition of __u64, as
  above.  :/
 
  builtin-run.c:389: error: format `%llx' expects type `long
  long unsigned int', but argument 2 has type `u64'
 
  So either define __KERNEL__ or patch a __NEW_USERSPACE__ define
  into power/asm/types.h and use it - if the PowerPC folks agree
  with that approach.
 
  Sane userspace should not be prevented from using the same sane
  types the kernel is already using :-)
  
  How does perf handle this? I'm sure it has the exact same 
  issue, doesn't it?
 
 It does; ironically it uses PRIblah, so I had followed its 
 example.

Sadly it regressed lately in that area - it was certainly 
PRIblah-less in the early stages :-)

Pekka, how do the headers react if we define __KERNEL__? 
Alternatively, allowing an override in powerpc/types.h beyond 
__KERNEL__ looks sensible as well.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/28] kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link appropriately

2011-12-07 Thread Ingo Molnar

* Matt Evans m...@ozlabs.org wrote:

 Since tools/kvm/include/linux/types.h only requires __u32, 
 __u64 et al from asm/types.h, wouldn't it be most 
 straightforward to just #include asm-generic/int-ll64.h?  
 This avoids #define __KERNEL__ breaking other includes brought 
 into userland, avoids changing system headers/distros, and 
 includes the file we're really interested in on both x86  
 PPC.

No system headers need to be changed: you can just patch 
powerpc/types.h with the extra __SANE_USERSPACE__ define (or any 
suitable name) and the KVM tool should work as-is.

The beauty of tools/kvm/ being integrated into the kernel tree.

Really, if that works and if the PowerPC folks agree then we 
have an immediate, fully adequate, instantly applicable 
solution.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/28] kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link appropriately

2011-12-07 Thread Ingo Molnar

* Matt Evans m...@ozlabs.org wrote:

  [...]  I haven't looked closely at Matt's 
  patches, but it should be possible to use [un]signed long long 
  for the u64/s64 types, I would think.
  
  In tools/kvm/ we are using our own u64/s64 definitions, not 
  glibc's, so i think it should be fine - as long as we don't pick 
  up int-l64.h accidentally via the 
  arch/powerpc/include/asm/types.h exception for user-space.
 
 That's what's happening here; we're __powerpc64__ and 
 !__KERNEL__, tools/kvm/include/linux/types.h includes 
 asm/types.h so gets the int-l64.h definition of __u64, as 
 above.  :/
 
 builtin-run.c:389: error: format `%llx' expects type `long 
 long unsigned int', but argument 2 has type `u64'

So either define __KERNEL__ or patch a __NEW_USERSPACE__ define 
into power/asm/types.h and use it - if the PowerPC folks agree 
with that approach.

Sane userspace should not be prevented from using the same sane 
types the kernel is already using :-)

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/28] kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link appropriately

2011-12-07 Thread Ingo Molnar

* Matt Evans m...@ozlabs.org wrote:

 On 08/12/11 04:14, Pekka Enberg wrote:
  On Wed, 7 Dec 2011, Ingo Molnar wrote:
  
 
  * Matt Evans m...@ozlabs.org wrote:
 
  [...]  I haven't looked closely at Matt's
  patches, but it should be possible to use [un]signed long long
  for the u64/s64 types, I would think.
 
  In tools/kvm/ we are using our own u64/s64 definitions, not
  glibc's, so i think it should be fine - as long as we don't pick
  up int-l64.h accidentally via the
  arch/powerpc/include/asm/types.h exception for user-space.
 
  That's what's happening here; we're __powerpc64__ and
  !__KERNEL__, tools/kvm/include/linux/types.h includes
  asm/types.h so gets the int-l64.h definition of __u64, as
  above.  :/
 
  builtin-run.c:389: error: format `%llx' expects type `long
  long unsigned int', but argument 2 has type `u64'
 
  So either define __KERNEL__ or patch a __NEW_USERSPACE__ define
  into power/asm/types.h and use it - if the PowerPC folks agree
  with that approach.
 
  Sane userspace should not be prevented from using the same sane
  types the kernel is already using :-)
  
  How does perf handle this? I'm sure it has the exact same 
  issue, doesn't it?
 
 It does; ironically it uses PRIblah, so I had followed its 
 example.

Sadly it regressed lately in that area - it was certainly 
PRIblah-less in the early stages :-)

Pekka, how do the headers react if we define __KERNEL__? 
Alternatively, allowing an override in powerpc/types.h beyond 
__KERNEL__ looks sensible as well.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/28] kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link appropriately

2011-12-07 Thread Ingo Molnar

* Matt Evans m...@ozlabs.org wrote:

 Since tools/kvm/include/linux/types.h only requires __u32, 
 __u64 et al from asm/types.h, wouldn't it be most 
 straightforward to just #include asm-generic/int-ll64.h?  
 This avoids #define __KERNEL__ breaking other includes brought 
 into userland, avoids changing system headers/distros, and 
 includes the file we're really interested in on both x86  
 PPC.

No system headers need to be changed: you can just patch 
powerpc/types.h with the extra __SANE_USERSPACE__ define (or any 
suitable name) and the KVM tool should work as-is.

The beauty of tools/kvm/ being integrated into the kernel tree.

Really, if that works and if the PowerPC folks agree then we 
have an immediate, fully adequate, instantly applicable 
solution.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/28] kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link appropriately

2011-12-06 Thread Ingo Molnar

* Sasha Levin levinsasha...@gmail.com wrote:

 Ingo actually got us to remove all the PRI* specifiers, but 
 that was back when we only did x86 :)
 
 Ingo, does it make sense to use them now when we support 
 different architectures?

Not at all - ppc should use a sane u64/s64 definition, i.e. 
int-ll64.h instead of the int-l64.h crap.

The powerpc maintainers indicated that they'd fix that, a couple 
of years ago, but nothing appears to have come out of that.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/28] kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link appropriately

2011-12-06 Thread Ingo Molnar

* Paul Mackerras pau...@samba.org wrote:

 On Tue, Dec 06, 2011 at 09:28:27AM +0100, Ingo Molnar wrote:
  
  * Sasha Levin levinsasha...@gmail.com wrote:
  
   Ingo actually got us to remove all the PRI* specifiers, but 
   that was back when we only did x86 :)
   
   Ingo, does it make sense to use them now when we support 
   different architectures?
  
  Not at all - ppc should use a sane u64/s64 definition, i.e. 
  int-ll64.h instead of the int-l64.h crap.
  
  The powerpc maintainers indicated that they'd fix that, a couple 
  of years ago, but nothing appears to have come out of that.
 
 We changed it for the kernel, but not for userspace (i.e. 
 glibc) because of concerns about possibly breaking existing 
 userspace programs. [...]

Indeed, you do:
 
 #if defined(__powerpc64__)  !defined(__KERNEL__)
 # include asm-generic/int-l64.h
 #else
 # include asm-generic/int-ll64.h
 #endif

which correctly uses int-ll64.h everywhere except 64-bit 
userspace inclusions. So i take back my 'nothing appears to have 
come out of that' comment - it's all nicely fixed!

 [...]  I haven't looked closely at Matt's 
 patches, but it should be possible to use [un]signed long long 
 for the u64/s64 types, I would think.

In tools/kvm/ we are using our own u64/s64 definitions, not 
glibc's, so i think it should be fine - as long as we don't pick 
up int-l64.h accidentally via the 
arch/powerpc/include/asm/types.h exception for user-space.

Stray uses of int64 should be converted to u64 (i see some in 
tools/kvm/virtio/9p-pdu.c) but otherwise we should be ok - 
unless i'm missing some complication.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] kvm tools: Zero out event before calling epoll_ctl()

2011-12-06 Thread Ingo Molnar

* Sasha Levin levinsasha...@gmail.com wrote:

 Zero out struct epoll_event before calling epoll_ctl(). This isn't really
 required, but is done to avoid warnings.
 
 Signed-off-by: Sasha Levin levinsasha...@gmail.com
 ---
  tools/kvm/kvm-ipc.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/tools/kvm/kvm-ipc.c b/tools/kvm/kvm-ipc.c
 index 40ab457..68c2565 100644
 --- a/tools/kvm/kvm-ipc.c
 +++ b/tools/kvm/kvm-ipc.c
 @@ -132,7 +132,7 @@ static void *kvm_ipc__thread(void *param)
  
  int kvm_ipc__start(int sock)
  {
 - struct epoll_event ev;
 + struct epoll_event ev = {0};
  
   server_fd = sock;

While it is true that epoll_ctl() currently ignores certain 
fields so we could leave them uninitialized but generally it's 
very good practice to only pass in well-defined values to kernel 
syscalls, so the zeroing is a quality of implementation issue as 
well, not just a Valgrind warning fix.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/28] kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link appropriately

2011-12-06 Thread Ingo Molnar

* Sasha Levin levinsasha...@gmail.com wrote:

 Ingo actually got us to remove all the PRI* specifiers, but 
 that was back when we only did x86 :)
 
 Ingo, does it make sense to use them now when we support 
 different architectures?

Not at all - ppc should use a sane u64/s64 definition, i.e. 
int-ll64.h instead of the int-l64.h crap.

The powerpc maintainers indicated that they'd fix that, a couple 
of years ago, but nothing appears to have come out of that.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/28] kvm tools: 64-bit tidy; use PRIx64 when printf'ing u64s and link appropriately

2011-12-06 Thread Ingo Molnar

* Paul Mackerras pau...@samba.org wrote:

 On Tue, Dec 06, 2011 at 09:28:27AM +0100, Ingo Molnar wrote:
  
  * Sasha Levin levinsasha...@gmail.com wrote:
  
   Ingo actually got us to remove all the PRI* specifiers, but 
   that was back when we only did x86 :)
   
   Ingo, does it make sense to use them now when we support 
   different architectures?
  
  Not at all - ppc should use a sane u64/s64 definition, i.e. 
  int-ll64.h instead of the int-l64.h crap.
  
  The powerpc maintainers indicated that they'd fix that, a couple 
  of years ago, but nothing appears to have come out of that.
 
 We changed it for the kernel, but not for userspace (i.e. 
 glibc) because of concerns about possibly breaking existing 
 userspace programs. [...]

Indeed, you do:
 
 #if defined(__powerpc64__)  !defined(__KERNEL__)
 # include asm-generic/int-l64.h
 #else
 # include asm-generic/int-ll64.h
 #endif

which correctly uses int-ll64.h everywhere except 64-bit 
userspace inclusions. So i take back my 'nothing appears to have 
come out of that' comment - it's all nicely fixed!

 [...]  I haven't looked closely at Matt's 
 patches, but it should be possible to use [un]signed long long 
 for the u64/s64 types, I would think.

In tools/kvm/ we are using our own u64/s64 definitions, not 
glibc's, so i think it should be fine - as long as we don't pick 
up int-l64.h accidentally via the 
arch/powerpc/include/asm/types.h exception for user-space.

Stray uses of int64 should be converted to u64 (i see some in 
tools/kvm/virtio/9p-pdu.c) but otherwise we should be ok - 
unless i'm missing some complication.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [F.A.Q.] the advantages of a shared tool/kernel Git repository, tools/perf/ and tools/kvm/

2011-11-10 Thread Ingo Molnar

* Anca Emanuel anca.eman...@gmail.com wrote:

 I'd even argue that that C library is obviously something the 
 kernelshould offer as well - so klibc is the way to go and would 
 help usfurther streamline this and keep Linux quality high.
 
 I think there is code to share. Why not ?

The biggest downside of libc integration into the kernel would be 
that the libc ABI is *vastly* larger than the kernel ABI, and i'm not 
sure the kernel community is good enough to handle that. It's roughly 
3000 ABI components compared to the 300 ABI functions the kernel has 
today - so at least an order of magnitude larger...

The biggest upside of libc integration into the kernel would be that 
we could push Linux kernel improvements into the C library - and thus 
to apps - immediately, along a much larger ABI surface. The 
'specialization' resolution of the libc ABI is an order of magnitude 
larger than that of the kernel's, giving many more opportunities for 
good, workload specific optimizations and unique solutions.

Today the latency of getting a kernel improvement to applications via 
a change in the C library is above a year, so most kernel people 
don't actually try to improve the C library but try to find 
improvements on the kernel level which gets to a distro within a 
couple of months.

If the kernel offers a /proc/libc.so.6 library then the kernel will 
always be 'in sync' with the library (there's no library to install 
on-disk - it would be offered by the kernel) and we could use 
integration techniques like the vDSO uses today.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [F.A.Q.] the advantages of a shared tool/kernel Git repository, tools/perf/ and tools/kvm/

2011-11-10 Thread Ingo Molnar

* Alexander Graf ag...@suse.de wrote:

 [...]

 Outside of the kernel tree, you can do your own decisions. If 
 someone thinks it's a great idea to write device emulation in 
 python (I would love that!), he could go in and implement it 
 without having to worry about Linus possibly rejecting it because 
 it's out of scope for a Linux kernel testing tool. If you want to 
 create the greatest GUI for virtualization the world has ever seen, 
 you can just do it! Nothing holds you back.

We actually recently added Python bindings to event tracing in perf:

 earth5:~/tip find tools/perf/ -name '*.py'
 tools/perf/python/twatch.py
 tools/perf/util/setup.py
 tools/perf/scripts/python/Perf-Trace-Util/lib/Perf/Trace/Util.py
 tools/perf/scripts/python/Perf-Trace-Util/lib/Perf/Trace/Core.py
 tools/perf/scripts/python/Perf-Trace-Util/lib/Perf/Trace/SchedGui.py
 tools/perf/scripts/python/syscall-counts.py
 tools/perf/scripts/python/sctop.py
 tools/perf/scripts/python/sched-migration.py
 tools/perf/scripts/python/check-perf-trace.py
 tools/perf/scripts/python/futex-contention.py
 tools/perf/scripts/python/failed-syscalls-by-pid.py
 tools/perf/scripts/python/net_dropmonitor.py
 tools/perf/scripts/python/syscall-counts-by-pid.py
 tools/perf/scripts/python/netdev-times.py

... and Linus did not object (so far ;-) - nor does he IMHO have many 
reasons to object as long as the code is sane and useful. Nor did 
Linus object when perf extended its scope from profiling to tracing, 
system monitoring, etc.

While i don't talk for Linus, the only 'hard boundary' that Linus 
enforces and expects all maintainers to enforce that i'm aware of is 
don't do crazy crap. Everything else is possible as long as it's 
high quality and reasonable, with a good upside story that is 
relevant to the kernel - you can let your imagination run wild, 
there's no artificial barriers that i'm aware of.

Anyway, i have outlined the rough consequences of a user-space 
project being inside the kernel repo in this post:

  http://lkml.org/lkml/2011/11/10/86

... and they are definitely not trivial and easy to meet.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [F.A.Q.] the advantages of a shared tool/kernel Git repository, tools/perf/ and tools/kvm/

2011-11-09 Thread Ingo Molnar

* Ted Ts'o ty...@mit.edu wrote:

 On Tue, Nov 08, 2011 at 01:55:09PM +0100, Ingo Molnar wrote:

  I guess you can do well with a split project as well - my main 
  claim is that good compatibility comes *naturally* with 
  integration.
 
 Here I have to disagree; my main worry is that integration makes it 
 *naturally* easy for people to skip the hard work needed to keep a 
 stable kernel/userspace interface.

There's two observations i have:

Firstly, how come that this has not actually happened in practice in 
the case of perf? Looks like the (random) version compatibility 
experiment i conducted yesterday should have failed spectacularly.

Secondly, within the kernel we don't have a stable ABI - we don't 
even have stable APIs, and still it's a 15 MLOC project that is 
thriving.

I argue that it is thriving in large part *BECAUSE* we don't have a 
stable API of any sort: if stuff is broken and the whole world needs 
to be fixed then we fix the whole world.

One could even make the argument that in the special case of deeply 
kernel integrated tools a stable kernel/userspace interface for those 
special, Linux-specific ABIs is *too expensive* and results in an 
inferior end result.

I'd really love it if people started thinking outside the box a bit. 
Why do people assume that *all* of the kernel project's code *has* to 
run in kernel mode? It's not a valid technical restriction *at all*.

It has been done like this for 30 years is not a valid technical 
restriction. Splitting deeply kernel related tools away from the 
kernel was a valid decision 15 years ago due to kernel image size and 
similar resource considerations. Today it's less and less true and we 
are *actively hurting* from tools being split away from the kernel 
proper.

Graphics, storage and user-space suspend are good examples i think of 
separation gone bad: and the resulting mess has cost Linux distros 
*the desktop market*. Think about it, the price we pay for this 
inferior end result is huge.

ext4tools is an example of separation gone good. I think it's the 
exception that strengthens the rule.

Why was the 2.4 to 2.6 migration so difficult? I can tell you the 
distro side story: mainly because the release took too long and tools 
broke left and right which created stop-ship situations. We had a 
much larger ABI cross section than we could sanely handle with the 
testing power we had. So we got into a negative feedback loop: the 
reduction in 2.3 testers further delayed the release, which moved the 
(independently evolving ...) tools further away from the to-be-2.6 
kernel, which further reduced the effective testing. It was not a 
sustainable. We addressed many of the problems by shortening the 
release cycle to 3 months, but IMHO we have not addressed the 
underlying problem of lack of integration.

Responsible release engineering is actually *easier* if you don't 
have a moving target and if you have the ability to fix stuff that 
breaks without being bound to an external project.

Deeply kernel integrated tools could come in the initrd and could be 
offered by the kernel, statically linked images made available via 
/proc/sbin or such. We could even swap them out on demand so there's 
no RAM overhead. There's no technical barrier.

I'd even argue that that C library is obviously something the kernel 
should offer as well - so klibc is the way to go and would help us 
further streamline this and keep Linux quality high.

We could actually keep the kernel and such tools tightly integrated, 
reducing the compatibility matrix. The kernel would upgrade with 
these tools but it *already* upgrades with some user-space components 
like the vdso so it's not a true technical barrier.

 The other worry which I've mentioned, but which I haven't seen 
 addressed, is that the even if you can use a perf from a newer 
 kernel with an older kernel, this causes distributions a huge 
 amount of pain, since they have to package two different kernel 
 source packages, and only compile perf from the newer kernel source 
 package.  This leads to all sorts of confusion from a distribution 
 packaging point of view.
 
 For example, assume that RHEL 5, which is using 2.6.32 or something 
 like that, wants to use a newer e2fsck that does a better job 
 fixing file system corruptions. [...]

Firstly, it's not a big issue: if a tool comes with the kernel 
package then it's part of the regular backporting flow: if you 
backport a new tool to an old kernel then you do the same as if you 
backported a new kernel feature to an older enterprise kernel. 
Happens all the time, it's a technological problem with technological 
solutions. Enterprise distros explicitly do not support 
cross-distro-version package installs, so backporting will be done 
anyway.

Secondly, i actually think that the obsession with using obsolete 
kernel versions is silly technologically - and it has evolved that 
way partly *BECAUSE* we are not integrated enough and distros fear 
kernel

Re: [F.A.Q.] the advantages of a shared tool/kernel Git repository, tools/perf/ and tools/kvm/

2011-11-09 Thread Ingo Molnar

* Ted Ts'o ty...@mit.edu wrote:

 On Tue, Nov 08, 2011 at 07:14:57PM +0200, Anca Emanuel wrote:

  @Ten Ts'o: you are sponsored by something like microsoft (joking) 
  ? Stop trolling. If you are not familiar with perf, or other 
  tools, save your time and do some useful things.
 
 I am quite familiar with perf.  A disagreement with how things are 
 done is not trolling.

Anca, Ted is not trolling me in any fashion. He is a (very 
successful) tool space and kernel developer and his opinion and 
experience about how tools should interact with the kernel project is 
of utmost importance.

Clearly Ted thinks that filesystem tools should stay separate from 
the kernel repo. I agree with him that the case for filesystem tool 
integration is weaker than for deeply kernel integrated tools such as 
perf or kvm and calling him a troll is not a way to settle that 
honest disagreement in any case.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [F.A.Q.] the advantages of a shared tool/kernel Git repository, tools/perf/ and tools/kvm/

2011-11-09 Thread Ingo Molnar

* John Kacur jka...@redhat.com wrote:

 On Tue, 8 Nov 2011, Ted Ts'o wrote:
 
  On Tue, Nov 08, 2011 at 01:55:09PM +0100, Ingo Molnar wrote:

   I guess you can do well with a split project as well - my main 
   claim is that good compatibility comes *naturally* with 
   integration.
  
  Here I have to disagree; my main worry is that integration makes 
  it *naturally* easy for people to skip the hard work needed to 
  keep a stable kernel/userspace interface.
  
  The other worry which I've mentioned, but which I haven't seen 
  addressed, is that the even if you can use a perf from a newer 
  kernel with an older kernel, this causes distributions a huge 
  amount of pain, since they have to package two different kernel 
  source packages, and only compile perf from the newer kernel 
  source package.  This leads to all sorts of confusion from a 
  distribution packaging point of view.
  
  For example, assume that RHEL 5, which is using 2.6.32 or 
  something like that, wants to use a newer e2fsck that does a 
  better job fixing file system corruptions.  If it were bundled 
  with the kernel, then they would have to package up the v3.1 
  kernel sources, and have a source RPM that isn't used for 
  building kernel sources, but just to build a newer version of 
  e2fsck.  Fortunately, they don't have to do that.  They just pull 
  down a newer version of e2fsprogs, and package, build, test, and 
  ship that.
  
  In addition, suppose Red Hat ships a security bug fix which means 
  a new kernel-image RPM has to be shipped.  Does that mean that 
  Red Hat has to ship new binary RPM's for any and all tools/* 
  programs that they have packaged as separate RPM's?  Or should 
  installing a new kernel RPM also imply dropping new binaries in 
  /usr/bin/perf, et. al? There are all sorts of packaging questions 
  that are raised integration, and from where I sit I don't think 
  they've been adequately solved yet.
 
  
 This in practice is not a big deal.
 
 There are many approaches for how the RPM can be built, but basically
 getting the perf source is just a matter of
 make perf-tar-src-pkg or friends such as
 make perf-tarbz2-src-pkg
 which will create perf-3.2.0-rc1.tar, and perf-3.2.0-rc1.tar.bz2
 respectively which can be used for the src rpms. This tar ball can be used
 as a separate package or subpackage.

Great - the 'perf is impossible for distros' was a common counter 
argument early in the perf project's lifetime - i'm glad it turned 
out to be bogus in practice.

Would it further simplify distro side life if all utilities deeply 
related to the kernel got built together and came in a single well 
working package? kutils-3.2.0-rc1.rpm or such.

They would always upgrade together with the kernel so there would 
never be any forced backporting or separate errata pressure, beyond 
the existing flow of -stable fixes.

We do -stable fixes for tools/perf/ as well, for stability/security 
fixes, naturally - other tools would have to follow the regular 
kernel maintenance process to manage high priority fixes.

Basically distros could rely on the kernel and its utilities being a 
coherent whole, which is expected to work together, which is 
maintained and built together and which, if it regresses, is handled 
by the regular -stable kernel regressions process with high priority.

I expect it would grow one by one - it's not like we can or want to 
force utilities to go into the kernel proper. I'd also expect that 
new tools would be added initially - not existing ones moved. My 
question to you would rather be, would it make the life of distro 
release engineers gradually easier if this space grew gradually over 
the years, adding more and more critical tool functionality?

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [F.A.Q.] the advantages of a shared tool/kernel Git repository, tools/perf/ and tools/kvm/

2011-11-09 Thread Ingo Molnar

* Gerd Hoffmann kra...@redhat.com wrote:

  For reference, the default set of colors now is (from
  tools/perf/util/ui/browser.c):
  
  static struct ui_browser__colorset {
  const char *name, *fg, *bg;
  int colorset;
  } ui_browser__colorsets[] = {
  {
  .colorset = HE_COLORSET_TOP,
  .name = top,
  .fg   = red,
  .bg   = default,
 
 Bad idea IMO.  Setting only one of foreground+background gives 
 pretty much unpredictable results.  My xterms have different 
 background colors, the ones with a root shell happen to have a 
 (dark) red background. Which results in red-on-dark-red text.  Not 
 good.
 
 I'd strongly suggest to either set both background and foreground 
 to default or to set both to a specific color.  When doing the 
 latter make sure the colors have enougth contrast so they are 
 readable.

Indeed.

What we want to have is to have a set of distinctive colors - just 
two (background, foreground) colors are not enough - we also need 
colors to highlight certain information - we need 5-6 colors for the 
output to be maximally expressive. Is there a canonical way to handle 
that while still adapting to user preferences automatically by taking 
background/foreground color scheme of the xterm into account?

I suspect to fix the worst of the fallout we could add some logic to 
detect low contrast combinations (too low color distance) and fall 
back to the foreground/background colors in that case.

Plus allowing full .perfconfig configurability of all the relevant 
colors, for those with special taste.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [F.A.Q.] the advantages of a shared tool/kernel Git repository, tools/perf/ and tools/kvm/

2011-11-09 Thread Ingo Molnar

* Arnaldo Carvalho de Melo a...@redhat.com wrote:

  sure the colors have enougth contrast so they are readable.
 
 Problem is figuring out something that is considered a good default 
 :-\ There will always be somebody that will complain.
 
 When doing the coding to allow using the default xterm colors I 
 tried several of the gnome-terminal xterm profiles and all looked 
 kinda sane for the top (hottest functions, with most hits) and 
 medium lines, where we combine some chosen foreground color 
 (red and green).
 
 Laziest solution would be: If the user customizes that much, could 
 the user please customize this as well? :-)

I don't think it's acceptable to output unreadable color combinations 
(red on dark red, etc.) in any case, so we should add some safety 
mechanism that detects bad color combinations and a fallback, static 
color scheme.

I like the current way how perf top/report adapts to the xterm color 
scheme. I use it both on dark and white backgrounds and it's easy to 
mistake it for --stdio output - which is good, a good TUI should 
blend into the console's color scheme. So i think we should keep that 
and just detect the few cases where it results in something 
unreadable.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [F.A.Q.] the advantages of a shared tool/kernel Git repository, tools/perf/ and tools/kvm/

2011-11-09 Thread Ingo Molnar

* Steven Rostedt rost...@goodmis.org wrote:

 On Tue, Nov 08, 2011 at 10:32:25AM +0100, Ingo Molnar wrote:
  
  None of the perf developers with whom i'm working complained 
  about the shared repo so far - publicly or privately. By all 
  means they are enjoying it and if you look at the stats and 
  results you'll agree that they are highly productive working in 
  that environment.
 
 Just because you brought it up.
 
 I personally find it awkward to work in the linux tools directory. 
 Maybe this is the reason that I haven't been such a big contributor 
 of perf. [...]

Well, this is an argument with a long history we've had from the 
moment we started perf - i think the main underlying reason for that 
is that you still see perf as competition to ftrace instead of seeing 
perf the child of ftrace, the next version of ftrace, the next 
iterative step of evolution :-/

Unfortunately there's not much that i can do about that beyond 
telling you that you are IMHO wrong - you as the main ftrace 
developer thinking that it's competition is a self-fulfilling 
expectation.

Eventually someone will do the right thing and implement 'perf trace' 
(there's still the tip:tmp.perf/trace2 prototype branch) and users 
will flock to that workflow because it's so much more intuitive in 
practice. From what i've seen from the short prototype experiments 
i've conducted it's a no-brainer superior workflow and design.

 [...] I only pushed ktest into the kernel tools directory because 
 people convinced me to do so. Having it there didn't seem to bring 
 in many other developers. [...]

It was somewhat similar with perf - contributors only arrived after 
it went upstream, and even then with a delay of a few releases.

Also, and it pains me to have to mention it, but putting a .pl script 
into the kernel repo is not necessarily a reciepe for attracting a 
lot of developers. We went to great lengths to kill the .cc perf 
report file in perf, to keep the programming environment familiar to 
kernel developers and other low level utility folks.

Also, obviously a tool has to be important, interesting and has to 
offer a distinct edge over other tools to attract contributors. Maybe 
tools/testing/ktest/ does not sound that interesting? Naming also 
matters: i sure would have moved it to tools/ktest/, its name already 
suggests that it's about testing, why repeat that twice? Sounds 
weird.

In that sense tools/kvm/ is better than perf: it has already 
attracted a core group of good, productive contributors despite still 
being an out of tree fork.

The point here was that Pekka  co not just clearly enjoys working on 
tools/kvm/ and has no trouble attracting contributors, but also 
*relies* on it being in the kernel tree.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [F.A.Q.] the advantages of a shared tool/kernel Git repository, tools/perf/ and tools/kvm/

2011-11-09 Thread Ingo Molnar

* Arnaldo Carvalho de Melo a...@redhat.com wrote:

 Em Wed, Nov 09, 2011 at 10:30:50AM -0200, Arnaldo Carvalho de Melo escreveu:
  Em Wed, Nov 09, 2011 at 01:26:34PM +0100, Gerd Hoffmann escreveu:
Its fully configurable as of now, what we need is a set of .perfconfigs
that show how people think its better, we try it, set it as the default,
leave the others in tools/perf/Documentation/perfconfig/color.examples.
 
   Yep, a set of examples works too.
 
   The colors are not fully configurable yet though.  First, when switching
   all five colorsets to default, default there are still things which
   are colored (top bar, bottom bar, keys help display).  Second there is
   no way to set terminal attributes (i.e. top = bold or selected =
   reverse).
 
  Ok, adding those to my TODO list.
 
  /me goes to check if http://perf.wiki.kernel.org is back working so that
  we can have a _public_ TODO list, perhaps it may attract more
  contributors :)
 
 Oops, there is one, utterly old tho ;-\
 
 I tried changing that and adding this entry but:
 
 https://perf.wiki.kernel.org/articles/u/s/e/Special~UserLogin_94cd.html
 
 Returns:
 
 The requested URL /articles/u/s/e/Special~UserLogin_94cd.html was not
 found on this server.
 
 Ingo, would that G+ page be useful for that?

Not sure - i think perf.wiki.kernel.org is a good place for 
documentation kind of information. The G+ page is more like for news 
items.

Thanks,

Ingo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   4   5   6   7   >