Re: [PATCH RFC 0/2] kvm: Improving directed yield in PLE handler

2012-07-09 Thread Christian Borntraeger
On 09/07/12 08:20, Raghavendra K T wrote:
> Currently the Pause Loop Exit (PLE) handler does a directed yield to a
> random VCPU on PL exit. Though we already have filtering while choosing
> the candidate to yield_to, we can do better.
> 
> Problem is, for large vcpu guests, we have a higher probability of yielding
> to a bad vcpu. We are not able to prevent a directed yield to the same vcpu
> that did a PL exit recently, and which perhaps spins again and wastes CPU.
> 
> Fix that by keeping track of who has done a PL exit. The algorithm in this
> series gives a chance to a VCPU which has:


We could do the same for s390. The appropriate exit would be diag44 (yield to
hypervisor).

Almost all s390 kernels use diag9c (directed yield to a given guest cpu) for
spinlocks, though. So there is no win here, but there are other cases where
diag44 is used, e.g. cpu_relax. I have to double check with others whether
these cases are critical, but for now it seems that your dummy implementation
for s390 is just fine. After all, it is a no-op until we implement something.
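For readers following along, the filtering idea in the series can be sketched as a candidate scan that skips vCPUs flagged as having pause-loop-exited recently. This is a standalone illustration only — the struct, field, and function names are invented for the sketch and are not the real KVM yield_to code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: remember which vCPUs recently took a pause-loop
 * exit and skip them when picking a directed-yield target, since they are
 * likely spinning on a lock themselves.  Illustrative names throughout. */
struct vcpu {
    int id;
    bool in_spin_loop;   /* set on PL exit, cleared once it runs again */
};

static struct vcpu *pick_yield_target(struct vcpu *vcpus, size_t n,
                                      struct vcpu *self)
{
    for (size_t i = 0; i < n; i++) {
        struct vcpu *v = &vcpus[i];
        if (v == self || v->in_spin_loop)
            continue;           /* probably spinning too - skip it */
        return v;               /* first eligible candidate */
    }
    return NULL;                /* nobody worth yielding to */
}
```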

Thanks

Christian

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ehci: Kick async schedule on wakeup in the non companion case

2012-07-09 Thread Gerd Hoffmann
On 07/06/12 16:53, Hans de Goede wrote:
> Commit 0f588df8b3688b00e77aabaa32e26ece5f19bd39 added code
> to ehci_wakeup to kick the async schedule on wakeup, but the else
> was positioned wrongly, making it trigger for devices which are routed
> to the companion rather than to the ehci controller itself.
> 
> This patch fixes this. Note that the "programming style" of using the
> return at the end of the companion block matches how the companion case
> is handled in the other port ops, and is done this way for consistency.

Added to usb patch queue.

thanks,
  Gerd



Issue with mouse-capture

2012-07-09 Thread Veruca Salt


I realise you guys are very busy, but I'm about to go into the qemu-kvm code
seriously for the first time. I have to try to fix an irritating little issue
whereby, on all settings (including maximised and no frame), our virtual machine
requires a manual click to capture the mouse.
We're using 32-bit XP and 64-bit Windows 7 on a 64-bit Linux host.

Before I start making an unholy mess of your code, can anybody point me in the
direction of the most likely areas to check?
I want to do this using 1.0; we aren't planning to move to 1.1 just yet, as the
USB changes would require extra integration and device-level
changes/installations to our VMs.

Look forward to hearing from you.

Simon O'Riordan


Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation

2012-07-09 Thread Alexander Graf


On 09.07.2012, at 07:13, Bhushan Bharat-R65777  wrote:

> 
> 
>> -Original Message-
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Saturday, July 07, 2012 1:21 PM
>> To: Wood Scott-B07421
>> Cc: Bhushan Bharat-R65777; kvm-...@vger.kernel.org; kvm@vger.kernel.org; 
>> Bhushan
>> Bharat-R65777
>> Subject: Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation
>> 
>> 
>> On 07.07.2012, at 01:37, Scott Wood wrote:
>> 
>>> On 07/06/2012 08:17 AM, Alexander Graf wrote:
 On 28.06.2012, at 08:17, Bharat Bhushan wrote:
> +/*
> + * The timer system can almost deal with LONG_MAX timeouts, except that
> + * when you get very close to LONG_MAX, the slack added can cause 
> overflow.
> + *
> + * LONG_MAX/2 is a conservative threshold, but it should be adequate for
> + * any realistic use.
> + */
> +#define MAX_TIMEOUT (LONG_MAX/2)
 
 Should this really be in kvm code?
>>> 
>>> It looks like we can use NEXT_TIMER_MAX_DELTA for this.
>>> 
> +mask = 1ULL << (63 - period);
> +tb = get_tb();
> +if (tb & mask)
> +nr_jiffies += mask;
 
 To be honest, you lost me here. nr_jiffies is jiffies, right? While
 mask is basically in timebase granularity. I suppose you're just
 reusing the variable here for no good reason - the compiler will
 gladly optimize things for you if you write things a bit more verbose
 :).
>>> 
>>> Probably due to the way do_div() works, but yeah, it's confusing.  Maybe
>>> something generic like "ticks", "interval", "remaining", etc. would be
>>> better, with a comment on the do_div saying it's converting timebase
>>> ticks into jiffies.
>> 
>> Well, you could start off with a variable "delta_tb", then do
>> 
>>  nr_jiffies = delta_tb;
>>  x = do_div(...);
>> 
>> and things would suddenly become readable :). Of course I don't object to
>> comments along the code either :).
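[For illustration, the readability suggestion above — name the timebase delta explicitly, then convert — might look like the following standalone sketch. Plain division stands in for the kernel's do_div(), and tb_ticks_per_jiffy (a real per-platform kernel value) is passed in as a parameter so the sketch is self-contained:]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: keep timebase ticks and jiffies in separately named
 * variables so the unit conversion is visible at a glance. */
static unsigned long tb_delta_to_jiffies(uint64_t delta_tb,
                                         uint64_t tb_ticks_per_jiffy)
{
    uint64_t nr_jiffies = delta_tb;      /* still timebase ticks here... */
    nr_jiffies /= tb_ticks_per_jiffy;    /* ...now jiffies (do_div in-kernel) */
    return (unsigned long)nr_jiffies;
}
```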
> 
> Ok
> 
>> 
>>> 
> +static void arm_next_watchdog(struct kvm_vcpu *vcpu)
> +{
> +unsigned long nr_jiffies;
> +
> +nr_jiffies = watchdog_next_timeout(vcpu);
> +if (nr_jiffies < MAX_TIMEOUT)
> +mod_timer(&vcpu->arch.wdt_timer, jiffies + nr_jiffies);
> +else
> +del_timer(&vcpu->arch.wdt_timer);
 
 Can you del a timer that's not armed? Could that ever happen in this case?
>>> 
>>> "del_timer() deactivates a timer - this works on both active and
>>> inactive timers."
>> 
>> Ah, good :).
>> 
>>> 
 Also, could the timer possibly be running somewhere, so do we need
>> del_timer_sync? Or don't we need to care?
>>> 
>>> This can be called in the context of the timer, so del_timer_sync()
>>> would hang.
>>> 
>>> As for what would happen if a caller from a different context were to
>>> race with a timer, I think you could end up with the timer armed based
>>> on an old TCR.  del_timer_sync() won't help though, unless you replace
>>> mod_timer() with del_timer_sync() plus add_timer() (with a check to see
>>> whether it's running in timer context).  A better solution is probably
>>> to use a spinlock in arm_next_watchdog().
>> 
>> Yup. Either way, we have a race that the guest might not expect.
> 
> Ok, will use spinlock in arm_next_watchdog().
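[The locking fix agreed on here can be sketched as follows — a standalone illustration in which a toy test-and-set lock stands in for the kernel spinlock, and the fields are simplified stand-ins for the real vcpu state:]

```c
#include <assert.h>
#include <stdatomic.h>

/* Sketch of the race fix: every decision about (re)arming the watchdog
 * reads TCR and calls mod_timer()/del_timer() under one lock, so a guest
 * TCR update racing with the timer callback cannot arm the timer from a
 * stale TCR.  Illustrative fields, not the kernel primitives. */
struct wdt_state {
    atomic_flag lock;      /* toy spinlock */
    unsigned int tcr;      /* guest timer control (watchdog config) */
    int timer_armed;       /* stands in for mod_timer()/del_timer() */
};

static void arm_next_watchdog(struct wdt_state *s)
{
    while (atomic_flag_test_and_set(&s->lock))
        ;                                  /* spin until we own the lock */
    s->timer_armed = (s->tcr != 0);        /* decision sees a consistent TCR */
    atomic_flag_clear(&s->lock);
}
```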
> 
>> 
>>> 
> +void kvmppc_watchdog_func(unsigned long data)
> +{
> +struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
> +u32 tsr, new_tsr;
> +int final;
> +
> +do {
> +new_tsr = tsr = vcpu->arch.tsr;
> +final = 0;
> +
> +/* Time out event */
> +if (tsr & TSR_ENW) {
> +if (tsr & TSR_WIS) {
> +new_tsr = (tsr & ~TCR_WRC_MASK) |
> +  (vcpu->arch.tcr & TCR_WRC_MASK);
> +vcpu->arch.tcr &= ~TCR_WRC_MASK;
> +final = 1;
> +} else {
> +new_tsr = tsr | TSR_WIS;
> +}
> +} else {
> +new_tsr = tsr | TSR_ENW;
> +}
> +} while (cmpxchg(&vcpu->arch.tsr, tsr, new_tsr) != tsr);
> +
> +if (new_tsr & (TSR_WIS | TCR_WRC_MASK)) {
> +smp_wmb();
> +kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
> +kvm_vcpu_kick(vcpu);
> +}
> +
> +/*
> + * Avoid getting a storm of timers if the guest sets
> + * the period very short.  We'll restart it if anything
> + * changes.
> + */
> +if (!final)
> +arm_next_watchdog(vcpu);
 
 Mind to explain this part a bit further?
>>> 
>>> The whole function, or some subset near the end?
>>> 
>>> The "if (!final)" check means that we stop running the timer after final
>>> expiration, to prevent the host from being flooded with timers if the
>>> guest sets a short period but does not have TCR set to exit to QEMU.
>>> Timers will resume the next time TSR/TCR is updated.
>> 
>> Ah. The semantics m

[PATCH 2/2] KVM: X86: introduce set_mmio_exit_info

2012-07-09 Thread Xiao Guangrong
Introduce set_mmio_exit_info to clean up the common code

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/x86.c |   33 +
 1 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7445545..7771f45 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3755,9 +3755,6 @@ static int read_exit_mmio(struct kvm_vcpu *vcpu, gpa_t 
gpa,
 static int write_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
   void *val, int bytes)
 {
-   struct kvm_mmio_fragment *frag = &vcpu->mmio_fragments[0];
-
-   memcpy(vcpu->run->mmio.data, frag->data, frag->len);
return X86EMUL_CONTINUE;
 }

@@ -3825,6 +3822,20 @@ mmio:
return X86EMUL_CONTINUE;
 }

+static void set_mmio_exit_info(struct kvm_vcpu *vcpu,
+  struct kvm_mmio_fragment *frag, bool write)
+{
+   struct kvm_run *run = vcpu->run;
+
+   run->exit_reason = KVM_EXIT_MMIO;
+   run->mmio.phys_addr = frag->gpa;
+   run->mmio.len = frag->len;
+   run->mmio.is_write = vcpu->mmio_is_write = write;
+
+   if (write)
+   memcpy(run->mmio.data, frag->data, frag->len);
+}
+
 int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
void *val, unsigned int bytes,
struct x86_exception *exception,
@@ -3864,14 +3875,10 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, 
unsigned long addr,
return rc;

gpa = vcpu->mmio_fragments[0].gpa;
-
vcpu->mmio_needed = 1;
vcpu->mmio_cur_fragment = 0;

-   vcpu->run->mmio.len = vcpu->mmio_fragments[0].len;
-   vcpu->run->mmio.is_write = vcpu->mmio_is_write = ops->write;
-   vcpu->run->exit_reason = KVM_EXIT_MMIO;
-   vcpu->run->mmio.phys_addr = gpa;
+   set_mmio_exit_info(vcpu, &vcpu->mmio_fragments[0], ops->write);

return ops->read_write_exit_mmio(vcpu, gpa, val, bytes);
 }
@@ -5490,7 +5497,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
  */
 static int complete_mmio(struct kvm_vcpu *vcpu)
 {
-   struct kvm_run *run = vcpu->run;
struct kvm_mmio_fragment *frag;
int r;

@@ -5501,7 +5507,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
/* Complete previous fragment */
frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment++];
if (!vcpu->mmio_is_write)
-   memcpy(frag->data, run->mmio.data, frag->len);
+   memcpy(frag->data, vcpu->run->mmio.data, frag->len);
if (vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments) {
vcpu->mmio_needed = 0;
if (vcpu->mmio_is_write)
@@ -5511,12 +5517,7 @@ static int complete_mmio(struct kvm_vcpu *vcpu)
}
/* Initiate next fragment */
++frag;
-   run->exit_reason = KVM_EXIT_MMIO;
-   run->mmio.phys_addr = frag->gpa;
-   if (vcpu->mmio_is_write)
-   memcpy(run->mmio.data, frag->data, frag->len);
-   run->mmio.len = frag->len;
-   run->mmio.is_write = vcpu->mmio_is_write;
+   set_mmio_exit_info(vcpu, frag, vcpu->mmio_is_write);
return 0;

}
-- 
1.7.7.6



[PATCH 1/2] KVM: X86: remove read buffer for mmio read

2012-07-09 Thread Xiao Guangrong
After commit f78146b0f9230765c6315b2e14f56112513389ad:

 KVM: Fix page-crossing MMIO

MMIO that are split across a page boundary are currently broken - the
code does not expect to be aborted by the exit to userspace for the
first MMIO fragment.

This patch fixes the problem by generalizing the current code for handling
16-byte MMIOs to handle a number of "fragments", and changes the MMIO
code to create those fragments.

Signed-off-by: Avi Kivity 
Signed-off-by: Marcelo Tosatti 

Multiple MMIO reads can be merged into mmio_fragments, so the read buffer is
no longer needed.

Signed-off-by: Xiao Guangrong 
---
 arch/x86/include/asm/kvm_emulate.h |1 -
 arch/x86/kvm/emulate.c |   43 ---
 arch/x86/kvm/x86.c |2 -
 3 files changed, 5 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 1ac46c22..339d7c6 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -286,7 +286,6 @@ struct x86_emulate_ctxt {
struct operand *memopp;
struct fetch_cache fetch;
struct read_cache io_read;
-   struct read_cache mem_read;
 };

 /* Repeat String Operation Prefix */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f95d242..aa455da 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1128,33 +1128,6 @@ static void fetch_bit_operand(struct x86_emulate_ctxt 
*ctxt)
ctxt->src.val &= (ctxt->dst.bytes << 3) - 1;
 }

-static int read_emulated(struct x86_emulate_ctxt *ctxt,
-unsigned long addr, void *dest, unsigned size)
-{
-   int rc;
-   struct read_cache *mc = &ctxt->mem_read;
-
-   while (size) {
-   int n = min(size, 8u);
-   size -= n;
-   if (mc->pos < mc->end)
-   goto read_cached;
-
-   rc = ctxt->ops->read_emulated(ctxt, addr, mc->data + mc->end, n,
- &ctxt->exception);
-   if (rc != X86EMUL_CONTINUE)
-   return rc;
-   mc->end += n;
-
-   read_cached:
-   memcpy(dest, mc->data + mc->pos, n);
-   mc->pos += n;
-   dest += n;
-   addr += n;
-   }
-   return X86EMUL_CONTINUE;
-}
-
 static int segmented_read(struct x86_emulate_ctxt *ctxt,
  struct segmented_address addr,
  void *data,
@@ -1166,7 +1139,9 @@ static int segmented_read(struct x86_emulate_ctxt *ctxt,
rc = linearize(ctxt, addr, size, false, &linear);
if (rc != X86EMUL_CONTINUE)
return rc;
-   return read_emulated(ctxt, linear, data, size);
+
+   return ctxt->ops->read_emulated(ctxt, linear, data, size,
+   &ctxt->exception);
 }

 static int segmented_write(struct x86_emulate_ctxt *ctxt,
@@ -4122,8 +4097,6 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
int rc = X86EMUL_CONTINUE;
int saved_dst_type = ctxt->dst.type;

-   ctxt->mem_read.pos = 0;
-
if (ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) {
rc = emulate_ud(ctxt);
goto done;
@@ -4364,15 +4337,9 @@ writeback:
 * or, if it is not used, after each 1024 iteration.
 */
if ((r->end != 0 || ctxt->regs[VCPU_REGS_RCX] & 0x3ff) 
&&
-   (r->end == 0 || r->end != r->pos)) {
-   /*
-* Reset read cache. Usually happens before
-* decode, but since instruction is restarted
-* we have to do it here.
-*/
-   ctxt->mem_read.end = 0;
+   (r->end == 0 || r->end != r->pos))
return EMULATION_RESTART;
-   }
+
goto done; /* skip rip writeback */
}
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a01a424..7445545 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4399,8 +4399,6 @@ static void init_decode_cache(struct x86_emulate_ctxt 
*ctxt,
ctxt->fetch.end = 0;
ctxt->io_read.pos = 0;
ctxt->io_read.end = 0;
-   ctxt->mem_read.pos = 0;
-   ctxt->mem_read.end = 0;
 }

 static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
-- 
1.7.7.6



Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation

2012-07-09 Thread Alexander Graf

On 09.07.2012, at 08:43, Bhushan Bharat-R65777 wrote:

>>> +
>>> +/*
>>> + * Return the number of jiffies until the next timeout.  If the
>>> +timeout is
>>> + * longer than MAX_TIMEOUT, we return MAX_TIMEOUT instead.
>>> + */
>>> +static unsigned long watchdog_next_timeout(struct kvm_vcpu *vcpu) {
>>> +   unsigned long long tb, mask, nr_jiffies = 0;
>> 
>> u64?
>> 
>>> +   u32 period = TCR_GET_FSL_WP(vcpu->arch.tcr);
>> 
>> Doesn't sound like something booke generic to me.
> 
> the name '*FSL*' does not look good, right?

It usually indicates that it's not booke generic :).

> 
>> 
>>> +#ifdef CONFIG_BOOKE
>>> +   ret = ret || (v->arch.tsr & TCR_WRC_MASK);
>> 
>> Please make this a callback. In a header if you think it's performance 
>> critical,
>> but I don't want to see #ifdef CONFIG_BOOKE too often in powerpc.c.
> 
> Not sure: do you mean something like this:
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
> b/arch/powerpc/include/asm/kvm_book3s.h
> index f0e0c6a..7bbc6cd 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -446,6 +446,11 @@ static inline bool kvmppc_critical_section(struct 
> kvm_vcpu *vcpu)
> }
> #endif
> 
> +static inline u32 kvmppc_get_tsr_wrc(struct kvm_vcpu *vcpu)
> +{
> +   return 0;
> +}
> +
> /* Magic register values loaded into r3 and r4 before the 'sc' assembly
>  * instruction for the OSI hypercalls */
> #define OSI_SC_MAGIC_R30x113724FA
> diff --git a/arch/powerpc/include/asm/kvm_booke.h 
> b/arch/powerpc/include/asm/kvm_booke.h
> index b7cd335..e5b86c1 100644
> --- a/arch/powerpc/include/asm/kvm_booke.h
> +++ b/arch/powerpc/include/asm/kvm_booke.h
> @@ -100,4 +100,9 @@ static inline ulong kvmppc_get_msr(struct kvm_vcpu *vcpu)
> {
>return vcpu->arch.shared->msr;
> }
> +
> +static inline u32 kvmppc_get_tsr_wrc(struct kvm_vcpu *vcpu)
> +{
> +   return vcpu->arch.tsr & TCR_WRC_MASK;
> +}

No, I was more thinking along the lines of e500/440 here. Also it should simply 
be a generic callback, like

ret = kvmppc_core_check_runnable(vcpu, ret);

So that if 440 wants to do something special about modifying the runnable 
semantics, it can easily do so. Or we just move the whole function to core 
specific code.
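[The callback shape being suggested can be illustrated like this — a hedged, standalone sketch where all names are hypothetical rather than the real kvmppc interface, and the WRC mask value mirrors reg_booke.h but should be treated as illustrative:]

```c
#include <assert.h>
#include <stdbool.h>

#define TCR_WRC_MASK 0x30000000u  /* WRC bits, mirroring reg_booke.h */

/* Generic code asks a core-specific hook whether it wants to modify the
 * "runnable" result; cores without special needs leave the hook NULL. */
struct vcpu_core_ops {
    bool (*check_runnable)(unsigned int tsr, bool runnable);
};

/* booke: a pending final watchdog expiry (WRC latched into TSR) makes the
 * vcpu runnable so it can exit to userspace. */
static bool booke_check_runnable(unsigned int tsr, bool runnable)
{
    return runnable || (tsr & TCR_WRC_MASK) != 0;
}

static bool vcpu_runnable(const struct vcpu_core_ops *ops,
                          unsigned int tsr, bool runnable)
{
    return ops->check_runnable ? ops->check_runnable(tsr, runnable)
                               : runnable;
}
```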


Alex



Re: [PATCH] apic: fix kvm build on UP without IOAPIC

2012-07-09 Thread Ingo Molnar

* Michael S. Tsirkin  wrote:

> On Fri, Jul 06, 2012 at 04:12:23PM +0200, Ingo Molnar wrote:
> > 
> > * Marcelo Tosatti  wrote:
> > 
> > > On Fri, Jul 06, 2012 at 01:13:14PM +0200, Ingo Molnar wrote:
> > > > 
> > > > * H. Peter Anvin  wrote:
> > > > 
> > > > > On 07/01/2012 08:05 AM, Michael S. Tsirkin wrote:
> > > > > >On UP i386, when APIC is disabled
> > > > > ># CONFIG_X86_UP_APIC is not set
> > > > > ># CONFIG_PCI_IOAPIC is not set
> > > > > >
> > > > > >code looking at apicdrivers never has any effect but it
> > > > > >still gets compiled in. In particular, this causes
> > > > > >build failures with kvm, but it generally bloats the kernel
> > > > > >unnecessarily.
> > > > > >
> > > > > >Fix by defining both __apicdrivers and __apicdrivers_end
> > > > > >to be NULL when CONFIG_X86_LOCAL_APIC is unset: I verified
> > > > > >that as the result any loop scanning __apicdrivers gets optimized 
> > > > > >out by
> > > > > >the compiler.
> > > > > >
> > > > > >Warning: a .config with apic disabled doesn't seem to boot
> > > > > >for me (even without this patch). Still verifying why,
> > > > > >meanwhile this patch is compile-tested only.
> > > > > >
> > > > > >Signed-off-by: Michael S. Tsirkin 
> > > > > >---
> > > > > >
> > > > > >Note: if this patch makes sense, can x86 maintainers
> > > > > >please ACK applying it through the kvm tree, since that is
> > > > > >where we see the issue that it addresses?
> > > > > >Avi, Marcelo, maybe you can carry this in kvm/linux-next as a 
> > > > > >temporary
> > > > > >measure so that linux-next builds?
> > > > > >
> > > > > 
> > > > > I'm not happy about that as a workflow, but since you guys have an
> > > > > immediate problem I guess we can do that.
> > > > 
> > > > I'm rather unhappy about this workflow - we've got quite a few 
> > > > apic bits in the x86 tree this cycle as well and need extra 
> > > > external interaction, not.
> > > > 
> > > > Which KVM tree commit caused this, could someone please give a 
> > > > lkml link or quote it here? It's not referenced in the fix patch 
> > > > either.
> > > > 
> > > > Thanks,
> > > > 
> > > > Ingo
> > > 
> > > This tree (kvm.git next):
> > > 
> > > http://git.kernel.org/?p=virt/kvm/kvm.git;a=shortlog;h=refs/heads/next
> > > 
> > > Introduced by this commit:
> > > 
> > > http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=ab9cf4996bb989983e73da894b8dd0239aa2c3c2
> > 
> > This bit:
> > 
> > > + if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
> > > + struct apic **drv;
> > > +
> > > + for (drv = __apicdrivers; drv < __apicdrivers_end; drv++) {
> > > + /* Should happen once for each apic */
> > > + WARN_ON((*drv)->eoi_write == kvm_guest_apic_eoi_write);
> > > + (*drv)->eoi_write = kvm_guest_apic_eoi_write;
> > > + }
> > > + }
> > > +
> > 
> > is rather disgusting I have to say.
> > 
> > WTH is the KVM code meddling with core x86 apic driver data 
> > structures directly? At minimum factor this out and create a 
> > proper apic.c function which is EXPORT_SYMBOL_GPL() exported or 
> > so...
> > 
> > Thanks,
> > 
> > Ingo
> 
> OK, so apic_set_eoi_write()?

Yes, with a changelog comment analyzing the design decisions and 
locking here - what happens if actual APIC driver use races with 
this update on SMP, why is it all safe, etc?

Thanks,

Ingo
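[The accessor being asked for amounts to a single apic.c-owned setter that patches the eoi_write hook of every registered driver, instead of KVM walking the driver table itself. The following is a standalone sketch under that assumption — the local driver table stands in for __apicdrivers/__apicdrivers_end, and the driver names are placeholders:]

```c
#include <assert.h>
#include <stddef.h>

struct apic {
    void (*eoi_write)(unsigned int reg, unsigned int val);
};

static void native_eoi(unsigned int reg, unsigned int val) { (void)reg; (void)val; }
static void kvm_pv_eoi(unsigned int reg, unsigned int val) { (void)reg; (void)val; }

/* Stand-in for the __apicdrivers..__apicdrivers_end linker section. */
static struct apic apic_flat     = { native_eoi };
static struct apic apic_physflat = { native_eoi };
static struct apic *apicdrivers[] = { &apic_flat, &apic_physflat };

/* The exported setter: callers never touch driver structs directly. */
static void apic_set_eoi_write(void (*eoi)(unsigned int, unsigned int))
{
    size_t i;
    for (i = 0; i < sizeof(apicdrivers) / sizeof(apicdrivers[0]); i++)
        apicdrivers[i]->eoi_write = eoi;   /* patch every registered driver */
}
```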


[PATCH 2/2 v2] KVM: PPC: booke: Add watchdog emulation

2012-07-09 Thread Bharat Bhushan
This patch adds watchdog emulation to KVM. The watchdog
emulation is enabled by the KVM_ENABLE_CAP(KVM_CAP_PPC_WDT) ioctl.
A kernel timer is used for the watchdog emulation and emulates the
h/w watchdog state machine. On watchdog timer expiry, it exits to QEMU
if TCR.WRC is non-zero. QEMU can then reset/shutdown etc. depending on how
it is configured.
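[The h/w state machine being emulated can be sketched in isolation: the first timer expiry sets TSR[ENW], the second sets TSR[WIS], and the third is the "final" expiry that latches TCR[WRC] into TSR so the configured reset action fires. This mirrors the kvmppc_watchdog_func logic discussed in this thread; the bit values follow reg_booke.h but should be treated as illustrative:]

```c
#include <assert.h>

#define TSR_ENW      0x80000000u  /* Enable Next Watchdog */
#define TSR_WIS      0x40000000u  /* WDT Interrupt Status */
#define TCR_WRC_MASK 0x30000000u  /* WDT Reset Control */

/* One watchdog expiry step; returns nonzero on the final expiry. */
static int watchdog_expiry(unsigned int *tsr, unsigned int tcr)
{
    if (!(*tsr & TSR_ENW)) {
        *tsr |= TSR_ENW;                 /* 1st expiry: arm next stage */
        return 0;
    }
    if (!(*tsr & TSR_WIS)) {
        *tsr |= TSR_WIS;                 /* 2nd expiry: raise interrupt status */
        return 0;
    }
    /* 3rd expiry: latch the guest's WRC setting into TSR (final). */
    *tsr = (*tsr & ~TCR_WRC_MASK) | (tcr & TCR_WRC_MASK);
    return 1;
}
```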

Signed-off-by: Liu Yu 
Signed-off-by: Scott Wood 
Signed-off-by: Bharat Bhushan 
---
 -v2:
  - Addressed the review comments 

 arch/powerpc/include/asm/kvm_book3s.h |5 ++
 arch/powerpc/include/asm/kvm_booke.h  |5 ++
 arch/powerpc/include/asm/kvm_host.h   |3 +
 arch/powerpc/include/asm/kvm_ppc.h|3 +
 arch/powerpc/include/asm/reg_booke.h  |7 ++
 arch/powerpc/kvm/44x.c|1 +
 arch/powerpc/kvm/booke.c  |  128 +
 arch/powerpc/kvm/booke_emulate.c  |8 ++
 arch/powerpc/kvm/powerpc.c|   31 +++-
 include/linux/kvm.h   |2 +
 10 files changed, 189 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index f0e0c6a..7bbc6cd 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -446,6 +446,11 @@ static inline bool kvmppc_critical_section(struct kvm_vcpu 
*vcpu)
 }
 #endif
 
+static inline u32 kvmppc_get_tsr_wrc(struct kvm_vcpu *vcpu)
+{
+   return 0;
+}
+
 /* Magic register values loaded into r3 and r4 before the 'sc' assembly
  * instruction for the OSI hypercalls */
 #define OSI_SC_MAGIC_R30x113724FA
diff --git a/arch/powerpc/include/asm/kvm_booke.h 
b/arch/powerpc/include/asm/kvm_booke.h
index b7cd335..e5b86c1 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -100,4 +100,9 @@ static inline ulong kvmppc_get_msr(struct kvm_vcpu *vcpu)
 {
return vcpu->arch.shared->msr;
 }
+
+static inline u32 kvmppc_get_tsr_wrc(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.tsr & TCR_WRC_MASK;
+}
 #endif /* __ASM_KVM_BOOKE_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 50ea12f..01047f4 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -467,6 +467,8 @@ struct kvm_vcpu_arch {
ulong fault_esr;
ulong queued_dear;
ulong queued_esr;
+   spinlock_t wdt_lock;
+   struct timer_list wdt_timer;
u32 tlbcfg[4];
u32 mmucfg;
u32 epr;
@@ -482,6 +484,7 @@ struct kvm_vcpu_arch {
u8 osi_needed;
u8 osi_enabled;
u8 papr_enabled;
+   u8 watchdog_enable;
u8 sane;
u8 cpu_type;
u8 hcall_needed;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 0124937..e5cf4b9 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -67,6 +67,7 @@ extern int kvmppc_emulate_mmio(struct kvm_run *run, struct 
kvm_vcpu *vcpu);
 extern void kvmppc_emulate_dec(struct kvm_vcpu *vcpu);
 extern u32 kvmppc_get_dec(struct kvm_vcpu *vcpu, u64 tb);
 extern void kvmppc_decrementer_func(unsigned long data);
+extern void kvmppc_watchdog_func(unsigned long data);
 extern int kvmppc_sanity_check(struct kvm_vcpu *vcpu);
 
 /* Core-specific hooks */
@@ -104,6 +105,8 @@ extern void kvmppc_core_queue_external(struct kvm_vcpu 
*vcpu,
struct kvm_interrupt *irq);
 extern void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu,
  struct kvm_interrupt *irq);
+extern void kvmppc_core_queue_watchdog(struct kvm_vcpu *vcpu);
+extern void kvmppc_core_dequeue_watchdog(struct kvm_vcpu *vcpu);
 
 extern int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
   unsigned int op, int *advance);
diff --git a/arch/powerpc/include/asm/reg_booke.h 
b/arch/powerpc/include/asm/reg_booke.h
index 2d916c4..e07e6af 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -539,6 +539,13 @@
 #define TCR_FIE 0x00800000 /* FIT Interrupt Enable */
 #define TCR_ARE 0x00400000 /* Auto Reload Enable */

+#ifdef CONFIG_E500
+#define TCR_GET_WP(tcr)  ((((tcr) & 0xC0000000) >> 30) | \
+                          (((tcr) & 0x1E0000) >> 15))
+#else
+#define TCR_GET_WP(tcr)  (((tcr) & 0xC0000000) >> 30)
+#endif
+
 /* Bit definitions for the TSR. */
 #define TSR_ENW 0x80000000 /* Enable Next Watchdog */
 #define TSR_WIS 0x40000000 /* WDT Interrupt Status */
diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c
index 50e7dbc..a213aba 100644
--- a/arch/powerpc/kvm/44x.c
+++ b/arch/powerpc/kvm/44x.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 
+#include "booke.h"
 #include "44x_tlb.h"
 #include "booke.h"
 
diff --git a/arch/powerpc/kvm/booke.c b/arch/po

Re: [PATCH 7/7 v6] deal with panicked event according to '-machine panic_action=action'

2012-07-09 Thread Wen Congyang
At 07/06/2012 07:09 PM, Jan Kiszka Wrote:
> On 2012-07-06 11:41, Wen Congyang wrote:
>> The action is the same as -onpanic parameter.
> 
> As explained in patch 5, now that we have a related device, this no
> longer needs to be a machine property.
> 
> What could be a machine property is enabling/disabling this device.
> That's probably useful as it uses a fixed PIO port that might conflict
> with (non-Linux) guest expectations and/or future device models.

Yes, it is very useful. I will do it.

Thanks
Wen Congyang
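[The option handling under discussion amounts to a string-to-enum lookup over "none|pause|poweroff|reset". The following is a hedged, standalone sketch of that shape — the real select_panicked_action lives elsewhere in this series and its actual signature and enum names may differ:]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for the series' select_panicked_action(). */
enum panic_action { PANIC_NONE, PANIC_PAUSE, PANIC_POWEROFF, PANIC_RESET };

static int select_panicked_action(const char *s, enum panic_action *out)
{
    static const struct { const char *name; enum panic_action a; } tbl[] = {
        { "none", PANIC_NONE },         { "pause", PANIC_PAUSE },
        { "poweroff", PANIC_POWEROFF }, { "reset", PANIC_RESET },
    };
    for (size_t i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
        if (strcmp(s, tbl[i].name) == 0) {
            *out = tbl[i].a;
            return 0;
        }
    }
    return -1;                      /* unknown action: caller prints an error */
}
```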

> 
> Jan
> 
>>
>> Signed-off-by: Wen Congyang 
>> ---
>>  qemu-config.c   |4 
>>  qemu-options.hx |4 +++-
>>  vl.c|7 +++
>>  3 files changed, 14 insertions(+), 1 deletions(-)
>>
>> diff --git a/qemu-config.c b/qemu-config.c
>> index 5c3296b..805e7c4 100644
>> --- a/qemu-config.c
>> +++ b/qemu-config.c
>> @@ -595,6 +595,10 @@ static QemuOptsList qemu_machine_opts = {
>>  .name = "dt_compatible",
>>  .type = QEMU_OPT_STRING,
>>  .help = "Overrides the \"compatible\" property of the dt root 
>> node",
>> +}, {
>> +.name = "panic_action",
>> +.type = QEMU_OPT_STRING,
>> +.help = "The action what QEMU will do when the guest is 
>> panicked",
>>  },
>>  { /* End of list */ }
>>  },
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 4a061bf..083a21d 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -33,7 +33,9 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>>  "property accel=accel1[:accel2[:...]] selects 
>> accelerator\n"
>>  "supported accelerators are kvm, xen, tcg (default: 
>> tcg)\n"
>>  "kernel_irqchip=on|off controls accelerated irqchip 
>> support\n"
>> -"kvm_shadow_mem=size of KVM shadow MMU\n",
>> +"kvm_shadow_mem=size of KVM shadow MMU\n"
>> +"panic_action=none|pause|poweroff|reset controls what 
>> QEmu\n"
>> +"will do when the guest is panicked",
>>  QEMU_ARCH_ALL)
>>  STEXI
>>  @item -machine [type=]@var{name}[,prop=@var{value}[,...]]
>> diff --git a/vl.c b/vl.c
>> index 1a68257..091c43b 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -2301,6 +2301,7 @@ int main(int argc, char **argv, char **envp)
>>  };
>>  const char *trace_events = NULL;
>>  const char *trace_file = NULL;
>> +const char *panic_action = NULL;
>>  
>>  atexit(qemu_run_exit_notifiers);
>>  error_set_progname(argv[0]);
>> @@ -3372,10 +3373,16 @@ int main(int argc, char **argv, char **envp)
>>  kernel_filename = qemu_opt_get(machine_opts, "kernel");
>>  initrd_filename = qemu_opt_get(machine_opts, "initrd");
>>  kernel_cmdline = qemu_opt_get(machine_opts, "append");
>> +panic_action = qemu_opt_get(machine_opts, "panic_action");
>>  } else {
>>  kernel_filename = initrd_filename = kernel_cmdline = NULL;
>>  }
>>  
>> +if (panic_action && select_panicked_action(panic_action) == -1) {
>> +fprintf(stderr, "Unknown -panic_action parameter\n");
>> +exit(1);
>> +}
>> +
>>  if (!kernel_cmdline) {
>>  kernel_cmdline = "";
>>  }
>>
> 



Re: [PATCH] add PLE stats to kvmstat

2012-07-09 Thread Avi Kivity
On 07/09/2012 09:42 AM, Xiao Guangrong wrote:
>> 
>> Note, kvm_stat is able to use both the old debugfs based statistics and
>> the tracepoint based events.  The latter requires root privileges.
>> 
> 
> Yes, but you are going to remove the KVM debugfs statistics. Do you
> plan to remove kvm_stat after the debugfs-based statistics are removed and
> kvm-events is merged?

Probably not for a while, it will still be useful for older kernels.

-- 
error compiling committee.c: too many arguments to function




qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6

2012-07-09 Thread Chris Clayton

Hi,

When I run WinXP SP3 through qemu-kvm-1.1.0 on linux kernel 3.5.0-rc6, I 
get a segmentation fault within 3 or 4 minutes maximum. In dmesg I see:


qemu-kvm: sending ioctl 5326 to a partition!
qemu-kvm: sending ioctl 801c0204 to a partition!
qemu-kvm: sending ioctl 5326 to a partition!
qemu-kvm: sending ioctl 801c0204 to a partition!
qemu-kvm: sending ioctl 5326 to a partition!
qemu-kvm: sending ioctl 801c0204 to a partition!
qemu-kvm: sending ioctl 5326 to a partition!
qemu-kvm: sending ioctl 801c0204 to a partition!
qemu-kvm[860] general protection ip:b6abad77 sp:b52ff09c error:0 in 
libc-2.16.so[b697d000+1b4000]


The crash does not occur with qemu-kvm-1.0.1 on rc6. Nor does it occur with 
qemu-kvm-1.0.1 or qemu-kvm-1.1.0 on kernel 3.4.4. All three combinations 
survive for 15 minutes or more.


When I try to get a backtrace with gdb, the screen on which konsole and 
qemu are running locks up until I kill qemu from another console. 
Consequently I can't get a full backtrace, but what I did get, although 
probably not very helpful, is:


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6946b40 (LWP 506)]
0xb7705d77 in __strcmp_sse4_2 () from /lib/libc.so.6
(gdb) bt
#0  0xb7705d77 in __strcmp_sse4_2 () from /lib/libc.so.6
#1  0xb7e8d6e3 in g_str_equal () from /usr/lib/libglib-2.0.so.0
#2  0xb7e8c94c in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
Cannot access memory at address 0xb694610c
(gdb)

Note that the gdb and dmesg outputs above are not from the same crash 
instance.


I'm not subscribed, so please cc me on any reply.

Happy to provide any additional diagnostics (but I may need help on how to 
get them) or test patches, etc.


Thanks

Chris Clayton


Re: [PATCH 1/2] KVM: X86: remove read buffer for mmio read

2012-07-09 Thread Avi Kivity
On 07/09/2012 12:02 PM, Xiao Guangrong wrote:
> After commit f78146b0f9230765c6315b2e14f56112513389ad:
> 
>  KVM: Fix page-crossing MMIO
> 
> MMIO that are split across a page boundary are currently broken - the
> code does not expect to be aborted by the exit to userspace for the
> first MMIO fragment.
> 
> This patch fixes the problem by generalizing the current code for handling
> 16-byte MMIOs to handle a number of "fragments", and changes the MMIO
> code to create those fragments.
> 
> Signed-off-by: Avi Kivity 
> Signed-off-by: Marcelo Tosatti 
> 
> Multiple MMIO reads can be merged into mmio_fragments, the read buffer is not
> needed anymore
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  arch/x86/include/asm/kvm_emulate.h |1 -
>  arch/x86/kvm/emulate.c |   43 ---
>  arch/x86/kvm/x86.c |2 -
>  3 files changed, 5 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_emulate.h 
> b/arch/x86/include/asm/kvm_emulate.h
> index 1ac46c22..339d7c6 100644
> --- a/arch/x86/include/asm/kvm_emulate.h
> +++ b/arch/x86/include/asm/kvm_emulate.h
> @@ -286,7 +286,6 @@ struct x86_emulate_ctxt {
>   struct operand *memopp;
>   struct fetch_cache fetch;
>   struct read_cache io_read;
> - struct read_cache mem_read;
>  };
> 
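
For readers of the archive, the "fragments" mentioned in the quoted commit
message can be pictured as splitting one guest-physical access at page
boundaries. A rough sketch with made-up addresses, not the actual KVM code:

```python
# Rough sketch (made-up addresses, not KVM code) of splitting a
# page-crossing MMIO access into per-page fragments, as described in the
# quoted commit message: each fragment stays within a single 4 KiB page
# and can be completed independently by userspace.

PAGE_SIZE = 4096

def mmio_fragments(gpa, length):
    """Split [gpa, gpa + length) into (gpa, len) pieces, one per page."""
    frags = []
    while length > 0:
        in_page = PAGE_SIZE - (gpa % PAGE_SIZE)  # room left in this page
        step = min(length, in_page)
        frags.append((gpa, step))
        gpa += step
        length -= step
    return frags

# A 16-byte access starting 8 bytes before a page boundary splits in two.
for gpa, length in mmio_fragments(0x1ff8, 16):
    print(hex(gpa), length)
# 0x1ff8 8
# 0x2000 8
```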

Suppose we have an RMW instruction.  On the first entry to
x86_emulate_insn() we'll drop to userspace and perform the read, and on the
second we'll read from the cache and complete the write.

Without the read cache this cannot work.

kvm-unit-tests.git has a test for xchg to mmio.  Does it still work?

I agree this code has to go, but it needs to be replaced by something.
Maybe a .valid flag in struct operand.
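
To make the restart problem concrete, here is a toy sketch (invented names,
not the real emulator) of why a completed-read cache is needed when an RMW
operand sits in MMIO:

```python
# Toy sketch (invented names, not the real KVM emulator) of the two-pass
# pattern: emulation of a read-modify-write instruction aborts to
# userspace to perform the MMIO read, and on re-entry it must consume
# the cached value instead of issuing the read a second time, so that
# the write half can complete.

class MMIOExit(Exception):
    """Emulation must pause so userspace can complete an MMIO read."""

class ToyEmulator:
    def __init__(self):
        self.read_cache = None        # plays the role of mem_read

    def read(self, addr):
        if self.read_cache is None:   # first pass: punt to userspace
            raise MMIOExit(addr)
        val, self.read_cache = self.read_cache, None
        return val                    # second pass: replay cached value

    def emulate_inc(self, addr, memory):
        val = self.read(addr)         # may raise on the first pass
        memory[addr] = val + 1        # the write half of the RMW

def run_to_completion(emu, addr, memory):
    while True:
        try:
            emu.emulate_inc(addr, memory)
            return
        except MMIOExit as e:
            # "userspace" performs the MMIO read and fills the cache
            emu.read_cache = memory[e.args[0]]

memory = {0xfee0: 41}
emu = ToyEmulator()
run_to_completion(emu, 0xfee0, memory)
print(memory[0xfee0])  # 42
```

Dropping the cache without a replacement would re-execute the read on the
second pass, which is exactly what the exit to userspace cannot tolerate.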

-- 
error compiling committee.c: too many arguments to function




Re: [PATCH 1/2] KVM: X86: remove read buffer for mmio read

2012-07-09 Thread Gleb Natapov
On Mon, Jul 09, 2012 at 02:15:37PM +0300, Avi Kivity wrote:
> On 07/09/2012 12:02 PM, Xiao Guangrong wrote:
> > After commit f78146b0f9230765c6315b2e14f56112513389ad:
> > 
> >  KVM: Fix page-crossing MMIO
> > 
> > MMIO that are split across a page boundary are currently broken - the
> > code does not expect to be aborted by the exit to userspace for the
> > first MMIO fragment.
> > 
> > This patch fixes the problem by generalizing the current code for 
> > handling
> > 16-byte MMIOs to handle a number of "fragments", and changes the MMIO
> > code to create those fragments.
> > 
> > Signed-off-by: Avi Kivity 
> > Signed-off-by: Marcelo Tosatti 
> > 
> > Multiple MMIO reads can be merged into mmio_fragments, the read buffer is 
> > not
> > needed anymore
> > 
> > Signed-off-by: Xiao Guangrong 
> > ---
> >  arch/x86/include/asm/kvm_emulate.h |1 -
> >  arch/x86/kvm/emulate.c |   43 
> > ---
> >  arch/x86/kvm/x86.c |2 -
> >  3 files changed, 5 insertions(+), 41 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/kvm_emulate.h 
> > b/arch/x86/include/asm/kvm_emulate.h
> > index 1ac46c22..339d7c6 100644
> > --- a/arch/x86/include/asm/kvm_emulate.h
> > +++ b/arch/x86/include/asm/kvm_emulate.h
> > @@ -286,7 +286,6 @@ struct x86_emulate_ctxt {
> > struct operand *memopp;
> > struct fetch_cache fetch;
> > struct read_cache io_read;
> > -   struct read_cache mem_read;
> >  };
> > 
> 
> Suppose we have a RMW instruction.  On the first entry to
> x86_emulate_insn() we'll drop to userspace and perform the read, and the
> seconds we'll read from the cache and complete the write.
> 
> Without the read cache this cannot work.
> 
The cache is needed to emulate instructions that need more than one read
that can go to MMIO.

> kvm-unit-tests.git has a test for xchg to mmio.  Does it still work?
> 
> I agree this code has to go, but it needs to be replaced by something.
> Maybe a .valid flag in struct operand.
> 
Valid will not be enough for that.

--
Gleb.


Re: [PATCH v2 1/3] KVM: Add new -cpu best

2012-07-09 Thread Alexander Graf

On 02.07.2012, at 16:25, Avi Kivity wrote:

> On 06/26/2012 07:39 PM, Alexander Graf wrote:
>> During discussions on whether to make -cpu host the default in SLE, I found
>> myself disagreeing to the thought, because it potentially opens a big can
>> of worms for potential bugs. But if I already am so opposed to it for SLE, 
>> how
>> can it possibly be reasonable to default to -cpu host in upstream QEMU? And
>> what would a sane default look like?
>> 
>> So I had this idea of looping through all available CPU definitions. We can
>> pretty well tell if our host is able to execute any of them by checking the
>> respective flags and seeing if our host has all features the CPU definition
>> requires. With that, we can create a -cpu type that would fall back to the
>> "best known CPU definition" that our host can fulfill. On my Phenom II
>> system for example, that would be -cpu phenom.
>> 
>> With this approach we can test and verify that CPU types actually work at
>> any random user setup, because we can always verify that all the -cpu types
>> we ship actually work. And we only default to some clever mechanism that
>> chooses from one of these.
>> 
>> 
>> +/* Are all guest feature bits present on the host? */
>> +static bool cpu_x86_feature_subset(uint32_t host, uint32_t guest)
>> +{
>> +int i;
>> +
>> +for (i = 0; i < 32; i++) {
>> +uint32_t mask = 1 << i;
>> +if ((guest & mask) && !(host & mask)) {
>> +return false;
>> +}
>> +}
>> +
>> +return true;
> 
>return !(guest & ~host);

I guess it helps to think :).
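
For onlookers, the two forms really are equivalent; a quick brute-force
check (arbitrary 32-bit values, nothing CPUID-specific about them):

```python
# Sanity check that the one-liner !(guest & ~host) matches the original
# bit-by-bit loop quoted above.  The feature words are arbitrary 32-bit
# values, not real CPUID masks.
import random

def subset_loop(host, guest):
    # the original loop: fail if any guest bit is missing from host
    for i in range(32):
        mask = 1 << i
        if (guest & mask) and not (host & mask):
            return False
    return True

def subset_mask(host, guest):
    # the simplified form: guest & ~host holds exactly the missing bits
    return not (guest & ~host)

for _ in range(10000):
    h, g = random.getrandbits(32), random.getrandbits(32)
    assert subset_loop(h, g) == subset_mask(h, g)
print("equivalent")
```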

> 
> 
>> +}
> 
> 
> 
>> +
>> +
>> +
>> +static void cpu_x86_fill_best(x86_def_t *x86_cpu_def)
>> +{
>> +x86_def_t *def;
>> +
>> +x86_cpu_def->family = 0;
>> +x86_cpu_def->model = 0;
>> +for (def = x86_defs; def; def = def->next) {
>> +if (cpu_x86_fits_host(def) && cpu_x86_fits_higher(def, 
>> x86_cpu_def)) {
>> +memcpy(x86_cpu_def, def, sizeof(*def));
>> +}
>  *x86_cpu_def = *def;
>> +}
>> +
>> +if (!x86_cpu_def->family && !x86_cpu_def->model) {
>> +fprintf(stderr, "No fitting CPU model found!\n");
>> +exit(1);
>> +}
>> +}
>> +
>> static int unavailable_host_feature(struct model_features_t *f, uint32_t 
>> mask)
>> {
>> int i;
>> @@ -878,6 +957,8 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, 
>> const char *cpu_model)
>> break;
>> if (kvm_enabled() && name && strcmp(name, "host") == 0) {
>> cpu_x86_fill_host(x86_cpu_def);
>> +} else if (kvm_enabled() && name && strcmp(name, "best") == 0) {
>> +cpu_x86_fill_best(x86_cpu_def);
>> } else if (!def) {
>> goto error;
>> } else {
>> 
> 
> Should we copy the cache size etc. from the host?

I don't think so. We should rather make sure we always have cpu descriptions 
available close to what people out there actually use.


Alex



Re: [PATCH 1/2] KVM: X86: remove read buffer for mmio read

2012-07-09 Thread Avi Kivity
On 07/09/2012 02:23 PM, Gleb Natapov wrote:
> 
>> kvm-unit-tests.git has a test for xchg to mmio.  Does it still work?
>> 
>> I agree this code has to go, but it needs to be replaced by something.
>> Maybe a .valid flag in struct operand.
>> 
> Valid will not enough for that.

If we make everything go through operands, any reason why not?

-- 
error compiling committee.c: too many arguments to function




Re: First shot at adding IPMI to qemu

2012-07-09 Thread Corey Minyard
I haven't heard anything about these patches.  Any comments, good or 
bad?  Has anyone tried these?


Thanks,

-corey

On 07/02/2012 02:44 PM, miny...@acm.org wrote:

I had asked about getting an IPMI device into qemu and received some
interest, and it's useful to me, so I've done some work to add it.
The following patch set adds an IPMI KCS
device, an IPMI BT device, a built-in BMC (IPMI management controller),
and a way to attach an external BMC through a chardev.

There was some discussion on whether to make the BMC internal or
external, but I went ahead and added both.  The internal one is
fairly basic and not extensible, at least without adding code.
I've modified the OpenIPMI library simulator to work with the
external interface to allow it to receive connections from the
qemu external simulator with a fairly basic protocol.

I've also added the ability for the OpenIPMI library to manage
a VM to power it on, power it off, reset it, and handle an IPMI
watchdog timer.  So it looks quite like a real system.  Instructions
for using it are in the OpenIPMI release candidate I uploaded to
https://sourceforge.net/projects/openipmi

Since IPMI can advertise its presence via SMBIOS, I added a
way for a driver to add an SMBIOS entry.  I also added a way
to query a free interrupt from the ISA bus, since the interrupt
is in the SMBIOS entry and nobody really cares which one is used.



Re: [PATCH 1/2] KVM: X86: remove read buffer for mmio read

2012-07-09 Thread Xiao Guangrong
On 07/09/2012 08:49 PM, Avi Kivity wrote:
> On 07/09/2012 02:23 PM, Gleb Natapov wrote:
>>
>>> kvm-unit-tests.git has a test for xchg to mmio.  Does it still work?
>>>
>>> I agree this code has to go, but it needs to be replaced by something.
>>> Maybe a .valid flag in struct operand.
>>>
>> Valid will not enough for that.
> 
> If we make everything go through operands, any reason why not?
> 

I noticed some instructions need to read ESP several times (e.g., iret_real);
maybe a .valid flag is not enough for this case if the stack is in MMIO, yes?

IIUC, I also noticed ESP is not reset back if emulation fails (when mmio is
needed). If the stack is located in an mmio region, this kind of instruction
will be broken. I know no guest will use mmio as a stack, but the SDM does
not forbid it, so is it valid?






Re: [PATCH 1/2] KVM: X86: remove read buffer for mmio read

2012-07-09 Thread Gleb Natapov
On Mon, Jul 09, 2012 at 03:49:05PM +0300, Avi Kivity wrote:
> On 07/09/2012 02:23 PM, Gleb Natapov wrote:
> > 
> >> kvm-unit-tests.git has a test for xchg to mmio.  Does it still work?
> >> 
> >> I agree this code has to go, but it needs to be replaced by something.
> >> Maybe a .valid flag in struct operand.
> >> 
> > Valid will not enough for that.
> 
> If we make everything go through operands, any reason why not?
> 
Ah, I missed the operand part; I was thinking of adding a valid flag to the
x86.c mmio read buffer. What about doing more complex things from MMIO, like
a task switch?

--
Gleb.


Re: [PATCH 1/2] KVM: X86: remove read buffer for mmio read

2012-07-09 Thread Gleb Natapov
On Mon, Jul 09, 2012 at 09:23:03PM +0800, Xiao Guangrong wrote:
> On 07/09/2012 08:49 PM, Avi Kivity wrote:
> > On 07/09/2012 02:23 PM, Gleb Natapov wrote:
> >>
> >>> kvm-unit-tests.git has a test for xchg to mmio.  Does it still work?
> >>>
> >>> I agree this code has to go, but it needs to be replaced by something.
> >>> Maybe a .valid flag in struct operand.
> >>>
> >> Valid will not enough for that.
> > 
> > If we make everything go through operands, any reason why not?
> > 
> 
> I noticed some instructions need to read ESP for many times (e.g, iret_real),
> maybe .valid flag is not enough for this case if the stack is in MMIO, yes?
> 
> IIUC, I also noticed ESP is not reset back if it is emulated fail (mmio is 
> needed).
> If the stack located in mmio region, this kind of instruct will be broken, i 
> know no
> guest will use mmio as stack but SDM does not limit it, is it valid?
> 
Good point about MMIO stack too.

--
Gleb.


[PATCH v3 0/3] Add -cpu best support

2012-07-09 Thread Alexander Graf
[resend, forgot the v3, sorry]

This patch set implements support for a sane default CPU type, that plays the
middle ground between -cpu host (unmanagable test matrix) and -cpu qemu64 
(breaks
assumptions wrt family/model numbers). It also makes it the default for -M pc,
so that users who don't specify a specific CPU type on the command line always
get a fast, yet reliable CPU type chosen for them.

For a detailed description, please see patch 1/3.

Alexander Graf (3):
  KVM: Add new -cpu best
  KVM: Use -cpu best as default on x86
  i386: KVM: List -cpu host and best in -cpu ?

 hw/pc_piix.c  |   34 +-
 target-i386/cpu.c |   79 +++--
 2 files changed, 102 insertions(+), 11 deletions(-)



[PATCH v3 1/3] KVM: Add new -cpu best

2012-07-09 Thread Alexander Graf
During discussions on whether to support -cpu host in SLE, I found myself
disagreeing to the thought, because it potentially opens a big can of worms
for potential bugs. But if I already am so opposed to it for SLE, how can
it possibly be reasonable to default to -cpu host in upstream QEMU? And what
would a sane default look like?

So I had this idea of looping through all available CPU definitions. We can
pretty well tell if our host is able to execute any of them by checking the
respective flags and seeing if our host has all features the CPU definition
requires. With that, we can create a -cpu type that would fall back to the
"best known CPU definition" that our host can fulfill. On my Phenom II
system for example, that would be -cpu phenom.

With this approach we can test and verify that CPU types actually work at
any random user setup, because we can always verify that all the -cpu types
we ship actually work. And we only default to some clever mechanism that
chooses from one of these.

Signed-off-by: Alexander Graf 

---

v2 -> v3:

  - clarify commit message
  - fix avi's style comments
---
 target-i386/cpu.c |   72 +
 1 files changed, 72 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 5521709..a5d9468 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -558,6 +558,76 @@ static int cpu_x86_fill_host(x86_def_t *x86_cpu_def)
 return 0;
 }
 
+/* Are all guest feature bits present on the host? */
+static bool cpu_x86_feature_subset(uint32_t host, uint32_t guest)
+{
+return !(guest & ~host);
+}
+
+/* Does the host support all the features of the CPU definition? */
+static bool cpu_x86_fits_host(x86_def_t *x86_cpu_def)
+{
+uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
+
+host_cpuid(0x0, 0, &eax, &ebx, &ecx, &edx);
+if (x86_cpu_def->level > eax) {
+return false;
+}
+if ((x86_cpu_def->vendor1 != ebx) ||
+(x86_cpu_def->vendor2 != edx) ||
+(x86_cpu_def->vendor3 != ecx)) {
+return false;
+}
+
+host_cpuid(0x1, 0, &eax, &ebx, &ecx, &edx);
+if (!cpu_x86_feature_subset(ecx, x86_cpu_def->ext_features) ||
+!cpu_x86_feature_subset(edx, x86_cpu_def->features)) {
+return false;
+}
+
+host_cpuid(0x80000000, 0, &eax, &ebx, &ecx, &edx);
+if (x86_cpu_def->xlevel > eax) {
+return false;
+}
+
+host_cpuid(0x80000001, 0, &eax, &ebx, &ecx, &edx);
+if (!cpu_x86_feature_subset(edx, x86_cpu_def->ext2_features) ||
+!cpu_x86_feature_subset(ecx, x86_cpu_def->ext3_features)) {
+return false;
+}
+
+return true;
+}
+
+/* Returns true when new_def is higher versioned than old_def */
+static int cpu_x86_fits_higher(x86_def_t *new_def, x86_def_t *old_def)
+{
+int old_fammod = (old_def->family << 24) | (old_def->model << 8)
+   | (old_def->stepping);
+int new_fammod = (new_def->family << 24) | (new_def->model << 8)
+   | (new_def->stepping);
+
+return new_fammod > old_fammod;
+}
+
+static void cpu_x86_fill_best(x86_def_t *x86_cpu_def)
+{
+x86_def_t *def;
+
+x86_cpu_def->family = 0;
+x86_cpu_def->model = 0;
+for (def = x86_defs; def; def = def->next) {
+if (cpu_x86_fits_host(def) && cpu_x86_fits_higher(def, x86_cpu_def)) {
+*x86_cpu_def = *def;
+}
+}
+
+if (!x86_cpu_def->family && !x86_cpu_def->model) {
+fprintf(stderr, "No fitting CPU model found!\n");
+exit(1);
+}
+}
+
 static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
 {
 int i;
@@ -878,6 +948,8 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, 
const char *cpu_model)
 break;
 if (kvm_enabled() && name && strcmp(name, "host") == 0) {
 cpu_x86_fill_host(x86_cpu_def);
+} else if (kvm_enabled() && name && strcmp(name, "best") == 0) {
+cpu_x86_fill_best(x86_cpu_def);
 } else if (!def) {
 goto error;
 } else {
-- 
1.6.0.2
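
A side note on cpu_x86_fits_higher in the patch above: packing
(family, model, stepping) into a single integer preserves tuple ordering,
assuming stepping fits in 8 bits and model in 16 bits, which holds for
real x86 CPU definitions. A quick check with arbitrary values:

```python
# Quick check that the (family << 24) | (model << 8) | stepping packing
# used by cpu_x86_fits_higher compares the same way as the underlying
# (family, model, stepping) tuples -- assuming stepping < 256 and
# model < 65536.  The sample values are arbitrary, not real CPU models.
import itertools

def pack(family, model, stepping):
    return (family << 24) | (model << 8) | stepping

samples = [(6, 15, 2), (6, 23, 1), (15, 4, 8), (16, 2, 3), (6, 15, 11)]
for a, b in itertools.combinations(samples, 2):
    assert (pack(*a) > pack(*b)) == (a > b)
print("packing preserves ordering")
```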



[PATCH v3 2/3] KVM: Use -cpu best as default on x86

2012-07-09 Thread Alexander Graf
When running QEMU without -cpu parameter, the user usually wants a sane
default. So far, we're using the qemu64/qemu32 CPU type, which basically
means "the maximum TCG can emulate".

That's a really good default when using TCG, but when running with KVM
we much rather want a default saying "the maximum performance I can get".

Fortunately we just added an option that gives us the best performance
while still staying safe on the testability side of things: -cpu best.
So all we need to do is make -cpu best the default when the user doesn't
explicitly specify a CPU type.

This fixes a lot of subtle breakage in the GNU toolchain (libgmp) which
hicks up on QEMU's non-existent CPU models.

This patch also adds a new pc-1.2 machine type to stay backwards compatible
with older versions of QEMU.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

  - rebase

v2 -> v3:

  - fix typo in commit message
---
 hw/pc_piix.c |   34 ++
 1 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 0c0096f..a955bf8 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -127,7 +127,8 @@ static void pc_init1(MemoryRegion *system_memory,
  const char *initrd_filename,
  const char *cpu_model,
  int pci_enabled,
- int kvmclock_enabled)
+ int kvmclock_enabled,
+ int may_cpu_best)
 {
 int i;
 ram_addr_t below_4g_mem_size, above_4g_mem_size;
@@ -149,6 +150,9 @@ static void pc_init1(MemoryRegion *system_memory,
 MemoryRegion *rom_memory;
 void *fw_cfg = NULL;
 
+if (!cpu_model && kvm_enabled() && may_cpu_best) {
+cpu_model = "best";
+}
 pc_cpus_init(cpu_model);
 
 if (kvmclock_enabled) {
@@ -298,7 +302,21 @@ static void pc_init_pci(ram_addr_t ram_size,
  get_system_io(),
  ram_size, boot_device,
  kernel_filename, kernel_cmdline,
- initrd_filename, cpu_model, 1, 1);
+ initrd_filename, cpu_model, 1, 1, 1);
+}
+
+static void pc_init_pci_oldcpu(ram_addr_t ram_size,
+   const char *boot_device,
+   const char *kernel_filename,
+   const char *kernel_cmdline,
+   const char *initrd_filename,
+   const char *cpu_model)
+{
+pc_init1(get_system_memory(),
+ get_system_io(),
+ ram_size, boot_device,
+ kernel_filename, kernel_cmdline,
+ initrd_filename, cpu_model, 1, 1, 0);
 }
 
 static void pc_init_pci_no_kvmclock(ram_addr_t ram_size,
@@ -312,7 +330,7 @@ static void pc_init_pci_no_kvmclock(ram_addr_t ram_size,
  get_system_io(),
  ram_size, boot_device,
  kernel_filename, kernel_cmdline,
- initrd_filename, cpu_model, 1, 0);
+ initrd_filename, cpu_model, 1, 0, 0);
 }
 
 static void pc_init_isa(ram_addr_t ram_size,
@@ -328,7 +346,7 @@ static void pc_init_isa(ram_addr_t ram_size,
  get_system_io(),
  ram_size, boot_device,
  kernel_filename, kernel_cmdline,
- initrd_filename, cpu_model, 0, 1);
+ initrd_filename, cpu_model, 0, 1, 0);
 }
 
 #ifdef CONFIG_XEN
@@ -380,7 +398,7 @@ static QEMUMachine pc_machine_v1_2 = {
 static QEMUMachine pc_machine_v1_1 = {
 .name = "pc-1.1",
 .desc = "Standard PC",
-.init = pc_init_pci,
+.init = pc_init_pci_oldcpu,
 .max_cpus = 255,
 .compat_props = (GlobalProperty[]) {
 PC_COMPAT_1_1,
@@ -415,7 +433,7 @@ static QEMUMachine pc_machine_v1_1 = {
 static QEMUMachine pc_machine_v1_0 = {
 .name = "pc-1.0",
 .desc = "Standard PC",
-.init = pc_init_pci,
+.init = pc_init_pci_oldcpu,
 .max_cpus = 255,
 .compat_props = (GlobalProperty[]) {
 PC_COMPAT_1_0,
@@ -430,7 +448,7 @@ static QEMUMachine pc_machine_v1_0 = {
 static QEMUMachine pc_machine_v0_15 = {
 .name = "pc-0.15",
 .desc = "Standard PC",
-.init = pc_init_pci,
+.init = pc_init_pci_oldcpu,
 .max_cpus = 255,
 .compat_props = (GlobalProperty[]) {
 PC_COMPAT_0_15,
@@ -462,7 +480,7 @@ static QEMUMachine pc_machine_v0_15 = {
 static QEMUMachine pc_machine_v0_14 = {
 .name = "pc-0.14",
 .desc = "Standard PC",
-.init = pc_init_pci,
+.init = pc_init_pci_oldcpu,
 .max_cpus = 255,
 .compat_props = (GlobalProperty[]) {
 PC_COMPAT_0_14, 
-- 
1.6.0.2



[PATCH v3 3/3] i386: KVM: List -cpu host and best in -cpu ?

2012-07-09 Thread Alexander Graf
The kvm_enabled() helper doesn't work in a function as early as -cpu ?
yet. It also doesn't make sense to list the -cpu ? output conditional on
the -enable-kvm parameter. So let's always mention -cpu host in the
CPU list when KVM is supported on that configuration.

In addition, this patch also adds listing of -cpu best in the -cpu ?
list, so that people know that this option exists.

Signed-off-by: Alexander Graf 
---
 target-i386/cpu.c |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index a5d9468..c77 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1190,9 +1190,10 @@ void x86_cpu_list(FILE *f, fprintf_function cpu_fprintf, 
const char *optarg)
 (*cpu_fprintf)(f, "\n");
 }
 }
-if (kvm_enabled()) {
-(*cpu_fprintf)(f, "x86 %16s\n", "[host]");
-}
+#ifdef CONFIG_KVM
+(*cpu_fprintf)(f, "x86 %16s\n", "KVM only: [host]");
+(*cpu_fprintf)(f, "x86 %16s\n", "KVM only: [best]");
+#endif
 }
 
 int cpu_x86_register(X86CPU *cpu, const char *cpu_model)
-- 
1.6.0.2



Re: [PATCH 1/2] KVM: X86: remove read buffer for mmio read

2012-07-09 Thread Avi Kivity
On 07/09/2012 04:23 PM, Xiao Guangrong wrote:
> On 07/09/2012 08:49 PM, Avi Kivity wrote:
>> On 07/09/2012 02:23 PM, Gleb Natapov wrote:
>>>
 kvm-unit-tests.git has a test for xchg to mmio.  Does it still work?

 I agree this code has to go, but it needs to be replaced by something.
 Maybe a .valid flag in struct operand.

>>> Valid will not enough for that.
>> 
>> If we make everything go through operands, any reason why not?
>> 
> 
> I noticed some instructions need to read ESP for many times (e.g, iret_real),
> maybe .valid flag is not enough for this case if the stack is in MMIO, yes?

Good catch.  We either have to fix it or to restrict stack operations to
regular memory (->read_std).

> IIUC, I also noticed ESP is not reset back if it is emulated fail (mmio is 
> needed).
> If the stack located in mmio region, this kind of instruct will be broken, i 
> know no
> guest will use mmio as stack but SDM does not limit it, is it valid?

Stack in mmio (or task switch in mmio) is architecturally valid.  We
don't have to support it if no guests do it.

-- 
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [PATCH 2/3] KVM: Use -cpu best as default on x86

2012-07-09 Thread Eric Blake
On 07/09/2012 06:10 AM, Alexander Graf wrote:

> 
> This fixes a lot of subtle breakage in the GNU toolchain (libgmp) which
> hicks up on QEMU's non-existent CPU models.

s/hicks up/hiccups/


> 
> v2 -> v3:
> 
>   - fix typo in commit message

but not all of them :)

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org







Re: [Qemu-devel] [PATCH 2/3] KVM: Use -cpu best as default on x86

2012-07-09 Thread Alexander Graf

On 09.07.2012, at 15:47, Eric Blake wrote:

> On 07/09/2012 06:10 AM, Alexander Graf wrote:
> 
>> 
>> This fixes a lot of subtle breakage in the GNU toolchain (libgmp) which
>> hicks up on QEMU's non-existent CPU models.
> 
> s/hicks up/hiccups/
> 
> 
>> 
>> v2 -> v3:
>> 
>>  - fix typo in commit message
> 
> but not all of them :)

I hope you don't expect me to respin now :)


Alex



Re: [Qemu-devel] [PATCH 2/3] KVM: Use -cpu best as default on x86

2012-07-09 Thread Eric Blake
On 07/09/2012 07:48 AM, Alexander Graf wrote:
> 
> On 09.07.2012, at 15:47, Eric Blake wrote:
> 
>> On 07/09/2012 06:10 AM, Alexander Graf wrote:
>>
>>>
>>> This fixes a lot of subtle breakage in the GNU toolchain (libgmp) which
>>> hicks up on QEMU's non-existent CPU models.
>>
>> s/hicks up/hiccups/
>>
>>
>>>
>>> v2 -> v3:
>>>
>>>  - fix typo in commit message
>>
>> but not all of them :)
> 
> I hope you don't expect me to respin now :)

I don't.  If someone wants to touch it up, great; if not, it's not the
first typo in a commit.  :)

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org







Re: First shot at adding IPMI to qemu

2012-07-09 Thread Daniel P. Berrange
On Mon, Jul 09, 2012 at 08:23:11AM -0500, Corey Minyard wrote:
> I haven't heard anything about these patches.  Any comments, good or
> bad?  Has anyone tried these?

You really ought to post this to the qemu-devel mailing list,
since that's where the majority of QEMU developers hang out.
This KVM list is primarily for KVM specific development tasks
in QEMU.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|


Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation

2012-07-09 Thread Scott Wood
On 07/09/2012 12:13 AM, Bhushan Bharat-R65777 wrote:
> 
> 
>> -Original Message-
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Saturday, July 07, 2012 1:21 PM
>> To: Wood Scott-B07421
>> Cc: Bhushan Bharat-R65777; kvm-...@vger.kernel.org; kvm@vger.kernel.org; 
>> Bhushan
>> Bharat-R65777
>> Subject: Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation
>>
>>
>> Ah, being 2 bits wide, the above code suddenly makes more sense :). How about
>>
>> /* WRC is a 2-bit field that is supposed to preserve its value once written 
>> to
>> be non-zero */
>> spr_val &= ~TCR_WRC_MASK;
>> spr_val |= vcpu->arch.tcr & TCR_WRC_MASK;
>> kvmppc_set_tcr(vcpu, spr_val);
> 
> I think you mean:
> 
> if (TCR_WRC_MASK & vcpu->arch.tcr) {
> spr_val &= ~TCR_WRC_MASK;
> spr_val |= vcpu->arch.tcr & TCR_WRC_MASK;
> }
> kvmppc_set_tcr(vcpu, spr_val);

Actually I think he means:

if (vcpu->arch.tcr & TCR_WRC_MASK) {
spr_val &= ~TCR_WRC_MASK;
spr_val |= vcpu->arch.tcr & TCR_WRC_MASK;
}

kvmppc_set_tcr(vcpu, spr_val);

:-)

-Scott



Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation

2012-07-09 Thread Alexander Graf

On 09.07.2012, at 16:28, Scott Wood wrote:

> On 07/09/2012 12:13 AM, Bhushan Bharat-R65777 wrote:
>> 
>> 
>>> -Original Message-
>>> From: Alexander Graf [mailto:ag...@suse.de]
>>> Sent: Saturday, July 07, 2012 1:21 PM
>>> To: Wood Scott-B07421
>>> Cc: Bhushan Bharat-R65777; kvm-...@vger.kernel.org; kvm@vger.kernel.org; 
>>> Bhushan
>>> Bharat-R65777
>>> Subject: Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation
>>> 
>>> 
>>> Ah, being 2 bits wide, the above code suddenly makes more sense :). How 
>>> about
>>> 
>>> /* WRC is a 2-bit field that is supposed to preserve its value once written 
>>> to
>>> be non-zero */
>>> spr_val &= ~TCR_WRC_MASK;
>>> spr_val |= vcpu->arch.tcr & TCR_WRC_MASK;
>>> kvmppc_set_tcr(vcpu, spr_val);
>> 
>> I think you mean:
>> 
>> if (TCR_WRC_MASK & vcpu->arch.tcr) {
>>spr_val &= ~TCR_WRC_MASK;
>>spr_val |= vcpu->arch.tcr & TCR_WRC_MASK;
>> }
>> kvmppc_set_tcr(vcpu, spr_val);
> 
> Actually I think he means:
> 
> if (vcpu->arch.tcr & TCR_WRC_MASK) {
>   spr_val &= ~TCR_WRC_MASK;
>   spr_val |= vcpu->arch.tcr & TCR_WRC_MASK;
> }
> 
> kvmppc_set_tcr(vcpu, spr_val);

Plus the comment, yes :). I really don't like (mask & val) constructs. About as 
much as I dislike if (0 == x) ones.


Alex



[PATCH 1/3] apic: Resolve potential endless loop around apic_update_irq

2012-07-09 Thread Jan Kiszka
Commit d96e173769 refactored the reinjection of pending PIC interrupts.
However, it missed the potential loop of apic_update_irq ->
apic_deliver_pic_intr -> apic_local_deliver -> apic_set_irq ->
apic_update_irq that /could/ occur if LINT0 is injected as APIC_DM_FIXED
and that vector is currently blocked via TPR.

Resolve this by reinjecting only where it matters: inside
apic_get_interrupt. This function may clear a vector while a
PIC-originated reason still exists.

Signed-off-by: Jan Kiszka 
---
 hw/apic.c |   10 +++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 60552df..e65a35f 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -363,9 +363,6 @@ static void apic_update_irq(APICCommonState *s)
 }
 if (apic_irq_pending(s) > 0) {
 cpu_interrupt(s->cpu_env, CPU_INTERRUPT_HARD);
-} else if (apic_accept_pic_intr(&s->busdev.qdev) &&
-   pic_get_output(isa_pic)) {
-apic_deliver_pic_intr(&s->busdev.qdev, 1);
 }
 }
 
@@ -560,7 +557,14 @@ int apic_get_interrupt(DeviceState *d)
 reset_bit(s->irr, intno);
 set_bit(s->isr, intno);
 apic_sync_vapic(s, SYNC_TO_VAPIC);
+
+/* re-inject if there is still a pending PIC interrupt */
+if (apic_accept_pic_intr(&s->busdev.qdev) && pic_get_output(isa_pic)) {
+apic_deliver_pic_intr(&s->busdev.qdev, 1);
+}
+
 apic_update_irq(s);
+
 return intno;
 }
 
-- 
1.7.3.4



[PATCH 2/3] apic: Reevaluate pending interrupts on LVT_LINT0 changes

2012-07-09 Thread Jan Kiszka
When the guest modifies the LVT_LINT0 register, we need to check if some
pending PIC interrupt can now be delivered.

Signed-off-by: Jan Kiszka 
---
 hw/apic.c |   18 ++
 1 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index e65a35f..5b8f3e8 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -532,6 +532,15 @@ static void apic_deliver(DeviceState *d, uint8_t dest, 
uint8_t dest_mode,
 apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
 }
 
+static bool apic_check_pic(APICCommonState *s)
+{
+if (!apic_accept_pic_intr(&s->busdev.qdev) || !pic_get_output(isa_pic)) {
+return false;
+}
+apic_deliver_pic_intr(&s->busdev.qdev, 1);
+return true;
+}
+
 int apic_get_interrupt(DeviceState *d)
 {
 APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
@@ -559,9 +568,7 @@ int apic_get_interrupt(DeviceState *d)
 apic_sync_vapic(s, SYNC_TO_VAPIC);
 
 /* re-inject if there is still a pending PIC interrupt */
-if (apic_accept_pic_intr(&s->busdev.qdev) && pic_get_output(isa_pic)) {
-apic_deliver_pic_intr(&s->busdev.qdev, 1);
-}
+apic_check_pic(s);
 
 apic_update_irq(s);
 
@@ -804,8 +811,11 @@ static void apic_mem_writel(void *opaque, 
target_phys_addr_t addr, uint32_t val)
 {
 int n = index - 0x32;
 s->lvt[n] = val;
-if (n == APIC_LVT_TIMER)
+if (n == APIC_LVT_TIMER) {
 apic_timer_update(s, qemu_get_clock_ns(vm_clock));
+} else if (n == APIC_LVT_LINT0 && apic_check_pic(s)) {
+apic_update_irq(s);
+}
 }
 break;
 case 0x38:
-- 
1.7.3.4



[PATCH 0/3] apic: Fixes for userspace model

2012-07-09 Thread Jan Kiszka
As Avi noted recently, there is a problem in way we inject interrupts
into the userspace APIC under KVM: the TPR check over the iothread may
race with the VCPU raising the TPR value while in KVM mode. Patch 3
addresses this issue.

The other two patches fix problems I came across while thinking about
the first one.

Who would like to process this series, uq/master? Or should it go in
directly?

Jan Kiszka (3):
  apic: Resolve potential endless loop around apic_update_irq
  apic: Reevaluate pending interrupts on LVT_LINT0 changes
  apic: Defer interrupt updates to VCPU thread

 cpu-exec.c |6 ++
 hw/apic.c  |   27 ++-
 hw/apic.h  |1 +
 hw/apic_internal.h |1 -
 target-i386/cpu.h  |4 +++-
 target-i386/kvm.c  |4 
 6 files changed, 36 insertions(+), 7 deletions(-)

-- 
1.7.3.4



[PATCH 3/3] apic: Defer interrupt updates to VCPU thread

2012-07-09 Thread Jan Kiszka
KVM performs TPR raising asynchronously to QEMU, specifically outside
QEMU's global lock. When an interrupt is injected into the APIC and TPR
is checked to decide if this can be delivered, a stale TPR value may be
used, causing spurious interrupts in the end.

Fix this by deferring apic_update_irq to the context of the target VCPU.
We introduce a new interrupt flag for this, CPU_INTERRUPT_POLL. When it
is set, the VCPU calls apic_poll_irq before checking for further pending
interrupts. To avoid special-casing KVM, we also implement this logic
for TCG mode.

Signed-off-by: Jan Kiszka 
---
 cpu-exec.c |6 ++
 hw/apic.c  |5 -
 hw/apic.h  |1 +
 hw/apic_internal.h |1 -
 target-i386/cpu.h  |4 +++-
 target-i386/kvm.c  |4 
 6 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 08c35f7..fc185a4 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -288,6 +288,12 @@ int cpu_exec(CPUArchState *env)
 }
 #endif
 #if defined(TARGET_I386)
+#if !defined(CONFIG_USER_ONLY)
+if (interrupt_request & CPU_INTERRUPT_POLL) {
+env->interrupt_request &= ~CPU_INTERRUPT_POLL;
+apic_poll_irq(env->apic_state);
+}
+#endif
 if (interrupt_request & CPU_INTERRUPT_INIT) {
 cpu_svm_check_intercept_param(env, SVM_EXIT_INIT,
   0);
diff --git a/hw/apic.c b/hw/apic.c
index 5b8f3e8..38e 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -16,6 +16,7 @@
  * You should have received a copy of the GNU Lesser General Public
  * License along with this library; if not, see 
  */
+#include "qemu-thread.h"
 #include "apic_internal.h"
 #include "apic.h"
 #include "ioapic.h"
@@ -361,7 +362,9 @@ static void apic_update_irq(APICCommonState *s)
 if (!(s->spurious_vec & APIC_SV_ENABLE)) {
 return;
 }
-if (apic_irq_pending(s) > 0) {
+if (!qemu_cpu_is_self(s->cpu_env)) {
+cpu_interrupt(s->cpu_env, CPU_INTERRUPT_POLL);
+} else if (apic_irq_pending(s) > 0) {
 cpu_interrupt(s->cpu_env, CPU_INTERRUPT_HARD);
 }
 }
diff --git a/hw/apic.h b/hw/apic.h
index 62179ce..a89542b 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -20,6 +20,7 @@ void apic_init_reset(DeviceState *s);
 void apic_sipi(DeviceState *s);
 void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip,
TPRAccess access);
+void apic_poll_irq(DeviceState *d);
 
 /* pc.c */
 int cpu_is_bsp(CPUX86State *env);
diff --git a/hw/apic_internal.h b/hw/apic_internal.h
index 60a6a8b..4d8ff49 100644
--- a/hw/apic_internal.h
+++ b/hw/apic_internal.h
@@ -141,7 +141,6 @@ void apic_report_irq_delivered(int delivered);
 bool apic_next_timer(APICCommonState *s, int64_t current_time);
 void apic_enable_tpr_access_reporting(DeviceState *d, bool enable);
 void apic_enable_vapic(DeviceState *d, target_phys_addr_t paddr);
-void apic_poll_irq(DeviceState *d);
 
 void vapic_report_tpr_access(DeviceState *dev, void *cpu, target_ulong ip,
  TPRAccess access);
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index f257c97..1f6f14f 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -477,6 +477,7 @@
  for syscall instruction */
 
 /* i386-specific interrupt pending bits.  */
+#define CPU_INTERRUPT_POLL  CPU_INTERRUPT_TGT_EXT_1
 #define CPU_INTERRUPT_SMI   CPU_INTERRUPT_TGT_EXT_2
 #define CPU_INTERRUPT_NMI   CPU_INTERRUPT_TGT_EXT_3
 #define CPU_INTERRUPT_MCE   CPU_INTERRUPT_TGT_EXT_4
@@ -1047,7 +1048,8 @@ static inline void cpu_clone_regs(CPUX86State *env, 
target_ulong newsp)
 
 static inline bool cpu_has_work(CPUX86State *env)
 {
-return ((env->interrupt_request & CPU_INTERRUPT_HARD) &&
+return ((env->interrupt_request & (CPU_INTERRUPT_HARD |
+   CPU_INTERRUPT_POLL)) &&
 (env->eflags & IF_MASK)) ||
(env->interrupt_request & (CPU_INTERRUPT_NMI |
   CPU_INTERRUPT_INIT |
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 0d0d8f6..cfe60bc 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1727,6 +1727,10 @@ int kvm_arch_process_async_events(CPUX86State *env)
 return 0;
 }
 
+if (env->interrupt_request & CPU_INTERRUPT_POLL) {
+env->interrupt_request &= ~CPU_INTERRUPT_POLL;
+apic_poll_irq(env->apic_state);
+}
 if (((env->interrupt_request & CPU_INTERRUPT_HARD) &&
  (env->eflags & IF_MASK)) ||
 (env->interrupt_request & CPU_INTERRUPT_NMI)) {
-- 
1.7.3.4



RE: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation

2012-07-09 Thread Bhushan Bharat-R65777


> -Original Message-
> From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On
> Behalf Of Alexander Graf
> Sent: Monday, July 09, 2012 8:07 PM
> To: Wood Scott-B07421
> Cc: Bhushan Bharat-R65777; Wood Scott-B07421; kvm-...@vger.kernel.org;
> kvm@vger.kernel.org
> Subject: Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation
> 
> 
> On 09.07.2012, at 16:28, Scott Wood wrote:
> 
> > On 07/09/2012 12:13 AM, Bhushan Bharat-R65777 wrote:
> >>
> >>
> >>> -Original Message-
> >>> From: Alexander Graf [mailto:ag...@suse.de]
> >>> Sent: Saturday, July 07, 2012 1:21 PM
> >>> To: Wood Scott-B07421
> >>> Cc: Bhushan Bharat-R65777; kvm-...@vger.kernel.org;
> >>> kvm@vger.kernel.org; Bhushan
> >>> Bharat-R65777
> >>> Subject: Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation
> >>>
> >>>
> >>> Ah, being 2 bits wide, the above code suddenly makes more sense :).
> >>> How about
> >>>
> >>> /* WRC is a 2-bit field that is supposed to preserve its value once
> >>> written to be non-zero */ spr_val &= ~TCR_WRC_MASK; spr_val |=
> >>> vcpu->arch.tcr & TCR_WRC_MASK; kvmppc_set_tcr(vcpu, spr_val);
> >>
> >> I think you mean:
> >>
> >> if (TCR_WRC_MASK & vcpu->arch.tcr) {
> >>spr_val &= ~TCR_WRC_MASK;
> >>spr_val |= vcpu->arch.tcr & TCR_WRC_MASK; } kvmppc_set_tcr(vcpu,
> >> spr_val);
> >
> > Actually I think he means:
> >
> > if (vcpu->arch.tcr & TCR_WRC_MASK) {
> > spr_val &= ~TCR_WRC_MASK;
> > spr_val |= vcpu->arch.tcr & TCR_WRC_MASK; }
> >
> > kvmppc_set_tcr(vcpu, spr_val);
> 
> Plus the comment, yes :). I really don't like (mask & val) constructs. About 
> as
> much as I dislike if (0 == x) ones.

I think I did in v2 :)

> 
> 
> Alex
> 




Re: [PATCH] pci-assign: Switch to PCI_HOST_DEVADDR property

2012-07-09 Thread Alex Williamson
On Fri, 2012-07-06 at 18:22 +0200, Jan Kiszka wrote:
> Replace the home-brewed qdev property for PCI host addresses with the
> new upstream version.
> 
> Signed-off-by: Jan Kiszka 
> ---
>  hw/device-assignment.c |   64 ---
>  hw/pci.c   |   77 
> 
>  hw/pci.h   |3 --
>  3 files changed, 20 insertions(+), 124 deletions(-)

Looks good

Acked-by: Alex Williamson 



> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 1336689..34593ab 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -63,13 +63,6 @@
>  #define DEBUG(fmt, ...) do { } while(0)
>  #endif
>  
> -typedef struct PCIHostDevice {
> -int seg;
> -int bus;
> -int dev;
> -int func;
> -} PCIHostDevice;
> -
>  typedef struct {
>  int type;   /* Memory or port I/O */
>  int valid;
> @@ -115,7 +108,7 @@ typedef struct {
>  
>  typedef struct AssignedDevice {
>  PCIDevice dev;
> -PCIHostDevice host;
> +PCIHostDeviceAddress host;
>  uint32_t features;
>  int intpin;
>  uint8_t debug_flags;
> @@ -778,7 +771,8 @@ static void assign_failed_examine(AssignedDevice *dev)
>  int r;
>  
>  sprintf(dir, "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/",
> -dev->host.seg, dev->host.bus, dev->host.dev, dev->host.func);
> +dev->host.domain, dev->host.bus, dev->host.slot,
> +dev->host.function);
>  
>  sprintf(name, "%sdriver", dir);
>  
> @@ -796,7 +790,8 @@ static void assign_failed_examine(AssignedDevice *dev)
>  
>  fprintf(stderr, "*** The driver '%s' is occupying your device "
>  "%04x:%02x:%02x.%x.\n",
> -ns, dev->host.seg, dev->host.bus, dev->host.dev, dev->host.func);
> +ns, dev->host.domain, dev->host.bus, dev->host.slot,
> +dev->host.function);
>  fprintf(stderr, "***\n");
>  fprintf(stderr, "*** You can try the following commands to free it:\n");
>  fprintf(stderr, "***\n");
> @@ -804,10 +799,12 @@ static void assign_failed_examine(AssignedDevice *dev)
>  "new_id\n", vendor_id, device_id);
>  fprintf(stderr, "*** $ echo \"%04x:%02x:%02x.%x\" > 
> /sys/bus/pci/drivers/"
>  "%s/unbind\n",
> -dev->host.seg, dev->host.bus, dev->host.dev, dev->host.func, ns);
> +dev->host.domain, dev->host.bus, dev->host.slot,
> +dev->host.function, ns);
>  fprintf(stderr, "*** $ echo \"%04x:%02x:%02x.%x\" > 
> /sys/bus/pci/drivers/"
>  "pci-stub/bind\n",
> -dev->host.seg, dev->host.bus, dev->host.dev, dev->host.func);
> +dev->host.domain, dev->host.bus, dev->host.slot,
> +dev->host.function);
>  fprintf(stderr, "*** $ echo \"%04x %04x\" > 
> /sys/bus/pci/drivers/pci-stub"
>  "/remove_id\n", vendor_id, device_id);
>  fprintf(stderr, "***\n");
> @@ -1699,7 +1696,8 @@ static void reset_assigned_device(DeviceState *dev)
>  
>  snprintf(reset_file, sizeof(reset_file),
>   "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/reset",
> - adev->host.seg, adev->host.bus, adev->host.dev, 
> adev->host.func);
> + adev->host.domain, adev->host.bus, adev->host.slot,
> + adev->host.function);
>  
>  /*
>   * Issue a device reset via pci-sysfs.  Note that we use write(2) here
> @@ -1732,7 +1730,8 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
>  return -1;
>  }
>  
> -if (!dev->host.seg && !dev->host.bus && !dev->host.dev && 
> !dev->host.func) {
> +if (!dev->host.domain && !dev->host.bus && !dev->host.slot &&
> +!dev->host.function) {
>  error_report("pci-assign: error: no host device specified");
>  return -1;
>  }
> @@ -1757,8 +1756,8 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
>  memcpy(dev->emulate_config_write, dev->emulate_config_read,
> sizeof(dev->emulate_config_read));
>  
> -if (get_real_device(dev, dev->host.seg, dev->host.bus,
> -dev->host.dev, dev->host.func)) {
> +if (get_real_device(dev, dev->host.domain, dev->host.bus,
> +dev->host.slot, dev->host.function)) {
>  error_report("pci-assign: Error: Couldn't get real device (%s)!",
>   dev->dev.qdev.id);
>  goto out;
> @@ -1786,9 +1785,9 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
>  dev->intpin = e_intx;
>  dev->run = 0;
>  dev->girq = -1;
> -dev->h_segnr = dev->host.seg;
> +dev->h_segnr = dev->host.domain;
>  dev->h_busnr = dev->host.bus;
> -dev->h_devfn = PCI_DEVFN(dev->host.dev, dev->host.func);
> +dev->h_devfn = PCI_DEVFN(dev->host.slot, dev->host.function);
>  
>  /* assign device to guest */
>  r = assign_device(dev);
> @@ -1824,33 +1823,9 @@ static int assigned_exitf

Re: [PATCHv3 RFC 0/2] kvm: direct msix injection

2012-07-09 Thread Jan Kiszka
On 2012-07-09 00:55, Michael S. Tsirkin wrote:
> On Mon, Jul 02, 2012 at 11:08:20AM -0600, Alex Williamson wrote:
>> On Mon, 2012-06-25 at 11:32 +0200, Jan Kiszka wrote:
>>> On 2012-06-11 13:19, Michael S. Tsirkin wrote:
 We can deliver certain interrupts, notably MSIX,
 from atomic context.
 Here's an untested patch to do this (compiled only).

 Changes from v2:
 Don't inject broadcast interrupts directly
 Changes from v1:
 Tried to address comments from v1, except unifying
 with kvm_set_irq: passing flags to it looks too ugly.
 Added a comment.

 Jan, you said you can test this?


 Michael S. Tsirkin (2):
   kvm: implement kvm_set_msi_inatomic
   kvm: deliver msix interrupts from irq handler

  include/linux/kvm_host.h |  3 ++
  virt/kvm/assigned-dev.c  | 31 ++--
  virt/kvm/irq_comm.c  | 75 
 
  3 files changed, 102 insertions(+), 7 deletions(-)

>>>
>>> Finally-tested-by: Jan Kiszka 
>>
>> Michael, we need either this or the simple oneshot patch to get device
>> assignment working again for 3.5.  Are you planning to push this for
>> 3.5?  Thanks,
>>
>> Alex
>>
> 
> Avi wants this reworked using an injection cache. I agree with Jan
> though: oneshot looks too ugly. Just so you can make progress, can't we
> add a stub handler returning IRQ_WAKE_THREAD unconditionally for now?
> I.e. like the below (warning: completely untested).
> 
> Signed-off-by: Michael S. Tsirkin 
> 
> --
> 
> diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> index b1e091a..18cc36e 100644
> --- a/virt/kvm/assigned-dev.c
> +++ b/virt/kvm/assigned-dev.c
> @@ -334,6 +334,11 @@ static int assigned_device_enable_host_intx(struct kvm 
> *kvm,
>  }
>  
>  #ifdef __KVM_HAVE_MSI
> +static irqreturn_t kvm_assigned_dev_msi(int irq, void *dev_id)
> +{
> +   return IRQ_WAKE_THREAD;
> +}
> +
>  static int assigned_device_enable_host_msi(struct kvm *kvm,
>  struct kvm_assigned_dev_kernel *dev)
>  {
> @@ -346,7 +351,7 @@ static int assigned_device_enable_host_msi(struct kvm 
> *kvm,
>   }
>  
>   dev->host_irq = dev->dev->irq;
> - if (request_threaded_irq(dev->host_irq, NULL,
> + if (request_threaded_irq(dev->host_irq, kvm_assigned_dev_msi,
>kvm_assigned_dev_thread_msi, 0,
>dev->irq_name, dev)) {
>   pci_disable_msi(dev->dev);
> @@ -358,6 +363,11 @@ static int assigned_device_enable_host_msi(struct kvm 
> *kvm,
>  #endif
>  
>  #ifdef __KVM_HAVE_MSIX
> +static irqreturn_t kvm_assigned_dev_msix(int irq, void *dev_id)
> +{
> +   return IRQ_WAKE_THREAD;
> +}
> +
>  static int assigned_device_enable_host_msix(struct kvm *kvm,
>   struct kvm_assigned_dev_kernel *dev)
>  {
> @@ -374,7 +384,8 @@ static int assigned_device_enable_host_msix(struct kvm 
> *kvm,
>  
>   for (i = 0; i < dev->entries_nr; i++) {
>   r = request_threaded_irq(dev->host_msix_entries[i].vector,
> -  NULL, kvm_assigned_dev_thread_msix,
> +  kvm_assigned_dev_msix,
> +  kvm_assigned_dev_thread_msix,
>0, dev->irq_name, dev);
>   if (r)
>   goto err;
> 

Should restore the status quo before 3.5's IRQ layer changes. Looks ok
to me.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [PATCHv3 RFC 0/2] kvm: direct msix injection

2012-07-09 Thread Alex Williamson
On Mon, 2012-07-09 at 17:32 +0200, Jan Kiszka wrote:
> On 2012-07-09 00:55, Michael S. Tsirkin wrote:
> > On Mon, Jul 02, 2012 at 11:08:20AM -0600, Alex Williamson wrote:
> >> On Mon, 2012-06-25 at 11:32 +0200, Jan Kiszka wrote:
> >>> On 2012-06-11 13:19, Michael S. Tsirkin wrote:
>  We can deliver certain interrupts, notably MSIX,
>  from atomic context.
>  Here's an untested patch to do this (compiled only).
> 
>  Changes from v2:
>  Don't inject broadcast interrupts directly
>  Changes from v1:
>  Tried to address comments from v1, except unifying
>  with kvm_set_irq: passing flags to it looks too ugly.
>  Added a comment.
> 
>  Jan, you said you can test this?
> 
> 
>  Michael S. Tsirkin (2):
>    kvm: implement kvm_set_msi_inatomic
>    kvm: deliver msix interrupts from irq handler
> 
>   include/linux/kvm_host.h |  3 ++
>   virt/kvm/assigned-dev.c  | 31 ++--
>   virt/kvm/irq_comm.c  | 75 
>  
>   3 files changed, 102 insertions(+), 7 deletions(-)
> 
> >>>
> >>> Finally-tested-by: Jan Kiszka 
> >>
> >> Michael, we need either this or the simple oneshot patch to get device
> >> assignment working again for 3.5.  Are you planning to push this for
> >> 3.5?  Thanks,
> >>
> >> Alex
> >>
> > 
> > Avi wants this reworked using an injection cache. I agree with Jan
> > though: oneshot looks too ugly. Just so you can make progress, can't we
> > add a stub handler returning IRQ_WAKE_THREAD unconditionally for now?
> > I.e. like the below (warning: completely untested).
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > 
> > --
> > 
> > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > index b1e091a..18cc36e 100644
> > --- a/virt/kvm/assigned-dev.c
> > +++ b/virt/kvm/assigned-dev.c
> > @@ -334,6 +334,11 @@ static int assigned_device_enable_host_intx(struct kvm 
> > *kvm,
> >  }
> >  
> >  #ifdef __KVM_HAVE_MSI
> > +static irqreturn_t kvm_assigned_dev_msi(int irq, void *dev_id)
> > +{
> > +   return IRQ_WAKE_THREAD;
> > +}
> > +
> >  static int assigned_device_enable_host_msi(struct kvm *kvm,
> >struct kvm_assigned_dev_kernel *dev)
> >  {
> > @@ -346,7 +351,7 @@ static int assigned_device_enable_host_msi(struct kvm 
> > *kvm,
> > }
> >  
> > dev->host_irq = dev->dev->irq;
> > -   if (request_threaded_irq(dev->host_irq, NULL,
> > +   if (request_threaded_irq(dev->host_irq, kvm_assigned_dev_msi,
> >  kvm_assigned_dev_thread_msi, 0,
> >  dev->irq_name, dev)) {
> > pci_disable_msi(dev->dev);
> > @@ -358,6 +363,11 @@ static int assigned_device_enable_host_msi(struct kvm 
> > *kvm,
> >  #endif
> >  
> >  #ifdef __KVM_HAVE_MSIX
> > +static irqreturn_t kvm_assigned_dev_msix(int irq, void *dev_id)
> > +{
> > +   return IRQ_WAKE_THREAD;
> > +}
> > +
> >  static int assigned_device_enable_host_msix(struct kvm *kvm,
> > struct kvm_assigned_dev_kernel *dev)
> >  {
> > @@ -374,7 +384,8 @@ static int assigned_device_enable_host_msix(struct kvm 
> > *kvm,
> >  
> > for (i = 0; i < dev->entries_nr; i++) {
> > r = request_threaded_irq(dev->host_msix_entries[i].vector,
> > -NULL, kvm_assigned_dev_thread_msix,
> > +kvm_assigned_dev_msix,
> > +kvm_assigned_dev_thread_msix,
> >  0, dev->irq_name, dev);
> > if (r)
> > goto err;
> > 
> 
> Should restore the status quo before 3.5's IRQ layer changes. Looks ok
> to me.

I'll test and post this, Michael is out this week.  Thanks,

Alex




Re: [net-next RFC V5 0/5] Multiqueue virtio-net

2012-07-09 Thread Rick Jones

On 07/08/2012 08:23 PM, Jason Wang wrote:

On 07/07/2012 12:23 AM, Rick Jones wrote:

On 07/06/2012 12:42 AM, Jason Wang wrote:
Which mechanism to address skew error?  The netperf manual describes
more than one:


This mechanism is missed in my test, I would add them to my test scripts.


http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Using-Netperf-to-Measure-Aggregate-Performance


Personally, my preference these days is to use the "demo mode" method
of aggregate results as it can be rather faster than (ab)using the
confidence intervals mechanism, which I suspect may not really scale
all that well to large numbers of concurrent netperfs.


During my test, the confidence interval was even hard to achieve in the
RR test when I pinned vhost/vcpus to processors, so I didn't use it.


When running aggregate netperfs, *something* has to be done to address 
the prospect of skew error.  Otherwise the results are suspect.


happy benchmarking,

rick jones


[PATCH] kvm: Fix device assignment threaded irq handler

2012-07-09 Thread Alex Williamson
The kernel no longer allows us to pass NULL for the hard handler
without also specifying IRQF_ONESHOT.  IRQF_ONESHOT imposes latency
in the exit path that we don't need for MSI interrupts.  Long term
we'd like to inject these interrupts from the hard handler when
possible.  In the short term, we can create dummy hard handlers
that return us to the previous behavior.  Credit to Michael for
original patch.

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=43328

Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Alex Williamson 
---

Patch against kvm/master for v3.5

 virt/kvm/assigned-dev.c |   15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index b1e091a..23a41a9 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -334,6 +334,11 @@ static int assigned_device_enable_host_intx(struct kvm 
*kvm,
 }
 
 #ifdef __KVM_HAVE_MSI
+static irqreturn_t kvm_assigned_dev_msi(int irq, void *dev_id)
+{
+   return IRQ_WAKE_THREAD;
+}
+
 static int assigned_device_enable_host_msi(struct kvm *kvm,
   struct kvm_assigned_dev_kernel *dev)
 {
@@ -346,7 +351,7 @@ static int assigned_device_enable_host_msi(struct kvm *kvm,
}
 
dev->host_irq = dev->dev->irq;
-   if (request_threaded_irq(dev->host_irq, NULL,
+   if (request_threaded_irq(dev->host_irq, kvm_assigned_dev_msi,
 kvm_assigned_dev_thread_msi, 0,
 dev->irq_name, dev)) {
pci_disable_msi(dev->dev);
@@ -358,6 +363,11 @@ static int assigned_device_enable_host_msi(struct kvm *kvm,
 #endif
 
 #ifdef __KVM_HAVE_MSIX
+static irqreturn_t kvm_assigned_dev_msix(int irq, void *dev_id)
+{
+   return IRQ_WAKE_THREAD;
+}
+
 static int assigned_device_enable_host_msix(struct kvm *kvm,
struct kvm_assigned_dev_kernel *dev)
 {
@@ -374,7 +384,8 @@ static int assigned_device_enable_host_msix(struct kvm *kvm,
 
for (i = 0; i < dev->entries_nr; i++) {
r = request_threaded_irq(dev->host_msix_entries[i].vector,
-NULL, kvm_assigned_dev_thread_msix,
+kvm_assigned_dev_msix,
+kvm_assigned_dev_thread_msix,
 0, dev->irq_name, dev);
if (r)
goto err;



[PATCH v3 0/6] Optimize vcpu->requests processing

2012-07-09 Thread Avi Kivity
Currently, any time a request bit is set (not too uncommon) we check all of 
them.
This patchset optimizes the process slightly by skipping over unset bits using
for_each_set_bit().

v3:
   new approach using for_each_set_bit(), as the previous one might have 
skipped a bit.

Avi Kivity (6):
  KVM: Don't use KVM_REQ_PENDING_TIMER
  KVM: Simplify KVM_REQ_EVENT/req_int_win handling
  KVM: Move mmu reload out of line
  KVM: Optimize vcpu->requests checking
  KVM: Reorder KVM_REQ_EVENT to optimize processing
  KVM: Clean up vcpu->requests == 0 processing

 arch/x86/kvm/mmu.c   |   4 +-
 arch/x86/kvm/svm.c   |   1 +
 arch/x86/kvm/timer.c |   8 +--
 arch/x86/kvm/x86.c   | 142 ++-
 include/linux/kvm_host.h |  14 ++---
 5 files changed, 91 insertions(+), 78 deletions(-)

-- 
1.7.11



[PATCH v3 1/6] KVM: Don't use KVM_REQ_PENDING_TIMER

2012-07-09 Thread Avi Kivity
It's a write-only bit, set by the timer and cleared by the main loop.
Remove it.  Retain the definition since ppc uses it.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/timer.c | 8 ++--
 arch/x86/kvm/x86.c   | 1 -
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c
index 6b85cc6..c28f838 100644
--- a/arch/x86/kvm/timer.c
+++ b/arch/x86/kvm/timer.c
@@ -27,14 +27,10 @@ enum hrtimer_restart kvm_timer_fn(struct hrtimer *data)
/*
 * There is a race window between reading and incrementing, but we do
 * not care about potentially losing timer events in the !reinject
-* case anyway. Note: KVM_REQ_PENDING_TIMER is implicitly checked
-* in vcpu_enter_guest.
+* case anyway.
 */
-   if (ktimer->reinject || !atomic_read(&ktimer->pending)) {
+   if (ktimer->reinject || !atomic_read(&ktimer->pending))
atomic_inc(&ktimer->pending);
-   /* FIXME: this code should not know anything about vcpus */
-   kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
-   }
 
if (waitqueue_active(q))
wake_up_interruptible(q);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff0b487..ae07ef2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5437,7 +5437,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
if (r <= 0)
break;
 
-   clear_bit(KVM_REQ_PENDING_TIMER, &vcpu->requests);
if (kvm_cpu_has_pending_timer(vcpu))
kvm_inject_pending_timer_irqs(vcpu);
 
-- 
1.7.11



[PATCH v3 2/6] KVM: Simplify KVM_REQ_EVENT/req_int_win handling

2012-07-09 Thread Avi Kivity
Put the KVM_REQ_EVENT block in the regular vcpu->requests if (),
instead of its own little check.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/x86.c | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ae07ef2..959e5a9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5222,6 +5222,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
vcpu->run->request_interrupt_window;
bool req_immediate_exit = 0;
 
+   if (unlikely(req_int_win))
+   kvm_make_request(KVM_REQ_EVENT, vcpu);
+
if (vcpu->requests) {
if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
kvm_mmu_unload(vcpu);
@@ -5266,20 +5269,19 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_handle_pmu_event(vcpu);
if (kvm_check_request(KVM_REQ_PMI, vcpu))
kvm_deliver_pmi(vcpu);
-   }
-
-   if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
-   inject_pending_event(vcpu);
-
-   /* enable NMI/IRQ window open exits if needed */
-   if (vcpu->arch.nmi_pending)
-   kvm_x86_ops->enable_nmi_window(vcpu);
-   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
-   kvm_x86_ops->enable_irq_window(vcpu);
-
-   if (kvm_lapic_enabled(vcpu)) {
-   update_cr8_intercept(vcpu);
-   kvm_lapic_sync_to_vapic(vcpu);
+   if (kvm_check_request(KVM_REQ_EVENT, vcpu)) {
+   inject_pending_event(vcpu);
+
+   /* enable NMI/IRQ window open exits if needed */
+   if (vcpu->arch.nmi_pending)
+   kvm_x86_ops->enable_nmi_window(vcpu);
+   else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
+   kvm_x86_ops->enable_irq_window(vcpu);
+
+   if (kvm_lapic_enabled(vcpu)) {
+   update_cr8_intercept(vcpu);
+   kvm_lapic_sync_to_vapic(vcpu);
+   }
}
}
 
-- 
1.7.11



[PATCH v3 3/6] KVM: Move mmu reload out of line

2012-07-09 Thread Avi Kivity
Currently we check that the mmu root exists before every entry.  Use the
existing KVM_REQ_MMU_RELOAD mechanism instead, by making it really reload
the mmu, and by adding the request to mmu initialization code.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/mmu.c |  4 +++-
 arch/x86/kvm/svm.c |  1 +
 arch/x86/kvm/x86.c | 13 +++--
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 569cd66..136d757 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3180,7 +3180,8 @@ void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu)
 static void paging_new_cr3(struct kvm_vcpu *vcpu)
 {
pgprintk("%s: cr3 %lx\n", __func__, kvm_read_cr3(vcpu));
-   mmu_free_roots(vcpu);
+   kvm_mmu_unload(vcpu);
+   kvm_mmu_load(vcpu);
 }
 
 static unsigned long get_cr3(struct kvm_vcpu *vcpu)
@@ -3469,6 +3470,7 @@ static int init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
 
 static int init_kvm_mmu(struct kvm_vcpu *vcpu)
 {
+   kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
if (mmu_is_nested(vcpu))
return init_kvm_nested_mmu(vcpu);
else if (tdp_enabled)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7a41878..d77ad8c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2523,6 +2523,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 
if (nested_vmcb->control.nested_ctl) {
kvm_mmu_unload(&svm->vcpu);
+   kvm_make_request(KVM_REQ_MMU_RELOAD, &svm->vcpu);
svm->nested.nested_cr3 = nested_vmcb->control.nested_cr3;
nested_svm_init_mmu_context(&svm->vcpu);
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 959e5a9..162231f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5226,8 +5226,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_make_request(KVM_REQ_EVENT, vcpu);
 
if (vcpu->requests) {
-   if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
+   if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu)) {
kvm_mmu_unload(vcpu);
+   r = kvm_mmu_reload(vcpu);
+   if (unlikely(r)) {
+   kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
+   goto out;
+   }
+   }
if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
__kvm_migrate_timers(vcpu);
if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, vcpu)) {
@@ -5285,11 +5291,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
}
}
 
-   r = kvm_mmu_reload(vcpu);
-   if (unlikely(r)) {
-   goto cancel_injection;
-   }
-
preempt_disable();
 
kvm_x86_ops->prepare_guest_switch(vcpu);
-- 
1.7.11



[PATCH v3 4/6] KVM: Optimize vcpu->requests checking

2012-07-09 Thread Avi Kivity
Instead of checking for each request linearly, use for_each_set_bit() to
iterate on just the requests that are set (should be 0 or 1 most of the
time).

To avoid a useless call to find_first_bit(), add an extra check for no
requests set.  To avoid an extra indent and an unreviewable patch, I
added a rather ugly goto.  This can be fixed in a later patch.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/x86.c | 62 ++
 1 file changed, 39 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 162231f..9296dce 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5217,6 +5217,7 @@ static void process_nmi(struct kvm_vcpu *vcpu)
 
 static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 {
+   unsigned req;
int r;
bool req_int_win = !irqchip_in_kernel(vcpu->kvm) &&
vcpu->run->request_interrupt_window;
@@ -5225,57 +5226,67 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (unlikely(req_int_win))
kvm_make_request(KVM_REQ_EVENT, vcpu);
 
-   if (vcpu->requests) {
-   if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu)) {
+   if (!vcpu->requests)
+   goto no_requests;
+
+   for_each_set_bit(req, &vcpu->requests, BITS_PER_LONG) {
+   clear_bit(req, &vcpu->requests);
+   switch (req) {
+   case KVM_REQ_MMU_RELOAD:
kvm_mmu_unload(vcpu);
r = kvm_mmu_reload(vcpu);
if (unlikely(r)) {
kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
goto out;
}
-   }
-   if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
+   break;
+   case KVM_REQ_MIGRATE_TIMER:
__kvm_migrate_timers(vcpu);
-   if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, vcpu)) {
+   break;
+   case KVM_REQ_CLOCK_UPDATE:
r = kvm_guest_time_update(vcpu);
if (unlikely(r))
goto out;
-   }
-   if (kvm_check_request(KVM_REQ_MMU_SYNC, vcpu))
+   break;
+   case KVM_REQ_MMU_SYNC:
kvm_mmu_sync_roots(vcpu);
-   if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
+   break;
+   case KVM_REQ_TLB_FLUSH:
kvm_x86_ops->tlb_flush(vcpu);
-   if (kvm_check_request(KVM_REQ_REPORT_TPR_ACCESS, vcpu)) {
+   break;
+   case KVM_REQ_REPORT_TPR_ACCESS:
vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
r = 0;
goto out;
-   }
-   if (kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu)) {
+   case KVM_REQ_TRIPLE_FAULT:
vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
r = 0;
goto out;
-   }
-   if (kvm_check_request(KVM_REQ_DEACTIVATE_FPU, vcpu)) {
+   case KVM_REQ_DEACTIVATE_FPU:
vcpu->fpu_active = 0;
kvm_x86_ops->fpu_deactivate(vcpu);
-   }
-   if (kvm_check_request(KVM_REQ_APF_HALT, vcpu)) {
+   break;
+   case KVM_REQ_APF_HALT:
/* Page is swapped out. Do synthetic halt */
vcpu->arch.apf.halted = true;
r = 1;
goto out;
-   }
-   if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
+   case KVM_REQ_STEAL_UPDATE:
record_steal_time(vcpu);
-   if (kvm_check_request(KVM_REQ_NMI, vcpu))
+   break;
+   case KVM_REQ_NMI:
process_nmi(vcpu);
-   req_immediate_exit =
-   kvm_check_request(KVM_REQ_IMMEDIATE_EXIT, vcpu);
-   if (kvm_check_request(KVM_REQ_PMU, vcpu))
+   break;
+   case KVM_REQ_IMMEDIATE_EXIT:
+   req_immediate_exit = true;
+   break;
+   case KVM_REQ_PMU:
kvm_handle_pmu_event(vcpu);
-   if (kvm_check_request(KVM_REQ_PMI, vcpu))
+   break;
+   case KVM_REQ_PMI:
kvm_deliver_pmi(vcpu);
-   if (kvm_check_request(KVM_REQ_EVENT, vcpu)) {
+   break;
+   case KVM_REQ_EVENT:
inject_pending_event(vcpu);
 
/* enable NMI/IRQ window open exits if needed */
@@ -5288,9 +5299,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 

[PATCH v3 5/6] KVM: Reorder KVM_REQ_EVENT to optimize processing

2012-07-09 Thread Avi Kivity
Since we now process events in order of their bit values, we need to keep
KVM_REQ_EVENT last.  This is so that if an event which may generate an
interrupt (say, PMU) happens, we can process the generated KVM_REQ_EVENT
before entering the critical section (and then aborting because vcpu->requests
has KVM_REQ_EVENT set).

Reorder the KVM_REQ_ definitions to make it so.

Signed-off-by: Avi Kivity 
---
 include/linux/kvm_host.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e3c86f8..6d71058 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -62,13 +62,13 @@
 #define KVM_REQ_CLOCK_UPDATE   8
 #define KVM_REQ_KICK   9
 #define KVM_REQ_DEACTIVATE_FPU10
-#define KVM_REQ_EVENT 11
-#define KVM_REQ_APF_HALT  12
-#define KVM_REQ_STEAL_UPDATE  13
-#define KVM_REQ_NMI   14
-#define KVM_REQ_IMMEDIATE_EXIT15
-#define KVM_REQ_PMU   16
-#define KVM_REQ_PMI   17
+#define KVM_REQ_APF_HALT  11
+#define KVM_REQ_STEAL_UPDATE  12
+#define KVM_REQ_NMI   13
+#define KVM_REQ_IMMEDIATE_EXIT14
+#define KVM_REQ_PMU   15
+#define KVM_REQ_PMI   16
+#define KVM_REQ_EVENT 17 /* Keep after any req which can cause an 
interrupt */
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID0
 
-- 
1.7.11



[PATCH v3 6/6] KVM: Clean up vcpu->requests == 0 processing

2012-07-09 Thread Avi Kivity
A previous patch introduced a goto to make the patch clearer.  This
patch cleans up the code but introduces no functional changes.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/x86.c | 148 ++---
 1 file changed, 72 insertions(+), 76 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9296dce..44a2c87 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5226,86 +5226,82 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (unlikely(req_int_win))
kvm_make_request(KVM_REQ_EVENT, vcpu);
 
-   if (!vcpu->requests)
-   goto no_requests;
-
-   for_each_set_bit(req, &vcpu->requests, BITS_PER_LONG) {
-   clear_bit(req, &vcpu->requests);
-   switch (req) {
-   case KVM_REQ_MMU_RELOAD:
-   kvm_mmu_unload(vcpu);
-   r = kvm_mmu_reload(vcpu);
-   if (unlikely(r)) {
-   kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
+   if (vcpu->requests)
+   for_each_set_bit(req, &vcpu->requests, BITS_PER_LONG) {
+   clear_bit(req, &vcpu->requests);
+   switch (req) {
+   case KVM_REQ_MMU_RELOAD:
+   kvm_mmu_unload(vcpu);
+   r = kvm_mmu_reload(vcpu);
+   if (unlikely(r)) {
+   kvm_make_request(KVM_REQ_MMU_RELOAD, 
vcpu);
+   goto out;
+   }
+   break;
+   case KVM_REQ_MIGRATE_TIMER:
+   __kvm_migrate_timers(vcpu);
+   break;
+   case KVM_REQ_CLOCK_UPDATE:
+   r = kvm_guest_time_update(vcpu);
+   if (unlikely(r))
+   goto out;
+   break;
+   case KVM_REQ_MMU_SYNC:
+   kvm_mmu_sync_roots(vcpu);
+   break;
+   case KVM_REQ_TLB_FLUSH:
+   kvm_x86_ops->tlb_flush(vcpu);
+   break;
+   case KVM_REQ_REPORT_TPR_ACCESS:
+   vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
+   r = 0;
goto out;
-   }
-   break;
-   case KVM_REQ_MIGRATE_TIMER:
-   __kvm_migrate_timers(vcpu);
-   break;
-   case KVM_REQ_CLOCK_UPDATE:
-   r = kvm_guest_time_update(vcpu);
-   if (unlikely(r))
+   case KVM_REQ_TRIPLE_FAULT:
+   vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
+   r = 0;
goto out;
-   break;
-   case KVM_REQ_MMU_SYNC:
-   kvm_mmu_sync_roots(vcpu);
-   break;
-   case KVM_REQ_TLB_FLUSH:
-   kvm_x86_ops->tlb_flush(vcpu);
-   break;
-   case KVM_REQ_REPORT_TPR_ACCESS:
-   vcpu->run->exit_reason = KVM_EXIT_TPR_ACCESS;
-   r = 0;
-   goto out;
-   case KVM_REQ_TRIPLE_FAULT:
-   vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
-   r = 0;
-   goto out;
-   case KVM_REQ_DEACTIVATE_FPU:
-   vcpu->fpu_active = 0;
-   kvm_x86_ops->fpu_deactivate(vcpu);
-   break;
-   case KVM_REQ_APF_HALT:
-   /* Page is swapped out. Do synthetic halt */
-   vcpu->arch.apf.halted = true;
-   r = 1;
-   goto out;
-   case KVM_REQ_STEAL_UPDATE:
-   record_steal_time(vcpu);
-   break;
-   case KVM_REQ_NMI:
-   process_nmi(vcpu);
-   break;
-   case KVM_REQ_IMMEDIATE_EXIT:
-   req_immediate_exit = true;
-   break;
-   case KVM_REQ_PMU:
-   kvm_handle_pmu_event(vcpu);
-   break;
-   case KVM_REQ_PMI:
-   kvm_deliver_pmi(vcpu);
-   break;
-   case KVM_REQ_EVENT:
-   inject_pending_event(vcpu);
-
-   /* enable NMI/IRQ window open exits if needed */
-   if (vcpu->arch.nmi_pending)
-   kvm_

Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation

2012-07-09 Thread Scott Wood
On 07/07/2012 02:50 AM, Alexander Graf wrote:
> 
> On 07.07.2012, at 01:37, Scott Wood wrote:
> 
>> On 07/06/2012 08:17 AM, Alexander Graf wrote:
>>> On 28.06.2012, at 08:17, Bharat Bhushan wrote:
 +/*
 + * The timer system can almost deal with LONG_MAX timeouts, except that
 + * when you get very close to LONG_MAX, the slack added can cause 
 overflow.
 + *
 + * LONG_MAX/2 is a conservative threshold, but it should be adequate for
 + * any realistic use.
 + */
 +#define MAX_TIMEOUT (LONG_MAX/2)
>>>
>>> Should this really be in kvm code?
>>
>> It looks like we can use NEXT_TIMER_MAX_DELTA for this.
>>
 +  mask = 1ULL << (63 - period);
 +  tb = get_tb();
 +  if (tb & mask)
 +  nr_jiffies += mask;
>>>
>>> To be honest, you lost me here. nr_jiffies is jiffies, right? While
>>> mask is basically in timebase granularity. I suppose you're just
>>> reusing the variable here for no good reason - the compiler will
>>> gladly optimize things for you if you write things a bit more verbose
>>> :).
>>
>> Probably due to the way do_div() works, but yeah, it's confusing.  Maybe
>> something generic like "ticks", "interval", "remaining", etc. would be
>> better, with a comment on the do_div saying it's converting timebase
>> ticks into jiffies.
> 
> Well, you could start off with a variable "delta_tb", then do
> 
>   nr_jiffies = delta_tb;
>   x = do_div(...);
> 
> and things would suddenly become readable :). Of course I don't object to 
> comments along the code either :).
> 
>>
 +static void arm_next_watchdog(struct kvm_vcpu *vcpu)
 +{
 +  unsigned long nr_jiffies;
 +
 +  nr_jiffies = watchdog_next_timeout(vcpu);
 +  if (nr_jiffies < MAX_TIMEOUT)
 +  mod_timer(&vcpu->arch.wdt_timer, jiffies + nr_jiffies);
 +  else
 +  del_timer(&vcpu->arch.wdt_timer);
>>>
>>> Can you del a timer that's not armed? Could that ever happen in this case?
>>
>> "del_timer() deactivates a timer - this works on both active and
>> inactive timers."
> 
> Ah, good :).
> 
>>
>>> Also, could the timer possibly be running somewhere, so do we need 
>>> del_timer_sync? Or don't we need to care?
>>
>> This can be called in the context of the timer, so del_timer_sync()
>> would hang.
>>
>> As for what would happen if a caller from a different context were to
>> race with a timer, I think you could end up with the timer armed based
>> on an old TCR.  del_timer_sync() won't help though, unless you replace
>> mod_timer() with del_timer_sync() plus add_timer() (with a check to see
>> whether it's running in timer context).  A better solution is probably
>> to use a spinlock in arm_next_watchdog().
> 
> Yup. Either way, we have a race that the guest might not expect.
> 
>>
 +void kvmppc_watchdog_func(unsigned long data)
 +{
 +  struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
 +  u32 tsr, new_tsr;
 +  int final;
 +
 +  do {
 +  new_tsr = tsr = vcpu->arch.tsr;
 +  final = 0;
 +
 +  /* Time out event */
 +  if (tsr & TSR_ENW) {
 +  if (tsr & TSR_WIS) {
 +  new_tsr = (tsr & ~TCR_WRC_MASK) |
 +(vcpu->arch.tcr & TCR_WRC_MASK);
 +  vcpu->arch.tcr &= ~TCR_WRC_MASK;
 +  final = 1;
 +  } else {
 +  new_tsr = tsr | TSR_WIS;
 +  }
 +  } else {
 +  new_tsr = tsr | TSR_ENW;
 +  }
 +  } while (cmpxchg(&vcpu->arch.tsr, tsr, new_tsr) != tsr);
 +
 +  if (new_tsr & (TSR_WIS | TCR_WRC_MASK)) {
 +  smp_wmb();
 +  kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
 +  kvm_vcpu_kick(vcpu);
 +  }
 +
 +  /*
 +   * Avoid getting a storm of timers if the guest sets
 +   * the period very short.  We'll restart it if anything
 +   * changes.
 +   */
 +  if (!final)
 +  arm_next_watchdog(vcpu);
>>>
>>> Mind to explain this part a bit further?
>>
>> The whole function, or some subset near the end?
>>
>> The "if (!final)" check means that we stop running the timer after final
>> expiration, to prevent the host from being flooded with timers if the
>> guest sets a short period but does not have TCR set to exit to QEMU.
>> Timers will resume the next time TSR/TCR is updated.
> 
> Ah. The semantics make sense. The comment however is slightly too short. 
> Please explain this in a more verbose way, so someone who didn't write the 
> code knows what's going on :).
> 
>>
 @@ -1106,7 +1213,14 @@ static int set_sregs_base(struct kvm_vcpu *vcpu,
}

if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE_TSR) {
 +  u32 old_tsr = vcpu->arch.tsr;
 +
vcpu->arch.tsr = sregs->u.e.tsr;
 +
 +  if ((old_tsr ^ vcpu-

Re: [PATCH 2/2] KVM: PPC: booke: Add watchdog emulation

2012-07-09 Thread Alexander Graf

On 09.07.2012, at 19:09, Scott Wood wrote:

> On 07/07/2012 02:50 AM, Alexander Graf wrote:
>> 
>> On 07.07.2012, at 01:37, Scott Wood wrote:
>> 
>>> On 07/06/2012 08:17 AM, Alexander Graf wrote:
 On 28.06.2012, at 08:17, Bharat Bhushan wrote:
> +/*
> + * The timer system can almost deal with LONG_MAX timeouts, except that
> + * when you get very close to LONG_MAX, the slack added can cause 
> overflow.
> + *
> + * LONG_MAX/2 is a conservative threshold, but it should be adequate for
> + * any realistic use.
> + */
> +#define MAX_TIMEOUT (LONG_MAX/2)
 
 Should this really be in kvm code?
>>> 
>>> It looks like we can use NEXT_TIMER_MAX_DELTA for this.
>>> 
> + mask = 1ULL << (63 - period);
> + tb = get_tb();
> + if (tb & mask)
> + nr_jiffies += mask;
 
 To be honest, you lost me here. nr_jiffies is jiffies, right? While
 mask is basically in timebase granularity. I suppose you're just
 reusing the variable here for no good reason - the compiler will
 gladly optimize things for you if you write things a bit more verbose
 :).
>>> 
>>> Probably due to the way do_div() works, but yeah, it's confusing.  Maybe
>>> something generic like "ticks", "interval", "remaining", etc. would be
>>> better, with a comment on the do_div saying it's converting timebase
>>> ticks into jiffies.
>> 
>> Well, you could start off with a variable "delta_tb", then do
>> 
>>  nr_jiffies = delta_tb;
>>  x = do_div(...);
>> 
>> and things would suddenly become readable :). Of course I don't object to 
>> comments along the code either :).
>> 
>>> 
> +static void arm_next_watchdog(struct kvm_vcpu *vcpu)
> +{
> + unsigned long nr_jiffies;
> +
> + nr_jiffies = watchdog_next_timeout(vcpu);
> + if (nr_jiffies < MAX_TIMEOUT)
> + mod_timer(&vcpu->arch.wdt_timer, jiffies + nr_jiffies);
> + else
> + del_timer(&vcpu->arch.wdt_timer);
 
 Can you del a timer that's not armed? Could that ever happen in this case?
>>> 
>>> "del_timer() deactivates a timer - this works on both active and
>>> inactive timers."
>> 
>> Ah, good :).
>> 
>>> 
 Also, could the timer possibly be running somewhere, so do we need 
 del_timer_sync? Or don't we need to care?
>>> 
>>> This can be called in the context of the timer, so del_timer_sync()
>>> would hang.
>>> 
>>> As for what would happen if a caller from a different context were to
>>> race with a timer, I think you could end up with the timer armed based
>>> on an old TCR.  del_timer_sync() won't help though, unless you replace
>>> mod_timer() with del_timer_sync() plus add_timer() (with a check to see
>>> whether it's running in timer context).  A better solution is probably
>>> to use a spinlock in arm_next_watchdog().
>> 
>> Yup. Either way, we have a race that the guest might not expect.
>> 
>>> 
> +void kvmppc_watchdog_func(unsigned long data)
> +{
> + struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
> + u32 tsr, new_tsr;
> + int final;
> +
> + do {
> + new_tsr = tsr = vcpu->arch.tsr;
> + final = 0;
> +
> + /* Time out event */
> + if (tsr & TSR_ENW) {
> + if (tsr & TSR_WIS) {
> + new_tsr = (tsr & ~TCR_WRC_MASK) |
> +   (vcpu->arch.tcr & TCR_WRC_MASK);
> + vcpu->arch.tcr &= ~TCR_WRC_MASK;
> + final = 1;
> + } else {
> + new_tsr = tsr | TSR_WIS;
> + }
> + } else {
> + new_tsr = tsr | TSR_ENW;
> + }
> + } while (cmpxchg(&vcpu->arch.tsr, tsr, new_tsr) != tsr);
> +
> + if (new_tsr & (TSR_WIS | TCR_WRC_MASK)) {
> + smp_wmb();
> + kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
> + kvm_vcpu_kick(vcpu);
> + }
> +
> + /*
> +  * Avoid getting a storm of timers if the guest sets
> +  * the period very short.  We'll restart it if anything
> +  * changes.
> +  */
> + if (!final)
> + arm_next_watchdog(vcpu);
 
 Mind to explain this part a bit further?
>>> 
>>> The whole function, or some subset near the end?
>>> 
>>> The "if (!final)" check means that we stop running the timer after final
>>> expiration, to prevent the host from being flooded with timers if the
>>> guest sets a short period but does not have TCR set to exit to QEMU.
>>> Timers will resume the next time TSR/TCR is updated.
>> 
>> Ah. The semantics make sense. The comment however is slightly too short. 
>> Please explain this in a more verbose way, so someone who didn't write the 
>> code knows what's going on :).
>> 
>>> 
> @@ -1106,7 +1213,14 @@ static int set_sregs_base(struct kvm_vcpu *vcpu,
>   }
> 
>   if (sregs->u.e.update_special & KVM_SREGS_E_UPDATE

Re: [PATCH 2/6] rcu: Allow rcu_user_enter()/exit() to nest

2012-07-09 Thread Frederic Weisbecker
On Sun, Jul 08, 2012 at 06:54:18PM +0300, Avi Kivity wrote:
> On 07/06/2012 03:00 PM, Frederic Weisbecker wrote:
> > Allow calls to rcu_user_enter() even if we are already
> > in userspace (as seen by RCU) and allow calls to rcu_user_exit()
> > even if we are already in the kernel.
> > 
> > This makes the APIs more flexible to be called from architectures.
> > Exception entries for example won't need to know if they come from
> > userspace before calling rcu_user_exit().
> 
> I guess I should switch kvm to rcu_user_enter() and co, so we can
> disable the tick while running in a guest.  But where are those
> functions?  What are the rules for calling them?

I guess we need to have a closer look at the guest case first. We probably need
to take some care about specifics in time and load accounting usually
handled by the tick before we can shut it down. RCU is only part of the
problem.


Re: [net-next RFC V5 5/5] virtio_net: support negotiating the number of queues through ctrl vq

2012-07-09 Thread Ben Hutchings
On Thu, 2012-07-05 at 18:29 +0800, Jason Wang wrote:
> This patch lets the virtio_net driver negotiate the number of queues it
> wishes to use through the control virtqueue, and exports an ethtool
> interface to let users tweak it.
> 
> As the current multiqueue virtio-net implementation has optimizations for
> per-cpu virtqueues, only two modes are supported:
> 
> - single queue pair mode
> - multiple queue pairs mode, where the number of queue pairs matches the
>   number of vcpus
> 
> Single queue pair mode is used by default currently, due to a regression
> of multiqueue mode in some tests (especially stream tests).
> 
> Since the virtio core does not support partially deleting virtqueues,
> during mode switching all virtqueues are deleted and the driver
> re-creates the virtqueues it will use.
> 
> btw. Queue number negotiation is deferred to .ndo_open(), because only
> after feature negotiation can we send commands on the control virtqueue
> (as it may also use event index).
[...]
> +static int virtnet_set_channels(struct net_device *dev,
> + struct ethtool_channels *channels)
> +{
> + struct virtnet_info *vi = netdev_priv(dev);
> + u16 queues = channels->rx_count;
> + unsigned status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER;
> +
> + if (channels->rx_count != channels->tx_count)
> + return -EINVAL;
[...]
> +static void virtnet_get_channels(struct net_device *dev,
> +  struct ethtool_channels *channels)
> +{
> + struct virtnet_info *vi = netdev_priv(dev);
> +
> + channels->max_rx = vi->total_queue_pairs;
> + channels->max_tx = vi->total_queue_pairs;
> + channels->max_other = 0;
> + channels->max_combined = 0;
> + channels->rx_count = vi->num_queue_pairs;
> + channels->tx_count = vi->num_queue_pairs;
> + channels->other_count = 0;
> + channels->combined_count = 0;
> +}
[...]

It looks like the queue-pairs should be treated as 'combined channels',
not separate RX and TX channels.  Also you don't need to clear the other
members; you can assume that the ethtool core will zero-initialise
structures for 'get' operations.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



Re: [PATCH v3 2/2] kvm: KVM_EOIFD, an eventfd for EOIs

2012-07-09 Thread Alex Williamson
On Thu, 2012-07-05 at 18:53 +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 04, 2012 at 10:24:54PM -0600, Alex Williamson wrote:
> > On Wed, 2012-07-04 at 17:00 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Jul 03, 2012 at 01:21:29PM -0600, Alex Williamson wrote:
> > > > This new ioctl enables an eventfd to be triggered when an EOI is
> > > > written for a specified irqchip pin.  By default this is a simple
> > > > notification, but we can also tie the eoifd to a level irqfd, which
> > > > enables the irqchip pin to be automatically de-asserted on EOI.
> > > > This mode is particularly useful for device-assignment applications
> > > > where the unmask and notify triggers a hardware unmask.  The default
> > > > mode is most applicable to simple notify with no side-effects for
> > > > userspace usage, such as Qemu.
> > > > 
> > > > Here we make use of the reference counting of the _irq_source
> > > > object allowing us to share it with an irqfd and cleanup regardless
> > > > of the release order.
> > > > 
> > > > Signed-off-by: Alex Williamson 
> > > > ---
> > > > 
> > > >  Documentation/virtual/kvm/api.txt |   21 
> > > >  arch/x86/kvm/x86.c|1 
> > > >  include/linux/kvm.h   |   14 ++
> > > >  include/linux/kvm_host.h  |   13 ++
> > > >  virt/kvm/eventfd.c|  208 
> > > > +
> > > >  virt/kvm/kvm_main.c   |   11 ++
> > > >  6 files changed, 266 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/Documentation/virtual/kvm/api.txt 
> > > > b/Documentation/virtual/kvm/api.txt
> > > > index c7267d5..a38af14 100644
> > > > --- a/Documentation/virtual/kvm/api.txt
> > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > @@ -1988,6 +1988,27 @@ to independently assert level interrupts.  The 
> > > > KVM_IRQFD_FLAG_LEVEL
> > > >  is only necessary on setup, teardown is identical to that above.
> > > >  KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > > >  
> > > > +4.77 KVM_EOIFD
> > > > +
> > > > +Capability: KVM_CAP_EOIFD
> > > 
> > > Maybe add a specific capability KVM_CAP_EOIFD_LEVEL_IRQFD too?
> > 
> > Good idea, allows them to be split later.
> > 
> > > > +Architectures: x86
> > > > +Type: vm ioctl
> > > > +Parameters: struct kvm_eoifd (in)
> > > > +Returns: 0 on success, -1 on error
> > > > +
> > > > +KVM_EOIFD allows userspace to receive interrupt EOI notification
> > > > +through an eventfd.  kvm_eoifd.fd specifies the eventfd used for
> > > > notification and kvm_eoifd.gsi specifies the irqchip pin, similar to
> > > > +KVM_IRQFD.  The eoifd is removed using the KVM_EOIFD_FLAG_DEASSIGN
> > > > +flag, specifying both kvm_eoifd.fd and kvm_eoifd.gsi.
> > > 
> > > This text reads like it would give you EOIs for any GSI,
> > > but you haven't actually implemented this for edge GSIs - and
> > > making it work would bloat the datapath for fast (edge) interrupts.
> > 
> > I do allow you to register any GSI, but you won't be getting EOIs unless
> > it's operating in level triggered mode.  Perhaps it's best to specify it
> > as unsupported and let some future patch create a new capability if
> > support is added.  I'll add a comment.
> > 
> > > What's the intended use of this part of the interface? qemu with
> > > irqchip disabled?
> > 
> > VFIO should not be dependent on KVM, therefore when kvm is not enabled
> > we need to add an interface in qemu for devices to be notified of eoi.
> >
> > This doesn't currently exist.  VFIO can take additional advantage of
> > irqchip when it is enabled, thus the interface below.  However, I don't
> > feel I can propose an eoi notifier in qemu that stops working as soon as
> > irqchip is enabled, even if I'm the only user.  This theoretical qemu
> > eoi notifier could then use the above when irqchip is enabled.
> 
> Well internal qemu APIs are qemu's problem and can be addressed there.
> For example, can we make it mimic our interface: make qemu EOI notifier
> accept an object that includes qemu_irq without irqchip and irqfd with?

So the qemu API changes based on whether irqchip is enabled or not?!

> In other words adding interface with no users looks weird.

If we could ever finish this patch, I could start working on the
user... ;)

> > > > +
> > > > +The KVM_EOIFD_FLAG_LEVEL_IRQFD flag indicates that the provided
> > > > kvm_eoifd structure includes a valid kvm_eoifd.irqfd file descriptor
> > > > +for a level irqfd configured using the KVM_IRQFD_FLAG_LEVEL flag.
> > > > +In this mode the level interrupt is de-asserted prior to EOI eventfd
> > > > +notification.  The KVM_EOIFD_FLAG_LEVEL_IRQFD is only necessary on
> > > > +setup, teardown is identical to that above.
> > > 
> > > It seems a single fd can not be used multiple times for
> > > different irqfds? In that case:
> > > 1. Need to document this limitation
> > 
> > Ok
> > 
> > > 2. This differs from other notifiers e.g.  ioeventfd.
> > 
> > But the same as an irqfd.
> 
> However irqfd is ab

KVM call agenda for Tuesday, July 10th

2012-07-09 Thread Juan Quintela

Hi

Please send in any agenda items you are interested in covering.

Later, Juan.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM VM's disappeared

2012-07-09 Thread ToddAndMargo

Hi All,

I am in trouble here.  I would really appreciate any help
you guys can spare.

Scientific Linux 6.2, 64 bit.  (Red Hat Enterprise Linux 6.2 clone)

$ rpm -qa \*qemu\*
qemu-kvm-0.12.1.2-2.209.el6_2.4.x86_64
qemu-img-0.12.1.2-2.209.el6_2.4.x86_64
gpxe-roms-qemu-0.9.7-6.9.el6.noarch

$ uname -r
2.6.32-220.23.1.el6.x86_64

When I fired up my KVM Virtual Machine Manager (virt-manager)
this morning, four of my seven virtual machines disappeared,
including the one I desperately need.

Checking /etc/libvirt/qemu and they are all there.  Same
attributes too.  Checking where I put the virtual hard
drives and they are all there too.

Okay, so when I try firing up the three that remain, I get the
following error message:

 Error starting domain: unsupported configuration: spice
 TLS port set in XML configuration, but TLS is disabled
 in qemu.conf

Yes, each VM has a different spice port set so I can tell them
apart.  This has always worked smoothly.

Huh? qemu.conf is the default.  The one with everything
commented out.  I even checked my backup: no change in
qemu.conf.

Checking /var/log/libvirt/libvirtd.log gives:

2012-07-09 19:36:41.957+: 2821: info : libvirt version: 0.9.10, 
package: 21.el6 (Scientific Linux, 2012-06-22-02:34:35, sl6.fnal.gov)
2012-07-09 19:36:41.957+: 2821: error : virDomainDefParseXML:8871 : 
Maximum CPUs greater than topology limit
2012-07-09 19:36:41.959+: 2821: error : virDomainDefParseXML:8871 : 
Maximum CPUs greater than topology limit
2012-07-09 19:36:41.959+: 2821: error : virDomainDefParseXML:8871 : 
Maximum CPUs greater than topology limit
2012-07-09 19:36:41.960+: 2821: error : virDomainDefParseXML:8871 : 
Maximum CPUs greater than topology limit
2012-07-09 19:47:59.089+: 2811: error : qemuBuildCommandLine:5526 : 
unsupported configuration: spice TLS port set in XML configuration, but 
TLS is disabled in qemu.conf


Again with the spice port error.

The only thing I did to my system between working yesterday and
not working today was downgrade my flash-plugin.

I tried setting "spice_tls = 1" in qemu.conf, but the other
four VM still do not show up.  Spice lays an egg on the ones
that do show up, so I set spice_tls back to commented out.

I removed qemu-kvm-0.12.1.2-2.209.el6_2.4.x86_64,
rebooted, reinstalled, rebooted.  No symptom change.

What in the world?  I cannot find anything wrong!

Many thanks,
-T



Re: [PATCH RFC 0/2] kvm: Improving directed yield in PLE handler

2012-07-09 Thread Andrew Theurer
On Mon, 2012-07-09 at 11:50 +0530, Raghavendra K T wrote:
> Currently the Pause Loop Exit (PLE) handler is doing directed yield to a
> random VCPU on PL exit. Though we already have filtering while choosing
> the candidate to yield_to, we can do better.

Hi, Raghu.

> Problem is, for large vcpu guests, we have more probability of yielding
> to a bad vcpu. We are not able to prevent directed yield to same guy who
> has done PL exit recently, who perhaps spins again and wastes CPU.
> 
> Fix that by keeping track of who has done PL exit. So the algorithm in this
> series gives a chance to a VCPU which has:
> 
>  (a) Not done PLE exit at all (probably he is preempted lock-holder)
> 
>  (b) VCPU skipped in last iteration because it did PL exit, and probably
>  has become eligible now (next eligible lock holder)
> 
> Future enhancements:
>   (1) Currently we have a boolean to decide on eligibility of vcpu. It
> would be nice if I get feedback on guest (>32 vcpu) whether we can
> improve better with integer counter. (with counter = say f(log n )).
>   
>   (2) We have not considered system load during iteration of vcpu. With
>that information we can limit the scan and also decide whether schedule()
>is better. [ I am able to use #kicked vcpus to decide on this But may
>be there are better ideas like information from global loadavg.]
> 
>   (3) We can exploit this further with PV patches since it also knows about
>next eligible lock-holder.
> 
> Summary: There is a huge improvement for moderate / no overcommit scenario
>  for kvm based guest on PLE machine (which is difficult ;) ).
> 
> Result:
> Base : kernel 3.5.0-rc5 with Rik's Ple handler fix
> 
> Machine : Intel(R) Xeon(R) CPU X7560  @ 2.27GHz, 4 numa node, 256GB RAM,
>   32 core machine

Is this with HT enabled, therefore 64 CPU threads?

> Host: enterprise linux  gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC)
>   with test kernels 
> 
> Guest: fedora 16 with 32 vcpus 8GB memory. 

Can you briefly explain the 1x and 2x configs?  This of course is highly
dependent on whether or not HT is enabled...

FWIW, I started testing what I would call "0.5x", where I have one 40
vcpu guest running on a host with 40 cores and 80 CPU threads total (HT
enabled, no extra load on the system).  For ebizzy, the results are
quite erratic from run to run, so I am inclined to discard it as a
workload, but maybe I should try "1x" and "2x" cpu over-commit as well.

From initial observations, at least for the ebizzy workload, the
percentage of exits that result in a yield_to() are very low, around 1%,
before these patches.  So, I am concerned that at least for this test,
reducing that number even more has diminishing returns.  I am however
still concerned about the scalability problem with yield_to(), which
shows like this for me (perf):

> 63.56%  282095  qemu-kvm  [kernel.kallsyms]  [k] _raw_spin_lock
>  5.42%   24420  qemu-kvm  [kvm]              [k] kvm_vcpu_yield_to
>  5.33%   26481  qemu-kvm  [kernel.kallsyms]  [k] get_pid_task
>  4.35%   20049  qemu-kvm  [kernel.kallsyms]  [k] yield_to
>  2.74%   15652  qemu-kvm  [kvm]              [k] kvm_apic_present
>  1.70%    8657  qemu-kvm  [kvm]              [k] kvm_vcpu_on_spin
>  1.45%    7889  qemu-kvm  [kvm]              [k] vcpu_enter_guest
 
For the cpu threads in the host that are actually active (in this case
1/2 of them), ~50% of their time is in kernel and ~43% in guest.  This
is for a no-IO workload, so that's just incredible to see so much cpu
wasted.  I feel that 2 important areas to tackle are a more scalable
yield_to() and reducing the number of pause exits itself (hopefully by
just tuning ple_window for the latter).

Honestly, I am not confident addressing this problem will improve the
ebizzy score.  That workload is so erratic for me that I do not trust
the results at all.  I have however seen consistent improvements in
disabling PLE for an http guest workload and a very high IOPS guest
workload, both with much time spent in host in the double runqueue lock
for yield_to(), so that's why I still gravitate toward that issue.


-Andrew Theurer



KVM VM's disappeared

2012-07-09 Thread ToddAndMargo

>> Hi All,
>>
>> I am in trouble here.  I would really appreciate any help
>> you guys can spare.
>>
>> Scientific Linux 6.2, 64 bit.  (Red Hat Enterprise Linux 6.2 clone)
>>
>> $ rpm -qa \*qemu\*
>> qemu-kvm-0.12.1.2-2.209.el6_2.4.x86_64
>> qemu-img-0.12.1.2-2.209.el6_2.4.x86_64
>> gpxe-roms-qemu-0.9.7-6.9.el6.noarch
>>
>> $ uname -r
>> 2.6.32-220.23.1.el6.x86_64
>>
>> When I fired up my KVM Virtual Machine Manager (virt-manager)
>> this morning, four of my seven virtual machines disappeared,
>> including the one I desperately need.
>>
>> Checking /etc/libvirt/qemu and they are all there.  Same
>> attributes too.  Checking where I put the virtual hard
>> drives and they are all there too.
>>
>> Okay, so when I try firing up the three that remain, I get the
>> following error message:
>>
>>  Error starting domain: unsupported configuration: spice
>>  TLS port set in XML configuration, but TLS is disabled
>>  in qemu.conf
>>
>> Yes, each VM has a different spice port set so I can tell them
>> apart.  This has always worked smoothly.
>>
>> Huh? qemu.conf is the default.  The one with everything
>> commented out.  I even checked my backup: no change in
>> qemu.conf.
>>
>> Checking /var/log/libvirt/libvirtd.log gives:
>>
>> 2012-07-09 19:36:41.957+: 2821: info : libvirt version: 0.9.10, 
package: 21.el6 (Scientific Linux, 2012-06-22-02:34:35, sl6.fnal.gov)
>> 2012-07-09 19:36:41.957+: 2821: error : 
virDomainDefParseXML:8871 : Maximum CPUs greater than topology limit
>> 2012-07-09 19:36:41.959+: 2821: error : 
virDomainDefParseXML:8871 : Maximum CPUs greater than topology limit
>> 2012-07-09 19:36:41.959+: 2821: error : 
virDomainDefParseXML:8871 : Maximum CPUs greater than topology limit
>> 2012-07-09 19:36:41.960+: 2821: error : 
virDomainDefParseXML:8871 : Maximum CPUs greater than topology limit
>> 2012-07-09 19:47:59.089+: 2811: error : 
qemuBuildCommandLine:5526 : unsupported configuration: spice TLS port 
set in XML configuration, but TLS is disabled in qemu.conf

>>
>> Again with the spice port error.
>>
>> The only thing I did to my system between working yesterday and
>> not working today was downgrade my flash-plugin.
>>
>> I tried setting "spice_tls = 1" in qemu.conf, but the other
>> four VM still do not show up.  Spice lays an egg on the ones
>> that do show up, so I set spice_tls back to commented out.
>>
>> I removed qemu-kvm-0.12.1.2-2.209.el6_2.4.x86_64,
>> rebooted, reinstalled, rebooted.  No symptom change.
>>
>> What in the world?  I cannot find anything wrong!
>>
>> Many thanks,
>> -T
>
>
> Another symptom.  I can not edit ANY of the VMs:
>
> # virsh edit KVM-W8.xml
> error: failed to get domain 'KVM-W8.xml'
> error: Domain not found: no domain with matching name 'KVM-W8.xml'

Same thing when I dropped the .xml from the name.  The three that
show up work, the other four do not.




Re: [PATCH RFC 0/2] kvm: Improving directed yield in PLE handler

2012-07-09 Thread Rik van Riel

On 07/09/2012 02:20 AM, Raghavendra K T wrote:

Currently the Pause Loop Exit (PLE) handler is doing directed yield to a
random VCPU on PL exit. Though we already have filtering while choosing
the candidate to yield_to, we can do better.

Problem is, for large vcpu guests, we have more probability of yielding
to a bad vcpu. We are not able to prevent directed yield to same guy who
has done PL exit recently, who perhaps spins again and wastes CPU.

Fix that by keeping track of who has done PL exit. So the algorithm in this
series gives a chance to a VCPU which has:

  (a) Not done PLE exit at all (probably he is preempted lock-holder)

  (b) VCPU skipped in last iteration because it did PL exit, and probably
  has become eligible now (next eligible lock holder)

Future enhancements:


Your patch series looks good to me. Simple changes with a
significant result.

However, the simple heuristic could use some comments :)

--
All rights reversed


Re: [PATCH RFC 2/2] kvm PLE handler: Choose better candidate for directed yield

2012-07-09 Thread Rik van Riel

On 07/09/2012 02:20 AM, Raghavendra K T wrote:


+bool kvm_arch_vcpu_check_and_update_eligible(struct kvm_vcpu *vcpu)
+{
+   bool eligible;
+
+   eligible = !vcpu->arch.plo.pause_loop_exited ||
+		   (vcpu->arch.plo.pause_loop_exited &&
+		    vcpu->arch.plo.dy_eligible);
+
+   if (vcpu->arch.plo.pause_loop_exited)
+   vcpu->arch.plo.dy_eligible = !vcpu->arch.plo.dy_eligible;
+
+   return eligible;
+}


This is a nice simple mechanism to skip CPUs that were
eligible last time and had pause loop exits recently.

However, it could stand some documentation.  Please
add a good comment explaining how and why the algorithm
works, when arch.plo.pause_loop_exited is cleared, etc...

It would be good to make this heuristic understandable
to people who look at the code for the first time.

--
All rights reversed


Re: [PATCH RFC 1/2] kvm vcpu: Note down pause loop exit

2012-07-09 Thread Rik van Riel

On 07/09/2012 02:20 AM, Raghavendra K T wrote:


@@ -484,6 +484,13 @@ struct kvm_vcpu_arch {
u64 length;
u64 status;
} osvw;
+
+   /* Pause loop exit optimization */
+   struct {
+   bool pause_loop_exited;
+   bool dy_eligible;
+   } plo;


I know kvm_vcpu_arch is traditionally not a well documented
structure, but it would be really nice if each variable inside
this sub-structure could get some documentation.

Also, do we really want to introduce another acronym here?

Or would we be better off simply calling this struct .ple,
since that is a name people are already familiar with.

--
All rights reversed


KVM VM's disappeared

2012-07-09 Thread ToddAndMargo

Hi All,

I am in trouble here.  I would really appreciate any help
you guys can spare.

Scientific Linux 6.2, 64 bit.  (Red Hat Enterprise Linux 6.2 clone)

$ rpm -qa \*qemu\*
qemu-kvm-0.12.1.2-2.209.el6_2.4.x86_64
qemu-img-0.12.1.2-2.209.el6_2.4.x86_64
gpxe-roms-qemu-0.9.7-6.9.el6.noarch

$ uname -r
2.6.32-220.23.1.el6.x86_64

When I fired up my KVM Virtual Machine Manager (virt-manager)
this morning, four of my seven virtual machines disappeared,
including the one I desperately need.

Checking /etc/libvirt/qemu and they are all there.  Same
attributes too.  Checking where I put the virtual hard
drives and they are all there too.

Okay, so when I try firing up the three that remain, I get the
following error message:

 Error starting domain: unsupported configuration: spice
 TLS port set in XML configuration, but TLS is disabled
 in qemu.conf

Yes, each VM has a different spice port set so I can tell them
apart.  This has always worked smoothly.

Huh? qemu.conf is the default.  The one with everything
commented out.  I even checked my backup: no change in
qemu.conf.

Checking /var/log/libvirt/libvirtd.log gives:

2012-07-09 19:36:41.957+: 2821: info : libvirt version: 0.9.10, package: 
21.el6 (Scientific Linux, 2012-06-22-02:34:35, sl6.fnal.gov)
2012-07-09 19:36:41.957+: 2821: error : virDomainDefParseXML:8871 : Maximum 
CPUs greater than topology limit
2012-07-09 19:36:41.959+: 2821: error : virDomainDefParseXML:8871 : Maximum 
CPUs greater than topology limit
2012-07-09 19:36:41.959+: 2821: error : virDomainDefParseXML:8871 : Maximum 
CPUs greater than topology limit
2012-07-09 19:36:41.960+: 2821: error : virDomainDefParseXML:8871 : Maximum 
CPUs greater than topology limit
2012-07-09 19:47:59.089+: 2811: error : qemuBuildCommandLine:5526 : 
unsupported configuration: spice TLS port set in XML configuration, but TLS is 
disabled in qemu.conf


Again with the spice port error.

The only thing I did to my system between working yesterday and
not working today was downgrade my flash-plugin.

I tried setting "spice_tls = 1" in qemu.conf, but the other
four VM still do not show up.  Spice lays an egg on the ones
that do show up, so I set spice_tls back to commented out.

I removed qemu-kvm-0.12.1.2-2.209.el6_2.4.x86_64,
rebooted, reinstalled, rebooted.  No symptom change.

What in the world?  I cannot find anything wrong!

Many thanks,
-T



Another symptom.  I can not edit ANY of the VMs:

# virsh edit KVM-W8.xml
error: failed to get domain 'KVM-W8.xml'
error: Domain not found: no domain with matching name 'KVM-W8.xml'


Same thing when I dropped the .xml from the name.  The three that
show up work, the other four do not.




Figured it out.  A bad update for libvirt.  Looking at
/var/log/yum.log, I found:
Jul 09 11:18:50 Updated: libvirt-0.9.10-21.el6.x86_64

I did not ask for this update.  Somehow it went on in
the background.


The following fixed my problem:

# yum downgrade libvirt  libvirt-client  libvirt-python
Loaded plugins: priorities, refresh-packagekit, security
Setting up Downgrade Process
18 packages excluded due to repository priority protections
Resolving Dependencies
--> Running transaction check
---> Package libvirt.x86_64 0:0.9.4-23.el6 will be a downgrade
---> Package libvirt.x86_64 0:0.9.10-21.el6 will be erased
---> Package libvirt-client.x86_64 0:0.9.4-23.el6 will be a downgrade
---> Package libvirt-client.x86_64 0:0.9.10-21.el6 will be erased
---> Package libvirt-python.x86_64 0:0.9.4-23.el6 will be a downgrade
---> Package libvirt-python.x86_64 0:0.9.10-21.el6 will be erased
--> Finished Dependency Resolution

Next step, Red Hat's bugzilla.

-T



Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6

2012-07-09 Thread Nicholas A. Bellinger
Hi folks,

On Wed, 2012-07-04 at 18:52 -0700, Nicholas A. Bellinger wrote:
> 
> To give an idea of how things are looking on the performance side, here
> some initial numbers for small block (4k) mixed random IOPs using the
> following fio test setup:



> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block
> 
> 25 Write / 75 Read  |  ~15K   | ~45K  | ~70K
> 75 Write / 25 Read  |  ~20K   | ~55K  | ~60K
> 
> 

After checking the original benchmarks here again, I realized that for
virtio-scsi+tcm_vhost the results were actually switched.

So this should have been: heavier READ case (25 / 75) == 55K, and
heavier WRITE case (75 / 25) == 45K.

> In the first case, virtio-scsi+tcm_vhost is outperforming by 3x
> compared to virtio-scsi-raw using QEMU SCSI emulation with the same raw
> flash backend device.  For the second, heavier WRITE case, tcm_vhost
> is nearing full bare-metal utilization (~55K vs. ~60K).
> 
> Also converting tcm_vhost to use proper cmwq process context I/O
> submission will help to get even closer to bare metal speeds for both
> work-loads.
> 

Here are initial follow-up virtio-scsi randrw 4k benchmarks with
tcm_vhost recently converted to run backend I/O dispatch via modern cmwq
primitives (kworkerd).

fio randrw 4k workload | virtio-scsi+tcm_vhost+cmwq
---
  25 Write / 75 Read   |  ~60K
  75 Write / 25 Read   |  ~45K

So aside from the minor performance improvement for the 25 / 75
workload, the other main improvement is lower CPU usage using the
iomemory_vsl backends.  This is attributed to cmwq providing process
context on the same core as the vhost thread pulling items off vq, which
ends up being on the order of 1/3 less host CPU usage (for both
workloads) primarily from positive cache effects.

This patch is now available in target-pending/tcm_vhost, and I'll be
respinning the initial merge series into for-next-merge over the next
days + another round of list review.

Please let us know if you have any concerns.

Thanks!

--nab



Re: [PATCH v4 2/3] KVM-INTEL: Add new module vmcsinfo-intel to fill VMCSINFO

2012-07-09 Thread HATAYAMA Daisuke
From: Wen Congyang 
Subject: Re: [PATCH v4 2/3] KVM-INTEL: Add new module vmcsinfo-intel to fill 
VMCSINFO
Date: Fri, 6 Jul 2012 16:25:23 +0800

> At 07/06/2012 04:04 PM, HATAYAMA Daisuke Wrote:
>> From: Yanfei Zhang 
>> Subject: [PATCH v4 2/3] KVM-INTEL: Add new module vmcsinfo-intel to fill 
>> VMCSINFO
>> Date: Wed, 4 Jul 2012 18:05:19 +0800
>> 
>>> Besides, this patch also exports vmcs revision identifier via
>>> /sys/devices/system/cpu/vmcs_id and offsets of fields via
>>> /sys/devices/system/cpu/vmcs/.
>>> Individual offsets are contained in subfiles named by the field's
>>> encoding, e.g.: /sys/devices/cpu/vmcs/0800
>> 
>> According to the discussion starting from
>> 
>> http://lkml.indiana.edu/hypermail/linux/kernel/1105.3/00749.html
> 
> IIRC, kvm can not work in such environment. The vcpu can run on
> different cpu. If the cpu's vmcs is different, I don't know what
> will happen. So do we need to support for such environment now?
> I think that if kvm can not work in such environment, we should
> not provide vmcs information for each physical cpu.
> 

Ah, I noticed my basic confusion: if it --- a vcpu running on cpus with
different VMCS revisions --- were possible, this vmcsinfo would be
unnecessary, because it would mean the processor could read VMCS data with
a revision different from its own, or some kind of reverse
engineering for conversion between different VMCS formats had been done.

I think kvm could probably work if only processors that have the same
VMCS revision are assigned to a single guest. But considering the VMCS
nature, processors with different revisions seem unlikely to be
used together on a host machine.

Thanks.
HATAYAMA, Daisuke


Re: [PATCH v4 2/3] KVM-INTEL: Add new module vmcsinfo-intel to fill VMCSINFO

2012-07-09 Thread HATAYAMA Daisuke
From: Yanfei Zhang 
Subject: [PATCH v4 2/3] KVM-INTEL: Add new module vmcsinfo-intel to fill 
VMCSINFO
Date: Wed, 4 Jul 2012 18:05:19 +0800

> 
> Besides, this patch also exports vmcs revision identifier via
> /sys/devices/system/cpu/vmcs_id and offsets of fields via
> /sys/devices/system/cpu/vmcs/.

I think /sys/devices/system/cpu/vmcs/id is more natural, which also
belongs to vmcs.


> +/*
> + * vmcs field offsets.
> + */
> +static struct vmcsinfo {
> + u32 vmcs_revision_id;
> + u16 vmcs_field_to_offset_table[HOST_RIP + 1];
> +} vmcsinfo;

This is what I said previously. HOST_RIP is 0x6c16 => 27670. This
means sizeof (struct vmcsinfo) => 55346 bytes == 54 kbytes. But
actually exported fields are only 153, so 4 + 2 * 153 => 310 bytes are
enough.

How about getting the number of attributes from vmcs_attrs array? I
guess this is exactly the number of vmcs fields exported; here 153.

Thanks.
HATAYAMA, Daisuke


Re: [PATCH v3 3/6] KVM: Move mmu reload out of line

2012-07-09 Thread Xiao Guangrong
On 07/10/2012 01:05 AM, Avi Kivity wrote:
> Currently we check that the mmu root exits before every entry.  Use the
> existing KVM_REQ_MMU_RELOAD mechanism instead, by making it really reload
> the mmu, and by adding the request to mmu initialization code.
> 
> Signed-off-by: Avi Kivity 
> ---
>  arch/x86/kvm/mmu.c |  4 +++-
>  arch/x86/kvm/svm.c |  1 +
>  arch/x86/kvm/x86.c | 13 +++--
>  3 files changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 569cd66..136d757 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3180,7 +3180,8 @@ void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu)
>  static void paging_new_cr3(struct kvm_vcpu *vcpu)
>  {
>   pgprintk("%s: cr3 %lx\n", __func__, kvm_read_cr3(vcpu));
> - mmu_free_roots(vcpu);
> + kvm_mmu_unload(vcpu);
> + kvm_mmu_load(vcpu);
>  }
> 
>  static unsigned long get_cr3(struct kvm_vcpu *vcpu)
> @@ -3469,6 +3470,7 @@ static int init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
> 
>  static int init_kvm_mmu(struct kvm_vcpu *vcpu)
>  {
> + kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
>   if (mmu_is_nested(vcpu))
>   return init_kvm_nested_mmu(vcpu);
>   else if (tdp_enabled)
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 7a41878..d77ad8c 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -2523,6 +2523,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
> 
>   if (nested_vmcb->control.nested_ctl) {
>   kvm_mmu_unload(&svm->vcpu);
> + kvm_make_request(KVM_REQ_MMU_RELOAD, &svm->vcpu);
>   svm->nested.nested_cr3 = nested_vmcb->control.nested_cr3;
>   nested_svm_init_mmu_context(&svm->vcpu);
>   }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 959e5a9..162231f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5226,8 +5226,14 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>   kvm_make_request(KVM_REQ_EVENT, vcpu);
> 
>   if (vcpu->requests) {
> - if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
> + if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu)) {
>   kvm_mmu_unload(vcpu);
> + r = kvm_mmu_reload(vcpu);
> + if (unlikely(r)) {
> + kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
> + goto out;
> + }

Now that the mmu reload happens before event injection, can the bug below be triggered again?

commit d8368af8b46b904def42a0f341d2f4f29001fa77
Author: Avi Kivity 
Date:   Mon May 14 18:07:56 2012 +0300

KVM: Fix mmu_reload() clash with nested vmx event injection

Currently the inject_pending_event() call during guest entry happens after
kvm_mmu_reload().  This is for historical reasons - we used to
inject_pending_event() in atomic context, while kvm_mmu_reload() needs task
context.

A problem is that nested vmx can cause the mmu context to be reset, if event
injection is intercepted and causes a #VMEXIT instead (the #VMEXIT resets
CR0/CR3/CR4).  If this happens, we end up with invalid root_hpa, and since
kvm_mmu_reload() has already run, no one will fix it and we end up entering
the guest this way.

Fix by reordering event injection to be before kvm_mmu_reload().  Use
->cancel_injection() to undo if kvm_mmu_reload() fails.

https://bugzilla.kernel.org/show_bug.cgi?id=42980

Reported-by: Luke-Jr 
Signed-off-by: Avi Kivity 
Signed-off-by: Marcelo Tosatti 
