RE: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly

2010-01-21 Thread Liu Yu-B13201
 

> -Original Message-
> From: kvm-ppc-ow...@vger.kernel.org 
> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Hollis Blanchard
> Sent: Saturday, January 09, 2010 3:30 AM
> To: Alexander Graf
> Cc: kvm@vger.kernel.org; kvm-ppc; Benjamin Herrenschmidt; Liu Yu
> Subject: Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly
> 
> > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> > index 338baf9..e283e44 100644
> > --- a/arch/powerpc/kvm/booke.c
> > +++ b/arch/powerpc/kvm/booke.c
> > @@ -82,8 +82,9 @@ static void 
> kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
> >        set_bit(priority, &vcpu->arch.pending_exceptions);
> >  }
> >
> > -void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
> > +void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
> >  {
> > +       /* BookE does flags in ESR, so ignore those we get here */
> >        kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
> >  }
> 
> Actually, I think Book E prematurely sets ESR, since it's done before
> the program interrupt is actually delivered. Architecturally, I'm not
> sure if it's a problem, but philosophically I've always wanted it to
> work the way you've just implemented for Book S.
> 

ESR is updated not only by program interrupts but also by data_tlb, data_storage, etc.
Should we rearrange them all?
DEAR has the same situation as ESR.
Should it be updated only when we decide to inject the interrupt into the guest?




[PATCH] KVM: Fix kvm_coalesced_mmio_ring duplicate allocation

2010-01-21 Thread Sheng Yang
The commit 0953ca73 "KVM: Simplify coalesced mmio initialization" allocates
kvm_coalesced_mmio_ring in kvm_coalesced_mmio_init(), but didn't remove the
original allocation in kvm_create_vm().

Signed-off-by: Sheng Yang 
---
 virt/kvm/kvm_main.c |   17 -
 1 files changed, 0 insertions(+), 17 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7c5c873..2b0974a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -371,9 +371,6 @@ static struct kvm *kvm_create_vm(void)
 {
int r = 0, i;
struct kvm *kvm = kvm_arch_create_vm();
-#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-   struct page *page;
-#endif
 
if (IS_ERR(kvm))
goto out;
@@ -402,23 +399,9 @@ static struct kvm *kvm_create_vm(void)
}
}
 
-#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-   page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-   if (!page) {
-   cleanup_srcu_struct(&kvm->srcu);
-   goto out_err;
-   }
-
-   kvm->coalesced_mmio_ring =
-   (struct kvm_coalesced_mmio_ring *)page_address(page);
-#endif
-
r = kvm_init_mmu_notifier(kvm);
if (r) {
cleanup_srcu_struct(&kvm->srcu);
-#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-   put_page(page);
-#endif
goto out_err;
}
 
-- 
1.5.4.5



Re: [PATCH v3 04/12] Add "handle page fault" PV helper.

2010-01-21 Thread Avi Kivity

On 01/20/2010 07:18 PM, Rik van Riel wrote:

On 01/20/2010 07:00 AM, Avi Kivity wrote:

On 01/20/2010 12:02 PM, Gleb Natapov wrote:


I can inject the event as a HW interrupt on a vector greater than 32 but
not go through the APIC, so EOI will not be required. This sounds
non-architectural, and I am not sure the kernel has entry point code for
this kind of event; it has one for exceptions and one for interrupts that
go through __do_IRQ(), which assumes that interrupts should be ACKed.


Further, we start to interact with the TPR; Linux doesn't use the TPR or
cr8 but if it does one day we don't want it interfering with apf.


That's not an issue is it?  The guest will tell the host what
vector to use for pseudo page faults.


And kill 15 other vectors?

--
error compiling committee.c: too many arguments to function



Re: [PATCH v3 04/12] Add "handle page fault" PV helper.

2010-01-21 Thread Avi Kivity

On 01/20/2010 08:45 PM, H. Peter Anvin wrote:

On 01/20/2010 04:00 AM, Avi Kivity wrote:
   

On 01/20/2010 12:02 PM, Gleb Natapov wrote:
 

I can inject the event as a HW interrupt on a vector greater than 32 but
not go through the APIC, so EOI will not be required. This sounds
non-architectural, and I am not sure the kernel has entry point code for
this kind of event; it has one for exceptions and one for interrupts that
go through __do_IRQ(), which assumes that interrupts should be ACKed.

   

Further, we start to interact with the TPR; Linux doesn't use the TPR or
cr8 but if it does one day we don't want it interfering with apf.

 

I don't think the TPR would be involved unless you involve the APIC
(which you absolutely don't want to do.)  What I'm trying to figure out
is if you could inject this vector as "external interrupt" and still
have it deliver if IF=0, or if it would cause any other funnies.
   


No, and it poses problems further down the line if the hardware 
virtualizes more and more of the APIC as seems likely to happen.


External interrupts are asynchronous events, so they're likely not to be 
guaranteed to be delivered on an instruction boundary like exceptions.  
Things like interrupt shadow will affect them as well.



As that point, you do not want to go through the do_IRQ path but rather
through your own exception vector entry point (it would be an entry
point which doesn't get an error code, like #UD.)
   


An error code would actually be useful.

--
error compiling committee.c: too many arguments to function



Re: [PATCH v3 04/12] Add "handle page fault" PV helper.

2010-01-21 Thread Avi Kivity

On 01/20/2010 07:43 PM, H. Peter Anvin wrote:

On 01/20/2010 02:02 AM, Gleb Natapov wrote:



You can have the guest OS take an exception on a vector above 31 just
fine; you just need it to tell the hypervisor which vector it, the OS,
assigned for this purpose.

VMX doesn't allow injecting a hardware exception with a vector greater
than 31.

SDM 3B, section 23.2.1.3.



OK, you're right.  I had missed that... I presume it was done for 
implementation reasons.


My expectation is that it was done for forward compatibility reasons.




I can inject the event as a HW interrupt on a vector greater than 32 but
not go through the APIC, so EOI will not be required. This sounds
non-architectural, and I am not sure the kernel has entry point code for
this kind of event; it has one for exceptions and one for interrupts that
go through __do_IRQ(), which assumes that interrupts should be ACKed.


You can also just emulate the state transition -- since you know 
you're dealing with a flat protected-mode or long-mode OS (and just 
make that a condition of enabling the feature) you don't have to deal 
with all the strange combinations of directions that an unrestricted 
x86 event can take.  Since it's an exception, it is unconditional.


Do you mean creating the stack frame manually?  I'd really like to avoid
that for many reasons, one of which is performance (we'd need to do all the
virt-to-phys walks manually); another is that we're certain to end up with
something horribly underspecified.  I'd really like to stay as close as
possible to the hardware.  For the alternative approach, see Xen.
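
For illustration, here is a rough sketch (not KVM code) of what such a manual
injection path would have to do for a 64-bit guest; guest_va_to_pa() and
guest_phys_write() are hypothetical stand-ins for the software page-table
walk and guest memory access being discussed:

    #include <stdint.h>

    struct vcpu;                                 /* opaque, hypothetical */
    struct guest_regs { uint64_t rsp, rip, rflags, cs, ss; };

    /* assumed helpers: guest page-table walk + guest physical memory write */
    uint64_t guest_va_to_pa(struct vcpu *v, uint64_t va);
    int guest_phys_write(struct vcpu *v, uint64_t pa, const void *data, int len);

    #define GUEST_EFLAGS_IF 0x200

    static int inject_fault_manually(struct vcpu *v, struct guest_regs *r,
                                     uint64_t handler_rip, uint64_t error_code)
    {
            /* what 64-bit hardware would push, highest address first */
            uint64_t frame[6] = { error_code, r->rip, r->cs,
                                  r->rflags, r->rsp, r->ss };
            uint64_t sp = r->rsp & ~0xfULL;      /* align as hardware would */
            int i;

            for (i = 5; i >= 0; i--) {
                    sp -= 8;
                    /* every write costs a guest page-table walk */
                    if (guest_phys_write(v, guest_va_to_pa(v, sp),
                                         &frame[i], 8) < 0)
                            return -1;           /* nested fault: underspecified */
            }

            r->rsp = sp;
            r->rip = handler_rip;                /* from the guest IDT: more walks */
            r->rflags &= ~GUEST_EFLAGS_IF;       /* if the gate is an interrupt gate */
            /* CPL changes, IST, segment checks, CR2, ... still unhandled */
            return 0;
    }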


--
error compiling committee.c: too many arguments to function



Re: [PATCH v3 04/12] Add "handle page fault" PV helper.

2010-01-21 Thread Gleb Natapov
On Thu, Jan 21, 2010 at 11:02:19AM +0200, Avi Kivity wrote:
> On 01/20/2010 07:43 PM, H. Peter Anvin wrote:
> >On 01/20/2010 02:02 AM, Gleb Natapov wrote:
> >>
> >>>You can have the guest OS take an exception on a vector above 31 just
> >>>fine; you just need it to tell the hypervisor which vector it, the OS,
> >>>assigned for this purpose.
> >>>
> >>VMX doesn't allow to inject hardware exception with vector
> >>greater then 31.
> >>SDM 3B section 23.2.1.3.
> >>
> >
> >OK, you're right.  I had missed that... I presume it was done for
> >implementation reasons.
> 
> My expectation is that is was done for forward compatibility reasons.
> 
> >
> >>I can inject the event as HW interrupt on vector greater then 32 but not
> >>go through APIC so EOI will not be required. This sounds
> >>non-architectural
> >>and I am not sure kernel has entry point code for this kind of event, it
> >>has one for exception and one for interrupts that goes through
> >>__do_IRQ()
> >>which assumes that interrupts should be ACKed.
> >
> >You can also just emulate the state transition -- since you know
> >you're dealing with a flat protected-mode or long-mode OS (and
> >just make that a condition of enabling the feature) you don't have
> >to deal with all the strange combinations of directions that an
> >unrestricted x86 event can take.  Since it's an exception, it is
> >unconditional.
> 
> Do you mean create the stack frame manually?  I'd really like to
> avoid that for many reasons, one of which is performance (need to do
> all the virt-to-phys walks manually), the other is that we're
> certain to end up with something horribly underspecified.  I'd
> really like to keep as close as possible to the hardware.  For the
> alternative approach, see Xen.
> 
That and our event injection path can't play with guest memory right now
since it is done from atomic context.

--
Gleb.


Re: [PATCH v3 04/12] Add "handle page fault" PV helper.

2010-01-21 Thread Avi Kivity

On 01/21/2010 11:04 AM, Gleb Natapov wrote:



Do you mean create the stack frame manually?  I'd really like to
avoid that for many reasons, one of which is performance (need to do
all the virt-to-phys walks manually), the other is that we're
certain to end up with something horribly underspecified.  I'd
really like to keep as close as possible to the hardware.  For the
alternative approach, see Xen.

 

That and our event injection path can't play with guest memory right now
since it is done from atomic context.
   


That's true (I'd like to fix that though, for the real mode stuff).

--
error compiling committee.c: too many arguments to function



[PATCH] kvm: Flush coalesced MMIO buffer periodically

2010-01-21 Thread Sheng Yang
The default action of coalesced MMIO is to cache writes in a buffer until:
1. The buffer is full.
2. Or the vcpu exits to QEmu for other reasons.

But this can result in writes being committed very late when:
1. Each MMIO write is small.
2. The interval between writes is long.
3. The guest rarely needs input or access to other devices.

This issue was observed in an experimental embedded system. The test image
simply prints "test" every second. The output in QEmu meets expectations,
but the output under KVM is delayed by several seconds.

Per Avi's suggestion, I add periodic flushing of the coalesced MMIO buffer
in the QEmu IO thread. This way, we don't need an explicit vcpu exit to
QEmu to handle this issue. The current flush interval is 1/25 s.

Signed-off-by: Sheng Yang 
---
 qemu-kvm.c |   47 +--
 qemu-kvm.h |2 ++
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 599c3d6..38f890c 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -463,6 +463,12 @@ static void kvm_create_vcpu(CPUState *env, int id)
 goto err_fd;
 }
 
+#ifdef KVM_CAP_COALESCED_MMIO
+if (kvm_state->coalesced_mmio && !kvm_state->coalesced_mmio_ring)
+kvm_state->coalesced_mmio_ring = (void *) env->kvm_run +
+   kvm_state->coalesced_mmio * PAGE_SIZE;
+#endif
+
 return;
   err_fd:
 close(env->kvm_fd);
@@ -927,8 +933,7 @@ int kvm_run(CPUState *env)
 
 #if defined(KVM_CAP_COALESCED_MMIO)
 if (kvm_state->coalesced_mmio) {
-struct kvm_coalesced_mmio_ring *ring =
-(void *) run + kvm_state->coalesced_mmio * PAGE_SIZE;
+struct kvm_coalesced_mmio_ring *ring = kvm_state->coalesced_mmio_ring;
 while (ring->first != ring->last) {
 cpu_physical_memory_rw(ring->coalesced_mmio[ring->first].phys_addr,
&ring->coalesced_mmio[ring->first].data[0],
@@ -2073,6 +2078,29 @@ static void io_thread_wakeup(void *opaque)
 }
 }
 
+#ifdef KVM_CAP_COALESCED_MMIO
+
+/* flush interval is 1/25 second */
+#define KVM_COALESCED_MMIO_FLUSH_INTERVAL 4000LL
+
+static void flush_coalesced_mmio_buffer(void *opaque)
+{
+if (kvm_state->coalesced_mmio_ring) {
+struct kvm_coalesced_mmio_ring *ring =
+kvm_state->coalesced_mmio_ring;
+while (ring->first != ring->last) {
+cpu_physical_memory_rw(ring->coalesced_mmio[ring->first].phys_addr,
+   &ring->coalesced_mmio[ring->first].data[0],
+   ring->coalesced_mmio[ring->first].len, 1);
+smp_wmb();
+ring->first = (ring->first + 1) % KVM_COALESCED_MMIO_MAX;
+}
+}
+qemu_mod_timer(kvm_state->coalesced_mmio_timer,
+   qemu_get_clock(host_clock) + KVM_COALESCED_MMIO_FLUSH_INTERVAL);
+}
+#endif
+
 int kvm_main_loop(void)
 {
 int fds[2];
@@ -2117,6 +2145,15 @@ int kvm_main_loop(void)
 io_thread_sigfd = sigfd;
 cpu_single_env = NULL;
 
+#ifdef KVM_CAP_COALESCED_MMIO
+if (kvm_state->coalesced_mmio) {
+kvm_state->coalesced_mmio_timer =
+qemu_new_timer(host_clock, flush_coalesced_mmio_buffer, NULL);
+qemu_mod_timer(kvm_state->coalesced_mmio_timer,
+qemu_get_clock(host_clock) + KVM_COALESCED_MMIO_FLUSH_INTERVAL);
+}
+#endif
+
 while (1) {
 main_loop_wait(1000);
 if (qemu_shutdown_requested()) {
@@ -2135,6 +2172,12 @@ int kvm_main_loop(void)
 }
 }
 
+#ifdef KVM_CAP_COALESCED_MMIO
+if (kvm_state->coalesced_mmio) {
+qemu_del_timer(kvm_state->coalesced_mmio_timer);
+qemu_free_timer(kvm_state->coalesced_mmio_timer);
+}
+#endif
 pause_all_threads();
 pthread_mutex_unlock(&qemu_mutex);
 
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 6b3e5a1..17f9d1b 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -1144,6 +1144,8 @@ typedef struct KVMState {
 int fd;
 int vmfd;
 int coalesced_mmio;
+struct kvm_coalesced_mmio_ring *coalesced_mmio_ring;
+struct QEMUTimer *coalesced_mmio_timer;
 int broken_set_mem_region;
 int migration_log;
 int vcpu_events;
-- 
1.5.4.5



Re: [PATCH] kvm: Flush coalesced MMIO buffer periodically

2010-01-21 Thread Avi Kivity

On 01/21/2010 11:37 AM, Sheng Yang wrote:

The default action of coalesced MMIO is, cache the writing in buffer, until:
1. The buffer is full.
2. Or the exit to QEmu due to other reasons.

But this would result in a very late writing in some condition.
1. The each time write to MMIO content is small.
2. The writing interval is big.
3. No need for input or accessing other devices frequently.

This issue was observed in a experimental embbed system. The test image
simply print "test" every 1 seconds. The output in QEmu meets expectation,
but the output in KVM is delayed for seconds.

Per Avi's suggestion, I add a periodly flushing coalesced MMIO buffer in
QEmu IO thread. By this way, We don't need vcpu explicit exit to QEmu to
handle this issue. Current synchronize rate is 1/25s.

   


I'm not sure that a new timer is needed.  If the only problem case is 
the display, maybe we can flush coalesced mmio from the vga refresh 
timer.  That ensures that we flush exactly when needed, and don't have 
extra timers.
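
Roughly, that could look like the sketch below (not a tested patch: it just
reuses the drain loop and the kvm_state->coalesced_mmio_ring field from the
patch above, and the exact refresh hook in the VGA code is left open):

    #ifdef KVM_CAP_COALESCED_MMIO
    void kvm_flush_coalesced_mmio_buffer(void)
    {
        struct kvm_coalesced_mmio_ring *ring = kvm_state->coalesced_mmio_ring;

        if (!ring)
            return;

        /* drain pending coalesced writes into the memory/device model */
        while (ring->first != ring->last) {
            cpu_physical_memory_rw(ring->coalesced_mmio[ring->first].phys_addr,
                                   &ring->coalesced_mmio[ring->first].data[0],
                                   ring->coalesced_mmio[ring->first].len, 1);
            smp_wmb();
            ring->first = (ring->first + 1) % KVM_COALESCED_MMIO_MAX;
        }
    }
    #endif

    /* and in the VGA refresh handler, before scanning video memory:
     *     kvm_flush_coalesced_mmio_buffer();
     */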


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/5] Debug register emulation fixes and optimizations (reloaded)

2010-01-21 Thread Avi Kivity

On 01/20/2010 07:20 PM, Jan Kiszka wrote:

Major parts of this series were already posted a while ago during the
debug register switch optimizations. This version now comes with an
additional fix for VMX (patch 1) and a rework of mov dr emulation for
SVM.
   

Looks good.

--
error compiling committee.c: too many arguments to function



[PATCH] kvm-s390: fix potential array overrun in intercept handling

2010-01-21 Thread Christian Borntraeger
Avi, Marcelo,

kvm_handle_sie_intercept uses a jump table to get the intercept handler
for a SIE intercept. Static code analysis revealed a potential problem:
the intercept_funcs jump table was defined to contain (0x48 >> 2) entries,
but we only checked for code > 0x48 which would cause an off-by-one
array overflow if code == 0x48.

Since the table is only populated up to (0x28 >> 2), we can reduce the
jump table size while fixing the off-by-one.

Signed-off-by: Christian Borntraeger 

---
(patch was refreshed with -U8 to see the full jump table.)
 arch/s390/kvm/intercept.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/s390/kvm/intercept.c
===
--- linux-2.6.orig/arch/s390/kvm/intercept.c
+++ linux-2.6/arch/s390/kvm/intercept.c
@@ -208,32 +208,32 @@ static int handle_instruction_and_prog(s
 
if (rc == -ENOTSUPP)
vcpu->arch.sie_block->icptcode = 0x04;
if (rc)
return rc;
return rc2;
 }
 
-static const intercept_handler_t intercept_funcs[0x48 >> 2] = {
+static const intercept_handler_t intercept_funcs[(0x28 >> 2) + 1] = {
[0x00 >> 2] = handle_noop,
[0x04 >> 2] = handle_instruction,
[0x08 >> 2] = handle_prog,
[0x0C >> 2] = handle_instruction_and_prog,
[0x10 >> 2] = handle_noop,
[0x14 >> 2] = handle_noop,
[0x1C >> 2] = kvm_s390_handle_wait,
[0x20 >> 2] = handle_validity,
[0x28 >> 2] = handle_stop,
 };
 
 int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
 {
intercept_handler_t func;
u8 code = vcpu->arch.sie_block->icptcode;
 
-   if (code & 3 || code > 0x48)
+   if (code & 3 || code > 0x28)
return -ENOTSUPP;
func = intercept_funcs[code >> 2];
if (func)
return func(vcpu);
return -ENOTSUPP;
 }


Re: [PATCH] kvm-s390: fix potential array overrun in intercept handling

2010-01-21 Thread Avi Kivity

On 01/21/2010 12:56 PM, Christian Borntraeger wrote:

Avi, Marcelo,

kvm_handle_sie_intercept uses a jump table to get the intercept handler
for a SIE intercept. Static code analysis revealed a potential problem:
the intercept_funcs jump table was defined to contain (0x48 >> 2) entries,
but we only checked for code > 0x48 which would cause an off-by-one
array overflow if code == 0x48.

Since the table is only populated up to (0x28 >> 2), we can reduce the
jump table size while fixing the off-by-one.

   




-static const intercept_handler_t intercept_funcs[0x48 >> 2] = {
+static const intercept_handler_t intercept_funcs[(0x28 >> 2) + 1] = {
[0x00 >> 2] = handle_noop,
[0x04 >> 2] = handle_instruction,
[0x08 >> 2] = handle_prog,
[0x0C >> 2] = handle_instruction_and_prog,
[0x10 >> 2] = handle_noop,
[0x14 >> 2] = handle_noop,
[0x1C >> 2] = kvm_s390_handle_wait,
[0x20 >> 2] = handle_validity,
[0x28 >> 2] = handle_stop,
  };
   


You can define the array without a size to let the compiler figure out 
the minimum size.




  int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
  {
intercept_handler_t func;
u8 code = vcpu->arch.sie_block->icptcode;

-   if (code & 3 || code > 0x48)
+   if (code & 3 || code > 0x28)
return -ENOTSUPP;
   


And here, check against ARRAY_SIZE() instead of a magic number.


--
error compiling committee.c: too many arguments to function



[PATCHv2] kvm-s390: fix potential array overrun in intercept handling

2010-01-21 Thread Christian Borntraeger
v2: apply Avi's suggestions about ARRAY_SIZE.

kvm_handle_sie_intercept uses a jump table to get the intercept handler
for a SIE intercept. Static code analysis revealed a potential problem:
the intercept_funcs jump table was defined to contain (0x48 >> 2) entries,
but we only checked for code > 0x48 which would cause an off-by-one
array overflow if code == 0x48.

Use the compiler and ARRAY_SIZE to automatically set the limits.

Signed-off-by: Christian Borntraeger 

---
(patch was refreshed with -U8 to see the full jump table.)
 arch/s390/kvm/intercept.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/s390/kvm/intercept.c
===
--- linux-2.6.orig/arch/s390/kvm/intercept.c
+++ linux-2.6/arch/s390/kvm/intercept.c
@@ -208,32 +208,32 @@ static int handle_instruction_and_prog(s
 
if (rc == -ENOTSUPP)
vcpu->arch.sie_block->icptcode = 0x04;
if (rc)
return rc;
return rc2;
 }
 
-static const intercept_handler_t intercept_funcs[0x48 >> 2] = {
+static const intercept_handler_t intercept_funcs[] = {
[0x00 >> 2] = handle_noop,
[0x04 >> 2] = handle_instruction,
[0x08 >> 2] = handle_prog,
[0x0C >> 2] = handle_instruction_and_prog,
[0x10 >> 2] = handle_noop,
[0x14 >> 2] = handle_noop,
[0x1C >> 2] = kvm_s390_handle_wait,
[0x20 >> 2] = handle_validity,
[0x28 >> 2] = handle_stop,
 };
 
 int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
 {
intercept_handler_t func;
u8 code = vcpu->arch.sie_block->icptcode;
 
-   if (code & 3 || code > 0x48)
+   if (code & 3 || (code >> 2)  >= ARRAY_SIZE(intercept_funcs))
return -ENOTSUPP;
func = intercept_funcs[code >> 2];
if (func)
return func(vcpu);
return -ENOTSUPP;
 }


Re: [PATCHv2] kvm-s390: fix potential array overrun in intercept handling

2010-01-21 Thread Heiko Carstens
> - if (code & 3 || code > 0x48)
> + if (code & 3 || (code >> 2)  >= ARRAY_SIZE(intercept_funcs))
>   return -ENOTSUPP;

Not that it matters for this patch, but -ENOTSUPP should not leak to
userspace. Not sure if it does somewhere, but it is used all over the
place within arch/s390/kvm...
Use -EOPNOTSUPP or something similar instead.


Re: [PATCHv2] kvm-s390: fix potential array overrun in intercept handling

2010-01-21 Thread Christian Borntraeger
Am Donnerstag 21 Januar 2010 12:24:18 schrieb Heiko Carstens:
> > -   if (code & 3 || code > 0x48)
> > +   if (code & 3 || (code >> 2)  >= ARRAY_SIZE(intercept_funcs))
> > return -ENOTSUPP;
> 
> Not that it matters for this patch, but -ENOTSUPP should not leak to
> userspace. Not sure if it does somewhere, but it is used all over the
> place within arch/s390/kvm...
> Use -EOPNOTSUPP or something similar instead.

AFAICS it does not leak to userspace; ENOTSUPP is an internal code. See
kvm_arch_vcpu_ioctl_run:
[...]
if (rc == -ENOTSUPP) {
/* intercept cannot be handled in-kernel, prepare kvm-run */
kvm_run->exit_reason = KVM_EXIT_S390_SIEIC;
kvm_run->s390_sieic.icptcode = vcpu->arch.sie_block->icptcode;
kvm_run->s390_sieic.ipa  = vcpu->arch.sie_block->ipa;
kvm_run->s390_sieic.ipb  = vcpu->arch.sie_block->ipb;
rc = 0;
}
[...]


[PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt.

2010-01-21 Thread Liu, Jinsong
From cb997030cba02e7e74a29b3d942aeba9808ed293 Mon Sep 17 00:00:00 2001
From: Liu, Jinsong 
Date: Fri, 22 Jan 2010 03:18:46 +0800
Subject: [PATCH] Setup vcpu add/remove infrastructure, including madt 
bios_info and dsdt.

1. Set up the madt bios_info structure, so that the static DSDT gets
   run-time MADT info such as the checksum address, LAPIC address and
   max cpu number, with the least hardcoded magic (only the realmode
   address of bios_info).
2. Set up the vcpu add/remove DSDT infrastructure, including the
   processor-related ACPI objects and control methods. A vcpu add/remove
   triggers an SCI and then control method _L02. By checking the MADT,
   the vcpu number and the add/remove action are found, and the Notify
   control method then notifies the OS ACPI driver.

Signed-off-by: Liu, Jinsong 
---
 src/acpi-dsdt.dsl |  131 -
 src/acpi-dsdt.hex |  441 ++---
 src/acpi.c|7 +
 src/biosvar.h |   14 ++
 src/post.c|   13 ++
 5 files changed, 582 insertions(+), 24 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index cc31112..ed78489 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -700,8 +700,11 @@ DefinitionBlock (
 Return (0x01)

 }
+/*
+ * _L02 method for CPU notification
+ */
 Method(_L02) {
-Return(0x01)
+Return(\_PR.PRSC())
 }
 Method(_L03) {
 Return(0x01)
@@ -744,4 +747,130 @@ DefinitionBlock (
 }
 }

+
+Scope (\_PR)
+{
+/* BIOS_INFO_PHYSICAL_ADDRESS == 0xEA000 */
+OperationRegion(BIOS, SystemMemory, 0xEA000, 16)
+Field(BIOS, DwordAcc, NoLock, Preserve)
+{
+MSUA, 32, /* MADT checksum address */
+MAPA, 32, /* MADT LAPIC0 address */
+PBYT, 32, /* bytes of max vcpus bitmap */
+PBIT, 32  /* bits of last byte of max vcpus bitmap */
+}
+
+OperationRegion(MSUM, SystemMemory, MSUA, 1)
+Field(MSUM, ByteAcc, NoLock, Preserve)
+{
+MSU, 8/* MADT checksum */
+}
+
+#define gen_processor(nr, name)   \
+Processor (C##name, nr, 0xb010, 0x06) {   \
+Name (_HID, "ACPI0007")   \
+OperationRegion(MATR, SystemMemory, Add(MAPA, Multiply(nr,8)), 8) \
+Field (MATR, ByteAcc, NoLock, Preserve)   \
+{ \
+MAT, 64   \
+} \
+Field (MATR, ByteAcc, NoLock, Preserve)   \
+{ \
+Offset(4),\
+FLG, 1\
+} \
+Method(_MAT, 0) { \
+Return(ToBuffer(MAT)) \
+} \
+Method (_STA) {   \
+If (FLG) { Return(0xF) } Else { Return(0x9) } \
+} \
+Method (_EJ0, 1, NotSerialized) { \
+Sleep (0xC8)  \
+} \
+} \
+
+gen_processor(0, 0)
+gen_processor(1, 1)
+gen_processor(2, 2)
+gen_processor(3, 3)
+gen_processor(4, 4)
+gen_processor(5, 5)
+gen_processor(6, 6)
+gen_processor(7, 7)
+gen_processor(8, 8)
+gen_processor(9, 9)
+gen_processor(10, A)
+gen_processor(11, B)
+gen_processor(12, C)
+gen_processor(13, D)
+gen_processor(14, E)
+
+
+Method (NTFY, 2) {
+#define gen_ntfy(nr)\
+If (LEqual(Arg0, 0x##nr)) { \
+If (LNotEqual(Arg1, \_PR.C##nr.FLG)) {  \
+Store (Arg1, \_PR.C##nr.FLG)\
+If (LEqual(Arg1, 1)) {  \
+Notify(C##nr, 1)\
+Subtract(\_PR.MSU, 1, \_PR.MSU) \
+} Else {\
+  

[PATCH] Debug vcpu add

2010-01-21 Thread Liu, Jinsong
From 479e84d9ce9d7d78d845f438071a4b1a44aca0bb Mon Sep 17 00:00:00 2001
From: Liu, Jinsong 
Date: Fri, 22 Jan 2010 03:30:33 +0800
Subject: [PATCH] Debug vcpu add

Add a 'kvm_vcpu_inited' check so that adding a vcpu will not cause a
segmentation fault. This is especially necessary when a vcpu is hot-added
after the guest OS is ready.

Signed-off-by: Liu, Jinsong 
---
 qemu-kvm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 599c3d6..bdf90b4 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1618,7 +1618,7 @@ static void kvm_do_load_mpstate(void *_env)
 
 void kvm_load_mpstate(CPUState *env)
 {
-if (kvm_enabled() && qemu_system_ready)
+if (kvm_enabled() && qemu_system_ready && kvm_vcpu_inited(env))
 on_vcpu(env, kvm_do_load_mpstate, env);
 }
 
-- 
1.6.5.6


vcpu hotplug support

2010-01-21 Thread Liu, Jinsong
Avi,

I just sent 2 patches for KVM vcpu hotplug support:
1. a seabios patch: Setup vcpu add/remove infrastructure, including madt
   bios_info and dsdt
2. a qemu-kvm patch: Debug vcpu add

Thanks,
Jinsong


Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly

2010-01-21 Thread Alexander Graf

On 21.01.2010, at 09:09, Liu Yu-B13201 wrote:

> 
> 
>> -Original Message-
>> From: kvm-ppc-ow...@vger.kernel.org 
>> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Hollis Blanchard
>> Sent: Saturday, January 09, 2010 3:30 AM
>> To: Alexander Graf
>> Cc: kvm@vger.kernel.org; kvm-ppc; Benjamin Herrenschmidt; Liu Yu
>> Subject: Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly
>> 
>>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>>> index 338baf9..e283e44 100644
>>> --- a/arch/powerpc/kvm/booke.c
>>> +++ b/arch/powerpc/kvm/booke.c
>>> @@ -82,8 +82,9 @@ static void 
>> kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
>>>   set_bit(priority, &vcpu->arch.pending_exceptions);
>>> }
>>> 
>>> -void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
>>> +void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
>>> {
>>> +   /* BookE does flags in ESR, so ignore those we get here */
>>>   kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
>>> }
>> 
>> Actually, I think Book E prematurely sets ESR, since it's done before
>> the program interrupt is actually delivered. Architecturally, I'm not
>> sure if it's a problem, but philosophically I've always wanted it to
>> work the way you've just implemented for Book S.
>> 
> 
> ESR is updated not only by program but by data_tlb, data_storage, etc.
> Should we rearrange them all? 
> Also DEAR has the same situation as ESR.
> Should it be updated when we decide to inject interrupt to guest?

If that's what the hardware does, then yes. I'm good with taking small steps 
though. So if you don't have the time to convert all of the handlers, you can 
easily start off with program interrupts.
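
A minimal first step along those lines could look roughly like this (sketch
only; the queued_esr field is an assumption, the rest is taken from the
patch context quoted above):

    void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
    {
            /* stash the ESR flags instead of dropping them */
            vcpu->arch.queued_esr = flags;
            kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
    }

    /* ...and in the delivery path (kvmppc_booke_irqprio_deliver), write the
     * real register only when the program interrupt is actually injected,
     * e.g.:
     *
     *     if (priority == BOOKE_IRQPRIO_PROGRAM)
     *             vcpu->arch.esr = vcpu->arch.queued_esr;
     */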

Alex


Re: vcpu hotplug support

2010-01-21 Thread Avi Kivity

On 01/21/2010 01:54 PM, Liu, Jinsong wrote:

Avi,

I just send 2 patches for KVM vcpu hotplug support.
1 is seabios patch: Setup vcpu add/remove infrastructure, including madt 
bios_info and dsdt
2 is qemu-kvm patch: Debug vcpu add

   


The patches look reasonable (of course I'd like to see Gleb review them), 
but please send the seabios patch to the seabios mailing list 
(seab...@seabios.org) so we don't have to diverge.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] Debug vcpu add

2010-01-21 Thread Sheng Yang
On Thursday 21 January 2010 19:50:17 Liu, Jinsong wrote:
> From 479e84d9ce9d7d78d845f438071a4b1a44aca0bb Mon Sep 17 00:00:00 2001
> From: Liu, Jinsong 
> Date: Fri, 22 Jan 2010 03:30:33 +0800
> Subject: [PATCH] Debug vcpu add

Jinsong, this name is pretty strange...

I think something like "Fix vcpu hot add feature" would be more appropriate...

-- 
regards
Yang, Sheng

> 
> Add 'kvm_vcpu_inited' check so that when adding vcpu it will not
> cause segmentation fault. This is especially necessary when vpu
> hotadd after guestos ready.
> 
> Signed-off-by: Liu, Jinsong 
> ---
>  qemu-kvm.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/qemu-kvm.c b/qemu-kvm.c
> index 599c3d6..bdf90b4 100644
> --- a/qemu-kvm.c
> +++ b/qemu-kvm.c
> @@ -1618,7 +1618,7 @@ static void kvm_do_load_mpstate(void *_env)
> 
>  void kvm_load_mpstate(CPUState *env)
>  {
> -if (kvm_enabled() && qemu_system_ready)
> +if (kvm_enabled() && qemu_system_ready && kvm_vcpu_inited(env))
>  on_vcpu(env, kvm_do_load_mpstate, env);
>  }
> 


Re: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt.

2010-01-21 Thread Gleb Natapov
On Thu, Jan 21, 2010 at 07:48:23PM +0800, Liu, Jinsong wrote:
> >From cb997030cba02e7e74a29b3d942aeba9808ed293 Mon Sep 17 00:00:00 2001
> From: Liu, Jinsong 
> Date: Fri, 22 Jan 2010 03:18:46 +0800
> Subject: [PATCH] Setup vcpu add/remove infrastructure, including madt 
> bios_info and dsdt.
> 
> 1. setup madt bios_info structure, so that static dsdt get
>run-time madt info like checksum address, lapic address,
>max cpu numbers, with least hardcode magic number (realmode
>address of bios_info).
> 2. setup vcpu add/remove dsdt infrastructure, including processor
>related acpi objects and control methods. vcpu add/remove will
>trigger SCI and then control method _L02. By matching madt, vcpu
>number and add/remove action were found, then by notify control
>method, it will notify OS acpi driver.
> 
> Signed-off-by: Liu, Jinsong 
It looks like the AML code is a port of what we had in the BOCHS BIOS with
minor changes. Can you detail what was changed and why, to make review
easier? And I assume this still doesn't work with Windows.

> ---
>  src/acpi-dsdt.dsl |  131 -
>  src/acpi-dsdt.hex |  441 
> ++---
>  src/acpi.c|7 +
>  src/biosvar.h |   14 ++
>  src/post.c|   13 ++
>  5 files changed, 582 insertions(+), 24 deletions(-)
> 
> diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
> index cc31112..ed78489 100644
> --- a/src/acpi-dsdt.dsl
> +++ b/src/acpi-dsdt.dsl
> @@ -700,8 +700,11 @@ DefinitionBlock (
>  Return (0x01)
> 
>  }
> +/*
> + * _L02 method for CPU notification
> + */
>  Method(_L02) {
> -Return(0x01)
> +Return(\_PR.PRSC())
>  }
>  Method(_L03) {
>  Return(0x01)
> @@ -744,4 +747,130 @@ DefinitionBlock (
>  }
>  }
> 
> +
> +Scope (\_PR)
> +{
> +/* BIOS_INFO_PHYSICAL_ADDRESS == 0xEA000 */
> +OperationRegion(BIOS, SystemMemory, 0xEA000, 16)
> +Field(BIOS, DwordAcc, NoLock, Preserve)
> +{
> +MSUA, 32, /* MADT checksum address */
> +MAPA, 32, /* MADT LAPIC0 address */
> +PBYT, 32, /* bytes of max vcpus bitmap */
> +PBIT, 32  /* bits of last byte of max vcpus bitmap */
Why do you need PBYT/PBIT? Adds complexity for no apparent reason.

> +}
> +
> +OperationRegion(MSUM, SystemMemory, MSUA, 1)
> +Field(MSUM, ByteAcc, NoLock, Preserve)
> +{
> +MSU, 8/* MADT checksum */
> +}
> +
> +#define gen_processor(nr, name)   \
> +Processor (C##name, nr, 0xb010, 0x06) {   \
> +Name (_HID, "ACPI0007")   \
> +OperationRegion(MATR, SystemMemory, Add(MAPA, Multiply(nr,8)), 8) \
> +Field (MATR, ByteAcc, NoLock, Preserve)   \
> +{ \
> +MAT, 64   \
> +} \
> +Field (MATR, ByteAcc, NoLock, Preserve)   \
> +{ \
> +Offset(4),\
> +FLG, 1\
> +} \
> +Method(_MAT, 0) { \
> +Return(ToBuffer(MAT)) \
> +} \
> +Method (_STA) {   \
> +If (FLG) { Return(0xF) } Else { Return(0x9) } \
> +} \
> +Method (_EJ0, 1, NotSerialized) { \
> +Sleep (0xC8)  \
> +} \
Why is _EJ0 needed?

> +} \
> +
> +gen_processor(0, 0)
> +gen_processor(1, 1)
> +gen_processor(2, 2)
> +gen_processor(3, 3)
> +gen_processor(4, 4)
> +gen_processor(5, 5)
> +gen_processor(6, 6)
> +gen_processor(7, 7)
> +gen_processor(8, 8)
> +gen_processor(9, 9)
> +gen_proc

[PATCH] fix checking of cr0 validity

2010-01-21 Thread Gleb Natapov
The "Move to/from Control Registers" chapter of the Intel SDM says:
"Reserved bits in CR0 remain clear after any load of those registers;
attempts to set them have no impact". The "Control Registers" chapter says:
"Bits 63:32 of CR0 are reserved and must be written with zeros. Writing a
nonzero value to any of the upper 32 bits results in a general-protection
exception, #GP(0)."

This patch tries to implement this twisted logic.

Signed-off-by: Gleb Natapov 
Reported-by: Lorenzo Martignoni 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47c6e23..1df691d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -430,12 +430,16 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
cr0 |= X86_CR0_ET;
 
-   if (cr0 & CR0_RESERVED_BITS) {
+#ifdef CONFIG_X86_64
+   if (cr0 & 0xffffffff00000000UL) {
printk(KERN_DEBUG "set_cr0: 0x%lx #GP, reserved bits 0x%lx\n",
   cr0, kvm_read_cr0(vcpu));
kvm_inject_gp(vcpu, 0);
return;
}
+#endif
+
+   cr0 &= ~CR0_RESERVED_BITS;
 
if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD)) {
printk(KERN_DEBUG "set_cr0: #GP, CD == 0 && NW == 1\n");
--
Gleb.


[PATCH 0/8] cr0/cr4/efer/fpu miscellaneous bits

2010-01-21 Thread Avi Kivity
Mostly trivial cleanups with the exception of a patch activating the fpu
on clts.

Avi Kivity (8):
  KVM: Allow kvm_load_guest_fpu() even when !vcpu->fpu_active
  KVM: Drop kvm_{load,put}_guest_fpu() exports
  KVM: Activate fpu on clts
  KVM: Add a helper for checking if the guest is in protected mode
  KVM: Move cr0/cr4/efer related helpers to x86.h
  KVM: Rename vcpu->shadow_efer to efer
  KVM: Optimize kvm_read_cr[04]_bits()
  KVM: trace guest fpu loads and unloads

 arch/x86/include/asm/kvm_host.h |3 ++-
 arch/x86/kvm/emulate.c  |   10 --
 arch/x86/kvm/kvm_cache_regs.h   |9 +++--
 arch/x86/kvm/mmu.c  |3 ++-
 arch/x86/kvm/mmu.h  |   24 
 arch/x86/kvm/svm.c  |   20 +---
 arch/x86/kvm/vmx.c  |   19 ++-
 arch/x86/kvm/x86.c  |   31 ---
 arch/x86/kvm/x86.h  |   30 ++
 include/trace/events/kvm.h  |   19 +++
 10 files changed, 103 insertions(+), 65 deletions(-)



[PATCH 1/8] KVM: Allow kvm_load_guest_fpu() even when !vcpu->fpu_active

2010-01-21 Thread Avi Kivity
This allows accessing the guest fpu from the instruction emulator, as well as
being symmetric with kvm_put_guest_fpu().

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/x86.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47c6e23..e3145d5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4251,7 +4251,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
preempt_disable();
 
kvm_x86_ops->prepare_guest_switch(vcpu);
-   kvm_load_guest_fpu(vcpu);
+   if (vcpu->fpu_active)
+   kvm_load_guest_fpu(vcpu);
 
local_irq_disable();
 
@@ -5297,7 +5298,7 @@ EXPORT_SYMBOL_GPL(fx_init);
 
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
 {
-   if (!vcpu->fpu_active || vcpu->guest_fpu_loaded)
+   if (vcpu->guest_fpu_loaded)
return;
 
vcpu->guest_fpu_loaded = 1;
-- 
1.6.5.3



[PATCH 3/8] KVM: Activate fpu on clts

2010-01-21 Thread Avi Kivity
Assume that if the guest executes clts, it knows what it's doing, and load the
guest fpu to prevent an #NM exception.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/svm.c  |8 +++-
 arch/x86/kvm/vmx.c  |1 +
 arch/x86/kvm/x86.c  |1 +
 4 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a1f0b5d..bf3ec76 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -512,6 +512,7 @@ struct kvm_x86_ops {
void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
+   void (*fpu_activate)(struct kvm_vcpu *vcpu);
void (*fpu_deactivate)(struct kvm_vcpu *vcpu);
 
void (*tlb_flush)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8d7cb62..0f3738a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1259,12 +1259,17 @@ static int ud_interception(struct vcpu_svm *svm)
return 1;
 }
 
-static int nm_interception(struct vcpu_svm *svm)
+static void svm_fpu_activate(struct kvm_vcpu *vcpu)
 {
+   struct vcpu_svm *svm = to_svm(vcpu);
svm->vmcb->control.intercept_exceptions &= ~(1 << NM_VECTOR);
svm->vcpu.fpu_active = 1;
update_cr0_intercept(svm);
+}
 
+static int nm_interception(struct vcpu_svm *svm)
+{
+   svm_fpu_activate(&svm->vcpu);
return 1;
 }
 
@@ -2971,6 +2976,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.cache_reg = svm_cache_reg,
.get_rflags = svm_get_rflags,
.set_rflags = svm_set_rflags,
+   .fpu_activate = svm_fpu_activate,
.fpu_deactivate = svm_fpu_deactivate,
 
.tlb_flush = svm_flush_tlb,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7375ae1..372bc38 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3011,6 +3011,7 @@ static int handle_cr(struct kvm_vcpu *vcpu)
vmcs_writel(CR0_READ_SHADOW, kvm_read_cr0(vcpu));
trace_kvm_cr_write(0, kvm_read_cr0(vcpu));
skip_emulated_instruction(vcpu);
+   vmx_fpu_activate(vcpu);
return 1;
case 1: /*mov from cr*/
switch (cr) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index feca59f..09207ba 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3266,6 +3266,7 @@ int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address)
 int emulate_clts(struct kvm_vcpu *vcpu)
 {
kvm_x86_ops->set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~X86_CR0_TS));
+   kvm_x86_ops->fpu_activate(vcpu);
return X86EMUL_CONTINUE;
 }
 
-- 
1.6.5.3



[PATCH 4/8] KVM: Add a helper for checking if the guest is in protected mode

2010-01-21 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |9 -
 arch/x86/kvm/vmx.c |4 ++--
 arch/x86/kvm/x86.c |7 +++
 arch/x86/kvm/x86.h |6 ++
 4 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0f89e32..e46f276 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 
+#include "x86.h"
 #include "mmu.h"   /* for is_long_mode() */
 
 /*
@@ -1515,7 +1516,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 
/* syscall is not available in real mode */
if (c->lock_prefix || ctxt->mode == X86EMUL_MODE_REAL
-   || !kvm_read_cr0_bits(ctxt->vcpu, X86_CR0_PE))
+   || !is_protmode(ctxt->vcpu))
return -1;
 
setup_syscalls_segments(ctxt, &cs, &ss);
@@ -1568,8 +1569,7 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
return -1;
 
/* inject #GP if in real mode or paging is disabled */
-   if (ctxt->mode == X86EMUL_MODE_REAL ||
-   !kvm_read_cr0_bits(ctxt->vcpu, X86_CR0_PE)) {
+   if (ctxt->mode == X86EMUL_MODE_REAL || !is_protmode(ctxt->vcpu)) {
kvm_inject_gp(ctxt->vcpu, 0);
return -1;
}
@@ -1634,8 +1634,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
return -1;
 
/* inject #GP if in real mode or paging is disabled */
-   if (ctxt->mode == X86EMUL_MODE_REAL
-   || !kvm_read_cr0_bits(ctxt->vcpu, X86_CR0_PE)) {
+   if (ctxt->mode == X86EMUL_MODE_REAL || !is_protmode(ctxt->vcpu)) {
kvm_inject_gp(ctxt->vcpu, 0);
return -1;
}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 372bc38..cd78049 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1853,7 +1853,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
 
 static int vmx_get_cpl(struct kvm_vcpu *vcpu)
 {
-   if (!kvm_read_cr0_bits(vcpu, X86_CR0_PE)) /* if real mode */
+   if (!is_protmode(vcpu))
return 0;
 
if (vmx_get_rflags(vcpu) & X86_EFLAGS_VM) /* if virtual 8086 */
@@ -2108,7 +2108,7 @@ static bool cs_ss_rpl_check(struct kvm_vcpu *vcpu)
 static bool guest_state_valid(struct kvm_vcpu *vcpu)
 {
/* real mode guest state checks */
-   if (!kvm_read_cr0_bits(vcpu, X86_CR0_PE)) {
+   if (!is_protmode(vcpu)) {
if (!rmode_segment_valid(vcpu, VCPU_SREG_CS))
return false;
if (!rmode_segment_valid(vcpu, VCPU_SREG_SS))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 09207ba..6cdead0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3798,8 +3798,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 * hypercall generates UD from non zero cpl and real mode
 * per HYPER-V spec
 */
-   if (kvm_x86_ops->get_cpl(vcpu) != 0 ||
-   !kvm_read_cr0_bits(vcpu, X86_CR0_PE)) {
+   if (kvm_x86_ops->get_cpl(vcpu) != 0 || !is_protmode(vcpu)) {
kvm_queue_exception(vcpu, UD_VECTOR);
return 0;
}
@@ -4763,7 +4762,7 @@ int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, 
u16 selector,
 {
struct kvm_segment kvm_seg;
 
-   if (is_vm86_segment(vcpu, seg) || !(kvm_read_cr0_bits(vcpu, 
X86_CR0_PE)))
+   if (is_vm86_segment(vcpu, seg) || !is_protmode(vcpu))
return kvm_load_realmode_segment(vcpu, selector, seg);
if (load_segment_descriptor_to_kvm_desct(vcpu, selector, &kvm_seg))
return 1;
@@ -5115,7 +5114,7 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
/* Older userspace won't unhalt the vcpu on reset. */
if (kvm_vcpu_is_bsp(vcpu) && kvm_rip_read(vcpu) == 0xfff0 &&
sregs->cs.selector == 0xf000 && sregs->cs.base == 0x &&
-   !(kvm_read_cr0_bits(vcpu, X86_CR0_PE)))
+   !is_protmode(vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 
vcpu_put(vcpu);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 5eadea5..f783d8f 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -2,6 +2,7 @@
 #define ARCH_X86_KVM_X86_H
 
 #include 
+#include "kvm_cache_regs.h"
 
 static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu)
 {
@@ -35,4 +36,9 @@ static inline bool kvm_exception_is_soft(unsigned int nr)
 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
  u32 function, u32 index);
 
+static inline bool is_protmode(struct kvm_vcpu *vcpu)
+{
+   return kvm_read_cr0_bits(vcpu, X86_CR0_PE);
+}
+
 #endif
-- 
1.6.5.3



[PATCH 2/8] KVM: Drop kvm_{load,put}_guest_fpu() exports

2010-01-21 Thread Avi Kivity
Not used anymore.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/x86.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e3145d5..feca59f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5305,7 +5305,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
kvm_fx_save(&vcpu->arch.host_fx_image);
kvm_fx_restore(&vcpu->arch.guest_fx_image);
 }
-EXPORT_SYMBOL_GPL(kvm_load_guest_fpu);
 
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 {
@@ -5318,7 +5317,6 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
++vcpu->stat.fpu_reload;
set_bit(KVM_REQ_DEACTIVATE_FPU, &vcpu->requests);
 }
-EXPORT_SYMBOL_GPL(kvm_put_guest_fpu);
 
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
-- 
1.6.5.3



[PATCH 6/8] KVM: Rename vcpu->shadow_efer to efer

2010-01-21 Thread Avi Kivity
None of the other registers have the shadow_ prefix.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_host.h |2 +-
 arch/x86/kvm/mmu.c  |2 +-
 arch/x86/kvm/svm.c  |   12 ++--
 arch/x86/kvm/vmx.c  |   14 +++---
 arch/x86/kvm/x86.c  |   14 +++---
 arch/x86/kvm/x86.h  |2 +-
 6 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bf3ec76..76bf686 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -277,7 +277,7 @@ struct kvm_vcpu_arch {
unsigned long cr8;
u32 hflags;
u64 pdptrs[4]; /* pae */
-   u64 shadow_efer;
+   u64 efer;
u64 apic_base;
struct kvm_lapic *apic;/* kernel irqchip context */
int32_t apic_arb_prio;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6f7158f..599c422 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -237,7 +237,7 @@ static int is_cpuid_PSE36(void)
 
 static int is_nx(struct kvm_vcpu *vcpu)
 {
-   return vcpu->arch.shadow_efer & EFER_NX;
+   return vcpu->arch.efer & EFER_NX;
 }
 
 static int is_shadow_present_pte(u64 pte)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0f3738a..0242fdd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -231,7 +231,7 @@ static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
efer &= ~EFER_LME;
 
to_svm(vcpu)->vmcb->save.efer = efer | EFER_SVME;
-   vcpu->arch.shadow_efer = efer;
+   vcpu->arch.efer = efer;
 }
 
 static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
@@ -990,14 +990,14 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned 
long cr0)
struct vcpu_svm *svm = to_svm(vcpu);
 
 #ifdef CONFIG_X86_64
-   if (vcpu->arch.shadow_efer & EFER_LME) {
+   if (vcpu->arch.efer & EFER_LME) {
if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
-   vcpu->arch.shadow_efer |= EFER_LMA;
+   vcpu->arch.efer |= EFER_LMA;
svm->vmcb->save.efer |= EFER_LMA | EFER_LME;
}
 
if (is_paging(vcpu) && !(cr0 & X86_CR0_PG)) {
-   vcpu->arch.shadow_efer &= ~EFER_LMA;
+   vcpu->arch.efer &= ~EFER_LMA;
svm->vmcb->save.efer &= ~(EFER_LMA | EFER_LME);
}
}
@@ -1361,7 +1361,7 @@ static int vmmcall_interception(struct vcpu_svm *svm)
 
 static int nested_svm_check_permissions(struct vcpu_svm *svm)
 {
-   if (!(svm->vcpu.arch.shadow_efer & EFER_SVME)
+   if (!(svm->vcpu.arch.efer & EFER_SVME)
|| !is_paging(&svm->vcpu)) {
kvm_queue_exception(&svm->vcpu, UD_VECTOR);
return 1;
@@ -1764,7 +1764,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
hsave->save.ds = vmcb->save.ds;
hsave->save.gdtr   = vmcb->save.gdtr;
hsave->save.idtr   = vmcb->save.idtr;
-   hsave->save.efer   = svm->vcpu.arch.shadow_efer;
+   hsave->save.efer   = svm->vcpu.arch.efer;
hsave->save.cr0= kvm_read_cr0(&svm->vcpu);
hsave->save.cr4= svm->vcpu.arch.cr4;
hsave->save.rflags = vmcb->save.rflags;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index cd78049..d4a6260 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -618,7 +618,7 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, 
int efer_offset)
u64 guest_efer;
u64 ignore_bits;
 
-   guest_efer = vmx->vcpu.arch.shadow_efer;
+   guest_efer = vmx->vcpu.arch.efer;
 
/*
 * NX is emulated; LMA and LME handled by hardware; SCE meaninless
@@ -963,7 +963,7 @@ static void setup_msrs(struct vcpu_vmx *vmx)
 * if efer.sce is enabled.
 */
index = __find_msr_index(vmx, MSR_K6_STAR);
-   if ((index >= 0) && (vmx->vcpu.arch.shadow_efer & EFER_SCE))
+   if ((index >= 0) && (vmx->vcpu.arch.efer & EFER_SCE))
move_msr_up(vmx, index, save_nmsrs++);
}
 #endif
@@ -1608,7 +1608,7 @@ static void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 * of this msr depends on is_long_mode().
 */
vmx_load_host_state(to_vmx(vcpu));
-   vcpu->arch.shadow_efer = efer;
+   vcpu->arch.efer = efer;
if (!msr)
return;
if (efer & EFER_LMA) {
@@ -1640,13 +1640,13 @@ static void enter_lmode(struct kvm_vcpu *vcpu)
 (guest_tr_ar & ~AR_TYPE_MASK)
 | AR_TYPE_BUSY_64_TSS);
}
-   vcpu->arch.shadow_efer |= EFER_LMA;
-   vmx_set_efer(vcpu, vcpu->arch.shadow_efer);
+   vcpu->arch.efer |= EFER_LMA;
+   vmx_set_efer(vcpu, vcpu->arch.efer);
 }
 
 static void exit_lmode(struct kvm_vcpu *vcpu)
 {
-   vcpu->arch.s

[PATCH 8/8] KVM: trace guest fpu loads and unloads

2010-01-21 Thread Avi Kivity
Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/x86.c |2 ++
 include/trace/events/kvm.h |   19 +++
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8b42c19..06a03c1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5304,6 +5304,7 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
vcpu->guest_fpu_loaded = 1;
kvm_fx_save(&vcpu->arch.host_fx_image);
kvm_fx_restore(&vcpu->arch.guest_fx_image);
+   trace_kvm_fpu(1);
 }
 
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
@@ -5316,6 +5317,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
kvm_fx_restore(&vcpu->arch.host_fx_image);
++vcpu->stat.fpu_reload;
set_bit(KVM_REQ_DEACTIVATE_FPU, &vcpu->requests);
+   trace_kvm_fpu(0);
 }
 
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index dbe1084..8abdc12 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -145,6 +145,25 @@ TRACE_EVENT(kvm_mmio,
  __entry->len, __entry->gpa, __entry->val)
 );
 
+#define kvm_fpu_load_symbol\
+   {0, "unload"},  \
+   {1, "load"}
+
+TRACE_EVENT(kvm_fpu,
+   TP_PROTO(int load),
+   TP_ARGS(load),
+
+   TP_STRUCT__entry(
+   __field(u32,load)
+   ),
+
+   TP_fast_assign(
+   __entry->load   = load;
+   ),
+
+   TP_printk("%s", __print_symbolic(__entry->load, kvm_fpu_load_symbol))
+);
+
 #endif /* _TRACE_KVM_MAIN_H */
 
 /* This part must be outside protection */
-- 
1.6.5.3



[PATCH 5/8] KVM: Move cr0/cr4/efer related helpers to x86.h

2010-01-21 Thread Avi Kivity
They have more general scope than the mmu.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |1 -
 arch/x86/kvm/mmu.c |1 +
 arch/x86/kvm/mmu.h |   24 
 arch/x86/kvm/x86.h |   24 
 4 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e46f276..a2adec8 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -33,7 +33,6 @@
 #include 
 
 #include "x86.h"
-#include "mmu.h"   /* for is_long_mode() */
 
 /*
  * Opcode effective-address decode tables.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ff2b2e8..6f7158f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -18,6 +18,7 @@
  */
 
 #include "mmu.h"
+#include "x86.h"
 #include "kvm_cache_regs.h"
 
 #include 
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 599159f..61ef5a6 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -58,30 +58,6 @@ static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
return kvm_mmu_load(vcpu);
 }
 
-static inline int is_long_mode(struct kvm_vcpu *vcpu)
-{
-#ifdef CONFIG_X86_64
-   return vcpu->arch.shadow_efer & EFER_LMA;
-#else
-   return 0;
-#endif
-}
-
-static inline int is_pae(struct kvm_vcpu *vcpu)
-{
-   return kvm_read_cr4_bits(vcpu, X86_CR4_PAE);
-}
-
-static inline int is_pse(struct kvm_vcpu *vcpu)
-{
-   return kvm_read_cr4_bits(vcpu, X86_CR4_PSE);
-}
-
-static inline int is_paging(struct kvm_vcpu *vcpu)
-{
-   return kvm_read_cr0_bits(vcpu, X86_CR0_PG);
-}
-
 static inline int is_present_gpte(unsigned long pte)
 {
return pte & PT_PRESENT_MASK;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index f783d8f..2dc24a7 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -41,4 +41,28 @@ static inline bool is_protmode(struct kvm_vcpu *vcpu)
return kvm_read_cr0_bits(vcpu, X86_CR0_PE);
 }
 
+static inline int is_long_mode(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_X86_64
+   return vcpu->arch.shadow_efer & EFER_LMA;
+#else
+   return 0;
+#endif
+}
+
+static inline int is_pae(struct kvm_vcpu *vcpu)
+{
+   return kvm_read_cr4_bits(vcpu, X86_CR4_PAE);
+}
+
+static inline int is_pse(struct kvm_vcpu *vcpu)
+{
+   return kvm_read_cr4_bits(vcpu, X86_CR4_PSE);
+}
+
+static inline int is_paging(struct kvm_vcpu *vcpu)
+{
+   return kvm_read_cr0_bits(vcpu, X86_CR0_PG);
+}
+
 #endif
-- 
1.6.5.3



[PATCH 7/8] KVM: Optimize kvm_read_cr[04]_bits()

2010-01-21 Thread Avi Kivity
'mask' is always a constant, so we can check whether it includes a bit that
might be owned by the guest very cheaply, and avoid the decache call.  Saves
a few hundred bytes of module text.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/kvm_cache_regs.h |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 6b419a3..5a109c6 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -1,6 +1,9 @@
 #ifndef ASM_KVM_CACHE_REGS_H
 #define ASM_KVM_CACHE_REGS_H
 
+#define KVM_POSSIBLE_CR0_GUEST_BITS X86_CR0_TS
+#define KVM_POSSIBLE_CR4_GUEST_BITS X86_CR4_PGE
+
 static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu,
  enum kvm_reg reg)
 {
@@ -40,7 +43,8 @@ static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int 
index)
 
 static inline ulong kvm_read_cr0_bits(struct kvm_vcpu *vcpu, ulong mask)
 {
-   if (mask & vcpu->arch.cr0_guest_owned_bits)
+   ulong tmask = mask & KVM_POSSIBLE_CR0_GUEST_BITS;
+   if (tmask & vcpu->arch.cr0_guest_owned_bits)
kvm_x86_ops->decache_cr0_guest_bits(vcpu);
return vcpu->arch.cr0 & mask;
 }
@@ -52,7 +56,8 @@ static inline ulong kvm_read_cr0(struct kvm_vcpu *vcpu)
 
 static inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu, ulong mask)
 {
-   if (mask & vcpu->arch.cr4_guest_owned_bits)
+   ulong tmask = mask & KVM_POSSIBLE_CR4_GUEST_BITS;
+   if (tmask & vcpu->arch.cr4_guest_owned_bits)
kvm_x86_ops->decache_cr4_guest_bits(vcpu);
return vcpu->arch.cr4 & mask;
 }
-- 
1.6.5.3
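
(Editor's illustration, not part of the patch: because the mask argument is
always a compile-time constant, the new tmask test lets the compiler prove at
build time whether the decache call can ever be needed.  Two hypothetical
callers, in the spirit of the helpers in x86.h:)

/* Sketch only -- the example_* names are illustrative, not kvm code. */
static inline int example_is_protmode(struct kvm_vcpu *vcpu)
{
	/* X86_CR0_PE & KVM_POSSIBLE_CR0_GUEST_BITS (== X86_CR0_TS) is 0,
	 * so tmask is provably 0 and decache_cr0_guest_bits() is dropped. */
	return kvm_read_cr0_bits(vcpu, X86_CR0_PE);
}

static inline int example_cr0_ts_set(struct kvm_vcpu *vcpu)
{
	/* CR0.TS may be guest-owned, so here the runtime check against
	 * vcpu->arch.cr0_guest_owned_bits is kept. */
	return kvm_read_cr0_bits(vcpu, X86_CR0_TS);
}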



How to debug Ubuntu 8.04 LTS guest crash during install?

2010-01-21 Thread Neil Aggarwal
Hello:

I am using kvm on a CentOS 5.4 server.

I am trying to install the TurnkeyLinux Core appliance 
found here: http://www.turnkeylinux.org/core

I downloaded the ISO file from the web site.

Then, I used this command to install it:
virt-install -n tkl-core -r 512 --vcpus=1 --check-cpu --os-type=linux 
--os-variant=ubuntuhardy -v --accelerate 
-c /tmp/turnkey-core-2009.10-hardy-x86.iso 
-f /var/lib/libvirt/images/tkl-core.img -s 15 -b br0 --vnc noautoconsole

When I connect to the VNC console, I get the Turnkey linux 
options screen.
I select Install to hard disk from there and it seems to 
start the install but crashes during the installer startup.

This is repeatable so there has to be a way to debug it.

I tried turning on the debug option for virt-install but that
did not give me any useful info.

Any ideas how to debug this?

Thanks,
Neil

--
Neil Aggarwal, (281)846-8957, http://UnmeteredVPS.net/cpanel
cPanel/WHM preinstalled on a virtual server for only $40/month!
No overage charges, 7 day free trial, PayPal, Google Checkout



Luvalley-5 has been released (with whitepaper!): enables arbitrary OS to run VMs without any modification

2010-01-21 Thread Xiaodong Yi
Luvalley is a lightweight type-1 Virtual Machine Monitor (VMM).
Part of its source code is derived from KVM to virtualize
CPU instructions and the memory management unit (MMU). However, its
overall architecture is completely different from KVM and somewhat
resembles Xen's: Luvalley runs outside of Linux, just as Xen does.
Any operating system, including Linux, can be used as
Luvalley's scheduler, memory manager, physical device driver provider
and virtual IO device emulator. Currently, Luvalley supports Linux and
Windows in that role. That is to say, one may run Luvalley to boot a
Linux or Windows, and then run multiple virtualized operating systems
on top of that Linux or Windows.

If you are interested in Luvalley project, you may download the source
codes as well as the whitepaper from
   http://sourceforge.net/projects/luvalley/

The main changes of this release (Luvalley-5) are:

 * The derived code is updated from KVM-83 to KVM-88

 * Supports both Intel and AMD CPUs

 * Automatically identify Intel and AMD CPUs

This release (Luvalley-5) includes:

 * Luvalley whitepaper (the first edition)

 * Luvalley binary and source code tarball

 * Readme, changelog and release notes files


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-21 Thread Andre Przywara

john cooper wrote:

Chris Wright wrote:

* Daniel P. Berrange (berra...@redhat.com) wrote:

To be honest all possible naming schemes for '-cpu ' are just as
unfriendly as each other. The only user friendly option is '-cpu host'. 


IMHO, we should just pick a concise naming scheme & document it. Given
they are all equally unfriendly, the one that has consistency with vmware
naming seems like a mild winner.

Heh, I completely agree, and was just saying the same thing to John
earlier today.  May as well be -cpu {foo,bar,baz} since the meaning for
those command line options must be well-documented in the man page.


I can appreciate the concern of wanting to get this
as "correct" as possible.  But ultimately we just
need three unique tags which ideally have some relation
to their associated architectures.  The diatribes
available from /proc/cpuinfo while generally accurate
don't really offer any more of a clue to the model
group, and in their unmodified form are rather unwieldy
as command line flags.
I agree. I'd underline that this patch is for migration purposes only, 
so you don't want to specify an exact CPU, but more like a class of 
CPUs. If you look into the available CPUID features in each CPU, you 
will find that there are only a few groups, with currently three for 
each vendor being a good guess.
/proc/cpuinfo just prints out marketing names, which have only a mild 
relationship to a feature-related technical CPU model. Maybe we can use 
a generation approach like the AMD Opteron ones for Intel, too.

These G1/G2/G3 names are just arbitrary and have no roots within AMD.

I think that an exact CPU model specification is out of scope for this 
patch and maybe even for QEMU. One could create a database with CPU 
names and associated CPUID flags and provide an external tool to 
generate a QEMU command line out of this. Keeping this database 
up-to-date (especially for desktop CPU models) is a burden that the QEMU 
project does not want to bear.





This is from an EVC kb article[1]:


Here is a pointer to a more detailed version:

   
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003212


We probably should also add an option to dump out the
full set of qemu-side cpuid flags for the benefit of
users and upper level tools.

You mean like this one?
http://lists.gnu.org/archive/html/qemu-devel/2009-09/msg01228.html
Resending this patch set is on my plan for next week. What is the state 
of this patch? Will it go in soon? Then I'd rebase my patch set on top 
of it.


Regards,
Andre.

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712



Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-21 Thread Anthony Liguori

On 01/20/2010 07:18 PM, john cooper wrote:

Chris Wright wrote:
   

* Daniel P. Berrange (berra...@redhat.com) wrote:
 

To be honest all possible naming schemes for '-cpu' are just as
unfriendly as each other. The only user friendly option is '-cpu host'.

IMHO, we should just pick a concise naming scheme & document it. Given
they are all equally unfriendly, the one that has consistency with vmware
naming seems like a mild winner.
   

Heh, I completely agree, and was just saying the same thing to John
earlier today.  May as well be -cpu {foo,bar,baz} since the meaning for
those command line options must be well-documented in the man page.
 

I can appreciate the concern of wanting to get this
as "correct" as possible.
   


This is the root of the trouble.  At the qemu layer, we try to focus on 
being correct.


Management tools are typically the layer that deals with being "correct".

A good compromise is making things user tunable which means that a 
downstream can make "correctness" decisions without forcing those 
decisions on upstream.


In this case, the idea would be to introduce a new option, say something 
like -cpu-def.  The syntax would be:


 -cpu-def 
name=coreduo,level=10,family=6,model=14,stepping=8,features=+vme+mtrr+clflush+mca+sse3+monitor,xlevel=0x8008,model_id="Genuine 
Intel(R) CPU T2600 @ 2.16GHz"


Which is not that exciting since it just lets you do -cpu coreduo in a 
much more complex way.  However, if we take advantage of the current 
config support, you can have:


[cpu-def]
  name=coreduo
  level=10
  family=6
  model=14
  stepping=8
  features="+vme+mtrr+clflush+mca+sse3.."
  model_id="Genuine Intel..."

And that can be stored in a config file.  We should then parse 
/etc/qemu/target-.conf by default.  We'll move the current 
x86_defs table into this config file and then downstreams/users can 
define whatever compatibility classes they want.


With this feature, I'd be inclined to take "correct" compatibility 
classes like Nehalem as part of the default qemurc that we install 
because it's easily overridden by a user.  It then becomes just a 
suggestion on our part verses a guarantee.


It should just be a matter of adding qemu_cpudefs_opts to 
qemu-config.[ch], taking a new command line that parses the argument via 
QemuOpts, then passing the parsed options to a target-specific function 
that then builds the table of supported cpus.
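
(For illustration only -- a rough sketch of that registration, assuming the
QemuOptsList/QemuOptDesc layout of qemu 0.12; the section name and field set
are lifted from the -cpu-def example above, and cpudef_register() is a made-up
target-specific hook, not an actual patch:)

QemuOptsList qemu_cpudefs_opts = {
    .name = "cpu-def",
    .head = QTAILQ_HEAD_INITIALIZER(qemu_cpudefs_opts.head),
    .desc = {
        { .name = "name",     .type = QEMU_OPT_STRING },
        { .name = "level",    .type = QEMU_OPT_NUMBER },
        { .name = "family",   .type = QEMU_OPT_NUMBER },
        { .name = "model",    .type = QEMU_OPT_NUMBER },
        { .name = "stepping", .type = QEMU_OPT_NUMBER },
        { .name = "features", .type = QEMU_OPT_STRING },
        { .name = "xlevel",   .type = QEMU_OPT_NUMBER },
        { .name = "model_id", .type = QEMU_OPT_STRING },
        { /* end of list */ }
    },
};

/* ... then, for each parsed [cpu-def] group, hand the QemuOpts to a
 * target-specific hook, e.g. cpudef_register(opts), which appends an
 * entry to the built-in x86 model table. */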


Regards,

Anthony Liguori


Re: [PATCH v3 04/12] Add "handle page fault" PV helper.

2010-01-21 Thread H. Peter Anvin
On 01/21/2010 01:02 AM, Avi Kivity wrote:
>>
>> You can also just emulate the state transition -- since you know
>> you're dealing with a flat protected-mode or long-mode OS (and just
>> make that a condition of enabling the feature) you don't have to deal
>> with all the strange combinations of directions that an unrestricted
>> x86 event can take.  Since it's an exception, it is unconditional.
> 
> Do you mean create the stack frame manually?  I'd really like to avoid
> that for many reasons, one of which is performance (need to do all the
> virt-to-phys walks manually), the other is that we're certain to end up
> with something horribly underspecified.  I'd really like to keep as
> close as possible to the hardware.  For the alternative approach, see Xen.
> 

I obviously didn't mean to do something which didn't look like a
hardware-delivered exception.  That by itself provides a tight spec.
The performance issue is real, of course.

Obviously, the design of VT-x was before my time at Intel, so I'm not
familiar with why the tradeoffs that were done they way they were.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



[PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove

2010-01-21 Thread Michael S. Tsirkin
This is a backport of commit: 03db343a6320f780937078433fa7d8da955e6fce
modified in a way that introduces some code duplication on the one hand,
but reduces the risk of regressing existing eventfd users on the other
hand.

KVM needs a way to atomically remove itself from the eventfd
->poll() wait queue head, in order to correctly handle its IRQfd
deassign operation.

This patch introduces such API, plus a way to read an eventfd from its
context.

Signed-off-by: Michael S. Tsirkin 
---

Avi, Davidel, how about only including the following part for -stable
then?  Reason is, I still would like to be able to use irqfd there, and
getting spurious interrupts 100% of times unmask is done isn't a very
good idea IMO ...


 fs/eventfd.c|   35 +++
 include/linux/eventfd.h |9 +
 2 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 8b47e42..ea9c18a 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -135,6 +135,41 @@ static unsigned int eventfd_poll(struct file *file, 
poll_table *wait)
return events;
 }
 
+static void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
+{
+   *cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
+   ctx->count -= *cnt;
+}
+
+/**
+ * eventfd_ctx_remove_wait_queue - Read the current counter and removes wait 
queue.
+ * @ctx: [in] Pointer to eventfd context.
+ * @wait: [in] Wait queue to be removed.
+ * @cnt: [out] Pointer to the 64bit conter value.
+ *
+ * Returns zero if successful, or the following error codes:
+ *
+ * -EAGAIN  : The operation would have blocked.
+ *
+ * This is used to atomically remove a wait queue entry from the eventfd wait
+ * queue head, and read/reset the counter value.
+ */
+int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_t *wait,
+ __u64 *cnt)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(&ctx->wqh.lock, flags);
+   eventfd_ctx_do_read(ctx, cnt);
+   __remove_wait_queue(&ctx->wqh, wait);
+   if (*cnt != 0 && waitqueue_active(&ctx->wqh))
+   wake_up_locked_poll(&ctx->wqh, POLLOUT);
+   spin_unlock_irqrestore(&ctx->wqh.lock, flags);
+
+   return *cnt != 0 ? 0 : -EAGAIN;
+}
+EXPORT_SYMBOL_GPL(eventfd_ctx_remove_wait_queue);
+
 static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count,
loff_t *ppos)
 {
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index 94dd103..85eac48 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -10,6 +10,7 @@
 
 #include 
 #include 
+#include 
 
 /*
  * CAREFUL: Check include/asm-generic/fcntl.h when defining
@@ -34,6 +35,8 @@ struct file *eventfd_fget(int fd);
 struct eventfd_ctx *eventfd_ctx_fdget(int fd);
 struct eventfd_ctx *eventfd_ctx_fileget(struct file *file);
 int eventfd_signal(struct eventfd_ctx *ctx, int n);
+int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_t *wait,
+ __u64 *cnt);
 
 #else /* CONFIG_EVENTFD */
 
@@ -61,6 +64,12 @@ static inline void eventfd_ctx_put(struct eventfd_ctx *ctx)
 
 }
 
+static inline int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx,
+   wait_queue_t *wait, __u64 *cnt)
+{
+   return -ENOSYS;
+}
+
 #endif
 
 #endif /* _LINUX_EVENTFD_H */
-- 
1.6.6.144.g5c3af
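
(For context, a hedged sketch -- not the kvm-side patch in this series -- of
how an irqfd deassign path might use the new helper; the function and
parameter names are illustrative:)

static void example_irqfd_deassign(struct eventfd_ctx *ctx, wait_queue_t *wait)
{
	__u64 cnt;

	/*
	 * Drop our wait queue entry and consume any pending count in one
	 * atomic step, so a racing eventfd_signal() is neither lost nor
	 * delivered after the deassign.  A return of -EAGAIN just means
	 * the counter was already zero.
	 */
	eventfd_ctx_remove_wait_queue(ctx, wait, &cnt);
}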


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-21 Thread john cooper
Anthony Liguori wrote:
> On 01/20/2010 07:18 PM, john cooper wrote: 
>> I can appreciate the concern of wanting to get this
>> as "correct" as possible.
>>
> 
> This is the root of the trouble.  At the qemu layer, we try to focus on
> being correct.
> 
> Management tools are typically the layer that deals with being "correct".
> 
> A good compromise is making things user tunable which means that a
> downstream can make "correctness" decisions without forcing those
> decisions on upstream.

Conceptually I agree with such a malleable approach -- actually
I prefer it.  I thought however it was too much infrastructure to
foist on the problem just to add a few more models into the mix.

The only reservation which comes to mind is that of logistics.
This may ruffle the code some and impact others such as Andre
who seem to have existing patches relative to the current structure.
Anyone have strong objections to this approach before I have a
look at an implementation?

Thanks,

-john


-- 
john.coo...@redhat.com


Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove

2010-01-21 Thread Davide Libenzi
On Thu, 21 Jan 2010, Michael S. Tsirkin wrote:

> This is a backport of commit: 03db343a6320f780937078433fa7d8da955e6fce
> modified in a way that introduces some code duplication on the one hand,
> but reduces the risk of regressing existing eventfd users on the other
> hand.
> 
> KVM needs a wait to atomically remove themselves from the eventfd
> ->poll() wait queue head, in order to handle correctly their IRQfd
> deassign operation.
> 
> This patch introduces such API, plus a way to read an eventfd from its
> context.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
> 
> Avi, Davidel, how about only including the following part for -stable
> then?  Reason is, I still would like to be able to use irqfd there, and
> getting spurious interrupts 100% of times unmask is done isn't a very
> good idea IMO ...

It's the same thing. Unless there are *real* problems in KVM due to the 
spurious ints, I still think this is .33 material.


- Davide




Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-21 Thread Blue Swirl
On Thu, Jan 21, 2010 at 2:39 PM, Andre Przywara  wrote:
> john cooper wrote:
>>
>> Chris Wright wrote:
>>>
>>> * Daniel P. Berrange (berra...@redhat.com) wrote:

 To be honest all possible naming schemes for '-cpu ' are just as
 unfriendly as each other. The only user friendly option is '-cpu host'.
 IMHO, we should just pick a concise naming scheme & document it. Given
 they are all equally unfriendly, the one that has consistency with
 vmware
 naming seems like a mild winner.
>>>
>>> Heh, I completely agree, and was just saying the same thing to John
>>> earlier today.  May as well be -cpu {foo,bar,baz} since the meaning for
>>> those command line options must be well-documented in the man page.
>>
>> I can appreciate the concern of wanting to get this
>> as "correct" as possible.  But ultimately we just
>> need three unique tags which ideally have some relation
>> to their associated architectures.  The diatribes
>> available from /proc/cpuinfo while generally accurate
>> don't really offer any more of a clue to the model
>> group, and in their unmodified form are rather unwieldy
>> as command line flags.
>
> I agree. I'd underline that this patch is for migration purposes only, so
> you don't want to specify an exact CPU, but more like a class of CPUs. If
> you look into the available CPUID features in each CPU, you will find that
> there are only a few groups, with currently three for each vendor being a
> good guess.
> /proc/cpuinfo just prints out marketing names, which have only a mild
> relationship to a feature-related technical CPU model. Maybe we can use a
> generation approach like the AMD Opteron ones for Intel, too.
> These G1/G2/G3 names are just arbitrary and have no roots within AMD.
>
> I think that an exact CPU model specification is out of scope for this patch
> and maybe even for QEMU. One could create a database with CPU names and
> associated CPUID flags and provide an external tool to generate a QEMU
> command line out of this. Keeping this database up-to-date (especially for
> desktop CPU models) is a burden that the QEMU project does not want to bear.
>
>>
>>> This is from an EVC kb article[1]:
>>
>> Here is a pointer to a more detailed version:
>>
>>
>> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003212
>>
>>
>> We probably should also add an option to dump out the
>> full set of qemu-side cpuid flags for the benefit of
>> users and upper level tools.
>
> You mean like this one?
> http://lists.gnu.org/archive/html/qemu-devel/2009-09/msg01228.html
> Resending this patch set is on my plan for next week. What is the state of
> this patch? Will it go in soon? Then I'd rebase my patch set on top of it.

FYI, a similar CPU flag mechanism has been implemented for Sparc and
x86, unifying these would be cool.


Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove

2010-01-21 Thread Avi Kivity

On 01/21/2010 06:58 PM, Davide Libenzi wrote:

On Thu, 21 Jan 2010, Michael S. Tsirkin wrote:

   

This is a backport of commit: 03db343a6320f780937078433fa7d8da955e6fce
modified in a way that introduces some code duplication on the one hand,
but reduces the risk of regressing existing eventfd users on the other
hand.

KVM needs a wait to atomically remove themselves from the eventfd
->poll() wait queue head, in order to handle correctly their IRQfd
deassign operation.

This patch introduces such API, plus a way to read an eventfd from its
context.

Signed-off-by: Michael S. Tsirkin
---

Avi, Davidel, how about only including the following part for -stable
then?  Reason is, I still would like to be able to use irqfd there, and
getting spurious interrupts 100% of times unmask is done isn't a very
good idea IMO ...
 

It's the same thing. Unless there are *real* problems in KVM due to the
spurious ints, I still think this is .33 material.
   


I agree.

But I think we can solve this in another way in .32: we can clear the 
eventfd from irqfd->inject work, which is in process context.  The new 
stuff is only needed for lockless clearing, no?


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove

2010-01-21 Thread Avi Kivity

On 01/21/2010 07:13 PM, Avi Kivity wrote:


But I think we can solve this in another way in .32: we can clear the 
eventfd from irqfd->inject work, which is in process context.  The new 
stuff is only needed for lockless clearing, no?




I meant atomic clearing, when we inject interrupts from the irqfd atomic 
context.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove

2010-01-21 Thread Michael S. Tsirkin
On Thu, Jan 21, 2010 at 07:13:13PM +0200, Avi Kivity wrote:
> On 01/21/2010 06:58 PM, Davide Libenzi wrote:
>> On Thu, 21 Jan 2010, Michael S. Tsirkin wrote:
>>
>>
>>> This is a backport of commit: 03db343a6320f780937078433fa7d8da955e6fce
>>> modified in a way that introduces some code duplication on the one hand,
>>> but reduces the risk of regressing existing eventfd users on the other
>>> hand.
>>>
>>> KVM needs a wait to atomically remove themselves from the eventfd
>>> ->poll() wait queue head, in order to handle correctly their IRQfd
>>> deassign operation.
>>>
>>> This patch introduces such API, plus a way to read an eventfd from its
>>> context.
>>>
>>> Signed-off-by: Michael S. Tsirkin
>>> ---
>>>
>>> Avi, Davidel, how about only including the following part for -stable
>>> then?  Reason is, I still would like to be able to use irqfd there, and
>>> getting spurious interrupts 100% of times unmask is done isn't a very
>>> good idea IMO ...
>>>  
>> It's the same thing. Unless there are *real* problems in KVM due to the
>> spurious ints, I still think this is .33 material.
>>
>
> I agree.
>
> But I think we can solve this in another way in .32: we can clear the  
> eventfd from irqfd->inject work, which is in process context.  The new  
> stuff is only needed for lockless clearing, no?

No, AFAIK there's no way to clear the counter from kernel without
this patch.

> -- 
> Do not meddle in the internals of kernels, for they are subtle and quick to 
> panic.


Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove

2010-01-21 Thread Avi Kivity

On 01/21/2010 07:23 PM, Michael S. Tsirkin wrote:



I agree.

But I think we can solve this in another way in .32: we can clear the
eventfd from irqfd->inject work, which is in process context.  The new
stuff is only needed for lockless clearing, no?
 

No, AFAIK there's no way to clear the counter from kernel without
this patch.
   


Can't you read from the file?

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



repeatable hang with loop mount and heavy IO in guest

2010-01-21 Thread Antoine Martin
I've tried various guests, including most recent Fedora12 kernels, 
custom 2.6.32.x
All of them hang around the same point (~1GB written) when I do heavy IO 
write inside the guest.
I have waited 30 minutes to see if the guest would recover, but it just 
sits there, not writing back any data, not doing anything - but 
certainly not allowing any new IO writes. The host has some load on it, 
but nothing heavy enough to completely hang a guest for that long.


mount -o loop some_image.fs ./somewhere bs=512
dd if=/dev/zero of=/somewhere/zero
then after ~1GB: sync

Host is running: 2.6.31.4
QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88)

Guests are booted with "elevator=noop" as the filesystems are stored as 
files, accessed as virtio disks.



The "hung" backtraces always look similar to these:
[  361.460136] INFO: task loop0:2097 blocked for more than 120 seconds.
[  361.460139] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  361.460142] loop0 D 88000b92c848 0  2097  2 
0x0080
[  361.460148]  88000b92c5d0 0046 880008c1f810 
880009829fd8
[  361.460153]  880009829fd8 880009829fd8 88000a21ee80 
88000b92c5d0
[  361.460157]  880009829610 8181b768 880001af33b0 
0002

[  361.460161] Call Trace:
[  361.460216]  [] ? sync_page+0x0/0x43
[  361.460253]  [] ? io_schedule+0x2c/0x43
[  361.460257]  [] ? sync_page+0x3e/0x43
[  361.460261]  [] ? __wait_on_bit+0x41/0x71
[  361.460264]  [] ? wait_on_page_bit+0x6a/0x70
[  361.460283]  [] ? wake_bit_function+0x0/0x23
[  361.460287]  [] ? shrink_page_list+0x3e5/0x61e
[  361.460291]  [] ? schedule_timeout+0xa3/0xbe
[  361.460305]  [] ? autoremove_wake_function+0x0/0x2e
[  361.460308]  [] ? shrink_zone+0x7e1/0xaf6
[  361.460310]  [] ? determine_dirtyable_memory+0xd/0x17
[  361.460314]  [] ? isolate_pages_global+0xa3/0x216
[  361.460316]  [] ? mark_page_accessed+0x2a/0x39
[  361.460335]  [] ? __find_get_block+0x13b/0x15c
[  361.460337]  [] ? try_to_free_pages+0x1ab/0x2c9
[  361.460340]  [] ? isolate_pages_global+0x0/0x216
[  361.460343]  [] ? __alloc_pages_nodemask+0x394/0x564
[  361.460350]  [] ? __slab_alloc+0x137/0x44f
[  361.460371]  [] ? radix_tree_preload+0x1f/0x6a
[  361.460374]  [] ? kmem_cache_alloc+0x5d/0x88
[  361.460376]  [] ? radix_tree_preload+0x1f/0x6a
[  361.460379]  [] ? add_to_page_cache_locked+0x1d/0xf1
[  361.460381]  [] ? add_to_page_cache_lru+0x27/0x57
[  361.460384]  [] ? grab_cache_page_write_begin+0x7a/0xa0
[  361.460399]  [] ? ext3_write_begin+0x7e/0x201
[  361.460417]  [] ? do_lo_send_aops+0xa1/0x174
[  361.460420]  [] ? virt_to_head_page+0x9/0x2a
[  361.460422]  [] ? loop_thread+0x309/0x48a
[  361.460425]  [] ? do_lo_send_aops+0x0/0x174
[  361.460427]  [] ? autoremove_wake_function+0x0/0x2e
[  361.460430]  [] ? loop_thread+0x0/0x48a
[  361.460432]  [] ? kthread+0x78/0x80
[  361.460441]  [] ? finish_task_switch+0x2b/0x78
[  361.460454]  [] ? child_rip+0xa/0x20
[  361.460460]  [] ? native_pax_close_kernel+0x0/0x32
[  361.460463]  [] ? kthread+0x0/0x80
[  361.460469]  [] ? child_rip+0x0/0x20
[  361.460471] INFO: task kjournald:2098 blocked for more than 120 seconds.
[  361.460473] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  361.460474] kjournald D 88000b92e558 0  2098  2 
0x0080
[  361.460477]  88000b92e2e0 0046 88000aad9840 
88000983ffd8
[  361.460480]  88000983ffd8 88000983ffd8 81808e00 
88000b92e2e0
[  361.460483]  88000983fcf0 8181b768 880001af3c40 
0002

[  361.460486] Call Trace:
[  361.460488]  [] ? sync_buffer+0x0/0x3c
[  361.460491]  [] ? io_schedule+0x2c/0x43
[  361.460494]  [] ? sync_buffer+0x38/0x3c
[  361.460496]  [] ? __wait_on_bit+0x41/0x71
[  361.460499]  [] ? sync_buffer+0x0/0x3c
[  361.460501]  [] ? out_of_line_wait_on_bit+0x6a/0x76
[  361.460504]  [] ? wake_bit_function+0x0/0x23
[  361.460514]  [] ? 
journal_commit_transaction+0x769/0xbb8

[  361.460517]  [] ? finish_task_switch+0x2b/0x78
[  361.460519]  [] ? thread_return+0x40/0x79
[  361.460522]  [] ? kjournald+0xc7/0x1cb
[  361.460525]  [] ? autoremove_wake_function+0x0/0x2e
[  361.460527]  [] ? kjournald+0x0/0x1cb
[  361.460530]  [] ? kthread+0x78/0x80
[  361.460532]  [] ? finish_task_switch+0x2b/0x78
[  361.460534]  [] ? child_rip+0xa/0x20
[  361.460537]  [] ? native_pax_close_kernel+0x0/0x32
[  361.460540]  [] ? kthread+0x0/0x80
[  361.460542]  [] ? child_rip+0x0/0x20
[  361.460544] INFO: task dd:2132 blocked for more than 120 seconds.
[  361.460546] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  361.460547] ddD 88000a21f0f8 0  2132   2090 
0x0080
[  361.460550]  88000a21ee80 0082 88000a21ee80 
88000b3affd8
[  361.460553]  88000b3affd8 88000b3affd8 81808e00 
880001af3510
[  361.460556]  88000b78eaf0 88000b3daa00 880008de6c40 

Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove

2010-01-21 Thread Michael S. Tsirkin
On Thu, Jan 21, 2010 at 07:33:02PM +0200, Avi Kivity wrote:
> On 01/21/2010 07:23 PM, Michael S. Tsirkin wrote:
>>
>>> I agree.
>>>
>>> But I think we can solve this in another way in .32: we can clear the
>>> eventfd from irqfd->inject work, which is in process context.  The new
>>> stuff is only needed for lockless clearing, no?
>>>  
>> No, AFAIK there's no way to clear the counter from kernel without
>> this patch.
>>
>
> Can't you read from the file?

IMO no, the read could block.

> -- 
> Do not meddle in the internals of kernels, for they are subtle and quick to 
> panic.


Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove

2010-01-21 Thread Avi Kivity

On 01/21/2010 07:32 PM, Michael S. Tsirkin wrote:



Can't you read from the file?
 

IMO no, the read could block.
   


But you're in process context.  An eventfd never blocks.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove

2010-01-21 Thread Michael S. Tsirkin
On Thu, Jan 21, 2010 at 07:47:40PM +0200, Avi Kivity wrote:
> On 01/21/2010 07:32 PM, Michael S. Tsirkin wrote:
>>
>>> Can't you read from the file?
>>>  
>> IMO no, the read could block.
>>
>
> But you're in process context.  An eventfd never blocks.

Yes it blocks if counter is 0. And we don't know
it's not 0 unless we read :) catch-22.
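
(For reference, the semantics in question, shown from userspace -- a tiny
illustration, not kvm code: read() on an eventfd whose counter is 0 blocks
unless the descriptor was created with EFD_NONBLOCK, a flag the in-kernel
consumer cannot assume was set.)

#include <sys/eventfd.h>
#include <stdint.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
	uint64_t cnt;
	int fd = eventfd(0, EFD_NONBLOCK);	/* counter starts at 0 */

	if (read(fd, &cnt, sizeof(cnt)) < 0)
		perror("read");			/* EAGAIN: counter is 0 */

	/* Without EFD_NONBLOCK the same read() would sleep until someone
	 * write()s a non-zero value -- which is exactly the catch-22 for
	 * a kernel caller that only wants to drain the counter. */

	close(fd);
	return 0;
}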

> -- 
> Do not meddle in the internals of kernels, for they are subtle and quick to 
> panic.


Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove

2010-01-21 Thread Davide Libenzi
On Thu, 21 Jan 2010, Avi Kivity wrote:

> On 01/21/2010 07:32 PM, Michael S. Tsirkin wrote:
> > 
> > > Can't you read from the file?
> > >  
> > IMO no, the read could block.
> >
> 
> But you're in process context.  An eventfd never blocks.

Can you control the eventfd flags? Because if yes, O_NONBLOCK will never 
block.



- Davide




Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-21 Thread Jamie Lokier
john cooper wrote:
> kvm itself can modify flags exported from qemu to a guest.

I would hope for an option to request that qemu doesn't run if the
guest won't get the cpuid flags requested on the command line.

-- Jamie


Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove

2010-01-21 Thread Michael S. Tsirkin
On Thu, Jan 21, 2010 at 09:50:34AM -0800, Davide Libenzi wrote:
> On Thu, 21 Jan 2010, Avi Kivity wrote:
> 
> > On 01/21/2010 07:32 PM, Michael S. Tsirkin wrote:
> > > 
> > > > Can't you read from the file?
> > > >  
> > > IMO no, the read could block.
> > >
> > 
> > But you're in process context.  An eventfd never blocks.
> 
> Can you control the eventfd flags? Because if yes, O_NONBLOCK will never 
> block.
> 

Userspace can but kvm can't.

> 
> - Davide
> 


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-21 Thread Jamie Lokier
john cooper wrote:
> > I foresee wanting to iterate over the models and pick the latest one
> > which a host supports - on the grounds that you have done the hard
> > work of ensuring it is a reasonably good performer, while "probably"
> > working on another host of similar capability when a new host is made
> > available.
> 
> That's a fairly close use case to that of safe migration
> which was one of the primary motivations to identify
> the models being discussed.  Although presentation and
> administration of such was considered the domain of management
> tools.

My hypothetical script which iterates over models in that way is a
"management tool", and would use qemu to help do its job.

Do you mean that more powerful management tools to support safe
migration will maintain _their own_ processor model tables, and
perform their calculations using their own tables instead of querying
qemu, and therefore not have any need of qemu's built in table?

If so, I favour more strongly Anthony's suggestion that the processor
model table lives in a config file (eventually), as that file could be
shared between management tools and qemu itself without duplication.

-- Jamie


Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove

2010-01-21 Thread Avi Kivity

On 01/21/2010 07:45 PM, Michael S. Tsirkin wrote:



But you're in process context.  An eventfd never blocks.
 

Yes it blocks if counter is 0. And we don't know
it's not 0 unless we read :) catch-22.
   


Ah yes, I forgot.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove

2010-01-21 Thread Avi Kivity

On 01/21/2010 07:56 PM, Avi Kivity wrote:

On 01/21/2010 07:45 PM, Michael S. Tsirkin wrote:



But you're in process context.  An eventfd never blocks.

Yes it blocks if counter is 0. And we don't know
it's not 0 unless we read :) catch-22.


Ah yes, I forgot.



Well, you can poll it and then read it... this introduces a new race (if 
userspace does a read in parallel) but it's limited to kvm and buggy 
userspace.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH] Use compile_prog as rest of configure

2010-01-21 Thread Marcelo Tosatti
On Wed, Jan 20, 2010 at 12:46:28PM +0100, Juan Quintela wrote:
> This substitution got missed somehow
> 
> Signed-off-by: Juan Quintela 

Applied, thanks.



Re: [PATCH] KVM: Fix kvm_coalesced_mmio_ring duplicate allocation

2010-01-21 Thread Marcelo Tosatti
On Thu, Jan 21, 2010 at 04:20:04PM +0800, Sheng Yang wrote:
> The commit 0953ca73 "KVM: Simplify coalesced mmio initialization"
> allocate kvm_coalesced_mmio_ring in the kvm_coalesced_mmio_init(), but
> didn't discard the original allocation...
> 
> Signed-off-by: Sheng Yang 

Applied, thanks.



Re: [PATCH 1/2] qemu-kvm: Use kvm-kmod headers if available

2010-01-21 Thread Marcelo Tosatti
On Tue, Jan 12, 2010 at 10:21:27PM +0100, Jan Kiszka wrote:
> Since kvm-kmod-2.6.32.2 we have an alternative source for recent KVM
> kernel headers. Use it when available and not overruled by --kerneldir.
> If there is no kvm-kmod and no --kerneldir, we continue to fall back to
> the qemu-kvm's kernel headers.

Applied both, thanks.



Re: [PATCHv2] kvm-s390: fix potential array overrun in intercept handling

2010-01-21 Thread Marcelo Tosatti
On Thu, Jan 21, 2010 at 12:19:07PM +0100, Christian Borntraeger wrote:
> v2: apply Avis suggestions about ARRAY_SIZE.
> 
> kvm_handle_sie_intercept uses a jump table to get the intercept handler
> for a SIE intercept. Static code analysis revealed a potential problem:
> the intercept_funcs jump table was defined to contain (0x48 >> 2) entries,
> but we only checked for code > 0x48 which would cause an off-by-one
> array overflow if code == 0x48.
> 
> Use the compiler and ARRAY_SIZE to automatically set the limits.
> 
> Signed-off-by: Christian Borntraeger 

Applied and queued for .33, CC: stable, thanks.



Re: [RFC] [PATCH] Use macros for x86_emulate_ops to avoid future mistakes

2010-01-21 Thread Marcelo Tosatti
On Wed, Jan 20, 2010 at 04:47:21PM +0900, Takuya Yoshikawa wrote:
> The return values from x86_emulate_ops are defined
> in kvm_emulate.h as macros X86EMUL_*.
> 
> But in emulate.c, we are comparing the return values
> from these ops with 0 to check if they're X86EMUL_CONTINUE
> or not: X86EMUL_CONTINUE is defined as 0 now.
> 
> To avoid possible mistakes in the future, this patch
> substitutes "X86EMUL_CONTINUE" for "0" that are being
> compared with the return values from x86_emulate_ops.
> 
>   We think that there are more places we should use these
>   macros, but the meanings of rc values in x86_emulate_insn()
>   were not so clear at a glance. If we use proper macros in
>   this function, we would be able to follow the flow of each
>   emulation more easily and, maybe, more securely.
> 
> Signed-off-by: Takuya Yoshikawa 

Applied, thanks.



Re: [KVM PATCH] pci passthrough: zap option rom scanning.

2010-01-21 Thread Marcelo Tosatti
On Wed, Jan 20, 2010 at 11:58:48AM +0100, Gerd Hoffmann wrote:
> Nowdays (qemu 0.12) seabios loads option roms from pci rom bars.  So
> there is no need any more to scan for option roms and have qemu load
> them.  Zap the code.
> 
> Signed-off-by: Gerd Hoffmann 

Applied, thanks.



Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove

2010-01-21 Thread Michael S. Tsirkin
On Thu, Jan 21, 2010 at 07:57:22PM +0200, Avi Kivity wrote:
> On 01/21/2010 07:56 PM, Avi Kivity wrote:
>> On 01/21/2010 07:45 PM, Michael S. Tsirkin wrote:
>>>
 But you're in process context.  An eventfd never blocks.
>>> Yes it blocks if counter is 0. And we don't know
>>> it's not 0 unless we read :) catch-22.
>>
>> Ah yes, I forgot.
>>
>
> Well, you can poll it and then read it... this introduces a new race (if  
> userspace does a read in parallel) but it's limited to kvm and buggy  
> userspace.

I would rather not require that userspace never reads this fd.
You are right that it does not now, but adding this as a requirement
looks like exporting an implementation bug to userspace.

-- 
MST


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-21 Thread Jamie Lokier
john cooper wrote:
> I can appreciate the argument above, however the goal was
> choosing names with some basis in reality.  These were
> recommended by our contacts within Intel, are used by VmWare
> to describe their similar cpu models, and arguably have fallen
> to defacto usage as evidenced by such sources as:
> 
> http://en.wikipedia.org/wiki/Conroe_(microprocessor)
> http://en.wikipedia.org/wiki/Penryn_(microprocessor)
> http://en.wikipedia.org/wiki/Nehalem_(microarchitecture)

(Aside: I can confirm they haven't fallen into de facto usage anywhere
in my vicinity :-) I wonder if the contacts within Intel are living in
a bit of a bubble where these names are more familiar than in the
outside world.)

I think we can all agree that there is no point looking for a familiar
-cpu naming scheme because there aren't any familiar and meaningful names
these days.

> used by VmWare to describe their similar cpu models

If the same names are being used, I see some merit in qemu's list
matching VMware's cpu models *exactly* (in capabilities, not id
strings), to aid migration from VMware.  Is that feasible?  Do they
match already?

> I suspect whatever we choose of reasonable length as a model
> tag for "-cpu" some further detail is going to be required.
> That was the motivation to augment the table as above with
> an instance of a LCD for that associated class.
>  
> > I'm not a typical user: I know quite a lot about x86 architecture;
> > I just haven't kept up to date enough to know the code/model names.
> > Typical users will know less about them.
> 
> Understood.


> One thought I had to further clarify what is going on under the hood
> was to dump the cpuid flags for each model as part of (or in
> addition to) the above table.  But this seems a bit extreme and kvm
> itself can modify flags exported from qemu to a guest.

Here's another idea.

It would be nice if qemu could tell the user which of the built-in
-cpu choices is the most featureful subset of their own host.  With
-cpu host implemented, finding that is probably quite easy.
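
(Roughly, that check boils down to a feature-subset test over the cpuid
words -- a deliberately naive sketch, not qemu code, collapsing the feature
words into one for brevity; as the follow-up notes, the real ordering may
not be quite this simple:)

#include <stdint.h>

struct model_example {			/* illustrative, not qemu's own table */
	const char *name;
	uint32_t features;		/* simplified to a single cpuid word */
};

static const struct model_example *
most_featureful_subset(const struct model_example *m, int n, uint32_t host)
{
	const struct model_example *best = NULL;
	int i;

	for (i = 0; i < n; i++) {
		if (m[i].features & ~host)
			continue;	/* model needs a bit the host lacks */
		if (!best || __builtin_popcount(m[i].features) >
			     __builtin_popcount(best->features))
			best = &m[i];
	}
	return best;
}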

Users with multiple hosts will get a better feel for what the -cpu
names mean that way, probably better than any documentation would give
them, because they probably have not much idea what CPU families they
have anyway.  (cat /proc/cpuinfo doesn't clarify, as I found).

And it would give a simple, effective, quick indication of what they
must choose if they want an VM image that runs on more than one of
their hosts without a management tool.

-- Jamie


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-21 Thread john cooper
Jamie Lokier wrote:

> Do you mean that more powerful management tools to support safe
> migration will maintain _their own_ processor model tables, and
> perform their calculations using their own tables instead of querying
> qemu, and therefore not have any need of qemu's built in table?

I would expect so.  IIRC that is what the libvirt folks have
in mind for example.  But we're also trying to simplify the use
case of the lonesome user at one with the qemu CLI.

-john

-- 
john.coo...@redhat.com


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-21 Thread john cooper
Jamie Lokier wrote:

> I think we can all agree that there is no point looking for a familiar
> -cpu naming scheme because there aren't any familiar and meaningful names
> these days.

Even if we dismiss the Intel coined names as internal
code names, there is still VMW's use of them in this
space which we can either align with or attempt to
displace.   All considered I don't see any motivation
nor gain in doing the latter.  Anyway it doesn't appear
likely we're going to resolve this to our collective
satisfaction with a hard-wired naming scheme.   
 
> It would be nice if qemu could tell the user which of the built-in
> -cpu choices is the most featureful subset of their own host.  With
> -cpu host implemented, finding that is probably quite easy.

This should be doable although it may not be as simple
as traversing a hierarchy of features and picking one
with the most host flags present.  In any case this
should be fairly detachable from settling the immediate
issue.

-john

-- 
john.coo...@redhat.com


KVM Virtual CPU time profiling

2010-01-21 Thread Saksena, Abhishek
Hi All,
Is there a way in KVM to measure the real physical (CPU) time consumed by each 
running Virtual CPU?  (I want to do time profiling of the virtual machines 
running on the host system.)


Also, is there an explanation somewhere on how Virtual CPU scheduling is 
achieved in KVM?
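
(For what it's worth: KVM itself has no vcpu scheduler -- with qemu-kvm each
virtual CPU runs as an ordinary host thread and is scheduled by the normal
Linux scheduler.  So per-vcpu CPU time is just per-thread CPU time, readable
from /proc/<pid>/task/<tid>/stat.  A minimal sketch; the pid/tid arguments
are whatever something like "ps -eLf | grep qemu" reports:)

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	char path[128], line[1024], *p;
	unsigned long utime, stime;
	FILE *f;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <qemu-pid> <vcpu-tid>\n", argv[0]);
		return 1;
	}
	snprintf(path, sizeof(path), "/proc/%s/task/%s/stat", argv[1], argv[2]);
	f = fopen(path, "r");
	if (!f || !fgets(line, sizeof(line), f)) {
		perror(path);
		return 1;
	}
	/* skip "pid (comm)"; fields 14/15 overall are utime/stime, both in
	 * clock ticks (divide by sysconf(_SC_CLK_TCK) for seconds) */
	p = strrchr(line, ')');
	if (p && sscanf(p + 2,
		   "%*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
		   &utime, &stime) == 2)
		printf("utime=%lu stime=%lu (clock ticks)\n", utime, stime);
	fclose(f);
	return 0;
}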
Thanks
Abhishek


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-21 Thread Anthony Liguori

On 01/21/2010 10:43 AM, john cooper wrote:

Anthony Liguori wrote:
   

On 01/20/2010 07:18 PM, john cooper wrote:
 

I can appreciate the concern of wanting to get this
as "correct" as possible.

   

This is the root of the trouble.  At the qemu layer, we try to focus on
being correct.

Management tools are typically the layer that deals with being "correct".

A good compromise is making things user tunable which means that a
downstream can make "correctness" decisions without forcing those
decisions on upstream.
 

Conceptually I agree with such a malleable approach -- actually
I prefer it.  I thought however it was too much infrastructure to
foist on the problem just to add a few more models into the mix.
   


See list for patches.  I didn't do the cpu bits but it should be very 
obvious how to do that now.


Regards,

Anthony Liguori


The only reservation which comes to mind is that of logistics.
This may ruffle the code some and impact others such as Andre
who seem to have existing patches relative to the current structure.
Anyone have strong objections to this approach before I have a
look at an implementation?

Thanks,

-john


   




Re: repeatable hang with loop mount and heavy IO in guest

2010-01-21 Thread RW
Some months ago I also thought elevator=noop should be a good idea.
But it isn't. It works well as long as you only do short IO requests.
Try using deadline in host and guest.

Robert


On 01/21/10 18:26, Antoine Martin wrote:
> I've tried various guests, including most recent Fedora12 kernels,
> custom 2.6.32.x
> All of them hang around the same point (~1GB written) when I do heavy IO
> write inside the guest.
> I have waited 30 minutes to see if the guest would recover, but it just
> sits there, not writing back any data, not doing anything - but
> certainly not allowing any new IO writes. The host has some load on it,
> but nothing heavy enough to completely hand a guest for that long.
> 
> mount -o loop some_image.fs ./somewhere bs=512
> dd if=/dev/zero of=/somewhere/zero
> then after ~1GB: sync
> 
> Host is running: 2.6.31.4
> QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88)
> 
> Guests are booted with "elevator=noop" as the filesystems are stored as
> files, accessed as virtio disks.
> 
> 
> The "hung" backtraces always look similar to these:
> [  361.460136] INFO: task loop0:2097 blocked for more than 120 seconds.
> [  361.460139] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  361.460142] loop0 D 88000b92c848 0  2097  2
> 0x0080
> [  361.460148]  88000b92c5d0 0046 880008c1f810
> 880009829fd8
> [  361.460153]  880009829fd8 880009829fd8 88000a21ee80
> 88000b92c5d0
> [  361.460157]  880009829610 8181b768 880001af33b0
> 0002
> [  361.460161] Call Trace:
> [  361.460216]  [] ? sync_page+0x0/0x43
> [  361.460253]  [] ? io_schedule+0x2c/0x43
> [  361.460257]  [] ? sync_page+0x3e/0x43
> [  361.460261]  [] ? __wait_on_bit+0x41/0x71
> [  361.460264]  [] ? wait_on_page_bit+0x6a/0x70
> [  361.460283]  [] ? wake_bit_function+0x0/0x23
> [  361.460287]  [] ? shrink_page_list+0x3e5/0x61e
> [  361.460291]  [] ? schedule_timeout+0xa3/0xbe
> [  361.460305]  [] ? autoremove_wake_function+0x0/0x2e
> [  361.460308]  [] ? shrink_zone+0x7e1/0xaf6
> [  361.460310]  [] ? determine_dirtyable_memory+0xd/0x17
> [  361.460314]  [] ? isolate_pages_global+0xa3/0x216
> [  361.460316]  [] ? mark_page_accessed+0x2a/0x39
> [  361.460335]  [] ? __find_get_block+0x13b/0x15c
> [  361.460337]  [] ? try_to_free_pages+0x1ab/0x2c9
> [  361.460340]  [] ? isolate_pages_global+0x0/0x216
> [  361.460343]  [] ? __alloc_pages_nodemask+0x394/0x564
> [  361.460350]  [] ? __slab_alloc+0x137/0x44f
> [  361.460371]  [] ? radix_tree_preload+0x1f/0x6a
> [  361.460374]  [] ? kmem_cache_alloc+0x5d/0x88
> [  361.460376]  [] ? radix_tree_preload+0x1f/0x6a
> [  361.460379]  [] ? add_to_page_cache_locked+0x1d/0xf1
> [  361.460381]  [] ? add_to_page_cache_lru+0x27/0x57
> [  361.460384]  [] ?
> grab_cache_page_write_begin+0x7a/0xa0
> [  361.460399]  [] ? ext3_write_begin+0x7e/0x201
> [  361.460417]  [] ? do_lo_send_aops+0xa1/0x174
> [  361.460420]  [] ? virt_to_head_page+0x9/0x2a
> [  361.460422]  [] ? loop_thread+0x309/0x48a
> [  361.460425]  [] ? do_lo_send_aops+0x0/0x174
> [  361.460427]  [] ? autoremove_wake_function+0x0/0x2e
> [  361.460430]  [] ? loop_thread+0x0/0x48a
> [  361.460432]  [] ? kthread+0x78/0x80
> [  361.460441]  [] ? finish_task_switch+0x2b/0x78
> [  361.460454]  [] ? child_rip+0xa/0x20
> [  361.460460]  [] ? native_pax_close_kernel+0x0/0x32
> [  361.460463]  [] ? kthread+0x0/0x80
> [  361.460469]  [] ? child_rip+0x0/0x20
> [  361.460471] INFO: task kjournald:2098 blocked for more than 120 seconds.
> [  361.460473] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  361.460474] kjournald D 88000b92e558 0  2098  2
> 0x0080
> [  361.460477]  88000b92e2e0 0046 88000aad9840
> 88000983ffd8
> [  361.460480]  88000983ffd8 88000983ffd8 81808e00
> 88000b92e2e0
> [  361.460483]  88000983fcf0 8181b768 880001af3c40
> 0002
> [  361.460486] Call Trace:
> [  361.460488]  [] ? sync_buffer+0x0/0x3c
> [  361.460491]  [] ? io_schedule+0x2c/0x43
> [  361.460494]  [] ? sync_buffer+0x38/0x3c
> [  361.460496]  [] ? __wait_on_bit+0x41/0x71
> [  361.460499]  [] ? sync_buffer+0x0/0x3c
> [  361.460501]  [] ? out_of_line_wait_on_bit+0x6a/0x76
> [  361.460504]  [] ? wake_bit_function+0x0/0x23
> [  361.460514]  [] ?
> journal_commit_transaction+0x769/0xbb8
> [  361.460517]  [] ? finish_task_switch+0x2b/0x78
> [  361.460519]  [] ? thread_return+0x40/0x79
> [  361.460522]  [] ? kjournald+0xc7/0x1cb
> [  361.460525]  [] ? autoremove_wake_function+0x0/0x2e
> [  361.460527]  [] ? kjournald+0x0/0x1cb
> [  361.460530]  [] ? kthread+0x78/0x80
> [  361.460532]  [] ? finish_task_switch+0x2b/0x78
> [  361.460534]  [] ? child_rip+0xa/0x20
> [  361.460537]  [] ? native_pax_close_kernel+0x0/0x32
> [  361.460540]  [] ? kthread+0x0/0x80
> [  361.460542]  [] ? child_rip+0x0/0x20
> [  361.460544] INFO: task dd:2132 blocked for more than 120 se

Re: repeatable hang with loop mount and heavy IO in guest

2010-01-21 Thread Thomas Beinicke
On Thursday 21 January 2010 21:08:38 RW wrote:
> Some months ago I also thought elevator=noop should be a good idea.
> But it isn't. It works good as long as you only do short IO requests.
> Try using deadline in host and guest.
> 
> Robert

@Robert: I've been using noop on all of my KVMs and didn't have any problems 
so far, and never had a crash either.
Do you have any performance data or comparisons between noop and deadline io 
schedulers?

Cheers,

Thomas


Re: repeatable hang with loop mount and heavy IO in guest

2010-01-21 Thread RW
No, sorry, I don't have any performance data with noop. I haven't
even had a crash. BUT I've experienced severe I/O degradation
with noop. Once I wrote a big chunk of data (e.g. a simple
rsync -av /usr /opt) with noop, it worked for a while and
after a few seconds I saw heavy writes which made the
VM virtually unusable. As far as I remember it was kjournald
which caused the writes.

I've written a mail to the list some months ago with some benchmarks:
http://article.gmane.org/gmane.comp.emulators.kvm.devel/41112/match=benchmark
There're some I/O benchmarks in there. You can't get the graphs
currently since tauceti.net is offline until monday. I haven't
tested noop in these benchmarks because of the problems
mentioned above. But it compares deadline and cfq a little bit
on a HP DL 380 G6 server.

Robert

On 01/21/10 22:08, Thomas Beinicke wrote:
> On Thursday 21 January 2010 21:08:38 RW wrote:
>> Some months ago I also thought elevator=noop should be a good idea.
>> But it isn't. It works good as long as you only do short IO requests.
>> Try using deadline in host and guest.
>>
>> Robert
> 
> @Robert: I've been using noop on all of my KVMs and didn't have any problems 
> so far, never had any crash too.
> Do you have any performance data or comparisons between noop and deadline io 
> schedulers?
> 
> Cheers,
> 
> Thomas


Re: [PATCHv2] kvm-s390: fix potential array overrun in intercept handling

2010-01-21 Thread Alexander Graf

On 21.01.2010, at 18:36, Marcelo Tosatti wrote:

> On Thu, Jan 21, 2010 at 12:19:07PM +0100, Christian Borntraeger wrote:
>> v2: apply Avis suggestions about ARRAY_SIZE.
>> 
>> kvm_handle_sie_intercept uses a jump table to get the intercept handler
>> for a SIE intercept. Static code analysis revealed a potential problem:
>> the intercept_funcs jump table was defined to contain (0x48 >> 2) entries,
>> but we only checked for code > 0x48 which would cause an off-by-one
>> array overflow if code == 0x48.
>> 
>> Use the compiler and ARRAY_SIZE to automatically set the limits.
>> 
>> Signed-off-by: Christian Borntraeger 
> 
> Applied and queued for .33, CC: stable, thanks.

Yes. Christian, please get this into 2.6.32-stable.

Alex


RE: Some keys don't repeat in 64 bit Widows 7 kvm guest

2010-01-21 Thread Jimmy Crossley
I am now running qemu-kvm 0.11.1:

$ kvm -h | head -1
QEMU PC emulator version 0.11.1 (qemu-kvm-0.11.1), Copyright (c) 2003-2008 
Fabrice Bellard

My Windows 7 guest detected a lot of new hardware, but I still have the same 
key repeating problem.  I think I will just leave this alone for now since I am 
going to be away from my office (and this machine) for several weeks.   When I 
return, I plan on doing a clean install of everything.  If I still have this 
issue, I will report back.

Thanks to everyone for your help.

> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> Behalf Of Jimmy Crossley
> Sent: Saturday, January 16, 2010 21:33
> To: 'Jim Paris'
> Cc: 'Gleb Natapov'; kvm@vger.kernel.org
> Subject: RE: Some keys don't repeat in 64 bit Widows 7 kvm guest
> 
> > From: j...@jim.sh [mailto:j...@jim.sh] On Behalf Of Jim Paris
> > Sent: Saturday, January 16, 2010 20:40
> > To: Jimmy Crossley
> > Cc: 'Gleb Natapov'; kvm@vger.kernel.org
> > Subject: Re: Some keys don't repeat in 64 bit Widows 7 kvm guest
> >
> > Jimmy Crossley wrote:
> > > Thanks for the quick response, Gleb.  You are right - we should
> not
> > > spend our time troubleshooting an issue with something this old.
> > > I'll try downloading all the sources and headers I need to build
> > > kvm-88.  I think I'll need another Debian install, since this is a
> > > production machine and I don't want to destabilize it.  Go ahead
> and
> > > laugh - I ran Debian stable for years before finally deciding I
> > > could risk running testing.
> >
> > Debian testing still has the "kvm" package at version 72, but the
> new
> > package name "qemu-kvm" is at version 0.11.0 which is quite a bit
> > newer.
> >
> > -jim
> 
> It looks like I need to switch to qemu-kvm.  That kvm package that I
> have
> Installed (72+dfsg=5+squeeze1) is not in the squeeze repositories any
> more.
> 
> It sure is hard to keep up with everything.  Thanks, Jim.
> 
> 
> Jimmy Crossley
> CoNetrix
> 5214 68th Street
> Suite 200
> Lubbock TX 79424
> jcross...@conetrix.com
> http://www.conetrix.com
> tel: 806-687-8600 800-356-6568
> fax: 806-687-8511
> This e-mail message (and attachments) may contain confidential
> CoNetrix information. If you are not the intended recipient, you
> cannot use, distribute or copy the message or attachments. In such a
> case, please notify the sender by return e-mail immediately and erase
> all copies of the message and attachments. Opinions, conclusions and
> other information in this message and attachments that do not relate
> to official business are neither given nor endorsed by CoNetrix.
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


qemu-kvm-0.12.2 hangs when booting grub, when kvm is disabled

2010-01-21 Thread Jim Paris
Hi,

With this small disk image:

  http://psy.jim.sh/~jim/tmp/diskimage.gz

and the new qemu-kvm-0.12.2:

  $ kvm --version
  QEMU PC emulator version 0.12.2 (qemu-kvm-0.12.2), Copyright (c) 2003-2008 
Fabrice Bellard

I can successfully boot to a "grub> " prompt with:

  $ kvm -drive file=diskimage,boot=on

However, if kvm gets disabled:

  $ kvm -no-kvm -drive file=diskimage,boot=on

then the boot hangs at "GRUB Loading, please wait..." and consumes 100% CPU.

-jim


Re: [PATCH 0/5] Debug register emulation fixes and optimizations (reloaded)

2010-01-21 Thread Marcelo Tosatti
On Wed, Jan 20, 2010 at 06:20:20PM +0100, Jan Kiszka wrote:
> Major parts of this series were already posted a while ago during the
> debug register switch optimizations. This version now comes with an
> additional fix for VMX (patch 1) and a rework of mov dr emulation for
> SVM.
> 
> Find this series also at git://git.kiszka.org/linux-kvm.git queues/debugregs
> 
> Jan Kiszka (5):
>   KVM: VMX: Fix exceptions of mov to dr
>   KVM: VMX: Fix emulation of DR4 and DR5
>   KVM: VMX: Clean up DR6 emulation
>   KVM: SVM: Clean up and enhance mov dr emulation
>   KVM: SVM: Trap all debug register accesses
> 
>  arch/x86/include/asm/kvm_host.h |5 +-
>  arch/x86/kvm/svm.c  |   78 
> +--
>  arch/x86/kvm/vmx.c  |   67 +++--
>  arch/x86/kvm/x86.c  |   19 +
>  4 files changed, 84 insertions(+), 85 deletions(-)

Applied, thanks.



RE: vcpu hotplug support

2010-01-21 Thread Liu, Jinsong
Avi Kivity wrote:
> On 01/21/2010 01:54 PM, Liu, Jinsong wrote:
>> Avi,
>> 
>> I just send 2 patches for KVM vcpu hotplug support.
>> 1 is seabios patch: Setup vcpu add/remove infrastructure, including
>> madt bios_info and dsdt 2 is qemu-kvm patch: Debug vcpu add
>> 
>> 
> 
> The patches look reasonable (of course I'd like to see Gleb review
> it), but please send the seabios patch to the seabios mailing list
> (seab...@seabios.org) so we don't have to diverge.

Thanks for the reminder! I have sent it to seabios.

Jinsong


RE: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt.

2010-01-21 Thread Liu, Jinsong
Gleb Natapov wrote:
> On Thu, Jan 21, 2010 at 07:48:23PM +0800, Liu, Jinsong wrote:
>>> From cb997030cba02e7e74a29b3d942aeba9808ed293 Mon Sep 17 00:00:00
>>> 2001 
>> From: Liu, Jinsong 
>> Date: Fri, 22 Jan 2010 03:18:46 +0800
>> Subject: [PATCH] Setup vcpu add/remove infrastructure,
>> including madt bios_info and dsdt. 
>> 
>> 1. setup madt bios_info structure, so that static dsdt get
>>run-time madt info like checksum address, lapic address,
>>max cpu numbers, with least hardcode magic number
>>(realmode address of bios_info).
>> 2. setup vcpu add/remove dsdt infrastructure, including
>>processor related acpi objects and control methods. vcpu
>>add/remove will trigger SCI and then control method _L02.
>>By matching madt, vcpu number and add/remove action were
>>found, then by notify control method, it will notify OS
>> acpi driver. 
>> 
>> Signed-off-by: Liu, Jinsong 
> It looks like AML code is a port of what we had in BOCHS bios with
> minor changes. Can you detail what is changed and why for easy review
> please? And this still doesn't work with Windows I assume.
> 

Yes, my work is based on the BOCHS infrastructure, thanks BOCHS :)
I just changed some minor points:
1. explicitly define the return value of '_MAT' as a 'buffer', otherwise some
Linux acpi drivers (e.g. Linux 2.6.30) hit a parse error because they treat it
as an 'integer' rather than a 'buffer';
2. keep the madt 'checksum' correct on vcpu add/remove, otherwise acpi tools
report a 'checksum error' when reading madt info after a vcpu has been
added/removed (see the example after this list);
3. add '_EJ0' so that Linux gets an acpi object under /sys/devices/LNXSYSTM:00,
which is needed for vcpu removal;
4. in Method(PRSC, 0), only scan the 'xxx' vcpus that qemu gets from the
cmdline parameter 'maxcpus=xxx', not all 256 vcpus, otherwise some dsdt
processor definitions result in errors;
5. use one hardcoded bios_info structure address to replace '0x514', so that
more madt info can be passed to the dsdt;
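
For what it's worth, one way to inspect the madt from inside the guest after a
vcpu add/remove is with the ACPICA/pmtools utilities (a sketch; exact options
and output file names may differ between tool versions):

  $ acpidump > acpi.out           # dump all ACPI tables from the running guest
  $ acpixtract -s APIC acpi.out   # extract the MADT (table signature "APIC")
  $ iasl -d apic.dat              # disassemble it to check lapic entries/checksum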

Thanks,
Jinsong


[PATCH] kvm: Flush coalesced MMIO buffer periodly

2010-01-21 Thread Sheng Yang
The default behaviour of coalesced MMIO is to cache writes in a buffer until:
1. The buffer is full.
2. Or there is an exit to QEmu for other reasons.

But this can result in very late write-back under some conditions:
1. Each MMIO write is small.
2. The interval between writes is long.
3. The guest rarely needs input or access to other devices.

This issue was observed in an experimental embedded system. The test image
simply prints "test" every second. The output in QEmu meets expectations,
but the output in KVM is delayed for seconds.

Per Avi's suggestion, I hooked a flush of the coalesced MMIO buffer into the
VGA update handler. This way, we don't need an explicit vcpu exit to QEmu to
handle this issue.

Signed-off-by: Sheng Yang 
---

Like this?

 qemu-kvm.c |   26 --
 qemu-kvm.h |6 ++
 vl.c   |2 ++
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 599c3d6..a9b5107 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -463,6 +463,12 @@ static void kvm_create_vcpu(CPUState *env, int id)
 goto err_fd;
 }
 
+#ifdef KVM_CAP_COALESCED_MMIO
+if (kvm_state->coalesced_mmio && !kvm_state->coalesced_mmio_ring)
+kvm_state->coalesced_mmio_ring = (void *) env->kvm_run +
+   kvm_state->coalesced_mmio * PAGE_SIZE;
+#endif
+
 return;
   err_fd:
 close(env->kvm_fd);
@@ -927,8 +933,7 @@ int kvm_run(CPUState *env)
 
 #if defined(KVM_CAP_COALESCED_MMIO)
 if (kvm_state->coalesced_mmio) {
-struct kvm_coalesced_mmio_ring *ring =
-(void *) run + kvm_state->coalesced_mmio * PAGE_SIZE;
+struct kvm_coalesced_mmio_ring *ring = kvm_state->coalesced_mmio_ring;
 while (ring->first != ring->last) {
 cpu_physical_memory_rw(ring->coalesced_mmio[ring->first].phys_addr,
&ring->coalesced_mmio[ring->first].data[0],
@@ -2073,6 +2078,23 @@ static void io_thread_wakeup(void *opaque)
 }
 }
 
+#ifdef KVM_CAP_COALESCED_MMIO
+void kvm_flush_coalesced_mmio_buffer(void)
+{
+if (kvm_state->coalesced_mmio_ring) {
+struct kvm_coalesced_mmio_ring *ring =
+kvm_state->coalesced_mmio_ring;
+while (ring->first != ring->last) {
+cpu_physical_memory_rw(ring->coalesced_mmio[ring->first].phys_addr,
+   &ring->coalesced_mmio[ring->first].data[0],
+   ring->coalesced_mmio[ring->first].len, 1);
+smp_wmb();
+ring->first = (ring->first + 1) % KVM_COALESCED_MMIO_MAX;
+}
+}
+}
+#endif
+
 int kvm_main_loop(void)
 {
 int fds[2];
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 6b3e5a1..8188ff6 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -1125,6 +1125,11 @@ static inline int kvm_set_migration_log(int enable)
 return kvm_physical_memory_set_dirty_tracking(enable);
 }
 
+#ifdef KVM_CAP_COALESCED_MMIO
+void kvm_flush_coalesced_mmio_buffer(void);
+#else
+void kvm_flush_coalesced_mmio_buffer(void) {}
+#endif
 
 int kvm_irqchip_in_kernel(void);
 #ifdef CONFIG_KVM
@@ -1144,6 +1149,7 @@ typedef struct KVMState {
 int fd;
 int vmfd;
 int coalesced_mmio;
+struct kvm_coalesced_mmio_ring *coalesced_mmio_ring;
 int broken_set_mem_region;
 int migration_log;
 int vcpu_events;
diff --git a/vl.c b/vl.c
index 9edea10..64902f2 100644
--- a/vl.c
+++ b/vl.c
@@ -3235,6 +3235,7 @@ static void gui_update(void *opaque)
 interval = dcl->gui_timer_interval;
 dcl = dcl->next;
 }
+kvm_flush_coalesced_mmio_buffer();
 qemu_mod_timer(ds->gui_timer, interval + qemu_get_clock(rt_clock));
 }
 
@@ -3242,6 +3243,7 @@ static void nographic_update(void *opaque)
 {
 uint64_t interval = GUI_REFRESH_INTERVAL;
 
+kvm_flush_coalesced_mmio_buffer();
 qemu_mod_timer(nographic_timer, interval + qemu_get_clock(rt_clock));
 }
 
-- 
1.5.4.5



Re: KVM Virtual CPU time profiling

2010-01-21 Thread Sheng Yang
On Friday 22 January 2010 02:41:35 Saksena, Abhishek wrote:
> Hi All,
> Is there a way in KVM to measure the real physical (CPU) time consumed by
>  each running Virtual CPU?  (I want to do time profiling of the virtual
>  machines running on host system)
> 
> Also, is there an explanation somewhere on how Virtual CPU scheduling is
>  achieved in KVM? Thanks

Each VM is a QEmu process, and each vcpu is a thread of it (but not all of its 
threads are vcpus). Currently, vcpu threads are scheduled by the host the same 
way as any other host threads/processes.

You can get thread_id for each vcpu in QEmu monitor, by:

(qemu) info cpus

Then you can do anything you want with it, e.g. use top to get each 
thread/vcpu's CPU time. :)
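
For example (a sketch; it assumes a single VM, a process named qemu-kvm, and a
vcpu thread id of 12345 as reported by "info cpus"):

  # per-thread CPU usage of the whole VM process
  $ top -H -p $(pidof qemu-kvm)
  # or read one vcpu thread's accumulated utime/stime (in clock ticks)
  $ awk '{print $14, $15}' /proc/$(pidof qemu-kvm)/task/12345/stat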

-- 
regards
Yang, Sheng


Re: Unable to single-step in kvm, always results in a resume

2010-01-21 Thread Nicholas Amon
So now I can single-step instructions, but my breakpoints do not work.  I have 
verified that disabling kvm restores the breakpoint functionality.  Any 
suggestions?


Thanks,

Nicholas

Jan Kiszka wrote:

Hi Nicholas,

please don't drop CCs on reply.

Nicholas Amon wrote:
  

Hi Jan,

Thanks for responding.  Yes, I am able to step instruction when I 
disable kvm w/ the no-kvm option.  My host kernel is 64bit  2.6.27 and 
the program that I am debugging is 32 bit but starts in real mode.  But 
the KVM module I am running is from kvm-88.  Is there anyway I can check 
the version definitively?



kvm modules issue a message when being loaded, check your kernel log.
qemu-kvm gives you the version via -version.
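
For instance (a sketch; the exact binary name depends on your packaging):

  # the kvm modules issue a message when loaded; kvm-kmod builds include their version
  $ dmesg | grep -i kvm
  # qemu-kvm prints its own version
  $ qemu-system-x86_64 -version    # or "kvm -version" with distro wrappers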

OK, the problems you see are likely related to the very old versions you
use. Update to a recent kvm-kmod (2.6.32 series) and qemu-kvm (0.12
series) and retry.

Jan

  

Thanks,

Nicholas

Jan Kiszka wrote:


Jan Kiszka wrote:
  
  

Nicholas Amon wrote:



Hi All,

I am trying to single-step through my kernel using qemu and kvm.  I have
run qemu via:  qemu-system-x86_64 -s -S -hda
/home/nickamon/lab1/obj/kernel.img and also connected to the process
using gdb.

Problem is that whenever I try and step instruction, it seems to resume
my kernel rather than allowing me to progress instruction by
instruction.  I have built the kvm snapshot from git and still no luck. 
Tried following the code for a few hours and have no luck.  Any

suggestions?
  
  

What's you host kernel or kvm-kmod version?




...and does -no-kvm make any difference (except that it's much slower)?

Jan

  
  


  


--
Nicholas Amon
Senior Software Engineer
Xceedium Inc.
Office: 201-536-1000 x127
Cell: 732-236-7698
na...@xceedium.com

See How to Control & Track High-Risk Users: Join our Webinar on Tuesday, 
June 2

Network World Names Xceedium GateKeeper "RSA 2009 Best of Show"



PCI Passthrough Problem

2010-01-21 Thread Aaron Clausen
I'm trying once again to get PCI passthrough working (KVM 84 on Ubuntu
9.10), and I'm getting this error :

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
/usr/bin/kvm -S -M pc-0.11 -m 4096 -smp 4 -name mailserver -uuid
76a83471-e94a-3658-fa61-8eceaa74ffc2 -monitor
unix:/var/run/libvirt/qemu/mailserver.monitor,server,nowait -localtime
-boot c -drive file=,if=ide,media=cdrom,index=2 -drive
file=/var/lib/libvirt/images/mailserver.img,if=virtio,index=0,boot=on
-drive file=/var/lib/libvirt/images/mailserver-2.img,if=virtio,index=1
-net nic,macaddr=54:52:00:1b:b2:56,vlan=0,model=virtio,name=virtio.0
-net tap,fd=17,vlan=0,name=tap.0 -serial pty -parallel none -usb
-usbdevice tablet -vnc 127.0.0.1:0 -k en-us -vga cirrus -pcidevice
host=0a:01.0
char device redirected to /dev/pts/0
get_real_device: /sys/bus/pci/devices/:0a:01.0/config: Permission denied
init_assigned_device: Error: Couldn't get real device (0a:01.0)!
Failed to initialize assigned device host=0a:01.0

Any thoughts?

-- 
Aaron Clausen
mightymartia...@gmail.com


Re: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt.

2010-01-21 Thread Gleb Natapov
On Fri, Jan 22, 2010 at 10:15:44AM +0800, Liu, Jinsong wrote:
> Gleb Natapov wrote:
> > On Thu, Jan 21, 2010 at 07:48:23PM +0800, Liu, Jinsong wrote:
> >>> From cb997030cba02e7e74a29b3d942aeba9808ed293 Mon Sep 17 00:00:00
> >>> 2001 
> >> From: Liu, Jinsong 
> >> Date: Fri, 22 Jan 2010 03:18:46 +0800
> >> Subject: [PATCH] Setup vcpu add/remove infrastructure,
> >> including madt bios_info and dsdt. 
> >> 
> >> 1. setup madt bios_info structure, so that static dsdt get
> >>run-time madt info like checksum address, lapic address,
> >>max cpu numbers, with least hardcode magic number
> >>(realmode address of bios_info).
> >> 2. setup vcpu add/remove dsdt infrastructure, including
> >>processor related acpi objects and control methods. vcpu
> >>add/remove will trigger SCI and then control method _L02.
> >>By matching madt, vcpu number and add/remove action were
> >>found, then by notify control method, it will notify OS
> >> acpi driver. 
> >> 
> >> Signed-off-by: Liu, Jinsong 
> > It looks like AML code is a port of what we had in BOCHS bios with
> > minor changes. Can you detail what is changed and why for easy review
> > please? And this still doesn't work with Windows I assume.
> > 
> 
> Yes, my work is based on the BOCHS infrastructure, thanks BOCHS :)
> I just changed some minor points:
> 1. explicitly define the return value of '_MAT' as a 'buffer', otherwise some 
> Linux acpi drivers (e.g. Linux 2.6.30) hit a parse error because they treat 
> it as an 'integer' rather than a 'buffer';
> 2. keep the madt 'checksum' correct on vcpu add/remove, otherwise acpi tools 
> report a 'checksum error' when reading madt info after a vcpu has been 
> added/removed;
> 3. add '_EJ0' so that Linux gets an acpi object under /sys/devices/LNXSYSTM:00, 
> which is needed for vcpu removal;
> 4. in Method(PRSC, 0), only scan the 'xxx' vcpus that qemu gets from the 
> cmdline parameter 'maxcpus=xxx', not all 256 vcpus, otherwise some dsdt 
> processor definitions result in errors;
What kind of errors? Qemu should never set a bit above maxcpus in PRS.

> 5. use one hardcoded bios_info structure address to replace '0x514', so that 
> more madt info can be passed to the dsdt;
> 
> Thanks,
> Jinsong

--
Gleb.


Re: PCI Passthrough Problem

2010-01-21 Thread Yolkfull Chow
On Thu, Jan 21, 2010 at 09:24:36PM -0800, Aaron Clausen wrote:
> I'm trying once again to get PCI passthrough working (KVM 84 on Ubuntu
> 9.10), and I'm getting this error :
> 
> LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
> /usr/bin/kvm -S -M pc-0.11 -m 4096 -smp 4 -name mailserver -uuid
> 76a83471-e94a-3658-fa61-8eceaa74ffc2 -monitor
> unix:/var/run/libvirt/qemu/mailserver.monitor,server,nowait -localtime
> -boot c -drive file=,if=ide,media=cdrom,index=2 -drive
> file=/var/lib/libvirt/images/mailserver.img,if=virtio,index=0,boot=on
> -drive file=/var/lib/libvirt/images/mailserver-2.img,if=virtio,index=1
> -net nic,macaddr=54:52:00:1b:b2:56,vlan=0,model=virtio,name=virtio.0
> -net tap,fd=17,vlan=0,name=tap.0 -serial pty -parallel none -usb
> -usbdevice tablet -vnc 127.0.0.1:0 -k en-us -vga cirrus -pcidevice
> host=0a:01.0
> char device redirected to /dev/pts/0
> get_real_device: /sys/bus/pci/devices/:0a:01.0/config: Permission denied
> init_assigned_device: Error: Couldn't get real device (0a:01.0)!
> Failed to initialize assigned device host=0a:01.0

It seems libvirt has a problem initializing the PCI device; you could manually
unbind the device from the host kernel driver and try the above command again.

To unbind the device, please refer to:

http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM
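
If you want to try it by hand first, a minimal sketch (the device address
0a:01.0 is taken from your command line; the vendor:device id 10b5 9056 is
only an example, use whatever lspci reports for your card):

  $ lspci -n -s 0a:01.0                 # note the numeric vendor:device id
  $ modprobe pci-stub                   # make sure the stub driver is loaded
  $ echo "10b5 9056" > /sys/bus/pci/drivers/pci-stub/new_id
  $ echo "0000:0a:01.0" > /sys/bus/pci/devices/0000:0a:01.0/driver/unbind
  $ echo "0000:0a:01.0" > /sys/bus/pci/drivers/pci-stub/bind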

> 
> Any thoughts?
> 
> -- 
> Aaron Clausen
> mightymartia...@gmail.com
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM: x86: Fix probable memory leak of vcpu->arch.mce_banks

2010-01-21 Thread Wei Yongjun
vcpu->arch.mce_banks is allocated in kvm_arch_vcpu_init(), but
never freed anywhere, which may cause a memory leak. This
patch frees it in kvm_arch_vcpu_uninit().

Signed-off-by: Wei Yongjun 
---
 arch/x86/kvm/x86.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f25b52e..1ddcad4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5089,6 +5089,7 @@ fail:
 
 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 {
+   kfree(vcpu->arch.mce_banks);
kvm_free_lapic(vcpu);
down_read(&vcpu->kvm->slots_lock);
kvm_mmu_destroy(vcpu);
-- 
1.6.2.2




[PATCH 2/2] KVM: x86: Fix leak of free lapic date in kvm_arch_vcpu_init()

2010-01-21 Thread Wei Yongjun
In kvm_arch_vcpu_init(), if the memory allocation for
vcpu->arch.mce_banks fails, the memory of the lapic data
is not freed. This patch fixes it.

Signed-off-by: Wei Yongjun 
---
 arch/x86/kvm/x86.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6651dbf..f25b52e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5072,12 +5072,13 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
   GFP_KERNEL);
if (!vcpu->arch.mce_banks) {
r = -ENOMEM;
-   goto fail_mmu_destroy;
+   goto fail_free_lapic;
}
vcpu->arch.mcg_cap = KVM_MAX_MCE_BANKS;
 
return 0;
-
+fail_free_lapic:
+   kvm_free_lapic(vcpu);
 fail_mmu_destroy:
kvm_mmu_destroy(vcpu);
 fail_free_pio_data:
-- 
1.6.2.2




[PATCH 1/2 v2] KVM: x86: Fix probable memory leak of vcpu->arch.mce_banks

2010-01-21 Thread Wei Yongjun
vcpu->arch.mce_banks is allocated in kvm_arch_vcpu_init(), but
never freed anywhere, which may cause a memory leak. This
patch frees it in kvm_arch_vcpu_uninit().

Signed-off-by: Wei Yongjun 
---
 arch/x86/kvm/x86.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 56a90a6..c27ebb1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5470,6 +5470,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 {
int idx;
 
+   kfree(vcpu->arch.mce_banks);
kvm_free_lapic(vcpu);
idx = srcu_read_lock(&vcpu->kvm->srcu);
kvm_mmu_destroy(vcpu);



Re: Some keys don't repeat in 64 bit Widows 7 kvm guest

2010-01-21 Thread Gleb Natapov
On Thu, Jan 21, 2010 at 05:35:08PM -0600, Jimmy Crossley wrote:
> I am now running qemu-kvm 0.11.1:
> 
> $ kvm -h | head -1
> QEMU PC emulator version 0.11.1 (qemu-kvm-0.11.1), Copyright (c) 2003-2008 
> Fabrice Bellard
> 
> My Windows 7 guest detected a lot of new hardware, but I still have the same 
> key repeating problem.  I think I will just leave this alone for now since I 
> am going to be away from my office (and this machine) for several weeks.   
> When I return, I plan on doing a clean install of everything.  If I still 
> have this issue, I will report back.
> 
qemu-kvm-0.11.1 is still pretty old. The latest version is qemu-kvm-0.12,
and you need to update your kernel modules too. A similar-sounding
problem was fixed by kernel changes a while ago.

> Thanks to everyone for your help.
> 
> > -Original Message-
> > From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> > Behalf Of Jimmy Crossley
> > Sent: Saturday, January 16, 2010 21:33
> > To: 'Jim Paris'
> > Cc: 'Gleb Natapov'; kvm@vger.kernel.org
> > Subject: RE: Some keys don't repeat in 64 bit Widows 7 kvm guest
> > 
> > > From: j...@jim.sh [mailto:j...@jim.sh] On Behalf Of Jim Paris
> > > Sent: Saturday, January 16, 2010 20:40
> > > To: Jimmy Crossley
> > > Cc: 'Gleb Natapov'; kvm@vger.kernel.org
> > > Subject: Re: Some keys don't repeat in 64 bit Widows 7 kvm guest
> > >
> > > Jimmy Crossley wrote:
> > > > Thanks for the quick response, Gleb.  You are right - we should
> > not
> > > > spend our time troubleshooting an issue with something this old.
> > > > I'll try downloading all the sources and headers I need to build
> > > > kvm-88.  I think I'll need another Debian install, since this is a
> > > > production machine and I don't want to destabilize it.  Go ahead
> > and
> > > > laugh - I ran Debian stable for years before finally deciding I
> > > > could risk running testing.
> > >
> > > Debian testing still has the "kvm" package at version 72, but the
> > new
> > > package name "qemu-kvm" is at version 0.11.0 which is quite a bit
> > > newer.
> > >
> > > -jim
> > 
> > It looks like I need to switch to qemu-kvm.  That kvm package that I
> > have
> > Installed (72+dfsg=5+squeeze1) is not in the squeeze repositories any
> > more.
> > 
> > It sure is hard to keep up with everything.  Thanks, Jim.
> > 
> > 
> > Jimmy Crossley
> > CoNetrix
> > 5214 68th Street
> > Suite 200
> > Lubbock TX 79424
> > jcross...@conetrix.com
> > http://www.conetrix.com
> > tel: 806-687-8600 800-356-6568
> > fax: 806-687-8511
> > This e-mail message (and attachments) may contain confidential
> > CoNetrix information. If you are not the intended recipient, you
> > cannot use, distribute or copy the message or attachments. In such a
> > case, please notify the sender by return e-mail immediately and erase
> > all copies of the message and attachments. Opinions, conclusions and
> > other information in this message and attachments that do not relate
> > to official business are neither given nor endorsed by CoNetrix.
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Gleb.


Re: [PATCH v3 04/12] Add "handle page fault" PV helper.

2010-01-21 Thread Gleb Natapov
On Thu, Jan 21, 2010 at 07:47:22AM -0800, H. Peter Anvin wrote:
> On 01/21/2010 01:02 AM, Avi Kivity wrote:
> >>
> >> You can also just emulate the state transition -- since you know
> >> you're dealing with a flat protected-mode or long-mode OS (and just
> >> make that a condition of enabling the feature) you don't have to deal
> >> with all the strange combinations of directions that an unrestricted
> >> x86 event can take.  Since it's an exception, it is unconditional.
> > 
> > Do you mean create the stack frame manually?  I'd really like to avoid
> > that for many reasons, one of which is performance (need to do all the
> > virt-to-phys walks manually), the other is that we're certain to end up
> > with something horribly underspecified.  I'd really like to keep as
> > close as possible to the hardware.  For the alternative approach, see Xen.
> > 
> 
> I obviously didn't mean to do something which didn't look like a
> hardware-delivered exception.  That by itself provides a tight spec.
> The performance issue is real, of course.
> 
> Obviously, the design of VT-x was before my time at Intel, so I'm not
> familiar with why the tradeoffs that were done they way they were.
> 
Is it so out of the question to reserve an exception below 32 for PV use?

--
Gleb.


Re: repeatable hang with loop mount and heavy IO in guest

2010-01-21 Thread Michael Tokarev
Antoine Martin wrote:
> I've tried various guests, including most recent Fedora12 kernels,
> custom 2.6.32.x
> All of them hang around the same point (~1GB written) when I do heavy IO
> write inside the guest.
[]
> Host is running: 2.6.31.4
> QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88)

Please update to the latest version and repeat.  kvm-88 is ancient, and
_lots_ of stuff has been fixed and changed since then; I doubt anyone
here will try to dig into kvm-88 problems.

Current kvm is qemu-kvm-0.12.2, released yesterday.
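
If it helps, a quick source build is usually enough to test the new version
(a sketch; the tarball name and default install paths are assumptions, grab
the release from the qemu-kvm download page):

  $ tar xzf qemu-kvm-0.12.2.tar.gz && cd qemu-kvm-0.12.2
  $ ./configure
  $ make && make install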

/mjt