[PATCH] QEMU kill CR3_CACHE references

2010-02-18 Thread Jes Sorensen

Hi,

The CR3 caching was never implemented in QEMU and is obsoleted by
NPT/EPT. This patch removes the unused references to it from
target-i386/kvm.c.

Cheers,
Jes

commit 5ed16687929511d015dd3542c4359cabe170401a
Author: Jes Sorensen 
Date:   Fri Feb 19 07:39:56 2010 +0100

Remove all references to KVM_CR3_CACHE as it was never implemented.

Signed-off-by: Jes Sorensen 

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 0d08cd5..5d9aecc 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -158,9 +158,6 @@ struct kvm_para_features {
 #ifdef KVM_CAP_PV_MMU
 { KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP },
 #endif
-#ifdef KVM_CAP_CR3_CACHE
-{ KVM_CAP_CR3_CACHE, KVM_FEATURE_CR3_CACHE },
-#endif
 { -1, -1 }
 };
 


Re: [patch] x86: kvm: Convert i8254/i8259 locks to raw_spinlocks

2010-02-18 Thread Marcelo Tosatti
On Thu, Feb 18, 2010 at 12:18:25PM +0200, Avi Kivity wrote:
> On 02/18/2010 12:05 PM, Jan Kiszka wrote:
> >Avi Kivity wrote:
> >>On 02/18/2010 11:45 AM, Avi Kivity wrote:
> >>>On 02/18/2010 11:40 AM, Jan Kiszka wrote:
> >Meanwhile, if anyone has any idea how to kill this lock, I'd love to
> >see it.
> >
> What concurrency does it resolve in the end? On first glance, it only
> synchronizes the fiddling with per-VCPU request bits, right? What forces
> us to do this? Wouldn't it suffice to disable preemption (thus
> migration) and then let concurrent requests race for setting the bits? I
> mean, if some request bit was already set on entry, we don't include the
> related VCPU in smp_call_function_many anyway.
> >>>It's more difficult.
> >>>
> >>>vcpu 0: sets request bit on vcpu 2
> >>>   vcpu 1: test_and_set request bit on vcpu 2, returns already set
> >>>   vcpu 1: returns
> >>>vcpu 0: sends IPI
> >>>vcpu 0: returns
> >>>
> >>>so vcpu 1 returns before the IPI was performed.  If the request was a
> >>>tlb flush, for example, vcpu 1 may free a page that is still in vcpu
> >>>2's tlb.
> >>One way out would be to have a KVM_REQ_IN_PROGRESS, set it in
> >>make_request, clear it in the IPI function.
> >>
> >>If a second make_request sees it already set, it can simply busy wait
> >>until it is cleared, without sending the IPI.  Of course the busy wait
> >>means we can't enable preemption (or we may busy wait on an unscheduled
> >>task), but at least the requests can proceed in parallel instead of
> >>serializing.
>
> >...or include VCPUs with KVM_REQ_IN_PROGRESS set into the IPI set even
> >if they already have the desired request bit set.
> 
> But then we're making them take the IPI, which is pointless and
> expensive.  My approach piggy backs multiple requesters on one IPI.

I have played with this in the past (collapsing, which would avoid two
simultaneous requestors issuing two IPIs to a given vcpu, and
unification with KVM_REQ_KICK to avoid the IPI if the vcpu is not in
guest mode).

It's not worthwhile, though: this is not a contention point with TDP
(maybe it becomes one in the future with fine-grained flushing, but not
ATM).

> >Then we should
> >serialize in smp_call_function_many.
> 
> Do you mean rely on s_c_f_m's internal synchronization?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] qemu-kvm: extboot: Keep variables in RAM

2010-02-18 Thread H. Peter Anvin
On 02/18/2010 12:27 PM, Anthony Liguori wrote:
> On 02/18/2010 10:13 AM, Jan Kiszka wrote:
>> Instead of saving the old INT 0x13 and 0x19 handlers in ROM which fails
>> under QEMU as it enforces protection, keep them in spare vectors of the
>> interrupt table, namely INT 0x80 and 0x81.
>>
>> Signed-off-by: Jan Kiszka
>>
> 
> commit a4492b03932ea3c9762372f3e15e8c6526ee56c6
> Author: H. Peter Anvin 
> Date:   Fri Jul 18 11:22:59 2008 -0700
> 
>  kvm: extboot: don't use interrupt vectors $0x2b and $0x2c
> 
>  extboot's use of interrupt vectors $0x2b and $0x2c is unsafe, as these
>  interrupt vectors fall in the OS-use range (0x20-0x3f).  Furthermore,
>  it's unnecessary: we can keep a local pointer instead of hooking
>  another interrupt as long as we can write to our own segment.
> 
>  Make the extboot segment writable, and use local variables to hold the
>  old link pointers.
> 
>  If this turns out to cause problems, we should probably switch to
>  using vectors in the 0xc0-0xef range, and/or other BIOS-reserved
>  memory.
> 
>  Signed-off-by: H. Peter Anvin 
>  Signed-off-by: Avi Kivity 
> 
> Sounds like 0x80/0x81 is probably not the best choice.  hpa: any 
> suggestions?

There isn't really any free memory that you can just use -- there are
no free interrupt vectors which are safe to use.  Furthermore, this
implies that there is a bug in the QEMU BIOS in this respect: the PCI
spec requires that the "ROM region" (upper memory area) that contains
shadowed ROM contents be writable until INT 19h is executed -- at *that*
point it gets write-protected.

However, as I did point out in the original comment, there are some
BIOSes in the field which use vectors 0xc0-0xdf as a scratch memory
pool -- usually to have somewhere to stash a small stack -- so if you
absolutely have to go down this route, that range would probably be the
safest.  An alternative would be to use memory in the BDA in the range
0x4ac-0x4ff (absolute), which appears to be available for BIOS-specific
uses.

>> NEITHER OF THESE OPTIONS IS SAFE ON REAL HARDWARE <<
These are both BIOS-specific use areas.

-hpa


[PATCH] virtio-spec: document block CMD and FLUSH

2010-02-18 Thread Michael S. Tsirkin
I took a stab at documenting CMD and FLUSH request types in virtio
block.  Christoph, could you look over this please?

I note that the interface seems full of warts to me;
this might be a first step to cleaning them up.

One issue I struggled with especially is how the type
field mixes bits and non-bit values. I ended up
simply defining all legal values, so that we have
CMD = 2, CMD_OUT = 3 and so on.

I also avoided introducing the inhdr/outhdr structures
that the virtio blk driver in Linux uses, as I was concerned
that nesting tables would confuse the reader.

Comments welcome.

Signed-off-by: Michael S. Tsirkin 

--

diff --git a/virtio-spec.lyx b/virtio-spec.lyx
index d16104a..ed35893 100644
--- a/virtio-spec.lyx
+++ b/virtio-spec.lyx
@@ -67,7 +67,11 @@ IBM Corporation
 \end_layout
 
 \begin_layout Standard
+
+\change_deleted 0 1266531118
 FIXME: virtio block scsi passthrough section
+\change_unchanged
+
 \end_layout
 
 \begin_layout Standard
@@ -4376,7 +4380,7 @@ struct virtio_net_ctrl_mac {
 The device can filter incoming packets by any number of destination MAC
  addresses.
 \begin_inset Foot
-status open
+status collapsed
 
 \begin_layout Plain Layout
 Since there are no guarentees, it can use a hash filter orsilently switch
@@ -4549,6 +4553,22 @@ blk_size
 \end_inset
 
 .
+\change_inserted 0 1266444580
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 0 1266471229
+VIRTIO_BLK_F_SCSI (7) Device supports scsi packet commands.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 0 1266444605
+VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
+\change_unchanged
+
 \end_layout
 
 \begin_layout Description
@@ -4700,17 +4720,25 @@ struct virtio_blk_req {
 
 \begin_layout Plain Layout
 
+\change_deleted 0 1266472188
+
 #define VIRTIO_BLK_T_IN  0
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 0 1266472188
+
 #define VIRTIO_BLK_T_OUT 1
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 0 1266472188
+
 #define VIRTIO_BLK_T_BARRIER0x8000
+\change_unchanged
+
 \end_layout
 
 \begin_layout Plain Layout
@@ -4735,11 +4763,15 @@ struct virtio_blk_req {
 
 \begin_layout Plain Layout
 
+\change_deleted 0 1266472204
+
 #define VIRTIO_BLK_S_OK0
 \end_layout
 
 \begin_layout Plain Layout
 
+\change_deleted 0 1266472204
+
 #define VIRTIO_BLK_S_IOERR 1
 \end_layout
 
@@ -4759,32 +4791,481 @@ struct virtio_blk_req {
 \end_layout
 
 \begin_layout Standard
-The type of the request is either a read (VIRTIO_BLK_T_IN) or a write (VIRTIO_BL
-K_T_OUT); the high bit indicates that this request acts as a barrier and
- that all preceeding requests must be complete before this one, and all
- following requests must not be started until this is complete.
+
+\change_inserted 0 1266472490
+If the device has VIRTIO_BLK_F_SCSI feature, it can also support scsi packet
+ command requests, each of these requests is of form:
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1266472395
+
+struct virtio_scsi_pc_req {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1266472375
+
+   u32 type;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1266472375
+
+   u32 ioprio;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1266474298
+
+   u64 sector;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1266474308
+
+char cmd[];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1266505809
+
+   char data[][512];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1266505825
+
+#define SCSI_SENSE_BUFFERSIZE   96
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1266505848
+
+u8 sense[SCSI_SENSE_BUFFERSIZE];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1266472969
+
+u32 errors;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1266472979
+
+u32 data_len;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1266472984
+
+u32 sense_len;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1266472987
+
+u32 residual;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1266472375
+
+   u8 status;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1266472375
+
+};
+\end_layout
+
+\end_inset
+
+
+\change_unchanged
+
 \end_layout
 
 \begin_layout Standard
-The ioprio field is a hint about the relative priorities of requests to
- the device: higher numbers indicate more important requests.
+The 
+\emph on
+type
+\emph default
+ of the request is either a read (VIRTIO_BLK_T_IN)
+\change_inserted 0 1266495815
+,
+\change_unchanged
+ 
+\change_deleted 0 1266495817
+or
+\change_unchanged
+ a write (VIRTIO_BLK_T_OUT)
+\change_inserted 0 1266497316
+, a scsi packet command (VIRTIO_BLK_T_SCSI_CMD or VIRTIO_BLK_T_SCSI_CMD_OUT
+\begin_inset Foot
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 0 1266497390

Re: [PATCH 1/2] qemu-kvm: extboot: Keep variables in RAM

2010-02-18 Thread Anthony Liguori

On 02/18/2010 10:13 AM, Jan Kiszka wrote:
> Instead of saving the old INT 0x13 and 0x19 handlers in ROM which fails
> under QEMU as it enforces protection, keep them in spare vectors of the
> interrupt table, namely INT 0x80 and 0x81.
>
> Signed-off-by: Jan Kiszka


commit a4492b03932ea3c9762372f3e15e8c6526ee56c6
Author: H. Peter Anvin 
Date:   Fri Jul 18 11:22:59 2008 -0700

kvm: extboot: don't use interrupt vectors $0x2b and $0x2c

extboot's use of interrupt vectors $0x2b and $0x2c is unsafe, as these
interrupt vectors fall in the OS-use range (0x20-0x3f).  Furthermore,
it's unnecessary: we can keep a local pointer instead of hooking
another interrupt as long as we can write to our own segment.

Make the extboot segment writable, and use local variables to hold the
old link pointers.

If this turns out to cause problems, we should probably switch to
using vectors in the 0xc0-0xef range, and/or other BIOS-reserved
memory.

Signed-off-by: H. Peter Anvin 
Signed-off-by: Avi Kivity 

Sounds like 0x80/0x81 is probably not the best choice.  hpa: any 
suggestions?


Regards,

Anthony Liguori


---

Don't forget to update extboot.bin after merging both patches.

  pc-bios/optionrom/extboot.S |   41 ++---
  1 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/pc-bios/optionrom/extboot.S b/pc-bios/optionrom/extboot.S
index 1e60f68..1eeb172 100644
--- a/pc-bios/optionrom/extboot.S
+++ b/pc-bios/optionrom/extboot.S
@@ -19,6 +19,9 @@
   *   Authors: Anthony Liguori
   */

+#define OLD_INT19  (0x80 * 4)  /* re-use INT 0x80 BASIC vector */
+#define OLD_INT13  (0x81 * 4)  /* re-use INT 0x81 BASIC vector */
+
  .code16
  .text
.global _start
@@ -37,7 +40,7 @@ _start:

/* save old int 19 */
mov (0x19*4), %eax
-   mov %eax, %cs:old_int19
+   mov %eax, (OLD_INT19)

/* install out int 19 handler */
movw $int19_handler, (0x19*4)
@@ -48,6 +51,7 @@ _start:
lret

  int19_handler:
+   push %eax /* reserve space for lret */
push %eax
push %bx
push %cx
@@ -69,7 +73,7 @@ int19_handler:
  1: /* hook int13: intb(0x404) == 1 */
/* save old int 13 to int 2c */
mov (0x13*4), %eax
-   mov %eax, %cs:old_int13
+   mov %eax, (OLD_INT13)

/* install our int 13 handler */
movw $int13_handler, (0x13*4)
@@ -90,15 +94,21 @@ int19_handler:

  3: /* fall through: inb(0x404) == 0 */
/* restore previous int $0x19 handler */
-   mov %cs:old_int19,%eax
+   mov (OLD_INT19),%eax
mov %eax,(0x19*4)
-   
+
+   /* write old handler as return address onto stack */
+   push %bp
+   mov %sp, %bp
+   mov %eax, 14(%bp)
+   pop %bp
+
pop %ds
pop %dx
pop %cx
pop %bx
pop %eax
-   ljmpw *%cs:old_int19
+   lret

  #define FLAGS_CF  0x01

@@ -626,7 +636,21 @@ terminate_disk_emulation:
  int13_handler:
cmp $0x80, %dl
je 1f
-   ljmpw *%cs:old_int13
+
+   /* write old handler as return address onto stack */
+   push %eax
+   push %eax
+   push %ds
+   push %bp
+   mov %sp, %bp
+   xor %ax, %ax
+   mov %ax, %ds
+   mov (OLD_INT13), %eax
+   mov %eax, 8(%bp)
+   pop %bp
+   pop %ds
+   pop %eax
+   lret
  1:
cmp $0x0, %ah
jne 1f
@@ -686,10 +710,5 @@ int13_handler:
int $0x18  /* boot failed */
iret

-/* Variables */
-.align 4, 0
-old_int13: .long 0
-old_int19: .long 0
-   
  .align 512, 0
  _end:




Re: [PATCH 2/2] qemu-kvm: extboot: Clean up host-guest interface

2010-02-18 Thread Anthony Liguori

On 02/18/2010 10:13 AM, Jan Kiszka wrote:
> Drop the unused boot mode port 0x404 from the host-guest interface of
> extboot and remove related code from both sides.
>
> Signed-off-by: Jan Kiszka


Makes sense.

Acked-by: Anthony Liguori 


---
  hw/extboot.c|   14 +-
  hw/pc.c |2 +-
  hw/pc.h |2 +-
  pc-bios/optionrom/extboot.S |   23 ---
  4 files changed, 3 insertions(+), 38 deletions(-)

diff --git a/hw/extboot.c b/hw/extboot.c
index b91d54f..8ada21b 100644
--- a/hw/extboot.c
+++ b/hw/extboot.c
@@ -65,12 +65,6 @@ static void get_translated_chs(BlockDriverState *bs, int *c, int *h, int *s)
  }
  }

-static uint32_t extboot_read(void *opaque, uint32_t addr)
-{
-int *pcmd = opaque;
-return *pcmd;
-}
-
  static void extboot_write_cmd(void *opaque, uint32_t addr, uint32_t value)
  {
  union extboot_cmd cmd;
@@ -123,13 +117,7 @@ static void extboot_write_cmd(void *opaque, uint32_t addr, uint32_t value)
  qemu_free(buf);
  }

-void extboot_init(BlockDriverState *bs, int cmd)
+void extboot_init(BlockDriverState *bs)
  {
-int *pcmd;
-
-pcmd = qemu_mallocz(sizeof(int));
-
-*pcmd = cmd;
-register_ioport_read(0x404, 1, 1, extboot_read, pcmd);
  register_ioport_write(0x405, 1, 2, extboot_write_cmd, bs);
  }
diff --git a/hw/pc.c b/hw/pc.c
index 97e16ce..8175874 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1057,7 +1057,7 @@ static void pc_init1(ram_addr_t ram_size,
bdrv_set_geometry_hint(info->bdrv, cyls, heads, secs);
}

-   extboot_init(info->bdrv, 1);
+   extboot_init(info->bdrv);
  }

  #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
diff --git a/hw/pc.h b/hw/pc.h
index b00f311..e9da683 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -165,7 +165,7 @@ void isa_ne2000_init(int base, int irq, NICInfo *nd);

  /* extboot.c */

-void extboot_init(BlockDriverState *bs, int cmd);
+void extboot_init(BlockDriverState *bs);

  int cpu_is_bsp(CPUState *env);

diff --git a/pc-bios/optionrom/extboot.S b/pc-bios/optionrom/extboot.S
index 1eeb172..db6c2b6 100644
--- a/pc-bios/optionrom/extboot.S
+++ b/pc-bios/optionrom/extboot.S
@@ -62,15 +62,6 @@ int19_handler:
xor %ax, %ax
mov %ax, %ds

-   movw $0x404, %dx
-   inb %dx, %al
-   cmp $1, %al
-   je 1f
-   cmp $2, %al
-   je 2f
-   jmp 3f
-
-1: /* hook int13: intb(0x404) == 1 */
/* save old int 13 to int 2c */
mov (0x13*4), %eax
mov %eax, (OLD_INT13)
@@ -78,21 +69,7 @@ int19_handler:
/* install our int 13 handler */
movw $int13_handler, (0x13*4)
mov %cs, (0x13*4+2)
-   jmp 3f

-2: /* linux boot: intb(0x404) == 2 */
-   cli
-   cld
-   mov $0x9000, %ax
-   mov %ax, %ds
-   mov %ax, %es
-   mov %ax, %fs
-   mov %ax, %gs
-   mov %ax, %ss
-   mov $0x8ffe, %sp
-   ljmp $0x9000 + 0x20, $0
-
-3: /* fall through: inb(0x404) == 0 */
/* restore previous int $0x19 handler */
mov (OLD_INT19),%eax
mov %eax,(0x19*4)




Re: [PATCH 0/10] Nested SVM fixes (and Win7-64bit bringup)

2010-02-18 Thread Joerg Roedel
On Thu, Feb 18, 2010 at 04:54:37PM +0200, Avi Kivity wrote:
> On 02/18/2010 04:48 PM, Alexander Graf wrote:
> >On 18.02.2010, at 15:33, Avi Kivity wrote:
> >
> >>On 02/18/2010 01:38 PM, Joerg Roedel wrote:
> >>>Hi,
> >>>
> >>>here is a couple of fixes for the nested SVM implementation. I collected 
> >>>these
> >>>fixes mostly when trying to get Windows 7 64bit running as an L2 guest. 
> >>>Most
> >>>important fixes in this set make lazy fpu switching working with nested 
> >>>SVM and
> >>>the nested tpr handling fixes. Without the latter fix the l1 guest freezes 
> >>>when
> >>>trying to run win7 as l2 guest. Please review and comment on these patches 
> >>>:-)
> >>>
> >>Overall looks good.  Would appreciate Alex looking over these as well.
> >The kmap thing is broken though, right?
> 
> Oh yes, but that's a search and replace, not something needing deep rework.

Great. I'll post again with fixes soon.

Joerg




Re: [PATCH 09/10] KVM: SVM: Make lazy FPU switching work with nested svm

2010-02-18 Thread Joerg Roedel
On Thu, Feb 18, 2010 at 04:55:06PM +0200, Avi Kivity wrote:
> On 02/18/2010 04:51 PM, Alexander Graf wrote:
> >On 18.02.2010, at 12:38, Joerg Roedel wrote:
> >
> >>TDB.
> >TDB? That's not a patch description.
> >
> 
> Short for "To De Befined", I presume.

Oops, I just forgot to give this patch a proper commit message. I'll add
one for the next post.

Joerg




Re: [PATCH 09/10] KVM: SVM: Make lazy FPU switching work with nested svm

2010-02-18 Thread Joerg Roedel
On Thu, Feb 18, 2010 at 04:32:02PM +0200, Avi Kivity wrote:
> On 02/18/2010 01:38 PM, Joerg Roedel wrote:
> >TDB.
> >
> 
> ...
> 
> >@@ -973,6 +973,7 @@ static void svm_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
> >
> >  static void update_cr0_intercept(struct vcpu_svm *svm)
> >  {
> >+struct vmcb *vmcb = svm->vmcb;
> > ulong gcr0 = svm->vcpu.arch.cr0;
> > u64 *hcr0 =&svm->vmcb->save.cr0;
> >
> >@@ -984,11 +985,25 @@ static void update_cr0_intercept(struct vcpu_svm *svm)
> >
> >
> > if (gcr0 == *hcr0&&  svm->vcpu.fpu_active) {
> >-svm->vmcb->control.intercept_cr_read&= ~INTERCEPT_CR0_MASK;
> >-svm->vmcb->control.intercept_cr_write&= ~INTERCEPT_CR0_MASK;
> >+vmcb->control.intercept_cr_read&= ~INTERCEPT_CR0_MASK;
> >+vmcb->control.intercept_cr_write&= ~INTERCEPT_CR0_MASK;
> >+if (is_nested(svm)) {
> >+struct vmcb *hsave = svm->nested.hsave;
> >+
> >+hsave->control.intercept_cr_read&= ~INTERCEPT_CR0_MASK;
> >+hsave->control.intercept_cr_write&= ~INTERCEPT_CR0_MASK;
> >+vmcb->control.intercept_cr_read  |= svm->nested.intercept_cr_read;
> >+vmcb->control.intercept_cr_write |= svm->nested.intercept_cr_write;
> 
> Why are the last two lines needed?

Because we don't know whether the l1 hypervisor wants to intercept cr0.
In that case we need this intercept to stay enabled.

> >+}
> > } else {
> > svm->vmcb->control.intercept_cr_read |= INTERCEPT_CR0_MASK;
> > svm->vmcb->control.intercept_cr_write |= INTERCEPT_CR0_MASK;
> >+if (is_nested(svm)) {
> >+struct vmcb *hsave = svm->nested.hsave;
> >+
> >+hsave->control.intercept_cr_read |= INTERCEPT_CR0_MASK;
> >+hsave->control.intercept_cr_write |= INTERCEPT_CR0_MASK;
> >+}
> > }
> >  }
> 
> Maybe it's better to call update_cr0_intercept() after a vmexit
> instead, to avoid this repetition, and since the if () may take a
> different branch for the nested guest and guest cr0.

Thinking about it again, I am not sure this is needed at all. At vmexit
emulation we call svm_set_cr0, which itself calls update_cr0_intercept.
I'll try this.

Joerg




Re: [PATCH 03/10] KVM: SVM: Fix schedule-while-atomic on nested exception handling

2010-02-18 Thread Joerg Roedel
On Thu, Feb 18, 2010 at 03:52:20PM +0200, Avi Kivity wrote:
> On 02/18/2010 01:38 PM, Joerg Roedel wrote:
> >Move the actual vmexit routine out of code that runs with
> >irqs and preemption disabled.
> >
> >Cc: sta...@kernel.org
> >Signed-off-by: Joerg Roedel
> >---
> >  arch/x86/kvm/svm.c |   20 +---
> >  1 files changed, 17 insertions(+), 3 deletions(-)
> >
> >diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >index 7c96b8b..25d26ec 100644
> >--- a/arch/x86/kvm/svm.c
> >+++ b/arch/x86/kvm/svm.c
> >@@ -128,6 +128,7 @@ static void svm_flush_tlb(struct kvm_vcpu *vcpu);
> >  static void svm_complete_interrupts(struct vcpu_svm *svm);
> >
> >  static int nested_svm_exit_handled(struct vcpu_svm *svm);
> >+static int nested_svm_exit_handled_atomic(struct vcpu_svm *svm);
> >  static int nested_svm_vmexit(struct vcpu_svm *svm);
> >  static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
> >   bool has_error_code, u32 error_code);
> >@@ -1386,7 +1387,7 @@ static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
> > svm->vmcb->control.exit_info_1 = error_code;
> > svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
> >
> >-return nested_svm_exit_handled(svm);
> >+return nested_svm_exit_handled_atomic(svm);
> >  }
> 
> What do you say to
> 
> 
>if (nested_svm_intercepts(svm))
> svm->nested.exit_required = true;
> 
> here, and recoding nested_svm_exit_handled() to call
> nested_svm_intercepts()?  I think it improves readability a little
> by avoiding a function that changes behaviour according to how it is
> called.

That's a good idea, I will change that. It improves readability.

> Long term, we may want to split out the big switch into the
> individual handlers, to avoid decoding the exit reason twice.

I don't think that's a good idea. The nested exit handling is at the
beginning of svm_handle_exit to hide the nested vcpu state from the kvm
logic, which may not be aware of nesting at all. My rationale is that
the host hypervisor shouldn't see any exits that need to be reinjected
into the l1 hypervisor.

Joerg




Re: [PATCH 01/10] KVM: SVM: Don't use kmap_atomic in nested_svm_map

2010-02-18 Thread Joerg Roedel
On Thu, Feb 18, 2010 at 03:40:56PM +0200, Avi Kivity wrote:
> On 02/18/2010 01:38 PM, Joerg Roedel wrote:
> >Use of kmap_atomic disables preemption but if we run in
> >shadow-shadow mode the vmrun emulation executes kvm_set_cr3
> >which might sleep or fault. So use kmap instead for
> >nested_svm_map.
> >
> >
> >
> >-static void nested_svm_unmap(void *addr, enum km_type idx)
> >+static void nested_svm_unmap(void *addr)
> >  {
> > struct page *page;
> >
> >@@ -1443,7 +1443,7 @@ static void nested_svm_unmap(void *addr, enum km_type idx)
> >
> > page = kmap_atomic_to_page(addr);
> >
> >-kunmap_atomic(addr, idx);
> >+kunmap(addr);
> > kvm_release_page_dirty(page);
> >  }
> 
> kunmap() takes a struct page *, not the virtual address (a
> consistent source of bugs).

Ah true, thanks. I'll fix that.

> kmap() is generally an unloved interface, it is slow and possibly
> deadlock prone, but it's better than sleeping in atomic context.  If
> you can hack your way around it, that is preferred.

Best would be to use kvm_read_guest, but I fear that this will have a
performance impact. Maybe I'll try it and measure whether it really has
a significant performance impact.

Joerg




[PATCH 2/2] qemu-kvm: extboot: Clean up host-guest interface

2010-02-18 Thread Jan Kiszka
Drop the unused boot mode port 0x404 from the host-guest interface of
extboot and remove related code from both sides.

Signed-off-by: Jan Kiszka 
---
 hw/extboot.c|   14 +-
 hw/pc.c |2 +-
 hw/pc.h |2 +-
 pc-bios/optionrom/extboot.S |   23 ---
 4 files changed, 3 insertions(+), 38 deletions(-)

diff --git a/hw/extboot.c b/hw/extboot.c
index b91d54f..8ada21b 100644
--- a/hw/extboot.c
+++ b/hw/extboot.c
@@ -65,12 +65,6 @@ static void get_translated_chs(BlockDriverState *bs, int *c, int *h, int *s)
 }
 }
 
-static uint32_t extboot_read(void *opaque, uint32_t addr)
-{
-int *pcmd = opaque;
-return *pcmd;
-}
-
 static void extboot_write_cmd(void *opaque, uint32_t addr, uint32_t value)
 {
 union extboot_cmd cmd;
@@ -123,13 +117,7 @@ static void extboot_write_cmd(void *opaque, uint32_t addr, uint32_t value)
 qemu_free(buf);
 }
 
-void extboot_init(BlockDriverState *bs, int cmd)
+void extboot_init(BlockDriverState *bs)
 {
-int *pcmd;
-
-pcmd = qemu_mallocz(sizeof(int));
-
-*pcmd = cmd;
-register_ioport_read(0x404, 1, 1, extboot_read, pcmd);
 register_ioport_write(0x405, 1, 2, extboot_write_cmd, bs);
 }
diff --git a/hw/pc.c b/hw/pc.c
index 97e16ce..8175874 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1057,7 +1057,7 @@ static void pc_init1(ram_addr_t ram_size,
bdrv_set_geometry_hint(info->bdrv, cyls, heads, secs);
}
 
-   extboot_init(info->bdrv, 1);
+   extboot_init(info->bdrv);
 }
 
 #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
diff --git a/hw/pc.h b/hw/pc.h
index b00f311..e9da683 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -165,7 +165,7 @@ void isa_ne2000_init(int base, int irq, NICInfo *nd);
 
 /* extboot.c */
 
-void extboot_init(BlockDriverState *bs, int cmd);
+void extboot_init(BlockDriverState *bs);
 
 int cpu_is_bsp(CPUState *env);
 
diff --git a/pc-bios/optionrom/extboot.S b/pc-bios/optionrom/extboot.S
index 1eeb172..db6c2b6 100644
--- a/pc-bios/optionrom/extboot.S
+++ b/pc-bios/optionrom/extboot.S
@@ -62,15 +62,6 @@ int19_handler:
xor %ax, %ax
mov %ax, %ds
 
-   movw $0x404, %dx
-   inb %dx, %al
-   cmp $1, %al
-   je 1f
-   cmp $2, %al
-   je 2f
-   jmp 3f
-
-1: /* hook int13: intb(0x404) == 1 */
/* save old int 13 to int 2c */
mov (0x13*4), %eax
mov %eax, (OLD_INT13)
@@ -78,21 +69,7 @@ int19_handler:
/* install our int 13 handler */
movw $int13_handler, (0x13*4)
mov %cs, (0x13*4+2)
-   jmp 3f
 
-2: /* linux boot: intb(0x404) == 2 */
-   cli
-   cld
-   mov $0x9000, %ax
-   mov %ax, %ds
-   mov %ax, %es
-   mov %ax, %fs
-   mov %ax, %gs
-   mov %ax, %ss
-   mov $0x8ffe, %sp
-   ljmp $0x9000 + 0x20, $0
-
-3: /* fall through: inb(0x404) == 0 */
/* restore previous int $0x19 handler */
mov (OLD_INT19),%eax
mov %eax,(0x19*4)


[PATCH 1/2] qemu-kvm: extboot: Keep variables in RAM

2010-02-18 Thread Jan Kiszka
Instead of saving the old INT 0x13 and 0x19 handlers in ROM which fails
under QEMU as it enforces protection, keep them in spare vectors of the
interrupt table, namely INT 0x80 and 0x81.

Signed-off-by: Jan Kiszka 
---

Don't forget to update extboot.bin after merging both patches.

 pc-bios/optionrom/extboot.S |   41 ++---
 1 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/pc-bios/optionrom/extboot.S b/pc-bios/optionrom/extboot.S
index 1e60f68..1eeb172 100644
--- a/pc-bios/optionrom/extboot.S
+++ b/pc-bios/optionrom/extboot.S
@@ -19,6 +19,9 @@
  *   Authors: Anthony Liguori 
  */
 
+#define OLD_INT19  (0x80 * 4)  /* re-use INT 0x80 BASIC vector */
+#define OLD_INT13  (0x81 * 4)  /* re-use INT 0x81 BASIC vector */
+
 .code16
 .text
.global _start
@@ -37,7 +40,7 @@ _start:
 
/* save old int 19 */
mov (0x19*4), %eax
-   mov %eax, %cs:old_int19
+   mov %eax, (OLD_INT19)
 
/* install out int 19 handler */
movw $int19_handler, (0x19*4)
@@ -48,6 +51,7 @@ _start:
lret
 
 int19_handler:
+   push %eax /* reserve space for lret */
push %eax
push %bx
push %cx
@@ -69,7 +73,7 @@ int19_handler:
 1: /* hook int13: intb(0x404) == 1 */
/* save old int 13 to int 2c */
mov (0x13*4), %eax
-   mov %eax, %cs:old_int13
+   mov %eax, (OLD_INT13)
 
/* install our int 13 handler */
movw $int13_handler, (0x13*4)
@@ -90,15 +94,21 @@ int19_handler:
 
 3: /* fall through: inb(0x404) == 0 */
/* restore previous int $0x19 handler */
-   mov %cs:old_int19,%eax
+   mov (OLD_INT19),%eax
mov %eax,(0x19*4)
-   
+
+   /* write old handler as return address onto stack */
+   push %bp
+   mov %sp, %bp
+   mov %eax, 14(%bp)
+   pop %bp
+
pop %ds
pop %dx
pop %cx
pop %bx
pop %eax
-   ljmpw *%cs:old_int19
+   lret
 
 #define FLAGS_CF   0x01
 
@@ -626,7 +636,21 @@ terminate_disk_emulation:
 int13_handler:
cmp $0x80, %dl
je 1f
-   ljmpw *%cs:old_int13
+
+   /* write old handler as return address onto stack */
+   push %eax
+   push %eax
+   push %ds
+   push %bp
+   mov %sp, %bp
+   xor %ax, %ax
+   mov %ax, %ds
+   mov (OLD_INT13), %eax
+   mov %eax, 8(%bp)
+   pop %bp
+   pop %ds
+   pop %eax
+   lret
 1:
cmp $0x0, %ah
jne 1f
@@ -686,10 +710,5 @@ int13_handler:
int $0x18  /* boot failed */
iret
 
-/* Variables */
-.align 4, 0
-old_int13: .long 0
-old_int19: .long 0
-   
 .align 512, 0
 _end:


Re: [PATCH 09/10] KVM: SVM: Make lazy FPU switching work with nested svm

2010-02-18 Thread Avi Kivity

On 02/18/2010 04:51 PM, Alexander Graf wrote:
> On 18.02.2010, at 12:38, Joerg Roedel wrote:
>
>> TDB.
>
> TDB? That's not a patch description.

Short for "To De Befined", I presume.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/10] Nested SVM fixes (and Win7-64bit bringup)

2010-02-18 Thread Avi Kivity

On 02/18/2010 04:48 PM, Alexander Graf wrote:
> On 18.02.2010, at 15:33, Avi Kivity wrote:
>
>> On 02/18/2010 01:38 PM, Joerg Roedel wrote:
>>> Hi,
>>>
>>> here is a couple of fixes for the nested SVM implementation. I collected these
>>> fixes mostly when trying to get Windows 7 64bit running as an L2 guest. Most
>>> important fixes in this set make lazy fpu switching working with nested SVM and
>>> the nested tpr handling fixes. Without the latter fix the l1 guest freezes when
>>> trying to run win7 as l2 guest. Please review and comment on these patches :-)
>>
>> Overall looks good.  Would appreciate Alex looking over these as well.
>
> The kmap thing is broken though, right?

Oh yes, but that's a search and replace, not something needing deep rework.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 09/10] KVM: SVM: Make lazy FPU switching work with nested svm

2010-02-18 Thread Alexander Graf

On 18.02.2010, at 12:38, Joerg Roedel wrote:

> TDB.

TDB? That's not a patch description.


Alex


Re: [PATCH 0/10] Nested SVM fixes (and Win7-64bit bringup)

2010-02-18 Thread Alexander Graf

On 18.02.2010, at 15:33, Avi Kivity wrote:

> On 02/18/2010 01:38 PM, Joerg Roedel wrote:
>> Hi,
>> 
>> here are a couple of fixes for the nested SVM implementation. I collected these
>> fixes mostly when trying to get Windows 7 64bit running as an L2 guest. The most
>> important fixes in this set make lazy fpu switching work with nested SVM and
>> fix the nested tpr handling. Without the latter fix the l1 guest freezes when
>> trying to run win7 as l2 guest. Please review and comment on these patches :-)
>>   
> 
> Overall looks good.  Would appreciate Alex looking over these as well.

The kmap thing is broken though, right?

Alex


Re: [patch uq/master 2/4] qemu: kvm specific wait_io_event

2010-02-18 Thread Avi Kivity

On 02/18/2010 03:58 PM, Marcelo Tosatti wrote:

On Thu, Feb 18, 2010 at 10:29:35AM +0200, Avi Kivity wrote:
   

+static void qemu_kvm_wait_io_event(CPUState *env)
+{
+    while (!cpu_has_work(env))
+        qemu_cond_timedwait(env->halt_cond, &qemu_global_mutex, 1000);
+
+    qemu_wait_io_event_common(env);
  }
   

Shouldn't kvm specific code be in kvm-all.c?
 

The context is in vl.c, so don't see much gain.
   


ok.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/10] Nested SVM fixes (and Win7-64bit bringup)

2010-02-18 Thread Avi Kivity

On 02/18/2010 01:38 PM, Joerg Roedel wrote:

Hi,

here are a couple of fixes for the nested SVM implementation. I collected these
fixes mostly when trying to get Windows 7 64bit running as an L2 guest. The most
important fixes in this set make lazy fpu switching work with nested SVM and
fix the nested tpr handling. Without the latter fix the l1 guest freezes when
trying to run win7 as l2 guest. Please review and comment on these patches :-)
   


Overall looks good.  Would appreciate Alex looking over these as well.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 09/10] KVM: SVM: Make lazy FPU switching work with nested svm

2010-02-18 Thread Avi Kivity

On 02/18/2010 01:38 PM, Joerg Roedel wrote:

TDB.

   


...


@@ -973,6 +973,7 @@ static void svm_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)

  static void update_cr0_intercept(struct vcpu_svm *svm)
  {
+	struct vmcb *vmcb = svm->vmcb;
	ulong gcr0 = svm->vcpu.arch.cr0;
	u64 *hcr0 = &svm->vmcb->save.cr0;

@@ -984,11 +985,25 @@ static void update_cr0_intercept(struct vcpu_svm *svm)


	if (gcr0 == *hcr0 && svm->vcpu.fpu_active) {
-		svm->vmcb->control.intercept_cr_read &= ~INTERCEPT_CR0_MASK;
-		svm->vmcb->control.intercept_cr_write &= ~INTERCEPT_CR0_MASK;
+		vmcb->control.intercept_cr_read &= ~INTERCEPT_CR0_MASK;
+		vmcb->control.intercept_cr_write &= ~INTERCEPT_CR0_MASK;
+		if (is_nested(svm)) {
+			struct vmcb *hsave = svm->nested.hsave;
+
+			hsave->control.intercept_cr_read &= ~INTERCEPT_CR0_MASK;
+			hsave->control.intercept_cr_write &= ~INTERCEPT_CR0_MASK;
+			vmcb->control.intercept_cr_read  |= svm->nested.intercept_cr_read;
+			vmcb->control.intercept_cr_write |= svm->nested.intercept_cr_write;
   


Why are the last two lines needed?


+   }
} else {
svm->vmcb->control.intercept_cr_read |= INTERCEPT_CR0_MASK;
svm->vmcb->control.intercept_cr_write |= INTERCEPT_CR0_MASK;
+   if (is_nested(svm)) {
+   struct vmcb *hsave = svm->nested.hsave;
+
+   hsave->control.intercept_cr_read |= INTERCEPT_CR0_MASK;
+   hsave->control.intercept_cr_write |= INTERCEPT_CR0_MASK;
+   }
}
  }
   


Maybe it's better to call update_cr0_intercept() after a vmexit instead, 
to avoid this repetition, and since the if () may take a different 
branch for the nested guest and guest cr0.



--
error compiling committee.c: too many arguments to function



Re: extboot: Purpose of cmd value

2010-02-18 Thread Anthony Liguori

On 02/18/2010 08:01 AM, Jan Kiszka wrote:

Hi Anthony,

I have to fix extboot for non-KVM usage which means pushing its
variables to writable RAM (probably some spare INT vector(s) in the
BASIC area).

In that process, I stumbled over the cmd value passed to extboot_init,
then stored in qemu_malloc'ed memory, and finally reported to the guest
side via port 0x404. It's constant so far, always 1. So there is quite a
bit of unused code and data on both sides. Can we drop that?
   


Yes, that's historic from the days when extboot also did Linux kernel 
loading.  I don't see a problem dropping.


Regards,

Anthony Liguori


Jan

   




extboot: Purpose of cmd value

2010-02-18 Thread Jan Kiszka
Hi Anthony,

I have to fix extboot for non-KVM usage which means pushing its
variables to writable RAM (probably some spare INT vector(s) in the
BASIC area).

In that process, I stumbled over the cmd value passed to extboot_init,
then stored in qemu_malloc'ed memory, and finally reported to the guest
side via port 0x404. It's constant so far, always 1. So there is quite a
bit of unused code and data on both sides. Can we drop that?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [patch uq/master 2/4] qemu: kvm specific wait_io_event

2010-02-18 Thread Marcelo Tosatti
On Thu, Feb 18, 2010 at 10:29:35AM +0200, Avi Kivity wrote:
> >+static void qemu_kvm_wait_io_event(CPUState *env)
> >+{
> >+    while (!cpu_has_work(env))
> >+        qemu_cond_timedwait(env->halt_cond, &qemu_global_mutex, 1000);
> >+
> >+    qemu_wait_io_event_common(env);
> >  }
> 
> Shouldn't kvm specific code be in kvm-all.c?

The context is in vl.c, so don't see much gain.

> >
> >  static int qemu_cpu_exec(CPUState *env);
> >@@ -3448,7 +3462,7 @@ static void *kvm_cpu_thread_fn(void *arg
> >  while (1) {
> >  if (cpu_can_run(env))
> >  qemu_cpu_exec(env);
> >-qemu_wait_io_event(env);
> >+qemu_kvm_wait_io_event(env);
> >  }
> >
> >  return NULL;
> 
> Well, kvm_cpu_thread_fn() apparently isn't.




Re: [PATCH 03/10] KVM: SVM: Fix schedule-while-atomic on nested exception handling

2010-02-18 Thread Avi Kivity

On 02/18/2010 01:38 PM, Joerg Roedel wrote:

Move the actual vmexit routine out of code that runs with
irqs and preemption disabled.

Cc: sta...@kernel.org
Signed-off-by: Joerg Roedel
---
  arch/x86/kvm/svm.c |   20 +---
  1 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7c96b8b..25d26ec 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -128,6 +128,7 @@ static void svm_flush_tlb(struct kvm_vcpu *vcpu);
  static void svm_complete_interrupts(struct vcpu_svm *svm);

  static int nested_svm_exit_handled(struct vcpu_svm *svm);
+static int nested_svm_exit_handled_atomic(struct vcpu_svm *svm);
  static int nested_svm_vmexit(struct vcpu_svm *svm);
  static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
  bool has_error_code, u32 error_code);
@@ -1386,7 +1387,7 @@ static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
svm->vmcb->control.exit_info_1 = error_code;
svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;

-   return nested_svm_exit_handled(svm);
+   return nested_svm_exit_handled_atomic(svm);
  }
   


What do you say to


   if (nested_svm_intercepts(svm))
svm->nested.exit_required = true;

here, and recoding nested_svm_exit_handled() to call 
nested_svm_intercepts()?  I think it improves readability a little by 
avoiding a function that changes behaviour according to how it is called.


Long term, we may want to split out the big switch into the individual 
handlers, to avoid decoding the exit reason twice.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 01/10] KVM: SVM: Don't use kmap_atomic in nested_svm_map

2010-02-18 Thread Avi Kivity

On 02/18/2010 01:38 PM, Joerg Roedel wrote:

Use of kmap_atomic disables preemption but if we run in
shadow-shadow mode the vmrun emulation executes kvm_set_cr3
which might sleep or fault. So use kmap instead for
nested_svm_map.



-static void nested_svm_unmap(void *addr, enum km_type idx)
+static void nested_svm_unmap(void *addr)
  {
struct page *page;

@@ -1443,7 +1443,7 @@ static void nested_svm_unmap(void *addr, enum km_type idx)

page = kmap_atomic_to_page(addr);

-   kunmap_atomic(addr, idx);
+   kunmap(addr);
kvm_release_page_dirty(page);
  }
   


kunmap() takes a struct page *, not the virtual address (a consistent 
source of bugs).


kmap() is generally an unloved interface, it is slow and possibly 
deadlock prone, but it's better than sleeping in atomic context.  If you 
can hack your way around it, that is preferred.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] Remove all references to KVM_CR3_CACHE

2010-02-18 Thread Avi Kivity

On 02/18/2010 11:57 AM, jes.soren...@redhat.com wrote:

This patch removes all references to KVM_CR3_CACHE as suggested by
Marcello.
   


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: qemu-kvm: do not allow vcpu stop with in progress PIO

2010-02-18 Thread Avi Kivity

On 02/09/2010 10:58 PM, Marcelo Tosatti wrote:

You're right... this should be enough to avoid a stop with incomplete
PIO (and this is what happens for MMIO already). The signal will not
be dequeued, so KVM will complete_pio and exit before entering with
-EAGAIN. Please review and queue for stable.

qemu upstream needs a bit more work.

---

Re-enter the kernel to complete in progress PIO. Otherwise the
operation can be lost during migration.

   


Thanks - applied to master and 0.12.

--
error compiling committee.c: too many arguments to function



Re: qemu-kvm-0.12.2 hangs when booting grub, when kvm is disabled

2010-02-18 Thread Jan Kiszka
Gleb Natapov wrote:
> On Thu, Feb 18, 2010 at 12:32:39PM +0100, Jan Kiszka wrote:
>> Jacques Landru wrote:
>>> Hi,
>>>
>>> Same problem here
>>>
>>> qemu-kvm-0.12.x hangs if I have at the same time "-no-kvm" and 
>>> "file=essai-slitaz.raw,if=ide,index=0,boot=on" sometimes with the
>>> message  below
>>>
>>> but
>>>
>>> qemu-kvm with "-no-kvm" and without "boot=on" option for the file 
>>> parameter, works
>>> qemu-kvm with "boot=on" option and kvm enable works. (kvm-kmod is 2.6.32.7)
>> I have to confirm this issue: Something badly crashes here as well,
>> either grub or the extboot ROM or Seabios.
>>
>> Does anyone have a good idea what makes the difference here, i.e. where to
>> start debugging?
>>
> May be TCG interprets something incorrectly in extboot.bin. Sounds
> unlikely, but symptoms look like this is the case.

Looks like the old story again: extboot tries to write to ROM (old_int13
and old_int19 variables). That happens to work due to KVM limitations,
but breaks once true protection is established. Don't we have some heap
managed by Seabios that extension ROMs can use?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


[PATCH 04/10] KVM: SVM: Sync all control registers on nested vmexit

2010-02-18 Thread Joerg Roedel
Currently the vmexit emulation does not sync control
registers where the access is typically intercepted by the
nested hypervisor. But we cannot count on those intercepts
to happen, so sync these registers too and make the code
architecturally more correct.

Cc: sta...@kernel.org
Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/svm.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 25d26ec..9a4f9ee 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1644,9 +1644,13 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
	nested_vmcb->save.ds     = vmcb->save.ds;
	nested_vmcb->save.gdtr   = vmcb->save.gdtr;
	nested_vmcb->save.idtr   = vmcb->save.idtr;
+	nested_vmcb->save.cr0    = kvm_read_cr0(&svm->vcpu);
	if (npt_enabled)
		nested_vmcb->save.cr3    = vmcb->save.cr3;
+	else
+		nested_vmcb->save.cr3    = svm->vcpu.arch.cr3;
	nested_vmcb->save.cr2    = vmcb->save.cr2;
+	nested_vmcb->save.cr4    = svm->vcpu.arch.cr4;
	nested_vmcb->save.rflags = vmcb->save.rflags;
	nested_vmcb->save.rip    = vmcb->save.rip;
	nested_vmcb->save.rsp    = vmcb->save.rsp;
-- 
1.6.6




Re: qemu-kvm-0.12.2 hangs when booting grub, when kvm is disabled

2010-02-18 Thread Jan Kiszka
Gleb Natapov wrote:
> On Thu, Feb 18, 2010 at 12:32:39PM +0100, Jan Kiszka wrote:
>> Jacques Landru wrote:
>>> Hi,
>>>
>>> Same problem here
>>>
>>> qemu-kvm-0.12.x hangs if I have at the same time "-no-kvm" and 
>>> "file=essai-slitaz.raw,if=ide,index=0,boot=on" sometimes with the
>>> message  below
>>>
>>> but
>>>
>>> qemu-kvm with "-no-kvm" and without "boot=on" option for the file 
>>> parameter, works
>>> qemu-kvm with "boot=on" option and kvm enable works. (kvm-kmod is 2.6.32.7)
>> I have to confirm this issue: Something badly crashes here as well,
>> either grub or the extboot ROM or Seabios.
>>
>> Does anyone have a good idea what makes the difference here, i.e. where to
>> start debugging?
>>
> May be TCG interprets something incorrectly in extboot.bin. Sounds
> unlikely, but symptoms look like this is the case.

Yes, maybe. I'm currently building a debug version that has support for
TCG block tracing enabled. Should get us closer to the crash.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: qemu-kvm-0.12.2 hangs when booting grub, when kvm is disabled

2010-02-18 Thread Gleb Natapov
On Thu, Feb 18, 2010 at 12:32:39PM +0100, Jan Kiszka wrote:
> Jacques Landru wrote:
> > Hi,
> > 
> > Same problem here
> > 
> > qemu-kvm-0.12.x hangs if I have at the same time "-no-kvm" and 
> > "file=essai-slitaz.raw,if=ide,index=0,boot=on" sometimes with the
> > message  below
> > 
> > but
> > 
> > qemu-kvm with "-no-kvm" and without "boot=on" option for the file 
> > parameter, works
> > qemu-kvm with "boot=on" option and kvm enable works. (kvm-kmod is 2.6.32.7)
> 
> I have to confirm this issue: Something badly crashes here as well,
> either grub or the extboot ROM or Seabios.
> 
> Does anyone have a good idea what makes the difference here, i.e. where to
> start debugging?
> 
May be TCG interprets something incorrectly in extboot.bin. Sounds
unlikely, but symptoms look like this is the case.

--
Gleb.


[PATCH 05/10] KVM: SVM: Annotate nested_svm_map with might_sleep()

2010-02-18 Thread Joerg Roedel
The nested_svm_map() function can sleep and must not be
called from atomic context. So annotate that function.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/svm.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9a4f9ee..3f59cbd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1423,6 +1423,8 @@ static void *nested_svm_map(struct vcpu_svm *svm, u64 gpa)
 {
struct page *page;
 
+   might_sleep();
+
page = gfn_to_page(svm->vcpu.kvm, gpa >> PAGE_SHIFT);
if (is_error_page(page))
goto error;
-- 
1.6.6




[PATCH 01/10] KVM: SVM: Don't use kmap_atomic in nested_svm_map

2010-02-18 Thread Joerg Roedel
Use of kmap_atomic disables preemption but if we run in
shadow-shadow mode the vmrun emulation executes kvm_set_cr3
which might sleep or fault. So use kmap instead for
nested_svm_map.

Cc: sta...@kernel.org
Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/svm.c |   32 
 1 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 52f78dd..041ef6f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1417,7 +1417,7 @@ static inline int nested_svm_intr(struct vcpu_svm *svm)
return 0;
 }
 
-static void *nested_svm_map(struct vcpu_svm *svm, u64 gpa, enum km_type idx)
+static void *nested_svm_map(struct vcpu_svm *svm, u64 gpa)
 {
struct page *page;
 
@@ -1425,7 +1425,7 @@ static void *nested_svm_map(struct vcpu_svm *svm, u64 gpa, enum km_type idx)
if (is_error_page(page))
goto error;
 
-   return kmap_atomic(page, idx);
+   return kmap(page);
 
 error:
kvm_release_page_clean(page);
@@ -1434,7 +1434,7 @@ error:
return NULL;
 }
 
-static void nested_svm_unmap(void *addr, enum km_type idx)
+static void nested_svm_unmap(void *addr)
 {
struct page *page;
 
@@ -1443,7 +1443,7 @@ static void nested_svm_unmap(void *addr, enum km_type idx)
 
page = kmap_atomic_to_page(addr);
 
-   kunmap_atomic(addr, idx);
+   kunmap(addr);
kvm_release_page_dirty(page);
 }
 
@@ -1458,7 +1458,7 @@ static bool nested_svm_exit_handled_msr(struct vcpu_svm *svm)
if (!(svm->nested.intercept & (1ULL << INTERCEPT_MSR_PROT)))
return false;
 
-   msrpm = nested_svm_map(svm, svm->nested.vmcb_msrpm, KM_USER0);
+   msrpm = nested_svm_map(svm, svm->nested.vmcb_msrpm);
 
if (!msrpm)
goto out;
@@ -1486,7 +1486,7 @@ static bool nested_svm_exit_handled_msr(struct vcpu_svm *svm)
ret = msrpm[t1] & ((1 << param) << t0);
 
 out:
-   nested_svm_unmap(msrpm, KM_USER0);
+   nested_svm_unmap(msrpm);
 
return ret;
 }
@@ -1616,7 +1616,7 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
   vmcb->control.exit_int_info,
   vmcb->control.exit_int_info_err);
 
-   nested_vmcb = nested_svm_map(svm, svm->nested.vmcb, KM_USER0);
+   nested_vmcb = nested_svm_map(svm, svm->nested.vmcb);
if (!nested_vmcb)
return 1;
 
@@ -1706,7 +1706,7 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
/* Exit nested SVM mode */
svm->nested.vmcb = 0;
 
-   nested_svm_unmap(nested_vmcb, KM_USER0);
+   nested_svm_unmap(nested_vmcb);
 
kvm_mmu_reset_context(&svm->vcpu);
kvm_mmu_load(&svm->vcpu);
@@ -1719,7 +1719,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
u32 *nested_msrpm;
int i;
 
-   nested_msrpm = nested_svm_map(svm, svm->nested.vmcb_msrpm, KM_USER0);
+   nested_msrpm = nested_svm_map(svm, svm->nested.vmcb_msrpm);
if (!nested_msrpm)
return false;
 
@@ -1728,7 +1728,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
 
svm->vmcb->control.msrpm_base_pa = __pa(svm->nested.msrpm);
 
-   nested_svm_unmap(nested_msrpm, KM_USER0);
+   nested_svm_unmap(nested_msrpm);
 
return true;
 }
@@ -1739,7 +1739,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
struct vmcb *hsave = svm->nested.hsave;
struct vmcb *vmcb = svm->vmcb;
 
-   nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax, KM_USER0);
+   nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax);
if (!nested_vmcb)
return false;
 
@@ -1851,7 +1851,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
svm->vmcb->control.event_inj = nested_vmcb->control.event_inj;
svm->vmcb->control.event_inj_err = nested_vmcb->control.event_inj_err;
 
-   nested_svm_unmap(nested_vmcb, KM_USER0);
+   nested_svm_unmap(nested_vmcb);
 
enable_gif(svm);
 
@@ -1884,12 +1884,12 @@ static int vmload_interception(struct vcpu_svm *svm)
svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
skip_emulated_instruction(&svm->vcpu);
 
-   nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax, KM_USER0);
+   nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax);
if (!nested_vmcb)
return 1;
 
nested_svm_vmloadsave(nested_vmcb, svm->vmcb);
-   nested_svm_unmap(nested_vmcb, KM_USER0);
+   nested_svm_unmap(nested_vmcb);
 
return 1;
 }
@@ -1904,12 +1904,12 @@ static int vmsave_interception(struct vcpu_svm *svm)
svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
skip_emulated_instruction(&svm->vcpu);
 
-   nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax, KM_USER0);
+   nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax);
if (!nested_vmcb)
return 1;
 
nested_svm_vmloadsav

[PATCH 09/10] KVM: SVM: Make lazy FPU switching work with nested svm

2010-02-18 Thread Joerg Roedel
TDB.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/svm.c |   43 +++
 1 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a64b871..ad419aa 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -973,6 +973,7 @@ static void svm_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
 
 static void update_cr0_intercept(struct vcpu_svm *svm)
 {
+   struct vmcb *vmcb = svm->vmcb;
ulong gcr0 = svm->vcpu.arch.cr0;
u64 *hcr0 = &svm->vmcb->save.cr0;
 
@@ -984,11 +985,25 @@ static void update_cr0_intercept(struct vcpu_svm *svm)
 
 
if (gcr0 == *hcr0 && svm->vcpu.fpu_active) {
-   svm->vmcb->control.intercept_cr_read &= ~INTERCEPT_CR0_MASK;
-   svm->vmcb->control.intercept_cr_write &= ~INTERCEPT_CR0_MASK;
+   vmcb->control.intercept_cr_read &= ~INTERCEPT_CR0_MASK;
+   vmcb->control.intercept_cr_write &= ~INTERCEPT_CR0_MASK;
+   if (is_nested(svm)) {
+   struct vmcb *hsave = svm->nested.hsave;
+
+			hsave->control.intercept_cr_read  &= ~INTERCEPT_CR0_MASK;
+			hsave->control.intercept_cr_write &= ~INTERCEPT_CR0_MASK;
+			vmcb->control.intercept_cr_read  |= svm->nested.intercept_cr_read;
+			vmcb->control.intercept_cr_write |= svm->nested.intercept_cr_write;
+   }
} else {
svm->vmcb->control.intercept_cr_read |= INTERCEPT_CR0_MASK;
svm->vmcb->control.intercept_cr_write |= INTERCEPT_CR0_MASK;
+   if (is_nested(svm)) {
+   struct vmcb *hsave = svm->nested.hsave;
+
+   hsave->control.intercept_cr_read |= INTERCEPT_CR0_MASK;
+   hsave->control.intercept_cr_write |= INTERCEPT_CR0_MASK;
+   }
}
 }
 
@@ -1263,7 +1278,22 @@ static int ud_interception(struct vcpu_svm *svm)
 static void svm_fpu_activate(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
-   svm->vmcb->control.intercept_exceptions &= ~(1 << NM_VECTOR);
+   u32 excp;
+
+   if (is_nested(svm)) {
+   u32 h_excp, n_excp;
+
+   h_excp  = svm->nested.hsave->control.intercept_exceptions;
+   n_excp  = svm->nested.intercept_exceptions;
+   h_excp &= ~(1 << NM_VECTOR);
+   excp= h_excp | n_excp;
+   } else {
+   excp  = svm->vmcb->control.intercept_exceptions;
+   excp &= ~(1 << NM_VECTOR);
+   }
+
+   svm->vmcb->control.intercept_exceptions = excp;
+
svm->vcpu.fpu_active = 1;
update_cr0_intercept(svm);
 }
@@ -1507,6 +1537,9 @@ static int nested_svm_exit_special(struct vcpu_svm *svm)
if (!npt_enabled)
return NESTED_EXIT_HOST;
break;
+   case SVM_EXIT_EXCP_BASE + NM_VECTOR:
+   nm_interception(svm);
+   break;
default:
break;
}
@@ -2972,8 +3005,10 @@ static void svm_fpu_deactivate(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
-   update_cr0_intercept(svm);
svm->vmcb->control.intercept_exceptions |= 1 << NM_VECTOR;
+	if (is_nested(svm))
+		svm->nested.hsave->control.intercept_exceptions |= 1 << NM_VECTOR;
+   update_cr0_intercept(svm);
 }
 
 static struct kvm_x86_ops svm_x86_ops = {
-- 
1.6.6




[PATCH 02/10] KVM: SVM: Fix wrong interrupt injection in enable_irq_windows

2010-02-18 Thread Joerg Roedel
The nested_svm_intr() function does not execute the vmexit
anymore. Therefore we may still be in the nested state after
that function ran. This patch changes the nested_svm_intr()
function to return whether the irq window could be enabled.

Cc: sta...@kernel.org
Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/svm.c |   17 -
 1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 041ef6f..7c96b8b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1389,16 +1389,17 @@ static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
return nested_svm_exit_handled(svm);
 }
 
-static inline int nested_svm_intr(struct vcpu_svm *svm)
+/* This function returns true if it is safe to enable the irq window */
+static inline bool nested_svm_intr(struct vcpu_svm *svm)
 {
if (!is_nested(svm))
-   return 0;
+   return true;
 
if (!(svm->vcpu.arch.hflags & HF_VINTR_MASK))
-   return 0;
+   return true;
 
if (!(svm->vcpu.arch.hflags & HF_HIF_MASK))
-   return 0;
+   return false;
 
svm->vmcb->control.exit_code = SVM_EXIT_INTR;
 
@@ -1411,10 +1412,10 @@ static inline int nested_svm_intr(struct vcpu_svm *svm)
 */
svm->nested.exit_required = true;
trace_kvm_nested_intr_vmexit(svm->vmcb->save.rip);
-   return 1;
+   return false;
}
 
-   return 0;
+   return true;
 }
 
 static void *nested_svm_map(struct vcpu_svm *svm, u64 gpa)
@@ -2562,13 +2563,11 @@ static void enable_irq_window(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
-   nested_svm_intr(svm);
-
/* In case GIF=0 we can't rely on the CPU to tell us when
 * GIF becomes 1, because that's a separate STGI/VMRUN intercept.
 * The next time we get that intercept, this function will be
 * called again though and we'll get the vintr intercept. */
-   if (gif_set(svm)) {
+   if (gif_set(svm) && nested_svm_intr(svm)) {
svm_set_vintr(svm);
svm_inject_irq(svm, 0x0);
}
-- 
1.6.6




[PATCH 08/10] KVM: SVM: Activate nested state only when guest state is complete

2010-02-18 Thread Joerg Roedel
Certain functions called during the emulated world switch
behave differently when the vcpu is running nested. This is
not the expected behavior during a world switch emulation.
This patch ensures that the nested state is activated only
if the vcpu is completely in nested state.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/svm.c |   15 +--
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2a3d525..a64b871 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1631,6 +1631,9 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
if (!nested_vmcb)
return 1;
 
+   /* Exit nested SVM mode */
+   svm->nested.vmcb = 0;
+
/* Give the current vmcb to the guest */
disable_gif(svm);
 
@@ -1718,9 +1721,6 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
svm->vmcb->save.cpl = 0;
svm->vmcb->control.exit_int_info = 0;
 
-   /* Exit nested SVM mode */
-   svm->nested.vmcb = 0;
-
nested_svm_unmap(nested_vmcb);
 
kvm_mmu_reset_context(&svm->vcpu);
@@ -1753,14 +1753,14 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
struct vmcb *nested_vmcb;
struct vmcb *hsave = svm->nested.hsave;
struct vmcb *vmcb = svm->vmcb;
+   u64 vmcb_gpa;
+
+   vmcb_gpa = svm->vmcb->save.rax;
 
nested_vmcb = nested_svm_map(svm, svm->vmcb->save.rax);
if (!nested_vmcb)
return false;
 
-   /* nested_vmcb is our indicator if nested SVM is activated */
-   svm->nested.vmcb = svm->vmcb->save.rax;
-
trace_kvm_nested_vmrun(svm->vmcb->save.rip - 3, svm->nested.vmcb,
   nested_vmcb->save.rip,
   nested_vmcb->control.int_ctl,
@@ -1875,6 +1875,9 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 
nested_svm_unmap(nested_vmcb);
 
+   /* nested_vmcb is our indicator if nested SVM is activated */
+   svm->nested.vmcb = vmcb_gpa;
+
enable_gif(svm);
 
return true;
-- 
1.6.6




[PATCH 06/10] KVM: SVM: Fix nested msr intercept handling

2010-02-18 Thread Joerg Roedel
The nested_svm_exit_handled_msr() function maps only one
page of the guests msr permission bitmap. This patch changes
the code to use kvm_read_guest to fix the bug.

Cc: sta...@kernel.org
Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/svm.c |   12 +++-
 1 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 3f59cbd..cbf798f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1457,16 +1457,11 @@ static bool nested_svm_exit_handled_msr(struct vcpu_svm *svm)
u32 msr = svm->vcpu.arch.regs[VCPU_REGS_RCX];
bool ret = false;
u32 t0, t1;
-   u8 *msrpm;
+   u8 val;
 
if (!(svm->nested.intercept & (1ULL << INTERCEPT_MSR_PROT)))
return false;
 
-   msrpm = nested_svm_map(svm, svm->nested.vmcb_msrpm);
-
-   if (!msrpm)
-   goto out;
-
switch (msr) {
case 0 ... 0x1fff:
t0 = (msr * 2) % 8;
@@ -1487,11 +1482,10 @@ static bool nested_svm_exit_handled_msr(struct vcpu_svm *svm)
goto out;
}
 
-   ret = msrpm[t1] & ((1 << param) << t0);
+   if (!kvm_read_guest(svm->vcpu.kvm, svm->nested.vmcb_msrpm + t1, &val, 1))
+   ret = val & ((1 << param) << t0);
 
 out:
-   nested_svm_unmap(msrpm);
-
return ret;
 }
 
-- 
1.6.6
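The byte/bit arithmetic in this patch can be modelled outside the kernel. The sketch below is illustrative Python, not kernel code; the `t1 = msr // 4` byte offset for the 0x0-0x1fff range is an assumption based on the SVM MSR permission map packing two intercept bits (read/write) per MSR, since the hunk above truncates before that line:

```python
def msrpm_offset(msr):
    """Compute (byte_offset, bit_shift) into the SVM MSR permission
    bitmap for MSRs in the 0x0-0x1fff range (2 bits per MSR).
    Models the t0/t1 arithmetic shown in the patch."""
    if not 0 <= msr <= 0x1fff:
        raise ValueError("only the first MSR range is modelled here")
    t0 = (msr * 2) % 8   # bit position within the byte (as in the patch)
    t1 = msr // 4        # assumed byte offset: 4 MSRs (8 bits) per byte
    return t1, t0

def msr_intercepted(bitmap, msr, write):
    """Test the read (param=0) or write (param=1) intercept bit,
    mirroring `val & ((1 << param) << t0)` from the patch."""
    t1, t0 = msrpm_offset(msr)
    param = 1 if write else 0
    return bool(bitmap[t1] & ((1 << param) << t0))
```

The point of the fix is that `t1` can land on any page of the bitmap, so reading the single byte through kvm_read_guest avoids mapping only the first page.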




[PATCH 10/10] KVM: SVM: Remove newlines from nested trace points

2010-02-18 Thread Joerg Roedel
The tracing infrastructure adds its own newlines. Remove
them from the trace point printk format strings.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/trace.h |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 6ad30a2..12f8d2d 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -413,7 +413,7 @@ TRACE_EVENT(kvm_nested_vmrun,
),
 
TP_printk("rip: 0x%016llx vmcb: 0x%016llx nrip: 0x%016llx int_ctl: 0x%08x "
- "event_inj: 0x%08x npt: %s\n",
+ "event_inj: 0x%08x npt: %s",
__entry->rip, __entry->vmcb, __entry->nested_rip,
__entry->int_ctl, __entry->event_inj,
__entry->npt ? "on" : "off")
@@ -447,7 +447,7 @@ TRACE_EVENT(kvm_nested_vmexit,
__entry->exit_int_info_err  = exit_int_info_err;
),
TP_printk("rip: 0x%016llx reason: %s ext_inf1: 0x%016llx "
- "ext_inf2: 0x%016llx ext_int: 0x%08x ext_int_err: 0x%08x\n",
+ "ext_inf2: 0x%016llx ext_int: 0x%08x ext_int_err: 0x%08x",
  __entry->rip,
  ftrace_print_symbols_seq(p, __entry->exit_code,
   kvm_x86_ops->exit_reasons_str),
@@ -482,7 +482,7 @@ TRACE_EVENT(kvm_nested_vmexit_inject,
),
 
TP_printk("reason: %s ext_inf1: 0x%016llx "
- "ext_inf2: 0x%016llx ext_int: 0x%08x ext_int_err: 0x%08x\n",
+ "ext_inf2: 0x%016llx ext_int: 0x%08x ext_int_err: 0x%08x",
  ftrace_print_symbols_seq(p, __entry->exit_code,
   kvm_x86_ops->exit_reasons_str),
__entry->exit_info1, __entry->exit_info2,
@@ -504,7 +504,7 @@ TRACE_EVENT(kvm_nested_intr_vmexit,
__entry->rip=   rip
),
 
-   TP_printk("rip: 0x%016llx\n", __entry->rip)
+   TP_printk("rip: 0x%016llx", __entry->rip)
 );
 
 /*
@@ -526,7 +526,7 @@ TRACE_EVENT(kvm_invlpga,
__entry->address=   address;
),
 
-   TP_printk("rip: 0x%016llx asid: %d address: 0x%016llx\n",
+   TP_printk("rip: 0x%016llx asid: %d address: 0x%016llx",
  __entry->rip, __entry->asid, __entry->address)
 );
 
@@ -547,7 +547,7 @@ TRACE_EVENT(kvm_skinit,
__entry->slb=   slb;
),
 
-   TP_printk("rip: 0x%016llx slb: 0x%08x\n",
+   TP_printk("rip: 0x%016llx slb: 0x%08x",
  __entry->rip, __entry->slb)
 );
 
-- 
1.6.6




[PATCH 07/10] KVM: SVM: Don't sync nested cr8 to lapic and back

2010-02-18 Thread Joerg Roedel
This patch makes syncing of the guest tpr to the lapic
conditional on !nested. Otherwise a nested guest using the
TPR could freeze the guest.
Another important change this patch introduces is that the
cr8 intercept bits are no longer ORed at vmrun emulation if
the guest sets VINTR_MASKING in its VMCB. The reason is that
nested cr8 accesses always need to be handled by the nested
hypervisor because they change the shadow version of the
tpr.

Cc: sta...@kernel.org
Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/svm.c |   46 +++---
 1 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index cbf798f..2a3d525 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1828,21 +1828,6 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
svm->vmcb->save.dr6 = nested_vmcb->save.dr6;
svm->vmcb->save.cpl = nested_vmcb->save.cpl;
 
-   /* We don't want a nested guest to be more powerful than the guest,
-  so all intercepts are ORed */
-   svm->vmcb->control.intercept_cr_read |=
-   nested_vmcb->control.intercept_cr_read;
-   svm->vmcb->control.intercept_cr_write |=
-   nested_vmcb->control.intercept_cr_write;
-   svm->vmcb->control.intercept_dr_read |=
-   nested_vmcb->control.intercept_dr_read;
-   svm->vmcb->control.intercept_dr_write |=
-   nested_vmcb->control.intercept_dr_write;
-   svm->vmcb->control.intercept_exceptions |=
-   nested_vmcb->control.intercept_exceptions;
-
-   svm->vmcb->control.intercept |= nested_vmcb->control.intercept;
-
svm->nested.vmcb_msrpm = nested_vmcb->control.msrpm_base_pa;
 
/* cache intercepts */
@@ -1860,6 +1845,28 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
else
svm->vcpu.arch.hflags &= ~HF_VINTR_MASK;
 
+   if (svm->vcpu.arch.hflags & HF_VINTR_MASK) {
+   /* We only want the cr8 intercept bits of the guest */
+   svm->vmcb->control.intercept_cr_read &= ~INTERCEPT_CR8_MASK;
+   svm->vmcb->control.intercept_cr_write &= ~INTERCEPT_CR8_MASK;
+   }
+
+   /* We don't want a nested guest to be more powerful than the guest,
+  so all intercepts are ORed */
+   svm->vmcb->control.intercept_cr_read |=
+   nested_vmcb->control.intercept_cr_read;
+   svm->vmcb->control.intercept_cr_write |=
+   nested_vmcb->control.intercept_cr_write;
+   svm->vmcb->control.intercept_dr_read |=
+   nested_vmcb->control.intercept_dr_read;
+   svm->vmcb->control.intercept_dr_write |=
+   nested_vmcb->control.intercept_dr_write;
+   svm->vmcb->control.intercept_exceptions |=
+   nested_vmcb->control.intercept_exceptions;
+
+   svm->vmcb->control.intercept |= nested_vmcb->control.intercept;
+
+   svm->vmcb->control.lbr_ctl = nested_vmcb->control.lbr_ctl;
svm->vmcb->control.int_vector = nested_vmcb->control.int_vector;
svm->vmcb->control.int_state = nested_vmcb->control.int_state;
svm->vmcb->control.tsc_offset += nested_vmcb->control.tsc_offset;
@@ -2520,6 +2527,9 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
+   if (is_nested(svm) && (vcpu->arch.hflags & HF_VINTR_MASK))
+   return;
+
if (irr == -1)
return;
 
@@ -2621,6 +2631,9 @@ static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
+   if (is_nested(svm) && (vcpu->arch.hflags & HF_VINTR_MASK))
+   return;
+
if (!(svm->vmcb->control.intercept_cr_write & INTERCEPT_CR8_MASK)) {
int cr8 = svm->vmcb->control.int_ctl & V_TPR_MASK;
kvm_set_cr8(vcpu, cr8);
@@ -2632,6 +2645,9 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
u64 cr8;
 
+   if (is_nested(svm) && (vcpu->arch.hflags & HF_VINTR_MASK))
+   return;
+
cr8 = kvm_get_cr8(vcpu);
svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
-- 
1.6.6




[PATCH 03/10] KVM: SVM: Fix schedule-while-atomic on nested exception handling

2010-02-18 Thread Joerg Roedel
Move the actual vmexit routine out of code that runs with
irqs and preemption disabled.

Cc: sta...@kernel.org
Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/svm.c |   20 +---
 1 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7c96b8b..25d26ec 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -128,6 +128,7 @@ static void svm_flush_tlb(struct kvm_vcpu *vcpu);
 static void svm_complete_interrupts(struct vcpu_svm *svm);
 
 static int nested_svm_exit_handled(struct vcpu_svm *svm);
+static int nested_svm_exit_handled_atomic(struct vcpu_svm *svm);
 static int nested_svm_vmexit(struct vcpu_svm *svm);
 static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
  bool has_error_code, u32 error_code);
@@ -1386,7 +1387,7 @@ static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
svm->vmcb->control.exit_info_1 = error_code;
svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
 
-   return nested_svm_exit_handled(svm);
+   return nested_svm_exit_handled_atomic(svm);
 }
 
 /* This function returns true if it is save to enable the irq window */
@@ -1520,7 +1521,7 @@ static int nested_svm_exit_special(struct vcpu_svm *svm)
 /*
  * If this function returns true, this #vmexit was already handled
  */
-static int nested_svm_exit_handled(struct vcpu_svm *svm)
+static int __nested_svm_exit_handled(struct vcpu_svm *svm, bool atomic)
 {
u32 exit_code = svm->vmcb->control.exit_code;
int vmexit = NESTED_EXIT_HOST;
@@ -1567,12 +1568,25 @@ static int nested_svm_exit_handled(struct vcpu_svm *svm)
}
 
if (vmexit == NESTED_EXIT_DONE) {
-   nested_svm_vmexit(svm);
+   if (!atomic)
+   nested_svm_vmexit(svm);
+   else
+   svm->nested.exit_required = true;
}
 
return vmexit;
 }
 
+static int nested_svm_exit_handled(struct vcpu_svm *svm)
+{
+   return __nested_svm_exit_handled(svm, false);
+}
+
+static int nested_svm_exit_handled_atomic(struct vcpu_svm *svm)
+{
+   return __nested_svm_exit_handled(svm, true);
+}
+
 static inline void copy_vmcb_control_area(struct vmcb *dst_vmcb, struct vmcb *from_vmcb)
 {
struct vmcb_control_area *dst  = &dst_vmcb->control;
-- 
1.6.6
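The fix above follows a common kernel pattern: from atomic context, only record that work is pending, and perform the sleepable part later from a context that may block. A toy Python model of that pattern (hypothetical names, not kernel code):

```python
class NestedState:
    """Toy model of deferring a blocking operation out of an atomic
    section, mirroring the svm->nested.exit_required flag above."""

    def __init__(self):
        self.exit_required = False
        self.vmexits = 0

    def do_vmexit(self):
        # In the kernel this path may sleep (it maps guest pages),
        # so it must never run with irqs/preemption disabled.
        self.vmexits += 1

    def exit_handled(self, atomic):
        if atomic:
            # Atomic context: only record that a #vmexit is due.
            self.exit_required = True
        else:
            self.do_vmexit()

    def run(self):
        # Sleepable entry point: perform any deferred #vmexit first.
        if self.exit_required:
            self.exit_required = False
            self.do_vmexit()
```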




[PATCH 0/10] Nested SVM fixes (and Win7-64bit bringup)

2010-02-18 Thread Joerg Roedel
Hi,

here are a couple of fixes for the nested SVM implementation. I collected these
fixes mostly while trying to get Windows 7 64bit running as an L2 guest. The most
important fixes in this set make lazy FPU switching work with nested SVM and fix
the nested TPR handling. Without the latter fix the L1 guest freezes when
trying to run Win7 as an L2 guest. Please review and comment on these patches :-)

Joerg

Diffstat:

 arch/x86/kvm/svm.c   |  187 ++
 arch/x86/kvm/trace.h |   12 ++--
 2 files changed, 133 insertions(+), 66 deletions(-)

Shortlog:

Joerg Roedel (10):
  KVM: SVM: Don't use kmap_atomic in nested_svm_map
  KVM: SVM: Fix wrong interrupt injection in enable_irq_windows
  KVM: SVM: Fix schedule-while-atomic on nested exception handling
  KVM: SVM: Sync all control registers on nested vmexit
  KVM: SVM: Annotate nested_svm_map with might_sleep()
  KVM: SVM: Fix nested msr intercept handling
  KVM: SVM: Don't sync nested cr8 to lapic and back
  KVM: SVM: Activate nested state only when guest state is complete
  KVM: SVM: Make lazy FPU switching work with nested svm
  KVM: SVM: Remove newlines from nested trace points




Re: qemu-kvm-0.12.2 hangs when booting grub, when kvm is disabled

2010-02-18 Thread Jan Kiszka
Jacques Landru wrote:
> Hi,
> 
> Same problem here
> 
> qemu-kvm-0.12.x hangs if I have at the same time "-no-kvm" and 
> "file=essai-slitaz.raw,if=ide,index=0,boot=on", sometimes with the
> message below
> 
> but
> 
> qemu-kvm with "-no-kvm" and without "boot=on" option for the file 
> parameter, works
> qemu-kvm with "boot=on" option and kvm enable works. (kvm-kmod is 2.6.32.7)

I have to confirm this issue: Something badly crashes here as well,
either grub or the extboot ROM or Seabios.

Does anyone have a good idea what makes the difference here, i.e. where to
start debugging?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH] [RFC] KVM test: Control files automatic generation to save memory

2010-02-18 Thread Michael Goldish

- "Lucas Meneghel Rodrigues"  wrote:

> On Sun, 2010-02-14 at 12:07 -0500, Michael Goldish wrote:
> > - "Lucas Meneghel Rodrigues"  wrote:
> > 
> > > As our configuration system generates a list of dicts
> > > with test parameters, and that list might be potentially
> > > *very* large, keeping all this information in memory might
> > > be a problem for smaller virtualization hosts due to
> > > the memory pressure created. Tests made on my 4GB laptop
> > > show that most of the memory is being used during a
> > > typical kvm autotest session.
> > > 
> > > So, instead of keeping all this information in memory,
> > > let's take a different approach and unfold all the
> > > tests generated by the config system and generate a
> > > control file:
> > > 
> > > job.run_test('kvm', params={param1, param2, ...}, tag='foo', ...)
> > > job.run_test('kvm', params={param1, param2, ...}, tag='bar', ...)
> > > 
> > > By dumping all the dicts that were before in the memory to
> > > a control file, the memory usage of a typical kvm autotest
> > > session is drastically reduced making it easier to run in smaller
> > > virt hosts.
> > > 
> > > The advantages of taking this new approach are:
> > >  * You can see what tests are going to run and the dependencies
> > >between them by looking at the generated control file
> > >  * The control file is all ready to use, you can for example
> > >paste it on the web interface and profit
> > >  * As mentioned, a lot less memory consumption, avoiding
> > >memory pressure on virtualization hosts.
> > > 
> > > This is a crude 1st pass at implementing this approach, so please
> > > provide comments.
> > > 
> > > Signed-off-by: Lucas Meneghel Rodrigues 
> > > ---
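A minimal sketch of the control-file generation idea described in the quoted patch — unfolding a list of parameter dicts into flat job.run_test() calls so the dicts need not stay in memory. This is illustrative Python under stated assumptions; the function and parameter names are hypothetical, not the actual kvm_config API:

```python
def create_control(test_dicts, out_path):
    """Write a flat autotest control file, one job.run_test() call per
    parameter dict.  Simplified sketch of the approach discussed in the
    thread; a 'shortname' key is assumed to exist for tagging."""
    with open(out_path, "w") as control:
        for params in test_dicts:
            tag = params.get("shortname", "kvm")
            control.write("job.run_test('kvm', params=%r, tag=%r)\n"
                          % (params, tag))
```

Because each dict is written out and discarded as soon as it is generated, peak memory stays bounded by a single test's parameters rather than the whole unfolded set.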
> > 
> > Interesting idea!
> > 
> > - Personally I don't like the renaming of kvm_config.py to
> > generate_control.py, and prefer to keep them separate, so that
> > generate_control.py has the create_control() function and
> > kvm_config.py has everything else.  It's just a matter of naming;
> > kvm_config.py deals mostly with config files, not with control
> files,
> > and it can be used for other purposes than generating control
> files.
> 
> Fair enough, no problem.
> 
> > - I wonder why so much memory is used by the test list.  Our daily
> > test sets aren't very big, so although the parser should use a huge
> > amount of memory while parsing, nearly all of that memory should be
> > freed by the time the parser is done, because the final 'only'
> > statement reduces the number of tests to a small fraction of the
> total
> > number in a full set.  What test set did you try with that 4 GB
> > machine, and how much memory was used by the test list?  If a
> > ridiculous amount of memory was used, this might indicate a bug in
> > kvm_config.py (maybe it keeps references to deleted tests, forcing
> > them to stay in memory).
> 
> This problem wasn't found during the daily test routine, rather it was
> a
> comment I heard from Naphtali about the typical autotest memory
> usage.
> Also Marcelo made a similar comment, so I thought it was a problem
> worth
> looking. I tried to run the default test set that we selected for
> upstream (3 resulting dicts) on my 4GB RAM laptop, here are my
> findings:
> 
>  * Before autotest usage: Around 20% of memory used, 10% used as
> cache.
>  * During autotest usage: About 99% of memory used, 27% used as
> cache.

Before autotest usage, were there any VMs running?

3 dicts can't possibly take up so much space.  If it is indeed kvm_config's
fault (which I doubt), there's probably a bug in it that prevents it from
freeing unused memory, and once we fix that bug the problem should be gone.

> So yes, there's a significant memory usage increase, that doesn't
> happen
> using a "flat", autogenerated control file. Sure it doesn't make my
> laptop crawl, but it is a *lot* of resource usage anyway.
> 
> Also, let's assume that for small test sets, we can can reclaim all
> memory back. Still we have to consider large test sets. I am all for
> profiling the memory usage and fix eventual bugs, but we need to keep
> in
> mind that one might want to run large test sets, and large test sets
> imply keeping a fairly large amount of data in memory. If the amount
> of
> memory is negligible on most use cases, then let's just fix bugs and
> forget about using the proposed approach.
> 
> Also, a "flat" control file is quicker to run, because there's no
> parsing of the config file happening in there. So, this control file

Agreed, but on the other hand, the static control file idea introduces
an extra preprocessing step (not necessarily bad).

> generation thing makes some sense, that's why I decided to code this
> 1st pass attempt at doing it.
> 
> > - I don't think this approach will work for control.parallel,
> because
> > the tests have to be assigned dynamically to available queues, and
> > AFAIK this can't be done by a simple static control file.
> 
> Not necessarily, as the control file is a

Re: [PATCH 0/4] More emulator correctness fixes

2010-02-18 Thread Takuya Yoshikawa

Gleb Natapov wrote:

This patch series adds proper permission checking during segment
selector loading. Some missing fault injections are added.

Gleb Natapov (2):
  KVM: Forbid modifying CS segment register by mov instruction.
  KVM: Fix segment descriptor loading.

Takuya Yoshikawa (2):
  KVM: Fix load_guest_segment_descriptor() to inject page fault
  KVM: Fix emulate_sys[call, enter, exit]()'s fault handling


Thank you for rebasing my work and making the fix more comprehensive.



 arch/x86/include/asm/kvm_host.h |3 +-
 arch/x86/kvm/emulate.c  |   71 +++
 arch/x86/kvm/x86.c  |  190 +++
 3 files changed, 186 insertions(+), 78 deletions(-)



Re: Poor performance with KVM, how to figure out why?

2010-02-18 Thread Beinicke, Thomas
On Thursday 18 February 2010 11:31:36 you wrote:
> Hi, sorry about the lengthy e-mail.

Hi,

are you sure the kvm-intel kernel module is loaded?

What is the output of  "lsmod" ?

Any useful kernel messages on the host or the VMs? What's the output of 
"dmesg"?

Cheers,

Thomas

> We've been evaluating KVM for a while. Currently the host is on
> 2.6.30-bpo.1-amd64, 4 CPU cores on an Intel Xeon 2,33. Disk controller is
> Areca ARC-1210 and the machine has 12GB of memory. KVM 85+dfsg-4~bpo50+1
> and libvirt 0.6.5-3~bpo50+1, both from backports.org. Guests are in qcow2
> images.
> 
> A few test servers have been running here for a while. That worked ok,
> so we've moved a few production servers on it as well. It's now running
> 8 guests, none of them are CPU or disk intensive (well, there are a mail
> server and web server there, which from time to time spike, but it's
> generally very low).
> 
> After a reboot the other day, performance is suddenly disaster. The only
> change we can see we've done is that we've allocated a bit more memory
> to the guests, and enabled 4 vcpus on all guests (some of them ran with
> 1 vcpu before). When I say performance is bad, it's to the point where
> typing on the keyboard is lagging. It seems load on one guest affects all
> of the others.
> 
> What is weird is that before the reboot, the host machine usually had a
> system load of about 0.30 on average, and a CPU load of 20-30% (total of
> all cores). After the reboot, this is a typical top output:
> 
> top - 11:17:49 up 1 day,  5:32,  3 users,  load average: 3.81, 3.85, 3.96
> Tasks: 113 total,   4 running, 109 sleeping,   0 stopped,   0 zombie
> Cpu0  : 93.7%us,  3.6%sy,  0.0%ni,  0.7%id,  1.7%wa,  0.3%hi,  0.0%si,  0.0%st
> Cpu1  : 96.3%us,  3.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu2  : 93.7%us,  5.0%sy,  0.0%ni,  1.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu3  : 91.4%us,  5.6%sy,  0.0%ni,  2.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Mem:  12335492k total, 12257056k used,    78436k free,       24k buffers
> Swap:  7807580k total,   744584k used,  7062996k free,  4927212k cached
> 
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>  3398 root  20   0 2287m 1.9g  680 S  149 16.5 603:10.92 kvm
>  5041 root  20   0 2255m 890m  540 R   99  7.4 603:38.07 kvm
>  5055 root  20   0 2272m 980m  668 S   86  8.1 305:42.95 kvm
>  5095 root  20   0 2287m 1.9g  532 R   33 16.6 655:11.53 kvm
>  5073 root  20   0 2253m 435m  532 S   19  3.6 371:59.80 kvm
>  3334 root  20   0 2254m  66m  532 S6  0.5 106:58.20 kvm
> 
> Now this is the weird part: The guests are (really!) doing nothing.
> Before this started, each guest's load were typically 0.02 - 0.30. Now
> their load is suddenly 2.x and in top, even simple CPU processes like
> syslogd uses 20% CPU.
> 
> It _might_ seem like an i/o problem, because disk performance seems bad
> on all guests. find / would usually fly by; now you'd see a bit lag'ish
> output (I know, bad performance test).
> 
> The host machine seems fast&fine, except it has a system load of about
> 2-6. It seems snappy, though. Here you have some info like iostat etc:
> 
> # iostat -kdxx1
> 
> Linux 2.6.30-bpo.1-amd64 (cf01)  02/18/2010  _x86_64_
> 
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
> sda        1.73   18.91  29.89  18.40  855.89  454.17     54.26      0.48    9.98   2.23  10.75
> sda1       0.35   16.17  29.58  18.24  849.11  442.58     54.03      0.47    9.79   2.20  10.52
> sda2       1.38    2.74   0.31   0.16    6.78   11.59     77.44      0.01   29.14  13.60   0.64
> 
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
> sda        6.00    0.00   1.00   0.00    4.00    0.00      8.00      0.31    0.00 308.00  30.80
> sda1       0.00    0.00   0.00   0.00    0.00    0.00      0.00      0.00    0.00   0.00   0.00
> sda2       6.00    0.00   1.00   0.00    4.00    0.00      8.00      0.31    0.00 308.00  30.80
> 
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
> sda        0.00    0.00   1.00   0.00   28.00    0.00     56.00      0.63  936.00 628.00  62.80
> sda1       0.00    0.00   0.00   0.00    0.00    0.00      0.00      0.00    0.00   0.00   0.00
> sda2       0.00    0.00   1.00   0.00   28.00    0.00     56.00      0.63  936.00 628.00  62.80
> 
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
> sda        0.00    0.00   4.00   0.00  228.00    0.00    114.00      0.04    9.00   5.00   2.00
> sda1       0.00    0.00   4.00   0.00  228.00    0.00    114.00      0.04    9.00   5.00   2.00
> sda2       0.00    0.00   0.00   0.00    0.00    0.00      0.00

Re: Poor performance with KVM, how to figure out why?

2010-02-18 Thread Avi Kivity

On 02/18/2010 12:31 PM, Vegard Svanberg wrote:

Hi, sorry about the lengthy e-mail.

We've been evaluating KVM for a while. Currently the host is on
2.6.30-bpo.1-amd64, 4 CPU cores on an Intel Xeon 2,33. Disk controller is Areca
ARC-1210 and the machine has 12GB of memory. KVM 85+dfsg-4~bpo50+1 and libvirt
0.6.5-3~bpo50+1, both from backports.org. Guests are in qcow2 images.
   


These are all really old.


A few test servers have been running here for a while. That worked ok,
so we've moved a few production servers on it as well. It's now running
8 guests, none of them are CPU or disk intensive (well, there are a mail server
and web server there, which from time to time spike, but it's generally very
low).

After a reboot the other day, performance is suddenly disaster. The only
change we can see we've done is that we've allocated a bit more memory
to the guests, and enabled 4 vcpus on all guests (some of them ran with
1 vcpu before). When I say performance is bad, it's to the point where
typing on the keyboard is lagging. It seems load on one guest affects all of
the others.

What is weird is that before the reboot, the host machine usually had a
system load of about 0.30 on average, and a CPU load of 20-30% (total of
all cores). After the reboot, this is a typical top output:

top - 11:17:49 up 1 day,  5:32,  3 users,  load average: 3.81, 3.85, 3.96
Tasks: 113 total,   4 running, 109 sleeping,   0 stopped,   0 zombie
Cpu0  : 93.7%us,  3.6%sy,  0.0%ni,  0.7%id,  1.7%wa,  0.3%hi,  0.0%si,  0.0%st
Cpu1  : 96.3%us,  3.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 93.7%us,  5.0%sy,  0.0%ni,  1.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu3  : 91.4%us,  5.6%sy,  0.0%ni,  2.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  12335492k total, 12257056k used,    78436k free,       24k buffers
Swap:  7807580k total,   744584k used,  7062996k free,  4927212k cached
   


Looks like you're running into swap.  Does 'vmstat 1' show swap activity?

Try dropping the vcpu count and memory back to initial levels, 
separately, to see which triggers the bad behaviour.



   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
  3398 root  20   0 2287m 1.9g  680 S  149 16.5 603:10.92 kvm
  5041 root  20   0 2255m 890m  540 R   99  7.4 603:38.07 kvm
  5055 root  20   0 2272m 980m  668 S   86  8.1 305:42.95 kvm
  5095 root  20   0 2287m 1.9g  532 R   33 16.6 655:11.53 kvm
  5073 root  20   0 2253m 435m  532 S   19  3.6 371:59.80 kvm
  3334 root  20   0 2254m  66m  532 S6  0.5 106:58.20 kvm
   


None of the RSS figures are nice round numbers, which might mean some of 
guest memory is swapped out.


(note: you can run kvm as non-root).


Now this is the weird part: The guests are (really!) doing nothing.
Before this started, each guest's load were typically 0.02 - 0.30. Now
their load is suddenly 2.x and in top, even simple CPU processes like
syslogd uses 20% CPU.
   


That may also be an indication of swap.  When the guest accesses a 
swapped out page, the time to swap in the page is accounted to the 
instruction that caused the access.



# vmstat
procs ---memory-- ---swap-- -io -system-- cpu
  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
  4  0 747292  98584 24 491236823   207   1108   14 35  5 58  2
   


'vmstat' rate data is from machine boot up.  'vmstat 1' gives current rates.


kvm statistics

  efer_reload   2493   0
  exits   7012998022   86420
  fpu_reload   1079562451121
  halt_exits   839269827   10930
  halt_wakeup   795288051364
  host_state_reload   1159155068   15293
  hypercalls  1471039754   17008
  insn_emulation  2782749902   35121
  insn_emulation_fail  0   0
  invlpg   1721196871754
  io_exits 1294826882084
  irq_exits4555154344884
  irq_injections   973172925   12423
  irq_window41631517 635
  largepages   0   0
  mmio_exits  941756   0
  mmu_cache_miss74512394 849
  mmu_flooded5132926  41
  mmu_pde_zapped40341877 356
  mmu_pte_updated 1150029759   13443
  mmu_pte_write   2184765599   27182
  mmu_recycled 52261   0
  mmu_shadow_zapped 74494953 766
  mmu_unsync 390  23
  mmu_unsync_global0   0
  nmi_injections   0   0
  nmi_window   0   0
  pf_fixed 4700579395144
  pf_guest 4638018765900
  remote_tlb_flush 1287650571024
  request_irq  0   0
  request_nmi  0   0
  signal_exits 0   0
  tlb_flush   1528830191   17996

   


This is reasonable for a 4-cpu host under moderate load.


--
error compiling committee.c: too many arguments to function

Re: [PATCH 0/4] More emulator correctness fixes

2010-02-18 Thread Avi Kivity

On 02/18/2010 12:14 PM, Gleb Natapov wrote:

This patch series adds proper permission checking during segment
selector loading. Some missing fault injections are added.
   


Thanks, applied.

--
error compiling committee.c: too many arguments to function



Poor performance with KVM, how to figure out why?

2010-02-18 Thread Vegard Svanberg
Hi, sorry about the lengthy e-mail.

We've been evaluating KVM for a while. Currently the host is on
2.6.30-bpo.1-amd64, 4 CPU cores on an Intel Xeon 2,33. Disk controller is Areca
ARC-1210 and the machine has 12GB of memory. KVM 85+dfsg-4~bpo50+1 and libvirt
0.6.5-3~bpo50+1, both from backports.org. Guests are in qcow2 images.

A few test servers have been running here for a while. That worked ok,
so we've moved a few production servers on it as well. It's now running
8 guests, none of them are CPU or disk intensive (well, there are a mail server
and web server there, which from time to time spike, but it's generally very
low).

After a reboot the other day, performance is suddenly disaster. The only
change we can see we've done is that we've allocated a bit more memory
to the guests, and enabled 4 vcpus on all guests (some of them ran with
1 vcpu before). When I say performance is bad, it's to the point where
typing on the keyboard is lagging. It seems load on one guest affects all of
the others.

What is weird is that before the reboot, the host machine usually had a
system load of about 0.30 on average, and a CPU load of 20-30% (total of
all cores). After the reboot, this is a typical top output:

top - 11:17:49 up 1 day,  5:32,  3 users,  load average: 3.81, 3.85, 3.96
Tasks: 113 total,   4 running, 109 sleeping,   0 stopped,   0 zombie
Cpu0  : 93.7%us,  3.6%sy,  0.0%ni,  0.7%id,  1.7%wa,  0.3%hi,  0.0%si,  0.0%st
Cpu1  : 96.3%us,  3.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 93.7%us,  5.0%sy,  0.0%ni,  1.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu3  : 91.4%us,  5.6%sy,  0.0%ni,  2.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  12335492k total, 12257056k used,    78436k free,       24k buffers
Swap:  7807580k total,   744584k used,  7062996k free,  4927212k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
 3398 root  20   0 2287m 1.9g  680 S  149 16.5 603:10.92 kvm
 5041 root  20   0 2255m 890m  540 R   99  7.4 603:38.07 kvm
 5055 root  20   0 2272m 980m  668 S   86  8.1 305:42.95 kvm
 5095 root  20   0 2287m 1.9g  532 R   33 16.6 655:11.53 kvm
 5073 root  20   0 2253m 435m  532 S   19  3.6 371:59.80 kvm
 3334 root  20   0 2254m  66m  532 S6  0.5 106:58.20 kvm

Now this is the weird part: The guests are (really!) doing nothing.
Before this started, each guest's load were typically 0.02 - 0.30. Now
their load is suddenly 2.x and in top, even simple CPU processes like
syslogd uses 20% CPU.

It _might_ seem like an i/o problem, because disk performance seems bad
on all guests. find / would usually fly by; now you'd see a bit lag'ish
output (I know, bad performance test). 

The host machine seems fast&fine, except it has a system load of about
2-6. It seems snappy, though. Here you have some info like iostat etc:

# iostat -kdxx1 

Linux 2.6.30-bpo.1-amd64 (cf01)  02/18/2010  _x86_64_

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        1.73   18.91  29.89  18.40  855.89  454.17     54.26      0.48    9.98   2.23  10.75
sda1       0.35   16.17  29.58  18.24  849.11  442.58     54.03      0.47    9.79   2.20  10.52
sda2       1.38    2.74   0.31   0.16    6.78   11.59     77.44      0.01   29.14  13.60   0.64

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        6.00    0.00   1.00   0.00    4.00    0.00      8.00      0.31    0.00 308.00  30.80
sda1       0.00    0.00   0.00   0.00    0.00    0.00      0.00      0.00    0.00   0.00   0.00
sda2       6.00    0.00   1.00   0.00    4.00    0.00      8.00      0.31    0.00 308.00  30.80

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00    0.00   1.00   0.00   28.00    0.00     56.00      0.63  936.00 628.00  62.80
sda1       0.00    0.00   0.00   0.00    0.00    0.00      0.00      0.00    0.00   0.00   0.00
sda2       0.00    0.00   1.00   0.00   28.00    0.00     56.00      0.63  936.00 628.00  62.80

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        0.00    0.00   4.00   0.00  228.00    0.00    114.00      0.04    9.00   5.00   2.00
sda1       0.00    0.00   4.00   0.00  228.00    0.00    114.00      0.04    9.00   5.00   2.00
sda2       0.00    0.00   0.00   0.00    0.00    0.00      0.00      0.00    0.00   0.00   0.00

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda        6.00    0.00   3.00   0.00   36.00    0.00     24.00      0.06   21.33  14.67   4.40
sda1       0.00    0.00   1.00   0.00    4.00    0.00      8.00      0.02   24.00  24.00   2.40
sda2       6.00    0.00   2.00

Re: [Qemu-devel] Re: [PATCH v2] qemu-kvm: Speed up of the dirty-bitmap-traveling

2010-02-18 Thread Alexander Graf

On 18.02.2010, at 06:57, OHMURA Kei wrote:

>>>>>> "We think"? I mean - yes, I think so too. But have you actually measured
>>>>>> it?
>>>>>> How much improvement are we talking here?
>>>>>> Is it still faster when a bswap is involved?
>>>>> Thanks for pointing out.
>>>>> I will post the data for x86 later.
>>>>> However, I don't have a test environment to check the impact of bswap.
>>>>> Would you please measure the run time between the following section if
>>>>> possible?
>>>> It'd make more sense to have a real stand alone test program, no?
>>>> I can try to write one today, but I have some really nasty important bugs
>>>> to fix first.
>>> 
>>> OK.  I will prepare a test code with sample data.  Since I found a ppc 
>>> machine around, I will run the code and post the results of
>>> x86 and ppc.
>>> 
>>> 
>>> By the way, the following data is a result of x86 measured in QEMU/KVM.  
>>> This data shows, how many times the function is called (#called), runtime 
>>> of original function(orig.), runtime of this patch(patch), speedup ratio 
>>> (ratio).
>> That does indeed look promising!
>> Thanks for doing this micro-benchmark. I just want to be 100% sure that it 
>> doesn't affect performance for big endian badly.
> 
> 
> I measured runtime of the test code with sample data.  My test environment 
> and results are described below.
> 
> x86 Test Environment:
> CPU: 4x Intel Xeon Quad Core 2.66GHz
> Mem size: 6GB
> 
> ppc Test Environment:
> CPU: 2x Dual Core PPC970MP
> Mem size: 2GB
> 
> The sample data of dirty bitmap was produced by QEMU/KVM while the guest OS
> was live migrating.  To measure the runtime I copied cpu_get_real_ticks() of
> QEMU to my test program.
> 
> 
> Experimental results:
> Test1: Guest OS read 3GB file, which is bigger than memory.
>         orig.(msec)   patch(msec)   ratio
> x86     0.3           0.1           6.4
> ppc     7.9           2.7           3.0
>
> Test2: Guest OS read/write 3GB file, which is bigger than memory.
>         orig.(msec)   patch(msec)   ratio
> x86     12.0          3.2           3.7
> ppc     251.1         123           2.0
> 
> I also measured the runtime of bswap itself on ppc, and I found it was only 
> just 0.3% ~ 0.7 % of the runtime described above. 

Awesome! Thank you so much for giving actual data to make me feel comfortable 
with it :-).
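For reference, the optimization being measured can be sketched as follows. This is an illustrative Python model only, not the actual patch: the real code does this in C over unsigned longs, with a bswap on big-endian hosts when the producer filled the bitmap little endian.

```python
def collect_dirty_pages(bitmap_words, word_bits=64):
    """Word-at-a-time dirty bitmap scan: zero words are skipped with a
    single comparison, and only non-zero words are decomposed bit by bit."""
    dirty = []
    for i, word in enumerate(bitmap_words):
        while word:
            low = word & -word                      # isolate lowest set bit
            dirty.append(i * word_bits + low.bit_length() - 1)
            word &= word - 1                        # clear that bit
    return dirty
```

Compared with testing every bit of every word, the win grows with the sparseness of the bitmap, which is consistent with the ratios reported above.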


Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] KVM test: Control files automatic generation to save memory

2010-02-18 Thread Lucas Meneghel Rodrigues
On Sun, 2010-02-14 at 12:07 -0500, Michael Goldish wrote:
> - "Lucas Meneghel Rodrigues"  wrote:
> 
> > As our configuration system generates a list of dicts
> > with test parameters, and that list might be potentially
> > *very* large, keeping all this information in memory might
> > be a problem for smaller virtualization hosts due to
> > the memory pressure created. Tests made on my 4GB laptop
> > show that most of the memory is being used during a
> > typical kvm autotest session.
> > 
> > So, instead of keeping all this information in memory,
> > let's take a different approach and unfold all the
> > tests generated by the config system and generate a
> > control file:
> > 
> > job.run_test('kvm', params={param1, param2, ...}, tag='foo', ...)
> > job.run_test('kvm', params={param1, param2, ...}, tag='bar', ...)
> > 
> > By dumping all the dicts that were before in the memory to
> > a control file, the memory usage of a typical kvm autotest
> > session is drastically reduced making it easier to run in smaller
> > virt hosts.
> > 
> > The advantages of taking this new approach are:
> >  * You can see what tests are going to run and the dependencies
> >between them by looking at the generated control file
> >  * The control file is all ready to use, you can for example
> >paste it on the web interface and profit
> >  * As mentioned, a lot less memory consumption, avoiding
> >memory pressure on virtualization hosts.
> > 
> > This is a crude 1st pass at implementing this approach, so please
> > provide comments.
> > 
> > Signed-off-by: Lucas Meneghel Rodrigues 
> > ---
> 
> Interesting idea!
> 
> - Personally I don't like the renaming of kvm_config.py to
> generate_control.py, and prefer to keep them separate, so that
> generate_control.py has the create_control() function and
> kvm_config.py has everything else.  It's just a matter of naming;
> kvm_config.py deals mostly with config files, not with control files,
> and it can be used for other purposes than generating control files.

Fair enough, no problem.

> - I wonder why so much memory is used by the test list.  Our daily
> test sets aren't very big, so although the parser should use a huge
> amount of memory while parsing, nearly all of that memory should be
> freed by the time the parser is done, because the final 'only'
> statement reduces the number of tests to a small fraction of the total
> number in a full set.  What test set did you try with that 4 GB
> machine, and how much memory was used by the test list?  If a
> ridiculous amount of memory was used, this might indicate a bug in
> kvm_config.py (maybe it keeps references to deleted tests, forcing
> them to stay in memory).

This problem wasn't found during the daily test routine; rather, it was a
comment I heard from Naphtali about typical autotest memory usage.
Also, Marcelo made a similar comment, so I thought it was a problem worth
looking into. I tried to run the default test set that we selected for
upstream (3 resulting dicts) on my 4GB RAM laptop, here are my findings:

 * Before autotest usage: Around 20% of memory used, 10% used as cache.
 * During autotest usage: About 99% of memory used, 27% used as cache.

So yes, there's a significant memory usage increase that doesn't happen
when using a "flat", autogenerated control file. Sure, it doesn't make my
laptop crawl, but it is a *lot* of resource usage anyway.

Also, let's assume that for small test sets, we can reclaim all
memory back. Still we have to consider large test sets. I am all for
profiling the memory usage and fix eventual bugs, but we need to keep in
mind that one might want to run large test sets, and large test sets
imply keeping a fairly large amount of data in memory. If the amount of
memory is negligible on most use cases, then let's just fix bugs and
forget about using the proposed approach.

Also, a "flat" control file is quicker to run, because there's no
parsing of the config file happening in there. So, this control file
generation thing makes some sense, that's why I decided to code this 1st
pass attempt at doing it.

> - I don't think this approach will work for control.parallel, because
> the tests have to be assigned dynamically to available queues, and
> AFAIK this can't be done by a simple static control file.

Not necessarily: as the control file is a program, we can just generate
the code using some sort of function that can do the assignment. I don't
fully see all that's needed to get the job done, but in theory it should
be possible.
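The generation step itself is simple. A hypothetical sketch of unfolding parsed parameter dicts into a flat control file (names invented for illustration, not the actual generate_control.py code):

```python
def generate_control(test_dicts, test_name="kvm"):
    """Unfold a list of parameter dicts into a flat control file with one
    job.run_test() call per test, so the parser's full output never has
    to stay resident in the autotest process that runs the tests."""
    lines = []
    for params in test_dicts:
        # 'shortname' as the tag is an assumption for this sketch
        tag = params.get("shortname", "test")
        lines.append("job.run_test(%r, params=%r, tag=%r)"
                     % (test_name, params, tag))
    return "\n".join(lines) + "\n"
```

The resulting file can be pasted straight into the web interface or handed to autotest as-is, which is the point made above.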




Re: [patch] x86: kvm: Convert i8254/i8259 locks to raw_spinlocks

2010-02-18 Thread Avi Kivity

On 02/18/2010 12:05 PM, Jan Kiszka wrote:
> Avi Kivity wrote:
>> On 02/18/2010 11:45 AM, Avi Kivity wrote:
>>> On 02/18/2010 11:40 AM, Jan Kiszka wrote:
>>>>> Meanwhile, if anyone has any idea how to kill this lock, I'd love to
>>>>> see it.
>>>>>
>>>> What concurrency does it resolve in the end? On first glance, it only
>>>> synchronize the fiddling with pre-VCPU request bits, right? What forces
>>>> us to do this? Wouldn't it suffice to disable preemption (thus
>>>> migration) and the let concurrent requests race for setting the bits? I
>>>> mean if some request bit was already set on entry, we don't include the
>>>> related VCPU in smp_call_function_many anyway.
>>> It's more difficult.
>>>
>>> vcpu 0: sets request bit on vcpu 2
>>>    vcpu 1: test_and_set request bit on vcpu 2, returns already set
>>>    vcpu 1: returns
>>> vcpu 0: sends IPI
>>> vcpu 0: returns
>>>
>>> so vcpu 1 returns before the IPI was performed.  If the request was a
>>> tlb flush, for example, vcpu 1 may free a page that is still in vcpu
>>> 2's tlb.
>> One way out would be to have a KVM_REQ_IN_PROGRESS, set it in
>> make_request, clear it in the IPI function.
>>
>> If a second make_request sees it already set, it can simply busy wait
>> until it is cleared, without sending the IPI.  Of course the busy wait
>> means we can't enable preemption (or we may busy wait on an unscheduled
>> task), but at least the requests can proceed in parallel instead of
>> serializing.
> ...or include VCPUs with KVM_REQ_IN_PROGRESS set into the IPI set even
> if they already have the desired request bit set.

But then we're making them take the IPI, which is pointless and
expensive.  My approach piggy backs multiple requesters on one IPI.

> Then we should
> serialize in smp_call_function_many.

Do you mean rely on s_c_f_m's internal synchronization?

--
error compiling committee.c: too many arguments to function



[PATCH 3/4] KVM: Fix segment descriptor loading.

2010-02-18 Thread Gleb Natapov
Add proper error and permission checking. This patch also changes the task
switching code to load segment selectors before segment descriptors, as the
SDM requires; otherwise permission checking during segment descriptor
loading will be incorrect.
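For readers following along, the selector layout that drives these permission checks can be sketched as follows (a hypothetical illustration, not code from the patch):

```python
def decode_selector(sel):
    """Split an x86 segment selector into its fields (Intel SDM vol. 3):
    descriptor table index, table indicator (0 = GDT, 1 = LDT), and the
    requested privilege level.  The 8-byte descriptor then lives at
    table_base + index * 8, which is what the permission checks inspect;
    loading the selector first makes RPL available to those checks."""
    return {"index": sel >> 3, "ti": (sel >> 2) & 1, "rpl": sel & 3}
```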

Signed-off-by: Gleb Natapov 
---
 arch/x86/include/asm/kvm_host.h |3 +-
 arch/x86/kvm/emulate.c  |   30 ++-
 arch/x86/kvm/x86.c  |  177 +++
 3 files changed, 151 insertions(+), 59 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 96b1e6e..d46e791 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -609,8 +609,7 @@ int emulator_set_dr(struct x86_emulate_ctxt *ctxt, int dr,
unsigned long value);
 
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
-int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
-   int type_bits, int seg);
+int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, int seg);
 
 int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason);
 
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 08ac9cf..8d315ab 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1304,7 +1304,7 @@ static int emulate_pop_sreg(struct x86_emulate_ctxt *ctxt,
if (rc != X86EMUL_CONTINUE)
return rc;
 
-   rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)selector, 1, seg);
+   rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)selector, seg);
return rc;
 }
 
@@ -1482,7 +1482,7 @@ static int emulate_ret_far(struct x86_emulate_ctxt *ctxt,
rc = emulate_pop(ctxt, ops, &cs, c->op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
-   rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)cs, 1, VCPU_SREG_CS);
+   rc = kvm_load_segment_descriptor(ctxt->vcpu, (u16)cs, VCPU_SREG_CS);
return rc;
 }
 
@@ -2118,12 +2118,11 @@ special_insn:
break;
case 0x8e: { /* mov seg, r/m16 */
uint16_t sel;
-   int type_bits;
-   int err;
 
sel = c->src.val;
 
-   if (c->modrm_reg == VCPU_SREG_CS) {
+   if (c->modrm_reg == VCPU_SREG_CS ||
+   c->modrm_reg > VCPU_SREG_GS) {
kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
goto done;
}
@@ -2131,18 +2130,7 @@ special_insn:
if (c->modrm_reg == VCPU_SREG_SS)
toggle_interruptibility(ctxt, X86_SHADOW_INT_MOV_SS);
 
-   if (c->modrm_reg <= 5) {
-   type_bits = (c->modrm_reg == 1) ? 9 : 1;
-   err = kvm_load_segment_descriptor(ctxt->vcpu, sel,
- type_bits, 
c->modrm_reg);
-   } else {
-   printk(KERN_INFO "Invalid segreg in modrm byte 
0x%02x\n",
-   c->modrm);
-   goto cannot_emulate;
-   }
-
-   if (err < 0)
-   goto cannot_emulate;
+   rc = kvm_load_segment_descriptor(ctxt->vcpu, sel, c->modrm_reg);
 
c->dst.type = OP_NONE;  /* Disable writeback. */
break;
@@ -2316,11 +2304,9 @@ special_insn:
case 0xe9: /* jmp rel */
goto jmp;
case 0xea: /* jmp far */
-   if (kvm_load_segment_descriptor(ctxt->vcpu, c->src2.val, 9,
-   VCPU_SREG_CS) < 0) {
-   DPRINTF("jmp far: Failed to load CS descriptor\n");
-   goto cannot_emulate;
-   }
+   if (kvm_load_segment_descriptor(ctxt->vcpu, c->src2.val,
+   VCPU_SREG_CS))
+   goto done;
 
c->eip = c->src.val;
break;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0273980..c43d73d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4787,7 +4787,7 @@ static int kvm_load_realmode_segment(struct kvm_vcpu 
*vcpu, u16 selector, int se
.unusable = 0,
};
kvm_x86_ops->set_segment(vcpu, &segvar, seg);
-   return 0;
+   return X86EMUL_CONTINUE;
 }
 
 static int is_vm86_segment(struct kvm_vcpu *vcpu, int seg)
@@ -4797,43 +4797,112 @@ static int is_vm86_segment(struct kvm_vcpu *vcpu, int 
seg)
(kvm_get_rflags(vcpu) & X86_EFLAGS_VM);
 }
 
-static void kvm_check_segment_descriptor(struct kvm_vcpu *vcpu, int seg,
-u16 selector)
-{
-   /* NULL selector is not valid for CS and SS */
-   if (seg == VCPU_SREG_CS || seg == VCPU_SREG_SS)
-   if (!selector)
-   kvm_queue_exception_e(vcpu, TS_VECTOR, s

[PATCH 1/4] KVM: Forbid modifying CS segment register by mov instruction.

2010-02-18 Thread Gleb Natapov
Inject #UD if the guest attempts to do so. This is in accordance with the
Intel SDM.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 9beda8e..08ac9cf 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2122,6 +2122,12 @@ special_insn:
int err;
 
sel = c->src.val;
+
+   if (c->modrm_reg == VCPU_SREG_CS) {
+   kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+   goto done;
+   }
+
if (c->modrm_reg == VCPU_SREG_SS)
toggle_interruptibility(ctxt, X86_SHADOW_INT_MOV_SS);
 
-- 
1.6.5



[PATCH 2/4] KVM: Fix load_guest_segment_descriptor() to inject page fault

2010-02-18 Thread Gleb Natapov
From: Takuya Yoshikawa 

This patch injects a page fault when reading the descriptor in
load_guest_segment_descriptor() fails with a fault.

Effects of this injection: this function is used by
kvm_load_segment_descriptor(), which is necessary for the
following instructions:
 - mov seg,r/m16
 - jmp far
 - pop ?s
This patch makes it possible to emulate the page faults
generated by these instructions. Note that unless we change
kvm_load_segment_descriptor()'s return value propagation,
this patch has no effect.

Signed-off-by: Takuya Yoshikawa 
Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/x86.c |   13 ++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2335f6..0273980 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4713,6 +4713,9 @@ static int load_guest_segment_descriptor(struct kvm_vcpu 
*vcpu, u16 selector,
 {
struct desc_ptr dtable;
u16 index = selector >> 3;
+   int ret;
+   u32 err;
+   gva_t addr;
 
get_segment_descriptor_dtable(vcpu, selector, &dtable);
 
@@ -4720,9 +4723,13 @@ static int load_guest_segment_descriptor(struct kvm_vcpu 
*vcpu, u16 selector,
kvm_queue_exception_e(vcpu, GP_VECTOR, selector & 0xfffc);
return X86EMUL_PROPAGATE_FAULT;
}
-   return kvm_read_guest_virt_system(dtable.address + index*8,
- seg_desc, sizeof(*seg_desc),
- vcpu, NULL);
+   addr = dtable.address + index * 8;
+   ret = kvm_read_guest_virt_system(addr, seg_desc, sizeof(*seg_desc),
+vcpu,  &err);
+   if (ret == X86EMUL_PROPAGATE_FAULT)
+   kvm_inject_page_fault(vcpu, addr, err);
+
+   return ret;
 }
 
 /* allowed just for 8 bytes segments */
-- 
1.6.5



[PATCH 0/4] More emulator correctness fixes

2010-02-18 Thread Gleb Natapov
This patch series adds proper permission checking during segment
selector loading. Some missing fault injections are added.

Gleb Natapov (2):
  KVM: Forbid modifying CS segment register by mov instruction.
  KVM: Fix segment descriptor loading.

Takuya Yoshikawa (2):
  KVM: Fix load_guest_segment_descriptor() to inject page fault
  KVM: Fix emulate_sys[call, enter, exit]()'s fault handling

 arch/x86/include/asm/kvm_host.h |3 +-
 arch/x86/kvm/emulate.c  |   71 +++
 arch/x86/kvm/x86.c  |  190 +++
 3 files changed, 186 insertions(+), 78 deletions(-)



[PATCH 4/4] KVM: Fix emulate_sys[call, enter, exit]()'s fault handling

2010-02-18 Thread Gleb Natapov
From: Takuya Yoshikawa 

This patch fixes emulate_syscall(), emulate_sysenter() and
emulate_sysexit() to handle injected faults properly.

Even though the original code injects faults in these functions,
we cannot handle them unless we use a return value distinct
from the UNHANDLEABLE case. So this patch uses X86EMUL_*
codes instead of -1 and 0, and makes x86_emulate_insn()
handle the propagated faults.

Note that, in x86_emulate_insn(), goto cannot_emulate and
goto done with rc equal to X86EMUL_UNHANDLEABLE have the same effect.

Signed-off-by: Takuya Yoshikawa 
Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/emulate.c |   37 -
 1 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8d315ab..35f7acd 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1590,7 +1590,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 
/* syscall is not available in real mode */
if (ctxt->mode == X86EMUL_MODE_REAL || ctxt->mode == X86EMUL_MODE_VM86)
-   return -1;
+   return X86EMUL_UNHANDLEABLE;
 
setup_syscalls_segments(ctxt, &cs, &ss);
 
@@ -1627,7 +1627,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF);
}
 
-   return 0;
+   return X86EMUL_CONTINUE;
 }
 
 static int
@@ -1640,14 +1640,14 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
/* inject #GP if in real mode */
if (ctxt->mode == X86EMUL_MODE_REAL) {
kvm_inject_gp(ctxt->vcpu, 0);
-   return -1;
+   return X86EMUL_UNHANDLEABLE;
}
 
/* XXX sysenter/sysexit have not been tested in 64bit mode.
* Therefore, we inject an #UD.
*/
if (ctxt->mode == X86EMUL_MODE_PROT64)
-   return -1;
+   return X86EMUL_UNHANDLEABLE;
 
setup_syscalls_segments(ctxt, &cs, &ss);
 
@@ -1656,13 +1656,13 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
case X86EMUL_MODE_PROT32:
if ((msr_data & 0xfffc) == 0x0) {
kvm_inject_gp(ctxt->vcpu, 0);
-   return -1;
+   return X86EMUL_PROPAGATE_FAULT;
}
break;
case X86EMUL_MODE_PROT64:
if (msr_data == 0x0) {
kvm_inject_gp(ctxt->vcpu, 0);
-   return -1;
+   return X86EMUL_PROPAGATE_FAULT;
}
break;
}
@@ -1687,7 +1687,7 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
kvm_x86_ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_ESP, &msr_data);
c->regs[VCPU_REGS_RSP] = msr_data;
 
-   return 0;
+   return X86EMUL_CONTINUE;
 }
 
 static int
@@ -1702,7 +1702,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
if (ctxt->mode == X86EMUL_MODE_REAL ||
ctxt->mode == X86EMUL_MODE_VM86) {
kvm_inject_gp(ctxt->vcpu, 0);
-   return -1;
+   return X86EMUL_UNHANDLEABLE;
}
 
setup_syscalls_segments(ctxt, &cs, &ss);
@@ -1720,7 +1720,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
cs.selector = (u16)(msr_data + 16);
if ((msr_data & 0xfffc) == 0x0) {
kvm_inject_gp(ctxt->vcpu, 0);
-   return -1;
+   return X86EMUL_PROPAGATE_FAULT;
}
ss.selector = (u16)(msr_data + 24);
break;
@@ -1728,7 +1728,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
cs.selector = (u16)(msr_data + 32);
if (msr_data == 0x0) {
kvm_inject_gp(ctxt->vcpu, 0);
-   return -1;
+   return X86EMUL_PROPAGATE_FAULT;
}
ss.selector = cs.selector + 8;
cs.db = 0;
@@ -1744,7 +1744,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
c->eip = ctxt->vcpu->arch.regs[VCPU_REGS_RDX];
c->regs[VCPU_REGS_RSP] = ctxt->vcpu->arch.regs[VCPU_REGS_RCX];
 
-   return 0;
+   return X86EMUL_CONTINUE;
 }
 
 static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
@@ -2472,8 +2472,9 @@ twobyte_insn:
}
break;
case 0x05:  /* syscall */
-   if (emulate_syscall(ctxt) == -1)
-   goto cannot_emulate;
+   rc = emulate_syscall(ctxt);
+   if (rc != X86EMUL_CONTINUE)
+   goto done;
else
goto writeback;
break;
@@ -2541,14 +2542,16 @@ twobyte_insn:
c->dst.type = OP_NONE;
break;
case 0x34:  /* sysenter */
-   if (emulate_sysenter(ctxt) == -1)
-   goto cannot_emulate;
+

Re: [PATCH] KVM test: Modifying finish.exe to support parallel installs

2010-02-18 Thread Lucas Meneghel Rodrigues
On Sun, 2010-02-14 at 11:21 -0500, Michael Goldish wrote:
> - "Lucas Meneghel Rodrigues"  wrote:
> 
> > In order to adapt all the OS unattended installs to parallel
> > installs, finish.exe also had to be adapted to be a server
> > instead of a client. These are the modifications needed.
> > 
> > Once the whole patchset is worked out, an updated version
> > of finish.exe will be shipped on version control.
> > 
> > Signed-off-by: Lucas Meneghel Rodrigues 
> 
> Now that finish.exe is a server it looks like a stripped version
> of rss.exe.  Since we're already running rss.exe at the end of
> each unattended_install, why not just use VM.remote_login() to
> verify that the installation was successful?

If we can guarantee that the sshd daemon is up and running by the end of
Linux guest installs, then that is a perfectly fine solution. Need to
check though.
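For clarity, the protocol the reworked finish.exe implements is tiny: listen, accept one client, send an ACK string. A toy Python equivalent (purely illustrative, not the shipped C code):

```python
import socket

def make_server(host="127.0.0.1"):
    """Bind a TCP listening socket; port 0 picks an ephemeral port
    (the real finish.exe uses a fixed port, 12323)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))
    srv.listen(1)
    return srv

def serve_ack_once(srv, ack=b"done"):
    """Accept a single connection from the unattended-install watcher
    and send back the ACK that signals the install has finished."""
    conn, _addr = srv.accept()          # blocks until the watcher connects
    conn.sendall(ack)
    conn.close()
    srv.close()
```

The role reversal is the whole change: the guest-side program waits to be polled instead of connecting out to the host.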

> >  client/tests/kvm/deps/finish.cpp |  111
> > --
> >  1 files changed, 59 insertions(+), 52 deletions(-)
> > 
> > diff --git a/client/tests/kvm/deps/finish.cpp
> > b/client/tests/kvm/deps/finish.cpp
> > index 9c2867c..e5ba128 100644
> > --- a/client/tests/kvm/deps/finish.cpp
> > +++ b/client/tests/kvm/deps/finish.cpp
> > @@ -1,12 +1,13 @@
> > -// Simple app that only sends an ack string to the KVM unattended
> > install
> > -// watch code.
> > +// Simple application that creates a server socket, listening for
> > connections
> > +// of the unattended install test. Once it gets a client connected,
> > the
> > +// app will send back an ACK string, indicating the install process
> > is done.
> >  //
> >  // You must link this code with Ws2_32.lib, Mswsock.lib, and
> > Advapi32.lib
> >  //
> >  // Author: Lucas Meneghel Rodrigues 
> >  // Code was adapted from an MSDN sample.
> >  
> > -// Usage: finish.exe [Host OS IP]
> > +// Usage: finish.exe
> >  
> >  // MinGW's ws2tcpip.h only defines getaddrinfo and other functions
> > only for
> >  // the case _WIN32_WINNT >= 0x0501.
> > @@ -21,24 +22,18 @@
> >  #include 
> >  #include 
> >  
> > -#define DEFAULT_BUFLEN 512
> >  #define DEFAULT_PORT "12323"
> > -
> >  int main(int argc, char **argv)
> >  {
> >  WSADATA wsaData;
> > -SOCKET ConnectSocket = INVALID_SOCKET;
> > -struct addrinfo *result = NULL,
> > -*ptr = NULL,
> > -hints;
> > +SOCKET ListenSocket = INVALID_SOCKET, ClientSocket =
> > INVALID_SOCKET;
> > +struct addrinfo *result = NULL, hints;
> >  char *sendbuf = "done";
> > -char recvbuf[DEFAULT_BUFLEN];
> > -int iResult;
> > -int recvbuflen = DEFAULT_BUFLEN;
> > +int iResult, iSendResult;
> >  
> >  // Validate the parameters
> > -if (argc != 2) {
> > -printf("usage: %s server-name\n", argv[0]);
> > +if (argc != 1) {
> > +printf("usage: %s", argv[0]);
> >  return 1;
> >  }
> >  
> > @@ -49,72 +44,84 @@ int main(int argc, char **argv)
> >  return 1;
> >  }
> >  
> > -ZeroMemory( &hints, sizeof(hints) );
> > -hints.ai_family = AF_UNSPEC;
> > +ZeroMemory(&hints, sizeof(hints));
> > +hints.ai_family = AF_INET;
> >  hints.ai_socktype = SOCK_STREAM;
> >  hints.ai_protocol = IPPROTO_TCP;
> > +hints.ai_flags = AI_PASSIVE;
> >  
> >  // Resolve the server address and port
> > -iResult = getaddrinfo(argv[1], DEFAULT_PORT, &hints, &result);
> > -if ( iResult != 0 ) {
> > +iResult = getaddrinfo(NULL, DEFAULT_PORT, &hints, &result);
> > +if (iResult != 0) {
> >  printf("getaddrinfo failed: %d\n", iResult);
> >  WSACleanup();
> >  return 1;
> >  }
> >  
> > -// Attempt to connect to an address until one succeeds
> > -for(ptr=result; ptr != NULL ;ptr=ptr->ai_next) {
> > -
> > -// Create a SOCKET for connecting to server
> > -ConnectSocket = socket(ptr->ai_family, ptr->ai_socktype,
> > -ptr->ai_protocol);
> > -if (ConnectSocket == INVALID_SOCKET) {
> > -printf("Error at socket(): %ld\n", WSAGetLastError());
> > -freeaddrinfo(result);
> > -WSACleanup();
> > -return 1;
> > -}
> > -
> > -// Connect to server.
> > -iResult = connect( ConnectSocket, ptr->ai_addr,
> > (int)ptr->ai_addrlen);
> > -if (iResult == SOCKET_ERROR) {
> > -closesocket(ConnectSocket);
> > -ConnectSocket = INVALID_SOCKET;
> > -continue;
> > -}
> > -break;
> > +// Create a SOCKET for connecting to server
> > +ListenSocket = socket(result->ai_family, result->ai_socktype,
> > +  result->ai_protocol);
> > +if (ListenSocket == INVALID_SOCKET) {
> > +printf("socket failed: %ld\n", WSAGetLastError());
> > +freeaddrinfo(result);
> > +WSACleanup();
> > +return 1;
> > +}
> > +
> > +// Setup the TCP listening socket
> > +iResult = bind(ListenSocket, result->ai_addr,

Re: [patch] x86: kvm: Convert i8254/i8259 locks to raw_spinlocks

2010-02-18 Thread Jan Kiszka
Avi Kivity wrote:
> On 02/18/2010 11:45 AM, Avi Kivity wrote:
>> On 02/18/2010 11:40 AM, Jan Kiszka wrote:
>>>> Meanwhile, if anyone has any idea how to kill this lock, I'd love to
>>>> see it.
>>>>
>>> What concurrency does it resolve in the end? On first glance, it only
>>> synchronize the fiddling with pre-VCPU request bits, right? What forces
>>> us to do this? Wouldn't it suffice to disable preemption (thus
>>> migration) and the let concurrent requests race for setting the bits? I
>>> mean if some request bit was already set on entry, we don't include the
>>>   related VCPU in smp_call_function_many anyway.
>> It's more difficult.
>>
>> vcpu 0: sets request bit on vcpu 2
>>   vcpu 1: test_and_set request bit on vcpu 2, returns already set
>>   vcpu 1: returns
>> vcpu 0: sends IPI
>> vcpu 0: returns
>>
>> so vcpu 1 returns before the IPI was performed.  If the request was a 
>> tlb flush, for example, vcpu 1 may free a page that is still in vcpu 
>> 2's tlb.
> 
> One way out would be to have a KVM_REQ_IN_PROGRESS, set it in 
> make_request, clear it in the IPI function.
> 
> If a second make_request sees it already set, it can simply busy wait 
> until it is cleared, without sending the IPI.  Of course the busy wait 
> means we can't enable preemption (or we may busy wait on an unscheduled 
> task), but at least the requests can proceed in parallel instead of 
> serializing.

...or include VCPUs with KVM_REQ_IN_PROGRESS set into the IPI set even
if they already have the desired request bit set. Then we should
serialize in smp_call_function_many.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


[PATCH] Remove all references to KVM_CR3_CACHE

2010-02-18 Thread Jes . Sorensen
This patch removes all references to KVM_CR3_CACHE, as suggested by
Marcelo.

Jes


commit 3e3b0979d7d1464baeb5770f1de4954da7e59e1b
Author: Jes Sorensen 
Date:   Wed Feb 17 18:03:37 2010 +0100

Remove all references to KVM_CR3_CACHE as it was never implemented.

Signed-off-by: Jes Sorensen 

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 7f820a4..e57c479 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -1248,9 +1248,6 @@ struct kvm_para_features {
 #ifdef KVM_CAP_PV_MMU
{ KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP },
 #endif
-#ifdef KVM_CAP_CR3_CACHE
-   { KVM_CAP_CR3_CACHE, KVM_FEATURE_CR3_CACHE },
-#endif
{ -1, -1 }
 };
 
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 504f501..36fa736 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -154,9 +154,6 @@ struct kvm_para_features {
 #ifdef KVM_CAP_PV_MMU
 { KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP },
 #endif
-#ifdef KVM_CAP_CR3_CACHE
-{ KVM_CAP_CR3_CACHE, KVM_FEATURE_CR3_CACHE },
-#endif
 { -1, -1 }
 };
 


Re: [patch] x86: kvm: Convert i8254/i8259 locks to raw_spinlocks

2010-02-18 Thread Avi Kivity

On 02/18/2010 11:49 AM, Jan Kiszka wrote:
> Avi Kivity wrote:
>> On 02/18/2010 11:40 AM, Jan Kiszka wrote:
>>>> Meanwhile, if anyone has any idea how to kill this lock, I'd love to see it.
>>>>
>>> What concurrency does it resolve in the end? On first glance, it only
>>> synchronize the fiddling with pre-VCPU request bits, right? What forces
>>> us to do this? Wouldn't it suffice to disable preemption (thus
>>> migration) and the let concurrent requests race for setting the bits? I
>>> mean if some request bit was already set on entry, we don't include the
>>> related VCPU in smp_call_function_many anyway.
>>>
>> It's more difficult.
>>
>> vcpu 0: sets request bit on vcpu 2
>>    vcpu 1: test_and_set request bit on vcpu 2, returns already set
>>    vcpu 1: returns
>> vcpu 0: sends IPI
>> vcpu 0: returns
>>
>> so vcpu 1 returns before the IPI was performed.  If the request was a
>> tlb flush, for example, vcpu 1 may free a page that is still in vcpu 2's
>> tlb.
> So the requests bits we are interested in are exclusively set in this
> function under requests_lock?

Yes.  vcpu 1 may still see the bit set (if vcpu 2 didn't reach the start
of its loop and cleared it), so it's not total serialization, but nearly so.


--
error compiling committee.c: too many arguments to function



Re: [patch] x86: kvm: Convert i8254/i8259 locks to raw_spinlocks

2010-02-18 Thread Avi Kivity

On 02/18/2010 11:45 AM, Avi Kivity wrote:
> On 02/18/2010 11:40 AM, Jan Kiszka wrote:
>>> Meanwhile, if anyone has any idea how to kill this lock, I'd love to
>>> see it.
>>>
>> What concurrency does it resolve in the end? On first glance, it only
>> synchronize the fiddling with pre-VCPU request bits, right? What forces
>> us to do this? Wouldn't it suffice to disable preemption (thus
>> migration) and the let concurrent requests race for setting the bits? I
>> mean if some request bit was already set on entry, we don't include the
>> related VCPU in smp_call_function_many anyway.
> It's more difficult.
>
> vcpu 0: sets request bit on vcpu 2
>    vcpu 1: test_and_set request bit on vcpu 2, returns already set
>    vcpu 1: returns
> vcpu 0: sends IPI
> vcpu 0: returns
>
> so vcpu 1 returns before the IPI was performed.  If the request was a
> tlb flush, for example, vcpu 1 may free a page that is still in vcpu
> 2's tlb.

One way out would be to have a KVM_REQ_IN_PROGRESS, set it in
make_request, clear it in the IPI function.

If a second make_request sees it already set, it can simply busy wait
until it is cleared, without sending the IPI.  Of course the busy wait
means we can't enable preemption (or we may busy wait on an unscheduled
task), but at least the requests can proceed in parallel instead of
serializing.
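The ordering requirement in this exchange — a requester must not return until the remote flush has actually happened — can be modeled with an entirely hypothetical toy sketch. A lock held across bit-set plus "IPI" plays the role of requests_lock; this is not KVM code, just the invariant in miniature:

```python
import threading

class Vcpu:
    def __init__(self):
        self.request = False      # the per-vcpu request bit
        self.flushes = 0          # how many 'tlb flushes' have run

requests_lock = threading.Lock()  # stand-in for the kernel's requests_lock

def make_request(vcpu):
    """Set the request bit and perform the 'IPI' under one lock, so a
    second requester that finds the bit already set can only proceed
    after the flush has completed."""
    with requests_lock:
        if not vcpu.request:
            vcpu.request = True
            vcpu.flushes += 1     # the 'IPI handler' runs here
            vcpu.request = False
        # a piggybacking requester reaches this point only after the
        # lock holder has finished the flush
```

Dropping the lock reintroduces exactly the window described above: one requester could observe the bit already set and return while the original requester has not yet sent the IPI.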


--
error compiling committee.c: too many arguments to function



Re: [patch] x86: kvm: Convert i8254/i8259 locks to raw_spinlocks

2010-02-18 Thread Jan Kiszka
Avi Kivity wrote:
> On 02/18/2010 11:40 AM, Jan Kiszka wrote:
>>> Meanwhile, if anyone has any idea how to kill this lock, I'd love to see it.
>>>
>>>  
>> What concurrency does it resolve in the end? At first glance, it only
>> synchronizes the fiddling with per-VCPU request bits, right? What forces
>> us to do this? Wouldn't it suffice to disable preemption (thus
>> migration) and then let concurrent requests race for setting the bits? I
>> mean if some request bit was already set on entry, we don't include the
>> related VCPU in smp_call_function_many anyway.
>>
> 
> It's more difficult.
> 
> vcpu 0: sets request bit on vcpu 2
>vcpu 1: test_and_set request bit on vcpu 2, returns already set
>vcpu 1: returns
> vcpu 0: sends IPI
> vcpu 0: returns
> 
> so vcpu 1 returns before the IPI was performed.  If the request was a 
> tlb flush, for example, vcpu 1 may free a page that is still in vcpu 2's 
> tlb.

So the request bits we are interested in are exclusively set in this
function under requests_lock?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [patch] x86: kvm: Convert i8254/i8259 locks to raw_spinlocks

2010-02-18 Thread Avi Kivity

On 02/18/2010 11:40 AM, Jan Kiszka wrote:
>> Meanwhile, if anyone has any idea how to kill this lock, I'd love to see it.
>
> What concurrency does it resolve in the end? At first glance, it only
> synchronizes the fiddling with per-VCPU request bits, right? What forces
> us to do this? Wouldn't it suffice to disable preemption (thus
> migration) and then let concurrent requests race for setting the bits? I
> mean if some request bit was already set on entry, we don't include the
> related VCPU in smp_call_function_many anyway.


It's more difficult.

vcpu 0: sets request bit on vcpu 2
  vcpu 1: test_and_set request bit on vcpu 2, returns already set
  vcpu 1: returns
vcpu 0: sends IPI
vcpu 0: returns

so vcpu 1 returns before the IPI was performed.  If the request was a 
tlb flush, for example, vcpu 1 may free a page that is still in vcpu 2's 
tlb.


--
error compiling committee.c: too many arguments to function



Re: [patch] x86: kvm: Convert i8254/i8259 locks to raw_spinlocks

2010-02-18 Thread Jan Kiszka
Avi Kivity wrote:
> On 02/18/2010 11:12 AM, Jan Kiszka wrote:
>> Thomas Gleixner wrote:
>>
>>> The i8254/i8259 locks need to be real spinlocks on preempt-rt. Convert
>>> them to raw_spinlock. No change for !RT kernels.
>>>  
>> Last time I ran KVM over RT, I also had to convert requests_lock (struct
>> kvm): make_all_cpus_request assumes that this lock prevents migration.
>>
>>
> 
> True.  Will commit something to that effect.
> 
> Meanwhile, if anyone has any idea how to kill this lock, I'd love to see it.
> 

What concurrency does it resolve in the end? At first glance, it only
synchronizes the fiddling with per-VCPU request bits, right? What forces
us to do this? Wouldn't it suffice to disable preemption (thus
migration) and then let concurrent requests race for setting the bits? I
mean if some request bit was already set on entry, we don't include the
related VCPU in smp_call_function_many anyway.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [patch] x86: kvm: Convert i8254/i8259 locks to raw_spinlocks

2010-02-18 Thread Avi Kivity

On 02/18/2010 11:12 AM, Jan Kiszka wrote:
> Thomas Gleixner wrote:
>> The i8254/i8259 locks need to be real spinlocks on preempt-rt. Convert
>> them to raw_spinlock. No change for !RT kernels.
>
> Last time I ran KVM over RT, I also had to convert requests_lock (struct
> kvm): make_all_cpus_request assumes that this lock prevents migration.

True.  Will commit something to that effect.

Meanwhile, if anyone has any idea how to kill this lock, I'd love to see it.

--
error compiling committee.c: too many arguments to function



Re: [patch] x86: kvm: Convert i8254/i8259 locks to raw_spinlocks

2010-02-18 Thread Avi Kivity

On 02/17/2010 04:00 PM, Thomas Gleixner wrote:
> The i8254/i8259 locks need to be real spinlocks on preempt-rt. Convert
> them to raw_spinlock. No change for !RT kernels.

Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [patch] x86: kvm: Convert i8254/i8259 locks to raw_spinlocks

2010-02-18 Thread Jan Kiszka
Thomas Gleixner wrote:
> The i8254/i8259 locks need to be real spinlocks on preempt-rt. Convert
> them to raw_spinlock. No change for !RT kernels.

Last time I ran KVM over RT, I also had to convert requests_lock (struct
kvm): make_all_cpus_request assumes that this lock prevents migration.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [patch uq/master 2/4] qemu: kvm specific wait_io_event

2010-02-18 Thread Avi Kivity

On 02/18/2010 12:14 AM, Marcelo Tosatti wrote:

In KVM mode the global mutex is released when vcpus are executing,
which means acquiring the fairness mutex is not required.

Also for KVM there is one thread per vcpu, so tcg_has_work is meaningless.

Add a new qemu_wait_io_event_common function to hold common code
between TCG/KVM.

Signed-off-by: Marcelo Tosatti

Index: qemu/vl.c
===
--- qemu.orig/vl.c
+++ qemu/vl.c
@@ -3382,6 +3382,7 @@ static QemuCond qemu_pause_cond;
  static void block_io_signals(void);
  static void unblock_io_signals(void);
  static int tcg_has_work(void);
+static int cpu_has_work(CPUState *env);

  static int qemu_init_main_loop(void)
  {
@@ -3402,6 +3403,15 @@ static int qemu_init_main_loop(void)
  return 0;
  }

+static void qemu_wait_io_event_common(CPUState *env)
+{
+if (env->stop) {
+env->stop = 0;
+env->stopped = 1;
+qemu_cond_signal(&qemu_pause_cond);
+}
+}
+
  static void qemu_wait_io_event(CPUState *env)
  {
  while (!tcg_has_work())
@@ -3418,11 +3428,15 @@ static void qemu_wait_io_event(CPUState
  qemu_mutex_unlock(&qemu_fair_mutex);

  qemu_mutex_lock(&qemu_global_mutex);
-if (env->stop) {
-env->stop = 0;
-env->stopped = 1;
-qemu_cond_signal(&qemu_pause_cond);
-}
+qemu_wait_io_event_common(env);
+}
+
+static void qemu_kvm_wait_io_event(CPUState *env)
+{
+while (!cpu_has_work(env))
+qemu_cond_timedwait(env->halt_cond,&qemu_global_mutex, 1000);
+
+qemu_wait_io_event_common(env);
  }
   


Shouldn't kvm specific code be in kvm-all.c?



  static int qemu_cpu_exec(CPUState *env);
@@ -3448,7 +3462,7 @@ static void *kvm_cpu_thread_fn(void *arg
  while (1) {
  if (cpu_can_run(env))
  qemu_cpu_exec(env);
-qemu_wait_io_event(env);
+qemu_kvm_wait_io_event(env);
  }

  return NULL;
   


Well, kvm_cpu_thread_fn() apparently isn't.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [patch uq/master 0/4] uq/master: iothread consume signals via sigtimedwait and cleanups

2010-02-18 Thread Avi Kivity

On 02/18/2010 12:14 AM, Marcelo Tosatti wrote:

See individual patches for details.

   


Please repost, copying qemu-devel, since this code is to be queued for 
qemu.git.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: Recommended network driver for a windows KVM guest

2010-02-18 Thread Dor Laor

On 02/17/2010 12:51 PM, carlopmart wrote:
> Hi all,
>
> I need to install several windows KVM (rhel5.4 host fully updated)
> guests for iSCSI boot. iSCSI servers are Solaris/OpenSolaris storage
> servers and I need to boot windows guests (2008R2 and Win7) using gpxe.
> Can I use the virtio net driver during windows install, or the e1000
> driver?

rhel5.4 does not have gpxe, so it won't work. rhel5.5 will have it, but
I don't recall anyone testing iSCSI with kvm+gpxe upstream either; worth
testing.

Anyway, virtio performs better than e1000 and is potentially more stable.

> Many thanks.




Re: [PATCH 00/18] KVM: PPC: Virtualize Gekko guests

2010-02-18 Thread Avi Kivity

On 02/18/2010 09:40 AM, Avi Kivity wrote:
>> Now you made me check how fast the real hw is. I get about 65,000,000
>> fmul operations per second on it.
>
> That's surprisingly low.


I get 3.7 Gflops on my home machine (1G loops, 4 fmul and 4 fadds, all 
independent, in 2.15 seconds; otherwise I can't saturate the pipeline).


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.
