Re: KVM call agenda for September 7

2010-09-06 Thread Avi Kivity

 On 09/06/2010 11:00 PM, Juan Quintela wrote:

Please send in any agenda items you are interested in covering.



0.13?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [KVM timekeeping 26/35] Catchup slower TSC to guest rate

2010-09-06 Thread Dong, Eddie
Zachary:
Will you extend the logic to cover the case where the host TSC runs faster than 
the guest rate but the PCPU is over-committed? In that case we can likely use 
the time the VCPU spends scheduled out to catch up as well. Of course, if the 
VCPU's scheduled-out time is not enough to compensate for the extra cycles 
accumulated from the faster host TSC (beyond some threshold), we will 
eventually have to fall back to trap-and-emulate mode.

Thx, Eddie

-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of 
Zachary Amsden
Sent: 20 August 2010 16:08
To: kvm@vger.kernel.org
Cc: Zachary Amsden; Avi Kivity; Marcelo Tosatti; Glauber Costa; Thomas 
Gleixner; John Stultz; linux-ker...@vger.kernel.org
Subject: [KVM timekeeping 26/35] Catchup slower TSC to guest rate

Use the catchup code to continue adjusting the TSC when
running at lower than the guest rate.

Signed-off-by: Zachary Amsden 
---
 arch/x86/kvm/x86.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a4215d7..086d56a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1013,8 +1013,11 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
kvm_x86_ops->adjust_tsc_offset(v, tsc-tsc_timestamp);
}
local_irq_restore(flags);
-   if (catchup)
+   if (catchup) {
+   if (this_tsc_khz < v->kvm->arch.virtual_tsc_khz)
+   v->arch.tsc_rebase = 1;
return 0;
+   }
 
/*
 * Time as measured by the TSC may go backwards when resetting the base
@@ -5022,6 +5025,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
kvm_guest_exit();
 
+   /* Running on slower TSC without kvmclock, we must bump TSC */
+   if (vcpu->arch.tsc_rebase)
+   kvm_request_clock_update(vcpu);
+
preempt_enable();
 
vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
-- 
1.7.1



Fwd: Guest hangs when I do general operation.

2010-09-06 Thread Amos Kong
-- Forwarded message --
From: Amos Kong 
Date: Tue, Sep 7, 2010 at 7:49 AM
Subject: Guest hangs when I do general operation.
To: 王箫 


kvm upstream: 43e413f7db1a4a90671dda0b1d6c1f8cb30673ed KVM: Whitespace
changes to remove differences wrt kvm-updates/2.6.37
qemu upstream: cb93bbdd7db92e50ff5e60a346b23df68acae46b Fix OpenBSD
linker warning

# ./x86_64-softmmu/qemu-system-x86_64 ~/win7-32.qcow2 -m 1024 -vnc :0
-usbdevice tablet -cpu qemu64 -snapshot -enable-kvm -bios pc-bios/bios.bin

The guest hangs when I perform ordinary operations.
It works fine with upstream qemu-kvm; the problem shows up only with upstream
qemu + kvm. It occurred once while I was debugging with gdb: after I executed
'continue', the guest ran normally.

debug msg of (qemu + kvm)
(gdb) c
Continuing.

Program received signal SIGUSR2, User defined signal 2.
0x7fc7d4d7bfb3 in select () at ../sysdeps/unix/syscall-template.S:82
82      ../sysdeps/unix/syscall-template.S: No such file or directory.
       in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0  0x7fc7d4d7bfb3 in select () at ../sysdeps/unix/syscall-template.S:82
#1  0x004270ea in qemu_aio_wait () at aio.c:193
#2  0x00426475 in bdrv_read_em (bs=0x186a340,
sector_num=6343320, buf=0x7fc7c5a9b010 "RCRD(", nb_sectors=104) at
block.c:2432
#3  0x0043c437 in qcow_read (bs=0x1838680, start_sect=<value optimized out>,
    cluster_offset=<value optimized out>, n_start=<value optimized out>,
    n_end=<value optimized out>) at block/qcow2-cluster.c:368
#4  copy_sectors (bs=0x1838680, start_sect=<value optimized out>,
    cluster_offset=<value optimized out>, n_start=<value optimized out>,
    n_end=<value optimized out>) at block/qcow2-cluster.c:406
#5  0x0043c69b in qcow2_alloc_cluster_link_l2 (bs=0x1838680,
    m=0x1d5d798) at block/qcow2-cluster.c:689
#6  0x004378d5 in qcow_aio_write_cb (opaque=0x1d5d700, ret=0)
    at block/qcow2.c:566
#7  0x00429c5d in posix_aio_process_queue (opaque=<value optimized out>)
    at posix-aio-compat.c:459
#8  0x00429d0c in posix_aio_read (opaque=0x183a250)
    at posix-aio-compat.c:489
#9  0x0051fec6 in main_loop_wait (nonblocking=<value optimized out>)
    at /home/devel/qemu/vl.c:1281
#10 0x005209bd in main_loop (argc=0, argv=<value optimized out>,
    envp=<value optimized out>) at /home/devel/qemu/vl.c:1332
#11 main (argc=0, argv=<value optimized out>, envp=<value optimized out>)
    at /home/devel/qemu/vl.c:2995
---
kvm statistics

 efer_reload                  0       0
 exits                  8714404       0
 fpu_reload              115538       0
 halt_exits               66926       0
 halt_wakeup                  0       0
 host_state_reload      2366344       0
 hypercalls                   0       0
 insn_emulation         1848818       0
 insn_emulation_fail          0       0
 invlpg                  662261       0
 io_exits               1293800       0
 irq_exits               531478       0
 irq_injections          109588       0
 irq_window              114236       0
 largepages                   0       0
 mmio_exits              705388       0
 mmu_cache_miss          355201       0
 mmu_flooded             298554       0
 mmu_pde_zapped           25705       0
 mmu_pte_updated         241815       0
 mmu_pte_write         15701676       0
 mmu_recycled               546       0
 mmu_shadow_zapped       527220       0
 mmu_unsync                4203       0
 nmi_injections               0       0
 nmi_window                   0       0
 pf_fixed               3107522       0
 pf_guest                631148       0
 remote_tlb_flush         31032       0
 request_irq                  0       0
 signal_exits            310597       0
 tlb_flush              2164428       0


RE: KVM Test report, kernel e6a9246... qemu 94f964d...

2010-09-06 Thread Hao, Xudong
Avi Kivity wrote:
>   On 09/06/2010 11:08 AM, Hao, Xudong wrote:
>> 
>>> Unable to reproduce - R5u3 i386 guest installed and booted, x86_64
>>> booted from cd, all as expected.
>> Do you use EPT or shadow mode? This issue only exist on shadow mode.
> 
> Shadow.  What's your command line?
> 

qemu-system-x86_64 -m 1024 -smp 1 -hda /imagepath/RHEL5u3.img


Thanks,
Xudong


[patch 1/2] qemu-kvm: use upstream eventfd code

2010-09-06 Thread Marcelo Tosatti
Upstream code is equivalent.

Signed-off-by: Marcelo Tosatti 

Index: qemu-kvm/cpus.c
===
--- qemu-kvm.orig/cpus.c
+++ qemu-kvm/cpus.c
@@ -290,11 +290,6 @@ void qemu_notify_event(void)
 {
 CPUState *env = cpu_single_env;
 
-if (kvm_enabled()) {
-qemu_kvm_notify_work();
-return;
-}
-
 qemu_event_increment ();
 if (env) {
 cpu_exit(env);
Index: qemu-kvm/qemu-kvm.c
===
--- qemu-kvm.orig/qemu-kvm.c
+++ qemu-kvm/qemu-kvm.c
@@ -71,7 +71,6 @@ static int qemu_system_ready;
 #define SIG_IPI (SIGRTMIN+4)
 
 pthread_t io_thread;
-static int io_thread_fd = -1;
 static int io_thread_sigfd = -1;
 
 static CPUState *kvm_debug_cpu_requested;
@@ -1634,28 +1633,6 @@ int kvm_init_ap(void)
 return 0;
 }
 
-void qemu_kvm_notify_work(void)
-{
-/* Write 8 bytes to be compatible with eventfd.  */
-static uint64_t val = 1;
-ssize_t ret;
-
-if (io_thread_fd == -1) {
-return;
-}
-
-do {
-ret = write(io_thread_fd, &val, sizeof(val));
-} while (ret < 0 && errno == EINTR);
-
-/* EAGAIN is fine in case we have a pipe.  */
-if (ret < 0 && errno != EAGAIN) {
- fprintf(stderr, "qemu_kvm_notify_work: write() filed: %s\n",
- strerror(errno));
- exit (1);
-}
-}
-
 /* If we have signalfd, we mask out the signals we want to handle and then
  * use signalfd to listen for them.  We rely on whatever the current signal
  * handler is to dispatch the signals when we receive them.
@@ -1692,41 +1669,14 @@ static void sigfd_handler(void *opaque)
 }
 }
 
-/* Used to break IO thread out of select */
-static void io_thread_wakeup(void *opaque)
-{
-int fd = (unsigned long) opaque;
-ssize_t len;
-char buffer[512];
-
-/* Drain the notify pipe.  For eventfd, only 8 bytes will be read.  */
-do {
-len = read(fd, buffer, sizeof(buffer));
-} while ((len == -1 && errno == EINTR) || len == sizeof(buffer));
-}
-
 int kvm_main_loop(void)
 {
-int fds[2];
 sigset_t mask;
 int sigfd;
 
 io_thread = pthread_self();
 qemu_system_ready = 1;
 
-if (qemu_eventfd(fds) == -1) {
-fprintf(stderr, "failed to create eventfd\n");
-return -errno;
-}
-
-fcntl(fds[0], F_SETFL, O_NONBLOCK);
-fcntl(fds[1], F_SETFL, O_NONBLOCK);
-
-qemu_set_fd_handler2(fds[0], NULL, io_thread_wakeup, NULL,
- (void *)(unsigned long) fds[0]);
-
-io_thread_fd = fds[1];
-
 sigemptyset(&mask);
 sigaddset(&mask, SIGIO);
 sigaddset(&mask, SIGALRM);
Index: qemu-kvm/qemu-kvm.h
===
--- qemu-kvm.orig/qemu-kvm.h
+++ qemu-kvm/qemu-kvm.h
@@ -863,8 +863,6 @@ void qemu_kvm_aio_wait_start(void);
 void qemu_kvm_aio_wait(void);
 void qemu_kvm_aio_wait_end(void);
 
-void qemu_kvm_notify_work(void);
-
 void kvm_tpr_access_report(CPUState *env, uint64_t rip, int is_write);
 
 int kvm_arch_init_irq_routing(void);




[patch 2/2] qemu-kvm: drop posix-aio-compat.c's signalfd usage

2010-09-06 Thread Marcelo Tosatti
Block SIGUSR2, which makes the signal be handled through qemu-kvm.c's
signalfd.

Signed-off-by: Marcelo Tosatti 

Index: qemu-kvm/posix-aio-compat.c
===
--- qemu-kvm.orig/posix-aio-compat.c
+++ qemu-kvm/posix-aio-compat.c
@@ -26,7 +26,6 @@
 #include "osdep.h"
 #include "qemu-common.h"
 #include "block_int.h"
-#include "compatfd.h"
 
 #include "block/raw-posix-aio.h"
 
@@ -54,7 +53,7 @@ struct qemu_paiocb {
 };
 
 typedef struct PosixAioState {
-int fd;
+int rfd, wfd;
 struct qemu_paiocb *first_aio;
 } PosixAioState;
 
@@ -473,29 +472,18 @@ static int posix_aio_process_queue(void 
 static void posix_aio_read(void *opaque)
 {
 PosixAioState *s = opaque;
-union {
-struct qemu_signalfd_siginfo siginfo;
-char buf[128];
-} sig;
-size_t offset;
-
-/* try to read from signalfd, don't freak out if we can't read anything */
-offset = 0;
-while (offset < 128) {
-ssize_t len;
+ssize_t len;
 
-len = read(s->fd, sig.buf + offset, 128 - offset);
-if (len == -1 && errno == EINTR)
-continue;
-if (len == -1 && errno == EAGAIN) {
-/* there is no natural reason for this to happen,
- * so we'll spin hard until we get everything just
- * to be on the safe side. */
-if (offset > 0)
-continue;
-}
+/* read all bytes from signal pipe */
+for (;;) {
+char bytes[16];
 
-offset += len;
+len = read(s->rfd, bytes, sizeof(bytes));
+if (len == -1 && errno == EINTR)
+continue; /* try again */
+if (len == sizeof(bytes))
+continue; /* more to read */
+break;
 }
 
 posix_aio_process_queue(s);
@@ -509,6 +497,20 @@ static int posix_aio_flush(void *opaque)
 
 static PosixAioState *posix_aio_state;
 
+static void aio_signal_handler(int signum)
+{
+if (posix_aio_state) {
+char byte = 0;
+ssize_t ret;
+
+ret = write(posix_aio_state->wfd, &byte, sizeof(byte));
+if (ret < 0 && errno != EAGAIN)
+die("write()");
+}
+
+qemu_service_io();
+}
+
 static void paio_remove(struct qemu_paiocb *acb)
 {
 struct qemu_paiocb **pacb;
@@ -610,8 +612,9 @@ BlockDriverAIOCB *paio_ioctl(BlockDriver
 
 int paio_init(void)
 {
-sigset_t mask;
+struct sigaction act;
 PosixAioState *s;
+int fds[2];
 int ret;
 
 if (posix_aio_state)
@@ -619,21 +622,24 @@ int paio_init(void)
 
 s = qemu_malloc(sizeof(PosixAioState));
 
-/* Make sure to block AIO signal */
-sigemptyset(&mask);
-sigaddset(&mask, SIGUSR2);
-sigprocmask(SIG_BLOCK, &mask, NULL);
+sigfillset(&act.sa_mask);
+act.sa_flags = 0; /* do not restart syscalls to interrupt select() */
+act.sa_handler = aio_signal_handler;
+sigaction(SIGUSR2, &act, NULL);
 
 s->first_aio = NULL;
-s->fd = qemu_signalfd(&mask);
-if (s->fd == -1) {
-fprintf(stderr, "failed to create signalfd\n");
+if (qemu_pipe(fds) == -1) {
+fprintf(stderr, "failed to create pipe\n");
 return -1;
 }
 
-fcntl(s->fd, F_SETFL, O_NONBLOCK);
+s->rfd = fds[0];
+s->wfd = fds[1];
+
+fcntl(s->rfd, F_SETFL, O_NONBLOCK);
+fcntl(s->wfd, F_SETFL, O_NONBLOCK);
 
-qemu_aio_set_fd_handler(s->fd, posix_aio_read, NULL, posix_aio_flush,
+qemu_aio_set_fd_handler(s->rfd, posix_aio_read, NULL, posix_aio_flush,
 posix_aio_process_queue, s);
 
 ret = pthread_attr_init(&attr);
Index: qemu-kvm/qemu-kvm.c
===
--- qemu-kvm.orig/qemu-kvm.c
+++ qemu-kvm/qemu-kvm.c
@@ -1680,6 +1680,7 @@ int kvm_main_loop(void)
 sigemptyset(&mask);
 sigaddset(&mask, SIGIO);
 sigaddset(&mask, SIGALRM);
+sigaddset(&mask, SIGUSR2);
 sigaddset(&mask, SIGBUS);
 sigprocmask(SIG_BLOCK, &mask, NULL);
 




[patch 0/2] qemu-kvm cleanups

2010-09-06 Thread Marcelo Tosatti
Two minor signal related cleanups.




Re: [PATCH 0/4] Apply some CODING_STYLE to qemu-kvm*.c

2010-09-06 Thread Marcelo Tosatti
On Sun, Sep 05, 2010 at 03:13:53PM +0300, Avi Kivity wrote:
> 
> Avi Kivity (4):
>   qemu-kvm-x86.c: reindent
>   qemu-kvm-x86.c: remove extraneous line continuation
>   qemu-kvm-x86.c: add braces where appropriate
>   qemu-kvm.c: add braces where appropriate
> 
>  qemu-kvm-x86.c |  909 
> +---
>  qemu-kvm.c |  174 +++
>  2 files changed, 586 insertions(+), 497 deletions(-)

Applied, thanks.



KVM call agenda for September 7

2010-09-06 Thread Juan Quintela

Please send in any agenda items you are interested in covering.

thanks,
Juan.


Re: [PATCH 3/4] x86: Fix allowed CPUID bits for KVM guests

2010-09-06 Thread Avi Kivity

 On 09/06/2010 04:14 PM, Andre Przywara wrote:

The AMD extensions to AVX (FMA4, XOP) work on the same YMM register set
as AVX, so they are safe for guests to use, as long as AVX itself
is allowed. Add F16C and AES on the way for the same reasons.


Acked-by: Avi Kivity 



Re: [PATCH 0/3] KVM fixes and cleanups

2010-09-06 Thread Marcelo Tosatti
On Thu, Sep 02, 2010 at 05:29:44PM +0200, Joerg Roedel wrote:
> Hi Avi, Marcelo,
> 
> here are 3 patches which came up during the final testing of the
> nested-npt patch set. This patch set fixes two issues I found and the
> last patch contains a minor cleanup which does not fix any real bug.
> Please have a look at them and feel free to apply them (only if no
> objections, of course ;-) )
> For the bug that patch 2 fixes I will write a unit-test and submit it
> separately.
> 
> Thanks,
>   Joerg
> 
> Shortlog:
> 
> Joerg Roedel (3):
>   KVM: MMU: Fix 32 bit legacy paging with NPT
>   KVM: SVM: Restore correct registers after sel_cr0 intercept emulation
>   KVM: SVM: Clean up rip handling in vmrun emulation
> 
> Diffstat:
> 
>  arch/x86/kvm/mmu.c |8 ++--
>  arch/x86/kvm/svm.c |   39 ++-
>  2 files changed, 40 insertions(+), 7 deletions(-)

Applied, thanks.



Re: [PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List)

2010-09-06 Thread Avi Kivity

 On 09/06/2010 06:55 PM, Joerg Roedel wrote:

(Now with correct Cc-list. I accidentally copied the wrong line from
  MAINTAINERS in the first post of this. Sorry for the double-post)

Hi Avi, Marcelo,

here is finally the third round of my NPT virtualization patches for KVM. It
took a while to get everything running (including KVM itself) on 32 bit again
to actually test it. But testing on 32 bit host and with a 32 bit hypervisor
was a very good idea. I found some serious bugs and shortcomings in my code
that are fixed now in v3.






This patchset applies on today's avi/master + the three patches I sent at the end of
last week. These patches are necessary for some of the tests above to run.

For the curious and impatient user I put everything in a branch on kernel.org.
If you want to test it you can pull the tree from

git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-kvm.git 
npt-virt-v3

Please review and/or apply these patches if considered good enough. Otherwise I
appreciate your feedback.


Very impressive patchset.  It's broken out so finely that the careful 
reader gets the feeling he understands every little detail, without 
noticing you've introduced recursion into the kvm mmu.


The little nit regarding patch 10 can be addressed in a follow-on patch.

Reviewed-by: Avi Kivity 

Please also post a unit test that checks that nested page faults for l1 
ptes with bad NX, U, W, or reserved bits set are correctly intercepted 
and reported.  W should work already if you tested nested vga, but the 
rest are untested during normal operation and pose a security problem if 
they are incorrect.




Re: [PATCH 17/27] KVM: MMU: Track page fault data in struct vcpu

2010-09-06 Thread Avi Kivity

 On 09/06/2010 06:55 PM, Joerg Roedel wrote:

This patch introduces a struct with two new fields in
vcpu_arch for x86:

* fault.address
* fault.error_code

This will be used to correctly propagate page faults back
into the guest when we could have either an ordinary page
fault or a nested page fault. In the case of a nested page
fault the fault-address is different from the original
address that should be walked. So we need to keep track
about the real fault-address.



-static void emulate_pf(struct x86_emulate_ctxt *ctxt, unsigned long addr,
-  int err)
+static void emulate_pf(struct x86_emulate_ctxt *ctxt)
  {
-   ctxt->cr2 = addr;
-   emulate_exception(ctxt, PF_VECTOR, err, true);
+   emulate_exception(ctxt, PF_VECTOR, 0, true);
  }


What happened to the error code?


diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2fe9e7..38d482d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4130,7 +4130,8 @@ static void inject_emulated_exception(struct kvm_vcpu 
*vcpu)
  {
struct x86_emulate_ctxt *ctxt =&vcpu->arch.emulate_ctxt;
if (ctxt->exception == PF_VECTOR)
-   kvm_inject_page_fault(vcpu, ctxt->cr2, ctxt->error_code);
+   kvm_inject_page_fault(vcpu, vcpu->arch.fault.address,
+   vcpu->arch.fault.error_code);
else if (ctxt->error_code_valid)
kvm_queue_exception_e(vcpu, ctxt->exception, ctxt->error_code);
else


Ah.  Not lovely, but it was ugly before as well.



Re: [PATCH 10/27] KVM: MMU: Add infrastructure for two-level page walker

2010-09-06 Thread Avi Kivity

 On 09/06/2010 06:55 PM, Joerg Roedel wrote:

This patch introduces a mmu-callback to translate gpa
addresses in the walk_addr code. This is later used to
translate l2_gpa addresses into l1_gpa addresses.



@@ -534,6 +534,11 @@ static inline gpa_t gfn_to_gpa(gfn_t gfn)
return (gpa_t)gfn<<  PAGE_SHIFT;
  }

+static inline gfn_t gpa_to_gfn(gpa_t gpa)
+{
+   return (gfn_t)gpa>>  PAGE_SHIFT;
+}
+


That's a bug - gfn_t may be smaller than gpa_t, so you're truncating 
just before the shift.  Note the casts in the surrounding functions are 
widening, not narrowing.


However, gfn_t is u64 so the bug is only theoretical.



[PATCH 16/27] KVM: MMU: Introduce init_kvm_nested_mmu()

2010-09-06 Thread Joerg Roedel
This patch introduces the init_kvm_nested_mmu() function
which is used to re-initialize the nested mmu when the l2
guest changes its paging mode.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/mmu.c  |   34 +-
 arch/x86/kvm/mmu.h  |1 +
 arch/x86/kvm/x86.c  |   20 
 4 files changed, 55 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 38dc82e..a338235 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -805,3 +805,4 @@ void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);
 
 #endif /* _ASM_X86_KVM_HOST_H */
+
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1f425f3..7bc8d67 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2784,11 +2784,43 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
return r;
 }
 
+static int init_kvm_nested_mmu(struct kvm_vcpu *vcpu)
+{
+   struct kvm_mmu *g_context = &vcpu->arch.nested_mmu;
+
+   g_context->get_cr3   = get_cr3;
+   g_context->inject_page_fault = kvm_inject_page_fault;
+
+   /*
+* Note that arch.mmu.gva_to_gpa translates l2_gva to l1_gpa. The
+* translation of l2_gpa to l1_gpa addresses is done using the
+* arch.nested_mmu.gva_to_gpa function. Basically the gva_to_gpa
+* functions between mmu and nested_mmu are swapped.
+*/
+   if (!is_paging(vcpu)) {
+   g_context->root_level = 0;
+   g_context->gva_to_gpa = nonpaging_gva_to_gpa_nested;
+   } else if (is_long_mode(vcpu)) {
+   g_context->root_level = PT64_ROOT_LEVEL;
+   g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
+   } else if (is_pae(vcpu)) {
+   g_context->root_level = PT32E_ROOT_LEVEL;
+   g_context->gva_to_gpa = paging64_gva_to_gpa_nested;
+   } else {
+   g_context->root_level = PT32_ROOT_LEVEL;
+   g_context->gva_to_gpa = paging32_gva_to_gpa_nested;
+   }
+
+   return 0;
+}
+
 static int init_kvm_mmu(struct kvm_vcpu *vcpu)
 {
vcpu->arch.update_pte.pfn = bad_pfn;
 
-   if (tdp_enabled)
+   if (mmu_is_nested(vcpu))
+   return init_kvm_nested_mmu(vcpu);
+   else if (tdp_enabled)
return init_kvm_tdp_mmu(vcpu);
else
return init_kvm_softmmu(vcpu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 7086ca8..513abbb 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -47,6 +47,7 @@
 #define PFERR_USER_MASK (1U << 2)
 #define PFERR_RSVD_MASK (1U << 3)
 #define PFERR_FETCH_MASK (1U << 4)
+#define PFERR_NESTED_MASK (1U << 31)
 
 int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f5db75..b2fe9e7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3476,6 +3476,25 @@ static gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t 
gpa, u32 *error)
return gpa;
 }
 
+static gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error)
+{
+   gpa_t t_gpa;
+   u32 access;
+   u32 err;
+
+   BUG_ON(!mmu_is_nested(vcpu));
+
+   /* NPT walks are treated as user writes */
+   access = PFERR_WRITE_MASK | PFERR_USER_MASK;
+   t_gpa  = vcpu->arch.mmu.gva_to_gpa(vcpu, gpa, access, &err);
+   if (t_gpa == UNMAPPED_GVA) {
+   vcpu->arch.fault.address= gpa;
+   vcpu->arch.fault.error_code = err | PFERR_NESTED_MASK;
+   }
+
+   return t_gpa;
+}
+
 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
 {
u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
@@ -5691,6 +5710,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
vcpu->arch.walk_mmu = &vcpu->arch.mmu;
vcpu->arch.mmu.root_hpa = INVALID_PAGE;
vcpu->arch.mmu.translate_gpa = translate_gpa;
+   vcpu->arch.nested_mmu.translate_gpa = translate_nested_gpa;
if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
else
-- 
1.7.0.4




[PATCH 13/27] KVM: X86: Add kvm_read_guest_page_tdp function

2010-09-06 Thread Joerg Roedel
This patch adds a function which can read either from the guest's
physical memory or from the guest's guest physical memory.
This will be used in the two-dimensional page table walker.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/kvm_host.h |3 +++
 arch/x86/kvm/x86.c  |   24 
 2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9b9c096..38dc82e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -651,6 +651,9 @@ void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned 
nr);
 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 
error_code);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long cr2,
   u32 error_code);
+int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+   gfn_t gfn, void *data, int offset, int len,
+   u32 *error);
 bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
 
 int kvm_pic_set_irq(void *opaque, int irq, int level);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e5dcf7f..f1bdf4e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -369,6 +369,30 @@ bool kvm_require_cpl(struct kvm_vcpu *vcpu, int 
required_cpl)
 EXPORT_SYMBOL_GPL(kvm_require_cpl);
 
 /*
+ * This function will be used to read from the physical memory of the currently
+ * running guest. The difference to kvm_read_guest_page is that this function
+ * can read from guest physical or from the guest's guest physical memory.
+ */
+int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+   gfn_t ngfn, void *data, int offset, int len,
+   u32 *error)
+{
+   gfn_t real_gfn;
+   gpa_t ngpa;
+
+   *error   = 0;
+   ngpa = gfn_to_gpa(ngfn);
+   real_gfn = mmu->translate_gpa(vcpu, ngpa, error);
+   if (real_gfn == UNMAPPED_GVA)
+   return -EFAULT;
+
+   real_gfn = gpa_to_gfn(real_gfn);
+
+   return kvm_read_guest_page(vcpu->kvm, real_gfn, data, offset, len);
+}
+EXPORT_SYMBOL_GPL(kvm_read_guest_page_mmu);
+
+/*
  * Load the pae pdptrs.  Return true is they are all valid.
  */
 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
-- 
1.7.0.4




[PATCH 08/27] KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu

2010-09-06 Thread Joerg Roedel
This patch changes is_rsvd_bits_set() function prototype to
take only a kvm_mmu context instead of a full vcpu.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/mmu.c |6 +++---
 arch/x86/kvm/paging_tmpl.h |7 ---
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 787540d..9668f91 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2577,12 +2577,12 @@ static void paging_free(struct kvm_vcpu *vcpu)
nonpaging_free(vcpu);
 }
 
-static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 gpte, int level)
+static bool is_rsvd_bits_set(struct kvm_mmu *mmu, u64 gpte, int level)
 {
int bit7;
 
bit7 = (gpte >> 7) & 1;
-   return (gpte & vcpu->arch.mmu.rsvd_bits_mask[bit7][level-1]) != 0;
+   return (gpte & mmu->rsvd_bits_mask[bit7][level-1]) != 0;
 }
 
 #define PTTYPE 64
@@ -2857,7 +2857,7 @@ static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu,
return;
 }
 
-   if (is_rsvd_bits_set(vcpu, *(u64 *)new, PT_PAGE_TABLE_LEVEL))
+   if (is_rsvd_bits_set(&vcpu->arch.mmu, *(u64 *)new, PT_PAGE_TABLE_LEVEL))
return;
 
++vcpu->kvm->stat.mmu_pte_updated;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 13d0c06..68ee1b7 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -168,7 +168,7 @@ walk:
break;
}
 
-   if (is_rsvd_bits_set(vcpu, pte, walker->level)) {
+   if (is_rsvd_bits_set(&vcpu->arch.mmu, pte, walker->level)) {
rsvd_fault = true;
break;
}
@@ -327,6 +327,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, 
struct guest_walker *gw,
u64 *sptep)
 {
struct kvm_mmu_page *sp;
+   struct kvm_mmu *mmu = &vcpu->arch.mmu;
pt_element_t *gptep = gw->prefetch_ptes;
u64 *spte;
int i;
@@ -358,7 +359,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, 
struct guest_walker *gw,
gpte = gptep[i];
 
if (!is_present_gpte(gpte) ||
- is_rsvd_bits_set(vcpu, gpte, PT_PAGE_TABLE_LEVEL)) {
+ is_rsvd_bits_set(mmu, gpte, PT_PAGE_TABLE_LEVEL)) {
if (!sp->unsync)
__set_spte(spte, shadow_notrap_nonpresent_pte);
continue;
@@ -713,7 +714,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp,
return -EINVAL;
 
gfn = gpte_to_gfn(gpte);
-   if (is_rsvd_bits_set(vcpu, gpte, PT_PAGE_TABLE_LEVEL)
+   if (is_rsvd_bits_set(&vcpu->arch.mmu, gpte, PT_PAGE_TABLE_LEVEL)
  || gfn != sp->gfns[i] || !is_present_gpte(gpte)
  || !(gpte & PT_ACCESSED_MASK)) {
u64 nonpresent;
-- 
1.7.0.4




[PATCH 15/27] KVM: MMU: Introduce kvm_read_guest_page_x86()

2010-09-06 Thread Joerg Roedel
This patch introduces the kvm_read_guest_page_x86 function
which reads from the physical memory of the guest. If the
guest is itself running in guest-mode with nested paging
enabled, it will read from the guest's guest physical memory
instead.
The patch also changes the code to use this function
where necessary.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/x86.c |   22 ++
 1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f1bdf4e..1f5db75 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -392,6 +392,13 @@ int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct 
kvm_mmu *mmu,
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_page_mmu);
 
+int kvm_read_nested_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn,
+  void *data, int offset, int len, u32 *error)
+{
+   return kvm_read_guest_page_mmu(vcpu, vcpu->arch.walk_mmu, gfn,
+  data, offset, len, error);
+}
+
 /*
  * Load the pae pdptrs.  Return true is they are all valid.
  */
@@ -399,12 +406,13 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT;
unsigned offset = ((cr3 & (PAGE_SIZE-1)) >> 5) << 2;
-   int i;
+   int i, error;
int ret;
u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)];
 
-   ret = kvm_read_guest_page(vcpu->kvm, pdpt_gfn, pdpte,
- offset * sizeof(u64), sizeof(pdpte));
+   ret = kvm_read_nested_guest_page(vcpu, pdpt_gfn, pdpte,
+offset * sizeof(u64),
+sizeof(pdpte), &error);
if (ret < 0) {
ret = 0;
goto out;
@@ -433,6 +441,9 @@ static bool pdptrs_changed(struct kvm_vcpu *vcpu)
 {
u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)];
bool changed = true;
+   int offset;
+   u32 error;
+   gfn_t gfn;
int r;
 
if (is_long_mode(vcpu) || !is_pae(vcpu))
@@ -442,7 +453,10 @@ static bool pdptrs_changed(struct kvm_vcpu *vcpu)
  (unsigned long *)&vcpu->arch.regs_avail))
return true;
 
-   r = kvm_read_guest(vcpu->kvm, vcpu->arch.cr3 & ~31u, pdpte, sizeof(pdpte));
+   gfn = (vcpu->arch.cr3 & ~31u) >> PAGE_SHIFT;
+   offset = (vcpu->arch.cr3 & ~31u) & (PAGE_SIZE - 1);
+   r = kvm_read_nested_guest_page(vcpu, gfn, pdpte, offset,
+  sizeof(pdpte), &error);
if (r < 0)
goto out;
changed = memcmp(pdpte, vcpu->arch.pdptrs, sizeof(pdpte)) != 0;
-- 
1.7.0.4



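The nested read introduced in this patch can be pictured with a small
user-space model: an L2 guest-physical address is first pushed through
the L1 mapping (the job walk_mmu does in the patch) before any host
memory would be touched. This is a conceptual sketch only — the flat
table, page count, and function names are invented for illustration and
are not KVM code.

```c
#include <stdint.h>
#include <assert.h>

/* Conceptual model, not KVM code: a flat array stands in for the L1
 * page tables that map L2 guest-physical frames to L1 guest-physical
 * frames. All names and values here are illustrative. */

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ull << PAGE_SHIFT)
#define NPAGES     16

/* L1's mapping of L2 gfns to L1 gfns (unlisted entries map to frame 0) */
static const uint64_t l1_table[NPAGES] = { 3, 7, 1, 0, 5, 2, 4, 6 };

/* Stage one of a nested read: L2 gpa -> L1 gpa.  Only after this
 * translation may the host read the backing memory, which is what
 * kvm_read_nested_guest_page achieves by going through walk_mmu. */
static uint64_t l2_gpa_to_l1_gpa(uint64_t l2_gpa)
{
    uint64_t gfn = (l2_gpa >> PAGE_SHIFT) % NPAGES;
    uint64_t off = l2_gpa & (PAGE_SIZE - 1);

    return (l1_table[gfn] << PAGE_SHIFT) | off;
}
```

Reading directly at the L2 gpa (as the old kvm_read_guest_page call did)
would hit the wrong L1 frame entirely, which is the bug class the patch
closes for nested guests.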

[PATCH 0/27] Nested Paging Virtualization for KVM v3 (now with fixed Cc-List)

2010-09-06 Thread Joerg Roedel
(Now with correct Cc-list. I accidentally copied the wrong line from
 MAINTAINERS in the first post of this. Sorry for the double-post)

Hi Avi, Marcelo,

here is finally the third round of my NPT virtualization patches for KVM. It
took a while to get everything running (including KVM itself) on 32 bit again
to actually test it. But testing on 32 bit host and with a 32 bit hypervisor
was a very good idea. I found some serious bugs and shortcomings in my code
that are fixed now in v3.

The patchset was tested in a number of combinations:

host(64|32e)
->kvm(shadow|npt)
->guest(64|32e|32)
->test(boot|kbuild)

host(64|32e)
->kvm(npt)
->guest(64|32e|32)
->kvm(shadow|kvm)
->guest(64|32e|32)
->test(boot|kbuild)

Only the valid combinations were tested of course, so no 64 bit on 32 bit
combinations were tested. Except for that I tested all of the above
combinations and all worked without any regressions.

Other changes since v2 are:

* Addressed the review comments from v2:
- Rebased everything to latest upstream code
- renamed nested_mmu to walk_mmu to make its
  meaning clearer
- the gva_to_gpa functions are no longer swapped
  between the two mmu states, which makes it more
  consistent
- Moved struct vcpu page fault data into a separate
  sub-struct for better readability
- Other minor stuff (coding style, typos)
- Renamed the kvm_*_page_x86 functions to kvm_*_page_mmu so
  that they can be made more generic later.
* Made everything work on 32 bit
- Introduced mmu->lm_root pointer to let the softmmu shadow 32
  bit page tables with a long-mode page table. The lm_root
  page-table root always just points to the mmu.pae_root, so
  this builds entirely on the pae-shadow code.
- Split mmu_alloc_roots into a shadow and direct_map version to
  simplify the code and to keep the direct_map paths from
  breaking when changing something in that function.
* Probably other changes I forgot about

This patchset applies on today's avi/master plus the three patches I sent at the
end of last week. These patches are necessary for some of the tests above to run.

For the curious and impatient user I put everything in a branch on kernel.org.
If you want to test it you can pull the tree from

git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-kvm.git 
npt-virt-v3

Please review and/or apply these patches if considered good enough. Otherwise I
appreciate your feedback.

Thanks,

Joerg




[PATCH 05/27] KVM: MMU: Introduce get_cr3 function pointer

2010-09-06 Thread Joerg Roedel
This function pointer in the MMU context is required to
implement Nested Nested Paging.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/mmu.c  |9 -
 arch/x86/kvm/paging_tmpl.h  |4 ++--
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aeeea9c..ab708ee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -237,6 +237,7 @@ struct kvm_pio_request {
 struct kvm_mmu {
void (*new_cr3)(struct kvm_vcpu *vcpu);
void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
+   unsigned long (*get_cr3)(struct kvm_vcpu *vcpu);
int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
void (*free)(struct kvm_vcpu *vcpu);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 543ec74..d2213fa 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2365,7 +2365,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
int direct = 0;
u64 pdptr;
 
-   root_gfn = vcpu->arch.cr3 >> PAGE_SHIFT;
+   root_gfn = vcpu->arch.mmu.get_cr3(vcpu) >> PAGE_SHIFT;
 
if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
hpa_t root = vcpu->arch.mmu.root_hpa;
@@ -2561,6 +2561,11 @@ static void paging_new_cr3(struct kvm_vcpu *vcpu)
mmu_free_roots(vcpu);
 }
 
+static unsigned long get_cr3(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.cr3;
+}
+
 static void inject_page_fault(struct kvm_vcpu *vcpu,
  u64 addr,
  u32 err_code)
@@ -2712,6 +2717,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
context->root_hpa = INVALID_PAGE;
context->direct_map = true;
context->set_cr3 = kvm_x86_ops->set_tdp_cr3;
+   context->get_cr3 = get_cr3;
 
if (!is_paging(vcpu)) {
context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -2753,6 +2759,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu);
vcpu->arch.mmu.direct_map= false;
vcpu->arch.mmu.set_cr3   = kvm_x86_ops->set_cr3;
+   vcpu->arch.mmu.get_cr3   = get_cr3;
 
return r;
 }
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index e4ad3dc..13d0c06 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -130,7 +130,7 @@ walk:
present = true;
eperm = rsvd_fault = false;
walker->level = vcpu->arch.mmu.root_level;
-   pte = vcpu->arch.cr3;
+   pte = vcpu->arch.mmu.get_cr3(vcpu);
 #if PTTYPE == 64
if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
pte = kvm_pdptr_read(vcpu, (addr >> 30) & 3);
@@ -143,7 +143,7 @@ walk:
}
 #endif
ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
-  (vcpu->arch.cr3 & CR3_NONPAE_RESERVED_BITS) == 0);
+  (vcpu->arch.mmu.get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
 
pt_access = ACC_ALL;
 
-- 
1.7.0.4




[PATCH 01/27] KVM: MMU: Check for root_level instead of long mode

2010-09-06 Thread Joerg Roedel
The walk_addr function checks for !is_long_mode in its 64
bit version. But what is actually meant here is a check for
PAE paging. Change the condition to really check for PAE
paging so that it also works with nested nested paging.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/paging_tmpl.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index debe770..e4ad3dc 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -132,7 +132,7 @@ walk:
walker->level = vcpu->arch.mmu.root_level;
pte = vcpu->arch.cr3;
 #if PTTYPE == 64
-   if (!is_long_mode(vcpu)) {
+   if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
pte = kvm_pdptr_read(vcpu, (addr >> 30) & 3);
trace_kvm_mmu_paging_element(pte, walker->level);
if (!is_present_gpte(pte)) {
@@ -205,7 +205,7 @@ walk:
(PTTYPE == 64 || is_pse(vcpu))) ||
((walker->level == PT_PDPE_LEVEL) &&
is_large_pte(pte) &&
-   is_long_mode(vcpu))) {
+   vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL)) {
int lvl = walker->level;
 
walker->gfn = gpte_to_gfn_lvl(pte, lvl);
-- 
1.7.0.4



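The distinction this patch draws — CPU mode versus the format of the
table being walked — can be modeled in a few lines. With nested paging,
is_long_mode() describes the vcpu, while mmu.root_level describes the
table the walker is traversing, and the two can disagree. The sketch
below is illustrative only; the predicate names are invented and the
level constants merely mirror KVM's PT32E/PT64 root levels.

```c
#include <assert.h>

/* Illustrative root-level constants, mirroring PT32_ROOT_LEVEL,
 * PT32E_ROOT_LEVEL and PT64_ROOT_LEVEL in spirit only. */
enum { ROOT_32 = 2, ROOT_PAE = 3, ROOT_64 = 4 };

/* Pre-patch logic: keys off the vcpu's CPU mode.  Wrong when a
 * long-mode vcpu walks a PAE-format table (nested nested paging). */
static int reads_pdpte_buggy(int vcpu_long_mode, int root_level)
{
    (void)root_level;
    return !vcpu_long_mode;
}

/* Post-patch logic: keys off the format of the table being walked,
 * so the PDPTE read happens exactly when the table is PAE. */
static int reads_pdpte_fixed(int vcpu_long_mode, int root_level)
{
    (void)vcpu_long_mode;
    return root_level == ROOT_PAE;
}
```

For a long-mode vcpu walking a PAE table the buggy predicate skips the
PDPTE read while the fixed one performs it — precisely the case the
commit message describes.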

[PATCH 20/27] KVM: MMU: Add kvm_mmu parameter to load_pdptrs function

2010-09-06 Thread Joerg Roedel
This function needs to be able to load the pdptrs from any
mmu context currently in use, so change it to take a
kvm_mmu parameter to fit these needs.
As a side effect this patch also moves the cached pdptrs
from vcpu_arch into the kvm_mmu struct.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/kvm_host.h |5 +++--
 arch/x86/kvm/kvm_cache_regs.h   |2 +-
 arch/x86/kvm/svm.c  |2 +-
 arch/x86/kvm/vmx.c  |   16 
 arch/x86/kvm/x86.c  |   26 ++
 5 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 173834b..1080c0f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -259,6 +259,8 @@ struct kvm_mmu {
 
u64 *pae_root;
u64 rsvd_bits_mask[2][4];
+
+   u64 pdptrs[4]; /* pae */
 };
 
 struct kvm_vcpu_arch {
@@ -278,7 +280,6 @@ struct kvm_vcpu_arch {
unsigned long cr4_guest_owned_bits;
unsigned long cr8;
u32 hflags;
-   u64 pdptrs[4]; /* pae */
u64 efer;
u64 apic_base;
struct kvm_lapic *apic;/* kernel irqchip context */
@@ -594,7 +595,7 @@ void kvm_mmu_zap_all(struct kvm *kvm);
 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
 
-int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
+int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
 
 int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
  const void *val, int bytes);
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 6491ac8..a37abe2 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -42,7 +42,7 @@ static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
  (unsigned long *)&vcpu->arch.regs_avail))
kvm_x86_ops->cache_reg(vcpu, VCPU_EXREG_PDPTR);
 
-   return vcpu->arch.pdptrs[index];
+   return vcpu->arch.walk_mmu->pdptrs[index];
 }
 
 static inline ulong kvm_read_cr0_bits(struct kvm_vcpu *vcpu, ulong mask)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 094df31..a98ac52 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1010,7 +1010,7 @@ static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
switch (reg) {
case VCPU_EXREG_PDPTR:
BUG_ON(!npt_enabled);
-   load_pdptrs(vcpu, vcpu->arch.cr3);
+   load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3);
break;
default:
BUG();
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0e62d8a..0a70194 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1848,20 +1848,20 @@ static void ept_load_pdptrs(struct kvm_vcpu *vcpu)
return;
 
if (is_paging(vcpu) && is_pae(vcpu) && !is_long_mode(vcpu)) {
-   vmcs_write64(GUEST_PDPTR0, vcpu->arch.pdptrs[0]);
-   vmcs_write64(GUEST_PDPTR1, vcpu->arch.pdptrs[1]);
-   vmcs_write64(GUEST_PDPTR2, vcpu->arch.pdptrs[2]);
-   vmcs_write64(GUEST_PDPTR3, vcpu->arch.pdptrs[3]);
+   vmcs_write64(GUEST_PDPTR0, vcpu->arch.mmu.pdptrs[0]);
+   vmcs_write64(GUEST_PDPTR1, vcpu->arch.mmu.pdptrs[1]);
+   vmcs_write64(GUEST_PDPTR2, vcpu->arch.mmu.pdptrs[2]);
+   vmcs_write64(GUEST_PDPTR3, vcpu->arch.mmu.pdptrs[3]);
}
 }
 
 static void ept_save_pdptrs(struct kvm_vcpu *vcpu)
 {
if (is_paging(vcpu) && is_pae(vcpu) && !is_long_mode(vcpu)) {
-   vcpu->arch.pdptrs[0] = vmcs_read64(GUEST_PDPTR0);
-   vcpu->arch.pdptrs[1] = vmcs_read64(GUEST_PDPTR1);
-   vcpu->arch.pdptrs[2] = vmcs_read64(GUEST_PDPTR2);
-   vcpu->arch.pdptrs[3] = vmcs_read64(GUEST_PDPTR3);
+   vcpu->arch.mmu.pdptrs[0] = vmcs_read64(GUEST_PDPTR0);
+   vcpu->arch.mmu.pdptrs[1] = vmcs_read64(GUEST_PDPTR1);
+   vcpu->arch.mmu.pdptrs[2] = vmcs_read64(GUEST_PDPTR2);
+   vcpu->arch.mmu.pdptrs[3] = vmcs_read64(GUEST_PDPTR3);
}
 
__set_bit(VCPU_EXREG_PDPTR,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ca69dcc..337f59f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -418,17 +418,17 @@ int kvm_read_nested_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn,
 /*
  * Load the pae pdptrs.  Return true is they are all valid.
  */
-int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
+int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3)
 {
gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT;
unsigned offset = ((cr3 & (PAGE_SIZE-1)) >> 5) << 2;
int i, error;
int ret;
-   u64 pdpte[ARRAY_SIZE(vcpu->arch.pdptrs)];
+   u64 pdpte[ARRAY_SIZE(mmu->pdptrs)];
 
-   

[PATCH 22/27] KVM: MMU: Refactor mmu_alloc_roots function

2010-09-06 Thread Joerg Roedel
This patch factors out the direct-mapping paths of the
mmu_alloc_roots function into a separate function. This
makes it a lot easier to avoid all the unnecessary checks
done in the shadow path which may break when running direct.
In fact, this patch already fixes a problem when running PAE
guests on a PAE shadow page table.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/mmu.c |   82 ++--
 1 files changed, 60 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3663d1c..e7e5527 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2357,42 +2357,77 @@ static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
return ret;
 }
 
-static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
+static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
+{
+   struct kvm_mmu_page *sp;
+   int i;
+
+   if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+   spin_lock(&vcpu->kvm->mmu_lock);
+   kvm_mmu_free_some_pages(vcpu);
+   sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL,
+ 1, ACC_ALL, NULL);
+   ++sp->root_count;
+   spin_unlock(&vcpu->kvm->mmu_lock);
+   vcpu->arch.mmu.root_hpa = __pa(sp->spt);
+   } else if (vcpu->arch.mmu.shadow_root_level == PT32E_ROOT_LEVEL) {
+   for (i = 0; i < 4; ++i) {
+   hpa_t root = vcpu->arch.mmu.pae_root[i];
+
+   ASSERT(!VALID_PAGE(root));
+   spin_lock(&vcpu->kvm->mmu_lock);
+   kvm_mmu_free_some_pages(vcpu);
+   sp = kvm_mmu_get_page(vcpu, i << 30, i << 30,
+ PT32_ROOT_LEVEL, 1, ACC_ALL,
+ NULL);
+   root = __pa(sp->spt);
+   ++sp->root_count;
+   spin_unlock(&vcpu->kvm->mmu_lock);
+   vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
+   vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
+   }
+   } else
+   BUG();
+
+   return 0;
+}
+
+static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 {
int i;
gfn_t root_gfn;
struct kvm_mmu_page *sp;
-   int direct = 0;
u64 pdptr;
 
root_gfn = vcpu->arch.mmu.get_cr3(vcpu) >> PAGE_SHIFT;
 
-   if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+   if (mmu_check_root(vcpu, root_gfn))
+   return 1;
+
+   /*
+* Do we shadow a long mode page table? If so we need to
+* write-protect the guests page table root.
+*/
+   if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
hpa_t root = vcpu->arch.mmu.root_hpa;
 
ASSERT(!VALID_PAGE(root));
-   if (mmu_check_root(vcpu, root_gfn))
-   return 1;
-   if (vcpu->arch.mmu.direct_map) {
-   direct = 1;
-   root_gfn = 0;
-   }
+
spin_lock(&vcpu->kvm->mmu_lock);
kvm_mmu_free_some_pages(vcpu);
-   sp = kvm_mmu_get_page(vcpu, root_gfn, 0,
- PT64_ROOT_LEVEL, direct,
- ACC_ALL, NULL);
+   sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
+ 0, ACC_ALL, NULL);
root = __pa(sp->spt);
++sp->root_count;
spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu.root_hpa = root;
return 0;
}
-   direct = !is_paging(vcpu);
-
-   if (mmu_check_root(vcpu, root_gfn))
-   return 1;
 
+   /*
+* We shadow a 32 bit page table. This may be a legacy 2-level
+* or a PAE 3-level page table.
+*/
for (i = 0; i < 4; ++i) {
hpa_t root = vcpu->arch.mmu.pae_root[i];
 
@@ -2406,16 +2441,11 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
root_gfn = pdptr >> PAGE_SHIFT;
if (mmu_check_root(vcpu, root_gfn))
return 1;
-   } else if (vcpu->arch.mmu.root_level == 0)
-   root_gfn = 0;
-   if (vcpu->arch.mmu.direct_map) {
-   direct = 1;
-   root_gfn = i << 30;
}
spin_lock(&vcpu->kvm->mmu_lock);
kvm_mmu_free_some_pages(vcpu);
sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
- PT32_ROOT_LEVEL, direct,
+ PT32_ROOT_LEVEL, 0,
  ACC_ALL, NULL);
root = __pa(sp->spt);
++sp->root_count;

[PATCH 18/27] KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa

2010-09-06 Thread Joerg Roedel
This patch implements logic to make sure that either a
page-fault/page-fault-vmexit or a nested-page-fault-vmexit
is propagated back to the guest.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/x86.c  |   19 +--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e5eb57c..173834b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -663,6 +663,7 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long cr2,
 int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
gfn_t gfn, void *data, int offset, int len,
u32 *error);
+void kvm_propagate_fault(struct kvm_vcpu *vcpu);
 bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
 
 int kvm_pic_set_irq(void *opaque, int irq, int level);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 38d482d..65b00f0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -337,6 +337,22 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, unsigned long addr,
kvm_queue_exception_e(vcpu, PF_VECTOR, error_code);
 }
 
+void kvm_propagate_fault(struct kvm_vcpu *vcpu)
+{
+   unsigned long address;
+   u32 nested, error;
+
+   address = vcpu->arch.fault.address;
+   error   = vcpu->arch.fault.error_code;
+   nested  = error &  PFERR_NESTED_MASK;
+   error   = error & ~PFERR_NESTED_MASK;
+
+   if (mmu_is_nested(vcpu) && !nested)
+   vcpu->arch.nested_mmu.inject_page_fault(vcpu, address, error);
+   else
+   vcpu->arch.mmu.inject_page_fault(vcpu, address, error);
+}
+
 void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 {
vcpu->arch.nmi_pending = 1;
@@ -4130,8 +4146,7 @@ static void inject_emulated_exception(struct kvm_vcpu *vcpu)
 {
struct x86_emulate_ctxt *ctxt = &vcpu->arch.emulate_ctxt;
if (ctxt->exception == PF_VECTOR)
-   kvm_inject_page_fault(vcpu, vcpu->arch.fault.address,
-   vcpu->arch.fault.error_code);
+   kvm_propagate_fault(vcpu);
else if (ctxt->error_code_valid)
kvm_queue_exception_e(vcpu, ctxt->exception, ctxt->error_code);
else
-- 
1.7.0.4



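The routing decision kvm_propagate_fault makes can be captured in a
compact model: a fault raised while the nested walk_mmu is active and
not already flagged as a nested fault belongs to the L2 guest; anything
else takes the ordinary mmu path (which, for nested SVM, becomes the
NPF vmexit to L1). This is a sketch only — the bit position chosen for
the nested flag and the enum names are illustrative, not KVM's actual
definitions.

```c
#include <stdint.h>
#include <assert.h>

/* Illustrative flag bit; KVM's real PFERR_NESTED_MASK may differ. */
#define PFERR_NESTED_FLAG (1u << 31)

enum fault_target {
    TO_NESTED_MMU,  /* inject the page fault into the L2 guest      */
    TO_MMU          /* ordinary path: L1 page fault or NPF vmexit   */
};

/* Model of the decision in kvm_propagate_fault. */
static enum fault_target route_fault(int mmu_is_nested, uint32_t error_code)
{
    int nested = !!(error_code & PFERR_NESTED_FLAG);

    if (mmu_is_nested && !nested)
        return TO_NESTED_MMU;

    return TO_MMU;
}
```

The point of the patch is exactly this fork: before it, every emulated
page fault went through kvm_inject_page_fault, which always targeted the
currently loaded mmu and could deliver an L2 fault to the wrong level.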

[PATCH 23/27] KVM: MMU: Allow long mode shadows for legacy page tables

2010-09-06 Thread Joerg Roedel
Currently the KVM softmmu implementation cannot shadow a 32
bit legacy or PAE page table with a long mode page table.
This is a required feature for nested paging emulation
because the nested page table must always be in host format.
So this patch implements the missing pieces that allow long
mode page tables to shadow these legacy page table types.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/mmu.c  |   60 +-
 2 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1080c0f..475fc70 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -258,6 +258,7 @@ struct kvm_mmu {
bool direct_map;
 
u64 *pae_root;
+   u64 *lm_root;
u64 rsvd_bits_mask[2][4];
 
u64 pdptrs[4]; /* pae */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e7e5527..ea8ed8b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1504,6 +1504,12 @@ static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
iterator->addr = addr;
iterator->shadow_addr = vcpu->arch.mmu.root_hpa;
iterator->level = vcpu->arch.mmu.shadow_root_level;
+
+   if (iterator->level == PT64_ROOT_LEVEL &&
+   vcpu->arch.mmu.root_level < PT64_ROOT_LEVEL &&
+   !vcpu->arch.mmu.direct_map)
+   --iterator->level;
+
if (iterator->level == PT32E_ROOT_LEVEL) {
iterator->shadow_addr
= vcpu->arch.mmu.pae_root[(addr >> 30) & 3];
@@ -2314,7 +2320,9 @@ static void mmu_free_roots(struct kvm_vcpu *vcpu)
if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
return;
spin_lock(&vcpu->kvm->mmu_lock);
-   if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+   if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL &&
+   (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL ||
+vcpu->arch.mmu.direct_map)) {
hpa_t root = vcpu->arch.mmu.root_hpa;
 
sp = page_header(root);
@@ -2394,10 +2402,10 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 
 static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 {
-   int i;
-   gfn_t root_gfn;
struct kvm_mmu_page *sp;
-   u64 pdptr;
+   u64 pdptr, pm_mask;
+   gfn_t root_gfn;
+   int i;
 
root_gfn = vcpu->arch.mmu.get_cr3(vcpu) >> PAGE_SHIFT;
 
@@ -2426,8 +2434,13 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 
/*
 * We shadow a 32 bit page table. This may be a legacy 2-level
-* or a PAE 3-level page table.
+* or a PAE 3-level page table. In either case we need to be aware that
+* the shadow page table may be a PAE or a long mode page table.
 */
+   pm_mask = PT_PRESENT_MASK;
+   if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL)
+   pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
+
for (i = 0; i < 4; ++i) {
hpa_t root = vcpu->arch.mmu.pae_root[i];
 
@@ -2451,9 +2464,35 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
++sp->root_count;
spin_unlock(&vcpu->kvm->mmu_lock);
 
-   vcpu->arch.mmu.pae_root[i] = root | PT_PRESENT_MASK;
+   vcpu->arch.mmu.pae_root[i] = root | pm_mask;
+   vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
}
-   vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.pae_root);
+
+   /*
+* If we shadow a 32 bit page table with a long mode page
+* table we enter this path.
+*/
+   if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+   if (vcpu->arch.mmu.lm_root == NULL) {
+   /*
+* The additional page necessary for this is only
+* allocated on demand.
+*/
+
+   u64 *lm_root;
+
+   lm_root = (void*)get_zeroed_page(GFP_KERNEL);
+   if (lm_root == NULL)
+   return 1;
+
+   lm_root[0] = __pa(vcpu->arch.mmu.pae_root) | pm_mask;
+
+   vcpu->arch.mmu.lm_root = lm_root;
+   }
+
+   vcpu->arch.mmu.root_hpa = __pa(vcpu->arch.mmu.lm_root);
+   }
+
return 0;
 }
 
@@ -2470,9 +2509,12 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu)
int i;
struct kvm_mmu_page *sp;
 
+   if (vcpu->arch.mmu.direct_map)
+   return;
+
if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
return;
-   if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
+   if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
hpa_t root = vcpu->arch.mmu.root_hpa;
sp = page_header(root);
mmu_sy

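The lm_root trick in the patch above amounts to wrapping the existing
PAE root with one extra long-mode level whose single entry points at
pae_root, carrying the relaxed pm_mask bits so the hardware walk
succeeds. The sketch below models just that entry construction; the
flag values follow the usual x86 PTE bit layout, but the function and
parameter names are invented for illustration and are not KVM code.

```c
#include <stdint.h>
#include <assert.h>

/* Standard x86 page-table entry bits (illustrative subset). */
#define PT_PRESENT_MASK   0x01ull
#define PT_WRITABLE_MASK  0x02ull
#define PT_USER_MASK      0x04ull
#define PT_ACCESSED_MASK  0x20ull

/* Model of the pm_mask logic in mmu_alloc_shadow_roots: a pae_root
 * entry referenced from a long-mode shadow table needs the extra
 * accessed/writable/user bits, since real hardware walks it. */
static uint64_t make_root_entry(uint64_t pae_root_pa, int shadow_is_long_mode)
{
    uint64_t pm_mask = PT_PRESENT_MASK;

    if (shadow_is_long_mode)
        pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;

    return pae_root_pa | pm_mask;
}
```

With a plain PAE shadow the entry only needs the present bit, which is
why the original code got away with PT_PRESENT_MASK alone.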
[PATCH 12/27] KVM: MMU: Implement nested gva_to_gpa functions

2010-09-06 Thread Joerg Roedel
This patch adds the functions to do a nested l2_gva to
l1_gpa page table walk.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/kvm_host.h |   10 ++
 arch/x86/kvm/mmu.c  |8 
 arch/x86/kvm/paging_tmpl.h  |   31 +++
 arch/x86/kvm/x86.h  |5 +
 4 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d797746..9b9c096 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -298,6 +298,16 @@ struct kvm_vcpu_arch {
struct kvm_mmu mmu;
 
/*
+* Paging state of an L2 guest (used for nested npt)
+*
+* This context will save all necessary information to walk page tables
	 * of an L2 guest. This context is only initialized for page table
+* walking and not for faulting since we never handle l2 page faults on
+* the host.
+*/
+   struct kvm_mmu nested_mmu;
+
+   /*
 * Pointer to the mmu context currently used for
 * gva_to_gpa translations.
 */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a2cd2ce..1f425f3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2466,6 +2466,14 @@ static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
return vaddr;
 }
 
+static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
+u32 access, u32 *error)
+{
+   if (error)
+   *error = 0;
+   return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, error);
+}
+
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
u32 error_code)
 {
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index f26fee9..cd59af1 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -272,6 +272,16 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
write_fault, user_fault, fetch_fault);
 }
 
+static int FNAME(walk_addr_nested)(struct guest_walker *walker,
+  struct kvm_vcpu *vcpu, gva_t addr,
+  int write_fault, int user_fault,
+  int fetch_fault)
+{
+   return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.nested_mmu,
+   addr, write_fault, user_fault,
+   fetch_fault);
+}
+
 static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
  u64 *spte, const void *pte)
 {
@@ -656,6 +666,27 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access,
return gpa;
 }
 
+static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gva_t vaddr,
+ u32 access, u32 *error)
+{
+   struct guest_walker walker;
+   gpa_t gpa = UNMAPPED_GVA;
+   int r;
+
+   r = FNAME(walk_addr_nested)(&walker, vcpu, vaddr,
+   access & PFERR_WRITE_MASK,
+   access & PFERR_USER_MASK,
+   access & PFERR_FETCH_MASK);
+
+   if (r) {
+   gpa = gfn_to_gpa(walker.gfn);
+   gpa |= vaddr & ~PAGE_MASK;
+   } else if (error)
+   *error = walker.error_code;
+
+   return gpa;
+}
+
 static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
 struct kvm_mmu_page *sp)
 {
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 2d6385e..bf4dc2f 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -50,6 +50,11 @@ static inline int is_long_mode(struct kvm_vcpu *vcpu)
 #endif
 }
 
+static inline bool mmu_is_nested(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu;
+}
+
 static inline int is_pae(struct kvm_vcpu *vcpu)
 {
return kvm_read_cr4_bits(vcpu, X86_CR4_PAE);
-- 
1.7.0.4




[PATCH 24/27] KVM: SVM: Implement MMU helper functions for Nested Nested Paging

2010-09-06 Thread Joerg Roedel
This patch adds the helper functions which will be used in
the mmu context for handling nested nested page faults.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/svm.c |   32 
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a98ac52..6e72ba9 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -104,6 +104,8 @@ struct nested_state {
u32 intercept_exceptions;
u64 intercept;
 
+   /* Nested Paging related state */
+   u64 nested_cr3;
 };
 
 #define MSRPM_OFFSETS  16
@@ -1600,6 +1602,36 @@ static int vmmcall_interception(struct vcpu_svm *svm)
return 1;
 }
 
+static unsigned long nested_svm_get_tdp_cr3(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_svm *svm = to_svm(vcpu);
+
+   return svm->nested.nested_cr3;
+}
+
+static void nested_svm_set_tdp_cr3(struct kvm_vcpu *vcpu,
+  unsigned long root)
+{
+   struct vcpu_svm *svm = to_svm(vcpu);
+
+   svm->vmcb->control.nested_cr3 = root;
+   force_new_asid(vcpu);
+}
+
+static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
+  unsigned long addr,
+  u32 error_code)
+{
+   struct vcpu_svm *svm = to_svm(vcpu);
+
+   svm->vmcb->control.exit_code = SVM_EXIT_NPF;
+   svm->vmcb->control.exit_code_hi = 0;
+   svm->vmcb->control.exit_info_1 = error_code;
+   svm->vmcb->control.exit_info_2 = addr;
+
+   nested_svm_vmexit(svm);
+}
+
 static int nested_svm_check_permissions(struct vcpu_svm *svm)
 {
if (!(svm->vcpu.arch.efer & EFER_SVME)
-- 
1.7.0.4




[PATCH 06/27] KVM: MMU: Introduce inject_page_fault function pointer

2010-09-06 Thread Joerg Roedel
This patch introduces an inject_page_fault function pointer
into struct kvm_mmu which will be used to inject a page
fault. This will be used later when Nested Nested Paging is
implemented.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/kvm_host.h |3 +++
 arch/x86/kvm/mmu.c  |4 +++-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ab708ee..3fefcd8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -239,6 +239,9 @@ struct kvm_mmu {
void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
unsigned long (*get_cr3)(struct kvm_vcpu *vcpu);
int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
+   void (*inject_page_fault)(struct kvm_vcpu *vcpu,
+ unsigned long addr,
+ u32 error_code);
void (*free)(struct kvm_vcpu *vcpu);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
u32 *error);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d2213fa..5b55451 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2570,7 +2570,7 @@ static void inject_page_fault(struct kvm_vcpu *vcpu,
  u64 addr,
  u32 err_code)
 {
-   kvm_inject_page_fault(vcpu, addr, err_code);
+   vcpu->arch.mmu.inject_page_fault(vcpu, addr, err_code);
 }
 
 static void paging_free(struct kvm_vcpu *vcpu)
@@ -2718,6 +2718,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
context->direct_map = true;
context->set_cr3 = kvm_x86_ops->set_tdp_cr3;
context->get_cr3 = get_cr3;
+   context->inject_page_fault = kvm_inject_page_fault;
 
if (!is_paging(vcpu)) {
context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -2760,6 +2761,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
vcpu->arch.mmu.direct_map= false;
vcpu->arch.mmu.set_cr3   = kvm_x86_ops->set_cr3;
vcpu->arch.mmu.get_cr3   = get_cr3;
+   vcpu->arch.mmu.inject_page_fault = kvm_inject_page_fault;
 
return r;
 }
-- 
1.7.0.4




[PATCH 25/27] KVM: SVM: Initialize Nested Nested MMU context on VMRUN

2010-09-06 Thread Joerg Roedel
This patch adds code to initialize the Nested Nested Paging
MMU context when the L1 guest executes a VMRUN instruction
and has nested paging enabled in its VMCB.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/mmu.c |1 +
 arch/x86/kvm/svm.c |   50 +-
 2 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ea8ed8b..cf4474b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2945,6 +2945,7 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu)
 {
mmu_free_roots(vcpu);
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_unload);
 
 static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu,
  struct kvm_mmu_page *sp,
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 6e72ba9..949e10d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -294,6 +294,15 @@ static inline void flush_guest_tlb(struct kvm_vcpu *vcpu)
force_new_asid(vcpu);
 }
 
+static int get_npt_level(void)
+{
+#ifdef CONFIG_X86_64
+   return PT64_ROOT_LEVEL;
+#else
+   return PT32E_ROOT_LEVEL;
+#endif
+}
+
 static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
vcpu->arch.efer = efer;
@@ -1632,6 +1641,26 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
nested_svm_vmexit(svm);
 }
 
+static int nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+   int r;
+
+   r = kvm_init_shadow_mmu(vcpu, &vcpu->arch.mmu);
+
+   vcpu->arch.mmu.set_cr3   = nested_svm_set_tdp_cr3;
+   vcpu->arch.mmu.get_cr3   = nested_svm_get_tdp_cr3;
+   vcpu->arch.mmu.inject_page_fault = nested_svm_inject_npf_exit;
+   vcpu->arch.mmu.shadow_root_level = get_npt_level();
+   vcpu->arch.walk_mmu  = &vcpu->arch.nested_mmu;
+
+   return r;
+}
+
+static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.walk_mmu = &vcpu->arch.mmu;
+}
+
 static int nested_svm_check_permissions(struct vcpu_svm *svm)
 {
if (!(svm->vcpu.arch.efer & EFER_SVME)
@@ -2000,6 +2029,8 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
kvm_clear_exception_queue(&svm->vcpu);
kvm_clear_interrupt_queue(&svm->vcpu);
 
+   svm->nested.nested_cr3 = 0;
+
/* Restore selected save entries */
svm->vmcb->save.es = hsave->save.es;
svm->vmcb->save.cs = hsave->save.cs;
@@ -2026,6 +2057,7 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
 
nested_svm_unmap(page);
 
+   nested_svm_uninit_mmu_context(&svm->vcpu);
kvm_mmu_reset_context(&svm->vcpu);
kvm_mmu_load(&svm->vcpu);
 
@@ -2073,6 +2105,9 @@ static bool nested_vmcb_checks(struct vmcb *vmcb)
if (vmcb->control.asid == 0)
return false;
 
+   if (vmcb->control.nested_ctl && !npt_enabled)
+   return false;
+
return true;
 }
 
@@ -2145,6 +2180,12 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
else
svm->vcpu.arch.hflags &= ~HF_HIF_MASK;
 
+   if (nested_vmcb->control.nested_ctl) {
+   kvm_mmu_unload(&svm->vcpu);
+   svm->nested.nested_cr3 = nested_vmcb->control.nested_cr3;
+   nested_svm_init_mmu_context(&svm->vcpu);
+   }
+
/* Load the nested guest state */
svm->vmcb->save.es = nested_vmcb->save.es;
svm->vmcb->save.cs = nested_vmcb->save.cs;
@@ -3412,15 +3453,6 @@ static bool svm_cpu_has_accelerated_tpr(void)
return false;
 }
 
-static int get_npt_level(void)
-{
-#ifdef CONFIG_X86_64
-   return PT64_ROOT_LEVEL;
-#else
-   return PT32E_ROOT_LEVEL;
-#endif
-}
-
 static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
return 0;
-- 
1.7.0.4




[PATCH 27/27] KVM: SVM: Report Nested Paging support to userspace

2010-09-06 Thread Joerg Roedel
This patch reports nested paging feature support to
userspace.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/svm.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 932183e..dd6c529 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3478,6 +3478,10 @@ static void svm_set_supported_cpuid(u32 func, struct 
kvm_cpuid_entry2 *entry)
if (svm_has(SVM_FEATURE_NRIP))
entry->edx |= SVM_FEATURE_NRIP;
 
+   /* Support NPT for the guest if enabled */
+   if (npt_enabled)
+   entry->edx |= SVM_FEATURE_NPT;
+
break;
}
 }
-- 
1.7.0.4




[PATCH 26/27] KVM: SVM: Expect two more candidates for exit_int_info

2010-09-06 Thread Joerg Roedel
This patch adds the INTR and NMI intercepts to the list of
intercepts expected to carry a set exit_int_info. While this
can't happen on bare metal, it is architecturally legal and
may happen with KVM's SVM emulation.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/svm.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 949e10d..932183e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2993,7 +2993,8 @@ static int handle_exit(struct kvm_vcpu *vcpu)
 
if (is_external_interrupt(svm->vmcb->control.exit_int_info) &&
exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR &&
-   exit_code != SVM_EXIT_NPF && exit_code != SVM_EXIT_TASK_SWITCH)
+   exit_code != SVM_EXIT_NPF && exit_code != SVM_EXIT_TASK_SWITCH &&
+   exit_code != SVM_EXIT_INTR && exit_code != SVM_EXIT_NMI)
printk(KERN_ERR "%s: unexpected exit_ini_info 0x%x "
   "exit_code 0x%x\n",
   __func__, svm->vmcb->control.exit_int_info,
-- 
1.7.0.4




[PATCH 11/27] KVM: X86: Introduce pointer to mmu context used for gva_to_gpa

2010-09-06 Thread Joerg Roedel
This patch introduces the walk_mmu pointer which points to
the mmu-context currently used for gva_to_gpa translations.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/kvm_host.h |   14 ++
 arch/x86/kvm/mmu.c  |   10 +-
 arch/x86/kvm/x86.c  |   17 ++---
 3 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index af8cce3..d797746 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -288,7 +288,21 @@ struct kvm_vcpu_arch {
u64 ia32_misc_enable_msr;
bool tpr_access_reporting;
 
+   /*
+* Paging state of the vcpu
+*
+* If the vcpu runs in guest mode with two level paging this still saves
+* the paging mode of the l1 guest. This context is always used to
+* handle faults.
+*/
struct kvm_mmu mmu;
+
+   /*
+* Pointer to the mmu context currently used for
+* gva_to_gpa translations.
+*/
+   struct kvm_mmu *walk_mmu;
+
/* only needed in kvm_pv_mmu_op() path, but it's hot so
 * put it here to avoid allocation */
struct kvm_pv_mmu_op_buffer mmu_op_buffer;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9668f91..a2cd2ce 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2707,7 +2707,7 @@ static int paging32E_init_context(struct kvm_vcpu *vcpu,
 
 static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
 {
-   struct kvm_mmu *context = &vcpu->arch.mmu;
+   struct kvm_mmu *context = vcpu->arch.walk_mmu;
 
context->new_cr3 = nonpaging_new_cr3;
context->page_fault = tdp_page_fault;
@@ -2767,11 +2767,11 @@ EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
-   int r = kvm_init_shadow_mmu(vcpu, &vcpu->arch.mmu);
+   int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
 
-   vcpu->arch.mmu.set_cr3   = kvm_x86_ops->set_cr3;
-   vcpu->arch.mmu.get_cr3   = get_cr3;
-   vcpu->arch.mmu.inject_page_fault = kvm_inject_page_fault;
+   vcpu->arch.walk_mmu->set_cr3   = kvm_x86_ops->set_cr3;
+   vcpu->arch.walk_mmu->get_cr3   = get_cr3;
+   vcpu->arch.walk_mmu->inject_page_fault = kvm_inject_page_fault;
 
return r;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 829efb0..e5dcf7f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3441,27 +3441,27 @@ static gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t 
gpa, u32 *error)
 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
 {
u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
-   return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error);
+   return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, error);
 }
 
  gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
 {
u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
access |= PFERR_FETCH_MASK;
-   return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error);
+   return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, error);
 }
 
 gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
 {
u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
access |= PFERR_WRITE_MASK;
-   return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error);
+   return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, error);
 }
 
 /* uses this to access any guest's mapped memory without checking CPL */
 gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
 {
-   return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, 0, error);
+   return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, 0, error);
 }
 
 static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int 
bytes,
@@ -3472,7 +3472,8 @@ static int kvm_read_guest_virt_helper(gva_t addr, void 
*val, unsigned int bytes,
int r = X86EMUL_CONTINUE;
 
while (bytes) {
-   gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr, access, 
error);
+   gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access,
+   error);
unsigned offset = addr & (PAGE_SIZE-1);
unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset);
int ret;
@@ -3527,8 +3528,9 @@ static int kvm_write_guest_virt_system(gva_t addr, void 
*val,
int r = X86EMUL_CONTINUE;
 
while (bytes) {
-   gpa_t gpa =  vcpu->arch.mmu.gva_to_gpa(vcpu, addr,
-  PFERR_WRITE_MASK, error);
+   gpa_t gpa =  vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr,
+PFERR_WRITE_MASK,
+

[PATCH 04/27] KVM: X86: Introduce a tdp_set_cr3 function

2010-09-06 Thread Joerg Roedel
This patch introduces a special set_tdp_cr3 function pointer
in kvm_x86_ops which is only used for tdp-enabled mmu
contexts. This allows removing some hacks from the svm code.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/kvm_host.h |2 ++
 arch/x86/kvm/mmu.c  |2 +-
 arch/x86/kvm/svm.c  |   23 ++-
 arch/x86/kvm/vmx.c  |2 ++
 4 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 43c8db0..aeeea9c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -526,6 +526,8 @@ struct kvm_x86_ops {
bool (*rdtscp_supported)(void);
void (*adjust_tsc_offset)(struct kvm_vcpu *vcpu, s64 adjustment);
 
+   void (*set_tdp_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3);
+
void (*set_supported_cpuid)(u32 func, struct kvm_cpuid_entry2 *entry);
 
bool (*has_wbinvd_exit)(void);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2ac3851..543ec74 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2711,7 +2711,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
context->shadow_root_level = kvm_x86_ops->get_tdp_level();
context->root_hpa = INVALID_PAGE;
context->direct_map = true;
-   context->set_cr3 = kvm_x86_ops->set_cr3;
+   context->set_cr3 = kvm_x86_ops->set_tdp_cr3;
 
if (!is_paging(vcpu)) {
context->gva_to_gpa = nonpaging_gva_to_gpa;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 6808f64..094df31 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3216,9 +3216,6 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
gs_selector = kvm_read_gs();
ldt_selector = kvm_read_ldt();
svm->vmcb->save.cr2 = vcpu->arch.cr2;
-   /* required for live migration with NPT */
-   if (npt_enabled)
-   svm->vmcb->save.cr3 = vcpu->arch.cr3;
 
clgi();
 
@@ -3335,16 +3332,22 @@ static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned 
long root)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
-   if (npt_enabled) {
-   svm->vmcb->control.nested_cr3 = root;
-   force_new_asid(vcpu);
-   return;
-   }
-
svm->vmcb->save.cr3 = root;
force_new_asid(vcpu);
 }
 
+static void set_tdp_cr3(struct kvm_vcpu *vcpu, unsigned long root)
+{
+   struct vcpu_svm *svm = to_svm(vcpu);
+
+   svm->vmcb->control.nested_cr3 = root;
+
+   /* Also sync guest cr3 here in case we live migrate */
+   svm->vmcb->save.cr3 = vcpu->arch.cr3;
+
+   force_new_asid(vcpu);
+}
+
 static int is_disabled(void)
 {
u64 vm_cr;
@@ -3571,6 +3574,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 
.write_tsc_offset = svm_write_tsc_offset,
.adjust_tsc_offset = svm_adjust_tsc_offset,
+
+   .set_tdp_cr3 = set_tdp_cr3,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 676555c..0e62d8a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4347,6 +4347,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 
.write_tsc_offset = vmx_write_tsc_offset,
.adjust_tsc_offset = vmx_adjust_tsc_offset,
+
+   .set_tdp_cr3 = vmx_set_cr3,
 };
 
 static int __init vmx_init(void)
-- 
1.7.0.4




[PATCH 02/27] KVM: MMU: Make tdp_enabled a mmu-context parameter

2010-09-06 Thread Joerg Roedel
This patch moves the tdp_enabled flag from its global
scope into the mmu context and renames it to direct_map
there. This is necessary for Nested SVM with emulation of
Nested Paging, where we need an extra MMU context to shadow
the Nested Nested Page Table.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/mmu.c  |   20 
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9b30285..53cdf39 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -249,6 +249,7 @@ struct kvm_mmu {
int root_level;
int shadow_root_level;
union kvm_mmu_page_role base_role;
+   bool direct_map;
 
u64 *pae_root;
u64 rsvd_bits_mask[2][4];
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b2136f9..bfb3f23 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1448,7 +1448,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
if (role.direct)
role.cr4_pae = 0;
role.access = access;
-   if (!tdp_enabled && vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
+   if (!vcpu->arch.mmu.direct_map
+   && vcpu->arch.mmu.root_level <= PT32_ROOT_LEVEL) {
quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level));
quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1;
role.quadrant = quadrant;
@@ -1973,7 +1974,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
spte |= shadow_user_mask;
if (level > PT_PAGE_TABLE_LEVEL)
spte |= PT_PAGE_SIZE_MASK;
-   if (tdp_enabled)
+   if (vcpu->arch.mmu.direct_map)
spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
kvm_is_mmio_pfn(pfn));
 
@@ -1983,8 +1984,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
spte |= (u64)pfn << PAGE_SHIFT;
 
if ((pte_access & ACC_WRITE_MASK)
-   || (!tdp_enabled && write_fault && !is_write_protection(vcpu)
-   && !user_fault)) {
+   || (!vcpu->arch.mmu.direct_map && write_fault
+   && !is_write_protection(vcpu) && !user_fault)) {
 
if (level > PT_PAGE_TABLE_LEVEL &&
has_wrprotected_page(vcpu->kvm, gfn, level)) {
@@ -1995,7 +1996,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
spte |= PT_WRITABLE_MASK;
 
-   if (!tdp_enabled && !(pte_access & ACC_WRITE_MASK))
+   if (!vcpu->arch.mmu.direct_map
+   && !(pte_access & ACC_WRITE_MASK))
spte &= ~PT_USER_MASK;
 
/*
@@ -2371,7 +2373,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
ASSERT(!VALID_PAGE(root));
if (mmu_check_root(vcpu, root_gfn))
return 1;
-   if (tdp_enabled) {
+   if (vcpu->arch.mmu.direct_map) {
direct = 1;
root_gfn = 0;
}
@@ -2406,7 +2408,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
return 1;
} else if (vcpu->arch.mmu.root_level == 0)
root_gfn = 0;
-   if (tdp_enabled) {
+   if (vcpu->arch.mmu.direct_map) {
direct = 1;
root_gfn = i << 30;
}
@@ -2708,6 +2710,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
context->invlpg = nonpaging_invlpg;
context->shadow_root_level = kvm_x86_ops->get_tdp_level();
context->root_hpa = INVALID_PAGE;
+   context->direct_map = true;
 
if (!is_paging(vcpu)) {
context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -2747,6 +2750,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 
vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu);
+   vcpu->arch.mmu.direct_map= false;
 
return r;
 }
@@ -3060,7 +3064,7 @@ int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, 
gva_t gva)
gpa_t gpa;
int r;
 
-   if (tdp_enabled)
+   if (vcpu->arch.mmu.direct_map)
return 0;
 
gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
-- 
1.7.0.4




[PATCH 03/27] KVM: MMU: Make set_cr3 a function pointer in kvm_mmu

2010-09-06 Thread Joerg Roedel
This is necessary to implement Nested Nested Paging. As a
side effect this allows some cleanups in the SVM nested
paging code.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/mmu.c  |4 +++-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 53cdf39..43c8db0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -236,6 +236,7 @@ struct kvm_pio_request {
  */
 struct kvm_mmu {
void (*new_cr3)(struct kvm_vcpu *vcpu);
+   void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
void (*free)(struct kvm_vcpu *vcpu);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index bfb3f23..2ac3851 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2711,6 +2711,7 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
context->shadow_root_level = kvm_x86_ops->get_tdp_level();
context->root_hpa = INVALID_PAGE;
context->direct_map = true;
+   context->set_cr3 = kvm_x86_ops->set_cr3;
 
if (!is_paging(vcpu)) {
context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -2751,6 +2752,7 @@ static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
vcpu->arch.mmu.base_role.cr0_wp = is_write_protection(vcpu);
vcpu->arch.mmu.direct_map= false;
+   vcpu->arch.mmu.set_cr3   = kvm_x86_ops->set_cr3;
 
return r;
 }
@@ -2794,7 +2796,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
if (r)
goto out;
/* set_cr3() should ensure TLB has been flushed */
-   kvm_x86_ops->set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
+   vcpu->arch.mmu.set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
 out:
return r;
 }
-- 
1.7.0.4




[PATCH 21/27] KVM: MMU: Introduce kvm_pdptr_read_mmu

2010-09-06 Thread Joerg Roedel
This function is implemented to load the pdptrs of the
currently running guest (l1 or l2). It takes the current
paging mode into account and can read pdptrs out of l2
guest physical memory.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/kvm_cache_regs.h |7 +++
 arch/x86/kvm/mmu.c|2 +-
 arch/x86/kvm/paging_tmpl.h|2 +-
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index a37abe2..975bb45 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -45,6 +45,13 @@ static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int 
index)
return vcpu->arch.walk_mmu->pdptrs[index];
 }
 
+static inline u64 kvm_pdptr_read_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu 
*mmu, int index)
+{
+   load_pdptrs(vcpu, mmu, mmu->get_cr3(vcpu));
+
+   return mmu->pdptrs[index];
+}
+
 static inline ulong kvm_read_cr0_bits(struct kvm_vcpu *vcpu, ulong mask)
 {
ulong tmask = mask & KVM_POSSIBLE_CR0_GUEST_BITS;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7bc8d67..3663d1c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2398,7 +2398,7 @@ static int mmu_alloc_roots(struct kvm_vcpu *vcpu)
 
ASSERT(!VALID_PAGE(root));
if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
-   pdptr = kvm_pdptr_read(vcpu, i);
+   pdptr = kvm_pdptr_read_mmu(vcpu, &vcpu->arch.mmu, i);
if (!is_present_gpte(pdptr)) {
vcpu->arch.mmu.pae_root[i] = 0;
continue;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 20fc815..c0aac98 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -137,7 +137,7 @@ walk:
 
 #if PTTYPE == 64
if (walker->level == PT32E_ROOT_LEVEL) {
-   pte = kvm_pdptr_read(vcpu, (addr >> 30) & 3);
+   pte = kvm_pdptr_read_mmu(vcpu, mmu, (addr >> 30) & 3);
trace_kvm_mmu_paging_element(pte, walker->level);
if (!is_present_gpte(pte)) {
present = false;
-- 
1.7.0.4




[PATCH 09/27] KVM: MMU: Introduce generic walk_addr function

2010-09-06 Thread Joerg Roedel
This is the first patch in the series towards a generic
walk_addr implementation which can walk two-dimensional
page tables in the end. In this first step the walk_addr
function is renamed to walk_addr_generic, which takes an
mmu context as an additional parameter.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/paging_tmpl.h |   26 ++
 1 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 68ee1b7..f26fee9 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -114,9 +114,10 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, 
pt_element_t gpte)
 /*
  * Fetch a guest pte for a guest virtual address
  */
-static int FNAME(walk_addr)(struct guest_walker *walker,
-   struct kvm_vcpu *vcpu, gva_t addr,
-   int write_fault, int user_fault, int fetch_fault)
+static int FNAME(walk_addr_generic)(struct guest_walker *walker,
+   struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+   gva_t addr, int write_fault,
+   int user_fault, int fetch_fault)
 {
pt_element_t pte;
gfn_t table_gfn;
@@ -129,10 +130,11 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
 walk:
present = true;
eperm = rsvd_fault = false;
-   walker->level = vcpu->arch.mmu.root_level;
-   pte = vcpu->arch.mmu.get_cr3(vcpu);
+   walker->level = mmu->root_level;
+   pte   = mmu->get_cr3(vcpu);
+
 #if PTTYPE == 64
-   if (vcpu->arch.mmu.root_level == PT32E_ROOT_LEVEL) {
+   if (walker->level == PT32E_ROOT_LEVEL) {
pte = kvm_pdptr_read(vcpu, (addr >> 30) & 3);
trace_kvm_mmu_paging_element(pte, walker->level);
if (!is_present_gpte(pte)) {
@@ -143,7 +145,7 @@ walk:
}
 #endif
ASSERT((!is_long_mode(vcpu) && is_pae(vcpu)) ||
-  (vcpu->arch.mmu.get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
+  (mmu->get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
 
pt_access = ACC_ALL;
 
@@ -205,7 +207,7 @@ walk:
(PTTYPE == 64 || is_pse(vcpu))) ||
((walker->level == PT_PDPE_LEVEL) &&
is_large_pte(pte) &&
-   vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL)) {
+   mmu->root_level == PT64_ROOT_LEVEL)) {
int lvl = walker->level;
 
walker->gfn = gpte_to_gfn_lvl(pte, lvl);
@@ -262,6 +264,14 @@ error:
return 0;
 }
 
+static int FNAME(walk_addr)(struct guest_walker *walker,
+   struct kvm_vcpu *vcpu, gva_t addr,
+   int write_fault, int user_fault, int fetch_fault)
+{
+   return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.mmu, addr,
+   write_fault, user_fault, fetch_fault);
+}
+
 static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
  u64 *spte, const void *pte)
 {
-- 
1.7.0.4




[PATCH 17/27] KVM: MMU: Track page fault data in struct vcpu

2010-09-06 Thread Joerg Roedel
This patch introduces a struct with two new fields in
vcpu_arch for x86:

* fault.address
* fault.error_code

This will be used to correctly propagate page faults back
into the guest when we could have either an ordinary page
fault or a nested page fault. In the case of a nested page
fault the fault address is different from the original
address that should be walked, so we need to keep track
of the real fault address.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/kvm_emulate.h |1 -
 arch/x86/include/asm/kvm_host.h|9 +
 arch/x86/kvm/emulate.c |   30 ++
 arch/x86/kvm/paging_tmpl.h |4 
 arch/x86/kvm/x86.c |3 ++-
 5 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 1bf1140..5187dd8 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -229,7 +229,6 @@ struct x86_emulate_ctxt {
int exception; /* exception that happens during emulation or -1 */
u32 error_code; /* error code for exception */
bool error_code_valid;
-   unsigned long cr2; /* faulted address in case of #PF */
 
/* decode cache */
struct decode_cache decode;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a338235..e5eb57c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -313,6 +313,15 @@ struct kvm_vcpu_arch {
 */
struct kvm_mmu *walk_mmu;
 
+   /*
+* This struct is filled with the necessary information to propagate a
+* page fault into the guest
+*/
+   struct {
+   u64  address;
+   unsigned error_code;
+   } fault;
+
/* only needed in kvm_pv_mmu_op() path, but it's hot so
 * put it here to avoid allocation */
struct kvm_pv_mmu_op_buffer mmu_op_buffer;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 27d2c22..2b08b78 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -487,11 +487,9 @@ static void emulate_gp(struct x86_emulate_ctxt *ctxt, int 
err)
emulate_exception(ctxt, GP_VECTOR, err, true);
 }
 
-static void emulate_pf(struct x86_emulate_ctxt *ctxt, unsigned long addr,
-  int err)
+static void emulate_pf(struct x86_emulate_ctxt *ctxt)
 {
-   ctxt->cr2 = addr;
-   emulate_exception(ctxt, PF_VECTOR, err, true);
+   emulate_exception(ctxt, PF_VECTOR, 0, true);
 }
 
 static void emulate_ud(struct x86_emulate_ctxt *ctxt)
@@ -834,7 +832,7 @@ static int read_emulated(struct x86_emulate_ctxt *ctxt,
rc = ops->read_emulated(addr, mc->data + mc->end, n, &err,
ctxt->vcpu);
if (rc == X86EMUL_PROPAGATE_FAULT)
-   emulate_pf(ctxt, addr, err);
+   emulate_pf(ctxt);
if (rc != X86EMUL_CONTINUE)
return rc;
mc->end += n;
@@ -921,7 +919,7 @@ static int read_segment_descriptor(struct x86_emulate_ctxt 
*ctxt,
addr = dt.address + index * 8;
ret = ops->read_std(addr, desc, sizeof *desc, ctxt->vcpu,  &err);
if (ret == X86EMUL_PROPAGATE_FAULT)
-   emulate_pf(ctxt, addr, err);
+   emulate_pf(ctxt);
 
return ret;
 }
@@ -947,7 +945,7 @@ static int write_segment_descriptor(struct x86_emulate_ctxt 
*ctxt,
addr = dt.address + index * 8;
ret = ops->write_std(addr, desc, sizeof *desc, ctxt->vcpu, &err);
if (ret == X86EMUL_PROPAGATE_FAULT)
-   emulate_pf(ctxt, addr, err);
+   emulate_pf(ctxt);
 
return ret;
 }
@@ -1117,7 +1115,7 @@ static inline int writeback(struct x86_emulate_ctxt *ctxt,
&err,
ctxt->vcpu);
if (rc == X86EMUL_PROPAGATE_FAULT)
-   emulate_pf(ctxt, c->dst.addr.mem, err);
+   emulate_pf(ctxt);
if (rc != X86EMUL_CONTINUE)
return rc;
break;
@@ -1939,7 +1937,7 @@ static int task_switch_16(struct x86_emulate_ctxt *ctxt,
&err);
if (ret == X86EMUL_PROPAGATE_FAULT) {
/* FIXME: need to provide precise fault address */
-   emulate_pf(ctxt, old_tss_base, err);
+   emulate_pf(ctxt);
return ret;
}
 
@@ -1949,7 +1947,7 @@ static int task_switch_16(struct x86_emulate_ctxt *ctxt,
 &err);
if (ret == X86EMUL_PROPAGATE_FAULT) {
/* FIXME: need to provide precise fault address */
-   emulate_pf(ctxt, old_tss_base, err);
+   emulate_pf(ctxt);
return ret;
}
 
@@ -1957,7 +1955,7 @@ static

[PATCH 14/27] KVM: MMU: Make walk_addr_generic capable for two-level walking

2010-09-06 Thread Joerg Roedel
This patch uses kvm_read_guest_page_mmu to make the
walk_addr_generic functions suitable for two-level page
table walking.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/paging_tmpl.h |   27 ---
 1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index cd59af1..a5b5759 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -124,6 +124,8 @@ static int FNAME(walk_addr_generic)(struct guest_walker 
*walker,
unsigned index, pt_access, uninitialized_var(pte_access);
gpa_t pte_gpa;
bool eperm, present, rsvd_fault;
+   int offset;
+   u32 error = 0;
 
trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
 fetch_fault);
@@ -153,12 +155,13 @@ walk:
index = PT_INDEX(addr, walker->level);
 
table_gfn = gpte_to_gfn(pte);
-   pte_gpa = gfn_to_gpa(table_gfn);
-   pte_gpa += index * sizeof(pt_element_t);
+   offset= index * sizeof(pt_element_t);
+   pte_gpa   = gfn_to_gpa(table_gfn) + offset;
walker->table_gfn[walker->level - 1] = table_gfn;
walker->pte_gpa[walker->level - 1] = pte_gpa;
 
-   if (kvm_read_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte))) {
+   if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, &pte, offset,
+   sizeof(pte), &error)) {
present = false;
break;
}
@@ -209,15 +212,25 @@ walk:
is_large_pte(pte) &&
mmu->root_level == PT64_ROOT_LEVEL)) {
int lvl = walker->level;
+   gpa_t real_gpa;
+   gfn_t gfn;
 
-   walker->gfn = gpte_to_gfn_lvl(pte, lvl);
-   walker->gfn += (addr & PT_LVL_OFFSET_MASK(lvl))
-   >> PAGE_SHIFT;
+   gfn = gpte_to_gfn_lvl(pte, lvl);
+   gfn += (addr & PT_LVL_OFFSET_MASK(lvl)) >> PAGE_SHIFT;
 
if (PTTYPE == 32 &&
walker->level == PT_DIRECTORY_LEVEL &&
is_cpuid_PSE36())
-   walker->gfn += pse36_gfn_delta(pte);
+   gfn += pse36_gfn_delta(pte);
+
+   real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn),
+ &error);
+   if (real_gpa == UNMAPPED_GVA) {
+   walker->error_code = error;
+   return 0;
+   }
+
+   walker->gfn = real_gpa >> PAGE_SHIFT;
 
break;
}
-- 
1.7.0.4




[PATCH 10/27] KVM: MMU: Add infrastructure for two-level page walker

2010-09-06 Thread Joerg Roedel
This patch introduces a mmu-callback to translate gpa
addresses in the walk_addr code. This is later used to
translate l2_gpa addresses into l1_gpa addresses.

Signed-off-by: Joerg Roedel 
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/x86.c  |6 ++
 include/linux/kvm_host.h|5 +
 3 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3fefcd8..af8cce3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -245,6 +245,7 @@ struct kvm_mmu {
void (*free)(struct kvm_vcpu *vcpu);
gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
u32 *error);
+   gpa_t (*translate_gpa)(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error);
void (*prefetch_page)(struct kvm_vcpu *vcpu,
  struct kvm_mmu_page *page);
int (*sync_page)(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f47db25..829efb0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3433,6 +3433,11 @@ void kvm_get_segment(struct kvm_vcpu *vcpu,
kvm_x86_ops->get_segment(vcpu, var, seg);
 }
 
+static gpa_t translate_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 *error)
+{
+   return gpa;
+}
+
 gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
 {
u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
@@ -5644,6 +5649,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 
vcpu->arch.emulate_ctxt.ops = &emulate_ops;
vcpu->arch.mmu.root_hpa = INVALID_PAGE;
+   vcpu->arch.mmu.translate_gpa = translate_gpa;
if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_bsp(vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
else
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f2ecdd5..f2989a7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -534,6 +534,11 @@ static inline gpa_t gfn_to_gpa(gfn_t gfn)
return (gpa_t)gfn << PAGE_SHIFT;
 }
 
+static inline gfn_t gpa_to_gfn(gpa_t gpa)
+{
+   return (gfn_t)gpa >> PAGE_SHIFT;
+}
+
 static inline hpa_t pfn_to_hpa(pfn_t pfn)
 {
return (hpa_t)pfn << PAGE_SHIFT;
-- 
1.7.0.4




[PATCH 07/27] KVM: MMU: Introduce kvm_init_shadow_mmu helper function

2010-09-06 Thread Joerg Roedel
Some logic of the init_kvm_softmmu function is required to
build the Nested Nested Paging context. So factor the
required logic into a separate function and export it.
Also make the whole init path suitable for more than one mmu
context.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/mmu.c |   60 ++-
 arch/x86/kvm/mmu.h |1 +
 2 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5b55451..787540d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2532,10 +2532,9 @@ static void nonpaging_free(struct kvm_vcpu *vcpu)
mmu_free_roots(vcpu);
 }
 
-static int nonpaging_init_context(struct kvm_vcpu *vcpu)
+static int nonpaging_init_context(struct kvm_vcpu *vcpu,
+ struct kvm_mmu *context)
 {
-   struct kvm_mmu *context = &vcpu->arch.mmu;
-
context->new_cr3 = nonpaging_new_cr3;
context->page_fault = nonpaging_page_fault;
context->gva_to_gpa = nonpaging_gva_to_gpa;
@@ -2594,9 +2593,10 @@ static bool is_rsvd_bits_set(struct kvm_vcpu *vcpu, u64 
gpte, int level)
 #include "paging_tmpl.h"
 #undef PTTYPE
 
-static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, int level)
+static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu,
+ struct kvm_mmu *context,
+ int level)
 {
-   struct kvm_mmu *context = &vcpu->arch.mmu;
int maxphyaddr = cpuid_maxphyaddr(vcpu);
u64 exb_bit_rsvd = 0;
 
@@ -2655,9 +2655,11 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu *vcpu, 
int level)
}
 }
 
-static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
+static int paging64_init_context_common(struct kvm_vcpu *vcpu,
+   struct kvm_mmu *context,
+   int level)
 {
-   struct kvm_mmu *context = &vcpu->arch.mmu;
+   reset_rsvds_bits_mask(vcpu, context, level);
 
ASSERT(is_pae(vcpu));
context->new_cr3 = paging_new_cr3;
@@ -2673,17 +2675,17 @@ static int paging64_init_context_common(struct kvm_vcpu 
*vcpu, int level)
return 0;
 }
 
-static int paging64_init_context(struct kvm_vcpu *vcpu)
+static int paging64_init_context(struct kvm_vcpu *vcpu,
+struct kvm_mmu *context)
 {
-   reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL);
-   return paging64_init_context_common(vcpu, PT64_ROOT_LEVEL);
+   return paging64_init_context_common(vcpu, context, PT64_ROOT_LEVEL);
 }
 
-static int paging32_init_context(struct kvm_vcpu *vcpu)
+static int paging32_init_context(struct kvm_vcpu *vcpu,
+struct kvm_mmu *context)
 {
-   struct kvm_mmu *context = &vcpu->arch.mmu;
+   reset_rsvds_bits_mask(vcpu, context, PT32_ROOT_LEVEL);
 
-   reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL);
context->new_cr3 = paging_new_cr3;
context->page_fault = paging32_page_fault;
context->gva_to_gpa = paging32_gva_to_gpa;
@@ -2697,10 +2699,10 @@ static int paging32_init_context(struct kvm_vcpu *vcpu)
return 0;
 }
 
-static int paging32E_init_context(struct kvm_vcpu *vcpu)
+static int paging32E_init_context(struct kvm_vcpu *vcpu,
+ struct kvm_mmu *context)
 {
-   reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL);
-   return paging64_init_context_common(vcpu, PT32E_ROOT_LEVEL);
+   return paging64_init_context_common(vcpu, context, PT32E_ROOT_LEVEL);
 }
 
 static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
@@ -2724,15 +2726,15 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
context->gva_to_gpa = nonpaging_gva_to_gpa;
context->root_level = 0;
} else if (is_long_mode(vcpu)) {
-   reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL);
+   reset_rsvds_bits_mask(vcpu, context, PT64_ROOT_LEVEL);
context->gva_to_gpa = paging64_gva_to_gpa;
context->root_level = PT64_ROOT_LEVEL;
} else if (is_pae(vcpu)) {
-   reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL);
+   reset_rsvds_bits_mask(vcpu, context, PT32E_ROOT_LEVEL);
context->gva_to_gpa = paging64_gva_to_gpa;
context->root_level = PT32E_ROOT_LEVEL;
} else {
-   reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL);
+   reset_rsvds_bits_mask(vcpu, context, PT32_ROOT_LEVEL);
context->gva_to_gpa = paging32_gva_to_gpa;
context->root_level = PT32_ROOT_LEVEL;
}
@@ -2740,25 +2742,33 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
return 0;
 }
 
-static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
+int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 {
int r;
-
ASSERT(vcpu);
ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
 
if (!is_pag

[PATCH 19/27] KVM: X86: Propagate fetch faults

2010-09-06 Thread Joerg Roedel
KVM currently ignores fetch faults in the instruction
emulator. With nested-npt we could have such faults. This
patch adds the code to handle these.

Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/emulate.c |3 +++
 arch/x86/kvm/x86.c |4 
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2b08b78..aead72e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1198,6 +1198,9 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
*(unsigned long *)dest =
(ctxt->eflags & ~change_mask) | (val & change_mask);
 
+   if (rc == X86EMUL_PROPAGATE_FAULT)
+   emulate_pf(ctxt);
+
return rc;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 65b00f0..ca69dcc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4237,6 +4237,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
vcpu->arch.emulate_ctxt.perm_ok = false;
 
r = x86_decode_insn(&vcpu->arch.emulate_ctxt);
+   if (r == X86EMUL_PROPAGATE_FAULT)
+   goto done;
+
trace_kvm_emulate_insn_start(vcpu);
 
/* Only allow emulation of specific instructions on #UD
@@ -4295,6 +4298,7 @@ restart:
return handle_emulation_failure(vcpu);
}
 
+done:
if (vcpu->arch.emulate_ctxt.exception >= 0) {
inject_emulated_exception(vcpu);
r = EMULATE_DONE;
-- 
1.7.0.4




Re: [patch 3/3] Add mce test

2010-09-06 Thread Avi Kivity

 On 09/06/2010 12:17 AM, Marcelo Tosatti wrote:

On Sun, Sep 05, 2010 at 11:15:16AM +0300, Avi Kivity wrote:

  On 09/03/2010 02:36 AM, Marcelo Tosatti wrote:

+
+int main(void)
+{
+unsigned long long status, addr;
+int bank;
+
+smp_init();
+init_idt();
+set_idt_entry(18, do_handle_mce);
+
+write_cr4(read_cr4() | X86_CR4_MCE);
+
+wrmsr(MSR_IA32_MCG_CTL, ~0ULL);
+wrmsr(MSR_IA32_MC0_CTL, ~0ULL);
+
+status = MCI_STATUS_VAL|MCI_STATUS_UC;
+addr = 0x7ff;
+bank = 0;
+ex_done = 0;
+/* mce cpu bank status mcgstatus addr misc */
+monitor_printf("mce %d %d 0x%llx 1 0x%llx 1\n", 0, bank, status, addr);

Wow, this is really evil.

I guess it could be done more nicely via the api unit tests we
talked about during kf2010?

I don't remember the details.


Map gpa:hva 1:1; map gva:gpa 1:1; do direct calls between host userspace 
and guest kernel.


Hope to post a patch soon.


What do you consider evil?


Not really evil.  It's a nice roundabout loop guest -> testdev -> 
chardev -> host -> chardev -> monitor -> kvm -> guest.


Works well, but only for features that are directly controlled via the 
monitor.  To test save/restore (for vcpu events or fpu) we need 
something more direct (guest -> host userspace -> guest).


--
error compiling committee.c: too many arguments to function



Re: PCI Device Assignment status

2010-09-06 Thread Rodrigo Campos
On Mon, Sep 06, 2010 at 03:11:32PM +0400, Konstantin Khlebnikov wrote:
> On Fri, 3 Sep 2010 19:54:00 +0400
> Rodrigo Campos  wrote:
> 
> > I wanted to know the status of PCI device assignment.
> 
> I successfully used kvm to assign the second GPU in my notebook
> (Thinkpad T500) to a guest OS two months ago. But it required several
> hacks for correct vga-bios passthrough and it worked only for linux
> guests -- all windows guests hung the whole system at early boot. Currently
> I don't have enough free time to finish this investigation.

Thanks, it's good to know. Luckily I don't need to assign a second GPU, so I
won't need those hacks :)




Thanks a lot,
Rodrigo


Re: PCI Device Assignment status

2010-09-06 Thread Rodrigo Campos
On Sun, Sep 05, 2010 at 07:00:08PM +0200, Joerg Roedel wrote:
> On Fri, Sep 03, 2010 at 04:38:00PM -0600, Alex Williamson wrote:
> > On Fri, Sep 3, 2010 at 9:54 AM, Rodrigo Campos  wrote:
> > > Hi!
> > >
> > > I wanted to know the status of PCI device assignment.
> > >
> > > As far as I can see in the webpage and in the mailing list, it seems to be
> > > working ok if you have VT-d support on the motherboard and cpu. But if it 
> > > isn't
> > > too much trouble, I wanted some confirmation about this, since I'm not 
> > > sure and
> > > I don't want to buy hardware to test this when there is no way it's going 
> > > to
> > > work :)
> > 
> > Yes, it works if you have VT-d support (Intel) or AMD IOMMU (note this
> > is different than the AMD GART that's often used as an IOMMU).  The
> > Intel boxes are a lot easier to find.
> 
> For an AMD IOMMU you just need to buy a Mainboard with the AMD 890FX
> chipset. As far as I know all available boards support IOMMU with the
> latest BIOS.
> On the server-side, look for an AMD SR56x0 chipset [where x=(5|7|9)].

Great, thanks a lot!


[PATCH 0/4 -v2] x86: update AMD CPUID bits

2010-09-06 Thread Andre Przywara
Changes from v1:
 - pull SSE5^WXOP bit from KVM features to patch 1
 - add AES and F16C to list of propagated features
 - add k...@vger to CC  ;-) 

Recently the public AMD CPUID specification
http://support.amd.com/us/Processor_TechDocs/25481.pdf
has been updated and revealed new CPUID flag feature names.
The following patches introduce them to the kernel to properly
display them in /proc/cpuinfo and allow KVM guests to use them.
Note: One bit has been renamed, so I propose patch 1/4 for inclusion
in the stable series.

Please apply!

Regards,
Andre.

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany




[PATCH 4/4] x86, kvm: add new AMD SVM feature bits

2010-09-06 Thread Andre Przywara
The recently updated CPUID specification names new SVM feature bits.
Add them to the list of reported features.

Signed-off-by: Andre Przywara 
---
 arch/x86/include/asm/cpufeature.h |7 +++
 arch/x86/kernel/cpu/scattered.c   |6 ++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 341835d..bffeab7 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -183,6 +183,13 @@
 #define X86_FEATURE_LBRV   (8*32+ 6) /* AMD LBR Virtualization support */
 #define X86_FEATURE_SVML   (8*32+ 7) /* "svm_lock" AMD SVM locking MSR */
 #define X86_FEATURE_NRIPS  (8*32+ 8) /* "nrip_save" AMD SVM next_rip save 
*/
+#define X86_FEATURE_TSCRATEMSR  (8*32+ 9) /* "tsc_scale" AMD TSC scaling 
support */
+#define X86_FEATURE_VMCBCLEAN   (8*32+10) /* "vmcb_clean" AMD VMCB clean bits 
support */
+#define X86_FEATURE_FLUSHBYASID (8*32+11) /* AMD flush-by-ASID support */
+#define X86_FEATURE_DECODEASSISTS (8*32+12) /* AMD Decode Assists support */
+#define X86_FEATURE_PAUSEFILTER (8*32+13) /* AMD filtered pause intercept */
+#define X86_FEATURE_PFTHRESHOLD (8*32+14) /* AMD pause filter threshold */
+
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (ebx), word 9 */
 #define X86_FEATURE_FSGSBASE   (9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 34b4dad..2c77931 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -43,6 +43,12 @@ void __cpuinit init_scattered_cpuid_features(struct 
cpuinfo_x86 *c)
{ X86_FEATURE_LBRV, CR_EDX, 1, 0x800a, 0 },
{ X86_FEATURE_SVML, CR_EDX, 2, 0x800a, 0 },
{ X86_FEATURE_NRIPS,CR_EDX, 3, 0x800a, 0 },
+   { X86_FEATURE_TSCRATEMSR,   CR_EDX, 4, 0x800a, 0 },
+   { X86_FEATURE_VMCBCLEAN,CR_EDX, 5, 0x800a, 0 },
+   { X86_FEATURE_FLUSHBYASID,  CR_EDX, 6, 0x800a, 0 },
+   { X86_FEATURE_DECODEASSISTS,CR_EDX, 7, 0x800a, 0 },
+   { X86_FEATURE_PAUSEFILTER,  CR_EDX,10, 0x800a, 0 },
+   { X86_FEATURE_PFTHRESHOLD,  CR_EDX,12, 0x800a, 0 },
{ 0, 0, 0, 0, 0 }
};
 
-- 
1.6.4




[PATCH 2/4] x86: Update AMD CPUID feature bits

2010-09-06 Thread Andre Przywara
AMD's public CPUID specification has been updated and some bits have
got names. Add them to properly describe new CPU features.

Signed-off-by: Andre Przywara 
---
 arch/x86/include/asm/cpufeature.h |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index c9c73d8..341835d 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -155,7 +155,11 @@
 #define X86_FEATURE_XOP(6*32+11) /* extended AVX instructions 
*/
 #define X86_FEATURE_SKINIT (6*32+12) /* SKINIT/STGI instructions */
 #define X86_FEATURE_WDT(6*32+13) /* Watchdog timer */
+#define X86_FEATURE_LWP(6*32+15) /* Light Weight Profiling */
+#define X86_FEATURE_FMA4   (6*32+16) /* 4 operands MAC instructions */
 #define X86_FEATURE_NODEID_MSR (6*32+19) /* NodeId MSR */
+#define X86_FEATURE_TBM(6*32+21) /* trailing bit manipulations 
*/
+#define X86_FEATURE_TOPOEXT(6*32+22) /* topology extensions CPUID leafs */
 
 /*
  * Auxiliary flags: Linux defined - For features scattered in various
-- 
1.6.4




[PATCH 1/4] x86: Fix misnamed AMD CPUID feature bit

2010-09-06 Thread Andre Przywara
The AMD SSE5 feature set as announced has been replaced by some extensions
to the AVX instruction set. Thus the bit formerly advertised as SSE5
is re-used for one of these extensions (XOP).
Although this changes the /proc/cpuinfo output, it is not user visible, as
there are no CPUs (yet) having this feature.
To avoid confusion this should be added to the stable series, too.

Cc: sta...@kernel.org [.32.x .34.x, .35.x]
Signed-off-by: Andre Przywara 
---
 arch/x86/include/asm/cpufeature.h |2 +-
 arch/x86/kvm/x86.c|2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 781a50b..c9c73d8 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -152,7 +152,7 @@
 #define X86_FEATURE_3DNOWPREFETCH (6*32+ 8) /* 3DNow prefetch instructions */
 #define X86_FEATURE_OSVW   (6*32+ 9) /* OS Visible Workaround */
 #define X86_FEATURE_IBS(6*32+10) /* Instruction Based Sampling 
*/
-#define X86_FEATURE_SSE5   (6*32+11) /* SSE-5 */
+#define X86_FEATURE_XOP(6*32+11) /* extended AVX instructions 
*/
 #define X86_FEATURE_SKINIT (6*32+12) /* SKINIT/STGI instructions */
 #define X86_FEATURE_WDT(6*32+13) /* Watchdog timer */
 #define X86_FEATURE_NODEID_MSR (6*32+19) /* NodeId MSR */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3a09c62..dd54779 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1996,7 +1996,7 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, 
u32 function,
const u32 kvm_supported_word6_x86_features =
F(LAHF_LM) | F(CMP_LEGACY) | F(SVM) | 0 /* ExtApicSpace */ |
F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
-   F(3DNOWPREFETCH) | 0 /* OSVW */ | 0 /* IBS */ | F(SSE5) |
+   F(3DNOWPREFETCH) | 0 /* OSVW */ | 0 /* IBS */ | F(XOP) |
0 /* SKINIT */ | 0 /* WDT */;
 
/* all calls to cpuid_count() should be made on the same cpu */
-- 
1.6.4




[PATCH 3/4] x86: Fix allowed CPUID bits for KVM guests

2010-09-06 Thread Andre Przywara
The AMD extensions to AVX (FMA4, XOP) work on the same YMM register set
as AVX, so they are safe for guests to use, as long as AVX itself
is allowed. Add F16C and AES on the way for the same reasons.

Signed-off-by: Andre Przywara 
---
 arch/x86/kvm/x86.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dd54779..6c2ecf0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1991,13 +1991,14 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
0 /* Reserved */ | F(CX16) | 0 /* xTPR Update, PDCM */ |
0 /* Reserved, DCA */ | F(XMM4_1) |
F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
-   0 /* Reserved, AES */ | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX);
+   0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
+   F(F16C);
/* cpuid 0x8001.ecx */
const u32 kvm_supported_word6_x86_features =
F(LAHF_LM) | F(CMP_LEGACY) | F(SVM) | 0 /* ExtApicSpace */ |
F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
F(3DNOWPREFETCH) | 0 /* OSVW */ | 0 /* IBS */ | F(XOP) |
-   0 /* SKINIT */ | 0 /* WDT */;
+   0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
 
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
-- 
1.6.4




Re: [PATCH 3/4] x86: Fix allowed CPUID bits for KVM guests

2010-09-06 Thread Avi Kivity

 On 09/06/2010 03:05 PM, Andre Przywara wrote:




Did we really enable "sse5" before xsave?  That looks broken, but I 
guess no real harm if xsave itself is not enabled.
Yes. It somehow slipped through when you introduced the other feature 
flags to KVM. I also think this is not a serious problem.
BTW: I realized that AES is currently denied. Reading the manual I see 
that it operates on SSE registers, so it should be safe to be passed 
through. The only drawback is that it would change the visible CPUID 
on CPUs that already have AES, whereas earlier KVM versions did hide it.


This code doesn't directly affect a guest's cpuid, it merely tells host 
userspace which cpuid bits are supported by kvm.  It's perfectly fine to 
add bits as we add support, in fact this interface is what makes 
migration work across cpus with different capabilities.


This could become a problem with migration. But if you agree, I'd 
integrate this flag in the v2 series.


Shouldn't be a problem - please do.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 3/4] x86: Fix allowed CPUID bits for KVM guests

2010-09-06 Thread Andre Przywara

Avi Kivity wrote:

  On 09/03/2010 12:27 PM, Andre Przywara wrote:

The AMD extension to AVX (FMA4, XOP) work on the same YMM register set
as AVX, so they are safe for guests to use, as long as AVX itself
is allowed.

Signed-off-by: Andre Przywara
---
  arch/x86/kvm/x86.c |4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3a09c62..eb89e7b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1996,8 +1996,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, 
u32 function,
const u32 kvm_supported_word6_x86_features =
F(LAHF_LM) | F(CMP_LEGACY) | F(SVM) | 0 /* ExtApicSpace */ |
F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
-   F(3DNOWPREFETCH) | 0 /* OSVW */ | 0 /* IBS */ | F(SSE5) |
-   0 /* SKINIT */ | 0 /* WDT */;
+   F(3DNOWPREFETCH) | 0 /* OSVW */ | 0 /* IBS */ | F(XOP) |
+   0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);



Should be folded into patch 1 to avoid build breakage.
Right you are. Thanks for spotting this. I fixed that and will sent out 
a version 2 later.


Did we really enable "sse5" before xsave?  That looks broken, but I 
guess no real harm if xsave itself is not enabled.
Yes. It somehow slipped through when you introduced the other feature 
flags to KVM. I also think this is not a serious problem.
BTW: I realized that AES is currently denied. Reading the manual I see 
that it operates on SSE registers, so it should be safe to be passed 
through. The only drawback is that it would change the visible CPUID on 
CPUs that already have AES, whereas earlier KVM versions did hide it. 
This could become a problem with migration. But if you agree, I'd 
integrate this flag in the v2 series.


Regards,
Andre.



--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12



[GIT PULL net-2.6] vhost-net: 2.6.36 regression fixes

2010-09-06 Thread Michael S. Tsirkin
David,
The following tree includes more regression fixes for vhost-net
in 2.6.36.  It is on top of net-2.6.
Please merge it for 2.6.36.
Thanks!

The following changes since commit 0b5d404e349c0236b11466c0a4785520c0be6982:

  pkt_sched: Fix lockdep warning on est_tree_lock in gen_estimator (2010-09-02 
13:22:11 -0700)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost-net

Michael S. Tsirkin (3):
  cgroups: fix API thinko
  vhost: fix attach to cgroups regression
  vhost: error handling fix

 drivers/vhost/vhost.c  |   80 ++-
 include/linux/cgroup.h |   11 ++-
 kernel/cgroup.c|9 +++--
 3 files changed, 73 insertions(+), 27 deletions(-)
-- 
MST


Re: PCI Device Assignment status

2010-09-06 Thread Konstantin Khlebnikov
On Fri, 3 Sep 2010 19:54:00 +0400
Rodrigo Campos  wrote:

> I wanted to know the status of PCI device assignment.

I successfully used kvm to assign the second GPU in my notebook
(Thinkpad T500) to a guest OS two months ago. But it required several
hacks for correct vga-bios passthrough and it worked only for linux
guests -- all windows guests hung the whole system at early boot. Currently
I don't have enough free time to finish this investigation.


Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.

2010-09-06 Thread Michael S. Tsirkin
So - does this driver help reduce service demand significantly?
Some comments from looking at the code:

On Fri, Aug 06, 2010 at 05:23:41PM +0800, xiaohui@intel.com wrote:
> +static struct page_info *alloc_page_info(struct page_ctor *ctor,
> + struct kiocb *iocb, struct iovec *iov,
> + int count, struct frag *frags,
> + int npages, int total)
> +{
> + int rc;
> + int i, j, n = 0;
> + int len;
> + unsigned long base, lock_limit;
> + struct page_info *info = NULL;
> +
> + lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur;
> + lock_limit >>= PAGE_SHIFT;

Playing with the rlimit on the data path, transparently to the application, looks
strange to me; I suspect this has unexpected security implications.
Further, applications may have other uses for locked memory
besides mpassthru - you should not just take it because it's there.

Can we have an ioctl that lets userspace configure how much
memory to lock? This ioctl will decrement the rlimit and store
the data in the device structure so we can do accounting
internally. Put it back on close or on another ioctl.
We need to be careful about when this operation gets called
again with 0 or another small value while we have locked memory -
maybe just fail with EBUSY, or wait until it gets unlocked?
Maybe 0 can be special-cased to deactivate zero-copy.


> +
> + if (ctor->lock_pages + count > lock_limit && npages) {
> + printk(KERN_INFO "exceed the locked memory rlimit.");
> + return NULL;
> + }
> +
> + info = kmem_cache_zalloc(ext_page_info_cache, GFP_KERNEL);

You seem to fill in all of the memory anyway, so why zalloc? This is the data path ...

> +
> + if (!info)
> + return NULL;
> +
> + for (i = j = 0; i < count; i++) {
> + base = (unsigned long)iov[i].iov_base;
> + len = iov[i].iov_len;
> +
> + if (!len)
> + continue;
> + n = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
> +
> + rc = get_user_pages_fast(base, n, npages ? 1 : 0,

npages controls whether this is a write? Why?

> + &info->pages[j]);
> + if (rc != n)
> + goto failed;
> +
> + while (n--) {
> + frags[j].offset = base & ~PAGE_MASK;
> + frags[j].size = min_t(int, len,
> + PAGE_SIZE - frags[j].offset);
> + len -= frags[j].size;
> + base += frags[j].size;
> + j++;
> + }
> + }
> +
> +#ifdef CONFIG_HIGHMEM
> + if (npages && !(dev->features & NETIF_F_HIGHDMA)) {
> + for (i = 0; i < j; i++) {
> + if (PageHighMem(info->pages[i]))
> + goto failed;
> + }
> + }
> +#endif

Are non-highdma devices worth bothering with? If yes -
are there other limitations devices might have that we need to handle?
E.g. what about non-s/g devices or no checksum offloading?

> + skb_push(skb, ETH_HLEN);
> +
> + if (skb_is_gso(skb)) {
> + hdr.hdr.hdr_len = skb_headlen(skb);
> + hdr.hdr.gso_size = shinfo->gso_size;
> + if (shinfo->gso_type & SKB_GSO_TCPV4)
> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
> + else if (shinfo->gso_type & SKB_GSO_TCPV6)
> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> + else if (shinfo->gso_type & SKB_GSO_UDP)
> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP;
> + else
> + BUG();
> + if (shinfo->gso_type & SKB_GSO_TCP_ECN)
> + hdr.hdr.gso_type |= VIRTIO_NET_HDR_GSO_ECN;
> +
> + } else
> + hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE;
> +
> + if (skb->ip_summed == CHECKSUM_PARTIAL) {
> + hdr.hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> + hdr.hdr.csum_start =
> + skb->csum_start - skb_headroom(skb);
> + hdr.hdr.csum_offset = skb->csum_offset;
> + }

We have this code in tun, macvtap and packet socket already.
Could this be a good time to move these into the networking core?
I'm not asking you to do this right now, but could this generic
virtio-net to skb stuff be encapsulated in functions?

-- 
MST


Re: [PATCH 0/2] KVM: cleanup: get_dirty_log

2010-09-06 Thread Takuya Yoshikawa

(2010/09/04 18:24), Alexander Graf wrote:


On 03.09.2010, at 10:34, Takuya Yoshikawa wrote:


This is the 2nd version of get_dirty_log cleanup.

Changelog:
  In version 1, I changed the timing of copy_to_user() in the
  powerpc's get_dirty_log by mistake. This time, I've kept the
  timing and tests on ppc box now look OK to me!


I seem to get a pretty big chunk of false positives with this set applied. 
Qemu's vnc server tries to be clever and searches for real changes inside the 
pages it finds dirty, so the only perceptible thing there is an increased 
amount of CPU being wasted.

In MOL instead, the VNC server just directly sends pages that are marked dirty. 
And I get a full new screen update on every update cycle.

Please check with some debugging code whether the dirty region from the dirty
bitmap is really only the one that was updated :).



Interesting behavior! This functionality might be more sensitive than I 
imagined.

I'll investigate with some debugging code as you suggest!

Thank you,
  Takuya


[PATCH 2/2] KVM test: unittest: Build unittests from new repo

2010-09-06 Thread Jason Wang
Kvm-unit-tests has moved to a repo outside of qemu-kvm. This patch lets
the unittest build the tests from a specified git repo.

Signed-off-by: Jason Wang 
---
 client/tests/kvm/tests/build.py   |   16 ++--
 client/tests/kvm/unittests.cfg.sample |1 +
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/tests/build.py b/client/tests/kvm/tests/build.py
index 5a8f3b0..f39371a 100644
--- a/client/tests/kvm/tests/build.py
+++ b/client/tests/kvm/tests/build.py
@@ -495,18 +495,22 @@ class GitInstaller(SourceDirInstaller):
 kernel_repo = params.get("git_repo")
 user_repo = params.get("user_git_repo")
 kmod_repo = params.get("kmod_repo")
+test_repo = params.get("test_git_repo")
 
 kernel_branch = params.get("kernel_branch", "master")
 user_branch = params.get("user_branch", "master")
 kmod_branch = params.get("kmod_branch", "master")
+test_branch = params.get("test_branch", "master")
 
 kernel_lbranch = params.get("kernel_lbranch", "master")
 user_lbranch = params.get("user_lbranch", "master")
 kmod_lbranch = params.get("kmod_lbranch", "master")
+test_lbranch = params.get("test_lbranch", "master")
 
 kernel_commit = params.get("kernel_commit", None)
 user_commit = params.get("user_commit", None)
 kmod_commit = params.get("kmod_commit", None)
+test_commit = params.get("test_commit", None)
 
 kernel_patches = eval(params.get("kernel_patches", "[]"))
 user_patches = eval(params.get("user_patches", "[]"))
@@ -529,8 +533,16 @@ class GitInstaller(SourceDirInstaller):
os.path.basename(patch)))
 utils.system('patch -p1 %s' % os.path.basename(patch))
 
-unittest_cfg = os.path.join(userspace_srcdir, 'kvm', 'test', 'x86',
-'unittests.cfg')
+if test_repo:
+test_srcdir = os.path.join(self.srcdir, "kvm-unit-tests")
+kvm_utils.get_git_branch(test_repo, test_branch, test_srcdir,
+ test_commit, test_lbranch)
+unittest_cfg = os.path.join(test_srcdir, 'x86',
+'unittests.cfg')
+self.test_srcdir = test_srcdir
+else:
+unittest_cfg = os.path.join(userspace_srcdir, 'kvm', 'test', 'x86',
+'unittests.cfg')
 
 self.unittest_cfg = None
 if os.path.isfile(unittest_cfg):
diff --git a/client/tests/kvm/unittests.cfg.sample 
b/client/tests/kvm/unittests.cfg.sample
index 7ea0674..3d32cb2 100644
--- a/client/tests/kvm/unittests.cfg.sample
+++ b/client/tests/kvm/unittests.cfg.sample
@@ -58,6 +58,7 @@ variants:
 user_git_repo = 
git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
 user_branch = next
 user_lbranch = next
+test_git_repo = 
git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git
 
 - unittest:
 type = unittest



[PATCH 1/2] KVM test: unittest: Fix the broken smp params

2010-09-06 Thread Jason Wang
Make the smp param in unittests.cfg work again.

Signed-off-by: Jason Wang 
---
 client/tests/kvm/tests/unittest.py |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/unittest.py 
b/client/tests/kvm/tests/unittest.py
index 69c4b43..54e5f73 100644
--- a/client/tests/kvm/tests/unittest.py
+++ b/client/tests/kvm/tests/unittest.py
@@ -73,6 +73,7 @@ def run_unittest(test, params, env):
 smp = None
 if parser.has_option(t, 'smp'):
 smp = int(parser.get(t, 'smp'))
+params['smp'] = smp
 
 extra_params = None
 if parser.has_option(t, 'extra_params'):



Re: [PATCH] KVM: don't send IPI if the vcpu is not online

2010-09-06 Thread Alexander Graf

On 06.09.2010, at 11:14, Avi Kivity wrote:

> On 09/06/2010 12:09 PM, Alexander Graf wrote:
>> 
>>> It can also be running host kernel or user code.
>> In that case it's the same as running guest code, no? We'll pass by the vcpu 
>> entry check asap.
> 
> Sure, but the IPI is wasted.  If you spend 10% of your time in host code, you 
> can avoid 10% of the IPIs.
> 
> (actually less, since the atomic part of the guest switch has guest_mode 
> enabled)

Oh. My bad. I had a thinko here :).

Sure, sounds like a good idea.


Alex



Re: [PATCH] KVM: don't send IPI if the vcpu is not online

2010-09-06 Thread Avi Kivity

 On 09/06/2010 12:09 PM, Alexander Graf wrote:



It can also be running host kernel or user code.

In that case it's the same as running guest code, no? We'll pass by the vcpu 
entry check asap.


Sure, but the IPI is wasted.  If you spend 10% of your time in host 
code, you can avoid 10% of the IPIs.


(actually less, since the atomic part of the guest switch has guest_mode 
enabled)


--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: don't send IPI if the vcpu is not online

2010-09-06 Thread Alexander Graf

On 06.09.2010, at 11:05, Avi Kivity wrote:

> On 09/06/2010 11:59 AM, Alexander Graf wrote:
>> On 06.09.2010, at 10:55, Avi Kivity wrote:
>> 
>>> On 09/06/2010 11:51 AM, Alexander Graf wrote:
> I think guest_mode makes sense for the other archs for reducing IPIs, so 
> let's leave it common and recommend that they implement it.  Alex, if 
> you're ever bored.
 What does the bit do? Do we have documentation on it ;)? No seriously, 
 what's the intent of the field?
 
>>> It indicates that the vcpu is currently executing guest code.  Which in 
>>> turn is important if you need to force it out of guest mode in order to 
>>> inject an interrupt or flush the tlb.
>> Well, a vcpu is either offline in halt state or in guest mode, no? So we can 
>> conclude that guest_mode == !offline && !halt_state.
> 
> It can also be running host kernel or user code.

In that case it's the same as running guest code, no? We'll pass by the vcpu 
entry check asap.

Alex



Re: [PATCH] KVM: don't send IPI if the vcpu is not online

2010-09-06 Thread Avi Kivity

 On 09/06/2010 11:59 AM, Alexander Graf wrote:

On 06.09.2010, at 10:55, Avi Kivity wrote:


On 09/06/2010 11:51 AM, Alexander Graf wrote:

I think guest_mode makes sense for the other archs for reducing IPIs, so let's 
leave it common and recommend that they implement it.  Alex, if you're ever 
bored.

What does the bit do? Do we have documentation on it ;)? No seriously, what's 
the intent of the field?


It indicates that the vcpu is currently executing guest code.  Which in turn is 
important if you need to force it out of guest mode in order to inject an 
interrupt or flush the tlb.

Well, a vcpu is either offline in halt state or in guest mode, no? So we can conclude 
that guest_mode == !offline && !halt_state.


It can also be running host kernel or user code.


When in halt state, we are active on the wakeup waitqueue:

static void kvmppc_decrementer_func(unsigned long data)
{
 struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;

 kvmppc_core_queue_dec(vcpu);

 if (waitqueue_active(&vcpu->wq)) {
 wake_up_interruptible(&vcpu->wq);
 vcpu->stat.halt_wakeup++;
 }
}

Shouldn't that be enough information already?


It's sufficient for correctness.  It's not optimal since you miss the 
cases where you're not running guest code.


Not sure how important it is.  Xiao, any numbers?

--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: don't send IPI if the vcpu is not online

2010-09-06 Thread Alexander Graf

On 06.09.2010, at 10:55, Avi Kivity wrote:

> On 09/06/2010 11:51 AM, Alexander Graf wrote:
>> 
>>> I think guest_mode makes sense for the other archs for reducing IPIs, so 
>>> let's leave it common and recommend that they implement it.  Alex, if 
>>> you're ever bored.
>> What does the bit do? Do we have documentation on it ;)? No seriously, 
>> what's the intent of the field?
>> 
> 
> It indicates that the vcpu is currently executing guest code.  Which in turn 
> is important if you need to force it out of guest mode in order to inject an 
> interrupt or flush the tlb.

Well, a vcpu is either offline in halt state or in guest mode, no? So we can 
conclude that guest_mode == !offline && !halt_state.

When in halt state, we are active on the wakeup waitqueue:

static void kvmppc_decrementer_func(unsigned long data)
{
	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;

	kvmppc_core_queue_dec(vcpu);

	if (waitqueue_active(&vcpu->wq)) {
		wake_up_interruptible(&vcpu->wq);
		vcpu->stat.halt_wakeup++;
	}
}

Shouldn't that be enough information already?


Alex



Re: KVM Test report, kernel e6a9246... qemu 94f964d...

2010-09-06 Thread Avi Kivity

 On 09/06/2010 11:08 AM, Hao, Xudong wrote:



Unable to reproduce - R5u3 i386 guest installed and booted, x86_64
booted from cd, all as expected.

Do you use EPT or shadow mode? This issue only exists in shadow mode.


Shadow.  What's your command line?


But if we assign a PCI device to the guest, both EPT and shadow mode can
reproduce it.


I never assigned a pci device in my life...


--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: don't send IPI if the vcpu is not online

2010-09-06 Thread Avi Kivity

 On 09/06/2010 11:51 AM, Alexander Graf wrote:



I think guest_mode makes sense for the other archs for reducing IPIs, so let's 
leave it common and recommend that they implement it.  Alex, if you're ever 
bored.

What does the bit do? Do we have documentation on it ;)? No seriously, what's 
the intent of the field?



It indicates that the vcpu is currently executing guest code.  Which in 
turn is important if you need to force it out of guest mode in order to 
inject an interrupt or flush the tlb.


The procedure is:

remote:
- queue a request in vcpu->requests
- IPI vcpu->cpu (new: only if in guest_mode)

vcpu:
- set guest_mode
- dequeue and execute requests in vcpu->requests
- enter guest
- clear guest_mode
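The procedure above can be modeled in a few lines. This is a hypothetical, much-simplified single-threaded sketch (in Python rather than kernel C): the real code orders the guest_mode store against the requests load with atomics and memory barriers, and the "IPI" here is just a counter standing in for smp_send_reschedule().

```python
class SketchVcpu:
    """Toy model of the requests/guest_mode handshake; names are
    illustrative, not the real KVM structures."""
    def __init__(self):
        self.guest_mode = False
        self.requests = set()
        self.ipis_sent = 0

    def make_request(self, req):
        # remote side: queue the request, IPI only if the target
        # vcpu is currently executing guest code
        self.requests.add(req)
        if self.guest_mode:
            self.ipis_sent += 1   # stands in for the real IPI

    def enter_guest(self):
        # vcpu side: set guest_mode, drain requests, run, clear
        self.guest_mode = True
        self.requests.clear()     # dequeue and execute requests
        # ... guest runs ...
        self.guest_mode = False
```

The point of the optimization is visible in the model: a request made while guest_mode is clear is still queued, but no IPI fires, because the vcpu will drain the queue on its own before the next guest entry.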

--
error compiling committee.c: too many arguments to function



Re: [PATCH] kvm_stat: ignore events that have never occured

2010-09-06 Thread Avi Kivity

 On 09/06/2010 11:39 AM, Jes Sorensen wrote:

On 09/01/10 09:17, Avi Kivity wrote:

Less cluttered display.

Signed-off-by: Avi Kivity
---
  kvm/kvm_stat |2 ++
  1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kvm/kvm_stat b/kvm/kvm_stat
index 4a16277..d373c60 100755
--- a/kvm/kvm_stat
+++ b/kvm/kvm_stat
@@ -315,6 +315,8 @@ def tui(screen, stats):
  if row >= screen.getmaxyx()[0]:
  break
  values = s[key]
+if not values[0] and not values[1]:
+break
  col = 1
  screen.addstr(row, col, key)
  col += label_width

Is this just for the live display update, or does it also affect single
run outputs?



Just the live update.
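In effect the patch filters all-zero rows out of the curses refresh. A standalone sketch of that filtering, assuming `stats` maps event name to a `(total, delta)` pair (the function name is illustrative; note the patch itself uses `break`, relying on the display's sort order to put never-occurred events last, whereas `continue` here filters row by row):

```python
def visible_rows(stats, max_rows):
    # Collect up to max_rows (name, (total, delta)) pairs for
    # display, skipping events that have never occurred, i.e.
    # both the running total and the last delta are zero.
    shown = []
    for key in sorted(stats):
        if len(shown) >= max_rows:
            break
        total, delta = stats[key]
        if not total and not delta:
            continue            # never occurred: keep it off-screen
        shown.append((key, (total, delta)))
    return shown
```

Since the filter sits only in the tui() path, one-shot outputs used by scripts still report every counter, zero or not.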

--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: don't send IPI if the vcpu is not online

2010-09-06 Thread Alexander Graf

On 06.09.2010, at 07:46, Avi Kivity wrote:

> On 09/06/2010 04:48 AM, Xiao Guangrong wrote:
>> On 09/05/2010 03:18 PM, Avi Kivity wrote:
>>>  On 09/03/2010 07:12 AM, Xiao Guangrong wrote:
 There's no need to send an IPI to a vcpu which is scheduled out
 
 
 @@ -86,6 +86,7 @@ struct kvm_vcpu {
   unsigned long requests;
   unsigned long guest_debug;
   int srcu_idx;
 +bool online;
>>> Why not check for guest_mode instead?
>>> 
>> Oh, I forgot about that...but 'vcpu->guest_mode' is only used on the x86 platform,
>> and make_all_cpus_request() is a common function.
> 
> We can have a function kvm_vcpu_guest_mode() that is defined differently for 
> x86 and the other.
> 
>> So, maybe it's better to use 'vcpu->online' here, and move 'guest_mode' into
>> 'vcpu->arch' ?
> 
> I think guest_mode makes sense for the other archs for reducing IPIs, so 
> let's leave it common and recommend that they implement it.  Alex, if you're 
> ever bored.

What does the bit do? Do we have documentation on it ;)? No seriously, what's 
the intent of the field?


Alex



Re: [PATCH] kvm_stat: ignore events that have never occured

2010-09-06 Thread Jes Sorensen
On 09/01/10 09:17, Avi Kivity wrote:
> Less cluttered display.
> 
> Signed-off-by: Avi Kivity 
> ---
>  kvm/kvm_stat |2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/kvm/kvm_stat b/kvm/kvm_stat
> index 4a16277..d373c60 100755
> --- a/kvm/kvm_stat
> +++ b/kvm/kvm_stat
> @@ -315,6 +315,8 @@ def tui(screen, stats):
>  if row >= screen.getmaxyx()[0]:
>  break
>  values = s[key]
> +if not values[0] and not values[1]:
> +break
>  col = 1
>  screen.addstr(row, col, key)
>  col += label_width

Is this just for the live display update, or does it also affect single
run outputs?

If the latter, it will break scripting where you do a first run, then a
second run and then calculate the result based on the changes.

Cheers,
Jes



RE: KVM Test report, kernel e6a9246... qemu 94f964d...

2010-09-06 Thread Hao, Xudong
Avi Kivity wrote:
>   On 09/06/2010 09:59 AM, Hao, Xudong wrote:
>> Avi Kivity wrote:
>>>On 09/06/2010 06:05 AM, Hao, Xudong wrote:
 New issue
 1. [KVM] Linux guest is too slow to boot up
 https://bugzilla.kernel.org/show_bug.cgi?id=17882
 
>>> I'll take a look.  What kernel is running in the guest?  What
>>> distribution?
>> Guest runs on RHEL5u3 with its default kernel, 2.6.18.
>> 
> 
> Unable to reproduce - R5u3 i386 guest installed and booted, x86_64
> booted from cd, all as expected.

Do you use EPT or shadow mode? This issue only exists in shadow mode.
But if we assign a PCI device to the guest, both EPT and shadow mode can
reproduce it.

Thanks,
Xudong


Re: KVM Test report, kernel e6a9246... qemu 94f964d...

2010-09-06 Thread Avi Kivity

 On 09/06/2010 09:59 AM, Hao, Xudong wrote:

Avi Kivity wrote:

   On 09/06/2010 06:05 AM, Hao, Xudong wrote:

New issue
1. [KVM] Linux guest is too slow to boot up
https://bugzilla.kernel.org/show_bug.cgi?id=17882


I'll take a look.  What kernel is running in the guest?  What
distribution?

The guest runs on RHEL5u3 with its default kernel, 2.6.18.



Unable to reproduce - R5u3 i386 guest installed and booted, x86_64 
booted from cd, all as expected.


--
error compiling committee.c: too many arguments to function



RE: KVM Test report, kernel e6a9246... qemu 94f964d...

2010-09-06 Thread Hao, Xudong
Avi Kivity wrote:
>   On 09/06/2010 06:05 AM, Hao, Xudong wrote:
>> 
>> New issue
>> 1. [KVM] Linux guest is too slow to boot up
>> https://bugzilla.kernel.org/show_bug.cgi?id=17882
>> 
> 
> I'll take a look.  What kernel is running in the guest?  What
> distribution? 

The guest runs on RHEL5u3 with its default kernel, 2.6.18.

Thanks,
Xudong